
I.J. Information Technology and Computer Science, 2016, 12, 59-66

Published Online December 2016 in MECS (http://www.mecs-press.org/)

DOI: 10.5815/ijitcs.2016.12.07

Copyright © 2016 MECS I.J. Information Technology and Computer Science, 2016, 12, 59-66

SQL Versus NoSQL Movement with Big Data Analytics

Sitalakshmi Venkatraman
School of Engineering, Construction and Design (IT), Melbourne Polytechnic, VIC 3072, Australia

E-mail: [email protected]

Kiran Fahd, Samuel Kaspi
School of Engineering, Construction and Design (IT), Melbourne Polytechnic, VIC 3072, Australia

E-mail: [email protected], [email protected]

Ramanathan Venkatraman

National University of Singapore, Singapore

E-mail: [email protected]

Abstract—Two main revolutions in data management have occurred recently, namely Big Data analytics and NoSQL databases. Even though they have evolved with different purposes, their independent developments complement each other, and their convergence would benefit businesses tremendously in making real-time decisions using volumes of complex data sets that could be both structured and unstructured. While many software solutions have emerged to support Big Data analytics, many NoSQL database packages have also arrived in the market. However, they lack independent benchmarking and comparative evaluation. The aim of this paper is to provide an understanding of their contexts and an in-depth study comparing the features of the four main NoSQL data models that have evolved. The performance comparison of traditional SQL with NoSQL databases for Big Data analytics shows that NoSQL databases are a better option for business situations that require simplicity, adaptability, high-performance analytics and distributed scalability of large data. This paper concludes that the NoSQL movement should be leveraged for Big Data analytics and would coexist with relational (SQL) databases.

Index Terms—Structured Query Language (SQL), Non SQL (NoSQL), Big Data, Big Data Analytics, Relational Database, SQL Database, NoSQL Database.

I. INTRODUCTION

As the technology environment transforms and faces new challenges, businesses increasingly realize the need to evaluate new approaches and databases for managing their data, in order to support changing business requirements and the growing complexity and expansion of their applications [1]. The relational database has been the default choice of data model in businesses worldwide over the past thirty years, with Structured Query Language (SQL) as the standard language designed to perform the basic data operations. However, with the explosion of data volume, SQL-based data querying loses efficiency, and in particular, managing larger databases has become a major challenge [2]. In addition, relational databases exhibit a variety of limitations in meeting the recent Big Data analytics requirements of businesses. While cluster-based architecture has emerged as a solution for large databases, SQL is not designed to suit clusters, and this mismatch has prompted the search for alternative solutions. There are also mismatches between the persistent data model and in-memory data structures, and servers based on SQL standards are now prone to large memory footprints, security risks and performance issues.

NoSQL (Non SQL) databases, on the other hand, offer a set of new data management features and are more flexible and horizontally scalable. They are considered alternatives that overcome the limitations of the current SQL-dominated persistence landscape, and hence they are also known as non-relational databases [3]. The main goal of the NoSQL movement is to allow easy storage and retrieval of data regardless of its structure and content, which is possible because non-relational databases have no rigid data structure. NoSQL databases achieve horizontal scalability by taking advantage of new clusters and several low-cost servers. In addition, they are envisaged to manage data administration automatically, including fault recovery, and these capabilities would result in huge cost savings. Though non-relational databases provide different features and advantages, they were initially characterised by a lack of data consistency and the inability to query stored records using SQL. As NoSQL databases have emerged, new features and optimisation characteristics have been evolving to overcome these limitations as well. However, their full capabilities have not yet been established [4]. Also, due to the increasing differences between NoSQL database offerings and their non-standard features, businesses are unclear about which stand to take.

In this paper, we first provide an overview of the present context of Big Data analytics and NoSQL databases. Next, we discuss the four main data models of non-relational databases and compare them with SQL databases. There are a variety of NoSQL databases, and which one is most appropriate for which business operation remains an unanswered question so far. We compare the different NoSQL data models in terms of their features and the NoSQL databases available in the market that support those features. The different data manipulation mechanisms and optimisation techniques adopted by NoSQL databases could result in differences in their performance. We discuss how these factors play a major role in Big Data analytics and identify the associated challenges. We also consider the coexistence of NoSQL databases with relational databases and discuss their relevance in different business contexts.

II. RELATED WORK: THE CONTEXT OF NOSQL DATABASES WITH BIG DATA ANALYTICS

From the recent trends reported in the literature [5][6], it is evident that, with increasing Internet usage, there is exponential growth in the volume of both structured and unstructured data (Big Data) from a variety of data sources, such as social media, e-mails, text documents, GPS data, sensor data and surveillance data. Hence, Big Data can be characterised as structured, semi-structured and unstructured data collected from digital and non-digital resources. The main challenge is to use this Big Data effectively as a source for efficient decision-making by adopting suitable data mining techniques [7][8].

Based on our literature survey, we have identified that the current challenges presented by Big Data are due to the following general characteristics experienced by businesses:

• High data Velocity – rapidly and continuously updated data streams from different sources and locations.
• Data Variety – structured, semi-structured and unstructured data storage.
• Data Volume – huge number of datasets with sizes of several terabytes or petabytes.
• Data Complexity – data organized in several different locations or data centres.

It is important for businesses to perform Big Data analytics, which is the process of examining large data sets containing a variety of data types. Using Big Data analytics, businesses are able to arrive at more accurate analyses of huge amounts of data to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information [2][9]. To support timely and effective decision-making, Big Data analytics relies on large volumes of data that require clusters for storage. However, since relational databases are not designed for clusters and exhibit performance issues with regard to Big Data analytics, businesses are considering the need for the NoSQL movement [10].

The schema of a NoSQL database is not fixed. NoSQL databases use varied interfaces to store and analyse the sheer volume of user-generated content, personal data and spatial data being generated by modern applications, cloud computing and smart devices [1][11].

In this context, NoSQL databases present a preferable solution to SQL databases, primarily for their ability to cater to horizontal partitioning of data, flexible data processing and improved performance. Large Internet companies (Facebook, LinkedIn, Amazon and Google), which could not deliver their services using existing relational databases, conducted research that led to the advent of NoSQL to solve their problems of continuously increasing data, optimised data utilization and horizontal scalability of large data. NoSQL databases are a better option for information systems whose need for high performance and dynamic scalability outweighs their need for reliability, such as highly distributed three-tier Internet architecture systems and cloud computing [1][3][11]. Therefore, it is necessary to investigate further and compare SQL versus NoSQL, as well as the salient differences in the performance of NoSQL data models in supporting the necessary features for Big Data analytics. This paper presents these investigations and findings in today's Big Data context.

III. NOSQL DATA MODELS

There are many NoSQL databases available; however, they fall under the four data models described below [3][11][12]. Each category has its own specific attributes, but there are crossovers between the different data models. Generally, all NoSQL databases are built to be distributed and scaled horizontally.

Key-Value Store Database – A key-value store is a simple but efficient and powerful NoSQL database. The data is stored in two parts: a string that represents the key and the actual data that represents the value, thus creating a “key-value” pair. This results in values being indexed by keys for retrieval, a concept similar to hash tables. In other words, the store allows the user to request values according to the key specified. It can handle structured or unstructured data. It offers high concurrency and scalability as well as rapid lookups, but little consistency.
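To make the hash-table analogy concrete, here is a minimal in-memory sketch in Python. It is not how DynamoDB, Riak or any other product is implemented: the `KeyValueStore` class and its `put`, `get` and `bulk_put` methods are hypothetical names chosen for illustration. It shows that every read and write is addressed by key and that the store never inspects the value, so structured and unstructured data are handled alike.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The value is opaque to the store: any structure is accepted.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Lookup is by key only, with average O(1) cost thanks to hashing.
        return self._data.get(key, default)

    def bulk_put(self, items: Iterable[Tuple[str, Any]]) -> int:
        # Save several entries in one call; returns the number written.
        count = 0
        for key, value in items:
            self._data[key] = value
            count += 1
        return count


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book", "pen"]})
store.bulk_put([("hot", "and spicy"), ("cold", "yet loving")])
print(store.get("session:42")["cart"])  # ['book', 'pen']
print(store.get("missing"))  # None
```

A session or shopping-cart record fits this shape naturally: the whole cart is one value fetched by one key.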

Such key-value store databases can be used to develop forums, online shopping carts and websites where user sessions need to be stored. Some notable examples are Amazon's DynamoDB, Apache's Cassandra, Azure Table Storage (ATS), Oracle Berkeley DB, and Basho Technologies' Riak. Amazon offers DynamoDB as a fully managed NoSQL store service for internet-scale applications. It is a distributed key-value storage system that provides fast, reliable and cost-effective data access, as well as high availability and durability due to its replication feature.

One of the advantages of a key-value store database is its high insert/read rate compared to a traditional SQL database. This is achieved by saving more than one entry to the store in a single operation, as shown in the example below (Ruby, using the CouchRest client for CouchDB):

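require 'couchrest'
# Assumed local CouchDB instance; CouchRest.database! creates the database if missing.
@db = CouchRest.database!("http://127.0.0.1:5984/example")
# Store three entries in one bulk request rather than three separate writes.
@db.bulk_save([
  {"hot" => "and spicy"},
  {"cold" => "yet loving"},
  {"other" => ["set", "of", "keys"]}
])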

Column-Oriented (or Wide-Column) Store Databases – In column store databases, columns are defined for each row instead of being predefined by a table structure with uniformly sized columns for each row. Such stores have a two-level aggregate structure: a key and a row aggregate, which is a group of columns. Any column can be added to any row, and rows can have very different columns. In other words, each row can store a different number of columns. Such a store can also hold data tables as sections of columns of data. Data can be viewed either as row-oriented, where each row is an aggregate, or as column-oriented, where each column family defines a record type. Each key is associated with one or more columns, and a key for each column family is used for rapid data retrieval with less I/O activity, thereby offering very high performance. These databases provide high scalability as they store data in highly distributed architectures.
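The two-level aggregate described above (a row key pointing to column families, each holding its own set of columns) can be sketched with nested dictionaries. This is an illustrative model under our own assumptions, not the storage format of Cassandra, HBase or Bigtable; `WideColumnTable` and its methods are hypothetical. Note how two rows end up with different numbers of columns without any table-wide column definition.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

# Hypothetical layout: row key -> column family -> column name -> value.
Families = DefaultDict[str, Dict[str, str]]


class WideColumnTable:
    def __init__(self) -> None:
        self._rows: DefaultDict[str, Families] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        # Any column can be added to any row; there is no table-wide column list.
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, str]:
        # Reading one column family touches only that family's columns.
        # .get() is used so that lookups never create empty rows or families.
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))

    def column_count(self, row_key: str) -> int:
        row = self._rows.get(row_key)
        return 0 if row is None else sum(len(cols) for cols in row.values())


table = WideColumnTable()
table.put("user1", "profile", "name", "Alice")
table.put("user1", "profile", "email", "alice@example.com")
table.put("user2", "profile", "name", "Bob")
table.put("user2", "activity", "last_login", "2016-12-01")
table.put("user2", "activity", "last_page", "/home")

print(table.get_family("user1", "profile"))  # {'name': 'Alice', 'email': 'alice@example.com'}
print(table.column_count("user1"), table.column_count("user2"))  # 2 3
```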

Wide-column databases are ideal for data mining and analytic applications with Big Data. Examples of column-oriented store providers are Facebook's high-performance Cassandra, Apache HBase, Google's Bigtable and HyperTable. Google's Bigtable is a high-performance wide-column database that can deal with vast amounts of data. It is built on the Google File System (GFS) using C/C++. It is used by multiple Google applications, such as YouTube and Gmail, that place varied latency demands on the database. It is not distributed outside Google, apart from its use inside Google's App Engine. Bigtable is designed for easy scalability across thousands of machines, and thus it is tolerant to hardware failures.

Document Store Databases – A document database extends the basic key-value database concept and stores complex data in document form, such as XML, PDF or JSON documents. A document store is typically schema-less: each document can contain different fields of any length. Documents are accessed or identified by a unique key, which may be a simple string, a URI string or a path string. Document databases are more complex but offer high performance, horizontal scalability and schema flexibility, which allow storing virtually any structure required by any application.
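A document store can be sketched as a mapping from a unique key (here a path-like string) to a JSON document. The `collection`, `insert` and `find` names below are hypothetical helpers, not the MongoDB or CouchDB API; they only illustrate that documents in one collection may carry different fields and nested structures, and that queries simply skip documents lacking a field.

```python
import json
from typing import Any, Dict, List

# Hypothetical collection: unique key (a path-like string) -> JSON document.
collection: Dict[str, Dict[str, Any]] = {}


def insert(doc_id: str, document: Dict[str, Any]) -> None:
    # Serialise and parse back to mimic storing JSON; no schema is checked.
    collection[doc_id] = json.loads(json.dumps(document))


def find(field: str, value: Any) -> List[str]:
    # Documents that lack the field are skipped rather than rejected.
    return sorted(key for key, doc in collection.items() if doc.get(field) == value)


insert("posts/1", {"title": "Hello", "author": "ann", "tags": ["intro"]})
insert("posts/2", {"title": "Big Data", "author": "bob",
                   "comments": [{"by": "ann", "text": "Nice"}]})
insert("pages/about", {"title": "About", "body": "Who we are"})

print(find("author", "ann"))          # ['posts/1']
print(sorted(collection["posts/2"]))  # ['author', 'comments', 'title']
```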

Document-oriented databases are suitable for content management systems and blog applications. Some examples of providers using document-oriented databases are 10gen's MongoDB, Apache CouchDB, Basho Technologies' Riak, Azure's DocumentDB and AWS DynamoDB. MongoDB is developed by 10gen using C++ and is a structure-free, cross-platform, document-oriented database. It uses the Grid File System (GridFS) to store large files such as images and videos in BSON (Binary JSON) format. It provides efficient performance, high consistency and high persistence, but it is not very reliable and is resource-hungry.

Graph Store – A graph database focuses on relationships between data. It uses the graph theory approach to store the data and optimises search by using the index-free adjacency technique. It is designed for data whose relationships are well represented by graph structures consisting of nodes, edges and properties. A node represents an object (an entity in the database), an edge describes the relationship between objects, and properties are attributes attached to nodes and edges. In the index-free adjacency technique, each node holds pointers that point directly to its adjacent nodes, as shown in Fig. 1.
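Index-free adjacency can be illustrated with nodes that hold direct references to their neighbours, so a traversal follows pointers instead of consulting a global index. The `Node` class and `friends_of_friends` function below are a hypothetical sketch of the idea, not Neo4j's storage engine; properties are modelled as a plain dictionary on each node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # compare nodes by identity, since edges may form cycles
class Node:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    # Direct references to neighbours as (relationship label, target node) pairs.
    edges: List[Tuple[str, "Node"]] = field(default_factory=list, repr=False)

    def connect(self, label: str, other: "Node") -> None:
        self.edges.append((label, other))

    def neighbours(self, label: str) -> List["Node"]:
        # Traversal follows the stored pointers; no index lookup is involved.
        return [target for (lbl, target) in self.edges if lbl == label]


def friends_of_friends(start: Node) -> List[str]:
    direct = start.neighbours("KNOWS")
    seen = {start.name, *(n.name for n in direct)}
    found = {fof.name for friend in direct for fof in friend.neighbours("KNOWS")
             if fof.name not in seen}
    return sorted(found)


alice, bob = Node("alice", {"city": "Melbourne"}), Node("bob")
carol, dave = Node("carol"), Node("dave")
alice.connect("KNOWS", bob)
bob.connect("KNOWS", carol)
bob.connect("KNOWS", alice)
carol.connect("KNOWS", dave)

print([n.name for n in alice.neighbours("KNOWS")])  # ['bob']
print(friends_of_friends(alice))  # ['carol']
```

The cost of each hop depends only on the number of edges of the node being visited, not on the total size of the graph, which is the intuition behind the fast traversals claimed for graph stores.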

These stores provide fast performance, ACID compliance and rollback support. These databases are suitable for developing social-networking applications, bioinformatics applications, content management systems and cloud management services. Examples of notable graph databases are Neo Technology's Neo4j, OrientDB, Apache Giraph and Titan.

Apache Giraph is an open-source, large-scale graph processing system and an implementation of Google Pregel (a graph processing architecture with a vertex-centric approach). It is designed for high scalability, to meet the crucial need for scalable platforms and parallel architectures that can process the bulk data produced by modern applications such as social networks and knowledge bases. For example, it is currently used at Facebook, LinkedIn and Twitter to analyse the graph formed by users and their connections. Giraph is a distributed and fault-tolerant system and offers features such as master computation, sharded aggregators, edge-oriented input and out-of-core computation.

Fig. 1. Graph algorithm.
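Pregel's vertex-centric model, which Giraph implements, can be simulated in a single process to show the superstep loop: each vertex reads incoming messages, updates its own value, sends messages to its neighbours, and stays inactive once nothing changes. The example computes connected components by propagating the minimum vertex id. It is a sketch under our own assumptions, written in Python; it does not use Giraph's actual Java API, and the adjacency list is assumed to be undirected (each edge listed in both directions).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def connected_components(adjacency: Dict[int, List[int]]) -> Dict[int, int]:
    """Simulate a Pregel-style computation on one machine.

    Every vertex starts with its own id as its value. In superstep 0 all
    vertices send their value to their neighbours. In later supersteps only
    vertices with incoming messages run: each keeps the minimum of its value
    and its messages, and forwards the value only if it shrank. The loop ends
    when a superstep produces no messages (every vertex has voted to halt).
    """
    value = {v: v for v in adjacency}
    inbox: Dict[int, List[int]] = {}
    superstep = 0
    while True:
        outbox: DefaultDict[int, List[int]] = defaultdict(list)
        for v, neighbours in adjacency.items():
            messages = inbox.get(v, [])
            if superstep > 0 and not messages:
                continue  # halted vertex with no new messages does no work
            new_value = min([value[v], *messages])
            if superstep == 0 or new_value < value[v]:
                value[v] = new_value
                for n in neighbours:
                    outbox[n].append(new_value)
        if not outbox:
            return value
        inbox = dict(outbox)
        superstep += 1


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(graph))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

In a real Pregel or Giraph deployment the vertices are partitioned across workers and messages cross the network between supersteps; the single loop above stands in for that distributed barrier.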

IV. HIGH LEVEL COMPARISON BETWEEN NOSQL AND SQL DATABASES

Based on the features of each type of database recently reported in the literature [1][3][11][13], we performed a high-level comparison between SQL (relational) and NoSQL (non-relational) databases; the summary of findings is given in Table 1.

We considered aspects such as database type, schema, data model used, scaling model available, transactional capabilities, data manipulation method used, and popular database software available in the market in order to compare SQL databases versus NoSQL databases. Some examples are also given in Table 1 for a better understanding of their differences. Overall, Table 1 provides the high-level differences in the key features and properties exhibited by relational and non-relational databases, which would support businesses in making decisions about using SQL or NoSQL database options in various Big Data application scenarios.

Table 1. Relational Versus NoSQL Databases - High Level Differences

Database Type
  Relational databases: One SQL DBMS product (marginal variations).
  NoSQL databases: Four general types: key-value, document, wide-column and graph stores.

Schema
  Relational databases:
    • Based on pre-defined foreign-key relationships between tables in an explicit database schema.
    • Strict definition of schema and data types is required before inserting data.
    • Any update alters the entire database.
  NoSQL databases:
    • Dynamic database schema.
    • Schema definition is not forced in advance.
    • Different data can be stored together as required.
    • Allows the schema to be modified freely with no downtime.

Data Models
  Relational databases:
    • Data records are stored as rows and columns in different tables joined via relationships.
    • Explicitly defined data types of columns to store a specific piece of data.
    • For example, the SQL engine joins two separate tables, "employees" and "departments", to find out the department of an employee.
  NoSQL databases:
    • Supports all types of data: structured, semi-structured and unstructured.
    • Different products offer different and flexible data models. For example, a document store organizes all related data using references and embedded documents.

Scaling Model
  Relational databases:
    • Vertical scaling: data resides on a single node and capacity is added to existing resources (data storage or I/O capacity).
  NoSQL databases:
    • Horizontal scaling: a modern approach of partitioning the data across additional servers or cloud instances as required.

Transaction Capabilities
  Relational databases:
    • Based on ACID transactional properties (atomicity, consistency, isolation, durability) to ensure high data reliability and data integrity.
    • Atomic transactions.
    • Degrades performance.
  NoSQL databases:
    • Supports AID transactions, and the CAP theorem of distributed systems supports consistency of data across all nodes of a NoSQL database.
    • There is atomicity at the single-document level.

Data Manipulation
  Relational databases:
    • Structured Query Language: SQL DML statements are used to manipulate data, e.g.
      SELECT customer_name FROM customers WHERE customer_age > 18;
  NoSQL databases:
    • Query data efficiently.
    • Object-oriented APIs are used, e.g.
      db.customers.find({customer_age: {$gt: 18}}, {customer_name: 1})

Software
  Relational databases: Oracle, MySQL, DB2, SQL Server.
  NoSQL databases: MongoDB, Riak, Couchbase, RethinkDB, Redis, Aerospike, LevelDB, HBase, Cassandra, Neo4j, Elasticsearch, Lucene.
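The data manipulation row of Table 1 can be checked side by side. The sketch below runs the SQL query from the table against an in-memory SQLite database (Python's built-in `sqlite3` module) and then applies the same filter and projection as the MongoDB query to a plain list of dictionaries. The document side is a hypothetical stand-in written in Python, not the MongoDB driver, and it ignores MongoDB's default `_id` field in projections. The extra customer without an age field also touches on the schema row: the document list accepts the new `loyalty_tier` field as-is, whereas the SQL table would first need that column declared.

```python
import sqlite3
from typing import Any, Dict, List

rows = [("Ann", 34), ("Ben", 17), ("Cai", 19)]

# Relational side: columns and types are declared before any data is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_name TEXT, customer_age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
sql_result = [name for (name,) in conn.execute(
    "SELECT customer_name FROM customers WHERE customer_age > 18 ORDER BY customer_name")]
conn.close()

# Document side: hypothetical Python stand-in for
# db.customers.find({customer_age: {$gt: 18}}, {customer_name: 1}), not the MongoDB driver.
customers: List[Dict[str, Any]] = [{"customer_name": n, "customer_age": a} for n, a in rows]
customers.append({"customer_name": "Dee", "loyalty_tier": "gold"})  # no age, new field


def find_gt(docs: List[Dict[str, Any]], field: str, threshold: Any,
            projection: List[str]) -> List[Dict[str, Any]]:
    # Filter on field > threshold (documents without the field never match),
    # then keep only the projected fields.
    return [{k: d[k] for k in projection if k in d}
            for d in docs if d.get(field) is not None and d[field] > threshold]


doc_result = [d["customer_name"] for d in find_gt(customers, "customer_age", 18, ["customer_name"])]
print(sql_result)  # ['Ann', 'Cai']
print(doc_result)  # ['Ann', 'Cai']
```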

V. PERFORMANCE OF NOSQL AND SQL DATABASES FOR BIG DATA ANALYTICS

The most important reason for moving from relational to NoSQL databases is the requirement for performance improvements. Choi et al. [1] found that a NoSQL database such as MongoDB provided more stable and faster performance at the expense of data consistency. The tests were done on an internal blog system based on an open-source project. It was found that MongoDB stored posts 850% faster than a SQL database. It has been suggested that NoSQL should be used in environments that are concerned with data availability rather than consistency.

Fotache & Cogean [14] describe the use of MongoDB in mobile applications. Certain multiple-update operations, such as upsert, are easier and faster to perform with NoSQL than with a SQL database. The use of cloud computing along with NoSQL is said to increase performance, especially in the data layer of mobile platforms.
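The upsert mentioned above (update a document if it exists, insert it otherwise) can be sketched over an in-memory collection. The `upsert` function is hypothetical and only mirrors the semantics; it is not a database driver call. In SQL, the same effect has traditionally required a lookup followed by either an INSERT or an UPDATE, although several SQL dialects now provide MERGE or INSERT ... ON CONFLICT.

```python
from typing import Any, Dict

# Hypothetical collection keyed by document id.
Collection = Dict[str, Dict[str, Any]]


def upsert(coll: Collection, doc_id: str, changes: Dict[str, Any]) -> bool:
    """Apply `changes` to the document with `doc_id`, creating it if absent.

    Returns True if a new document was inserted, False if an existing one was
    updated. This mirrors the idea of an upsert; it is not a database driver call.
    """
    inserted = doc_id not in coll
    doc = coll.setdefault(doc_id, {"_id": doc_id})
    doc.update(changes)  # only the supplied fields are set; others are kept
    return inserted


devices: Collection = {}
upsert(devices, "phone-1", {"last_sync": "09:00", "battery": 80})  # inserts
upsert(devices, "phone-1", {"last_sync": "09:05"})                 # updates in place
print(devices["phone-1"])  # {'_id': 'phone-1', 'last_sync': '09:05', 'battery': 80}
```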

Ullah [15] compared the performance of a relational database management system (RDBMS) and a NoSQL database, where a Resource Description Framework (RDF) based triple store was used as the NoSQL database. It was noted that the NoSQL database was slower than the relational database due to its massive memory usage. Reading a large amount of data takes a toll on the database, and because of the unstructured format of the NoSQL database, storing thousands of records requires a huge amount of storage, whereas the RDBMS uses less storage. For example, searching for "red berry" took 5255 ms in the NoSQL database, while it took only 165.43 ms in the RDBMS.

Floratou et al. [4] performed the Yahoo Cloud Serving Benchmark (YCSB) test on an RDBMS and MongoDB. They tested a client-sharded SQL database against auto-sharded and client-sharded MongoDB databases. The tests found that the client-sharded SQL database was able to attain higher throughput and lower latency in most of the benchmarks. The higher performance of SQL is attributed to the fact that the majority of read requests are served from pages in the buffer pool, whereas NoSQL databases tend to read shards located on different nodes. The study sought to show that an RDBMS still has the processing power to handle larger workloads, similar to NoSQL.

There are many advantages of NoSQL databases over SQL databases, such as easy scalability, flexible schema, lower cost, and efficient, high performance. Having said that, NoSQL databases also have some weaknesses compared with SQL databases [12][16]. These are summarised below:

• NoSQL is new and immature; therefore, there is a lack of familiarity and limited expertise.
• NoSQL databases scale horizontally by giving up either consistency or availability.
• There is no standard query and manipulation language across all NoSQL databases.
• There is no standard interface for NoSQL databases.
• It is difficult to export all data in distributed databases (Cassandra) compared to non-distributed ones (MongoDB).
• NoSQL databases are challenging to install and difficult to maintain.

We have identified the following situations in which NoSQL should be more suitable than SQL in the context of Big Data analytics:

1. Simplicity of use – current Big Data technologies are complex and require highly skilled technical expertise, while NoSQL offers simplicity that would improve the productivity of both developers and users. Simple, small, intuitive and easy-to-learn NoSQL stacks can suit businesses that require Big Data analytics by letting them adopt clean NoSQL-like APIs.
2. Adaptability to change – when business requirements and data models change, warranting flexible Big Data analytics, NoSQL databases that support flexible data schemas are ideal for integrating siloed and disparate backend systems.
3. Efficiency for analytics functionality – the foundation data structure of the majority of NoSQL technologies is the JavaScript Object Notation (JSON) data format, which efficiently caters to both schema-on-read and schema-on-write for data warehousing functionality. For example, SonarW, a NoSQL Big Data warehouse for JSON, makes analytics functionality efficient for Big Data applications.
4. Distributed scalability – as systems and transactions become more and more distributed, flexible data becomes the norm and a strict schema approach is unsuitable. With schema evolution, NoSQL provides the necessary scalability for Big Data platforms to perform distributed queries faster.
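Item 3 above contrasts schema-on-write with schema-on-read. A minimal sketch, assuming a hypothetical two-field schema for sensor readings: `write_validated` rejects non-conforming records before storing them as JSON, while `read_with_schema` works on raw JSON that was stored untouched and applies the schema only when the data is queried. Neither function represents SonarW or any other specific product.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema used only for this illustration.
SCHEMA: Dict[str, type] = {"sensor": str, "reading": float}


def write_validated(store: List[str], record: Dict[str, Any]) -> None:
    # Schema-on-write: check the record before it is stored.
    for name, expected in SCHEMA.items():
        if not isinstance(record.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    store.append(json.dumps(record))


def read_with_schema(store: List[str]) -> List[Dict[str, Any]]:
    # Schema-on-read: raw JSON was stored untouched; the schema is applied now,
    # keeping records that fit and projecting only the schema's fields.
    rows = []
    for line in store:
        rec = json.loads(line)
        if all(isinstance(rec.get(n), t) for n, t in SCHEMA.items()):
            rows.append({n: rec[n] for n in SCHEMA})
    return rows


strict: List[str] = []
write_validated(strict, {"sensor": "s1", "reading": 21.5})
try:
    write_validated(strict, {"sensor": "s2"})
except ValueError as err:
    print(err)  # field 'reading' missing or not float

raw = [json.dumps({"sensor": "s1", "reading": 21.5, "unit": "C"}),
       json.dumps({"sensor": "s2", "note": "offline"})]
print(read_with_schema(raw))  # [{'sensor': 's1', 'reading': 21.5}]
```

Schema-on-read keeps the extra `unit` and `note` fields in storage, so a later query with a different schema can still use them.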

Table 2. Comparison of NoSQL Data Models

Key-Value
  NoSQL databases: DynamoDB, Cassandra, ATS, Riak, Berkeley DB
  Performance: High; Scalability: High; Flexibility: High; Complexity: None; Functionality: Variable (None)
Wide-Column
  NoSQL databases: Cassandra, HBase, Bigtable, HyperTable
  Performance: High; Scalability: High; Flexibility: Moderate; Complexity: Low; Functionality: Minimal
Document
  NoSQL databases: MongoDB, CouchDB, Riak, DynamoDB
  Performance: High; Scalability: Variable (High); Flexibility: High; Complexity: Low; Functionality: Variable (Low)
Graph
  NoSQL databases: Neo4j, OrientDB, Giraph, Titan
  Performance: Variable; Scalability: Variable; Flexibility: High; Complexity: High; Functionality: Graph Theory

VI. COMPARISON OF NOSQL DATA MODELS

NoSQL databases vary in their performance depending on their data model [17]. We compare the key attributes of the four types of NoSQL data models and summarise them in Table 2.

As shown in Table 2, we have considered key attributes such as performance, scalability, flexibility, complexity and functionality for comparing the four data models supported by the popular NoSQL database software available in the market.

Fig. 2 shows the CAP theorem, which forms a visual guide to the NoSQL databases under each NoSQL data model [16] based on consistency, availability and partition tolerance. With NoSQL databases, there are now other options for storing different kinds of data, where typically a distributed set of servers can satisfy at most two of the three requirements of the CAP theorem, which is usually a deciding factor in what technology could be used.
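The CAP trade-off can be illustrated with a toy model of two replicas. While the link between them is up, a write is copied to both; once a partition occurs, a system favouring consistency (CP) refuses the write, whereas one favouring availability (AP) accepts it on one side and lets the replicas diverge. `TwoReplicaStore` is hypothetical and deliberately ignores how real systems detect partitions or reconcile replicas afterwards.

```python
from typing import Dict, Optional


class TwoReplicaStore:
    """Toy model of two replicas (a and b) of one key space during a partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a: Dict[str, str] = {}
        self.b: Dict[str, str] = {}
        self.partitioned = False

    def write(self, key: str, value: str) -> bool:
        """Write via replica a; return True if the write was accepted."""
        if not self.partitioned:
            self.a[key] = value
            self.b[key] = value  # synchronous copy while the link is up
            return True
        if self.mode == "CP":
            return False  # refuse so replicas cannot disagree (consistency first)
        self.a[key] = value  # accept on one side only (availability first)
        return True

    def read_from_b(self, key: str) -> Optional[str]:
        return self.b.get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("x", "1")
    store.partitioned = True
    accepted = store.write("x", "2")
    print(mode, accepted, store.a["x"], store.read_from_b("x"))
# CP False 1 1
# AP True 2 1
```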

Bazar & Losif [3] compared the performance of the MongoDB, Cassandra and Couchbase databases, each possessing different features and functionalities. The tests were conducted using the YCSB tool.

Fig. 2. CAP theorem for NoSQL databases.

VII. RESULTS

The benchmark tests found that Couchbase produced the lowest latencies for interactive database applications. Couchbase is able to process more operations per second, with a lower average latency in reading and writing data, than both MongoDB and Cassandra. Document-level locking in the Couchbase database is the primary reason for its faster read and write operations. Cassandra is faster at writing than MongoDB, but both have almost equal reading speed. It is also mentioned that each NoSQL database is suited to specific application environments and cannot be considered a complete solution for every workload and use case.

Another case study by Klein et al. [18] looked at the use of the NoSQL databases MongoDB, Riak and Cassandra in a distributed healthcare organisation. These databases use different NoSQL data models: key-value (Riak), column (Cassandra) and document (MongoDB). Cassandra produced the best overall performance for all types of database operations (reading, writing and updating). Riak's performance was degraded because its internal thread pool created a pool for each client session instead of a shared pool for all client sessions.

Cassandra had the highest average latencies but also produced the best throughput results. This was firstly due to the indexing features that allowed Cassandra to retrieve the most recently written records efficiently, especially compared to Riak. Secondly, hash-based sharding allowed Cassandra to distribute the storage request load better than MongoDB.
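The effect of hash-based sharding on load distribution can be sketched by comparing it with range partitioning when writes arrive with sequential ids. The functions below are hypothetical. Cassandra itself places rows on a token ring using the Murmur3 partitioner by default, whereas this sketch uses a SHA-256 digest modulo the shard count purely for illustration. With range partitioning, all recent writes land on the last shard; with hashing, they spread across every shard.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    # A stable digest (unlike Python's salted hash()) gives the same placement everywhere.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(seq_id: int, max_id: int, num_shards: int = NUM_SHARDS) -> int:
    # Range partitioning: contiguous blocks of ids map to the same shard.
    return min(seq_id * num_shards // max_id, num_shards - 1)


# The 1,000 most recent writes in a table whose sequential ids run up to 10,000.
recent_ids = range(9_000, 10_000)
by_range = Counter(range_shard(i, 10_000) for i in recent_ids)
by_hash = Counter(hash_shard(f"record-{i}") for i in recent_ids)

print(dict(by_range))  # {3: 1000}: every recent write hits the same shard
print(sorted(by_hash))  # [0, 1, 2, 3]: all shards receive a share of the writes
```

The counts per shard under hashing are only roughly equal; the point is that no single shard becomes the hot spot for the newest data.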

Prasad & Gohil [11] discussed the use of different NoSQL databases for different work environments. They report that the performance of NoSQL databases is increased by the use of a collection of processors in the distributed system. MongoDB and Cassandra are considered the best databases for cases where data is frequently written but rarely read. NoSQL databases are described as being constrained by the Consistency, Availability and Partition tolerance (CAP) theorem. This means that a trade-off is always made; e.g. the database can either be consistent with low performance, or offer high availability and low consistency with fast performance [11][17][19].

Zhikun et al. [20] suggested the use of a new database allocation strategy based on load (DASBL) in order to increase the performance of NoSQL databases. However, DASBL only works when four conditions are satisfied, and it is unable to cater to an unbalanced system load.

Prasad et al. [11] compared different attributes such as replication, sharding, consistency and failure handling. We summarise all these findings in Table 3, which provides a list of the best NoSQL databases for each of the features reported in the literature.

Several doubts have arisen about the promises of NoSQL, and studies have been conducted to explore its strengths and weaknesses [21][22]. A recent study reviews the trends in storage and computing tools, along with their relative capabilities, limitations and the environments they are suited to [23]. While high-end platforms like IBM Netezza AMPP could cater to Big Data, economic considerations have led choices such as Hadoop to proliferate worldwide, resulting in the rise of NoSQL databases that can integrate easily with Hadoop. Even though HBase supports strong integration with Hadoop using Apache Hive, it could be a better choice for application development only, and not for real-time queries and OLTP applications, due to its very high latency. On the other hand, graph-based platforms such as Neo4j and Giraph are better options for storage and computation due to their capability to model vertex-edge scenarios in businesses that involve data environments such as social networks and geospatial paths.

Overall, Big Data has created the need for new-generation data analytics tools [24][25], and hence it is realistic to believe that SQL and NoSQL databases will coexist. In cloud environments that support SQL databases, fast processing of data is warranted to enable efficient elasticity [26] and Big Data analytics that involve current and past data as well as future predictions. New solutions are being proposed for cloud monitoring that use NoSQL database back-ends to achieve very quick response times.

Table 3. NoSQL Databases Mapped to Their Features

Feature: Best NoSQL Databases
High availability: Riak, Cassandra, Google Bigtable, CouchDB
Partition tolerance: MongoDB, Cassandra, Google Bigtable, CouchDB, Riak, HBase
High scalability: Google Bigtable
Consistency: MongoDB, Google Bigtable, Redis, HBase
Auto-sharding: MongoDB
Write frequently, read less: MongoDB, Redis, Cassandra
Fault tolerant (no single point of failure): Riak
Concurrency control (MVCC): Riak, Dynamo, CouchDB, Cassandra, Google Bigtable
Concurrency control (locks): MongoDB, Redis, Google Bigtable
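Table 3 distinguishes multi-version concurrency control (MVCC) from lock-based concurrency control. The toy classes below contrast the two under our own simplifying assumptions; they do not reproduce the internals of any database listed. In the hypothetical `MVCCStore`, a writer appends a new version and a reader sees the newest version committed at or before its snapshot, so readers never take the lock; in the hypothetical `LockingStore`, a single lock serialises reads and writes.

```python
import threading
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._clock = 0
        self._commit_lock = threading.Lock()  # only serialises version numbering

    def write(self, key: str, value: str) -> int:
        with self._commit_lock:
            self._clock += 1
            self._versions.setdefault(key, []).append((self._clock, value))
            return self._clock

    def snapshot(self) -> int:
        return self._clock

    def read(self, key: str, as_of: int) -> Optional[str]:
        # Return the newest version committed at or before the snapshot.
        # Older versions are kept, and no lock is taken, so readers do not wait for writers.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= as_of:
                return value
        return None


class LockingStore:
    """Toy lock-based store: one lock guards every read and write."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def write(self, key: str, value: str) -> None:
        with self._lock:
            self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        with self._lock:  # a reader blocks while a writer holds the lock
            return self._data.get(key)


db = MVCCStore()
db.write("balance", "100")
snap = db.snapshot()        # a long-running reader starts here
db.write("balance", "150")  # a concurrent writer commits a newer version
print(db.read("balance", snap))           # 100
print(db.read("balance", db.snapshot()))  # 150
```

The price of MVCC in this sketch is visible in `_versions`: old values accumulate until something prunes them, which real MVCC systems handle with compaction or garbage collection.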

VIII. CONCLUSIONS

The industry has been dominated by relational or SQL databases for several years. However, as businesses have recently needed to store and process large datasets for analytics, NoSQL databases offer a way to overcome these challenges. NoSQL offers schema-less data stores and transactions that allow businesses to freely add fields to records without having to define the schema a priori, which is a prime constraint in SQL databases. With the growing need to manage large volumes of data and unstructured business transactions arising from avenues such as social networks, NoSQL graph databases are well suited to data with complex relationship structures, while key-value stores achieve simplicity. NoSQL data models provide four options for storing unstructured data: document-oriented, key-value, column-oriented and graph. These NoSQL storage models are easy to understand and implement, and they do not require complex SQL optimization techniques to perform Big Data analytics. This paper has compared SQL versus NoSQL databases, as well as the four NoSQL data models, in the context of Big Data analytics for business situations. We conclude that the flexible data modelling of NoSQL is well suited to support dynamic scalability and improved performance for Big Data analytics, and that it could be leveraged as a new category of data architecture coexisting with traditional SQL databases.
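The contrast drawn in the conclusion, between defining a schema a priori and adding fields to records freely, can be seen in a small side-by-side sketch. It assumes Python's built-in `sqlite3` module as a stand-in for a SQL database and a list of dictionaries as a stand-in for a document store; it does not use the API of MongoDB, CouchDB or any other product, and the `find` helper is hypothetical.

```python
import sqlite3

# Relational store: columns are fixed by the schema declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ana')")
try:
    # A record carrying an attribute the schema does not declare is rejected ...
    conn.execute("INSERT INTO customer (id, name, twitter) VALUES (2, 'Ben', '@ben')")
    sql_accepted_new_field = True
except sqlite3.OperationalError:
    sql_accepted_new_field = False
# ... until the schema itself is altered.
conn.execute("ALTER TABLE customer ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO customer (id, name, twitter) VALUES (2, 'Ben', '@ben')")

# Document store (hypothetical in-memory stand-in): each record carries its own fields.
customers = [
    {"_id": 1, "name": "Ana"},
    {"_id": 2, "name": "Ben", "twitter": "@ben"},                   # extra field, no migration
    {"_id": 3, "name": "Cy", "orders": [{"sku": "A1", "qty": 2}]},  # nested data in one record
]


def find(collection, **criteria):
    """Return documents whose fields equal every criterion; absent fields never match."""
    return [doc for doc in collection
            if all(key in doc and doc[key] == value for key, value in criteria.items())]


print(sql_accepted_new_field)                                # False
print([d["_id"] for d in find(customers, twitter="@ben")])   # [2]
```

The trade-off runs the other way as well: because the relational engine knows every column in advance, it can reject malformed records, whereas on the document side that check is left to application code.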

REFERENCES

[1] Choi, Y., Jeon, W., & Yo, S. (2014), 'Improving Database System Performance by Applying NoSQL', Journal Of Information Processing Systems, 10(3), 355-364.
[2] Moniruzzaman, A. B., & Hossain, S. A. (2013), 'NoSQL database: New era of databases for big data analytics - Classification, characteristics and comparison', International Journal of Database Theory and Application, 6(4), 1-14.
[3] Bazar, C., & Losif, C. (2014), 'The Transition from RDBMS to NoSQL. A Comparative Analysis of Three Popular Non-Relational Solutions: Cassandra, MongoDB and Couchbase', Database Systems Journal, 5(2), 49-59.
[4] Floratou, A., Teletia, N., Dewitt, D., Patel, J., & Zhang, D. (2012), 'Can the Elephants Handle the NoSQL Onslaught?', VLDB Endowment, 5(12), 1712-1723.
[5] Mason, R. T. (2015), 'NoSQL databases and data modeling techniques for a document-oriented NoSQL database', Proceedings of Informing Science & IT Education Conference (InSITE) 2015, 259-268.
[6] Pothuganti, A. (2015), 'Big Data Analytics: Hadoop-Map Reduce & NoSQL Databases', International Journal of Computer Science and Information Technologies, 6(1), 522-527.
[7] Smolan, R. & Erwit, J. (2012), The Human Face of Big Data, Against all odds production, O'Reilly, USA.
[8] Sharda, R., Delen, D., & Turban, E. (2015), Business intelligence and analytics: systems for decision support (10th ed.), Upper Saddle River, NJ: Pearson.
[9] Ohlhorst, F. (2013), Big data analytics: Turning big data into big money, Hoboken, NJ: John Wiley and Sons.
[10] Kaur, P.D., Kaur, A. & Kaur, S. (2015), 'Performance Analysis in Bigdata', International Journal of Information Technology and Computer Science (IJITCS), 7(11), 55-61.
[11] Prasad, A., & Gohil, B. (2014), 'A Comparative Study of NoSQL Databases', International Journal Of Advanced Research In Computer Science, 5(5), 170-176.
[12] Nayak, A., Poriya, A. & Poojary, D. (2013), 'Type of NOSQL Databases and its Comparison with Relational Databases', International Journal of Applied Information Systems (IJAIS), 5(4), Foundation of Computer Science FCS, New York, USA.
[13] MongoDB (2014), 'Why NoSQL?', https://www.mongodb.com/nosql-explained, [Online: accessed 20-Feb-2016]
[14] Fotache, M., & Cogean, D. (2013), 'NoSQL and SQL Databases for Mobile Applications. Case Study: MongoDB versus PostgreSQL', Informatica Economica, 17(2), 41-58.
[15] Ullah, Md A. (2015), 'A Digital Library for Plant Information with Performance Comparison between a Relational Database and a NoSQL Database (RDF Triple Store)', Technical Library, Paper 205.
[16] Hurst, N. (2010), 'Visual Guide to NoSQL Systems', http://blog.nahurst.com/visual-guide-to-nosql-systems, [Online: accessed 5-Nov-2015]
[17] Planet Cassandra, 'NoSQL Databases Defined and Explained', http://www.planetcassandra.org/what-is-nosql, [Online: accessed 24-Mar-2016]
[18] Klein, J., Gorton, I., Ernst, N. & Donohoe, P. (2015), 'Performance Evaluation of NoSQL Databases: A Case Study', Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems, PABS'15, Austin, 5-10.
[19] MongoDB (2015), 'Top 5 Considerations When Evaluating NoSQL Databases', https://s3.amazonaws.com/info-mongodb-com/10gen_Top_5_NoSQL_Considerations.pdf, [Online: accessed 5-Nov-2015]
[20] Zhikun, C., Shuqiang, Y., Shuan, T., Hui, Z., Li., Ge, Z., & Huiyu, Z. (2014), 'The Data Allocation Strategy Based on Load in NoSQL Database', Applied Mechanics and Materials, 513-517, 1464-1469.
[21] Leavitt, N. (2010), 'Will NoSQL Databases Live Up to Their Promise?', IEEE Computer, 43(2), 12-14.
[22] Subramanian, S. (2012), 'NoSQL: An Analysis of the Strengths and Weaknesses', https://dzone.com/articles/nosql-analysis-strengths-and, [Online: accessed 15-Jan-2016]
[23] Prasad, B.R. & Agarwal, S. (2016), 'Comparative Study of Big Data Computing and Storage Tools: A Review', International Journal of Database Theory and Application, 9(1), 45-66.
[24] Warden, P. (2012), Big Data Glossary - A Guide to the New Generation of Data Tools, O'Reilly, USA.
[25] Zareian, S., Fokaefs, M., Khazaei, H., Litoiu, M. & Zhang, X. (2016), 'A Big Data Framework for Cloud Monitoring', Proceedings of the 2nd International Workshop on BIG Data Software Engineering (BIGDSE'16), ACM Digital Library, 58-64.
[26] Ramanathan, V. & Venkatraman, S. (2015), 'Cloud Adoption in Enterprises: Security Issues and Strategies', 96-121, Book Chapter in Haider, A. and Pishdad, A. (Eds.), Business Technologies in Contemporary Organizations: Adoption, Assimilation, and Institutionalization, IGI Global Publishers, USA.

Authors’ Profiles

Dr. Sitalakshmi Venkatraman obtained a doctoral degree in Computer Science from the National Institute of Industrial Engineering, India in 1993 and an MEd from the University of Sheffield, UK in 2001. Prior to this, she completed an MSc in Mathematics in 1985 and an MTech in Computer Science in 1987, both from the Indian Institute of Technology, Madras, India. This author is a Senior Member (SM) of IASCIT.

In the past 25 years, Sita's work experience has involved both industry and academia: developing turnkey projects for the IT industry and teaching a variety of IT courses for tertiary institutions in India, Singapore, New Zealand and, since 2007, Australia. She currently works as Lecturer (Information Technology) at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia. She also serves as a Member of the Register of Experts at Australia's Tertiary Education Quality and Standards Agency (TEQSA).

Sita has published eight book chapters and more than 100 research papers in internationally well-known refereed journals and conferences, including Information Sciences, Journal of Artificial Intelligence in Engineering, International Journal of Business Information Systems, and Information Management & Computer Security. She serves as a Program Committee Member of several international conferences, as a Senior Member of professional societies, and on the editorial boards of three international journals.

Kiran Fahd received the B.Eng in software engineering from the National University of Emerging Technologies, Pakistan in 2001, and the Master's degree in Enterprise Planning System (ERP) from Victoria University, Melbourne in 2010. Since 2001, she has worked as a software engineer and as a teacher.

Kiran has held various lecturing positions in Australian and overseas universities. She currently teaches subjects of the Bachelor of Information Technology in the Software Development major at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia.

Dr. Samuel Kaspi earned his PhD (Computer Science) from Victoria University, a Masters of Computer Science from Monash University and a Bachelor of Economics and Politics from Monash University. He is a member of the Australian Computer Society (ACS) and the Association for Computing Machinery (ACM).

Sam is currently the Information Technology Discipline Leader and Senior Lecturer of IT at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia. Previously, Dr Kaspi taught at Victoria University, consulted privately and was the CIO of OzMiz Pty Ltd.

Sam has been active in both teaching and private enterprise in the areas of software specification, design and development. As chief information officer (CIO) of a small private company, he managed the development and submission of five granted and three pending patents. He also managed the submission of a successful Federal Government Comet grant under the Commercialising Emerging Technologies category. He has also had a number of peer-reviewed publications, including with the Institute of Electrical and Electronics Engineers (IEEE).

Dr. Ramanathan Venkatraman is working as a Member, Advanced Technology Application Practice at the National University of Singapore. He has served industry and academia for more than 32 years and has a wide spectrum of experience in the fields of IT and business process engineering. His current research focuses on evolving decision models for business problems and, more recently, he has been contributing to the frontiers of knowledge by devising innovative architectural models for ICT in domains such as Service Oriented Architecture, Big Data and Enterprise Cloud Computing. Dr Venkatraman has a strong practical approach, having worked on large-scale IT projects across Asia, the US, Europe and NZ. He has published more than 20 research papers in leading journals. Apart from research and consulting, he also teaches advanced technical courses for the Masters program at NUS and has been a key architect in setting up an innovative software engineering and business analytics curriculum in the fast-changing IT education scenario.

How to cite this paper: Sitalakshmi Venkatraman, Kiran Fahd, Samuel Kaspi, Ramanathan Venkatraman, "SQL Versus NoSQL Movement with Big Data Analytics", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.12, pp.59-66, 2016. DOI: 10.5815/ijitcs.2016.12.07