Database Essay
I.J. Information Technology and Computer Science, 2016, 12, 59-66
Published Online December 2016 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2016.12.07
Copyright © 2016 MECS I.J. Information Technology and Computer Science, 2016, 12, 59-66
SQL Versus NoSQL Movement with Big Data
Analytics
Sitalakshmi Venkatraman
School of Engineering, Construction and Design (IT), Melbourne Polytechnic, VIC 3072, Australia
E-mail: [email protected]
Kiran Fahd, Samuel Kaspi
School of Engineering, Construction and Design (IT), Melbourne Polytechnic, VIC 3072, Australia
E-mail: [email protected], [email protected]
Ramanathan Venkatraman
National University of Singapore, Singapore
E-mail: [email protected]
Abstract: Two main revolutions in data management have occurred
recently, namely Big Data analytics and NoSQL databases. Even though
they have evolved with different purposes, their independent
developments complement each other, and their convergence would
benefit businesses tremendously in making real-time decisions using
volumes of complex data sets that could be both structured and
unstructured. On the one hand, many software solutions have emerged
to support Big Data analytics; on the other, many NoSQL database
packages have arrived in the market. However, they lack independent
benchmarking and comparative evaluation. The aim of this paper is to
provide an understanding of their contexts and an in-depth study
comparing the features of the four main NoSQL data models that have
evolved. The performance comparison of traditional SQL with NoSQL
databases for Big Data analytics shows that NoSQL databases are a
better option for business situations that require simplicity,
adaptability, high-performance analytics and distributed scalability
of large data. This paper concludes that the NoSQL movement should be
leveraged for Big Data analytics and would coexist with relational
(SQL) databases.
Index Terms: Structured Query Language (SQL), Non SQL (NoSQL), Big
Data, Big Data Analytics, Relational Database, SQL Database, NoSQL
Database.
I. INTRODUCTION
As the technology environment transforms and faces new challenges,
businesses increasingly realize the need to evaluate new approaches
and databases to manage their data, in order to support changing
business requirements and the growing complexity and expansion of
their applications [1]. The relational database has been the default
choice of data model in businesses worldwide over the past thirty
years, with Structured Query Language (SQL) as the standard language
designed to perform the basic data operations. However, with the
explosion of data volume, SQL-based data querying loses efficiency,
and in particular, managing larger databases has become a major
challenge [2]. In addition, relational databases exhibit a variety of
limitations in meeting the recent Big Data analytics requirements of
businesses. While cluster-based architecture has emerged as a
solution for large databases, SQL is not designed to suit clusters,
and this mismatch has prompted the search for alternative solutions.
There are also mismatches between the persistent data model and
in-memory data structures, and servers based on SQL standards are now
prone to large memory footprints, security risks and performance
issues.
NoSQL (Non SQL) databases, with a set of new data management
features, are on the other hand more flexible and horizontally
scalable. They are considered alternatives that overcome the
limitations of the current SQL-dominated persistence landscape, and
hence they are also known as non-relational databases [3]. The main
goal of the NoSQL movement is to allow easy storage and retrieval of
data regardless of its structure and content, which is possible
because non-relational databases have no rigid data structure. NoSQL
databases exhibit horizontal scalability by taking advantage of new
clusters and several low-cost servers. In addition, they are
envisaged to manage data administration automatically, including
fault recovery, and these capabilities would result in huge cost
savings. Though non-relational databases provide different features
and advantages, they were initially characterised by a lack of data
consistency and the inability to query stored records using SQL. With
the emergence of NoSQL databases, new features and optimisation
characteristics are evolving to overcome these limitations as well.
However, their full capabilities are still not known [4]. Also, due
to the increasing differences in NoSQL database offerings and their
non-standard features, businesses are unclear about which option to
adopt.
In this paper, we first provide an overview of the present context of
Big Data analytics and NoSQL databases. Next, we discuss the four
main data models of non-relational databases and compare them with
SQL databases. There are a variety of NoSQL databases, and which one
is more appropriate for which business operation remains an
unanswered question so far. We compare the different data models of
NoSQL in terms of their features and the NoSQL databases available in
the market that support those features. The different data
manipulation mechanisms and optimisation techniques adopted by NoSQL
databases could result in differences in their performance. We
discuss how these factors play a major role in Big Data analytics and
identify the associated challenges. We also consider the coexistence
of NoSQL databases with relational databases and discuss their
relevance in different business contexts.
II. RELATED WORK: THE CONTEXT OF NOSQL
DATABASES WITH BIG DATA ANALYTICS
From the recent trends reported in the literature [5][6], it is
evident that, with increasing Internet usage, there is an exponential
growth in the volume of structured as well as unstructured data (Big
Data) from a variety of data sources, such as social media, e-mails,
text documents, GPS data, sensor data and surveillance data. Hence,
we can say that Big Data is characterised by structured,
semi-structured and unstructured data collected from digital and
non-digital resources. The main challenge is the effective use of
this Big Data, which represents the data source for efficient
decision-making, by adopting suitable data mining techniques [7][8].
Based on our literature survey, we have identified that the current
challenges presented by Big Data are due to the following general
characteristics experienced by businesses:
High data Velocity – rapidly and continuously updated data streams from different sources and locations.
Data Variety – structured, semi-structured and unstructured data storage.
Data Volume – huge number of datasets with sizes of several terabytes or petabytes.
Data Complexity – data organized in several different locations or data centres.
It is important for businesses to perform Big Data analytics, which
is the process of examining large data sets containing a variety of
data types. Using Big Data analytics, businesses are able to arrive
at a more accurate analysis of huge amounts of data to uncover hidden
patterns, unknown correlations, market trends, customer preferences
and other useful business information [2][9]. In order to support
timely and effective decision making, Big Data analytics relies on
large volumes of data that require clusters for data storage.
However, since relational databases are not designed for clusters and
exhibit performance issues with regard to Big Data analytics,
businesses are considering the need for the NoSQL movement [10].
The schema of NoSQL is not fixed. It uses varied interfaces to store
and analyse the sheer volume of user-generated content, personal data
and spatial data being generated by modern applications, cloud
computing and smart devices [1][11].
In this context, a NoSQL database is preferable to an SQL database,
primarily for its ability to cater to the horizontal partitioning of
data, flexible data processing and improved performance. Large
Internet companies (Facebook, LinkedIn, Amazon and Google), which
could not deliver their services using existing relational databases,
carried out the research that led to the advent of NoSQL in order to
deal with continuously increasing data, optimised data utilization
and horizontal scalability of large data. NoSQL databases are a
better option for information systems that value high performance and
dynamic scalability above reliability, such as the highly distributed
three-tier Internet architecture systems and cloud computing
[1][3][11]. Therefore, it is necessary to investigate further and
compare SQL versus NoSQL, as well as the salient differences in the
performance of NoSQL data models in supporting the necessary features
for Big Data analytics. This paper presents these investigations and
findings in today's Big Data context.
III. NOSQL DATA MODELS
There are many NoSQL databases available; however, they fall under
the four data models described below [3][11][12]. Each category has
its own specific attributes, but there are crossovers between the
different data models. Generally, all NoSQL databases are built to be
distributed and scaled horizontally.
Key-Value Store Database – A key-value store is a simple but
efficient and powerful NoSQL database. The data is stored in two
parts, a string that represents the key and the actual data that
represents the value, thus creating a "key-value" pair. This results
in values being indexed by keys for retrieval, a concept similar to
hash tables. In other words, the store allows the user to request
values according to the key specified. It can handle structured or
unstructured data. It offers high concurrency and scalability as well
as rapid lookups, but little consistency.
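The hash-table analogy can be made concrete with a small sketch. The Python below is only an illustration of the key-value idea; the class name KeyValueStore and its methods are hypothetical and do not correspond to the API of any product discussed in this paper. It assumes a single in-memory node and ignores distribution, replication and persistence.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory store: values are opaque and reachable only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store never inspects the value, so it may be structured or not.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Retrieval is a single hash-table lookup on the key.
        return self._data.get(key)

    def bulk_put(self, items: Dict[str, Any]) -> None:
        # Several entries are saved in one call.
        self._data.update(items)


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book", "pen"]})
store.bulk_put({"hot": "and spicy", "cold": "yet loving"})
print(store.get("session:42"))
print(store.get("missing"))
```

Because the only access path is the key, there is no way to ask for "all sessions whose cart contains a pen" without scanning every value, which is the price paid for the rapid lookups.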
Such key-value store databases can be used to develop forums, online
shopping carts and websites where user sessions are required to be
stored. Some notable examples are Amazon's DynamoDB, Apache's
Cassandra, Azure Table Storage (ATS), Oracle Berkeley DB, and Basho
Technologies' Riak. Amazon offers the fully managed NoSQL store
service DynamoDB for Internet-scale applications. It is a distributed
key-value storage system which provides fast, reliable and
cost-effective data access, as well as high availability and
durability due to its replica feature.
One of the advantages of a key-value store database is its high
insert/read rates compared to a traditional SQL database. This is
achieved by saving more than one entry to the store in a single call,
as shown in the example below:
@db.bulk_save([
{"hot" => "and spicy"},
{"cold" => "yet loving"},
{"other" => ["set","of","keys"]}
])
Column Oriented (or wide-column) Store Databases – In column store
databases, columns are defined for each row instead of being
predefined by a table structure with uniformly sized columns for
each row. Such stores have a two-level aggregate structure, a key and
a row aggregate, which is a group of columns. Any column can be added
to any row, and rows can have very different columns. In other words,
each row can store a different number of columns. Such a store can
also hold data tables as sections of columns of data. Data can be
viewed as either row-oriented, where each row is an aggregate, or
column-oriented, where each column family defines a record type. Each
key is associated with one or more columns, and a key for each column
family is used for rapid data retrieval with less I/O activity,
thereby offering very high performance. These databases provide high
scalability as they store data in highly distributed architectures.
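The two-level aggregate structure can be pictured as nested maps. The following Python sketch is an assumption-laden illustration, not the storage layout of any particular product: the helper names put and get_family, the row keys and the column families are all hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Layout: row key -> column family -> column name -> value.
table: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(
    lambda: defaultdict(dict)
)


def put(row_key: str, family: str, column: str, value: str) -> None:
    # Any column can be added to any row; no schema is declared up front.
    table[row_key][family][column] = value


def get_family(row_key: str, family: str) -> Dict[str, str]:
    # Reading one column family touches only that group of columns.
    # .get avoids creating empty rows or families as a side effect.
    return dict(table.get(row_key, {}).get(family, {}))


put("user:1", "profile", "name", "Ann")
put("user:1", "profile", "email", "ann@example.org")
put("user:2", "profile", "name", "Bob")
put("user:2", "orders", "2016-12-01", "book")

print(get_family("user:1", "profile"))
print(get_family("user:2", "orders"))
```

Here the two rows hold different columns and even different column families, which is the property the paragraph above describes.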
Wide-column databases are ideal for data mining and analytic
applications with Big Data. Examples of column-oriented store
providers are Facebook's high-performance Cassandra, Apache HBase,
Google's Big Table and HyperTable. Google's Big Table is a
high-performance wide-column database that can deal with vast amounts
of data. It is developed on the Google File System (GFS) using C/C++.
It is used by multiple Google applications, such as YouTube and
Gmail, that place varied latency demands on the database. It is not
distributed outside Google, apart from its use inside Google's App
Engine. Big Table is designed for easy scalability across thousands
of machines; thus, it is tolerant to hardware failures.
Document Store Databases – A document database extends the basic
key-value database concept and stores complex data in document form,
such as XML, PDF or JSON documents. A document store is typically
schema-less, where each document can contain different fields of any
length. Documents are accessed or identified by a unique key, which
may be a simple string, a URI string or a path string. Document
databases are more complex databases but offer high performance,
horizontal scalability and schema flexibility, which allow storing
virtually any structure required by any application.
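A minimal sketch of the schema-less idea follows, written in Python purely for illustration. The functions insert and find, the document keys and the field names are hypothetical; real document stores add indexing, persistence and a much richer query language.

```python
import json
from typing import Any, Dict, List

# Documents are kept under a unique key; each may have different fields.
documents: Dict[str, Dict[str, Any]] = {}


def insert(doc_id: str, doc: Dict[str, Any]) -> None:
    # Round-trip through JSON to mimic storing a serialized document.
    documents[doc_id] = json.loads(json.dumps(doc))


def find(field: str, value: Any) -> List[str]:
    # Unlike a plain key-value store, the contents can be queried by field.
    return [
        doc_id for doc_id, doc in documents.items() if doc.get(field) == value
    ]


insert("post/1", {"title": "Hello", "author": "ann", "tags": ["intro"]})
insert(
    "post/2",
    {
        "title": "Notes",
        "author": "bob",
        "comments": [{"by": "ann", "text": "Nice"}],
    },
)

print(find("author", "ann"))
```

The second document carries an embedded list of comments that the first one lacks, and neither required a schema change.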
Document-oriented databases are suitable for content management
systems and blog applications. Some examples of document-oriented
databases are 10gen's MongoDB, Apache CouchDB, Basho Technologies'
Riak, Azure's DocumentDB and AWS DynamoDB. MongoDB is developed by
10gen using C++ and is a structure-free, cross-platform
document-oriented database. It uses the Grid File System (GridFS) to
store large files such as images and videos in BSON (Binary JSON)
format. It provides efficient performance, high consistency and high
persistence, but it is not very reliable and is resource hungry.
Graph Store – A graph database focuses on the relationships between
data. It uses the graph theory approach to store the data and
optimises search by using the index-free adjacency technique. It is
designed for data whose relationships are well represented by graph
structures consisting of nodes, edges and properties. A node
represents an object (an entity in the database), an edge describes
the relationship between two objects, and properties are the
attributes attached to nodes and edges. In the index-free adjacency
technique, each node holds pointers which point directly to its
adjacent nodes, as shown in Fig. 1.
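Index-free adjacency can be illustrated with a short Python sketch in which each node object holds direct references to its neighbours, so a traversal follows pointers instead of consulting a global index. The Node class, the reachable function and the FOLLOWS label are hypothetical names chosen for this example only.

```python
from collections import deque
from typing import Any, List, Tuple


class Node:
    def __init__(self, name: str, **properties: Any) -> None:
        self.name = name
        self.properties = properties
        # Each entry is (relationship label, direct reference to neighbour).
        self.edges: List[Tuple[str, "Node"]] = []

    def connect(self, label: str, other: "Node") -> None:
        self.edges.append((label, other))


def reachable(start: Node) -> List[str]:
    """Breadth-first traversal that only ever follows node-held pointers."""
    seen = {start.name}
    order = [start.name]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _label, neighbour in node.edges:
            if neighbour.name not in seen:
                seen.add(neighbour.name)
                order.append(neighbour.name)
                queue.append(neighbour)
    return order


alice = Node("alice", age=30)
bob = Node("bob")
carol = Node("carol")
dave = Node("dave")  # not connected to anyone

alice.connect("FOLLOWS", bob)
bob.connect("FOLLOWS", carol)

print(reachable(alice))
```

The cost of each hop depends only on the number of edges held by the current node, not on the total size of the graph, which is why this layout suits relationship-heavy queries.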
These stores provide fast performance, ACID compliance and rollback
support. These databases are suitable for developing
social-networking applications, bioinformatics applications, content
management systems and cloud management services. Examples of notable
graph databases are Neo Technology's Neo4j, OrientDB, Apache Giraph
and Titan.
Apache Giraph is an open source large-scale graph processing system
and an implementation of Google Pregel (a graph processing
architecture which has a vertex-centric approach). It is designed for
high scalability, to meet the crucial need for scalable platforms and
parallel architectures that can process the bulk data produced by
modern applications such as social networks and knowledge bases. For
example, it is currently used at Facebook, LinkedIn and Twitter to
analyse the graph formed by users and their connections. Giraph is a
distributed and fault-tolerant system and offers features such as
master computation, sharded aggregators, edge-oriented input and
out-of-core computation.
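The vertex-centric approach can be sketched as a sequence of supersteps in which every vertex reads its incoming messages, updates its own value and sends messages along its out-edges. The Python below is a single-process illustration of that idea using the well-known "propagate the maximum value" example; it is not Giraph's API, and the function name pregel_max and the sample graph are assumptions made for this sketch.

```python
from typing import Dict, List


def pregel_max(
    values: Dict[str, int], edges: Dict[str, List[str]]
) -> Dict[str, int]:
    """Every vertex ends up holding the largest value that can reach it."""
    values = dict(values)

    # Superstep 0: each vertex sends its own value to its out-neighbours.
    inbox: Dict[str, List[int]] = {v: [] for v in values}
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)

    # Later supersteps run until no vertex receives any message.
    while any(inbox.values()):
        outbox: Dict[str, List[int]] = {v: [] for v in values}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                values[vertex] = max(messages)
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(values[vertex])
            # Otherwise the vertex stays silent (it "votes to halt").
        inbox = outbox
    return values


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
start = {"a": 3, "b": 6, "c": 2, "d": 1}
print(pregel_max(start, graph))
```

In a distributed system the vertices would be partitioned across workers and the messages exchanged over the network between supersteps; the per-vertex logic stays the same.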
Fig. 1. Graph algorithm.
IV. HIGH LEVEL COMPARISON BETWEEN NOSQL AND SQL
DATABASES
Based on the features of each type of database recently reported in
the literature [1][3][11][13], we performed a high level comparison
between SQL (relational) and NoSQL (non-relational) databases, and
the summary of findings is given in Table 1.
We considered aspects such as database type, schema, data model used,
scaling model available, transactional capabilities, data
manipulation method used, and popular database software available in
the market in order to compare SQL databases versus NoSQL databases.
Some examples are also given in Table 1 for a better understanding of
their differences. Overall, Table 1 provides the high level
differences in the key features and properties exhibited by
relational and non-relational databases, which would support
businesses in making decisions about using SQL or NoSQL database
options in various Big Data application scenarios.
Table 1. Relational Versus NoSQL Databases - High Level Differences
| | Relational Databases | NoSQL Databases |
|---|---|---|
| Database Type | One SQL DBMS product (marginal variations) | Four general types: key-value, document, wide-column and graph stores |
| Schema | Based on pre-defined foreign-key relationships between tables in an explicit database schema. Strict definition of schema and data types is required before inserting data. Any update alters the entire database. | Dynamic database schema. Does not force schema definition in advance. Different data can be stored together as required. Allows modification of the schema freely with no downtime. |
| Data Models | Data records are stored as rows and columns in different tables joined via relationships. Explicitly defined data types of columns to store a specific piece of data. For example, the SQL engine joins two separate tables, "employees" and "departments", to find out the department of an employee. | Supports all types of data: structured, semi-structured and unstructured. Different products offer different and flexible data models. For example, the document store type organizes all related data using references and embedded-document tools. |
| Scaling Model | Vertical scaling. Data resides on a single node and capacity is added to existing resources (data storage or I/O capacity). | Horizontal scaling. Modern approach of partitioning the data across additional servers or cloud instances as required. |
| Transaction Capabilities | Based on ACID transactional properties (atomicity, consistency, isolation, durability) to ensure high data reliability and data integrity. Atomic transactions degrade the performance. | Supports AID transactions, and the CAP theorem of distributed systems supports consistency of data across all nodes of a NoSQL database. There is atomicity at the single-document level. |
| Data Manipulation | Structured Query Language (SQL) DML statements are used to manipulate data, e.g. SELECT customer_name FROM customers WHERE customer_age > 18; | Queries data efficiently. Object-oriented APIs are used, e.g. db.customers.find({customer_age: {$gt: 18}}, {customer_name: 1}) |
| Software | Oracle, MySQL, DB2, SQL Server | MongoDB, Riak, Couchbase, RethinkDB, Redis, Aerospike, LevelDB, HBase, Cassandra, Neo4j, Elasticsearch, Lucene |
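The "Data Models" row of Table 1 can be illustrated by answering the same question, "which department does an employee belong to?", in both styles. The Python sketch below uses the standard-library sqlite3 module for the relational side and a plain dictionary to stand in for an embedded document. The table names "employees" and "departments" come from Table 1; the column names, the sample data and the document layout are assumptions made for illustration.

```python
import sqlite3

# Relational style: two tables joined through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (10, 'Ann', 1);
    """
)
row = conn.execute(
    "SELECT d.name FROM employees AS e "
    "JOIN departments AS d ON d.id = e.department_id "
    "WHERE e.name = ?",
    ("Ann",),
).fetchone()
sql_department = row[0]

# Document style: the related data is embedded, so no join is needed.
employee_doc = {
    "_id": 10,
    "name": "Ann",
    "department": {"name": "Engineering"},
}
doc_department = employee_doc["department"]["name"]

print(sql_department, doc_department)
conn.close()
```

The embedded form avoids the join at read time, but the department name is then duplicated in every employee document, which is one source of the consistency trade-offs discussed in this paper.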
V. PERFORMANCE OF NOSQL AND SQL DATABASES FOR
BIG DATA ANALYTICS
The most important reason for moving from relational databases
towards NoSQL is the requirement for performance improvements. Choi
et al. [1] found that a NoSQL database such as MongoDB provided more
stable and faster performance at the expense of data consistency. The
tests were done on an internal blog system based on an open source
project. It was found that MongoDB stored posts 850% faster than an
SQL database. It has been suggested that NoSQL should be used in
environments which are concerned with data availability rather than
consistency.
Fotache & Cogean [14] describe the use of MongoDB in mobile
applications. Certain multiple update operations like Upsert are
easier and faster to perform with NoSQL than with an SQL database.
The use of cloud computing along with NoSQL is said to increase
performance, especially in the data layer for mobile platforms.
Ullah [15] compared the performance of a relational database
management system (RDBMS) and a NoSQL database, where a Resource
Description Framework (RDF) based triple store was used as the NoSQL
database. It was noted that the NoSQL database was slower than the
relational database due to the large amount of memory used by the
NoSQL database. Reading a large amount of data takes its toll on the
database, and because of the unstructured format of the NoSQL
database, storing a thousand records requires a huge amount of
storage, whereas the RDBMS uses less storage. For example, searching
for "red berry" took 5255 ms in the NoSQL database, while it took
only 165.43 ms in the RDBMS.
Floratou et al. [4] performed the Yahoo Cloud Serving Benchmark
(YCSB) test on an RDBMS and MongoDB. They tested an SQL client-sharded
database against MongoDB auto-sharded and client-sharded databases.
The tests found that the SQL client-sharded database was able to
attain higher throughput and lower latency in most of the benchmarks.
The higher performance of SQL is attributed to the fact that the
majority of the read requests are made to pages in the buffer pool,
whereas NoSQL databases tend to read shards located at different
nodes. The study set out to show that an RDBMS still has the
processing power to handle large workloads comparable to NoSQL.
There are many advantages of NoSQL databases over SQL databases, such
as easy scalability, flexible schema, lower cost, and efficient and
high performance. Having said that, there are some weaknesses of
NoSQL compared with SQL databases too [12][16]. These are summarised
below:
NoSQL is new and immature; therefore, there is a lack of familiarity and limited expertise.
NoSQL databases scale horizontally by giving up either consistency or availability.
There is no standard query and manipulation language across all NoSQL databases.
There is no standard interface for NoSQL databases.
It is difficult to export all data in distributed ones (Cassandra) compared to non-distributed ones (MongoDB).
NoSQL databases are challenging to install and difficult to maintain.
We have identified the following situations when
NoSQL should be more suitable than SQL in the context
of Big Data analytics:
1. Simplicity of use – current Big Data technologies are complex and
require highly skilled technical expertise, while NoSQL offers
simplicity that would improve the productivity of both developers
and users. The simple, small, intuitive and easy to learn NoSQL
stacks can suit businesses that require Big Data analytics and want
to adopt clean NoSQL-like APIs.
2. Adaptability to change – when business requirements and data
models change, warranting flexible Big Data analytics, NoSQL
databases that support flexible data schemas are ideal for
integrating siloed and disparate backend systems.
3. Efficiency for analytics functionality – the foundation data
structure of the majority of NoSQL technologies is the JavaScript
Object Notation (JSON) data format, which caters to both
schema-on-read and schema-on-write efficiently for data warehousing
functionality. For example, the NoSQL Big Data warehouse SonarW for
JSON makes analytics functionality efficient for Big Data
applications.
4. Distributed scalability – with the increasingly distributed nature
of systems and transactions, flexible data becomes the norm and a
strict schema approach is unsuitable. With schema evolution, NoSQL
provides the necessary scalability for Big Data platforms to
perform distributed queries faster.
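The schema-on-read idea mentioned in item 3 can be sketched briefly: records are stored as raw JSON with no enforced structure, and a schema is applied only at the moment the data is read for analysis. The Python below is a hypothetical illustration; the function read_with_schema and the sample records are assumptions, and the field names customer_name and customer_age are borrowed from the query example in Table 1.

```python
import json
from typing import Any, Dict, List, Optional

# Raw records as they might arrive: fields and types vary per record.
raw_records: List[str] = [
    '{"customer_name": "Ann", "customer_age": 34}',
    '{"customer_name": "Bob", "customer_age": "17", "city": "Melbourne"}',
    '{"customer_name": "Cy"}',
]


def read_with_schema(line: str) -> Dict[str, Any]:
    """Apply the analysis schema (name: str, age: optional int) at read time."""
    doc = json.loads(line)
    age: Optional[Any] = doc.get("customer_age")
    return {
        "customer_name": str(doc["customer_name"]),
        "customer_age": int(age) if age is not None else None,
    }


adults = [
    record["customer_name"]
    for record in map(read_with_schema, raw_records)
    if record["customer_age"] is not None and record["customer_age"] > 18
]
print(adults)
```

With schema-on-write, by contrast, the second and third records would have been rejected or coerced at insert time; here the decision is deferred to each reader, which is what allows the stored data to evolve freely.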
Table 2. Comparison of NoSQL Data Models
| NoSQL Data Models | NoSQL Databases | Performance | Scalability | Flexibility | Complexity | Functionality |
|---|---|---|---|---|---|---|
| Key-Value | DynamoDB, Cassandra, ATS, Riak, Berkeley DB | High | High | High | None | Variable (None) |
| Wide-Column | Cassandra, HBase, Big Table, HyperTable | High | High | Moderate | Low | Minimal |
| Document | MongoDB, CouchDB, Riak, DynamoDB | High | Variable (High) | High | Low | Variable (Low) |
| Graph | Neo4j, OrientDB, Giraph, Titan | Variable | Variable | High | High | Graph Theory |
VI. COMPARISON OF NOSQL DATA MODELS
NoSQL databases vary in their performance depending on their data
model [17]. We compare the key attributes of the four types of NoSQL
data models and summarise them in Table 2.
As shown in Table 2, we have considered key attributes such as
performance, scalability, flexibility, complexity and functionality
for comparing the four data models supported by the popular NoSQL
database software available in the market.
Fig. 2 shows the CAP theorem, which forms a visual guide to the NoSQL
databases under each NoSQL data model [16] and is based on the
features of consistency, availability and partition tolerance. With
NoSQL databases, there are now other options for storing different
kinds of data, where a distributed set of servers typically has to
fit two of the three requirements of the CAP theorem; this is usually
a deciding factor in what technology could be used.
Bazar & Losif [3] compared the performance of the MongoDB, Cassandra
and Couchbase databases, each possessing different features and
functionalities. The tests were conducted using the YCSB tool.
Fig. 2. CAP theorem for NoSQL databases.
VII. RESULTS
The benchmark tests found that Couchbase produced the lowest
latencies for interactive database applications. Couchbase is able to
process more operations per second, with a lower average latency in
reading and writing data, than both MongoDB and Cassandra.
Document-level locking in the Couchbase database is the primary
reason for its faster read and write operations. Cassandra is faster
in writing than MongoDB, but both of them have almost equal reading
speed. It is also mentioned that each NoSQL database is suited to
specific application environments and cannot be considered a complete
solution for every workload and use case.
Another case study by Klein et al. [18] looked at the use of the
NoSQL databases MongoDB, Riak and Cassandra in a distributed
healthcare organisation. These databases use different NoSQL data
models: key-value (Riak), column (Cassandra) and document (MongoDB).
Cassandra produced the overall best performance for all types of
database operations (reading, writing and updating). Riak's
performance was degraded because its internal thread pool created a
pool for each client session instead of a shared pool for all client
sessions. Cassandra had the highest average latencies but also
produced the best throughput results. This was firstly due to the
indexing features that allowed Cassandra to retrieve the most
recently written records efficiently, especially compared to Riak.
Secondly, hash-based sharding allowed Cassandra to distribute storage
requests and balance the load better than MongoDB.
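Hash-based sharding, credited above for the even load, can be sketched in a few lines. The Python below is a generic illustration and not Cassandra's actual partitioner: the function shard_for, the choice of SHA-256 and the key format are assumptions made for this example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard by hashing it, so placement ignores key order."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential keys, which would all land on one node under range sharding,
# are spread across the shards by the hash.
keys = [f"patient:{i}" for i in range(10_000)]
counts = Counter(shard_for(key, 4) for key in keys)
print(sorted(counts.items()))
```

A plain modulo scheme like this one reshuffles most keys whenever the number of shards changes; production systems usually refine it, for example with consistent hashing, to limit that movement.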
Prasad & Gohil [11] discussed the use of different NoSQL databases
for different work environments. It is reported that the performance
of NoSQL databases is increased by the use of a collection of
processors in the distributed system. MongoDB and Cassandra are
considered the best databases for cases where data is frequently
written but rarely read. NoSQL databases are said to be victims of
the Consistency, Availability and Partition tolerance (CAP) theorem.
This means that a trade-off is always made, e.g. the database can
either be consistent with low performance or offer high availability
and low consistency with fast performance [11][17][19].
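The trade-off can be made tangible with a deliberately simplified model of two replicas that lose contact with each other. The Python below is a toy sketch under strong assumptions (two nodes, one value, a manually toggled partition flag); the class TwoNodeStore and its "CP" and "AP" modes are hypothetical and do not model any specific database.

```python
from typing import Any, Optional


class TwoNodeStore:
    """Two replicas of one value; 'mode' picks what to give up when partitioned."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a: Optional[Any] = None
        self.replica_b: Optional[Any] = None
        self.partitioned = False

    def write(self, value: Any) -> bool:
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.replica_a = self.replica_b = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write (availability is lost).
            return False
        # Stay available by writing to the reachable replica only
        # (the replicas now disagree, so consistency is lost).
        self.replica_a = value
        return True

    def consistent(self) -> bool:
        return self.replica_a == self.replica_b


cp_store, ap_store = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp_store, ap_store):
    store.write("v1")
    store.partitioned = True

print(cp_store.write("v2"), cp_store.consistent())
print(ap_store.write("v2"), ap_store.consistent())
```

Real systems sit between these extremes, for instance by reconciling divergent replicas once the partition heals, but the forced choice during the partition is the same.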
Zhikun et al. [20] suggested the use of a new database allocation
strategy based on load (DASBL) in order to increase the performance
of the NoSQL database. However, the DASBL only works when four
conditions are satisfied and is unable to cater to an unbalanced
system load. Prasad & Gohil [11] compared different attributes such
as replication, sharding, consistency and failure handling. We
summarise all these findings in Table 3, which provides a list of the
best NoSQL databases for each of the features reported in the
literature.
Several doubts have arisen about the promises of NoSQL, and studies
have been conducted to explore the strengths and weaknesses of NoSQL
[21][22]. A recent study reviews the trends in storage and computing
tools along with their relative capabilities, limitations and the
environments they are suited to [23]. While high-end platforms like
IBM Netezza AMPP could cater to Big Data, economic considerations
have caused choices such as Hadoop to proliferate worldwide,
resulting in the rise in adoption of NoSQL databases that can
integrate easily with Hadoop. Even though HBase supports strong
integration with Hadoop using Apache Hive, it is a better choice for
application development only, and not for real-time queries and OLTP
applications, due to its very high latency. On the other hand,
graph-based platforms such as Neo4j and Giraph form better options
for storage and computation due to their capability to model
vertex-edge scenarios in businesses that involve data environments
such as social networks and geospatial paths.
Overall, Big Data has led to the requirement for a new generation of
data analytics tools [24][25], and hence it is realistic to believe
that both SQL and NoSQL databases will coexist. With cloud
environments that support SQL databases, fast processing of data is
warranted to enable efficient elasticity [26] and Big Data analytics
that involve current and past data as well as future predictions. New
solutions are being proposed for cloud monitoring that use a NoSQL
database back-end to achieve very quick response times.
Table 3. NoSQL Databases mapped to their features
| Features | Best NoSQL Databases |
|---|---|
| High Availability | Riak, Cassandra, Google Big Table, CouchDB |
| Partition Tolerance | MongoDB, Cassandra, Google Big Table, CouchDB, Riak, HBase |
| High Scalability | Google Big Table |
| Consistency | MongoDB, Google Big Table, Redis, HBase |
| Auto-Sharding | MongoDB |
| Write Frequently, Read Less | MongoDB, Redis, Cassandra |
| Fault Tolerant (No Single Point Of Failure) | Riak |
| Concurrency Control (MVCC) | Riak, Dynamo, CouchDB, Cassandra, Google Big Table |
| Concurrency Control (Locks) | MongoDB, Redis, Google Big Table |
VIII. CONCLUSIONS
The industry has been dominated by relational o r SQL
databases for several years. Ho wever, with business
situations recently having the need to store and process
large datasets for business analytics, NoSQL database
provides the answer to overcome such challenges.
NoSQL offers s chema less data store and transactions that
allo w businesses to freely add fie lds to records without
the structured require ment of defin ing the schema a priori
which is a prime constraint in SQL databases. With the
growing need to manage large data and unstructured
business transactions via avenues such as social networks,
NoSQL graphs are we ll suited for data that has comple x
relationship structures and at the same time simp licity is
achieved through key-value stores. NoSQL data models
provide options for storing unstructured d ata to be
document-oriented, key-value pairs, colu mn-oriented or
graphs. These NoSQL storage models a re easy to
understand and imple ment and do not require comp le x
SQL Versus NoSQL M ovement with Big Data Analytics 65
Copyright © 2016 MECS I.J. Information Technology and Computer Science, 2016, 12, 59-66
SQL optimization techniques to perform Big Data analytics. This paper has compared SQL versus NoSQL databases as well as the four data models of NoSQL in the context of Big Data analytics for business situations. We conclude that the flexible data modelling of NoSQL is well suited to support dynamic scalability and improved performance for Big Data analytics and could be leveraged as a new category of data architecture coexisting with traditional SQL databases.
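The schema constraint discussed in these conclusions can be illustrated with a minimal Python sketch. It contrasts a relational table, created with the standard sqlite3 module, against a toy in-memory document store built on a plain dictionary. The table name customers, the field twitter and the helper put_document are hypothetical and chosen only for illustration; the dictionary stands in for a document-oriented NoSQL store and is not the API of any particular product.

```python
import sqlite3

# Relational side: the schema must be declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")


def insert_with_new_field(connection):
    """Try to store a field that the declared schema does not contain."""
    try:
        connection.execute(
            "INSERT INTO customers (id, name, twitter) VALUES (2, 'Bob', '@bob')"
        )
        return "accepted"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


# The insert is rejected until the schema itself is changed.
before_alter = insert_with_new_field(conn)
conn.execute("ALTER TABLE customers ADD COLUMN twitter TEXT")
after_alter = insert_with_new_field(conn)

# Schemaless side: each record is a free-form document keyed by an id.
documents = {}


def put_document(store, key, document):
    """Store a copy of the document; no schema is checked or required."""
    store[key] = dict(document)


put_document(documents, 1, {"name": "Alice"})
# A new field is added to a single record without any schema migration.
put_document(documents, 2, {"name": "Bob", "twitter": "@bob"})

print(before_alter)
print(after_alter)
print(documents)
```

The relational insert succeeds only after an explicit ALTER TABLE, whereas the document store accepts records of differing shape side by side, which is the flexibility the conclusions attribute to NoSQL data models.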
REFERENCES
[1] Choi, Y., Jeon, W., & Yo, S. (2014), 'Imp roving Database Sy stem Performance by App ly ing NoSQL', Journal Of
Information Processing Systems, 10(3), 355-364.
[2] M oniruzzaman, A. B., & Hossain, S. A. (2013), No SQL database: New er a of d atabases for big data analy tics -
Classification, char acteristics and comp arison. International Journal o f Database Theory and
Application, 6(4), 1-14.
[3] Bazar, C., & Losif, C. (2014), 'The Transition from RDBM S to NoSQL. A Comp arative Analy sis of Three Pop ular Non-Relational Solutions: Cassandra, M ongoDB
and Couchbase', Database Systems Journal, 5(2), 49-59.
[4] Floratou, A., Teletia, N., Dewitt, D., Patel, J., & Zhang, D. (2012), 'Can the Elep hants Handle the NoSQL
Onslaught?', VLDB Endowment, 5(12), 1712-1723. [5] M ason, R. T. (2015), 'NoSQL databases and data
modelin g techniqu es for a document -oriented NoSQL
database', Proceedings of In forming Science & IT
Education Conference (InSITE) 2015, 259-268.
[6] Pothuganti, A. (2015) 'Big Data Analy tics: Hadoop -M ap Reduce & NoSQL Databases', International Journal of
Computer Scien ce and Information Technologies, 6(1),
522-527.
[7] Smolan, R. & Erwit, J. (2012), The Human face of Big Data, Against all odds p roduction, O‟Reilly , USA.
[8] Sharda, R., Delen, D., & Turban, E. (2015), Business intelligen ce and analytics: systems for decision support
(10th ed.). Up p er Saddle River, NJ: Pearson.
[9] Ohlhorst, F. (2013), Big data analytics: Turning big data into big money. Hoboken, NJ. John Wiley and Sons.
[10] Kaur, P.D., Kaur, A. & Kaur, S. (2015), „Performance Analy sis in Bigdata‟, International Journal of In formation
Technology and Computer Science (IJITCS), 7(11), 55-61.
[11] Prasad, A, & Gohil, B. (2014), 'A Co mp arative Study of NoSQL Databases', International Journal Of Advanced
Research In Comp uter Science, 5(5), 170-176.
[12] Nay ak, A., Poriy a, A. & Poojary , D. (2013), „Typ e of NOSQL Databases and its Comp arison with Relational
Databases‟, International Journal o f Applied In formation Systems (IJAIS), 5(4) Foundation of Computer Science
FCS, New York, USA.
[13] M ongoDB (2014), „Why NoSQL?‟, https://www.mongodb.com/nosql-exp lained, [Online: accessed 20-Feb-2016]
[14] Fotache, M ., & Cogean, D. (2013), 'No SQL and SQL Databases for M obile App lications. Case Study :
M ongoDB versus PostgreSQL', Informatica Economica,
17(2), 41-58. [15] Ullah, M d A. (2015), „A Digital Library for Plant
Information with Performance Comp arison between a
Relational Database and a NoSQL Database (RDF Trip le
Store)‟, Technical Library, Paper 205.
[16] Hurst, N. (2010, ‘Visual Guide to NoSQL Systems‟, http ://blog.nahurst.com/v isual- guid e-to-nosql-systems,
[Online: accessed 5-Nov-2015]
[17] Planet Cassandra ‘NoSQL Databases Defin ed and Explain ed’, http ://www.p lanetcassandra.org/what-is-
nosql,[Online: accessed 24-M ar-2016]
[18] Klein, J., Gorton, I., Ernst, N. & Donohoe, P. (2015), 'Performance Evalu ation of NoSQL Databases: A Case
Study ', Proceedings of the 1st Workshop on Performance
Analysis of Big Data Systems, PABS’15, Austin, 5-10.
[19] M ongoDB (2015), ‘Top 5 Considerations When Evaluating NoSQL Databases’,
https://s3.amazonaws.com/ info- mon godb-
com/10 gen_Top _5_NoSQL_Considerations.p df [Online:
accessed 5-Nov-2015]
[20] Zhikun, C., Shuqian g, Y., Shu an, T., Hui, Z., Li., Ge, Z.,& Huiy u, Z. (2014), „The Data Allocation Strategy Based on
Load in NoSQL Database’, Applied Mechanics and
Materials, 513-517, 1464-1469.
[21] Leavitt, N. (2010), „Will No SQL Databases Liv e Up to Their Promise?‟, IEEE Computer 43(2) , 12-14.
[22] Subramanian, S. (2012), ‘NoSQL: An Ana lysis of th e Strengths and Weaknesses’,
https://dzone.com/articles/nosql-an aly sis-strengths-and,
[Online: accessed 15-Jan-2016]
[23] Prasad B.R. & Agarwal S. (2016), 'Co mp arative Study of Big Data Comp uting and Storage Tools: A Review',
International Journal o f Database Theory and
Application 9(1), 45-66.
[24] Warden P. (2012), Big Da ta Glossary - A Guide to th e New Generation of Data Tools, O‟Reilly , USA.
[25] Zareian S., Fokaefs, M ., Khazaei H. Litoiu M . & Zhang X. (2016), 'A Big Data Framework for C loud M onitoring',
Proceedings of the 2nd In ternationa l Workshop on BIG
Data Software Engineering (BIGDSE'16), ACM Digital Library, 58-64.
[26] Ramanathan, V. & Venkatraman, S. (2015), C loud Adoption in Enterprises: Security Issues and Strateg ies,
96-121, Book Chap ter In Haider A. and Pishdad A. (Eds.),
Business Technologies in Contemp orary Organizations: Adoption, Assimilation, and Institutionalization, IGI
Global Publishers, USA.
Authors’ Profiles
Dr. Sitalakshmi Venkatraman obtained a doctoral degree in Computer Science from the National Institute of Industrial Engineering, India in 1993 and an MEd from the University of Sheffield, UK in 2001. Prior to this, she had completed an MSc in Mathematics in 1985 and an MTech in Computer Science in 1987, both from the Indian Institute of Technology, Madras, India. This author is a Senior Member (SM) of IASCIT.
In the past 25 years, Sita's work experience has involved both industry and academia - developing turnkey projects for the IT industry and teaching a variety of IT courses for tertiary institutions in India, Singapore, New Zealand, and more recently in Australia since 2007. She currently works as Lecturer (Information Technology) at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia. She also serves as Member of the Register of Experts at Australia's Tertiary Education Quality and Standards Agency (TEQSA).
Sita has published eight book chapters and more than 100 research papers in internationally well-known refereed journals and conferences that include Information Sciences, Journal of Artificial Intelligence in Engineering, International Journal of
Business Information Systems, and Information Management & Computer Security. She serves as Program Committee Member of several international conferences and Senior Member of professional societies and editorial board of three international journals.
Kiran Fahd received the B.Eng in software engineering from the National University of Emerging Technologies, Pakistan in 2001, and the Master's degree in Enterprise Planning System - ERP from the Victoria University, Melbourne in 2010. Since 2001, she has worked in the capacity of software engineer and as a teacher.
Kiran has held various lecturing positions in Australian and overseas universities. She currently teaches the subjects of Bachelor of Information Technology under the Software Development major at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia.
Dr. Samuel Kaspi earned his PhD (Computer Science) from Victoria University, a Masters of Computer Science from Monash University and a Bachelor of Economics and Politics from Monash University. He is a member of the Australian Computer Society (ACS) and the Association for Computing Machinery (ACM).
Sam is currently the Information Technology Discipline Leader and Senior Lecturer of IT at the School of Engineering, Construction & Design, Melbourne Polytechnic, Australia. Previously, Dr Kaspi taught at Victoria University, consulted privately and was the CIO of OzMiz Pty Ltd.
Sam has been active in both teaching and private enterprise in the areas of software specification, design and development. As chief information officer (CIO) of a small private company he managed the development and submission of five granted and three pending patents. He also managed the submission of a successful Federal Government Comet grant under the Commercialising Emerging Technologies category. He has also had a number of peer-reviewed publications, including with the Institute of Electrical and Electronics Engineers (IEEE).
Dr. Ramanathan Venkatraman is working as Member, Advanced Technology Application Practice at National University of Singapore. He has served industry and academia for more than 32 years and has a wide spectrum of experience in the fields of IT and business process engineering. His current research focuses on evolving decision models for business problems and, more recently, he has been contributing to frontiers of knowledge by devising innovative architectural models for ICT in domains such as Service Oriented Architecture, Big Data and Enterprise Cloud Computing. Dr Venkatraman has a strong practice approach, having worked in large scale IT projects across Asia, US, Europe and NZ. He has published more than 20 research papers in leading journals. Apart from research and consulting, he also teaches advanced technical courses for the Masters program at NUS and has been a key architect in setting up innovative software engineering and business analytics curriculum in the fast changing IT education scenario.
How to cite this paper: Sitalakshmi Venkatraman, Kiran Fahd, Samuel Kaspi, Ramanathan Venkatraman, "SQL Versus NoSQL Movement with Big Data Analytics", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.12, pp.59-66, 2016. DOI: 10.5815/ijitcs.2016.12.07