PPT
APACHE DATABASES
Apache Software Foundation (ASF) is a non-profit organization operating a decentralized open-source community of developers.
765 individual Members, 206 Apache Project Management Committees, and 7,600+ Committers shepherding 300 projects and 200M+ lines of Apache code valued at more than $20B.
Founded: 03/26/1999
HQ: Wakefield, MA, USA
Website: www.apache.org
Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse to manage what users see when tapping the Netflix icon or logging in from a browser across platforms;
Apple's Siri uses Apache HBase to complete full ring replication around the world in 10 seconds;
Alibaba uses Apache Flink to process more than 2.5 billion records per second for its merchandise dashboard and real-time customer recommendations;
Amazon, DataStax, IBM, Microsoft, Neo4j, NBC Universal and many others use Apache Tinkerpop in their graph databases and to write complicated traversals;
\\\\\\\\\\\\\\\\\\\\\
Datawarehouse like application on top of Map-Reduce for ad-hoc querying, summarization and data analysis from large datasets residing in distributed storage.
Uses HQL (Hive Query Language), a SQL like language.
Market position: As per Datanyze, the leader in technographics, Hive holds 9.54% of the market share and ranks 3rd compared to SAP Business Warehouse (15.58%) and Amazon Redshift (11.01%) ("Apache Hive", 2020) as shown below.
Scalability, Availability, Redundancy, Performance
Lower learning curve than Pig or MapReduce
Market position: As per Datanyze, Apache Pig holds ranks 12th and has 2.04% of the market share, compared to Apache Hadoop (21.13%) and AWS Lambda (2.47%). This is shown below. Some big names using Pig are Capital One, PayPal, JPMorgan Chase, Cigna and Aetna (HG Insights, 2020).
Data access engine that allows Hadoop users to write complex MapReduce transformations using a simple scripting language called Pig Latin
Ease of programming
Optimization opportunities
Extensibility
Free and open source, distributed, wide column store, NoSQL database management system (DBMS)
Uses CQL (Cassandra Query Language) to communicate with its databases.
Was first launched by Facebook on July 2008 as an open-source Google Code project
Market position: As per Datanyze, Cassandra holds 2.31% of the market share and ranks 11th, compared to Microsoft SQL Server (18.92%) and Amazon's NoSQL (6.53%) ("Apache Cassandra",2020). This is depicted below.
Higher Performance
Scalability and High Availability without compromising performance.
Good Load Balancing
Used by :
Netflix(2,500 nodes, 420 TB, over 1 trillion requests per day)
Comcast
eBay(over 100 nodes, 250 TB), Apple(75,000 nodes storing over 10 PB of data)
GitHub
GoDaddy
Hulu
Intuit and over 1500 more companies that have large, active data sets.
Market position: Datanyze calculates market share by taking the number of websites using a technology and dividing it by the total websites using any technology in the same category. As per their survey, HBase ranks 14th in the market with 1.77% of the market share as compared to Microsoft SQL Server occupying 18.95% and MySQL capturing 15.99% of the market. Prominent names using HBase are Apple, AT&T, JPMorgan Chase and Lockheed Martin (HG Insights, 2020).
It is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable
Provides random, real-time read/write access to any firm's Big Data on top of Hadoop and HDFS
Linear and modular scalability
Stores denormalized data
Consistent reads and writes with automatic and configurable sharding of tables
Load Balancing
Pre-programmed failover support between Region Servers
Convenience as it provides easy to use Java API and supports random real-time CRUD operations.