Big Data Management Question Paper
Key-Value Stores
Big Data Management
Phil Bartie [email protected] EM G.29
Includes material from: Alasdair Gray, HWU Aidan Hogan, Universidad de Chile
Relational Oracle MySQL MS SQL Server
PostgreSQL DB2
MS Access SQLite
Teradata
SAP Adaptive Server
FileMaker Hive
MariaDB Informix Vertica
Database Landscape
Big Data Management
3
NoSQL
DocumentMongoDB
Elasticsearch
DynamoDB CouchBase
Key-Value Redis Memcached
Riak KV Aerospike SimpleDB
Graph Neo4J Titan Giraph InfiniteGraph
RDF Virtuoso Stardog
GraphDB Blazegraph
Jena RDF4J
NewSQL SAP HANA Google Spanner Clustrix VoltDB MemSQL NuoDB
Object Caché Db4o Versant
ObjectStore
Wide-Column Cassandra HBase Accumulo
HyperTable
XML MarkLogic
Sedna Tamino
BaseX eXist-db
Virtual Machine
Big Data Management
4
http://www.macs.hw.ac.uk/~pb56/nosql.html <<VM image
If you want to play with Redis, Neo4J, MongoDB, Cassandra NoSQL, Postgres + QGIS and OpenRefine at home then you can download this VM which has them all installed.
Runs in Virtual Box which is available for Windows, Linux, macOS hosts.
https://www.virtualbox.org/wiki/Downloads. << Virtual Box
User: nosql Password: nosql
Key-Value Stores
Big Data Management
5
Key–Value Stores
Key Value Afghanistan Kabul Albania Tirana Algeria Algiers Andorra la Vella Andorra la Vella Angola Luanda Antigua and Barbuda St. John’s … ….
Big Data Management
6
It’s just a Map / Associate Array J THREE function types available: • put(key,value) • get(key) • delete(key)
How do you do an update?
Map anything!
Key Value
country:Afghanistan capital@city:Kabul,continent:Asia,pop:31108077#2011
country:Albania capital@city:Tirana,continent:Europe,pop:3011405#2013 … … city:Kabul country:Afghanistan,pop:3476000#2013 city:Tirana country:Albania,pop:3011405#2013 … … user:10239 basedIn@city:Tirana,post:{103,10430,201} … …
Big Data Management
7
Key-Value Store Model ¤Schema-less
¤ Highly flexible: insert anything
¤Data redundancy: data not normalised ¤ Lead to data inconsistencies ¤ Larger storage requirements ¤ Faster retrieval
¤Lookup efficiency: depends on key ordering ¤ Keys not always ordered ¤ Not always possible to order keys
Big Data Management
8
Document Store vs Key-Value Store
Document Store ¤Stores documents
¤Primary and secondary indexes ¤ Can index fields within
documents
¤General purpose data store ¤ Storage compression if
collections used
Key-Value ¤Stores blobs
¤Primary key index only
¤ No support for operations on contents (values)
¤High-throughput workloads
Big Data Management
9
REDIS Demo
¤One of the most popular key-value stores
Big Data Management
10
• Redis is an in-memory key-value store that persists on disk.
• Open source (BSD licensed)
• Used as a database, cache and message broker …etc
• REDIS means REmote DIctionary Server.
https://redis.io/download
Distribution
Big Data Management
11
How might a key–value store be distributed over multiple machines? ¤Hash:
¤ Random assignment ¤ Even distribution
¤Value-based: A-E, F-J, K-S, T-Z Geolocation ¤ Clumps values ¤ Supports ranges
¤Custom
What is HASHING?
¤A quick refresher….
Big Data Management
12
lConsider a 1D array
lTo find an entry (e.g. dog) you need to perform a scan across the entries… which for a large dataset this could take a while.
vancat dogcar rat
0 1 2 3 4 5
eye
car
0 1 2 3 4 5
car c 99 a 97 r 114 sum= 310 mod(310,6) = 4
Simple ASCII based hash function (also called a hashing algorithm) for an array of size 6.
cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0
cat
dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2
dog
eye e 101 y 121 e 101 sum= 323 mod(323,6) = 5
eye
rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3
rat
van v 118 a 97 n 110 sum= 325 mod(325,6) = 1
van
Hash Table - example
car
0 1 2 3 4 5
cat dog eyeratvan
find van (118+97+110) = 325 mod(325,6) = 1
myData = Array [1]
Often used to store KEY-VALUE PAIRS (KVP) ..for example this might be definitions of the words, or an image database
Key: van Value:
v a n
Hash Table
e.g. for numeric keys
address = mod (key,n) where n is the number of addresses available
There are many other hashing functions…. like folding...
e.g. for alphanumeric keys
address = mod ((sum of ASCII codes in key),n) where n is the number of addresses available
e.g. key: 123456789 and size of required address is 3 digits 123+ 456 + 789 = 1368 then remove the 1 or 8 to make 368 or 136
..and many many more complex ones check here: https://www.tools4noobs.com/online_tools/hash/
Hashing Algorithm ¤ A calculation applied to a key to transform it into an address
car c 99 a 97 r 114 sum= 310 mod(310,6) = 4
cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0 dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2
car
0 1 2 3 4 5
cat dog
boat b 98 o 111 a 97 t 116 sum= 422 mod(422,6) = 2
boat
van v 118 a 97 n 110 sum= 325 mod(325,6) = 1
van
rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3
rat
Every address open to any item This is an example of LINEAR PROBING where next available space to right used If it gets to the end of the array then it can cycle around and check from beginning
Open Addressing -
Collisions
car
0 1 2 3 4 5
cat dog boatvan rat
find rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3
MyData = Array[3] returns boat
...then Linear Probe until result matches key
Collisions slow down the search. Increasing the array size and changing the hash function appropriately can result in better performance.
Load Factor = Total Number of Items Stored / Size of the Array
Collisions
car
0
1
2
3
4
5
cat
dog boat
van
rat
CHAINING (also called ‘closed addressing’)
cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0
dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2
boat b 98 o 111 a 97 t 116 sum= 422 mod(422,6) = 2
van v 118 a 97 n 110 sum= 325 mod(325,6) = 1
rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3
car c 99 a 97 r 114 sum= 310 mod(310,6) = 4
bus b 98 u 117 s 115 sum= 330 mod(330,6) =0
bus
linked lists
To find an element use hash function as before, then linked list traversal Can be faster than linked probing but if load factor is low then open addressing can be more efficient
lOpen Addressing - linear probing (try next place along, then the next etc)
> can result in primary clustering (keys grouped together in the array with other regions unoccupied)
- plus 3 rehash (look every 3rd slot along)
- quadractic probing (failed attempts)2 - rapidly increase distance from original collision address
lClosed Addressing -chaining items that have collided into a linked list or other data structure
Collision Resolutions
lDetermine the address from the key
lMinimise collisions (in theory if know data in advance can design perfect hash function)
lEasy and fast to calculate with a method for collision resolution
lIdeally produce an even distribution of hash values
Objectives of Hash Function
Instead of Array locations a HASH Function can be used to determine which SERVER to use in a distributed system
Distribution
Big Data Management
21
¤Hash: ¤ Random assignment ¤ Even distribution
¤Value-based: A-E, F-J, K-S, T-Z Geolocation ¤ Clumps values ¤ Supports ranges
¤Custom
What happens if a machine joins or leaves half way through?
Consistent Hashing Avoid re-hashing everything ¤ Hash using a ring ¤ Each machine picks n psuedo-
random points on the ring ¤ Machine responsible for arc before
its point ¤ Objects mapped to ring
Big Data Management
22
Consistent Hashing: leaving Avoid re-hashing everything ¤ If a machine leaves, its range
moves to the next machine ¤ Machine becomes responsible for
data items
Example: blue leaves
Big Data Management
23
x x
x x
Consistent Hashing: joining Avoid re-hashing everything ¤ If machine joins, it picks n pseudo-
random points ¤ Becomes responsible for items on that
part of the ring
Example: blue rejoins e.g. blue temporarily lost connectivity
Big Data Management
24
x x
x x
Consistent Hashing Avoid re-hashing everything ¤ Machines m = 3 ¤ Each machine picks n = 4 psuedo-
random points on the ring ¤ Sectors = 12
Why split into sectors?
Big Data Management
25
Replication
¤A set replication factor (here 3)
¤Commonly primary / secondary replicas ¤ Primary replica elected from secondary
replicas in the case of failure of primary
Big Data Management
26
k v
k v
A1 B1 C1 D1 E1
k v
k vk v k v
DARK = primary LIGHT = secondary
Amazon Dynamo(DB)
¤High availability, scalability, performance ¤ Sacrifices consistency
¤No distinguished nodes ¤ All nodes are equal
¤Peer-to-peer: no central control
Big Data Management
27G. DeCandia, et al, “Dynamo: Amazon’s Highly Available Key-value Store,” ACM SIGOPS Oper. Syst. Rev., vol. 41, pp. 205–220, 2007.
Hashing ¤Consistent Hashing
¤ MD5 hash of key ¤ Generates 128bit number
Big Data Management
28
Replication ¤Replication factor: n
¤ Pick n-1 next buckets (different machines!)
¤ Skip sectors that are on the same machine
¤ Replicas are the machines that will pick up load in failure ¤ Already have data values
What happens if green fails?
Big Data Management
29
Object Versioning
¤Object Versioning (per bucket) ¤PUT doesn’t overwrite: pushes version ¤GET returns most recent version
Big Data Management
30
Object Versioning
¤Object Versioning (per bucket) ¤DELETE doesn’t wipe ¤GET will return not found
Big Data Management
31
Object Versioning
¤Object Versioning (per bucket) ¤GET by version
Big Data Management
32
Object Versioning
¤Object Versioning (per bucket) ¤PERMANENT DELETE by version … wiped
Big Data Management
33
Multi-Versioning Concurrency Control (MVCC)
Big Data Management
34
e.g. mySQL / PostgreSQL
/ DynamoDBe.g.
Model
¤Named table with primary key and a value
¤Primary key is hashed / unordered
Big Data Management
35
Countries Primary Key Value
Afghanistan capital:Kabul,continent:Asia,pop:31822848#2014 Albania capital:Tirana,continent:Europe,pop:3011405#2013 … …
Cities Primary Key Value
Kabul country:Afghanistan,pop:3476000#2013 Tirana country:Albania,pop:802523#2011 … …
CAP Two options for each table:
¤AP: Eventual consistency, High availability
¤CP: Strong consistency, Lower availability
Big Data Management
36
Consistency ¤Gossiping
¤ Keep alive messages sent between nodes with state
¤Quorums: ¤ N nodes responsible for a read/write ¤ Multiple nodes acknowledge read/write for success ¤ At the cost of availability!
¤Hinted Handoff ¤ For transient failures ¤ A node “covers” for another node while its down
Big Data Management
37
Eventual Consistency
¤Merkle tree is a hash tree
¤Nodes have hashes of their children
¤Leaf node hashes from data: keys or ranges
Big Data Management
38
Eventual Consistency
¤Easy to verify regions of the Map
¤Can compare level-at-a-time ¤Root the same, so nodes are consistent
Big Data Management
39
Eventual Consistency
¤Easy to verify regions of the Map
¤Can compare level-at-a-time ¤Different, compare next level down ¤Same value, try other branch
Big Data Management
40
130 hash:
[0x0011]
68 hash:
[0x0101]
90 hash:
[]
Other Key–Value Stores
Big Data Management
42
Other Key–Value Stores
Big Data Management
43
Other Key–Value Stores
Big Data Management
44
Summary ¤Hash Map
¤ GET ¤ PUT ¤ DELETE
¤Schema-less ¤Consistent Hashing
¤ Objects on a ring
¤Replication ¤ Copy to next in ring ¤ Merkle trees
¤Object Versioning ¤CAP
¤ AP: Eventual consistency
¤ CP: Strong consistency
Big Data Management
45
Reading:
¤Dynamo: Amazon’s Highly Available Key-value Store https://dl.acm.org/citation.cfm?doid=1294261.1294281
Big Data Management
46
Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.