Big Data Management Question Paper

profileMagnum_1993
2.1KeyValueStores2.pdf

Key-Value Stores

Big Data Management

Phil Bartie [email protected] EM G.29

Includes material from: Alasdair Gray, HWU Aidan Hogan, Universidad de Chile

Relational Oracle MySQL MS SQL Server

PostgreSQL DB2

MS Access SQLite

Teradata

SAP Adaptive Server

FileMaker Hive

MariaDB Informix Vertica

Database Landscape

Big Data Management

3

NoSQL

DocumentMongoDB

Elasticsearch

DynamoDB CouchBase

Key-Value Redis Memcached

Riak KV Aerospike SimpleDB

Graph Neo4J Titan Giraph InfiniteGraph

RDF Virtuoso Stardog

GraphDB Blazegraph

Jena RDF4J

NewSQL SAP HANA Google Spanner Clustrix VoltDB MemSQL NuoDB

Object Caché Db4o Versant

ObjectStore

Wide-Column Cassandra HBase Accumulo

HyperTable

XML MarkLogic

Sedna Tamino

BaseX eXist-db

Virtual Machine

Big Data Management

4

http://www.macs.hw.ac.uk/~pb56/nosql.html <<VM image

If you want to play with Redis, Neo4J, MongoDB, Cassandra NoSQL, Postgres + QGIS and OpenRefine at home then you can download this VM which has them all installed.

Runs in Virtual Box which is available for Windows, Linux, macOS hosts.

https://www.virtualbox.org/wiki/Downloads. << Virtual Box

User: nosql Password: nosql

Key-Value Stores

Big Data Management

5

Key–Value Stores

Key Value Afghanistan Kabul Albania Tirana Algeria Algiers Andorra la Vella Andorra la Vella Angola Luanda Antigua and Barbuda St. John’s … ….

Big Data Management

6

It’s just a Map / Associate Array J THREE function types available: • put(key,value) • get(key) • delete(key)

How do you do an update?

Map anything!

Key Value

country:Afghanistan capital@city:Kabul,continent:Asia,pop:31108077#2011

country:Albania capital@city:Tirana,continent:Europe,pop:3011405#2013 … … city:Kabul country:Afghanistan,pop:3476000#2013 city:Tirana country:Albania,pop:3011405#2013 … … user:10239 basedIn@city:Tirana,post:{103,10430,201} … …

Big Data Management

7

Key-Value Store Model ¤Schema-less

¤ Highly flexible: insert anything

¤Data redundancy: data not normalised ¤ Lead to data inconsistencies ¤ Larger storage requirements ¤ Faster retrieval

¤Lookup efficiency: depends on key ordering ¤ Keys not always ordered ¤ Not always possible to order keys

Big Data Management

8

Document Store vs Key-Value Store

Document Store ¤Stores documents

¤Primary and secondary indexes ¤ Can index fields within

documents

¤General purpose data store ¤ Storage compression if

collections used

Key-Value ¤Stores blobs

¤Primary key index only

¤ No support for operations on contents (values)

¤High-throughput workloads

Big Data Management

9

REDIS Demo

¤One of the most popular key-value stores

Big Data Management

10

• Redis is an in-memory key-value store that persists on disk.

• Open source (BSD licensed)

• Used as a database, cache and message broker …etc

• REDIS means REmote DIctionary Server.

https://redis.io/download

Distribution

Big Data Management

11

How might a key–value store be distributed over multiple machines? ¤Hash:

¤ Random assignment ¤ Even distribution

¤Value-based: A-E, F-J, K-S, T-Z Geolocation ¤ Clumps values ¤ Supports ranges

¤Custom

What is HASHING?

¤A quick refresher….

Big Data Management

12

lConsider a 1D array

lTo find an entry (e.g. dog) you need to perform a scan across the entries… which for a large dataset this could take a while.

vancat dogcar rat

0 1 2 3 4 5

eye

car

0 1 2 3 4 5

car c 99 a 97 r 114 sum= 310 mod(310,6) = 4

Simple ASCII based hash function (also called a hashing algorithm) for an array of size 6.

cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0

cat

dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2

dog

eye e 101 y 121 e 101 sum= 323 mod(323,6) = 5

eye

rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3

rat

van v 118 a 97 n 110 sum= 325 mod(325,6) = 1

van

Hash Table - example

car

0 1 2 3 4 5

cat dog eyeratvan

find van (118+97+110) = 325 mod(325,6) = 1

myData = Array [1]

Often used to store KEY-VALUE PAIRS (KVP) ..for example this might be definitions of the words, or an image database

Key: van Value:

v a n

Hash Table

e.g. for numeric keys

address = mod (key,n) where n is the number of addresses available

There are many other hashing functions…. like folding...

e.g. for alphanumeric keys

address = mod ((sum of ASCII codes in key),n) where n is the number of addresses available

e.g. key: 123456789 and size of required address is 3 digits 123+ 456 + 789 = 1368 then remove the 1 or 8 to make 368 or 136

..and many many more complex ones check here: https://www.tools4noobs.com/online_tools/hash/

Hashing Algorithm ¤ A calculation applied to a key to transform it into an address

car c 99 a 97 r 114 sum= 310 mod(310,6) = 4

cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0 dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2

car

0 1 2 3 4 5

cat dog

boat b 98 o 111 a 97 t 116 sum= 422 mod(422,6) = 2

boat

van v 118 a 97 n 110 sum= 325 mod(325,6) = 1

van

rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3

rat

Every address open to any item This is an example of LINEAR PROBING where next available space to right used If it gets to the end of the array then it can cycle around and check from beginning

Open Addressing -

Collisions

car

0 1 2 3 4 5

cat dog boatvan rat

find rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3

MyData = Array[3] returns boat

...then Linear Probe until result matches key

Collisions slow down the search. Increasing the array size and changing the hash function appropriately can result in better performance.

Load Factor = Total Number of Items Stored / Size of the Array

Collisions

car

0

1

2

3

4

5

cat

dog boat

van

rat

CHAINING (also called ‘closed addressing’)

cat c 99 a 97 t 116 sum= 312 mod(312,6) = 0

dog d 100 o 111 g 113 sum= 314 mod(314,6) = 2

boat b 98 o 111 a 97 t 116 sum= 422 mod(422,6) = 2

van v 118 a 97 n 110 sum= 325 mod(325,6) = 1

rat r 114 a 97 t 116 sum= 327 mod(327,6) = 3

car c 99 a 97 r 114 sum= 310 mod(310,6) = 4

bus b 98 u 117 s 115 sum= 330 mod(330,6) =0

bus

linked lists

To find an element use hash function as before, then linked list traversal Can be faster than linked probing but if load factor is low then open addressing can be more efficient

lOpen Addressing - linear probing (try next place along, then the next etc)

> can result in primary clustering (keys grouped together in the array with other regions unoccupied)

- plus 3 rehash (look every 3rd slot along)

- quadractic probing (failed attempts)2 - rapidly increase distance from original collision address

lClosed Addressing -chaining items that have collided into a linked list or other data structure

Collision Resolutions

lDetermine the address from the key

lMinimise collisions (in theory if know data in advance can design perfect hash function)

lEasy and fast to calculate with a method for collision resolution

lIdeally produce an even distribution of hash values

Objectives of Hash Function

Instead of Array locations a HASH Function can be used to determine which SERVER to use in a distributed system

Distribution

Big Data Management

21

¤Hash: ¤ Random assignment ¤ Even distribution

¤Value-based: A-E, F-J, K-S, T-Z Geolocation ¤ Clumps values ¤ Supports ranges

¤Custom

What happens if a machine joins or leaves half way through?

Consistent Hashing Avoid re-hashing everything ¤ Hash using a ring ¤ Each machine picks n psuedo-

random points on the ring ¤ Machine responsible for arc before

its point ¤ Objects mapped to ring

Big Data Management

22

Consistent Hashing: leaving Avoid re-hashing everything ¤ If a machine leaves, its range

moves to the next machine ¤ Machine becomes responsible for

data items

Example: blue leaves

Big Data Management

23

x x

x x

Consistent Hashing: joining Avoid re-hashing everything ¤ If machine joins, it picks n pseudo-

random points ¤ Becomes responsible for items on that

part of the ring

Example: blue rejoins e.g. blue temporarily lost connectivity

Big Data Management

24

x x

x x

Consistent Hashing Avoid re-hashing everything ¤ Machines m = 3 ¤ Each machine picks n = 4 psuedo-

random points on the ring ¤ Sectors = 12

Why split into sectors?

Big Data Management

25

Replication

¤A set replication factor (here 3)

¤Commonly primary / secondary replicas ¤ Primary replica elected from secondary

replicas in the case of failure of primary

Big Data Management

26

k v

k v

A1 B1 C1 D1 E1

k v

k vk v k v

DARK = primary LIGHT = secondary

Amazon Dynamo(DB)

¤High availability, scalability, performance ¤ Sacrifices consistency

¤No distinguished nodes ¤ All nodes are equal

¤Peer-to-peer: no central control

Big Data Management

27G. DeCandia, et al, “Dynamo: Amazon’s Highly Available Key-value Store,” ACM SIGOPS Oper. Syst. Rev., vol. 41, pp. 205–220, 2007.

Hashing ¤Consistent Hashing

¤ MD5 hash of key ¤ Generates 128bit number

Big Data Management

28

Replication ¤Replication factor: n

¤ Pick n-1 next buckets (different machines!)

¤ Skip sectors that are on the same machine

¤ Replicas are the machines that will pick up load in failure ¤ Already have data values

What happens if green fails?

Big Data Management

29

Object Versioning

¤Object Versioning (per bucket) ¤PUT doesn’t overwrite: pushes version ¤GET returns most recent version

Big Data Management

30

Object Versioning

¤Object Versioning (per bucket) ¤DELETE doesn’t wipe ¤GET will return not found

Big Data Management

31

Object Versioning

¤Object Versioning (per bucket) ¤GET by version

Big Data Management

32

Object Versioning

¤Object Versioning (per bucket) ¤PERMANENT DELETE by version … wiped

Big Data Management

33

Multi-Versioning Concurrency Control (MVCC)

Big Data Management

34

e.g. mySQL / PostgreSQL

/ DynamoDBe.g.

Model

¤Named table with primary key and a value

¤Primary key is hashed / unordered

Big Data Management

35

Countries Primary Key Value

Afghanistan capital:Kabul,continent:Asia,pop:31822848#2014 Albania capital:Tirana,continent:Europe,pop:3011405#2013 … …

Cities Primary Key Value

Kabul country:Afghanistan,pop:3476000#2013 Tirana country:Albania,pop:802523#2011 … …

CAP Two options for each table:

¤AP: Eventual consistency, High availability

¤CP: Strong consistency, Lower availability

Big Data Management

36

Consistency ¤Gossiping

¤ Keep alive messages sent between nodes with state

¤Quorums: ¤ N nodes responsible for a read/write ¤ Multiple nodes acknowledge read/write for success ¤ At the cost of availability!

¤Hinted Handoff ¤ For transient failures ¤ A node “covers” for another node while its down

Big Data Management

37

Eventual Consistency

¤Merkle tree is a hash tree

¤Nodes have hashes of their children

¤Leaf node hashes from data: keys or ranges

Big Data Management

38

Eventual Consistency

¤Easy to verify regions of the Map

¤Can compare level-at-a-time ¤Root the same, so nodes are consistent

Big Data Management

39

Eventual Consistency

¤Easy to verify regions of the Map

¤Can compare level-at-a-time ¤Different, compare next level down ¤Same value, try other branch

Big Data Management

40

130 hash:

[0x0011]

68 hash:

[0x0101]

90 hash:

[]

Other Key–Value Stores

Big Data Management

42

Other Key–Value Stores

Big Data Management

43

Other Key–Value Stores

Big Data Management

44

Summary ¤Hash Map

¤ GET ¤ PUT ¤ DELETE

¤Schema-less ¤Consistent Hashing

¤ Objects on a ring

¤Replication ¤ Copy to next in ring ¤ Merkle trees

¤Object Versioning ¤CAP

¤ AP: Eventual consistency

¤ CP: Strong consistency

Big Data Management

45

Reading:

¤Dynamo: Amazon’s Highly Available Key-value Store https://dl.acm.org/citation.cfm?doid=1294261.1294281

Big Data Management

46

Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.