Big Data Management Question Paper

Magnum_1993
4.1GraphStores.pptx.pdf

Graph Databases Big Data Management

Phil Bartie phil.bartie@hw.ac.uk EM G.29

Based on material from: Alasdair Gray, Heriot-Watt University

Relational Oracle MySQL MS SQL Server

PostgreSQL DB2

MS Access SQLite

Teradata

SAP Adaptive Server

FileMaker Hive

MariaDB Informix Vertica

Database Landscape

Big Data Management

4

NoSQL

DocumentMongoDB

Elasticsearch

DynamoDB CouchBase

Key-Value Redis Memcached

Riak KV Aerospike SimpleDB

Graph Neo4J Titan Giraph InfiniteGraph

RDF Virtuoso Stardog

GraphDB Blazegraph

Jena RDF4J

NewSQL SAP HANA Google Spanner Clustrix VoltDB MemSQL NuoDB

Object Caché Db4o Versant

ObjectStore

Wide-Column Cassandra HBase Accumulo

HyperTable

XML MarkLogic

Sedna Tamino

BaseX eXist-db

Outline

¤Examples of graphs

¤Theory of graphs

¤Storing data graphs

¤Neo4j

¤Querying a graph store

Big Data Management

5

Graphs

These are charts, not graphs!

A Simple Graph

A node (vertex) represents an entity

Nodes are connected by relationships (edges)

Relationships can have no direction, one direction or two directions

Can you think of use cases in which graphs feature?

Facebook

Big Data Management

12

London Underground

Big Data Management

13

http://www.animalsontheunderground.com

Network Diagram

Big Data Management

20http://www.conceptdraw.com/How-To-Guide/picture/Network-diagram-System-design.png

Black Mirror - Bandersnatch (Netflix)

Big Data Management

21Graph Source: Asim Chaudhry @AsimC86

6 degrees of Kevin Bacon ¤Can all actors be linked to Kevin Bacon in under 6

hops? http://oracleofbacon.org/

Facebook claims 3.74 degrees of freedom between ‘strangers’ (Nov 2016)

Big Data Management

22

http://www.classtools.net/blog/wp-content/uploads/2015/04/bacon.jpg

https://research.fb.com/blog/2016/02/th ree-and-a-half-degrees-of-separation/

What is a graph?

¤Single relationship ¤ Set of Vertices (nodes):

¤ No properties other than a name ¤ Set of Edges (relationships):

¤ No properties ¤ Directed or undirected ¤ May be weighted

¤Operations ¤ Adjacency ¤ Neighbours

¤Algorithms: ¤ Path Finding ¤ Least cost path ¤ Centrality ¤ Degree (indegree,outdegree for directed graphs)

Big Data Management

23

G = (V,E)

Degree (number shows connections per node)

Adjacency: Example

Tube map

Transport for London This diagram is an evolution of the original design conceived in 1931 by Harry Beck · Correct at time of going to print · Poster December 2014

Station in both fare zones Bank/Waterloo Waterloo & City line open between Bank and Waterloo 0621-0030 Mondays to Fridays and 0802-0030 Saturdays. Between Waterloo and Bank 0615-0030 Mondays to Fridays and 0800-0030 Saturdays. Closed Sundays and public holidays. Camden Town Open Sundays 1300 -1730 for interchange and exit only. Covent Garden Exit only from late February until early November 2015. Also, on Saturdays and Sundays, westbound trains will not stop. Please use Leicester Square instead. Emirates Air Line Emirates Greenwich Peninsula and Emirates Royal Docks For full information about operating times and fares please visit tfl.gov.uk/emiratesairline Hounslow West Step-free access for manual wheelchairs only. Stanmore Step-free access via a steep ramp. Tottenham Court Road Central line trains will not stop at this station from early January until early December 2015. Turnham Green Served by Piccadilly line trains until 0650 Mondays to Saturdays, 0745 Sundays and after 2230 every evening. At other times use District line. West Ham Until late February 2015 step-free access will only be to the DLR and c2c. West India Quay Not served by DLR trains from Bank towards Lewisham at certain times.

Moor ParkMetropolitan

Victoria

Circle Central Bakerloo

DLR

London Overground

Piccadilly

Waterloo & City

Jubilee

Hammersmith & City

Northern

District District open weekends, public holidays and some Olympia events

Emirates Air Line

National Rail

Riverboat services

Airport

Tramlink

Interchange stations

Step-free access from street to platform

Step-free access from street to train

Emirates Air Line

River Thames

2 2

2 2

2

5

8 8 6

2

4

4

6 5

41

3

2

4 3

3

33 1

1

3

3

59 7 7 Special fares apply

5

5

4

4

46

Chorleywood

Mill Hill East

Rickmansworth

Perivale

Kentish Town West

Camden Road

Dalston Kingsland

Wanstead Park

Hanger Lane

Edgware

Burnt Oak

Colindale

Hendon Central

Brent Cross

Golders Green

West Silvertown

Emirates Royal Docks

Emirates Greenwich Peninsula Pontoon Dock

King George V

Hampstead

Belsize Park

Chalk Farm

Chesham

Amersham

Chalfont & Latimer

Moor Park

Northwood Northwood Hills

Pinner

North Harrow

Custom House for ExCeL

Prince Regent

Royal Albert

Beckton Park

Cyprus

Gallions Reach

Beckton

Watford

Croxley

Fulham Broadway

Lambeth North

Heathrow Terminal 4

Kensal Rise

Bethnal Green

Westferry Blackwall

Brondesbury Park

Hampstead Heath

Harringay Green Lanes

Leytonstone High Road

Leyton Midland Road

Hackney Central

Northwick Park

Preston Road

Royal Victoria

Wembley Park

Rayners Lane

Watford High Street

Ruislip Gardens

South Ruislip

Greenford

Northolt

South Harrow

Sudbury Hill

Sudbury Town

Alperton

Pimlico

Park Royal

North Ealing

Acton Central

South Acton

Ealing Broadway

West Ruislip

Carpenders Park

Hatch End

North Wembley

Ealing Common

South Kenton

Kenton

Kensal Green

Queen’s Park

Gunnersbury

Kew Gardens

Stockwell

Bow Church

Stonebridge Park

Harlesden

Camden Town

Willesden Junction

Headstone Lane

Parsons Green

Putney Bridge

East Putney

Southfields

Wimbledon Park

Island Gardens

Deptford Bridge

South Quay

Crossharbour

Mudchute

Heron Quays

West India Quay

Elverson Road

Oakwood

Cockfosters

Southgate

Arnos Grove

Bounds Green

Theydon Bois

Epping

Debden

Loughton

Buckhurst Hill

Walthamstow Queen’s Road

Woodgrange Park

Leytonstone

Leyton

Wood Green

Turnpike Lane

Manor House

Stanmore

Canons Park

Queensbury

Kingsbury

High Barnet

Totteridge & Whetstone

Woodside Park

West Finchley

Finchley Central Woodford

South Woodford

Snaresbrook

Hainault

Fairlop

Barkingside

Newbury Park

East Finchley

Highgate

Archway

Devons Road

Langdon Park

All Saints

Tufnell Park Neasden

Dollis Hill

Willesden Green

South Tottenham

Swiss Cottage

Kilburn

Blackhorse Road

Acton Town

Canning Town

Finchley Road

Canary Wharf

Stepney Green

East Ham

Plaistow

Upton Park

Poplar

Upper Holloway

Pudding Mill Lane

Kennington

Borough

Elm Park Dagenham

East

Dagenham Heathway

Becontree

Upney

Heathrow Terminal 5

Finchley Road & Frognal

Crouch Hill

Northfields

Boston Manor

South Ealing

Osterley

Hounslow Central

Hounslow East

Clapham North

Clapham High Street

Oval

Clapham Common

Clapham South

Tooting Bec

Tooting Broadway

Colliers Wood

South Wimbledon

Arsenal

Holloway Road

Caledonian Road

Morden

Hounslow West

Hatton Cross

Heathrow Terminals 1, 2, 3

West Harrow

Brondesbury Caledonian Road &

Barnsbury

Hackney Wick

Homerton

West Acton

East India

Chiswick Park

Roding Valley

Grange Hill

Chigwell

Redbridge

Gants Hill

Wanstead

Ickenham

Turnham Green

Uxbridge

Hillingdon Ruislip

Gospel Oak

Mile End

Bow Road

Bromley- by-Bow

Upminster Bridge

Hornchurch

London City Airport

Richmond

West Brompton

Clapham Junction

Balham

Norwood Junction

Forest Hill

Anerley

Penge West

Honor Oak Park

Brockley

Brixton

Elephant & Castle

Wembley Central

Harrow & Wealdstone

Bushey

Watford Junction

Kentish Town Highbury & Islington

Finsbury Park

Seven Sisters

Tottenham Hale

Walthamstow Central

Stratford

Upminster

Barking

West Ham

Limehouse

Woolwich Arsenal

Wimbledon

West Croydon

Cutty Sark for Maritime Greenwich

Ruislip Manor

Eastcote

West Hampstead

Wapping

New Cross Gate

Sydenham

New Cross

Crystal Palace

Queens Road Peckham

Peckham Rye

Denmark Hill

Canada Water

Surrey Quays

Whitechapel

Greenwich

Lewisham

Kilburn Park

Regent’s Park

Kilburn High Road

South Hampstead

Goodge Street

Shepherd’s Bush Market

Goldhawk Road

Bayswater

Warren Street

Aldgate

Barbican Russell Square

Mornington Crescent

High Street Kensington

St. John’s Wood

Green Park

Baker Street

Notting Hill Gate

Aldgate East

Mansion House

Oxford Circus

Bond Street

Piccadilly Circus

Holborn

Tower Gateway

Monument

Leicester Square

St. Paul’s

Hyde Park Corner

Knightsbridge

Stamford Brook

Ravenscourt Park

West Kensington

North Acton

Holland Park

Angel

Queensway Marble Arch

South Kensington

Sloane Square

Wandsworth Road

Covent Garden

Great Portland

Street

Bank

East Acton

Chancery Lane

Lancaster Gate

Kensington (Olympia)

Marylebone

Victoria

Charing Cross

Cannon Street

Farringdon

Euston

Westminster London Bridge

Liverpool Street

Stratford International

Old Street

Moorgate

Warwick Avenue Maida Vale

Tower Hill

Fenchurch Street

Paddington

Barons Court

Gloucester Road St. James’s

Park Temple

Latimer Road Ladbroke Grove

Royal Oak

Westbourne Park

Bermondsey

Rotherhithe

Shoreditch High Street

Dalston Junction

Haggerston

Hoxton

Wood Lane

White City

Shepherd’s Bush

King’s Cross St. Pancras

Euston SquareEdgware

Road

Southwark

Embankment

Stratford High Street

Abbey Road

Star Lane

Vauxhall

Blackfriars

Waterloo

Tottenham Court Road

Canonbury

Shadwell

Earl’s Court

North Greenwich

Hammersmith

Edgware Road

Harrow- on-the-Hill

Imperial Wharf

Big Data Management

24

Marble Arch is adjacent (connected) to: • Bond Street • Lancaster Gate

Neighbourhood: Example

Big Data Management

25

Tube map

Transport for London This diagram is an evolution of the original design conceived in 1931 by Harry Beck · Correct at time of going to print · Poster December 2014

Station in both fare zones Bank/Waterloo Waterloo & City line open between Bank and Waterloo 0621-0030 Mondays to Fridays and 0802-0030 Saturdays. Between Waterloo and Bank 0615-0030 Mondays to Fridays and 0800-0030 Saturdays. Closed Sundays and public holidays. Camden Town Open Sundays 1300 -1730 for interchange and exit only. Covent Garden Exit only from late February until early November 2015. Also, on Saturdays and Sundays, westbound trains will not stop. Please use Leicester Square instead. Emirates Air Line Emirates Greenwich Peninsula and Emirates Royal Docks For full information about operating times and fares please visit tfl.gov.uk/emiratesairline Hounslow West Step-free access for manual wheelchairs only. Stanmore Step-free access via a steep ramp. Tottenham Court Road Central line trains will not stop at this station from early January until early December 2015. Turnham Green Served by Piccadilly line trains until 0650 Mondays to Saturdays, 0745 Sundays and after 2230 every evening. At other times use District line. West Ham Until late February 2015 step-free access will only be to the DLR and c2c. West India Quay Not served by DLR trains from Bank towards Lewisham at certain times.

Moor ParkMetropolitan

Victoria

Circle Central Bakerloo

DLR

London Overground

Piccadilly

Waterloo & City

Jubilee

Hammersmith & City

Northern

District District open weekends, public holidays and some Olympia events

Emirates Air Line

National Rail

Riverboat services

Airport

Tramlink

Interchange stations

Step-free access from street to platform

Step-free access from street to train

Emirates Air Line

River Thames

2 2

2 2

2

5

8 8 6

2

4

4

6 5

41

3

2

4 3

3

33 1

1

3

3

59 7 7 Special fares apply

5

5

4

4

46

Chorleywood

Mill Hill East

Rickmansworth

Perivale

Kentish Town West

Camden Road

Dalston Kingsland

Wanstead Park

Hanger Lane

Edgware

Burnt Oak

Colindale

Hendon Central

Brent Cross

Golders Green

West Silvertown

Emirates Royal Docks

Emirates Greenwich Peninsula Pontoon Dock

King George V

Hampstead

Belsize Park

Chalk Farm

Chesham

Amersham

Chalfont & Latimer

Moor Park

Northwood Northwood Hills

Pinner

North Harrow

Custom House for ExCeL

Prince Regent

Royal Albert

Beckton Park

Cyprus

Gallions Reach

Beckton

Watford

Croxley

Fulham Broadway

Lambeth North

Heathrow Terminal 4

Kensal Rise

Bethnal Green

Westferry Blackwall

Brondesbury Park

Hampstead Heath

Harringay Green Lanes

Leytonstone High Road

Leyton Midland Road

Hackney Central

Northwick Park

Preston Road

Royal Victoria

Wembley Park

Rayners Lane

Watford High Street

Ruislip Gardens

South Ruislip

Greenford

Northolt

South Harrow

Sudbury Hill

Sudbury Town

Alperton

Pimlico

Park Royal

North Ealing

Acton Central

South Acton

Ealing Broadway

West Ruislip

Carpenders Park

Hatch End

North Wembley

Ealing Common

South Kenton

Kenton

Kensal Green

Queen’s Park

Gunnersbury

Kew Gardens

Stockwell

Bow Church

Stonebridge Park

Harlesden

Camden Town

Willesden Junction

Headstone Lane

Parsons Green

Putney Bridge

East Putney

Southfields

Wimbledon Park

Island Gardens

Deptford Bridge

South Quay

Crossharbour

Mudchute

Heron Quays

West India Quay

Elverson Road

Oakwood

Cockfosters

Southgate

Arnos Grove

Bounds Green

Theydon Bois

Epping

Debden

Loughton

Buckhurst Hill

Walthamstow Queen’s Road

Woodgrange Park

Leytonstone

Leyton

Wood Green

Turnpike Lane

Manor House

Stanmore

Canons Park

Queensbury

Kingsbury

High Barnet

Totteridge & Whetstone

Woodside Park

West Finchley

Finchley Central Woodford

South Woodford

Snaresbrook

Hainault

Fairlop

Barkingside

Newbury Park

East Finchley

Highgate

Archway

Devons Road

Langdon Park

All Saints

Tufnell Park Neasden

Dollis Hill

Willesden Green

South Tottenham

Swiss Cottage

Kilburn

Blackhorse Road

Acton Town

Canning Town

Finchley Road

Canary Wharf

Stepney Green

East Ham

Plaistow

Upton Park

Poplar

Upper Holloway

Pudding Mill Lane

Kennington

Borough

Elm Park Dagenham

East

Dagenham Heathway

Becontree

Upney

Heathrow Terminal 5

Finchley Road & Frognal

Crouch Hill

Northfields

Boston Manor

South Ealing

Osterley

Hounslow Central

Hounslow East

Clapham North

Clapham High Street

Oval

Clapham Common

Clapham South

Tooting Bec

Tooting Broadway

Colliers Wood

South Wimbledon

Arsenal

Holloway Road

Caledonian Road

Morden

Hounslow West

Hatton Cross

Heathrow Terminals 1, 2, 3

West Harrow

Brondesbury Caledonian Road &

Barnsbury

Hackney Wick

Homerton

West Acton

East India

Chiswick Park

Roding Valley

Grange Hill

Chigwell

Redbridge

Gants Hill

Wanstead

Ickenham

Turnham Green

Uxbridge

Hillingdon Ruislip

Gospel Oak

Mile End

Bow Road

Bromley- by-Bow

Upminster Bridge

Hornchurch

London City Airport

Richmond

West Brompton

Clapham Junction

Balham

Norwood Junction

Forest Hill

Anerley

Penge West

Honor Oak Park

Brockley

Brixton

Elephant & Castle

Wembley Central

Harrow & Wealdstone

Bushey

Watford Junction

Kentish Town Highbury & Islington

Finsbury Park

Seven Sisters

Tottenham Hale

Walthamstow Central

Stratford

Upminster

Barking

West Ham

Limehouse

Woolwich Arsenal

Wimbledon

West Croydon

Cutty Sark for Maritime Greenwich

Ruislip Manor

Eastcote

West Hampstead

Wapping

New Cross Gate

Sydenham

New Cross

Crystal Palace

Queens Road Peckham

Peckham Rye

Denmark Hill

Canada Water

Surrey Quays

Whitechapel

Greenwich

Lewisham

Kilburn Park

Regent’s Park

Kilburn High Road

South Hampstead

Goodge Street

Shepherd’s Bush Market

Goldhawk Road

Bayswater

Warren Street

Aldgate

Barbican Russell Square

Mornington Crescent

High Street Kensington

St. John’s Wood

Green Park

Baker Street

Notting Hill Gate

Aldgate East

Mansion House

Oxford Circus

Bond Street

Piccadilly Circus

Holborn

Tower Gateway

Monument

Leicester Square

St. Paul’s

Hyde Park Corner

Knightsbridge

Stamford Brook

Ravenscourt Park

West Kensington

North Acton

Holland Park

Angel

Queensway Marble Arch

South Kensington

Sloane Square

Wandsworth Road

Covent Garden

Great Portland

Street

Bank

East Acton

Chancery Lane

Lancaster Gate

Kensington (Olympia)

Marylebone

Victoria

Charing Cross

Cannon Street

Farringdon

Euston

Westminster London Bridge

Liverpool Street

Stratford International

Old Street

Moorgate

Warwick Avenue Maida Vale

Tower Hill

Fenchurch Street

Paddington

Barons Court

Gloucester Road St. James’s

Park Temple

Latimer Road Ladbroke Grove

Royal Oak

Westbourne Park

Bermondsey

Rotherhithe

Shoreditch High Street

Dalston Junction

Haggerston

Hoxton

Wood Lane

White City

Shepherd’s Bush

King’s Cross St. Pancras

Euston SquareEdgware

Road

Southwark

Embankment

Stratford High Street

Abbey Road

Star Lane

Vauxhall

Blackfriars

Waterloo

Tottenham Court Road

Canonbury

Shadwell

Earl’s Court

North Greenwich

Hammersmith

Edgware Road

Harrow- on-the-Hill

Imperial Wharf

Marble Arch 2 step neighbourhood: • Bond Street • Baker Street • Oxford Circus • Green Park • Lancaster Gate • Queensway

7 Bridges of Königsberg

Big Data Management

26

Can you walk a route that crosses every bridge only once?

https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html

Only if the number of

nodes with an odd degree

is 0 or 2

7 Bridges of Königsberg

Big Data Management

27https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html

A route visiting every edge once is known as a Euler path

degree A(3) B(3) C(5) D(3) Nodes with ODD connections is 4 so Euler path not possible

(If removed edge t) degree A(3) B(3) C(4) D(2) 2 ODD = Euler path possible

Which of these have Euler Paths?

Big Data Management

28

Only possible if number of nodes with odd

degree is 0 or 2

https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html

Applications ¤Social Networks:

¤ Friendship relationships ¤ Cliques ¤ Centres

¤Transportation: ¤ Shortest route ¤ Intercity air links ¤ Inter-transport ¤ Travelling Salesman

¤Technology ¤ Network links ¤ Software dependencies

¤Science ¤ Biological pathways ¤ Chemistry ¤ Electrical networks

¤Movies ¤ Actors, films, ¤ Locations

Big Data Management

29

Multi-Relational Graph

¤Multi-Relationship ¤ Set of Vertices: no properties ¤ Sets of Edges: may be

weighted

¤No properties on edges

Big Data Management

30

G = (V,E) E=(E0,E1,...,Em ⊆ (V xV))

http://www.w3.org/TR/rdf11-primer/

London Underground

Big Data Management

31

Tube map

Transport for London This diagram is an evolution of the original design conceived in 1931 by Harry Beck · Correct at time of going to print · Poster December 2014

Station in both fare zones Bank/Waterloo Waterloo & City line open between Bank and Waterloo 0621-0030 Mondays to Fridays and 0802-0030 Saturdays. Between Waterloo and Bank 0615-0030 Mondays to Fridays and 0800-0030 Saturdays. Closed Sundays and public holidays. Camden Town Open Sundays 1300 -1730 for interchange and exit only. Covent Garden Exit only from late February until early November 2015. Also, on Saturdays and Sundays, westbound trains will not stop. Please use Leicester Square instead. Emirates Air Line Emirates Greenwich Peninsula and Emirates Royal Docks For full information about operating times and fares please visit tfl.gov.uk/emiratesairline Hounslow West Step-free access for manual wheelchairs only. Stanmore Step-free access via a steep ramp. Tottenham Court Road Central line trains will not stop at this station from early January until early December 2015. Turnham Green Served by Piccadilly line trains until 0650 Mondays to Saturdays, 0745 Sundays and after 2230 every evening. At other times use District line. West Ham Until late February 2015 step-free access will only be to the DLR and c2c. West India Quay Not served by DLR trains from Bank towards Lewisham at certain times.

Moor ParkMetropolitan

Victoria

Circle Central Bakerloo

DLR

London Overground

Piccadilly

Waterloo & City

Jubilee

Hammersmith & City

Northern

District District open weekends, public holidays and some Olympia events

Emirates Air Line

National Rail

Riverboat services

Airport

Tramlink

Interchange stations

Step-free access from street to platform

Step-free access from street to train

Emirates Air Line

River Thames

2 2

2 2

2

5

8 8 6

2

4

4

6 5

41

3

2

4 3

3

33 1

1

3

3

59 7 7 Special fares apply

5

5

4

4

46

Chorleywood

Mill Hill East

Rickmansworth

Perivale

Kentish Town West

Camden Road

Dalston Kingsland

Wanstead Park

Hanger Lane

Edgware

Burnt Oak

Colindale

Hendon Central

Brent Cross

Golders Green

West Silvertown

Emirates Royal Docks

Emirates Greenwich Peninsula Pontoon Dock

King George V

Hampstead

Belsize Park

Chalk Farm

Chesham

Amersham

Chalfont & Latimer

Moor Park

Northwood Northwood Hills

Pinner

North Harrow

Custom House for ExCeL

Prince Regent

Royal Albert

Beckton Park

Cyprus

Gallions Reach

Beckton

Watford

Croxley

Fulham Broadway

Lambeth North

Heathrow Terminal 4

Kensal Rise

Bethnal Green

Westferry Blackwall

Brondesbury Park

Hampstead Heath

Harringay Green Lanes

Leytonstone High Road

Leyton Midland Road

Hackney Central

Northwick Park

Preston Road

Royal Victoria

Wembley Park

Rayners Lane

Watford High Street

Ruislip Gardens

South Ruislip

Greenford

Northolt

South Harrow

Sudbury Hill

Sudbury Town

Alperton

Pimlico

Park Royal

North Ealing

Acton Central

South Acton

Ealing Broadway

West Ruislip

Carpenders Park

Hatch End

North Wembley

Ealing Common

South Kenton

Kenton

Kensal Green

Queen’s Park

Gunnersbury

Kew Gardens

Stockwell

Bow Church

Stonebridge Park

Harlesden

Camden Town

Willesden Junction

Headstone Lane

Parsons Green

Putney Bridge

East Putney

Southfields

Wimbledon Park

Island Gardens

Deptford Bridge

South Quay

Crossharbour

Mudchute

Heron Quays

West India Quay

Elverson Road

Oakwood

Cockfosters

Southgate

Arnos Grove

Bounds Green

Theydon Bois

Epping

Debden

Loughton

Buckhurst Hill

Walthamstow Queen’s Road

Woodgrange Park

Leytonstone

Leyton

Wood Green

Turnpike Lane

Manor House

Stanmore

Canons Park

Queensbury

Kingsbury

High Barnet

Totteridge & Whetstone

Woodside Park

West Finchley

Finchley Central Woodford

South Woodford

Snaresbrook

Hainault

Fairlop

Barkingside

Newbury Park

East Finchley

Highgate

Archway

Devons Road

Langdon Park

All Saints

Tufnell Park Neasden

Dollis Hill

Willesden Green

South Tottenham

Swiss Cottage

Kilburn

Blackhorse Road

Acton Town

Canning Town

Finchley Road

Canary Wharf

Stepney Green

East Ham

Plaistow

Upton Park

Poplar

Upper Holloway

Pudding Mill Lane

Kennington

Borough

Elm Park Dagenham

East

Dagenham Heathway

Becontree

Upney

Heathrow Terminal 5

Finchley Road & Frognal

Crouch Hill

Northfields

Boston Manor

South Ealing

Osterley

Hounslow Central

Hounslow East

Clapham North

Clapham High Street

Oval

Clapham Common

Clapham South

Tooting Bec

Tooting Broadway

Colliers Wood

South Wimbledon

Arsenal

Holloway Road

Caledonian Road

Morden

Hounslow West

Hatton Cross

Heathrow Terminals 1, 2, 3

West Harrow

Brondesbury Caledonian Road &

Barnsbury

Hackney Wick

Homerton

West Acton

East India

Chiswick Park

Roding Valley

Grange Hill

Chigwell

Redbridge

Gants Hill

Wanstead

Ickenham

Turnham Green

Uxbridge

Hillingdon Ruislip

Gospel Oak

Mile End

Bow Road

Bromley- by-Bow

Upminster Bridge

Hornchurch

London City Airport

Richmond

West Brompton

Clapham Junction

Balham

Norwood Junction

Forest Hill

Anerley

Penge West

Honor Oak Park

Brockley

Brixton

Elephant & Castle

Wembley Central

Harrow & Wealdstone

Bushey

Watford Junction

Kentish Town Highbury & Islington

Finsbury Park

Seven Sisters

Tottenham Hale

Walthamstow Central

Stratford

Upminster

Barking

West Ham

Limehouse

Woolwich Arsenal

Wimbledon

West Croydon

Cutty Sark for Maritime Greenwich

Ruislip Manor

Eastcote

West Hampstead

Wapping

New Cross Gate

Sydenham

New Cross

Crystal Palace

Queens Road Peckham

Peckham Rye

Denmark Hill

Canada Water

Surrey Quays

Whitechapel

Greenwich

Lewisham

Kilburn Park

Regent’s Park

Kilburn High Road

South Hampstead

Goodge Street

Shepherd’s Bush Market

Goldhawk Road

Bayswater

Warren Street

Aldgate

Barbican Russell Square

Mornington Crescent

High Street Kensington

St. John’s Wood

Green Park

Baker Street

Notting Hill Gate

Aldgate East

Mansion House

Oxford Circus

Bond Street

Piccadilly Circus

Holborn

Tower Gateway

Monument

Leicester Square

St. Paul’s

Hyde Park Corner

Knightsbridge

Stamford Brook

Ravenscourt Park

West Kensington

North Acton

Holland Park

Angel

Queensway Marble Arch

South Kensington

Sloane Square

Wandsworth Road

Covent Garden

Great Portland

Street

Bank

East Acton

Chancery Lane

Lancaster Gate

Kensington (Olympia)

Marylebone

Victoria

Charing Cross

Cannon Street

Farringdon

Euston

Westminster London Bridge

Liverpool Street

Stratford International

Old Street

Moorgate

Warwick Avenue Maida Vale

Tower Hill

Fenchurch Street

Paddington

Barons Court

Gloucester Road St. James’s

Park Temple

Latimer Road Ladbroke Grove

Royal Oak

Westbourne Park

Bermondsey

Rotherhithe

Shoreditch High Street

Dalston Junction

Haggerston

Hoxton

Wood Lane

White City

Shepherd’s Bush

King’s Cross St. Pancras

Euston SquareEdgware

Road

Southwark

Embankment

Stratford High Street

Abbey Road

Star Lane

Vauxhall

Blackfriars

Waterloo

Tottenham Court Road

Canonbury

Shadwell

Earl’s Court

North Greenwich

Hammersmith

Edgware Road

Harrow- on-the-Hill

Imperial Wharf

We need node

properties!

How can a wheelchair

user get from Paddington to Green Park?

Graph Operations

¤Create: add a node or relationship

¤Read: retrieve data ¤Update: change a node

or relationship’s properties

¤Delete: remove a node or relationship

¤Adjacency: Find connected nodes

¤Neighbourhood: Find nodes up to x steps away

¤Shortest path: Find the shortest path between two nodes

Big Data Management

32

Graph storage

¤Native: Optimised and designed for graphs

¤Serialised: stored in some other form ¤Relational DBMS ¤Object store ¤Document store

Big Data Management

33Check here for more details: https://neo4j.com/blog/native-vs-non-native-graph-technology/

Native = data stored as a graph Non-native = data not stored as a graph

Storing Relationship in an RDBMS

Person ID Name 1 Alice 2 Bob … … 99 Zach

Big Data Management

34

Friendships PersonID FriendID 1 2 2 1 2 99 … … 99 1

1

99

2 FRIEND

FRIENDFRIEND

Bob’s Friends SELECT p2.Name FROM Person p1,

Friendship f1, Person p2

WHERE p1.ID = f1.PersonID AND p2.ID = f1.FriendID AND p1.Name = ‘Bob’

FRIEND

Storing Relationship in an RDBMS

Person ID Name 1 Alice 2 Bob … … 99 Zach

Big Data Management

35

Friendships PersonID FriendID 1 2 2 1 2 99 … … 99 1

1

99

2 FRIEND

FRIENDFRIEND

Bob’s Friends-of-friends SELECT p2.Name

AS Friend_of_friend FROM Person p1,

Friendship f1, Person p2, Friendship f2

WHERE p1.ID = f1.PersonID AND f1.FriendID =

f2.PersonID AND p2.ID = f2.FriendID AND p1.Name = ‘Bob’ AND f2.FriendID <> p1.ID

FRIEND

Storing Graphs in JSON

{ A : { outE : [B, C]

} B : { outE : []

} C : { outE : [D]

} D : { outE : [A]

} }

Representing a Graph in a Relational Database

outV | inV

------------

A | B

A | C

C | D

D | A

A

CB

D

Big Data Management

37

http://www.slideshare.net/slidarko/graph-databases-trends-in-the-web-of-data

Native Storage ¤Index-free adjacency

¤ Doesn’t rely on indexes ¤ Relationships modelled as links

between nodes ¤ Micro-index of adjacent

nodes ¤ Query time proportional to

amount of graph searched, not total size

InLinks OutLinks D B

C

Big Data Management

39

Representing a Graph in a Relational Database

outV | inV

------------

A | B

A | C

C | D

D | A

A

CB

D

Processing engine

¤Runs algorithms over the graph ¤Identify clusters ¤Questions about relationships

¤Traverses the graph ¤Following paths ¤Chasing pointers

Big Data Management

40

A

DC

B E

41

Figure 1-3. An overview of the graph database space

Figure 1-4 shows a common architecture for deploying a graph compute engine. The architecture includes a system of record (SOR) database with OLTP properties (such as MySQL, Oracle, or Neo4j), which serves, requests, and responds to queries from the application (and ultimately the users) at runtime. Periodically, an Extract, Transform, and Load (ETL) job moves data from the system of record database into the graph compute engine for offline querying and analysis.

Figure 1-4. A high-level view of a typical graph compute engine deployment

A variety of different types of graph compute engines exist. Most notably there are in- memory/single machine graph compute engines like Cassovary, and distributed graph compute engines like Pegasus or Giraph. Most distributed graph compute engines are

A High-Level View of the Graph Space | 7

Big Data Management

I. Robinson, J. Webber, and E. Eifrem, Graph Databases. O’Reilly Media, 2013.

Performance: Relationship queries

Relational ¤Performance degrades with

many joins

¤Multiple self-joins even worse ¤ Each self-join must create a

copy of the relation

¤Cost increases as cardinality increases

Graph ¤Performance stays constant

¤ Queries are localised to a node

¤ Nodes are linked via relationships

¤Cost constrained by query

Big Data Management

42

Find all friends of my friends

Query Performance: Set queries

Relational ¤Exploit data model structure

¤Matches “traditional” business query loads

Graphs ¤Have to visit every node

Big Data Management

43

For each film, how many actors are there? What is the average number of actor per film?

Flexibility: New Data Needs

Relational ¤Schema redesign

¤Implementing redesign implies data migration

Graphs ¤Add new nodes and

relationships

¤Naturally supported

Big Data Management

44

http://neo4j.com/

Native Graph Storage

For this course we will use…

..other native graph databases are available…

• Data model looks like graph you would draw on whiteboard, so simple to discuss with non technical people

• ACID compliant • Flexible schema • Cheap to run (e.g. compared to Oracle) • Relationships are no longer hidden, they are a 1st class entity • Fast

Benefits

• Correspond to entities

• Have properties that describe the entity; stored as JSON-like

• Have labels that define the class of entity

Nodes

HWU

KEN

{ name: “ken”, age: 26, job: “ra”

}

UNIVERSITY

PERSON

RESEARCH

TEACH

• Link 2 entities with a direction

• Have properties that describe the relationship; stored as JSON-like

• Can describe the nature (e.g., parent- child)

• Or be qualitative (e.g., start date)

• Have labels that define the class of relationship

Relationships

HWU

KEN

{ start : “1.4.2006”, contract : “rolling” grade : 7

}

EMPLOYS

• Relationships ALWAYS have 1 direction

• Relationships cannot have 2 directions

• You can have multiple relationships – do you need to?

Answer: no

¤You can navigate in reverse!

HOWEVER.. You might want to!

For example: HWU employs Ken Ken is employed by HWU it doesn’t make sense that Ken Employs HWU

BUT: Ken follows HWU (i.e. on Twitter) and HWU follows Ken

Exercise

Create a graph that models the relationships between Heriot-Watt University, Edinburgh and yourself.

52

STUDIES AT

LOCATED N

PERSON

LIVES IN

UNIVERSITY

CITY

STUDIES AT

LOCATED IN

PERSON

LIVES IN

{ name: “steve”, age: 23, job: “student”

}

UNIVERSITY

{ name: “hwu”, founded: 1821

}

{ name:

“Edinburgh” }

CITY

Queries generate sub-graphs

HWU

KEN

Query

Query

Treating relationships as 1st class entities, and storing quantitative or qualitative facts about the relationship enables very complex querying:

Find all men who are connected within three friends of my women friends who like sailing but not bowling and who live within 30 miles of my address.

• Still something of a niche product. Many use cases, but not a general purpose DBMS in the way that RDBMS are.

• Immature in comparison to RDBMS • Writes are slow (in comparison to something like MongoDB) • Flexible schema are not that flexible. You need to fix a schema in

order that your program knows what to ask for and how to ask for it. So if there is no schema defined at the DBMS level, you end up defining one in the programs that access the data.

• Neo4J Cypher pushing for a standard query language https://neo4j.com/press-releases/query-language-graph- databases-international-standard/

Weaknesses

Neo4j Architecture

at O(1) cost each. To find who is friends with Alice, we simply follow all of Alice’s incoming FRIEND relationships to their source, again at O(1) cost each. Given these costings, it’s clear that, in theory at least, graph traversals can be very effi! cient. But such high-performance traversals only become reality when they are sup! ported by an architecture designed for that purpose.

Native Graph Storage If index-free adjacency is the key to high-performance traversals, queries, and writes, then one key aspect of the design of a graph database is the way in which graphs are stored. An efficient, native graph storage format supports extremely rapid traversals for arbitrary graph algorithms—an important reason for using graphs. For illustrative pur! poses we’ll use the Neo4j database as an example of how a graph database is architected.

First, let’s contextualize our discussion by looking at Neo4j’s high-level architecture, presented in Figure 6-3. In what follows we’ll work bottom-up, from the files on disk, through the programmatic APIs, and up to the Cypher query language. Along the way we’ll discuss the performance and dependability characteristics of Neo4j, and the design decisions that make Neo4j a performant, reliable graph database.

Figure 6-3. Neo4j architecture

Neo4j stores graph data in a number of different store files. Each store file contains the data for a specific part of the graph (e.g., nodes, relationships, properties). The division of storage responsibilities—particularly the separation of graph structure from property data—facilitates performant graph traversals, even though it means the user’s view of

144 | Chapter 6: Graph Database Internals

Big Data Management

59I. Robinson, J. Webber, and E. Eifrem, Graph Databases. O’Reilly Media, 2013.

Nodes Relationships Properties

Neo4J hardware calculator

Big Data Management

60https://neo4j.com/hardware-sizing/

But (like Dynamo), tables are tunable towards CP

Where is Neo4j?

Big Data Management

61

C

A P

CA: Guarantees to give a correct response but only while network works fine (Centralised / Traditional)

CP: Guarantees responses are correct even if there are network failures, but response may fail (Weak availability)

AP: Always provides a “best-effort” response even in presence of network failures (Eventual consistency)

Versions of Neo4J

Big Data Management

62

• Community • Enterprise

• Both offer ACID transactions • Both offer Cypher graph query language • Both offer drivers for C#, Java, JS, Python…

• Only Enterprise offers clustering

Neo4J: Enterprise Edition – Horizontal Scaling

Big Data Management

63https://neo4j.com/docs/operations-manual/current/clustering/introduction/

Core Servers: main responsibility is to safeguard data – they replicate all transactions using the Raft protocol. Raft ensures that the data is safely durable before confirming transaction commit to the end user application.

Read Replicas are fully-fledged Neo4j databases capable of fulfilling arbitrary (read-only) graph queries and procedures.

Read Replicas' main responsibility is to scale out graph workloads. They are asynchronously replicated from Core Servers via transaction log shipping. Periodically (usually in the ms range) a Read Replica will poll a Core Server for any new transactions and the Core Server will ship those transactions to the Read Replica. Many Read Replicas can be fed data from a relatively small number of Core Servers, allowing for a large fan out of the query workload for scale.

Cypher Query Language

Neo4j specific

Big Data Management

http://neo4j.com/developer/cypher-query-language/

Neo4J

Big Data Management

Create simplest graph

Big Data Management

67

What happened?

Big Data Management

68

Displaying nodes

Big Data Management

69

Display Graph: nodes

Big Data Management

70

Display Graph

Big Data Management

71

Create a relationship

Big Data Management

72

Relationship created

Big Data Management

73

Display result using match

Big Data Management

74

Create link from 0 to 1

Big Data Management

75

Displaying the new edge

Big Data Management

76

Delete all nodes and edges

Big Data Management

77

More information

Big Data Management

78

Cypher Summary ¤ Create a node: create()

¤ Retrieve all nodes: match(n) return n

¤ Create two nodes and a relationship called :LINKS_TO between them: create()-[:LINKS_TO]-> ()

¤ Create LINKS_TO relationship between nodes 0 and 1: match (n),(m) where id(n)=0 and id(m)= 1 create(n) -[:LINKS_TO]-> (m)

¤ Delete all nodes and edges: match(n) optional match (n)-[r]-() delete n,r

79

Movie Schema

Big Data Management

80

Neo4J supports ACID ¤ In order to fully maintain data integrity and ensure good transactional behaviour, Neo4j supports the ACID properties:

¤ Atomicity: If any part of a transaction fails, the database state is left unchanged. ¤ Consistency: Any transaction will leave the database in a consistent state. ¤ Isolation: During a transaction, modified data cannot be accessed by other operations. ¤ Durability: The DBMS can always recover the results of a committed transaction.

¤ All database operations that access the graph, indexes, or the schema must be performed in a transaction. ¤ The default isolation level is READ_COMMITTED. ¤ Data retrieved by traversals is not protected from modification by other transactions. ¤ Non-repeatable reads may occur (i.e., only write locks are acquired and held until the end of the transaction). ¤ One can manually acquire write locks on nodes and relationships to achieve higher level of isolation ¤ Locks are acquired at the Node and Relationship level. ¤ Deadlock detection is built into the core transaction management.

Big Data Management

81https://neo4j.com/docs/java-reference/current/transactions/

TRANSACTION 1.Begin a transaction. 2.Perform database operations. 3.Mark the transaction as successful or not. 4.Finish the transaction.

Requires a write lock, and Cypher automatically acquires one. MATCH (n:X {id: 42}) SET n.prop = n.prop + 1

Summary ¤Graphs good for relationships ¤Network analysis

¤Property graphs ¤Edges are directed ¤Nodes and edges have properties ¤Multi-relational

¤Efficient native storage and processing ¤Neo4j contains pattern matching query language

Big Data Management

82

http://en.wikipedia.org/wiki/Graph_database

Cypher vs SQL

Big Data Management

83

Query that for each customer who bought a product, will recommend products that peer customers have purchased. The WHERE clause removes products that the customer has already purchased, since we don’t want to recommend something the customer has already bought.

SELECT product.product_name as Recommendation, count(1) as Frequency FROM product, customer_product_mapping, (SELECT cpm3.product_id, cpm3.customer_id FROM Customer_product_mapping cpm, Customer_product_mapping cpm2, Customer_product_mapping cpm3 WHERE cpm.customer_id = ‘customer-one’ and cpm.product_id = cpm2.product_id and cpm2.customer_id != ‘customer-one’ and cpm3.customer_id = cpm2.customer_id and cpm3.product_id not in (select distinct product_id FROM Customer_product_mapping cpm WHERE cpm.customer_id = ‘customer-one’) ) recommended_products WHERE customer_product_mapping.product_id = product.product_id and customer_product_mapping.product_id in recommended_products.product_id and customer_product_mapping.customer_id = recommended_products.customer_id GROUP BY product.product_name ORDER BY Frequency desc;

MATCH (u:Customer {customer_id:’customer-one’})- [:BOUGHT]->

(p:Product) <- [:BOUGHT]-(peer:Customer)- [:BOUGHT]->(reco:Product) WHERE not (u)-[:BOUGHT]->(reco) RETURN reco as Recommendation, count(*) as Frequency ORDER BY Frequency DESC LIMIT 5;

https://neo4j.com/blog/slow-sql-queries-killing-recommendation-engine/?ref=blog

CYPHER SQL

Lab Exercise

¤See instructions on Vision

¤Download neo4j, unzip, and run from /bin folder

¤Two labs available – do lab a (networks) and optionally lab b (data analysis)

¤Don’t forget to stop neo4j service when finished

Big Data Management

84

Download from: http://www.macs.hw.ac.uk/~pb56/neo4j.tar.gz (for Linux lab PCs 86MB zip file)

Neo4J for Very Large Scale Systems

¤ https://www.youtube.com/watch?v=BfPDZf2wmqg&feature=youtu.be

Big Data Management

85

Intro to Cypher Video Link

¤https://youtu.be/VdivJqlPzCI?t=2

Big Data Management

86

http://neo4j.com/books/graph-databases/