Heavens! My data is no longer relational

DESCRIPTION

When managing the data of our web applications goes beyond simple persistence in a relational database (RDBMS), alternative technologies known as "NoSQL" become necessary. We will cover the four major NoSQL families (Key/Value, Document, Column-oriented and Graph) and how to integrate them into PHP applications.

BLEND WEB MIX, 1 October 2013

Xavier Gorse

@xgorse

French PHP User Association (Association Française des Utilisateurs de PHP)

• Founded in 2001

• Forum PHP (21 & 22 November 2013, Paris)

• AperoPHP and Rendez-vous events

• Local chapters

• President in 2009

www.afup.org

French-speaking Symfony User Association (Association Francophone des utilisateurs de SYmfony)

• Started in 2010 by Hugo Hamon

• Not yet a formal association

• Monthly Sfpot: a talk followed by drinks

• Chapters in Marseille, Lyon?

www.afsy.fr

Elao

• Founder in 2005

• Lyon & Paris

• Technical web agency, 15 people

• Symfony since 2006

• Official SensioLabs partner

www.elao.com

Outline

• Trend

• Key-value databases

• Document databases

• Graph databases

• Column-oriented databases

RDBMS performance

[Figure: performance against data complexity for a relational database, compared with the requirements of applications: salary lists, most web apps, social networks, location-based services. Source: @ianSrobinson - @jimwebber from NeoTechnology]

complexity = f(size, connectedness, uniformity)

Data Size

[Chart: data size by year, 2007 to 2013]

Data Size

• 500 million page views a day

• ~3TB of new data to store a day

• Posts are about 50GB a day. Follower list updates are about 2.7TB a day.

Connectedness

[Figure: information connectivity over time, 1990 to 2020 (web 1.0, web 2.0, "web 3.0"): text documents, hypertext, feeds, blogs, wikis, UGC, tagging/folksonomies, RDFa, ontologies, GGG. Source: @ianSrobinson - @jimwebber from NeoTechnology]

Uniformity

• Semi-structured data

• Different data lifecycles

• Store more data about each entity

• Individualization & decentralization of content generation

NoSQL: Not Only SQL

NoSQL

• Non-relational

• Cluster-friendly

• Schema-less

• Distributed architecture

ACID & CAP Theorem

ACID

• Atomicity

• Consistency

• Isolation

• Durability
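To make Atomicity and Consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module (a relational engine, chosen only because it ships with Python; the accounts table and the transfer function are hypothetical). The credit and the debit run in one transaction: if the debit violates the CHECK constraint, the credit is rolled back as well.

```python
import sqlite3


def make_db():
    """Create a hypothetical two-account database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both UPDATEs are committed, or neither is."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))  # {'alice': 70, 'bob': 30}
    try:
        transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
    except sqlite3.IntegrityError:
        pass
    print(balances(conn))  # {'alice': 70, 'bob': 30}: bob's credit was rolled back
```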

CAP Theorem

• Consistency

• Availability

• Partition Tolerance

[Diagram: the four NoSQL data models: Key/Value, Document, Column-oriented and Graph]

Key-value databases

• Inspired by Amazon’s Dynamo (2007)

• A global collection of key-value pairs

• A big, scalable hash map
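Dynamo-style stores spread this hash map over several nodes. A minimal sketch of the consistent hashing idea from the Dynamo design, deliberately simplified (no virtual nodes, no replication; HashRing, ShardedKV and the node names are hypothetical): a key belongs to the first node found clockwise from the key's hash on a ring, so adding a node only moves the keys that now fall to that new node.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (assumes at least one node)."""

    def __init__(self, nodes):
        # Each node is placed on the ring at the position given by its hash
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First node strictly after the key's hash, wrapping around the ring
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[i][1]


class ShardedKV:
    """Hypothetical key-value store: one plain dict per node, routed by the ring."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.stores = {n: {} for n in nodes}

    def put(self, key, value):
        self.stores[self.ring.node_for(key)][key] = value

    def get(self, key, default=None):
        return self.stores[self.ring.node_for(key)].get(key, default)


if __name__ == "__main__":
    kv = ShardedKV(["node-a", "node-b", "node-c"])
    kv.put("session:42", "user=xgorse")
    print(kv.ring.node_for("session:42"), kv.get("session:42"))
```

The property that matters for scaling out: with a plain `hash(key) % n` scheme almost every key moves when a node is added, whereas here only the keys taken over by the new node move.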

• Strengths

• Simple data model

• High performance

• Great at scaling out horizontally

• Weaknesses

• Simplistic data model

• Poor for complex data

Key-value databases: Redis (http://redis.io/)

• Written in C - BSD License - 2009

• Very fast and lightweight

• All data in memory

• Persistence to disk

• Master/Slave replication

• Used for caching, session storage or work queues

Key-value databases

• Riak

• Memcache (RAM)

• Voldemort

• Amazon DynamoDB (SaaS)

• IronCache (SaaS)
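Key-value stores such as Redis or Memcache are often used as a cache. A minimal sketch of the cache-aside pattern with expiring keys; TTLCache is a hypothetical in-memory stand-in, not a client library (with Redis the write would typically be a `SET` with the `EX` option), and get_user / load_from_db are illustrative names.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a key-value cache with expiring keys."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expiry timestamp)
        self._clock = clock    # injectable so expiry can be tested

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiration on read
            return None
        return value


def get_user(cache, user_id, load_from_db, ttl=60):
    """Cache-aside: read the cache first, fall back to the database, then fill the cache."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = load_from_db(user_id)
        cache.set(key, user, ttl)
    return user
```

The application, not the store, decides what to cache and for how long; the TTL bounds how stale a cached entry can get.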

Document databases

• Inspired by IBM Lotus Notes/Domino

• Like Key/Value, but the value is a document

• A document is a collection of key-value pairs

• Flexible schema

• Non-relational: data is denormalized

Document databases

• Strengths

• Simple, powerful data model

• Good scaling, easy/automatic sharding

• Usually "ACID" compliant at the document level

• Weaknesses

• Unsuited for interconnected data

• Query model limited to keys (and indexes)  
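A minimal sketch of the document model, assuming a hypothetical in-memory DocumentStore class (not the MongoDB or CouchDB API): each document is a nested set of key-value pairs serialized as JSON, related data such as the author and comments is embedded rather than joined, and queries work only by key or through an explicitly created index, which mirrors the "query model limited to keys (and indexes)" weakness.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy document store: JSON documents looked up by _id or by an indexed field."""

    def __init__(self):
        self._docs = {}      # _id -> JSON text
        self._indexes = {}   # field -> {value -> set of _id}

    def insert(self, doc):
        doc_id = doc["_id"]
        old = self.get(doc_id)
        for field, index in self._indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(doc_id)  # drop stale entry on overwrite
            if field in doc:
                index[doc[field]].add(doc_id)
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id):
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def create_index(self, field):
        # Assumes the indexed field holds scalar (hashable) values
        index = defaultdict(set)
        for doc_id, raw in self._docs.items():
            doc = json.loads(raw)
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def find(self, field, value):
        # No full-scan fallback: only indexed fields can be queried
        if field not in self._indexes:
            raise KeyError(f"no index on {field!r}")
        return [self.get(i) for i in sorted(self._indexes[field].get(value, ()))]


if __name__ == "__main__":
    store = DocumentStore()
    store.insert({
        "_id": "post-1",
        "lang": "fr",
        "author": {"name": "Xavier", "twitter": "@xgorse"},       # embedded, not joined
        "comments": [{"author": "alice", "text": "Merci"}],
    })
    store.create_index("lang")
    print(store.find("lang", "fr"))
```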

Document databases: MongoDB (http://www.mongodb.org)

• Written in C++ - AGPL License - 2009

• JSON-style documents

• Full index support

• Fast in-place updates

• Auto-sharding

• Replication & high availability

• Many connectors (drivers)

• Large community

• Commercial support

Document databases

• Lotus Notes / Domino

• CouchDB: written in Erlang, JavaScript for queries

• OrientDB: written in Java, relationships stored as a graph

Graph databases

• Nodes with properties

• Named relationships with properties

• Focus on the data structure

• Each element holds a direct pointer to its adjacent elements, so no index lookups are necessary
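A minimal sketch of this "index-free adjacency", with hypothetical Node and within names: each node keeps direct references to its outgoing, named relationships (which carry their own properties), so a traversal such as "people Alice knows, up to 2 hops" only follows pointers instead of looking anything up in an index. In Neo4j the same question would typically be asked in Cypher; this sketch only illustrates the data structure.

```python
from collections import deque


class Node:
    """A node with properties and direct references to its outgoing relationships."""

    def __init__(self, **props):
        self.props = props
        self.out = []  # list of (relationship type, relationship props, target Node)

    def relate(self, rel_type, other, **props):
        self.out.append((rel_type, props, other))


def neighbours(node, rel_type):
    return [target for t, _, target in node.out if t == rel_type]


def within(start, rel_type, max_depth):
    """Breadth-first traversal up to max_depth hops, following pointers only."""
    seen = {id(start)}
    result = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbours(node, rel_type):
            if id(nxt) not in seen:
                seen.add(id(nxt))
                result.append(nxt)
                queue.append((nxt, depth + 1))
    return result


if __name__ == "__main__":
    alice, bob, carol = Node(name="alice"), Node(name="bob"), Node(name="carol")
    alice.relate("KNOWS", bob, since=2010)
    bob.relate("KNOWS", carol)
    print([n.props["name"] for n in within(alice, "KNOWS", 2)])  # ['bob', 'carol']
```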

Graph databases

• Strengths

• Powerful data model

• Fast for connected data

• A new data architecture

• Weaknesses

• No sharding: all data in one instance

• Querying on node/relationship properties kills performance

• A new data architecture

Graph databases: Neo4j (http://neo4j.org)

• Java - GPL/Commercial - 2007

• Query languages: Cypher / Gremlin

• REST interface

• Embedded mode

• High availability (Master/Slave)

• Commercial support

GraphDB - Products

• Titan

• OrientDB

• InfiniteGraph

• AllegroGraph

Column-oriented databases

• A big table, with column families

• Data stored by column instead of by row

• Built for distributed architectures

• MapReduce for querying/processing

• Flexible schema

• Easy sharding (partitioning)
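A minimal sketch of a column family, assuming a hypothetical ColumnFamily class (not the Cassandra or HBase API): rows are identified by a key and may each carry different columns, and values are stored grouped by column, so reading one column across all rows touches only that column's data.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: sparse rows, values stored grouped by column."""

    def __init__(self):
        self._columns = defaultdict(dict)  # column name -> {row key -> value}

    def insert(self, row_key, **columns):
        # Each row may set any subset of columns: no fixed schema
        for name, value in columns.items():
            self._columns[name][row_key] = value

    def get_row(self, row_key):
        # Rebuilding a row means visiting every column
        return {name: col[row_key] for name, col in self._columns.items() if row_key in col}

    def column(self, name):
        # Reading a single column for all rows is one lookup
        return dict(self._columns.get(name, {}))


if __name__ == "__main__":
    users = ColumnFamily()
    users.insert("u1", name="Ann", city="Lyon", visits=3)
    users.insert("u2", name="Bob", twitter="@bob", visits=5)
    print(users.get_row("u2"))
    print(sum(users.column("visits").values()))
```

The layout favors queries that scan a few columns over many rows and penalizes rebuilding whole rows, which is the trade-off the slide's "data stored by column instead of by row" refers to.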

Column-oriented databases

• Strengths

• Data model supports semi-structured data

• Naturally indexed (columns)

• Horizontally scalable: read/write throughput increases linearly as nodes are added

• Fault tolerant: no single point of failure

• Weaknesses

• Unsuited for interconnected data

Column-oriented databases: Cassandra (http://cassandra.apache.org/)

• Java - Apache License 2.0 - 2008

• Originally developed by Facebook

• Decentralized

• Supports replication, including across multiple data centers

• Scalability

• Fault-tolerant

• MapReduce support

Column-oriented databases

• HBase (Apache)

• HyperTable

• BigTable (Google)
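The column-oriented slides list MapReduce as the querying/processing model. A minimal single-process sketch of its three steps (map, shuffle, reduce); real deployments such as Hadoop run these phases in parallel across nodes, and the page-views-per-city job below is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce sequentially over an iterable of records."""
    # Map: each record emits zero or more (key, value) pairs
    pairs = chain.from_iterable(mapper(record) for record in records)
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine the values of each key into one result
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: count page views per city from rows read out of a column store
def views_mapper(row):
    yield row["city"], 1


def views_reducer(city, counts):
    return sum(counts)


if __name__ == "__main__":
    rows = [{"city": "Lyon"}, {"city": "Paris"}, {"city": "Lyon"}]
    print(map_reduce(rows, views_mapper, views_reducer))  # {'Lyon': 2, 'Paris': 1}
```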

Conclusion

• Impact on application architecture

• Store your data in the way you want to query it

• Denormalize your data and try to keep it up to date!
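A minimal sketch of "store your data in the way you want to query it", assuming a hypothetical Timelines class: each post is copied into every follower's timeline at write time (fan-out on write), so reading a timeline is a single lookup with no join, but renaming an author means updating every copy.

```python
from collections import defaultdict


class Timelines:
    """Hypothetical fan-out-on-write timelines built from denormalized post copies."""

    def __init__(self, followers):
        self.followers = followers              # author -> list of follower names
        self.timelines = defaultdict(list)      # reader -> list of post copies

    def publish(self, author, author_name, text):
        post = {"author": author, "author_name": author_name, "text": text}
        for reader in self.followers.get(author, []):
            self.timelines[reader].append(dict(post))  # one copy per follower: writes cost more

    def read(self, reader):
        return list(self.timelines.get(reader, []))    # one lookup, no join at read time

    def rename(self, author, new_name):
        # The price of denormalization: every copy must be kept up to date
        for posts in self.timelines.values():
            for post in posts:
                if post["author"] == author:
                    post["author_name"] = new_name
```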

Thank you