Big data put to the test by enterprise projects


DESCRIPTION

Slides from the breakfast session of 11 December 2013. In a difficult economic climate, "big data" tools provide the speed, flexibility and scalability required to carry out enterprise projects that take advantage of large volumes of information. These technologies are now a reality to be integrated into IS projects. Klee Group is organizing this themed event with speakers from the Big Data field: MongoDB, Elasticsearch, CMS Rubedo.


#2013

Big Data put to the test by enterprise projects

Brittany 2013

Not quite…

And trucks, there are plenty of them…

Ecotaxe

§ Inbound flow 24/7

• 2,000 points per second

• 200 packets per second

§ Outbound flow 24/7

• 3 × 200 packets per second

§ 3-month retention

• 1.5 billion packets

• 7 terabytes
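The retention figures above can be cross-checked with a little arithmetic. The sketch below assumes that "3 months" means 90 days, that only the inbound 200 packets per second are retained, and that terabytes are decimal; none of these assumptions is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 90                 # assumption: "3 months" taken as 90 days

INBOUND_PACKETS_PER_SECOND = 200    # from the slide
RETAINED_BYTES = 7 * 10**12         # from the slide: 7 terabytes (decimal TB assumed)


def retained_packets(rate_per_second: int, days: int) -> int:
    """Number of packets accumulated at a constant rate over the retention period."""
    return rate_per_second * SECONDS_PER_DAY * days


total_packets = retained_packets(INBOUND_PACKETS_PER_SECOND, RETENTION_DAYS)
avg_bytes_per_packet = RETAINED_BYTES / total_packets

print(f"{total_packets:,} packets retained")
print(f"about {avg_bytes_per_packet / 1000:.1f} kB stored per packet on average")
```

Under these assumptions the total comes out close to the 1.5 billion packets quoted, and the 7 terabytes correspond to a few kilobytes per packet, which would include any index and storage overhead.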

Big Data ?

Big Data

The 3 Vs rule

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

gartner.com

[Diagram: Big Data as data combining Volume, Velocity and Variety]

Always more…

[Chart: quantity of data over time]

Always more, and more still…

[Chart: "The Inverted U" (Peter Morville), decision quality versus quantity of information, from under-information to over-information]

Creating meaning

To create meaning, data must be transformed into information

Data → Information

Data: properties

Example: individual, event, equipment

Metadata

Example: tags, chronology, geolocation, relationships, notes, comments…

Information: data (properties) + metadata

Meta-information
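One way to picture the distinction between data, properties and metadata is a small record type. This is only an illustrative sketch: the class name, the field names and the sample values are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Information:
    """Raw data (its properties) together with the metadata that gives it meaning."""
    properties: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)

    def enrich(self, **extra: Any) -> "Information":
        """Return a new Information with extra metadata; the properties are unchanged."""
        return Information(self.properties, {**self.metadata, **extra})


# Hypothetical "event" record, created first and enriched later
passage = Information(properties={"kind": "event", "vehicle_id": "TRUCK-42"})
enriched = passage.enrich(
    tags=["toll"],
    geolocation=(48.0, -2.0),
    recorded_at="2013-12-11T09:00:00",
)
print(enriched.metadata["tags"])
```

The record is created with its properties only, and metadata accumulates afterwards, which mirrors the creation-then-enrichment idea of the slides.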

Creation cycle

[Diagram: timeline from Data to Information, through Creation then Enrichment]

Search / Represent

Dan Roam

#FacetedSearch

Store, Search, Analyze

Trajectory

Ecotaxe

Store, Search, Analyze


Ecotaxe #Volume #Velocity

#MongoDB

#Cluster

#Sharding

#Multi-sites

Architecture

§ Upstream phase

Overcome decision-makers' fear and team resistance

§ Specification / implementation phase

Adopt the document approach versus the relational approach

Train the development teams

Example: transactional logic

§ Production phase

Push back against traditional hosting / SAN

Favor the horizontal approach over the vertical one

MongoDB lessons learned (RETEX)

Paradigm shift

Vertical / Horizontal

Vertical "scalability": if more power is needed

• add memory…

• then replace the server with one from a more powerful range

Corollary: machines are oversized to absorb a potential increase in load

Vertical / Horizontal

Horizontal "scalability": if more power is needed

• add servers

Corollary: cost becomes linear with usage
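The two corollaries can be compared with a toy cost model. Every figure below (node capacity, prices, peak load) is hypothetical and chosen only to make the shapes visible; real pricing is rarely this tidy.

```python
import math

# Hypothetical figures, for illustration only
NODE_CAPACITY = 1_000     # requests/s that one commodity server can handle
NODE_COST = 1.0           # cost unit per commodity server
PEAK_LOAD = 8_000         # requests/s the single large server is sized for
BIG_SERVER_COST = 8.0     # cost of that large server, paid up front


def horizontal_cost(load: int) -> float:
    """Horizontal scaling: pay for just enough nodes to serve the current load."""
    return math.ceil(load / NODE_CAPACITY) * NODE_COST


def vertical_cost(load: int) -> float:
    """Vertical scaling: the oversized server is paid in full whatever the load."""
    if load > PEAK_LOAD:
        raise ValueError("load exceeds the capacity of the single server")
    return BIG_SERVER_COST


for load in (1_000, 2_000, 4_000, 8_000):
    print(load, horizontal_cost(load), vertical_cost(load))
```

In this model the horizontal cost grows step by step with the load, while the vertical cost stays at its peak-sized level from day one.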

§ Advantages

• Quality of the documentation

• Fast to implement

• Versatility

§ Benefits

• Functional agility

• Easy model evolution / native versioning

• Technical agility

• Hardware aligned with usage

MongoDB

Do not use MongoDB if your system is transactional; for everything else…

§ Drawback

• Sharding is not that simple!
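One general reason sharding can be harder than it looks is the choice of shard key. The sketch below is not MongoDB's implementation: it is a plain-Python illustration, with hypothetical split points and key values, of how range partitioning on an ever-increasing key sends all new writes to one shard, while hashing the key spreads them.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

SHARDS = 4


def range_shard(key: int, boundaries: list[int]) -> int:
    """Range partitioning: shard i holds the keys below boundaries[i]; the last shard holds the rest."""
    return bisect_right(boundaries, key)


def hashed_shard(key: int) -> int:
    """Hash partitioning: spread keys across shards regardless of their order."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


# Hypothetical workload: 10,000 inserts with an ever-increasing key (a timestamp, say)
boundaries = [1_000, 2_000, 3_000]   # split points derived from older data
new_keys = range(3_000, 13_000)      # every new key falls after the last split point

by_range = Counter(range_shard(k, boundaries) for k in new_keys)
by_hash = Counter(hashed_shard(k) for k in new_keys)

print("range:", dict(by_range))
print("hash: ", dict(sorted(by_hash.items())))
```

With the ranged key, a single shard absorbs the whole write load until the split points are revised; the hashed key balances writes but gives up efficient range queries on that key, which is the kind of trade-off that has to be decided up front.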

SPARK

Store, Search, Analyze

Elasticsearch lessons learned (RETEX)

CQRS Command Query Responsibility Segregation

Store Index

EventBus

Command Query
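The CQRS layout named on this slide (commands go to a store, an event bus propagates changes, queries are served by an index) can be sketched with in-memory stand-ins. The classes below are hypothetical simplifications written for this illustration; in a real deployment the store and the index would be separate systems, and the bus would usually be asynchronous.

```python
from collections import defaultdict
from typing import Callable

Event = dict[str, str]


class EventBus:
    """Minimal synchronous bus: every published event goes to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)


class Store:
    """Write side: handles commands and announces each change on the bus."""

    def __init__(self, bus: EventBus) -> None:
        self._documents: dict[str, str] = {}
        self._bus = bus

    def save(self, doc_id: str, text: str) -> None:
        self._documents[doc_id] = text
        self._bus.publish({"id": doc_id, "text": text})


class Index:
    """Read side: a word-to-documents index kept up to date from events."""

    def __init__(self) -> None:
        self._words_by_doc: dict[str, set[str]] = {}
        self._docs_by_word: defaultdict[str, set[str]] = defaultdict(set)

    def on_saved(self, event: Event) -> None:
        doc_id, words = event["id"], set(event["text"].lower().split())
        # Drop postings for words that the new version no longer contains
        for stale in self._words_by_doc.get(doc_id, set()) - words:
            self._docs_by_word[stale].discard(doc_id)
        for word in words:
            self._docs_by_word[word].add(doc_id)
        self._words_by_doc[doc_id] = words

    def search(self, word: str) -> list[str]:
        return sorted(self._docs_by_word.get(word.lower(), set()))


bus = EventBus()
index = Index()
bus.subscribe(index.on_saved)
store = Store(bus)

store.save("t1", "Truck passage recorded")   # command
store.save("t2", "Truck invoice sent")       # command
print(index.search("truck"))                 # query, answered by the index alone
```

The point of the separation is that the write model and the read model can be shaped and scaled independently; the price is that the index only reflects what the bus has delivered so far.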

Rubedo, the Big Data CMS

Store, Search, Analyze

First open-source CMS

built on a NoSQL foundation

Rubedo lessons learned (RETEX)

In a world where

LAMP is THE norm

NoSQL, but what for?

§ CMSs manage content…

… structured

and

classified

NoSQL and content management

Rubedo: comparison of approaches

Relational approach (MySQL type)

• For one content type: 6 tables

• For 10 content types: 29 tables

• 1 unit query: 6 tables and 2 joins

NoSQL document approach (MongoDB type)

• For one content type: 1 collection

• For 10 content types: 1 collection

• 1 unit query: 1 collection
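The "one collection for every content type" idea can be pictured with plain Python dictionaries standing in for documents. The content types, field names and the `find` helper below are hypothetical; this is not Rubedo's schema nor a database driver call.

```python
from typing import Any

Document = dict[str, Any]

# One collection holding several content types, each with its own fields
contents: list[Document] = [
    {"type": "article", "title": "Breakfast recap", "body": "…", "tags": ["event"]},
    {"type": "place", "title": "Meeting venue", "location": {"lat": 48.0, "lon": 2.0}},
    {"type": "article", "title": "Product news", "body": "…", "tags": ["news"]},
]


def find(collection: list[Document], **criteria: Any) -> list[Document]:
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


articles = find(contents, type="article")
print([doc["title"] for doc in articles])
```

A new content type is just a document with different fields, so nothing has to be added to the storage layout, whereas the relational approach on the slide adds tables and joins.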

§ Functional strengths

• Modeling flexibility

• Evolvability over time

• Search features

§ Technical strengths

• Read/write performance

• Storage of large volumes

• Linear scaling under load

• Built-in file management (MongoDB)

• Centralized security

Rubedo: the strengths of NoSQL

§ Limits & precautions

• No transactions

• Business rules move to the application layer

• A development framework is essential!

• Some types of projects may require a hybrid architecture (a complex e-commerce site, for example)

Performance & Volume

Mobility

Search & Geolocation

Openness & Extensibility

Flexibility

Ergonomics

§ High-traffic or high-volume portals

§ Multi-site platforms

§ Mobile sites

§ Geolocated content & mapping

§ Vertical search engines

§ Decentralized contribution platforms

Rubedo: use cases

Rubedo: demonstration

JavaScript, HTML5, CSS3

NoSQL

Break: 10 min

Thank you for your attention

Elasticsearch: Revolutionizing Data Search and Analytics

Richard Maurer, SEMEA Territory Manager

Agenda

•  Purpose of Elasticsearch

•  Features of Product

•  Customer Examples

•  Company Overview

•  Commercial Offerings

•  Resources

Purpose of Elasticsearch

•  Organize data and make it easily accessible

– Through powerful search and analytics

– Easily consumable (even for non-data scientists)

– Elegantly handles extremely large data volumes

– Delivers results in real time

•  Technology stack agnostic

•  Used across all market verticals

Features of Elasticsearch

•  Structured & unstructured search

•  Advanced analytics capabilities

•  Unmatched performance

•  Real-time results

•  Highly scalable

•  User friendly installation and maintenance

User: GitHub. Searches 20TB of data, 1.3 billion files and 130 billion lines of code using Elasticsearch

User: Foursquare. Searches 50,000,000 venues every day using Elasticsearch

User: Fog Creek Software. Searches 40,000,000,000 (40 billion) lines of code in real-time using Elasticsearch

User: StumbleUpon. Delivers millions of recommendations every day using Elasticsearch

Example: Email Archiving

•  2 petabytes of data archived across hundreds of servers

•  Big data, structured and unstructured

Example: Support Agents

•  Custom support: search, facets, and reports

•  Real-time metrics

Unprecedented Uptake

Elasticsearch has more than 5 million downloads … and 400,000 more each month

[Chart: cumulative downloads]

Company Overview

•  More than 5 million downloads

•  400,000 New Downloads per Month

•  1000s of Mission Critical Implementations

•  Top Investors: Benchmark Capital, Index Ventures

•  Seasoned Executive Team

– Founded by Creator of Elasticsearch

– Seasoned Executives from SpringSource

Users

User Raves

Chris Cowan @uhduh

I’m in love with @elasticsearch! I want to use it for everything right now!

Alain Richardt @alaincxs

Moving from #solr to #Elasticsearch is like upgrading from a Reliant Robin to a McLaren F1

Pete Connolly @peteconnolly

Two really useful and productive days of training from @kimchy and @uboness all about #elasticsearch. Best training course in years

Cyril Lacôte @clacote

#ElasticSearch is the s*&t. Amazingly simple and powerful. Open source is awesome. That's made my day.

Logan Lowell @fractaloop

Tweaking @elasticsearch for huge indexes can be fun. I'm very glad the IRC channel is so helpful too.

Product Offerings: Support Throughout Your Project

1.  Core Elasticsearch Training

2.  Development and Production Support

3.  Technical Account Manager

1: Training

Core Elasticsearch Training

•  Two day classroom training

•  Delivered by Elasticsearch developers

1.  Worldwide Public Courses

2.  Onsite Training Course

2: Support

3: Technical Account Manager

•  Named technical resource

•  Single point of contact into Elasticsearch

•  Onboarding call to assess your goals

•  Four health checks per year

•  Go-to expert to drive success with your Elasticsearch deployment

Resources

•  www.elasticsearch.com

•  www.elasticsearch.org

•  User Groups: http://www.elasticsearch.org/community/forum/

•  Contact:

Richard Maurer

Territory Manager

Richard.maurer@elasticsearch.com

Big Data put to the test by enterprise projects

Yann Aubry, Regional Director

The Big Data Unknown


Top Big Data Challenges?

Translation? Most struggle to know what Big Data is, how to manage it and who can manage it

Source: Gartner


Understanding Big Data – It’s Not Very “Big”

from Big Data Executive Summary – 50+ top executives from Government and F500 firms

64% - Ingest diverse, new data in real-time

15% - More than 100TB of data

20% - Less than 100TB (average of all? <20TB)

When To Use Hadoop, NoSQL

Enterprise Big Data Stack

[Diagram: layered stack]

Applications: CRM, ERP, Collaboration, Mobile, BI

Data Management: Online Data and Offline Data (components shown: RDBMS, EDW, Hadoop)

Infrastructure: OS & Virtualization, Compute, Storage, Network

Alongside all layers: Management & Monitoring, Security & Auditing

Consideration – Online vs. Offline

Online: real-time, low-latency, high availability

vs.

Offline: long-running, high-latency, availability is lower priority

MongoDB/NoSQL Is Good for!

•  360° View of the Customer

•  Mobile & Social Apps

•  Fraud Detection

•  User Data Management

•  Content Management & Delivery

•  Reference Data

•  Product Catalogs

•  Machine to Machine Apps

•  Data Hub

Hadoop Is Good for!

•  Risk Modeling

•  Churn Analysis

•  Recommendation Engine

•  Ad Targeting

•  Transaction Analysis

•  Trade Surveillance

•  Network Failure Prediction

•  Search Quality

•  Data Lake

How To Use The Two Together?

Case Study

Insurance leader generates coveted 360-degree view of customers in 90 days – “The Wall”

Problem

•  No single view of customer

•  145 yrs of policy data, 70+ systems, 15+ apps

•  2 years, $25M trying to aggregate in RDBMS – failed

Why MongoDB

•  Agility – prototype in 5 days; production in 90 days

•  Dynamic schema & rich querying – combine disparate data into one data store

•  Hot tech to attract top talent

Results

•  Increased call center productivity

•  Better customer experience, reduced churn, more upsell opps

•  Dozens more projects in the works to leverage this data platform

Machine Learning

Ad-Serving

•  Catalogs and products

•  User profiles

•  Clicks

•  Views

•  Transactions

•  User segmentation

•  Recommendation engine

•  Prediction engine

Algorithms

MongoDB

Connector for

Hadoop

MongoDB overview


MongoDB

The leading NoSQL database

Document Database

Open-Source

General Purpose


To provide the best database for how we build and run apps today

MongoDB Vision

Build

–  New and complex data

–  Flexible

–  New languages

–  Faster development

Run

–  Big Data scalability

–  Real-time

–  Commodity hardware

–  Cloud


•  10 of the Top Financial Services Institutions

•  10 of the Top Electronics Companies

•  10 of the Top Media and Entertainment Companies

•  8 of the Top Retailers

•  6 of the Top Telcos

•  5 of the Top Technology Companies

•  4 of the Top Healthcare Companies

Fortune 500 & Global 500


5,000,000+ MongoDB Downloads

100,000+ Online Education Registrants

20,000+ MongoDB User Group Members

20,000+ MongoDB Days Attendees

20,000+ MongoDB Management Service (MMS) Users

Global Community


MongoDB Features

• JSON Document Model with Dynamic Schemas

•  Auto-Sharding for Horizontal Scalability

•  Text Search

•  Aggregation Framework and MapReduce

• Full, Flexible Index Support and Rich Queries

•  Built-In Replication for High Availability

•  Advanced Security

•  Large Media Storage with GridFS


MongoDB Business Value

Enabling New Apps Better Customer Experience

Lower TCO Faster Time to Market


Data Hub User Data Management

Big Data Content Mgmt & Delivery Mobile & Social

MongoDB Solutions


MongoDB Partners (200+)

Software & Services

Cloud & Channel Hardware


MongoDB Products and Services

Training: online and in-person for developers and administrators

MongoDB Management Service (MMS): cloud-based suite of services for managing MongoDB deployments

Subscriptions: MongoDB Enterprise, MMS (On-Prem), Professional Support, Commercial License

Consulting: expert resources for all phases of MongoDB implementations

MongoDB Enterprise

Enterprise build with value-added capabilities

•  Advanced Security w/Kerberos

•  On-Prem Management

–  Visualization and alerts on 100+ system metrics

–  Backup features coming soon

–  On-premise version of MongoDB Monitoring Services (MMS)

•  Enterprise Software Integration via SNMP

•  Private, On-Demand MongoDB University Training

•  Certified OS Support

MongoDB Management Service

Cloud-based suite of services for managing MongoDB deployments

•  Monitoring, with charts, dashboards and alerts on 100+ metrics

•  Backup and restore, with point-in-time recovery, support for sharded clusters

•  MMS On-Prem included with MongoDB Enterprise (backup coming soon)

Consulting

Technical Account Manager

•  Named MongoDB expert

•  Advisory services

•  Ongoing basis

Custom Consulting

•  Assist with all phases of project

•  E.g., config., testing, optimization, best practices

Health Check

•  Assess overall status and health of existing MongoDB deployment

Lightning Consults also available

Training

Public

•  Dev, admin, and combined courses available

•  North America and EMEA

Private

•  Customized to your needs

•  For devs and admins

•  On-Site

Online

•  Free

•  For devs and admins

•  7 weeks

•  Weekly lectures, homework, final exam

Private, On-Demand MongoDB University Training included with MongoDB Enterprise Subscription

For More Information

Resource: Location

MongoDB Downloads: mongodb.com/download

Free Online Training: education.mongodb.com

Webinars and Events: mongodb.com/events

White Papers: mongodb.com/white-papers

Case Studies: mongodb.com/customers

Presentations: mongodb.com/presentations

Documentation: docs.mongodb.org

Additional Info: info@mongodb.com

@yannaubry