Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules

XLDB 2017 10/11/2017

•  Very large amounts of data
•  Very volatile data
•  Data as streams
•  Schemaless data
•  Data quality

Modern data management

•  XLDB since 2011
•  XLDB 2017
   ◦  2.5 days
   ◦  140 attendees
   ◦  14 talks
   ◦  17 lightning talks
   ◦  2 poster sessions
   ◦  2 demonstrations
•  Speakers
   ◦  LIRIS
   ◦  Databricks (Spark)
   ◦  CERN
   ◦  SAP Big Data
   ◦  Imperial College
   ◦  …

•  SAP Hub
   ◦  Platform for connecting to Hadoop, …

•  LIRIS
   ◦  Hive and HadoopDB evaluation with LSST data
   ◦  Performance
      ▪  Loading: Hive is better
      ▪  Query response time: HadoopDB is better at high data volumes
   ◦  Scalability (25 to 50 machines): both scale well
•  LIRIS proposal

•  Flink, like Spark
   ◦  Batch, streaming, …

•  Spark: new streaming features
   ◦  SQL can be run on streams (Bullet)
   ◦  Checkpoints improve fault tolerance
   ◦  Aggregation by window
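Aggregation by window can be illustrated without Spark. The following is a minimal sketch in plain Python, assuming events arrive as (timestamp in seconds, key) pairs and that windows are fixed and non-overlapping (tumbling). The function name and the sample events are illustrative and are not part of the Spark API.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp falls into exactly one window, identified by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample stream: (timestamp in seconds, key)
events = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
counts = tumbling_window_counts(events, window_seconds=10)

for (start, key), n in sorted(counts.items()):
    print(f"[{start}, {start + 10}) {key}: {n}")
```

A streaming engine keeps the same per-window state incrementally and persists it at checkpoints, so a restarted job can resume from the last saved state rather than replay the whole stream.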

•  LeanXcale
   ◦  New database vendor
   ◦  Scalable transaction management: scales to many millions of transactions per second
   ◦  OLTP and OLAP

•  MonetDB by
   ◦  Column storage
   ◦  Interesting product but no map and not very well documented

•  Ceph presentation (CERN)
   ◦  Open source product
   ◦  Ceph does not use replicated blocks
   ◦  Storage virtualisation

•  CloudMdsQL
   ◦  Provides integrated access to multiple heterogeneous cloud data stores such as NoSQL, HDFS and RDBMS
   ◦  Other polystores: Spark SQL, PolyBase, …
   ◦  Issue: executing joins across RDBMS, HDFS and NoSQL
   ◦  Not open source (LeanXcale)

[Figure: CloudMdsQL architecture. Labels: Query Processor; RDBMS Wrapper; HDFS Wrapper. Sub-queries shown: SELECT id, x FROM A and SCAN(…).MAP(…).REDUCE(…).FILTER(KEY IN (1,3)).PROJECT(…)]
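The polystore idea (a query processor that sends one sub-query to each store through a wrapper, then joins the results) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CloudMdsQL itself: the function names, the in-memory SQLite table standing in for the RDBMS, and the list of "key,word" text lines standing in for HDFS files are all hypothetical.

```python
import sqlite3


def rdbms_wrapper(conn):
    """Run the SQL sub-query against the relational store."""
    return conn.execute("SELECT id, x FROM A ORDER BY id").fetchall()


def hdfs_wrapper(lines, keys):
    """Imitate a SCAN -> MAP -> REDUCE -> FILTER -> PROJECT chain over text lines."""
    counts = {}
    for line in lines:                          # SCAN: read every record
        key_text, _word = line.split(",")       # MAP: parse "key,word"
        key = int(key_text)
        counts[key] = counts.get(key, 0) + 1    # REDUCE: count records per key
    # FILTER on the requested keys, then PROJECT (key, count) pairs
    return [(k, n) for k, n in sorted(counts.items()) if k in keys]


def query_processor(conn, lines):
    """Join the two sub-query results on the shared key (a simple hash join)."""
    right = dict(hdfs_wrapper(lines, keys={1, 3}))
    return [(id_, x, right[id_]) for id_, x in rdbms_wrapper(conn) if id_ in right]


# Hypothetical sample data for both stores
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, x TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
lines = ["1,foo", "1,bar", "2,baz", "3,qux"]

result = query_processor(conn, lines)
print(result)
```

Here the join runs in the query processor after both stores have answered. Deciding where the join executes, and how much data to move between stores, is the hard part that the slide lists as an issue.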

•  Knowledge Preservation in HEP (Notre Dame)
   ◦  Huge investment in producing data for science
   ◦  Data can be wasted or not reused
   ◦  Data preservation: “backing up your hard drive”
   ◦  Harder problem: software + “knowledge”
   ◦  Data And Software Preservation for Open Science
      ▪  CERN – DESY – CNRS
      ▪  Containers: portability = preservation!
      ▪  CERN Open Data Portal
      ▪  CERN Analysis Preservation
   ◦  How can new analysis tools be preserved?

•  European Bioinformatics Institute: genomics based on Elasticsearch
   ▪  8 data nodes (2 cores / 32 GB RAM / 200 GB disk)
   ▪  3.2 billion documents / 782 GB
   ▪  Complex query on >100 million genes: ~500 ms

•  Bullet
   ▪  A real-time query engine that lets you run queries on very large data streams
   ▪  Open source
   ▪  Components: Storm, Kafka

•  Kafka Kloner, like MirrorMaker
   ◦  A dynamic high-speed inter-cluster Kafka replicator
   ◦  Developed by Yahoo for Yahoo
   ◦  150 billion events per day with an average latency of around 2 s
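To put the Kafka Kloner figures in perspective, the daily volume can be converted to a per-second rate. The short calculation below is a sketch that assumes the load is spread evenly over the day (real traffic will have peaks), and applies Little's law (items in flight = arrival rate × time in system) to the quoted 2 s average latency.

```python
EVENTS_PER_DAY = 150_000_000_000   # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
AVERAGE_LATENCY_S = 2              # "around 2 sec." on the slide

# Average rate, assuming a uniform load over the day
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Little's law: average number of events in transit between clusters
events_in_flight = events_per_second * AVERAGE_LATENCY_S

print(f"{events_per_second:,.0f} events/s on average")
print(f"{events_in_flight:,.0f} events in flight on average")
```

Under these assumptions the replicator sustains roughly 1.7 million events per second.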

Lightning talks

•  CERN openlab project with Spark
   ◦  Physics data analytics and data reduction with Apache Spark
   ◦  CMS experiment
•  Oracle Database In-Memory
   ◦  Significant improvement for data warehouse appliances

Lightning talks

•  AstroSpark
   ◦  Spark for astronomical data: cone search, cross-match, …
   ◦  Data partitioning and indexing with HEALPix
   ◦  Query optimizer for astronomical queries
   ◦  Astronomical Data Query Language support
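A cone search returns every source within a given angular radius of a sky position. The sketch below shows the naive version in plain Python, assuming a catalogue of (name, RA, Dec) tuples in degrees; the catalogue and function names are illustrative and are not the AstroSpark API. It scans every source, which is the cost that HEALPix partitioning and indexing are meant to avoid.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the sources lying within radius_deg of (ra, dec), by full scan."""
    return [src for src in catalog
            if angular_separation_deg(ra, dec, src[1], src[2]) <= radius_deg]


# Hypothetical catalogue: (name, RA in degrees, Dec in degrees)
catalog = [
    ("s1", 10.00, 20.00),
    ("s2", 10.05, 20.00),
    ("s3", 12.00, 20.00),
    ("s4", 10.00, 20.08),
]

hits = cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.1)
print([name for name, _ra, _dec in hits])
```

A cross-match is the same test applied between two catalogues, so a spatial index that restricts the candidates to nearby sky cells benefits both operations.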

Lightning talks

•  Qserv
   ◦  Ran 2 queries before losing the SSH connection at CC
   ◦  Shared-nothing architecture
   ◦  Big challenge, with much still to do:
      ▪  Fault tolerance
      ▪  Data distribution
      ▪  Big queries
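In a shared-nothing architecture each worker owns a disjoint slice of the data, answers queries from that slice alone, and a coordinator merges the partial results. The following Python sketch illustrates the pattern only; it is not Qserv code, and the row layout, the modulo partitioning and the function names are assumptions made for the example.

```python
def partition(rows, n_workers):
    """Distribute rows over workers by object id (simple modulo placement)."""
    chunks = [[] for _ in range(n_workers)]
    for row in rows:
        chunks[row["id"] % n_workers].append(row)
    return chunks


def worker_query(chunk, min_mag):
    """Each worker scans only its own chunk and returns a partial aggregate."""
    selected = [row["mag"] for row in chunk if row["mag"] >= min_mag]
    return len(selected), sum(selected)


def scatter_gather(chunks, min_mag):
    """Coordinator: send the query to every worker, then merge partial results."""
    partials = [worker_query(chunk, min_mag) for chunk in chunks]  # parallel in practice
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return count, (total / count if count else None)


# Hypothetical object table: id and magnitude
rows = [{"id": i, "mag": m} for i, m in enumerate([18, 21, 22, 19, 25, 20])]
chunks = partition(rows, n_workers=3)

answer = scatter_gather(chunks, min_mag=20)
print(answer)  # (number of matching objects, their mean magnitude)
```

The three open points on the slide map onto this sketch: a lost worker loses its chunk (fault tolerance), the placement rule decides load balance (data distribution), and a query touching every chunk keeps all workers busy (big queries).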

•  Wikidata

Demonstration
