• Infrastructure
• Matthias Lermer
• Hochschule Furtwangen
• matthias.lermer@hs-furtwangen.de
HALFbACk Project meeting – November 16th
Infrastructure – Overview
The infrastructure provides the means to:
• Store sensor data
  • Interfaces and storage for a scalable database: Apache Cassandra
  • Interface for streaming data: Apache Kafka
• Store machine fingerprint data
  • OPC-UA interface
  • Will be linked to the corresponding sensor data in Cassandra
• Preprocess data in the Halfback Cloud (OpenStack environment)
  • Interfaces and VMs for real-time processing: Apache Storm
  • Interfaces and VMs for batch processing: Apache Spark
Infrastructure – Overview
The infrastructure provides the means to:
• Analyze the data with the help of machine learning in the Halfback Cloud
  • Interface and VMs: Apache Spark
  • Further solutions, e.g., TensorFlow, are possible if needed
  • You can also use your own solutions, as access to the data in the database will be provided
• Provide access only to authenticated and authorized persons
  • Virtual Private Network (VPN)
  • Customized access can be provided to interested companies
• Ensure high availability and fault tolerance with Ceph
Infrastructure – Overview
Hardware:
• OpenStack environment: Halfback Cloud
  • 10 computing nodes with Intel Xeon quad-core CPUs
  • About 4 TB of usable storage right now; everything is replicated in case of a fault
  • Still in the process of upgrading and distributing storage more evenly (bottleneck right now because of Ceph)
Infrastructure – Overview
• Storage in HDFS is also possible
  • Depends on the use case (data) and has to be evaluated (e.g., machine profiles are probably better stored as XML)
• Kafka and Storm (or Spark Streaming) in case near-real-time or continuous data processing is needed
Infrastructure – Overview
Procedure for access:
• You will get a VPN certificate and connect to the Halfback Cloud
• Now you can:
  • Access the database with sensor data / machine fingerprints directly and copy the data to your own PC for analysis
  • Log in to a VM with pre-installed components, e.g., Spark, and run your scripts (preprocessing / machine learning) distributed directly in our cloud
  • Install new or needed components on your VM (Ubuntu 16.04 OS)
Infrastructure – Technologies
OpenStack (open source)
• OpenStack provides the means to create self-hosted clouds
  • Control storage, computation and networking
  • Create virtual machines with predefined installed components
  • Robust, fault-tolerant environment
• 2 controller nodes, 10 computing nodes
• Storage replication with Ceph
Infrastructure – Technologies
Ceph (open source): “The Future of Storage”
• Storage clusters with practically unlimited scalability
• Object storage (file and block layers are provided on top); think of valet parking
• Fault tolerance: everything is replicated and distributed across nodes
• Reliable Autonomic Distributed Object Store (RADOS)
• Controlled Replication Under Scalable Hashing (CRUSH)
• Supports snapshots, cloning, load balancing
• Hardware independent
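The CRUSH bullet above rests on one idea: where an object lives is computed from a hash, not looked up in a central table. The following is a minimal sketch of that idea using rendezvous hashing as a simplified stand-in. It is not the CRUSH algorithm (no weights, no failure domains), and the node names and replica count are hypothetical.

```python
import hashlib


def place_replicas(object_name, nodes, replicas=3):
    """Pick `replicas` distinct nodes for an object by rendezvous hashing.

    Every client that knows the node list computes the same answer,
    so no central lookup table is needed. Simplified stand-in for the
    idea behind CRUSH, not the CRUSH algorithm itself.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")

    def score(node):
        # Hash the (object, node) pair; the nodes with the highest scores win.
        digest = hashlib.sha256(f"{object_name}:{node}".encode("utf-8")).hexdigest()
        return int(digest, 16)

    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical node names, loosely modelled on a 10-node cluster.
NODES = [f"node{i:02d}" for i in range(1, 11)]

if __name__ == "__main__":
    print(place_replicas("sensor-batch-0001", NODES))
```

In this sketch, removing a node that does not hold a given object leaves that object's placement unchanged, which is the kind of property that keeps data movement small when a cluster changes.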
Infrastructure – Technologies
Spark (open source): “Lightning-fast cluster computing”
• Fast processing in big data environments (e.g., Cassandra)
• Managing and processing streams or events (later: broker)
• Distributed machine learning (e.g., grid search, hyperparameter tuning)
• Java, Scala, Python with the integrated MLlib library
• Use existing TensorFlow code (small code change needed)
• Also support for Caffe, scikit-learn, Keras, etc.
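Grid search distributes well because every parameter combination can be evaluated independently. The sketch below shows this with a standard-library thread pool standing in for cluster workers. It does not use the Spark API, and the objective function and parameter names are hypothetical placeholders for "train a model and return its validation error".

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_error(params):
    """Toy stand-in for training a model and measuring its validation error."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


def grid_search(grid, evaluate, workers=4):
    """Evaluate every combination in `grid`; return (best_params, best_error)."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    # The evaluations are independent of each other, so they can run in parallel.
    # A cluster framework would send them to worker nodes instead of threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(evaluate, candidates))
    best_index = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best_index], errors[best_index]


# Hypothetical search space.
GRID = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}

if __name__ == "__main__":
    best_params, best_error = grid_search(GRID, validation_error)
    print(best_params, best_error)
```

The cost grows with the product of the value lists (here 3 × 3 = 9 evaluations), which is why spreading the work across nodes pays off for larger grids.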
Infrastructure – Technologies
• Only adaptable (open-source) components are used
  • The technology stack can quickly be modified and adapted to needs
  • Automation and flexibility play a big role
• Example 1: Machine → OPC-UA → Kafka → Spark Streaming enables continuous machine learning modeling
• Example 2: Machine → OPC-UA → Cassandra → Spark enables batch machine learning modeling
• Components like Kafka (publish/subscribe) will provide the basis for automating broker mechanisms
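The two example pipelines and the publish/subscribe idea can be sketched together in a few lines. This is an in-memory illustration only. The `Broker` class is hypothetical and not Kafka's API, the list `store` stands in for the database, and a running mean stands in for a model. One consumer updates its estimate with every message (as in Example 1), while the other stores the readings so they can be processed as a batch later (as in Example 2).

```python
from collections import defaultdict


class Broker:
    """Tiny in-memory publish/subscribe broker (illustration only, not Kafka)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not need to know who consumes the message.
        for callback in self._subscribers[topic]:
            callback(message)


class RunningMean:
    """Streaming consumer: refines its estimate with each incoming value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count


def batch_mean(stored_values):
    """Batch consumer: computes over everything stored so far."""
    return sum(stored_values) / len(stored_values)


def run_demo(readings):
    broker = Broker()
    stream_model = RunningMean()
    store = []  # stands in for the database

    # Two independent subscribers on the same topic.
    broker.subscribe("sensor", stream_model.update)
    broker.subscribe("sensor", store.append)

    for value in readings:
        broker.publish("sensor", value)

    return stream_model.mean, batch_mean(store)


if __name__ == "__main__":
    print(run_demo([1.0, 2.0, 3.0, 6.0]))
```

Because the publisher only knows the topic, a further consumer can be attached without touching the machine side, which is the property the slide relies on for automating broker mechanisms.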