Embed Your Artificial Intelligence (AI) on CPU, GPU and FPGA
Pierre Nowodzienski, Application Engineer
[email protected]
© 2018 The MathWorks, Inc.



Page 2: From Data to Business value

Generate raw data
End devices
Extract information
Data analysis
Get valuable knowledge
Make decisions
Artificial Intelligence

Page 3: Artificial Intelligence opportunities in « Internet of Everything » world

CLOUD
  • Amount of data
  • Transport cost
  • High latency
  • Availability
  • Energy cost

Page 4: Artificial Intelligence opportunities in « Internet of Everything » world

Do the right thing at the right place

CLOUD

          Mission               Local control center       Global control center
          Real-time analytics   Operational Intelligence   Business intelligence
SWaP-C    High                  Medium                     Low
Latency   Very Low              Low - Medium               High

Today's webinar focus: How can we design and deploy Neural Networks on embedded targets?

Page 5: Embedded targets & mitigations

Axes: Efficiency (performance/watt), Low to High; Development productivity, High to Low

• C/C++ programming language
• Sequential processing
• Code generation

• CUDA/OpenCL programming language
• Partly parallel processing
• Code generation

• VHDL/Verilog programming language
• Partly parallel processing
• Code generation

Page 6: MathWorks workflows: Neural Network to embedded targets

Artificial Neural Network Design & Training
  • Application design
  • Dataset
  • Train the Network

Trained Convolutional or DAG Network
Trained Shallow Neural Network
GPU Coder
Embedded Coder
HDL Coder
ASIC
ANSI/ISO compliant
Application logic

First part: Deploying Deep Neural Network
Second part: Deploying Shallow Neural Network

Page 7: Deep Learning is a Subset of Machine Learning

Machine Learning
Deep Learning

Page 8: Algorithm Design to Embedded Deployment Workflow

MATLAB algorithm (functional reference)
Application logic

1. Functional test
2. Deployment unit-test. Target: Desktop GPU (C++). Build type: .mex. Call CUDA from MATLAB directly.
3. Deployment integration-test. Target: Desktop GPU (C++). Build type: .lib. Call CUDA from (C++) hand-coded main().
4. Real-time test. Target: Embedded GPU. Build type: cross-compiled .lib. Call CUDA from (C++) hand-coded main().
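Each deployment stage above is checked against the same functional reference. The slides do not say how outputs are compared, so the following is only a minimal sketch, assuming the check is "same top class and scores within a tolerance". The function names, the tolerance, and the score values are all hypothetical.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two score vectors."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def matches_reference(reference, candidate, tol=1e-4):
    """True when the candidate build agrees with the functional reference:
    same top-scoring class and every score within `tol` (assumed criterion)."""
    same_top = reference.index(max(reference)) == candidate.index(max(candidate))
    return same_top and max_abs_error(reference, candidate) <= tol


# Hypothetical class scores: functional reference vs. a generated-code build.
reference_scores = [0.10, 0.70, 0.20]
generated_scores = [0.10002, 0.69997, 0.20001]
print(matches_reference(reference_scores, generated_scores))
```

The same check can be reused unchanged at the unit-test, integration-test and real-time stages, which is what makes a single functional reference useful.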

Page 9: Demo: Alexnet Deployment with ‘mex’ Code Generation

Page 10: Algorithm Design to Embedded Deployment on Tegra GPU

MATLAB algorithm (functional reference)
Application logic

1. Functional test (Test in MATLAB on host)
2. Deployment unit-test (Test generated code in MATLAB on host + GPU). Target: Tesla GPU (C++). Build type: .mex. Call CUDA from MATLAB directly.
3. Deployment integration-test (Test generated code within C/C++ app on host + GPU). Target: Tesla GPU (C++). Build type: .lib. Call CUDA from (C++) hand-coded main().
4. Real-time test (Test generated code within C/C++ app on Tegra target). Target: Tegra GPU. Build type: cross-compiled .lib. Call CUDA from (C++) hand-coded main(). Cross-compiled on host with Linaro toolchain.

Page 11: Alexnet Deployment to Tegra: Cross-Compiled with ‘lib’

Two small changes

1. Change build-type to ‘lib’
2. Select cross-compile toolchain

Page 12: Deploying to CPUs

Desktop CPU
Raspberry Pi board
GPU Coder
NVIDIA TensorRT & cuDNN Libraries
Application logic

Page 13: GPU Coder for Deployment

• Deep Neural Networks: Deep Learning, machine learning
  5x faster than TensorFlow, 2x faster than MXNet
• Image Processing and Computer Vision: Image filtering, feature detection/extraction
  60x faster than CPUs for stereo disparity
• Signal Processing and Communications: FFT, filtering, cross correlation
  20x faster than CPUs for FFTs

ARM Compute Library
Intel MKL-DNN Library
GPU Coder

Page 14: MathWorks workflows: Neural Network to embedded targets

Artificial Neural Network Design & Training
  • Application design
  • Dataset
  • Train the Network

Trained Convolutional or DAG Network
Trained Shallow Neural Network
GPU Coder
Embedded Coder
HDL Coder
ASIC
ANSI/ISO compliant
Application logic

Second part: Deploying Shallow Neural Network

Page 15: Demo: Shallow network deployment on Zynq platform

Neural network as gas emission estimator (sensorless)

Inputs: Speed command, Fuel Rate
Engine: Engine torque, Gas emission
Shallow Neural Network: Estimated torque, Estimated gas emission

Page 16: Demo workflow

1. Create the Network structure
2. Train the Network
3. Test the Network (iterate)
4. Export to Simulink
5. Fine-tune & optimize for the target
6. Generate code

Page 17: Demo summary

1. Create the Network structure
2. Train the Network
3. Test the Network (iterate)
4. Export to Simulink
5. Fine-tune & optimize for the target
6. Generate code

Products used: Neural Network Toolbox, Parallel Computing Toolbox, Fixed-Point Designer, HDL Coder, Embedded Coder
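The "Fine-tune & optimize for the target" step typically involves moving the trained network from floating point to fixed point before generating HDL or C code. The demo's actual network, word lengths and tooling settings are not given in the slides, so this is a rough sketch under stated assumptions: a two-input, three-neuron tanh network with made-up weights, and a simple round-and-saturate quantizer standing in for what Fixed-Point Designer would do.

```python
import math


def quantize(value, frac_bits=8, word_bits=16):
    """Round to a signed fixed-point grid (word_bits total, frac_bits fractional)
    and saturate at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, round(value * scale)))
    return q / scale


def forward(x, w1, b1, w2, b2, q=lambda v: v):
    """Shallow network: one tanh hidden layer and a linear output layer.
    `q` is applied to weights, biases and intermediate results."""
    hidden = [q(math.tanh(q(sum(q(w) * xi for w, xi in zip(row, x)) + q(b))))
              for row, b in zip(w1, b1)]
    return [q(sum(q(w) * h for w, h in zip(row, hidden)) + q(b))
            for row, b in zip(w2, b2)]


# Hypothetical weights (not from the demo): 2 inputs, 3 hidden neurons, 2 outputs.
W1 = [[0.8, -0.4], [0.3, 0.9], [-0.6, 0.5]]
B1 = [0.1, -0.2, 0.05]
W2 = [[0.7, -0.3, 0.5], [-0.2, 0.6, 0.4]]
B2 = [0.0, 0.1]
X = [0.5, -0.25]  # e.g. normalized speed command and fuel rate


def max_error(frac_bits):
    """Worst output deviation of the fixed-point network from the float one."""
    ref = forward(X, W1, B1, W2, B2)
    fx = forward(X, W1, B1, W2, B2, q=lambda v: quantize(v, frac_bits))
    return max(abs(a - b) for a, b in zip(ref, fx))


for bits in (4, 8, 12):
    print(bits, "fractional bits -> max output error", max_error(bits))
```

Sweeping the fraction length like this is one way to explore the accuracy versus resource trade-off before committing to a word length on the target.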

Page 18: HDL Optimization options

Area Optimizations
▪ HDL Coder with Simulink
– Streaming
– Sharing
– Line buffers as RAMs
– RAM Fusion
– Architecture Flattening
– Efficient resource mapping
▪ HDL Coder with MATLAB
– RAM Mapping
– Loop Streaming
– Resource Sharing
– CSD/FCSD

Speed Optimizations
▪ HDL Coder with Simulink
– Input/Output pipelining
– Distributed Pipelining
– Hierarchical Dist. Pipelining
– Constrained Pipelining
– Clock-Rate Pipelining
– Back-Annotation
– Adaptive Pipelining
▪ HDL Coder with MATLAB
– Input/Output pipelining
– Distributed pipelining
– Loop Unrolling

Workflow and Verification
▪ HDL Workflow Advisor
▪ Automatic Delay Balancing
▪ Validation model generation
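Of the area optimizations listed, CSD (canonical signed digit) is easy to illustrate in isolation: a constant multiplier is rewritten with digits in {-1, 0, 1} so that it needs fewer shift-and-add terms. The slides do not describe how HDL Coder implements CSD or FCSD, so the sketch below is only the textbook recoding, with hypothetical function names.

```python
def to_csd(n):
    """Canonical signed-digit form of a non-negative integer.
    Returns digits in {-1, 0, 1}, least significant first, with no two
    adjacent non-zero digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using only shifts,
    additions and subtractions, as a hardware constant multiplier would."""
    return sum(d * (x << i) for i, d in enumerate(digits))


def nonzero_terms(digits):
    """Number of add/subtract terms the encoding requires."""
    return sum(1 for d in digits if d != 0)


# 7 is 111 in binary (three terms) but 8 - 1 in CSD (two terms).
print(to_csd(7), csd_multiply(5, to_csd(7)))
```

Fewer non-zero digits means fewer adders per constant multiplication, which matters when a network's fixed weights are mapped directly to FPGA logic.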

Page 19: Key takeaways

▪ Comprehensive & integrated development environment from dataset to target
▪ Fast design space exploration and trade-off
▪ Target-independent functional reference for target-optimized implementation model
▪ Deploy a « Smart application », not only a Neural network

Page 20: MathWorks workflows: Neural Network to embedded targets

Artificial Neural Network Design & Training
  • Application design
  • Dataset
  • Train the Network

Trained Convolutional or DAG Network
Trained Shallow Neural Network
GPU Coder
Embedded Coder
HDL Coder
ASIC
ANSI/ISO compliant
Application logic

Page 21: Next steps

▪ Web site technical resources
– Lookup Table Optimization
– Data Type Optimization (documentation)
– Efficient Implementation on FPGAs (documentation)
– Deep Learning Inference for Object Detection on Raspberry Pi
– Pedestrian Detection on an NVIDIA GPU with TensorRT
▪ Contact us
– [email protected]
– +33-1-41-14-88-45

Special thanks to Vaidehi Venkatesan (Fixed-Point Designer development team) for her great work in creating this demo material!