Deep Learning for Genomics: Present and Future

Truyen Tran, Deakin University

Hanoi, June 2019 (8/06/2019)

@truyenoz

truyentran.github.io

[email protected]

letdataspeak.blogspot.com

goo.gl/3jJ1O0



Page 2

Diagram labels: Deep Learning; Biomedicine; Knowledge driven

Page 3: Agenda

Deep learning

Bio structures

Vectors & sets

Sequences

Graphs

The futurist

Page 4: DL in a nutshell

Feature detector

Integrate-and-fire neuron

Block representation

“Deep” as layered blocks

Relations, messaging & attention

Image sources: andreykurenkov.com; http://karpathy.github.io/assets/rnn/diags.jpeg

Page 5: DL can learn from data, and fake it

Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.

Page 6: DL can generate sequences nicely

GPT-2, https://openai.com/blog/better-language-models/#fn2

Page 7: What can DL do to genomics?

Deep learning offerings:

Function approximation

Program approximation

Program synthesis

Deep density estimation

Disentangling factors of variation

Capturing data structures

Generating realistic data (sequences)

Question-answering

Information extraction

Knowledge graph construction and completion

Genomic problems:

GWAS, gene-disease mapping

Binding site identification

Function prediction

Drug-target binding

Drug design

Structure prediction

Sequence generation

Functional genomics

Optimizing sequences

Organizing the (knowledge about) omics universe

Diagram labels linking the two lists: Inspire; Solve

Page 8: Agenda

Page 9: “Diet networks” for GWAS

Use a “hypernet” to generate the main net.

Features are embedded (not data instances).

Unsupervised autoencoder as regularizer.

Works well on country prediction on the 1000 Genomes Project dataset, but this is a relatively easy problem: PCA, or even a random subspace, can do quite well!

Images taken from the paper

#REF: Romero, Adriana, et al. "Diet Networks: Thin Parameters for Fat Genomics." ICLR (2017).
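A minimal sketch of the hypernet idea on this slide, in plain Python. Assumptions, none taken from the paper's code: each feature is embedded by its per-class mean, the hypernet is a single linear map `A`, and all sizes and function names are illustrative. It shows that the first-layer weights `W` are generated from feature embeddings rather than stored, so the trainable count depends on the embedding size, not on the number of features.

```python
import math
import random


def per_class_mean_embedding(X, y, n_classes):
    """One embedding per feature (column of X): its mean value within each class."""
    n_features = len(X[0])
    sums = [[0.0] * n_classes for _ in range(n_features)]
    counts = [0] * n_classes
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[j][label] += value
    return [[s / max(c, 1) for s, c in zip(feature_sums, counts)]
            for feature_sums in sums]


def hypernet(embeddings, A):
    """Generate first-layer weights W (n_features x hidden): row j is embeddings[j] @ A."""
    return [[sum(e * a for e, a in zip(emb, col)) for col in zip(*A)]
            for emb in embeddings]


def main_net_hidden(x, W):
    """Hidden layer of the main net: tanh(x @ W)."""
    return [math.tanh(sum(xj * wj for xj, wj in zip(x, col))) for col in zip(*W)]


random.seed(0)
n_samples, n_features, n_classes, hidden = 6, 50, 3, 4

# Toy genotype matrix (values 0/1/2) and toy class labels; both are made up.
X = [[random.choice([0.0, 1.0, 2.0]) for _ in range(n_features)]
     for _ in range(n_samples)]
y = [i % n_classes for i in range(n_samples)]

E = per_class_mean_embedding(X, y, n_classes)   # n_features x n_classes
A = [[random.gauss(0, 0.1) for _ in range(hidden)]
     for _ in range(n_classes)]                 # the only trainable weights here
W = hypernet(E, A)                              # generated, not stored as parameters
h = main_net_hidden(X[0], W)

trainable = n_classes * hidden   # parameters held by the hypernet
direct = n_features * hidden     # parameters of an ordinary first layer
```

With 50 features, 3 classes and 4 hidden units, the hypernet holds 12 trainable values in place of 200; the gap grows with the number of features.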

Page 10: Gene expression: DeepTRIAGE

http://distill.pub/2016/augmented-rnns/

Attention mechanism
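The attention mechanism named above can be sketched as a softmax-weighted sum. This is a generic dot-product attention pool, not DeepTRIAGE's exact architecture; the gene names, vectors and query are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(items, query):
    """Score each item by its dot product with the query, then take the weighted sum."""
    scores = [sum(a * b for a, b in zip(item, query)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(dim)]
    return pooled, weights


# Hypothetical per-gene vectors and a hypothetical (normally learned) query.
gene_vectors = {"gene_a": [1.0, 0.0], "gene_b": [0.0, 1.0], "gene_c": [1.0, 1.0]}
query = [2.0, 0.0]

pooled, weights = attention_pool(list(gene_vectors.values()), query)
attention_by_gene = dict(zip(gene_vectors, weights))
```

The weights sum to one, so they can be read as how much each gene contributes to the pooled representation.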

Page 11: Agenda

Page 12: Identifying binding sites

DeepBind (Alipanahi et al., Nature Biotech 2015)

http://www.nature.com/nbt/journal/v33/n8/full/nbt.3300.html
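A rough sketch of the sequence-scanning step that convolutional models such as DeepBind build on: one-hot encode the DNA, slide a filter along it, rectify, and max-pool. The filter here is hand-set to the hypothetical motif TGCA for illustration; in a trained model the filters are learned from data.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv_scan(encoded, motif_filter):
    """Slide the filter along the sequence; one ReLU activation per start position."""
    width = len(motif_filter)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = sum(encoded[start + i][c] * motif_filter[i][c]
                    for i in range(width) for c in range(len(ALPHABET)))
        activations.append(max(0.0, score))
    return activations


# Hypothetical hand-set filter that rewards an exact match to "TGCA".
motif_filter = one_hot("TGCA")

acts = conv_scan(one_hot("AATGCATT"), motif_filter)
best_position = max(range(len(acts)), key=acts.__getitem__)  # where the motif fits best
binding_score = max(acts)                                    # max-pooling over positions
```

Max-pooling keeps the strongest match regardless of where it occurs, which is why the score does not depend on the motif's position in the sequence.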

Page 13: Multiple modalities

#REF: Eser, Umut, and L. Stirling Churchman. "FIDDLE: An integrative deep learning framework for functional genomic data inference." bioRxiv (2016): 081380.

Page 14: Chromatins

Source: https://simons.berkeley.edu/sites/default/files/docs/4575/2016-kundaje-simonsinstitute-deeplearning.pdf

https://qph.ec.quoracdn.net

Page 15: Agenda

Page 16: From vector to graph with PAN: Personalized Annotation Networks

Nguyen, Thin, Samuel C. Lee, Thomas P. Quinn, Buu Truong, Xiaomei Li, Truyen Tran, Svetha Venkatesh, and Thuc Duy Le. "Personalized Annotation-based Networks (PAN) for the Prediction of Breast Cancer Relapse." bioRxiv (2019): 534628.

Page 17: Predicting molecular bioactivities as querying a graph

Diagram labels: Controller; Memory; Drug molecule; Task/query; Bioactivities

#REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.

#Ref: Pham, Trang, Truyen Tran, and Svetha Venkatesh. "Graph Memory Networks for Molecular Activity Prediction." ICPR’18.

Page 18: Multi-target binding for drug repurposing as graph multi-labeling

#REF: Do, Kien, et al. "Attentional Multilabel Learning over Graphs: A message passing approach." Machine Learning, 2019.
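Message passing over a molecular graph, in its simplest form. This sketch uses plain mean aggregation with a fixed 0.5 mixing weight, which is an assumption for illustration and not the attentional scheme of the cited paper; atom names and features are hypothetical.

```python
def message_passing_step(node_states, edges):
    """One synchronous round: each node mixes its state with the mean of its neighbours."""
    neighbours = {node: [] for node in node_states}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    new_states = {}
    for node, state in node_states.items():
        nbrs = neighbours[node]
        if nbrs:
            message = [sum(node_states[n][d] for n in nbrs) / len(nbrs)
                       for d in range(len(state))]
        else:
            message = [0.0] * len(state)
        # Fixed 50/50 mix of own state and incoming message (learned in a real model).
        new_states[node] = [0.5 * s + 0.5 * m for s, m in zip(state, message)]
    return new_states


# Toy graph: heavy atoms of ethanol, C-C-O, with features [is_carbon, is_oxygen].
states = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
edges = [("C1", "C2"), ("C2", "O")]

after_one = message_passing_step(states, edges)
after_two = message_passing_step(after_one, edges)
```

After one round `C1` still knows nothing about the oxygen two bonds away; after two rounds that information has reached it, which is why several rounds are stacked in practice.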

Page 19

#REF: Do, Kien, et al. "Attentional Multilabel Learning over Graphs: A message passing approach." arXiv preprint arXiv:1804.00293 (2018).

Page 20: Drug-target binding as graph reasoning

Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or cue).

Can be formulated as Question-Answering or Graph-Graph interaction:

Knowledge base: Binding targets (e.g., RNA/protein sequences, or 3D structures), as a graph

Query: Drug (e.g., SMILES string, or molecular graph)

Answer: Affinity, binding sites, modulating effects

Page 21: An analogy from Video Question Answering

Video as a sequence of frames, but also a complex 3D graph of objects, actions and scenes ↔ Protein, RNA

Question as a sequence of words, but also a complex dependency graph of concepts ↔ Protein, drug

Answer as facts (what and where) and deduced knowledge ↔ Affinity, binding sites, modulation effect

#Ref: Minh-Thao Le, Vuong Le, Truyen Tran, Learning to Reason with Relational Video Representation for Question Answering, In Submission 2019.

Page 22: Drug-drug, drug-target & protein-protein as graph-graph interaction

Diagram labels: Graph; Query; Controller; Write; Memory; Read heads; Output

Diagram symbols: $\boldsymbol{M}_1, \dots, \boldsymbol{M}_C$; $\boldsymbol{r}_t^1, \dots, \boldsymbol{r}_t^K$; $\boldsymbol{r}_t^*$; $\boldsymbol{h}_t$

Pham, Trang, Truyen Tran, and Svetha Venkatesh. "Relational dynamic memory networks." arXiv preprint arXiv:1808.04247 (2018).

Page 23: Inferring (bio) relations as knowledge graph completion

https://www.zdnet.com/article/salesforce-research-knowledge-graphs-and-machine-learning-to-power-einstein/

Do, Kien, Truyen Tran, and Svetha Venkatesh. "Knowledge graph embedding with multiple relation projections." 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018.
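One common way to score candidate links for knowledge graph completion is a translation-based embedding score in the style of TransE, shown here as a stand-in; it is not the multiple-relation-projection model of the cited paper. Entity names, relation names and the 2-D embeddings are hypothetical and hand-set, not learned.

```python
import math


def transe_score(head, relation, tail):
    """Higher is more plausible: negative distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical hand-set embeddings; a real model learns these from known triples.
entities = {
    "drug_x": [0.0, 0.0],
    "protein_a": [1.0, 0.0],
    "protein_b": [0.0, 1.0],
    "disease_y": [1.0, 1.0],
}
relations = {"binds": [1.0, 0.0], "treats": [1.0, 1.0]}


def rank_tails(head, relation, candidates):
    """Rank candidate tail entities for the incomplete triple (head, relation, ?)."""
    scored = [(transe_score(entities[head], relations[relation], entities[c]), c)
              for c in candidates]
    return [name for _, name in sorted(scored, reverse=True)]


candidates = ["protein_a", "protein_b", "disease_y"]
binds_ranking = rank_tails("drug_x", "binds", candidates)
treats_ranking = rank_tails("drug_x", "treats", candidates)
```

Completion then amounts to proposing the top-ranked tails as missing edges, to be checked against held-out or experimental evidence.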

Page 24: Drug design as structured machine translation, aka conditional generation

Can be formulated as structured machine translation:

Inverse mapping of (knowledge base + binding properties) to (query)

One-to-many relationship

Sequences: representing the graph as a string (e.g., SMILES) and using sequence VAEs or GANs

Graph VAE & GAN: model nodes & interactions; model cliques

Iterative methods

Reinforcement learning: discrete objectives

Any combination of these + memory.
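Treating a molecule as a string first requires splitting the SMILES into tokens for a sequence model. A small sketch, assuming a simplified regular expression that covers bracket atoms, the two-letter halogens Cl and Br, ring-bond digits, and common bond and branch symbols; it is not a full SMILES grammar, and the function names are illustrative.

```python
import re

# Simplified token pattern (an assumption, not a complete SMILES grammar).
# Order matters: bracket atoms and two-letter atoms are tried before single letters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\@.:~*$]")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, refusing input the pattern cannot cover."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


acetyl_chloride = tokenize_smiles("CC(=O)Cl")
aspirin = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")

vocab = build_vocab([acetyl_chloride, aspirin])
encoded = [vocab[tok] for tok in acetyl_chloride]  # integer ids for a sequence model
```

The integer sequence is what a sequence VAE or GAN would consume; decoding runs the same mapping in reverse and joins the tokens back into a string.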

Page 25: Drug design as reinforcement learning

You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation." NeurIPS (2018).

Page 26: Agenda

Page 27

https://towardsdatascience.com/opportunities-and-obstacles-for-deep-learning-in-biology-and-medicine-6ec914fe18c2

https://www.oreilly.com/ideas/deep-learning-meets-genome-biology

Genetic diagnostics

Refining drug targets

Pharmaceutical development

Personalized medicine

Better health insurance

Synthetic biology

Page 28: Deep learning versus genomics

Neuron ↔ Nucleotide, amino acid (building bricks)

Neural networks ↔ Chemical/biological networks (the house)

Message passing ↔ Signalling (the communication)

Neural programs ↔ Proteins/RNAs (the operating machines)

Neural Turing machine ↔ DNA (data + instruction + control)

Neural universe ↔ Omics universe (the computational universe)

Learning over time ↔ Co-evolution (adaptation)

Super Neural Turing machine ↔ DNA + Evolution (data + program + adaptation)

Bertolero, M. A., Blevins, A. S., Baum, G. L., Gur, R. C., Gur, R. E., Roalf, D. R., ... & Bassett, D. S. (2019). The network architecture of the human brain is modularly encoded in the genome. arXiv preprint arXiv:1905.07606.

Page 29: Living bodies as multiple programs interacting

We need new (neural) capabilities:

A true Turing machine: programs can be stored and called when needed.

Can solve BIG problems with many sub-modules.

Compositionality.

Can reason given existing structures and knowledge bases.

Neural Stored-program Memory

Page 30: The team