43
Structure sans attribution M-A Delsuc - 6 mars 2007 Différentes approches pour l'attribution automatique de spectres de RMN et la détermination structurale de protéines Quelques approches théoriques et pratiques

Différentes approches pour l'attribution automatique de spectres de

Embed Size (px)

Citation preview

Page 1: Différentes approches pour l'attribution automatique de spectres de

Structure sans attribution M-A Delsuc - 6 mars 2007

Différentes approches pour l'attribution automatique de spectres

de RMN et la détermination structurale de protéines

Quelques approches théoriques et pratiques

Page 2: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

structure sans attribution

• idée simple :• une tâche dans une 2D indique la présence de 2 spins

dans le spectre 1D.

• une tâche dans une 2D-NOESY apporte une information de proximité entre les 2 spins porteurs

• il suffit de 4 distances pour placer un spin dans l’espace- 4n+1 distances pour 3n-6 degrés de liberté

• l’attribution apporte deux informations différentes :- le placement du spin porteur dans la séquence primaire de la

protéine

- un jeu de contraintes de distances dues à la simple présence des liaisons moléculaires

Page 3: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

structure sans attribution

• idée déjà ancienne• 1992 Malliavin, T. E., Rouh, A., Delsuc, M., and Lallemand, J. Y. Approche directe de la

détermination de structures moléculaires à partir de l’effect Overhauser nucléaire. CR Acad. Sci. Paris 315(II), 653–659.

• 1993 Oshiro, C. M., and Kuntz, I. D. Application of distance geometry to the proton assignment problem. Biopolymers 33, 107–115.

• 1996 Atkinson, R. A., and Saudek, V. The direct determinationof protein structure from multidimensional NMR spectra without assignment: An evaluation of the concept. In ‘‘Dynamics and the Problem of Recognition in Biological Macromolecules’’(O. Jardetzky and J.F.Lefébre, eds.). Plenum Press, NewYork.

• 2001 T.A. Malliavin, P. Barthe & M.A. Delsuc “FIRE : an automatic protocol for protein fold recognition and NMR spectral assignment" Theor. Chem. Acc. 106, 91-97.

• 2002 Grishaev, A., and Llinas, M. CLOUDS, a protocol for deriving a molecular proton density via NMR. Proc. Natl. Acad. Sci. USA 99, 6707–6712.

• 2005 Grishaev, A., and Llinas, M. “Protein structure elucidation from minimal NMR data: the CLOUDS approach.” Methods Enzymol. 394:261-95.

Page 4: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

différentes approches

• structure directe• détermination directe d’une structure à basse résolution

• nuage statistique• produire un ensemble de structures compatibles avec les

spectres mesurés

• faisceau d’information• détecter la présence de structure secondaires

• attribution en connaissant la structure

Page 5: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

• principe• purement homonucléaire

• Approche• Peak-List

• SPI : détection de systèmes de spin

• BACUS : élimination de d’impossibilité spectrale

• MIDGE : calcul en retour des intensités NOESY

• CLOUDS : génération de nuages d’atomes

• EMBEDS : placement de la protéine sur le nuage

COSY TOCSY NOESY dist

X X X

X X X

X

X

Grishaev, A., and Llinas, M. CLOUDS, a protocol for deriving a molecular proton density via NMR. (2002) Proc. Natl. Acad. Sci. USA 99, 6707–6712.

CLOUDS

Page 6: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

analyse Bayésienne

• le problème• des informations d’origine différentes,

• des informations plus ou moins sûres

• la solution• utiliser les probabilités de chaque élément d’information

• appliquer le théorème de Bayes

P (A|B) =P (B|A)P (A)

P (B)

Thomas Bayes (1702 — 1761)

Page 7: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

analyse Bayésienne

P (M |m) =P (m|M)P (M)

P (m)

M : modèlem : mesureP(M|m) : vraisemblance du modèle après la mesure m P(M) : probabilité du modèle a-prioriP(m) : probabilité de la mesure a-prioriP(m|M) : vraisemblance de la mesure

la même opération peut-être appliquée itérativement pour chaque élément d’information recueillie par la mesure

Page 8: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

[10] Protein Structure Elucidation from Minimal NMRData: The CLOUDS Approach

By Alexander Grishaev and Miguel Llinas

Abstract

In this chapter we review automated methods of protein NMR dataanalysis and expand on the assignment-independent CLOUDS approach.As presented, given a set of reliable NOEs it is feasible to derive a spatialH-atom distribution that provides a low-resolution image of the proteinstructure. In order to generate such a list of unambiguous NOEs, a proba-bilistic assessment of the NOE identities (in terms of frequency-labeledH-atom sources) was developed on the basis of Bayesian inference. Themethodology, encompassing programs SPI and BACUS, provides a list of‘‘clean’’ NOEs that does not hinge on prior knowledge of sequence-specificresonance assignments or a preliminary structural model. As such,the combined SPI/BACUS approach, intrinsically adaptable to include13C- and/or 15N-edited experiments, affords a useful tool for the analysisof NMR data irrespective of whether the adopted structure calculationprotocol is assignment-dependent.

Introduction

Despite significant recent advances in instrumentation, experimentaldesign, and data analysis, the derivation of macromolecular structures vianuclear magnetic resonance (NMR) remains a slow process, mainly due tothe complexity of assigning signals to individual spins. Consequently, mucheffort is being devoted to the automation of data analysis (Gronwald andKalbitzer, 2004; Guntert, 2003; Moseley and Montelione, 1999).

Macromolecular structure elucidation via NMR is equivalent tomapping the available spectral information onto specific molecularsites—a case of an ‘‘inverse problem.’’ Fortunately, this process is amena-ble to automation since (1) in general, NMR data are redundant, whichhelps to cope with the degeneracy of resonance frequencies; (2) themapping procedure can be broken down into a series of discrete, separatesteps at each of which the data analysis is subjected to a well-defined set oflogical, programmable rules; (3) the goal of the overall NMR structuralelucidation is usually a family of folds of !1 A precision, rather than arigorously defined structure, which relaxes requirements on the quality of

[10] protein structure elucidation from minimal NMR data 261

Copyright 2005, Elsevier Inc.All rights reserved.

METHODS IN ENZYMOLOGY, VOL. 394 0076-6879/05 $35.00

[10] Protein Structure Elucidation from Minimal NMRData: The CLOUDS Approach

By Alexander Grishaev and Miguel Llinas

Abstract

In this chapter we review automated methods of protein NMR dataanalysis and expand on the assignment-independent CLOUDS approach.As presented, given a set of reliable NOEs it is feasible to derive a spatialH-atom distribution that provides a low-resolution image of the proteinstructure. In order to generate such a list of unambiguous NOEs, a proba-bilistic assessment of the NOE identities (in terms of frequency-labeledH-atom sources) was developed on the basis of Bayesian inference. Themethodology, encompassing programs SPI and BACUS, provides a list of‘‘clean’’ NOEs that does not hinge on prior knowledge of sequence-specificresonance assignments or a preliminary structural model. As such,the combined SPI/BACUS approach, intrinsically adaptable to include13C- and/or 15N-edited experiments, affords a useful tool for the analysisof NMR data irrespective of whether the adopted structure calculationprotocol is assignment-dependent.

Introduction

Despite significant recent advances in instrumentation, experimentaldesign, and data analysis, the derivation of macromolecular structures vianuclear magnetic resonance (NMR) remains a slow process, mainly due tothe complexity of assigning signals to individual spins. Consequently, mucheffort is being devoted to the automation of data analysis (Gronwald andKalbitzer, 2004; Guntert, 2003; Moseley and Montelione, 1999).

Macromolecular structure elucidation via NMR is equivalent tomapping the available spectral information onto specific molecularsites—a case of an ‘‘inverse problem.’’ Fortunately, this process is amena-ble to automation since (1) in general, NMR data are redundant, whichhelps to cope with the degeneracy of resonance frequencies; (2) themapping procedure can be broken down into a series of discrete, separatesteps at each of which the data analysis is subjected to a well-defined set oflogical, programmable rules; (3) the goal of the overall NMR structuralelucidation is usually a family of folds of !1 A precision, rather than arigorously defined structure, which relaxes requirements on the quality of

[10] protein structure elucidation from minimal NMR data 261

Copyright 2005, Elsevier Inc.All rights reserved.

METHODS IN ENZYMOLOGY, VOL. 394 0076-6879/05 $35.00

somewhat similar to identification of an imperfectly phased electron densi-ty in X-ray crystallography. Indeed, our approach to this problem bearssome resemblance to the ARP/wARP automated refinement method

Fig. 9. Proton density distributions ( focs): col 2 (left) and kringle 2 (right). Individualatomic focs are discernible as condensed, pseudospherical density clusters.

Fig. 8. Individual clouds for col 2 (top) and kringle 2 (bottom). Complete clouds areshown on the left and HN-only clouds on the right. Modified from Grishaev and Llinas(2002a), copyright 2002, National Academy of Sciences, USA.

[10] protein structure elucidation from minimal NMR data 285

somewhat similar to identification of an imperfectly phased electron densi-ty in X-ray crystallography. Indeed, our approach to this problem bearssome resemblance to the ARP/wARP automated refinement method

Fig. 9. Proton density distributions ( focs): col 2 (left) and kringle 2 (right). Individualatomic focs are discernible as condensed, pseudospherical density clusters.

Fig. 8. Individual clouds for col 2 (top) and kringle 2 (bottom). Complete clouds areshown on the left and HN-only clouds on the right. Modified from Grishaev and Llinas(2002a), copyright 2002, National Academy of Sciences, USA.

[10] protein structure elucidation from minimal NMR data 285

somewhat similar to identification of an imperfectly phased electron densi-ty in X-ray crystallography. Indeed, our approach to this problem bearssome resemblance to the ARP/wARP automated refinement method

Fig. 9. Proton density distributions ( focs): col 2 (left) and kringle 2 (right). Individualatomic focs are discernible as condensed, pseudospherical density clusters.

Fig. 8. Individual clouds for col 2 (top) and kringle 2 (bottom). Complete clouds areshown on the left and HN-only clouds on the right. Modified from Grishaev and Llinas(2002a), copyright 2002, National Academy of Sciences, USA.

[10] protein structure elucidation from minimal NMR data 285

CLOUDS

Page 9: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

FIRE

• Idée : • s’appuyer sur les HN seuls

• Approche• analyse de 3D -15N-HSQC-NOESY

• nettoyage de la matrice de proximité

• génération de structures 3D

• placement de la protéine sur les HN - attribution séquentielle

HSQC-NOESY

match-matrix

distances

X

X

X

T.A. Malliavin, P. Barthe & M.A. Delsuc “FIRE : an automatic protocol for protein fold recognition and NMR spectral assignment" (2001) Theor. Chem. Acc. 106, 91-97.

Fold Insight by nuclear REsonance

Page 10: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

match-matrix3D 15N-HSQC-NOESY

match-matrix

P8 P13

reconstruit

théorique

Page 11: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

plongement

P8 : 68 aa

Page 12: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

plongement

P8 : 68 aa

Page 13: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

plongement

P8 : 68 aa

Page 14: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

plongement

P8 : 68 aa

Page 15: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

plongement

P8 : 68 aa

Page 16: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

attribution statistique

• la question• j’observe un système de spin

=> je mesure un ensemble de déplacements chimique

• quel est l’acide aminé le plus probable ?

• les solutions• Neural Network

- Rescue 1 1H

- Rescue 1N 1H 15N

• statistique Bayesienne- Rescue 2 1H 15N 13C

RESidue prediCtion with neUral nEtworks

Page 17: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

attribution statistique

• la question• j’observe un système de spin

=> je mesure un ensemble de déplacements chimique

• quel est l’acide aminé le plus probable ?

• les solutions• Neural Network

- Rescue 1 1H

- Rescue 1N 1H 15N

• statistique Bayesienne- Rescue 2 1H 15N 13C

RESidue prediCtion with neUral nEtworks

amélioration de la BMRB

Page 18: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Rescue - réseau de neurones

"Rescue : an Artificial Neural Network tool for helping the NMR spectral assignment of proteins" J.L.Pons and M-A. Delsuc J.Bio.NMR 15 p15-26 (1999)

Page 19: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Rescue 2 : approche Bayesienne

42 Antoine Marin et al.

Figure 3. Gaussian distributions of the chemical shifts of the nuclei HN , H!, H", N, C!, C", C’

among the 20 amino acids. The parameters of the Gaussian distributions were calculated from

a statistical analysis of the 783 non homologous sequences of the BMRB. The distribution are

displaying with the same spectral width among hydrogens and heavy atoms. For each nucleus,

the parameters #/D are given on the corresponding plot (see subsection 4.1).

article_bayes.tex; 21/11/2003; 11:16; p.42

Marin A., Malliavin T.E., P.Nicolas and Delsuc M.A., From NMR chemical shifts to amino acid types: investigation of the predictive power carried by nuclei. J.Bio.NMR, 2004. 30 p 47-60.

-analyse des valeurs (premier ordre)-analyse des corrélations (deuxième ordre)

HN H! H" N C C! C"

Page 20: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Rescue 2 - résultats44 Antoine Marin et al.

25%

50%

75%

100%

A R D N C E Q G H I L K M F S T W Y V

Figure 5. Sensitivity (solid) and specificity (hatched) results for all the amino acid types in

the frame of the CCSM using the H!, HN , N, C!, C’ and C" nuclei. The number of predicted

spin systems in this test is 26,148.

A (7.8) A

R (4.7) E Q K R M

D (6.4) D L N

N (4.4) N D

C (1.5) W H Y C V N

E (8.1) E Q K

Q (4.4) Q E

G (9.3) G

H (1.9) Q H W E K

I (5.8) I Y

L (9.3) L D

K (7.8) K M

M (2.0) K M E Q

F (4.0) Y I

S (6.0) S

T (5.4) T

W (1.2) W Q E H

Y (3.0) Y I

V (7.1) V

Figure 6. The types of amino acid predicted are displayed according to the type of the amino

acid in the prediction input (left of the figure). The prediction results are given for the CCSM

model using the H!, HN , N, C!, C’ and C" nuclei. The numbers given in brackets are the

population sizes of the input in percentage. For each amino acid type, the proportion observed

for its own type (in gray) represents the sensitivity for this type.

article_bayes.tex; 21/11/2003; 11:16; p.44

44 Antoine Marin et al.

25%

50%

75%

100%

A R D N C E Q G H I L K M F S T W Y V

Figure 5. Sensitivity (solid) and specificity (hatched) results for all the amino acid types in

the frame of the CCSM using the H!, HN , N, C!, C’ and C" nuclei. The number of predicted

spin systems in this test is 26,148.

A (7.8) A

R (4.7) E Q K R M

D (6.4) D L N

N (4.4) N D

C (1.5) W H Y C V N

E (8.1) E Q K

Q (4.4) Q E

G (9.3) G

H (1.9) Q H W E K

I (5.8) I Y

L (9.3) L D

K (7.8) K M

M (2.0) K M E Q

F (4.0) Y I

S (6.0) S

T (5.4) T

W (1.2) W Q E H

Y (3.0) Y I

V (7.1) V

Figure 6. The types of amino acid predicted are displayed according to the type of the amino

acid in the prediction input (left of the figure). The prediction results are given for the CCSM

model using the H!, HN , N, C!, C’ and C" nuclei. The numbers given in brackets are the

population sizes of the input in percentage. For each amino acid type, the proportion observed

for its own type (in gray) represents the sensitivity for this type.

article_bayes.tex; 21/11/2003; 11:16; p.44

• spécificité• sensibilité

Page 21: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

• SVM support vector machine (vapnik 1998)

• CRAACK• approche consensus

CRAACK : approche consensus

classificationséparation maximalenoyau non-linéaire

marge maximale

Consensus Rules for Amino-Acids Characterization using Kernel methodsBenod C., Delsuc M.A. and Pons J.L., CRAACK: consensus program for NMR

amino acid type assignment. J Chem Inf Model, (2006). 46(3): p. 1517-22.

Page 22: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

7

Technical organization of Craack

Figure 1 presents the general flowchart of the Craack method, as presented in this work. All the chemical shifts

available for a given spin-system are directed, with the proper format, to each typing modules used for the consensus

approach. Each module generates an answer, and this set of answers is used as input by the consensus algorithms.

Figure 1

Figure 1: Flowchart of CRAACK. A studied spin system is input into the five typing modules which generate five

different answers. These answers are used by two consensus strategies (SVM and classic vote algorithm) in order to

propose a global typing solution.

First approach: consensus by SVM

The first strategy is based on Support Vector Machines 11. MySVM toolbox software 17 was chosen for this purpose

because it allows to code the entries with a high level of missing values.

It is essential to design an optimum coding of the typing modules output, as an input vector for the SVM consensus

step. This input vector should contain the information from the five modules, including the predicted residues and

their associated scores. This coding has to bring all the necessary information for the final residues classification

without bringing unwanted noise. Several types of input coding were tested. The most effective coding consists of

SVM consensus strategy Vote consensus strategy

Group + Score Residue + Score

Spin system

HN, H!, H", HN, N, H!, H",

C!, C"

HN, N, H!, H",

C!, C"HN, N, H!, H",

C!, C"

HN, N, H!, H",

C!, C"

Rescue RescueN Rescue 2 Platon SVMTyp

Group + Score Group + Score Residue + Score Residue + Score Residue + Score

CRAACK : deux approches• approche consensus

- Rescue 1 et 2

- Platon (Labudde, D.; Leitner, D.; Kruger, M.; Oschkinat, H. J Biomol NMR. 2003, 25(1), 41-53.)

- SVMtyping

Page 23: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

10

Result & discuss

The results obtained with the two consensus methods are shown in table 3, and are detailed in figure 2.

Table 3. Rate of success of the consensus typing modules.

Typing tools Strategy Selective

Capacity

Rate of success

on TestDBc

(2000 tests)

Rate of success

on TestDB

(8000 tests)

Rate of success

on RefDB

(19299 tests)

Consensus by SVM Support Vector

Machine

8 groups 95.43 %

(1924 answers)

87.77 %

(7610 answers)

91.7 %

(18206 answers)

Consensus by Vote Vote Algorithm 20 residues 73 %

(2000 answers)

54.53 %

(8000 answers)

71.09 %

(19299 answers)

Table 3 presents the global efficiency of the two consensus methods. It can be seen that the SVM tool, separating the

residues type in 8 classes, is more efficient than any other single typing tool with the same selective capacity.

Similarly, the Vote tool, with a classification of all the amino acids is also more efficient than any other tool. Figure

2 details the residual errors obtained with both consensus methods.

Figure 2

Figure2: Response of the two consensus strategies as observed on the RefDB test base. For each family of aminoacid analyzed (located on the left) the number of answer is given graphically (by colour). The rate of success for the

résultats

Benod C., Delsuc M.A. and Pons J.L., CRAACK: consensus program for NMR amino acid type assignment. J Chem Inf Model, (2006). 46(3): p. 1517-22.

Page 24: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

12

reliability. On the SVM approach, the rate of success is 91.7 if only answers with positive reliability are taken into

account. 68.4% of the base of test is predicted with a reliability superior to 2. The rate of success on this part of the

base is then 97,5%. In spite of a limited dividing power (8 groups), this tool has an excellent performance.

In the case of the vote consensus, reliability was computed from the number of votes in the process. Values go from

0 to 6.8. The figure 3 shows that more than 75% of the test base is predicted with a reliability superior or equal to 4.

The rate of correct answers on this part of the base is more than 80%.

Figure 3

Figure 3: Rate of success on TestDBc for all the answers over a given reliability coefficient (abscissa). The

percentage of entries for a given reliability is also given. In case of SVM strategy, only answers with positive

reliability are taken into account.

Five supplementary proteins were chosen to test the Craack procedure. These proteins are not present in the test

bases used for the optimization of the consensus. They are described in the table 4 and are composed of helix alpha

and beta sheet in equivalent proportions, all are 15N and 13C labelled. The rate of correct answers in the residue

typing is located between 75.76% and 91.59% for the consensus by SVM and between 67.09 and 77.57% for the

consensus by vote.

13

The secondary structure rates of correct prediction are indicated in table 4. Coil zones were not taken into account

for the computation of the rates. The results for the secondary structure prediction are fair albeit based on the

approximate results of the typing prediction step. It can be observed on the five tested proteins that the quality of the

secondary structure prediction is variable, and does not seem to be related to the state, helix or sheet, of the protein.

Table 4. Rate of success of typing consensus and secondary structure prediction for 5 proteins not present in the

test and learning bases.

Protein name

(PDBid) 21

Secondary structure % SVM % Vote % secondary

structure *

Human Kinase B Protein

(1P6S) 22

Sheet + Helix 75.76 67.96 67.14

MPS One Binder

(1R3B) 23

Sheet + Helix 77.11 68.67 66.15

Focal Adhesion Kinase

(1QVX) 24

Helix 91.59 77.57 57.95

Cd44 Antigen

(1POZ) 25

Sheet + Helix 77.16 69.29 84.69

Superoxide dismutase

(1RK7) 26

Sheet 81.64 67.09 43.59

* Random coil zones were not taken into account in the percentage of correct secondary structure prediction

The figure 4 shows more exactly the case of the Human Kinase B protein. The typing step and the secondary

structure prediction are shown. It is to note that errors are very often associated to a low typing reliability. On the

contrary, for reliabilities larger than 5 (on the scale from 0 to 9) only 13 errors can be observed among which 8 are

confusions already described between Y and F, D and N, E and Q.

Figure 4

........11........21........31........41........51........61

MNEVSVIKEGWLHKRGEYIKTWRPRYFLLKSDGSFIGYKERPEAPDQTLPPLNNFSVAEC

MNQVSVIIMGHLYTLGLYIMTWR-IFLALKSNGSYIGDVQR-QA-DQTD--LDNFSNA—-

59479764487752375763767-37436498897686354-58-8774--7896929—-

------EEEEEEEEEE—--EEEEEEEEE—--EEEEE-EEEEE—-EEEEE-EEEE--EEE

--E-E—EEEHEE-H—-HE-HEEE-EHEEEEHEEEEEE-EEH-EE----E--EEEEE----

........71........81........91........101.......111.

QLMKTERPRPNTFVIRCLQWTTVIERTFHVDSPDEREEWMRAIQMVANSLK

MLMATQK-A-NTFVI----TT------FHVDS-DERQQNIRAIQMVANSLK

3563743-3-67776----26------86598-987776369785799997

EEEE-------EEEEEE---------EEEE--HHHHHHHHHHHHHHHHHH-

-EEEEEE-E-EH-EEE----E------EEE-E-H-HHHHHHHHHHHHH--E

Figure 4: Typing prediction (Vote strategy) and secondary structure prediction for human kinase B protein. The

actual primary sequence and secondary structure are in black. Predictions are shown in red. Secondary structure is

defined with the PSEA software 27. The consensus vote reliability (0 to 9) is shown (below typing prediction) and

grey zones indicate prediction with reliability larger or equal to 5.

Conclusion

Page 25: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Application - protéine IB5

• IB5- protéine salivaire

- faible complexité- 70 aa

- 29 prolines

- dipeptide : 0.156 tripeptide : 0.531 (T. Nandi et al (2003) J.Biomol.Struct.& Dyn.,)

- non structurée en solution

- interaction spécifique avec les polyphénol

SPPGKPQGPPQQEGNKPQGPP PPGKPQGPP PAGGNPQQPQAPPAGKPQGPP PPPQGGR PPRPAQGQQPPQ

EpiGalloCatéchineGallate

Page 26: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

spectres

IB5 - 50µMH2O - pH 3.5600 MHzcryo-sonde

Page 27: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

spectres

IB5 - 50µMH2O - pH 3.5600 MHzcryo-sonde

Page 28: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

spectres

IB5 - 50µMH2O - pH 3.5600 MHzcryo-sonde

Page 29: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

superposition IB5 - EGCG

IB5 - 50µMEGCG 12 eq

Page 30: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

suivit de pics• pour lever le problème de la superposition

Ravel et al 23/23

Figures

Figure 1

Canonical situation observed during peak list analysis; examples are given with 4 peak lists :

( ) ( )1 ,..., 4L L

a) True positive peak path, (A, B, C, D).

b) Path (A, B', C, D) with a false negative peak B' (missing peak).

c) Path (A, B', C, D) with a false positive peak B' next to a complete peak path (A, B, C, D).

d) Two paths (A,B,C,D) and (A,B',C,D) with overlapping peaks A,C D.

e) Two paths (A,B,C,D) and (A',B,C',D') with overlapping peak B.

f) Two paths crossing (A,B,C,D) and (A',B',C',D').

Page 31: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

suivit de pics• pour lever le problème de la superposition

Ravel et al 23/23

Figures

Figure 1

Canonical situation observed during peak list analysis; examples are given with 4 peak lists :

( ) ( )1 ,..., 4L L

a) True positive peak path, (A, B, C, D).

b) Path (A, B', C, D) with a false negative peak B' (missing peak).

c) Path (A, B', C, D) with a false positive peak B' next to a complete peak path (A, B, C, D).

d) Two paths (A,B,C,D) and (A,B',C,D) with overlapping peaks A,C D.

e) Two paths (A,B,C,D) and (A',B,C',D') with overlapping peak B.

f) Two paths crossing (A,B,C,D) and (A',B',C',D').

Ravel et al 26/26

a)

b)

Page 32: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

suivit de pics• pour lever le problème de la superposition

Ravel et al 23/23

Figures

Figure 1

Canonical situation observed during peak list analysis; examples are given with 4 peak lists :

( ) ( )1 ,..., 4L L

a) True positive peak path, (A, B, C, D).

b) Path (A, B', C, D) with a false negative peak B' (missing peak).

c) Path (A, B', C, D) with a false positive peak B' next to a complete peak path (A, B, C, D).

d) Two paths (A,B,C,D) and (A,B',C,D) with overlapping peaks A,C D.

e) Two paths (A,B,C,D) and (A',B,C',D') with overlapping peak B.

f) Two paths crossing (A,B,C,D) and (A',B',C',D').

Ravel et al 26/26

a)

b)

Ravel et al 26/26

a)

b)

Page 33: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

suivit de pics• pour lever le problème de la superposition

Ravel et al 23/23

Figures

Figure 1

Canonical situation observed during peak list analysis; examples are given with 4 peak lists :

( ) ( )1 ,..., 4L L

a) True positive peak path, (A, B, C, D).

b) Path (A, B', C, D) with a false negative peak B' (missing peak).

c) Path (A, B', C, D) with a false positive peak B' next to a complete peak path (A, B, C, D).

d) Two paths (A,B,C,D) and (A,B',C,D) with overlapping peaks A,C D.

e) Two paths (A,B,C,D) and (A',B,C',D') with overlapping peak B.

f) Two paths crossing (A,B,C,D) and (A',B',C',D').

Ravel et al 26/26

a)

b)

Ravel et al 26/26

a)

b)

Ravel et al 28/28

0.0% 2.5% 5.0% 7.5% 10.0% 12.5% 15.0% 17.5% 20.0% 22.5% 25.0%

0.0%

2.5%

5.0%

7.5%

10.0%

12.5%

15.0%

17.5%

20.0%

22.5%

25.0%

0.95-1

0.9-0.95

0.85-0.9

0.8-0.85

0.75-0.8

0.7-0.75

0.65-0.7

0.6-0.65

0.55-0.6

0.5-0.55

0.45-0.5

0.4-0.45

k! "

k! +

Rate of success

Figure 5

Evolution of the GAPT performance according to the error rates ( )k!+ , ( )k!" .

Five successive experiments were simulated (M=5). For each experiment, the rates of false positive

and false negative peaks ( )k!+ , ( )k!" were varied between 0% and 25% with 5% steps. The

number of peaks per peak list ( )N k was fixed to 70. The noise level was fixed to 2.5%. The

crowding rate was in the range of [1.8,2.2]. The whole simulation was then repeated 15 times

altogether. The simulation involved 540 = 36*15 GAPT analyses. The chart presents the mean over

15 simulations.

Page 34: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Ravel et al 30/30

Figure 7

15N HSQC NMR spectrum of wheat nsLTP-2 liganded to lysophosphatidyl myristoyl glycerol at

300 K. The crosses present the location of the detected peaks, detected as local maxima, and further

fitted to a 2D Gaussian shape. The outlined area is studied in detail in the text and in Figure 5.

LTP-7variation de T300 K - 310K par pas de 2.5K

GAPT : General Algorithm for nmr Peak TrackingRavel P., Kister G., Malliavin T.E. and Delsuc M.A., A General

Algorithm for Peak-Tracking in multi-dimensional NMR experiments. J.Bio.NMR, (2007). sous presse.

Page 35: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Ravel et al 30/30

Figure 7

15N HSQC NMR spectrum of wheat nsLTP-2 liganded to lysophosphatidyl myristoyl glycerol at

300 K. The crosses present the location of the detected peaks, detected as local maxima, and further

fitted to a 2D Gaussian shape. The outlined area is studied in detail in the text and in Figure 5.

Ravel et al 31/31

Figure 8

Zoom of the outlined area in Figure 5. The red arrows show paths. The circles represent the

frequency fitted peaks. Blue circles are for the first experiment obtained at 300 K, pink circles for

the last experiment obtained at 310 K and black circles for the other experiments [302.5 K,307.5 K].

Green circles represent false negative peaks. i.e. missing peaks added by the algorithm.

LTP-7variation de T300 K - 310K par pas de 2.5K

GAPT : General Algorithm for nmr Peak TrackingRavel P., Kister G., Malliavin T.E. and Delsuc M.A., A General

Algorithm for Peak-Tracking in multi-dimensional NMR experiments. J.Bio.NMR, (2007). sous presse.

Page 36: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

NPK

• logiciel de traitement des données RMN• successeur de Gifa

• écrit en python

• structuré en couches logiques séparées

NMR Processing Kernel

mathematical kernel library } low level FORTRAN code

elementary actions

processing phases

complete strategies !high level python code

user interface

1

Modeling of NMR processing, toward e!cient unattendedprocessing of NMR experiments.

Dominique Tramesel1,2, Vincent Catherinot2 Marc-Andre Delsuc1!

December 10, 2006

submitted to J.Magn.Reson.

short title : modeling of NMR processing

1 CNRS UMR5048, Centre de Biochimie Structurale, F34090 Montpellier, France;INSERM U554, F34090 Montpellier, France;Universite Montpellier 1 et 2, F34090 Montpellier, France

2 NMRtec S.A.S., 288 rue d’Uppsala, F34080 Montpellier, France

* to whom correspondence should be addressede-mail address : [email protected] : (+33) 467 41 77 08fax : (+33) 467 41 79 13

1

soumis à J.Magn.Reson

Page 37: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

NPK

a)

b)def shear( slope, pivot ):

"""

shearing 2D

slope removed to the spectrum by the operation.

1 corresponds to going from the main diagonal to the horizontal

pivot : 0..1

position of the invariant column. O: lower side - 1: upper side

shear(1,0.5) : goes from SECSY to COSY

shear(-1,0.5) : goes from INADEQUATE to COSY

shear(7.0/9,0.5) : shears a Spin 3/2 MQMAS

see also : tilt.g

"""

if ( get_dim() != 2 ):

raise "To be applied on 2D only"

si = get_si1_2d()

sisi = 2*power2(si-2)

chsize( sisi, get_si2_2d())

if ( get_itype_2d() == 0 or get_itype_2d() == 1 ):

thetype = "real"

iftbis("f1")

else:

thetype = "complex"

ift("f1")

si2 = get_si2_2d()

sw2 = get_specw_2_2d()

sw1 = get_specw_1_2d()

for i in range(1, si2+1 ):

col(i)

dim(1)

f2 = (i-pivot*si2)*sw2/si2

decal = f2 * si / sw1

coef = -180.0 * decal * slope

phase( 0.5*coef, coef )

dim(2)

put("col", i)

if ( thetype == "real" ):

ftbis("f1")

else:

ft("f1")

chsize( si, get_si2_2d())

1

a)

-1012345678910ppm

b)

0,5

11,5

22,5

33,5

44,5

5

00,511,522,533,544,555,5ppm

ppm

1

traitement automatique ILT et DOSYshearing continu

et aussi :• 1D - 2D - 3D• MaxEnt et échantillonnage partiel• paramètres par défaut• programme extensible (python)• base de développement

Page 38: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

criblage d’interaction

• principe• protéine connue, structure 3D connue

• mesure de 2D 15N-HSQC

• attribution partielle statistique

• criblage rapide par analyse de déplacement de raies

• exemple• criblage de l’interaction LTP-7 / molécule hydrophobe

Page 39: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

• 300 µM LTP - 50µl , 70µg• 20 min. acquisition

• 10 eq acide taurocholique• DMSO comme co-solvant

Page 40: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

à faire ...

• criblage

• détermination de repliement• séquence primaire ➞ structures 3D potentielles

• attribution + secondaire + contacts

• travail sur les matrices de proximité (match-matrix)

• RMN du solide• contacts 13C-13C

• contacts HCCH

Page 41: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

histogrammes des distances

0

1

2

3

4

5

1 1.5 2 3 4 50

2

4

6

8

10

1 1.5 2 3 4 5

1H C-C HC-HC

nombre de voisins nombre de voisins cumulés

Page 42: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

disponibilité des logiciels

• attribution• Rescue 1 et 2

• CRAACK (consensus)

• traitement• NPK (traitement de spectres)

• GAPT (suivit de pics)

• structures• Fire,...

en-ligneen-ligne

téléchargementtéléchargement

?

http://abcis.cbs.cnrs.fr

Page 43: Différentes approches pour l'attribution automatique de spectres de

Str

uct

ure

san

s at

trib

uti

on

M-A

Del

suc

- 6

mar

s 20

07

Remerciements

• premiers travauxJean-Yves LallemandThérèse MalliavinDaniel AbergelAlain Rouh

• Génération de structure

Thérèse Malliavin Vincent CatherinotPatrice RavelPhilippe Barthe

• Attribution statistiqueJean-Luc PonsAntoine MarinDaniel AuguinCindy Benod

• NPKVincent CatherinotPatrice RavelDominique Tramesel

• IB5-polyphénolsVéronique Cheynier Christine Pascal