
Outline
● Database
• Entity-relationship model
• Normalized identifiers
● Python program for annotation management
• Technical choices
• Process
• Remarks and possible improvements
● Annotation comparisons
● Conclusion


Database – Entity-relationship model


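Database – Normalized identifiers

+-----------------------+----------------------+----------------------------------+-------------+
| Normalized identifier | AFFX                 | ResNet                           | BioCo       |
+-----------------------+----------------------+----------------------------------+-------------+
| GENE_SYMBOL           | Gene Symbol          | Name                             | GENE_SYMBOL |
| ALIAS                 | NA                   | Alias                            | NA          |
| GENE_NAME             | Gene Title           | Description                      | GENE_NAME   |
| ENTREZ_ID             | Entrez Gene          | LocusLink ID                     | LOCUS_ID    |
| UNIGEN_ID             | UniGene ID           | Unigene ID                       | UNIGENE     |
| CHROMOSOME_LOCATION   | Chromosomal Location | homo sapiens chromosome position | MAP         |
| OMIM_ID               | OMIM                 | OMIM ID                          | OMIM        |
| ENSEMBL_ID            | Ensembl              | NA                               | NA          |
| ACCESS_NUMBER         | (**)                 | Genbank ID                       | ACC_NUM     |
| REFSEQ_ID             | RefSeq Protein ID    | NA                               | REFSEQ      |
| SWISSPROT_ID          | SwissProt            | Swiss-Prot Accession             | NA          |
| GO_ID                 | Gene Ontology (*)    | GO_ID                            | GO_ID       |
| GO_DESCRIPTION        | Gene Ontology (*)    | GO_DESCRIPTION                   | GO_ID       |
| PATHWAY               | Pathway              | KEGG pathway                     | PATH        |
+-----------------------+----------------------+----------------------------------+-------------+

NA: not available in that source.
(*) Three AFFX columns: Gene Ontology Biological Process, Gene Ontology Molecular Function and Gene Ontology Cellular Component.
(**) Not recoverable from the source. The AFFX column names RefSeq Transcript ID and Annotation Transcript Cluster also appear on this slide without a clear row.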
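The table above amounts to one column-renaming dictionary per source. A minimal Python 3 sketch of that idea, assuming each annotation record arrives as a dictionary keyed by the source's own column names; COLUMN_MAP and normalize are hypothetical names, and only a few rows of the table are included.

```python
# Hypothetical per-source mappings, a subset of the table above:
# source column name -> normalized identifier.
COLUMN_MAP = {
    "affx": {
        "Gene Symbol": "GENE_SYMBOL",
        "Gene Title": "GENE_NAME",
        "Entrez Gene": "ENTREZ_ID",
        "UniGene ID": "UNIGEN_ID",
    },
    "resnet": {
        "Name": "GENE_SYMBOL",
        "Description": "GENE_NAME",
        "LocusLink ID": "ENTREZ_ID",
        "Unigene ID": "UNIGEN_ID",
    },
    "bioco": {
        "GENE_SYMBOL": "GENE_SYMBOL",
        "GENE_NAME": "GENE_NAME",
        "LOCUS_ID": "ENTREZ_ID",
        "UNIGENE": "UNIGEN_ID",
    },
}


def normalize(source, record):
    """Return a copy of `record` keyed by normalized identifiers.

    Columns without a normalized equivalent are dropped, and so are
    empty values.
    """
    mapping = COLUMN_MAP[source]
    return {
        mapping[column]: value
        for column, value in record.items()
        if column in mapping and value
    }


# ResNet values for probe 1552641_s_at, taken from the comparison tables.
print(normalize("resnet", {"Name": "TOB3", "LocusLink ID": "388767", "Alias": ""}))
```

Keeping the mappings as data rather than code means adding a new source or column only touches COLUMN_MAP.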


Annotation management – Technical choices

Implementation in Python 2.3.4
• Easy and quick to use.
• An excellent exercise in the context of this internship.
• Modules for working with MySQL (MySQLdb) and for reading .csv files (the csv module's csv.reader).
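To illustrate how the csv module and a database driver fit together, here is a minimal sketch. It is written for Python 3, not the Python 2.3.4 used in the project, and it uses the standard-library sqlite3 module as a stand-in for MySQLdb so that it runs without a MySQL server (both follow the Python DB-API). The table name annotation and the Affymetrix-style headers Probe Set ID and Gene Symbol are assumptions.

```python
import csv
import io
import sqlite3


def load_annotations(csv_file, conn, source):
    """Read one annotation CSV and store (source, probe, gene_symbol) rows.

    Assumes Affymetrix-style headers 'Probe Set ID' and 'Gene Symbol'.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(source TEXT, probe TEXT, gene_symbol TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [(source, row["Probe Set ID"], row["Gene Symbol"]) for row in reader]
    # sqlite3 uses '?' placeholders; MySQLdb uses '%s' for the same role.
    cur.executemany("INSERT INTO annotation VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Usage with an in-memory database and a one-line CSV sample.
conn = sqlite3.connect(":memory:")
sample = io.StringIO('"Probe Set ID","Gene Symbol"\n"1552279_a_at","PCFT"\n')
count = load_annotations(sample, conn, "affy")
```

With MySQLdb the connection would come from MySQLdb.connect(...) instead, and the placeholders in the INSERT would change to %s; the rest of the loop stays the same.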


Annotation management – Remarks and future improvements

● Standardize the format of the files, in both content and form
● Apply the same processing to 2 different organisms
● Link 2 probesets that belong to different species but measure the same gene
● Processing of experiments
• Comparative analysis CUFI/NULI (Berthiaume) vs CF/non-CF (Wright)
• Comparative analysis NULI/DMNQ vs ATII/TNF (Berthiaume)


Annotation comparison – Differences in gene_symbol

+--------------+-----------+---------------+---------------+
| probe        | Affy      | ResNet        | BioCo         |
+--------------+-----------+---------------+---------------+
| 1552279_a_at | PCFT      | SARM1         | HCP1          |
| 1552318_at   | DCTN5     | ARC           | GIMAP1        |
| 1552393_at   | ENTHD1    | FLJ25421      | RP1-172B20.3  |
| 1552394_a_at | ENTHD1    | FLJ25421      | RP1-172B20.3  |
| 1552405_at   | NLRP5     | MATER         | NALP5         |
| 1552411_at   | DEFB106B  | DEFB106       | DEFB106A      |
| 1552412_a_at | DEFB106B  | DEFB106       | DEFB106A      |
| 1552449_a_at | LOC653486 | RYD5          | SCGB1C1       |
| 1552514_at   | WBP2NL    | MGC26816      | CTA-250D10.11 |
| 1552531_a_at | NLRP11    | PYPAF6        | NALP11        |
| 1552641_s_at | LOC732419 | TOB3          | ATAD3B        |
| 1552641_s_at | LOC727868 | TOB3          | ATAD3B        |
| 1552641_s_at | ATAD3A    | TOB3          | ATAD3B        |
| 1552663_a_at | ERC1      | ELKS          | RAB6IP2       |
| 1552833_at   | B3GNT6    | IMAGE:4907098 | B3Gn-T6       |
| 1552834_at   | B3GNT6    | IMAGE:4907098 | B3Gn-T6       |
| 1552882_a_at | FAM123B   | FLJ39827      | RP11-403E24.2 |
| 1552927_at   | MAP3K7IP3 | MGC45404      | TAB3          |
| 1552928_s_at | MAP3K7IP3 | MGC45404      | TAB3          |
| 1552932_at   | NLRP6     | PYPAF5        | NALP6         |
| 1553002_at   | DEFB105B  | DEFB105       | DEFB105A      |
| 1553247_a_at | CYP4F8    | ZNF564        | ZNF709        |
| 1553315_at   | SLFNL1    | FLJ23878      | RP11-348A7.4  |
| 1553320_s_at | LOC641983 | MGC26484      | CDC14C        |
| 1553320_s_at | CDC14B    | MGC26484      | CDC14C        |
| 1553320_s_at | LOC648060 | MGC26484      | CDC14C        |
| 1553326_at   | RXFP2     | GREAT         | LGR8          |
| 1553340_s_at | AMAC1     | AMAC          | AMAC1L2       |
| 1553590_at   | FAM27E1   | MGC42630      | LOC158318     |
| 1553639_a_at | GBP2      | PERC          | PPARGC1B      |
| 1553639_a_at | GBP4      | PERC          | PPARGC1B      |
| 1553695_a_at | NLRX1     | FLJ21478      | NOD9          |
| 1553761_at   | C22orf30  | MGC50372      | RP4-694E4.2   |
| 1553817_at   | LOC727983 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC651452 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC728451 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC646074 | POM121L1      | DKFZP434P211  |
| 1553817_at   | LOC728418 | POM121L1      | DKFZP434P211  |
| 1553818_x_at | LOC727983 | POM121L1      | DKFZP434P211  |
| 1553818_x_at | LOC651452 | POM121L1      | DKFZP434P211  |
+--------------+-----------+---------------+---------------+
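Several probes appear on more than one line (1552641_s_at three times, 1553817_at five times) because a source can give several values for the same probe, and each combination of values becomes a row. A three-way self-join produces this shape. The sketch below assumes a single hypothetical SQLite table annotation(source, probe, value); the project's actual schema and query are not shown in this document.

```python
import sqlite3

# Hypothetical query: one row per (Affy, ResNet, BioCo) combination for a
# probe, keeping only combinations where the three values are not all equal.
# Inner joins mean that a probe missing from any source is not reported.
DIFF_QUERY = """
SELECT a.probe, a.value, r.value, b.value
FROM annotation AS a
JOIN annotation AS r ON r.probe = a.probe AND r.source = 'resnet'
JOIN annotation AS b ON b.probe = a.probe AND b.source = 'bioco'
WHERE a.source = 'affy'
  AND NOT (a.value = r.value AND r.value = b.value)
ORDER BY a.probe, a.value
"""


def differences(conn):
    return conn.execute(DIFF_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (source TEXT, probe TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [
        ("affy", "1552641_s_at", "ATAD3A"),
        ("affy", "1552641_s_at", "LOC732419"),
        ("resnet", "1552641_s_at", "TOB3"),
        ("bioco", "1552641_s_at", "ATAD3B"),
        # Hypothetical probe on which all three sources agree: no output row.
        ("affy", "probe_ok", "GENE_X"),
        ("resnet", "probe_ok", "GENE_X"),
        ("bioco", "probe_ok", "GENE_X"),
    ],
)
for row in differences(conn):
    print(row)
```

The two Affy symbols for 1552641_s_at each pair with the single ResNet and BioCo symbol, giving two output rows; this multiplication is why multi-valued probes dominate the tables in this section.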


Annotation comparison – Differences in gene_name

+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+
| probe        | Affy                                          | ResNet                                        | BioCo                                         |
+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+
| 1552318_at   | Dynactin 5 (p25)                              | activity-regulated cytoskeleton-associated pr | GTPase, IMAP family member 1                  |
| 1553247_a_at | cytochrome P450, family 4, subfamily F, polyp | zinc finger protein 564                       | zinc finger protein 709                       |
| 1553562_at   | CD8b molecule                                 | CD8 antigen, beta polypeptide 1 (p37)         | CD8 antigen, beta polypeptide (p37)           |
| 1553822_at   | receptor (chemosensory) transporter protein 1 | receptor transporting protein 1               | receptor transporter protein 1                |
| 1553823_a_at | receptor (chemosensory) transporter protein 1 | receptor transporting protein 1               | receptor transporter protein 1                |
| 1553993_s_at | mediator of RNA polymerase II transcription,  | immunoglobulin kappa variable 1/OR-1          | mediator of RNA polymerase II transcription,  |
| 1554194_at   | CDNA clone IMAGE:4825132                      | PDZ and LIM domain 2 (mystique)               | KIAA1967                                      |
| 1554260_a_at | FRY-like                                      | Rac GTPase activating protein 1               | furry homolog-like (Drosophila)               |
| 1554344_s_at | similar to aquaporin 12A                      | aquaporin 12B                                 | aquaporin 12A                                 |
| 1554511_at   | WW and C2 domain containing 1                 | KIBRA protein                                 | WW, C2 and coiled-coil domain containing 1    |
| 1554762_a_at | WW and C2 domain containing 2                 | BH3-only member B protein                     | WW, C2 and coiled-coil domain containing 2    |
| 1555671_at   | islet cell autoantigen 1,69kDa-like           | amyotrophic lateral sclerosis 2 (juvenile) ch | amyotrophic lateral sclerosis 2 (juvenile) ch |
| 1555833_a_at | CDNA FLJ38849 fis, clone MESAN2008936         | immunity-related GTPase family, Q             | nucleophosmin (nucleolar phosphoprotein B23,  |
| 1555855_at   | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | 20-alpha (3-alpha)-hydroxysteroid dehydrogen  |
| 1555855_at   | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | aldo-keto reductase family 1, member C1 (dihy |
| 1555856_s_at | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | 20-alpha (3-alpha)-hydroxysteroid dehydrogen  |
| 1555856_s_at | Aldo-keto reductase family 1, member C2 (dihy | aldo-keto reductase family 1, member C1 (dihy | aldo-keto reductase family 1, member C1 (dihy |
| 1555913_at   | gon-4-like (C. elegans)                       | gon-4 homolog (C.elegans)                     | gon-4-like (C.elegans)                        |
| 1555950_a_at | CD55 molecule, decay accelerating factor for  | decay accelerating factor for complement (CD5 | CD55 antigen, decay accelerating factor for c |
| 1556078_at   | Hypothetical protein LOC143286                | chromosome 10 open reading frame 6            | mitochondrial ribosomal protein L43           |
| 1556088_at   | olfactory receptor, family 5, subfamily T, me | RPA interacting protein                       | complement component 1, q subcomponent bindin |
+--------------+-----------------------------------------------+-----------------------------------------------+-----------------------------------------------+

Gene names are truncated to 45 characters in this output.


Annotation comparison – Differences in Entrez_id

+--------------+--------+--------+--------+
| probe        | Affy   | ResNet | BioCo  |
+--------------+--------+--------+--------+
| 1552281_at   | 5826   | 378941 | 283375 |
| 1552281_at   | 5826   | 72002  | 283375 |
| 1552281_at   | 5826   | 72086  | 283375 |
| 1552302_at   | 728772 | 103625 | 113277 |
| 1552302_at   | 728772 | 217203 | 113277 |
| 1552303_a_at | 728772 | 103625 | 113277 |
| 1552303_a_at | 728772 | 217203 | 113277 |
| 1552318_at   | 84516  | 312312 | 170575 |
| 1552318_at   | 84516  | 16205  | 170575 |
| 1552318_at   | 84516  | 11838  | 170575 |
| 1552318_at   | 84516  | 23237  | 170575 |
| 1552318_at   | 84516  | 53837  | 170575 |
| 1552318_at   | 84516  | 54323  | 170575 |
| 1552318_at   | 84516  | 97989  | 170575 |
| 1552381_at   | 84669  | 272009 | 135295 |
| 1552449_a_at | 653486 | 338417 | 147199 |
| 1552474_a_at | 7402   | 25257  | 2593   |
| 1552474_a_at | 7402   | 14431  | 2593   |
| 1552474_a_at | 7402   | 103105 | 2593   |
| 1552611_a_at | 23091  | 362552 | 3716   |
| 1552611_a_at | 23091  | 84598  | 3716   |
| 1552611_a_at | 23091  | 16451  | 3716   |
| 1552611_a_at | 23091  | 100022 | 3716   |
| 1552611_a_at | 23091  | 230508 | 3716   |
| 1552611_a_at | 23091  | 319959 | 3716   |
| 1552641_s_at | 55210  | 388767 | 83858  |
| 1552641_s_at | 55210  | 170769 | 83858  |
+--------------+--------+--------+--------+


Annotation comparison – Differences in Unigen_id

+--------------+-----------+-----------+-----------+
| probe        | Affy      | ResNet    | BioCo     |
+--------------+-----------+-----------+-----------+
| 1007_s_at    | Hs.631988 | Mm.5021   | Hs.520004 |
| 1007_s_at    | Hs.631988 | Rn.7807   | Hs.520004 |
| 1053_at      | Hs.647062 | Mm.383189 | Hs.139226 |
| 1053_at      | Hs.647062 | Rn.113319 | Hs.139226 |
| 1552263_at   | Hs.431850 | Rn.34914  | Hs.568258 |
| 1552263_at   | Hs.431850 | Mm.196581 | Hs.568258 |
| 1552264_a_at | Hs.431850 | Rn.34914  | Hs.568258 |
| 1552264_a_at | Hs.431850 | Mm.196581 | Hs.568258 |
| 1552281_at   | Hs.94395  | Mm.22983  | Hs.524506 |
| 1552281_at   | Hs.94395  | Hs.556043 | Hs.524506 |
| 1552286_at   | Hs.437691 | Mm.159369 | Hs.534515 |
| 1552301_a_at | Hs.143046 | Mm.33477  | Hs.178728 |
| 1552301_a_at | Hs.143046 | Rn.28432  | Hs.178728 |
| 1552309_a_at | Hs.632387 | Mm.200188 | Hs.22370  |
| 1552309_a_at | Hs.632387 | Rn.107975 | Hs.22370  |
| 1552314_a_at | Hs.185774 | Mm.227733 | Hs.469543 |
| 1552315_at   | Hs.647087 | Mm.25405  | Hs.159955 |
| 1552315_at   | Hs.647087 | Rn.10086  | Hs.159955 |
| 1552315_at   | Hs.647087 | Hs.40888  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Rn.10086  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Hs.40888  | Hs.159955 |
| 1552316_a_at | Hs.647087 | Mm.25405  | Hs.159955 |
| 1552318_at   | Hs.435941 | Hs.40888  | Hs.159955 |
| 1552318_at   | Hs.435941 | Mm.25405  | Hs.159955 |
| 1552318_at   | Hs.435941 | Rn.10086  | Hs.159955 |
| 1552330_at   | Hs.513832 | Hs.567640 | Hs.534773 |
| 1552337_s_at | Hs.591609 | Rn.141410 | Hs.386365 |
+--------------+-----------+-----------+-----------+
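In this table most ResNet values carry the Mm. (mouse) or Rn. (rat) UniGene prefix, while Affy and BioCo give Hs. (human) clusters, so many of the reported differences compare clusters from different species. A minimal Python sketch that keeps only human clusters before comparing; human_clusters, unigene_status and the three status labels are hypothetical names.

```python
def human_clusters(cluster_ids):
    """Keep only human UniGene clusters (prefix 'Hs.')."""
    return {c for c in cluster_ids if c.startswith("Hs.")}


def unigene_status(affy, resnet, bioco):
    """Compare the human clusters the three sources give for one probe."""
    a, r, b = human_clusters(affy), human_clusters(resnet), human_clusters(bioco)
    if not (a and r and b):
        return "missing"  # at least one source has no human cluster
    if a == r == b:
        return "agree"
    return "differ"


# Values for probe 1007_s_at, taken from the table above.
print(unigene_status({"Hs.631988"}, {"Mm.5021", "Rn.7807"}, {"Hs.520004"}))
```

With this filter, 1007_s_at is reported as missing a human cluster in ResNet rather than as a disagreement, which separates species mismatches from genuine conflicts such as 1552281_at (Hs.94395, Hs.556043, Hs.524506).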


Annotation comparison – Differences in Chromosome_location

+--------------+--------------------------+------------------+----------------------------------------------+
| probe        | Affy                     | ResNet           | BioCo                                        |
+--------------+--------------------------+------------------+----------------------------------------------+
| 1552281_at   | 14q24.3                  | 12q13.3          | 12q13.2                                      |
| 1552318_at   | 16p12.1                  | 8q24.3           | 7q36.1                                       |
| 1553034_at   | 21q22.3                  | 1q44             | 1q43-q44                                     |
| 1553432_s_at | 16p12.1                  | 16p12.2          | 16p12.2|16p12.2                              |
| 1553639_a_at | 1p22.2                   | 5q32             | 5q33.1                                       |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 1q43|1q23.1 according to Sierra (Genomics 79 |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 177                                          |
| 1554500_a_at | 1q43|1q23.1              | 1q43             | 2002) [AFS]                                  |
| 1555282_a_at | 1p22.2                   | 5q32             | 5q33.1                                       |
| 1555671_at   | 2q33.1                   | 2q33.2           | 2q33                                         |
| 1556088_at   | 11q11                    | 17p13.2          | 17p13.3                                      |
| 1557203_at   | Xq13.1                   | Xq13.1-q13.2     | Xq13.2                                       |
| 1557886_at   | 17q24.1-q24.2            | 17q24.2          | 17q24.1                                      |
| 1559285_at   | 17q21.31                 | 14q32.3-qter     | 14q32.3-qter|14q32                           |
| 1559501_at   | 21q22.2                  | 2p22-p21         | 21q22.12                                     |
| 1559917_a_at | 21q22.2                  | 2p22-p21         | 21q22.12                                     |
| 1561669_at   | 3p12-p11.1               | 3p11.1           | 3p11.2                                       |
| 1563221_at   | 12q23                    | 12q24.11         | 5q32                                         |
| 1563488_at   | 12q24.31-q24.32          | 12q24.3          | 12q24.32                                     |
| 1565454_at   | Xp11.22-p11.21           | Xp11.22          | Xp11.21                                      |
| 1565772_at   | 11q13-q14                | 11q13.5          | 11q14.1                                      |
| 1567862_at   | 1q42.12                  | 1                | 1q42                                         |
| 1568884_at   | 7q22                     | 7q22-q32         | 4q28                                         |
| 1569519_at   | 1p36.13                  | 1p13-p11         | 1q21.1                                       |
| 1569519_at   | 1q21.2                   | 1p13-p11         | 1q21.1                                       |
| 201003_x_at  | 1q32                     | 1q32.2           | 20q13.2                                      |
| 201003_x_at  | 3q26.31                  | 1q32.2           | 20q13.2                                      |
| 201003_x_at  | 3q26.31                  | 1q32             | 20q13.2                                      |
| 201104_x_at  | 1q21.1                   | 1p13-p11         | 1q12-1q21.2                                  |
| 202938_x_at  | 22q13.2                  | 22q13            | 22q13.2-q13.31                               |
| 203624_at    | Xp22.32; Ypter-p11.2     | Xp22.3 or Yp11.3 | Xp22.32                                      |
| 203624_at    | Xp22.32; Ypter-p11.2     | Xp22.3 or Yp11.3 | Ypter-p11.2                                  |
| 204171_at    | 17p11.2                  | 17q23.2          | 17q23.1                                      |
| 206290_s_at  | 1q43|1q23.1              | 1q43             | 1q43|1q23.1 according to Sierra (Genomics 79 |
| 206290_s_at  | 1q43|1q23.1              | 1q43             | 177                                          |
+--------------+--------------------------+------------------+----------------------------------------------+

The BioCo cells 177 and 2002) [AFS] appear to be pieces of the same free-text value as the row above them (a literature reference), split across rows.
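Several rows differ only in resolution: 2q33.1 vs 2q33 (1555671_at), 12q24.3 vs 12q24.32 (1563488_at), 1q42.12 vs 1 (1567862_at). A minimal Python sketch of a looser comparison that treats a band and its sub-bands as compatible. It assumes that a sub-band's name extends its parent band's name as a string, and it returns None for ranges, multi-valued fields and free text, which would need separate handling; parse_band and compatible are hypothetical names.

```python
import re

# A chromosome (digits, X or Y), optionally followed by an arm and band, e.g. 'q33.1'.
CYTOBAND = re.compile(r"^(\d+|X|Y)([pq][\d.]*)?$")


def parse_band(location):
    """Split '2q33.1' into ('2', 'q33.1'); return None for anything else
    (ranges such as 'Xq13.1-q13.2', multi-valued or free-text fields)."""
    match = CYTOBAND.match(location.strip())
    if match is None:
        return None
    return match.group(1), match.group(2) or ""


def compatible(loc1, loc2):
    """True if both locations are on the same chromosome and one band name
    is a prefix of the other (e.g. '2q33' contains '2q33.1'), False if they
    conflict, None if either value cannot be parsed."""
    band1, band2 = parse_band(loc1), parse_band(loc2)
    if band1 is None or band2 is None:
        return None
    if band1[0] != band2[0]:
        return False
    return band1[1].startswith(band2[1]) or band2[1].startswith(band1[1])


print(compatible("2q33.1", "2q33"))      # True  (1555671_at, Affy vs BioCo)
print(compatible("12q13.3", "12q13.2"))  # False (1552281_at, ResNet vs BioCo)
```

Comparing the chromosome first avoids a false match between a bare chromosome such as 1 and a band on chromosome 12 such as 12q13, which a plain string-prefix test would accept.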


Annotation comparison – Differences in OMIM_ID

+--------------+-------+--------+--------+--------+
| probe        | id    | Affy   | ResNet | BioCo  |
+--------------+-------+--------+--------+--------+
| 121_at       | 44932 | 218700 | 167415 | 167415 |
| 121_at       | 44932 | 218700 | 167415 | 218700 |
| 121_at       | 44932 | 167415 | 167415 | 218700 |
| 1255_g_at    | 44933 | 602093 | 600364 | 602093 |
| 1255_g_at    | 44933 | 600364 | 600364 | 602093 |
| 1255_g_at    | 44933 | 602093 | 600364 | 600364 |
| 1494_f_at    | 44941 | 211980 | 122720 | 122720 |
| 1494_f_at    | 44941 | 211980 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122700 | 122720 | 122700 |
| 1494_f_at    | 44941 | 211980 | 608054 | 122720 |
| 1494_f_at    | 44941 | 122700 | 122720 | 122720 |
| 1494_f_at    | 44941 | 122720 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122720 | 608054 | 122720 |
| 1494_f_at    | 44941 | 211980 | 122720 | 122700 |
| 1494_f_at    | 44941 | 122700 | 608054 | 122700 |
| 1494_f_at    | 44941 | 122700 | 608054 | 122720 |
| 1494_f_at    | 44941 | 122720 | 122720 | 122700 |
| 1552281_at   | 44959 | 603214 | 608730 | 608730 |
| 1552304_at   | 44973 | 152427 | 603313 | 603313 |
| 1552306_at   | 44974 | 152427 | 603313 | 603313 |
| 1552332_at   | 44994 | 609761 | 609761 | 609823 |
| 1552332_at   | 44994 | 609823 | 609761 | 609761 |
| 1552332_at   | 44994 | 609823 | 609761 | 609823 |
| 1552334_at   | 44995 | 609823 | 609761 | 609761 |
| 1552334_at   | 44995 | 609823 | 609761 | 609823 |
| 1552334_at   | 44995 | 609761 | 609761 | 609823 |
+--------------+-------+--------+--------+--------+
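Here the row-per-combination output multiplies rows: for 1494_f_at, three Affy values, two ResNet values and two BioCo values give twelve combinations, and the eleven that are not all equal are listed. Comparing each source's whole set of values per probe is more compact; for 121_at, Affy and BioCo both give {167415, 218700} while ResNet gives {167415}. A minimal Python sketch, assuming the raw annotations are available as (source, probe, value) records; value_sets and classify are hypothetical names.

```python
from collections import defaultdict

SOURCES = ("affy", "resnet", "bioco")


def value_sets(records):
    """Group (source, probe, value) records into {probe: {source: set of values}}."""
    groups = defaultdict(lambda: {source: set() for source in SOURCES})
    for source, probe, value in records:
        groups[probe][source].add(value)
    return groups


def classify(groups):
    """Compare whole value sets per probe instead of every combination."""
    report = {}
    for probe, sets in groups.items():
        affy, resnet, bioco = (sets[s] for s in SOURCES)
        report[probe] = {
            "all_equal": affy == resnet == bioco,
            "affy_equals_bioco": affy == bioco,
            "common": sorted(affy & resnet & bioco),
        }
    return report


# OMIM values for probe 121_at, taken from the table above.
records = [
    ("affy", "121_at", "218700"),
    ("affy", "121_at", "167415"),
    ("resnet", "121_at", "167415"),
    ("bioco", "121_at", "167415"),
    ("bioco", "121_at", "218700"),
]
print(classify(value_sets(records)))
```

Working from the raw per-source values rather than from the difference table matters: a value that every source shares only shows up in all-equal combinations, which the difference query filters out.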