ACNUC training course - 3 March 2010 - Pôle Informatique du LBBE - CNRS - Université de Lyon
Outline
1. Introduction
2. Querying sequence databases (60%)
3. Building your own sequence databases (30%)
4. Using the API (10%)
5. Going further
Introduction
1. History
2. A database system and a query tool
3. General principle of ACNUC
4. Access to the programs and databases
5. Workshop schedule
Introduction: History
ACNUC is a database management system dedicated to biological sequences, genomic sequences in particular.
Its development began in 1980.
It serves both as a query tool and as a low-level layer for software development.
It remains the only software that lets users query, transparently, the sub-sequences of the sequences present in the databases.
Recent developments with Stéphane Delmote make it possible to query the databases remotely through a socket server.
Introduction: Principle
The general principle of ACNUC is the indexing of annotated sequence files (EMBL, GenBank, SwissProt ...).
The various annotation fields are indexed in index files (NAMES, SPECIES, KEYWORDS, etc.) that are linked to one another through pointers.
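As a rough illustration of this principle, the sketch below builds toy indexes in Python. The record layout, the data and the build_indexes helper are assumptions made for illustration only; the real ACNUC index files have their own structure, which is not described here.

```python
from collections import defaultdict

# Toy annotated records (hypothetical data, for illustration only).
RECORDS = [
    {"name": "SEQ1", "species": "mus musculus", "keywords": ["btg1"]},
    {"name": "SEQ2", "species": "homo sapiens", "keywords": ["btg1", "btg2"]},
    {"name": "SEQ3", "species": "mus musculus", "keywords": ["adenylate cyclase"]},
]


def build_indexes(records):
    """Index each annotation field; every entry points back to record ranks."""
    names = {}
    species = defaultdict(set)
    keywords = defaultdict(set)
    for rank, rec in enumerate(records):
        names[rec["name"]] = rank
        species[rec["species"]].add(rank)
        for kw in rec["keywords"]:
            keywords[kw].add(rank)
    return names, species, keywords


names, species, keywords = build_indexes(RECORDS)
# A lookup follows the "pointers" (ranks) instead of rereading the flat files.
mouse_btg1 = species["mus musculus"] & keywords["btg1"]
print(sorted(RECORDS[r]["name"] for r in mouse_btg1))
```

Answering a question such as "mouse sequences with keyword btg1" then only touches the indexes, which is the point of indexing the annotation fields.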
Introduction: Access to the programs and databases
The programs, the databases and the documentation are available on the PBIL website:
http://pbil.univ-lyon1.fr/
Introduction: Workshop schedule
Several exercises and example applications will be discussed during the workshop.
This presentation and several scripts are available at:
ftp://pbil.univ-lyon1.fr/pub/in2p3/formation_acnuc/
GENERAL DOCUMENTATION:
http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html
QUERY LANGUAGE DOCUMENTATION:
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE
Querying sequence databases
1. First steps with ‘QueryWin’
2. The query language
• simple queries
• sequences and sub-sequences
• complex queries
3. Data extraction
• several formats
• extracting specific parts of the sequences
4. Using ‘query’
• simple scripts
• complex scripts
5. Using ‘seqinR’
• querying databases from R
First steps with QueryWin
« QueryWin » works on all platforms: Unix/Linux, Mac, Windows
Two versions are available:
the « local version » works on local databases
the « client version » works on remote databases
Available at PBIL:
http://pbil.univ-lyon1.fr/software/query_win.html
Documentation available at PBIL:
http://pbil.univ-lyon1.fr/software/doclogi/docacnuc/acnucwin/acnwian/aquerywin.html
First steps with QueryWin
Launch Query_Win - Mac version: click on the application
First steps with QueryWin
Launch Query_Win - on the clusters (local version)
launch query_win on EMBL:
>query_win embl
The Query_Win window contains a command window (for the query language) and command buttons.
First steps with QueryWin
Two (non-exclusive) ways of querying the database:
1. using buttons and menus
2. using the query language
Exercise 1: select mouse sequences in EMBL
• Method 1:
Click the « select » button, then « species », and type « mus » in the window that opens.
Choose the « build query » option.
Look at the command window.
Execute.
Try again with the « make list » option.
First steps with QueryWin
Exercise 1 (continued)
• Method 2: type « sp=mus » in the command window
IMPORTANT:
Queries made with method 1 are displayed in the command window in query-language form.
This is an excellent way to learn the query language.
From now on, try to answer each question with the buttons and menus and observe how it is translated into the query language.
Little by little, you can try using the query language directly.
Note also that a « HELP » mode is available in Query_Win.
The query language: simple queries
All operations are possible with query_win (by clicking on buttons or by using the query language).
Some simple examples:
- find a sequence by its name
- find a sequence by its accession number
- find sequences by species or taxon
- find sequences by keyword
Other examples:
- Which species is associated with this sequence?
- Which keywords are associated with this sequence?
The query language: simple queries
The ACNUC query language is described here:
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE
Exercise 2:
Query SwissProt
Retrieve the cat (Felis catus) sequences using the buttons
Retrieve the cat (Felis catus) sequences using the query language
Compare the results
Exercise 2bis:
Query SwissProt
Retrieve sequences using the taxonomic ID (TaxonID) of the genus Felis (tid=9682)
The query language: simple queries
Exercise 3:
Query SwissProt
Retrieve the sequences associated with the keyword « adenylate cyclase » using the buttons
Retrieve the sequences associated with the keyword « adenylate cyclase » using the query language
Check the different annotation fields. Where does « adenylate cyclase » appear?
Do the same with GenBank
The ACNUC query language is described here:
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE
The query language: simple queries
Exercise 4:
Query GenBank
Retrieve the sequences associated with the BTG1 gene
Check the different annotation fields. Where is the information on the gene?
Do the same with SwissProt
Hint
the gene name is a keyword
The ACNUC query language is described here:
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE
The query language: simple queries
Use of the wildcard: @
To retrieve keywords beginning with « toto », search for toto@ .
Exercise 5:
Retrieve the sequences associated with keywords beginning with BTG
Note
The wildcard may also be used for species and sequence names
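The effect of the wildcard can be illustrated with a small Python sketch. It assumes that @ stands for any run of characters (possibly empty) and that matching ignores case. The wildcard_match helper and the keyword list are hypothetical and are not part of ACNUC.

```python
import re


def wildcard_match(pattern, candidates):
    """Return the candidates matching a pattern in which @ is a wildcard."""
    # Escape every literal part; each @ becomes "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("@"))
    compiled = re.compile(regex, re.IGNORECASE)
    return [c for c in candidates if compiled.fullmatch(c)]


KEYWORDS = ["BTG1", "BTG2", "btg3", "ADENYLATE CYCLASE", "XBTG"]  # made-up list
print(wildcard_match("btg@", KEYWORDS))
```

With this reading, btg@ selects the entries that start with BTG, while XBTG is left out because the wildcard is only at the end of the pattern.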
The query language: sequences & sub-sequences
One of the main strengths of ACNUC is the definition and use of sequences and sub-sequences.
ID ESCOL3_3; SV 2; circular; genomic DNA; GRV; PRO; 5498450 BP.
XX
AC BA000007_GR;
XX
blah blah blah
XX
CC This Genome Reviews entry was created from entry BA000007.2 in the
CC EMBL/Genbank/DDBJ databases on 03 March 2009.
XX
FH Key Location/Qualifiers
FH
FT source 1..5498450
FT     /organism="GR Escherichia coli"
FT     /strain="Sakai = O157:H7 = RIMD 0509952 = EHEC"
FT     /mol_type="genomic DNA"
FT     /chromosome="Chromosome"
FT     /db_xref="taxon:386585"
FT .5F1 5'ncr 1..189
FT     /cds_name="ESCOL3_3.PE1 "
FT .PE1 CDS 190..273
FT     /codon_start=1
FT     /gene_name="thrL"
FT     /locus_tag="ECs0001"
FT     /protein_id="BAB33424.1"
FT     /transl_table=11
FT     /translation="MKRISTTITTTITTTITITITTGNGAG"
FT .3F1 3'ncr 274..353
FT     /cds_name="ESCOL3_3.PE1 "
FT misc_structure 215..328
FT     /gene_name="Thr_leader"
FT     /db_xref="Rfam:RF00506"
FT .5F2 5'ncr 274..353
FT     /cds_name="ESCOL3_3.PE2 "
FT .PE2 CDS 354..2816
FT     /codon_start=1
FT     /gene_name="thrA"
FT     /locus_tag="ECs0002"
FT     /product="Aspartokinase I, homoserine dehydrogenase I "
FT     /function="NADP or NADPH binding"
FT     /function="amino acid binding"
FT     biosynthetic process"
FT     /protein_id="BAB33425.1"
FT     /db_xref="GO:0004072"
FT     /db_xref="UniProtKB/TrEMBL:Q8XA84"
FT     /transl_table=11
etc.
[Figure: diagram of a sequence (5' to 3') with its CDS, 5'ncr and 3'ncr sub-sequences, numbered 1 to 3]
The query language: sequences & sub-sequences
ACNUC defines sequences and sub-sequences.
A sequence may contain many sub-sequences.
For example, a chromosome is a sequence, and its CDS are sub-sequences of it.
A sub-sequence may be of several types.
Exercise 6:
Query HOGENOMDNA (complete genomes)
Retrieve the sequences of Escherichia coli o157:h7 str. sakai
Question: what are these sequences?
Retrieve the sub-sequences of chromosome ESCOL3_3
Question: of which types are these sequences?
Retrieve the CDS of chromosome ESCOL3_3
Back to the sequence ESCOL3_3: look for the CDS in the annotations
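The sequence/sub-sequence relationship can be sketched in Python with the first features of the ESCOL3_3 entry shown earlier. The SubSequence class and the of_type helper are hypothetical, and the full names of the non-coding sub-sequences (parent name plus suffix, e.g. ESCOL3_3.5F1) are assumed by analogy with ESCOL3_3.PE1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubSequence:
    name: str   # e.g. "ESCOL3_3.PE1"
    kind: str   # sub-sequence type, e.g. "CDS", "5'ncr", "3'ncr"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self):
        return self.end - self.start + 1


# Sub-sequences of the parent sequence ESCOL3_3, copied from the feature table.
SUBSEQUENCES = [
    SubSequence("ESCOL3_3.5F1", "5'ncr", 1, 189),
    SubSequence("ESCOL3_3.PE1", "CDS", 190, 273),
    SubSequence("ESCOL3_3.3F1", "3'ncr", 274, 353),
    SubSequence("ESCOL3_3.5F2", "5'ncr", 274, 353),
    SubSequence("ESCOL3_3.PE2", "CDS", 354, 2816),
]


def of_type(subsequences, wanted):
    """Keep only the sub-sequences of one type (compare with t=cds)."""
    return [s for s in subsequences if s.kind.lower() == wanted.lower()]


for cds in of_type(SUBSEQUENCES, "cds"):
    print(cds.name, cds.length)
```

One parent sequence thus carries several typed sub-sequences, and selecting by type keeps only the CDS.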
The query language: sequences & sub-sequences
A sequence is associated with one species.
All its sub-sequences are associated with that species.
This is not true of keywords: a keyword may be associated with a whole sequence or with only one of its sub-sequences.
Exercise 7:
Query SwissProt
Retrieve the sequences associated with the BTG1 gene
Do the same in GenBank
What are these sequences?
Hint
the gene name is a keyword
The query language: complex queries
Combinations of criteria:
• Operators AND, OR, NOT, AND NOT
• Use of parentheses
• Crossing result lists:
Exercise 8:
Query SwissProt
Retrieve mammalian sequences
Retrieve the sequences associated with BTG1
Cross these 2 lists: list1 AND list2
Retrieve the mammalian sequences associated with BTG1 in a single query
Retrieve the mammalian sequences associated with BTG1, BTG2, BTG3 and BTG4 in a single query. How many sequences do you obtain?
Hint
beware of mixing OR and AND
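Crossing lists behaves like set algebra, which a few lines of Python can illustrate. The sequence names below are invented and Python sets merely stand in for ACNUC lists. How ACNUC itself orders operators when no parentheses are given is not stated here, so the sketch only shows why explicit parentheses are the safe choice.

```python
# Hypothetical result lists, represented as Python sets of sequence names.
mammals = {"BTG1_HUMAN", "BTG1_MOUSE", "BTG2_HUMAN", "ACTB_HUMAN"}
btg1 = {"BTG1_HUMAN", "BTG1_MOUSE", "BTG1_CHICK"}
btg2 = {"BTG2_HUMAN", "BTG2_CHICK"}

# list1 AND list2: mammalian sequences associated with BTG1.
print(sorted(mammals & btg1))

# Intended query: mammals AND (BTG1 OR BTG2).
grouped = mammals & (btg1 | btg2)

# If AND were applied before OR, the same criteria would read
# (mammals AND BTG1) OR BTG2 and let non-mammalian BTG2 entries through.
ungrouped = (mammals & btg1) | btg2

print(sorted(grouped))
print(sorted(ungrouped))
```

The two results differ by the non-mammalian BTG2 entry, which is the kind of surprise the hint warns about.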
The query language: complex queries
Other criteria:
year of publication, e.g. y<1986
author of publication: au=marley
journal: same principle
molecule: m=mRNA
organelle: o=MITOCHONDRION
type: t=CDS
host: h=homo sapiens
status (not for GenBank): st=EST
The query language: complex queries
Modify a list of sequences according to sequence date or sequence length
Exercise 10:
Query SwissProt
Retrieve the sequences from mus
Select the sequences longer than 300 aa
Select the sequences added after the year 2000
The query language: complex queries
Exercise 11:
Query SwissProt
In which species is BTG1 found in the sequence annotations? (This does not mean that the gene is absent from other species.)
Solution: retrieve the sequences associated with the gene, then retrieve the species associated with these sequences.
Exercise 11bis:
Do the same in a single command line
Exercise 12:
Retrieve the names of all the E. coli strains found in EMBL
Exercise 12bis:
Retrieve the list of eukaryotes in HOGENOMDNA.
Retrieve the list of fungi.
Hint
projecting species
ps
The query language: browsing taxonomy and keywords
Both the taxonomy and the keywords are organised in hierarchies.
These hierarchies can be browsed with the « browse » button of Query_Win.
A keyword may have a « parent ».
For example, EC numbers are keywords that all descend from the keyword « EC_Number ».
This is very useful for sorting and selecting keywords.
To select a parent keyword in Query_Win, click the « by name » button, enter the word, then click « exec » and « done ».
The query language: browsing taxonomy and keywords
Exercise 13:
Query SWISSPROT
Retrieve all the keywords associated with human
There are too many keywords!
We only want EC numbers:
Retrieve the keywords descending from « EC_NUMBERS »
How many are there?
Exercise 13bis:
Retrieve the EC_NUMBERS associated with human
Vocabulary
pk list
kd list
(nk=)
The query language: complex queries
Using files
You may use files containing:
sequence names
sequence accession numbers
keywords
species
Exercise 14:
In Uniprot, retrieve the human EC numbers from the file created in exercise 13bis. Which mouse sequences are associated with these EC numbers?
Vocabulary
fk file
un list
ps list
The query language: scanning annotations
It is possible to scan the annotations.
This is useful when the word sought is not indexed and the list of sequences to scan is not too large.
Data extraction: several formats
Exercise 15:
Query HOGENOMDNA
Select the yeast (saccharomyces) sequences
Extract the chromosome sequences in FASTA format
Extract the CDS sequences, translated into protein, in FASTA format
Data extraction: extracting parts of sequences
Exercise 16:
In HOGENOMDNA
Select the yeast (saccharomyces) sequences
Extract the CDS sequences in FASTA format
Extract the CDS sequences in EMBL format
Extract the 5' non-coding sequences in FASTA format
Extract the first 1000 residues of each chromosome in FASTA format
Extract the 500 residues preceding each CDS in FASTA format
Use of query
« query » is the command-line version of query_win.
Its main advantage is that it can be driven by scripts.
Scripting automates the processing, which is very useful in the following cases:
- long series of queries that are tedious to retype each time: fewer errors, time saved
- workflows
- generic scripts reused for different purposes
- use on clusters and compute farms.
Use of query: launching
Like Query_Win, two versions are available:
local version (installed on pbil, pbil-dev, and the pbil-debX workers)
client version (queries remote databases)
Both are available for Linux/Unix, MacOS and Windows.
Local version:
>query embl
Client version:
>raa_query
then choose the database, or directly:
>raa_query pbil.univ-lyon1.fr:5558/embl
Use of query: instructions
« query » uses the same query language as query_win.
However, there are small differences, especially in the management of lists.
Do not hesitate to consult the help by typing HELP.
Exercise 17
Query HOGENOMDNA (complete genomes)
Retrieve the sequences of Escherichia coli o157:h7 str. sakai
Retrieve the sub-sequences of chromosome ESCOL3_3
Retrieve the CDS of chromosome ESCOL3_3
Use of query: instructions
Solution to exercise 17
Query HOGENOMDNA (complete genomes)
Retrieve the sequences of Escherichia coli o157:h7 str. sakai
Retrieve the sub-sequences of chromosome ESCOL3_3
Retrieve the CDS of chromosome ESCOL3_3
Save the CDS
Instructions (left) and their meaning (right):
query hogenomdna
sel                                       select a list (default: list1)
sp=Escherichia coli O157:h7 str. sakai    selection criterion
mod                                       modify a list
list1                                     list to be modified
5                                         type of modification
sel                                       select a new list (default: list3)
n=ESCOL3_3 et t=cds                       selection criterion
save                                      save a list
list3                                     list to be saved
list_cds                                  file
stop                                      exit query
Use of query: scripts
A script is used as follows (« banque » stands for the database name):
query banque << EOF
instructions
instructions
instructions
instructions
EOF
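When many such scripts are needed, the here-document can be generated rather than typed. The Python sketch below only assembles the script text and does not run query. The make_query_script helper is hypothetical; the instructions are the ones from the solution to exercise 17.

```python
def make_query_script(database, instructions, marker="EOF"):
    """Return the text of a shell here-document feeding instructions to query."""
    if any(line.strip() == marker for line in instructions):
        raise ValueError("an instruction equals the here-document marker")
    lines = [f"query {database} << {marker}"]
    lines.extend(instructions)
    lines.append(marker)
    return "\n".join(lines) + "\n"


# Instructions taken from the solution to exercise 17.
INSTRUCTIONS = [
    "sel",
    "sp=Escherichia coli O157:h7 str. sakai",
    "mod",
    "list1",
    "5",
    "sel",
    "n=ESCOL3_3 et t=cds",
    "save",
    "list3",
    "list_cds",
    "stop",
]

script = make_query_script("hogenomdna", INSTRUCTIONS)
print(script, end="")
```

The printed text can be saved to a file and run with source or csh, in the same way as the example scripts of the workshop.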
Use of query: scripts
Exercise 18
Run the previous exercise with a script.
In addition, extract the CDS in FASTA format
source exemple_script_1.csh
or
csh exemple_script_1.csh
Use of query: scripts
Exercise 19
source exemple_script_2.csh
or
csh exemple_script_2.csh
sel/l=plant
giving the list a name makes the script easier to write and to understand
This script selects the homologous gene families (HOGENOM families) shared by plants and cyanobacteria but not by animals.
The Arabidopsis CDS present in these families are saved and extracted in FASTA format
Use of query: scripts
csh exemple_script_3_bis.csh viridiplantae cyanobacteria metazoa
Using a script with arguments
Use of seqinR
ACNUC databases can be queried from the R software.
Use the seqinR package
Exercise 17ter
with R:
Query HOGENOMDNA (complete genomes)
Retrieve the CDS of Escherichia coli o157:h7 str. sakai
Plot the histogram of the CDS lengths
Use of seqinR
Solution to exercise 17ter
install.packages("seqinr")
library("seqinr")
choosebank("hogenomdna")
# store the list returned by query() in cds
cds <- query("cds", "sp=Escherichia coli o157:h7 str. sakai et t=cds")
lengths <- lapply(cds$req, getLength)
hist(unlist(lengths))
Build your own ACNUC database
Why?
1. To store and access sequences of interest.
• selecting and modifying a subset of a general-purpose database
• sequencing data
2. To allow complex queries
3. To create your own keywords and the associated hierarchy
4. To automate queries
5. To share and distribute data
Build your own ACNUC database
How to select a local database:
the indexes are in /ma_banque/index
the flat files are in /ma_banque/flat_files
Define the environment variables acnuc and gcgacnuc
setenv mabase "/ma_banque/index /ma_banque/flat_files"
query mabase
Build your own ACNUC database
Build a database from annotated data
script build_uniprot.csh
initf: creates the indexes
acnucgener: indexes the sequences
Documentation:
http://pbil.univ-lyon1.fr/databases/acnuc/acnuc_gestion.html
Exercise 20
Build a database in SWISSPROT format
Build your own ACNUC database
Build a database from annotated data
script build_embl.csh
initf: creates the indexes
acnucgener: indexes the sequences
Documentation:
http://pbil.univ-lyon1.fr/databases/acnuc/acnuc_gestion.html
Exercise 21
Build a database in EMBL format
Build your own ACNUC database
Defining new keywords (EMBL/GenBank only)
By default, many fields are used to define keywords. However, additional fields can be specified to define keywords.
Example: search for the keyword HBG298754 in the previously created embl database.
The keyword is not found. However, the field /gene_family="HBG298754" exists (see sequence ECODH_1.PE2).
Exercise 22
Rebuild the database with
build_embl_customized.csh...
Query for the keyword again.
Build your own ACNUC database
Defining new keywords (EMBL/GenBank only)
Use the « custom_policy » file, which should be in the $acnuc (index) directory
custom_policy file:
Qualifier = GENE_FAMILY Use_Value = True Parent_Keyword = GENE_FAMILY
Qualifier = DB_XREF Use_Value = True Parent_Keyword = CROSS REFERENCES
Qualifier = PROTEIN_ID Use_Value = True Parent_Keyword = PROTEIN IDS
Qualifier = %(C+G) Use_Value = True Parent_Keyword = CG_CONTENTS
Qualifier = LOCUS_TAG Use_Value = True Parent_Keyword = LOCUS_TAG
ECODH_1.PE2 Location/Qualifiers (length=2463 bp)
FT CDS 337..2799
FT     /codon_start=1
FT     /gene_family="HBG298754"
FT     /evidence="4: Predicted"
FT     /gene_id="IGI03726849"
FT     /gene_name="thrA"
FT     /locus_tag="ECDH10B_0002"
FT     /product="Fused aspartokinase I and homoserine
FT     dehydrogenase I"
FT     /function="NADP or NADPH binding"
FT     /function="amino acid binding"
FT     /function="homoserine dehydrogenase activity"
FT     /biological_process="aspartate family amino acid
FT     biosynthetic process"
FT     /protein_id="ACB01207.1"
FT     /db_xref="GO:0004072"
FT     /db_xref="InterPro:IPR001048"
FT     /db_xref="UniProtKB/TrEMBL:B1XBC7"
FT     /transl_table=11
FT     /%(C+G)="CG<60%"
FT     /note="C+G content in third codon positions = 57.6 % "
//
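To make the role of this file concrete, the Python sketch below parses the custom_policy lines shown above and applies them to a few qualifiers of ECODH_1.PE2. It assumes that Use_Value = True means the qualifier value itself becomes a keyword attached to Parent_Keyword, and that qualifier names are matched without regard to case. Both points, like the helper names, are assumptions for illustration and not a description of what acnucgener does internally.

```python
import re

POLICY_TEXT = """\
Qualifier = GENE_FAMILY Use_Value = True Parent_Keyword = GENE_FAMILY
Qualifier = DB_XREF Use_Value = True Parent_Keyword = CROSS REFERENCES
Qualifier = PROTEIN_ID Use_Value = True Parent_Keyword = PROTEIN IDS
Qualifier = %(C+G) Use_Value = True Parent_Keyword = CG_CONTENTS
Qualifier = LOCUS_TAG Use_Value = True Parent_Keyword = LOCUS_TAG
"""

POLICY_LINE = re.compile(
    r"Qualifier\s*=\s*(?P<qualifier>\S+)\s+"
    r"Use_Value\s*=\s*(?P<use_value>\w+)\s+"
    r"Parent_Keyword\s*=\s*(?P<parent>.+)"
)


def parse_policy(text):
    """Map each qualifier name to (use_value, parent_keyword)."""
    policy = {}
    for line in text.splitlines():
        match = POLICY_LINE.fullmatch(line.strip())
        if match:
            policy[match["qualifier"]] = (
                match["use_value"] == "True",
                match["parent"].strip(),
            )
    return policy


def keywords_for(qualifiers, policy):
    """Yield (parent_keyword, keyword) pairs for qualifiers named in the policy."""
    for name, value in qualifiers:
        use_value, parent = policy.get(name.upper(), (False, None))
        if use_value:
            yield parent, value


QUALIFIERS = [  # a few qualifiers of ECODH_1.PE2
    ("gene_family", "HBG298754"),
    ("gene_name", "thrA"),
    ("locus_tag", "ECDH10B_0002"),
    ("%(C+G)", "CG<60%"),
]

policy = parse_policy(POLICY_TEXT)
print(list(keywords_for(QUALIFIERS, policy)))
```

Under these assumptions, gene_name yields no keyword because the policy does not list it, while gene_family, locus_tag and %(C+G) each yield one keyword under their parent.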
Build your own ACNUC database
Enriching annotations and creating keywords
You may enrich the annotations with suitable keywords.
For example, the following lines
FT /gene_family="HBG298754"
FT /%(C+G)="CG<60%"
FT /note="C+G content in third codon positions = 57.6 % "
have been added so that the database can be queried by GC content or by gene family.
Build your own ACNUC database
Enriching annotations and creating keywords
Going further: modify the annotations and create an associated custom_qualifier_policy file.
Exercise 23
Modify custom_policy to generate different keywords
Two examples: custom_qualifier_policy.hogenom and custom_qualifier_policy.tp
Build your own ACNUC database
Build a database from raw sequence data (FASTA)
ACNUC databases are built from files in SwissProt, EMBL or GenBank format. A FASTA file must therefore be converted into one of these formats before the database can be built.
Uniprot: BioPerl script
EMBL/GenBank: readseq
http://www.ebi.ac.uk/cgi-bin/readseq.cgi
gener_prot.pl Chlre4_best_proteins_small.fasta Chlre4_best_proteins.dat CHLRE
Build your own ACNUC database
Build a database from raw sequence data (FASTA)
Exercise 24
Convert the ecoli_dna.fasta file to EMBL format and build an ACNUC database
Build your own ACNUC database
Sample sequence management
You may want to run queries such as:
- Retrieve all the sequences sent for sequencing on 15/02/2010
- Retrieve all the sequences sent for sequencing on 15/02/2010 and associated with the « toto » experiment
- Retrieve all the sequences associated with the « toto » experiment and the « tata » species
Build your own ACNUC database
Steps
Step 1: Clean and annotate the sequences
Step 2: Convert the FASTA file into an EMBL file (readseq)
Step 3: Add keywords such as:
- date obtained
- experiment name
- etc.
Step 4: Build the database
Use of API: C/C++ API
Documentation:
General structure
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html
C API (local version)
http://pbil.univ-lyon1.fr/databases/acnuc/structure.html
C API (client version, access via sockets)
http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html
C++ API (client version, access via sockets, Bio++)
http://pbil.univ-lyon1.fr/databases/acnuc/bpp-raa/bpp-raa.html
Examples for the C API, local version:
http://pbil.univ-lyon1.fr/databases/acnuc/example.php
http://pbil.univ-lyon1.fr/databases/acnuc/cfonctions.html
Use of API: C API
local version
Exercise 25
test exemple1.c
/*
gcc -c exemple1.c -I /bge/banques/csrc
gcc -o exemple1 exemple1.o -L /bge/banques/csrc -lcacnucdeb
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "dir_acnuc.h"

int main(int argc, char *argv[]) {
  /* char my_taxon[] = "Bovidae"; */ /* case ignored */
  char my_taxon[500];
  int num, err, *list, numsp;
  int i = 2;

  if (argc == 1) {
    fprintf(stderr, "Usage: exemple1 taxon_name\n");
    exit(1);
  }
  /* join all the command-line words into a single taxon name */
  strcpy(my_taxon, argv[1]);
  while (argc > i) {
    strcat(my_taxon, " ");
    strcat(my_taxon, argv[i]);
    i++;
  }
  acnucopen();

  list = (int *)calloc(lenw, sizeof(int));
  err = shkseq(my_taxon, list, 1);
  if (err == 2) {
    printf("Taxon %s does not exist in the current database.\n", my_taxon);
    exit(1);
  }
  num = 1;
  while ((num = irbit(list, num, nseq)) != 0) {
    /* here num is the rank of a seq attached to taxon my_taxon */
    readsub(num);
    printf("%s\t%s\n", my_taxon, psub->name);
  }
  free(list);
  return 0;
}
To select a local database:
« choix embl » or « choixbanque »
otherwise:
setenv acnuc /acnucdb/embl/index
setenv gcgacnuc /acnucdb/embl/flat_files
Use of API: C API
local version
Exercise 26
http://pbil.univ-lyon1.fr/databases/acnuc/ex_requete.php
Use of API: C API
client version
Exercise 27
http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html#example
C API, client version (access via sockets)
http://pbil.univ-lyon1.fr/databases/acnuc/raa_acnuc.html
Use of API: Python API
Documentation:
http://pbil.univ-lyon1.fr/cgi-bin/raapythonhelp.csh