[Biopython] Lineage from GenBank files Question
Peter
biopython at maubp.freeserve.co.uk
Sat Oct 23 14:43:31 UTC 2010
On Fri, Oct 22, 2010 at 7:22 PM, Ara Kooser <akooser at unm.edu> wrote:
> Hello all,
>
> I've been working on a code to parse information from BLAST .xml files and
> GenBank files. I am interested in adding the taxonomy lineage information to
> the code.
>
There are two approaches here: first, the (limited) lineage in the GenBank
flat files themselves; second, using the taxon ID or accession online
with the NCBI Entrez API to get the full lineage.
Taking an example,
LOCUS       NC_000932             154478 bp    DNA     circular PLN 15-APR-2009
DEFINITION  Arabidopsis thaliana chloroplast, complete genome.
ACCESSION   NC_000932
VERSION     NC_000932.1  GI:7525012
DBLINK      Project:116
KEYWORDS    .
SOURCE      chloroplast Arabidopsis thaliana (thale cress)
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
REFERENCE   1  (bases 1 to 154478)
...
The lineage is in the header, on the lines following the SOURCE
and ORGANISM lines. This all gets recorded in the SeqRecord
annotations dictionary:
>>> from Bio import SeqIO
>>> record = SeqIO.read("", "genbank")
>>> record.annotations["source"]
'chloroplast Arabidopsis thaliana (thale cress)'
>>> record.annotations["organism"]
'Arabidopsis thaliana'
>>> record.annotations["taxonomy"]
['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons',
'core eudicotyledons', 'rosids', 'eurosids II', 'Brassicales',
'Brassicaceae', 'Arabidopsis']
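Since the taxonomy annotation is a plain list of strings, it is easy to
reuse across many records. Here is a minimal sketch; the helper names
(lineage_summary, belongs_to, summarise_file) and the idea of a
multi-record input file are my own illustrative assumptions, not part of
Biopython:

```python
def lineage_summary(record, separator="; "):
    """Return 'organism: name1; name2; ...' for one SeqRecord-like object."""
    organism = record.annotations.get("organism", "unknown organism")
    taxonomy = record.annotations.get("taxonomy", [])
    return f"{organism}: {separator.join(taxonomy)}"


def belongs_to(record, clade):
    """True if the clade name appears in the record's GenBank lineage."""
    return clade in record.annotations.get("taxonomy", [])


def summarise_file(path):
    """Print one lineage line per record in a (hypothetical) GenBank file."""
    from Bio import SeqIO  # imported here so the helpers above need no Biopython

    for record in SeqIO.parse(path, "genbank"):
        print(lineage_summary(record))
```

Note that the GenBank lineage is only a list of names, without ranks or
taxon IDs, so a membership test like belongs_to relies on exact name matches.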
There is also some relevant information in any source feature (usually
there is one and only one, and this will be the first feature), such as the
taxon ID.
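As a rough sketch of the second approach, the taxon ID can be pulled from
the source feature's db_xref qualifier and then passed to Entrez. The
function names below are hypothetical helpers of my own, the email address
is something you must supply yourself, and fetch_full_lineage needs network
access, so treat this as an outline under those assumptions rather than a
tested recipe:

```python
def taxon_id_from_db_xrefs(db_xrefs):
    """Return the integer NCBI taxon ID from db_xref strings, or None."""
    for xref in db_xrefs:
        database, _, identifier = xref.partition(":")
        if database == "taxon" and identifier.isdigit():
            return int(identifier)
    return None


def taxon_id_from_record(record):
    """Look in the first source feature of a SeqRecord for a taxon db_xref."""
    for feature in record.features:
        if feature.type == "source":
            return taxon_id_from_db_xrefs(feature.qualifiers.get("db_xref", []))
    return None


def fetch_full_lineage(taxon_id, email):
    """Ask NCBI Entrez for the full lineage as a list of names (needs network)."""
    from Bio import Entrez  # imported here so the helpers above work offline

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="taxonomy", id=str(taxon_id), retmode="xml")
    try:
        results = Entrez.read(handle)
    finally:
        handle.close()
    # In the taxonomy XML, "Lineage" is a single "; "-separated string
    return results[0]["Lineage"].split("; ")
```

The Entrez result also carries a "LineageEx" entry with the taxon ID and
rank of each ancestor, which is the extra detail the flat file lacks.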
>
> I do have a second question. Once I have a chunk of code running
> and made pretty what is the best way to submit it so it can be posted
> up in the Cookbook section.
>
It is a wiki, just make sure you include [[Category:Cookbook]] and
it will appear here: http://biopython.org/wiki/Category:Cookbook
Peter