[Bioperl-l] embl.pm and virus names
Heikki Lehvaslaiho
heikki at nildram.co.uk
Sat Nov 8 09:05:00 EST 2003
Neil,
The Bio::Species class is relatively new in Bioperl and has not been
extensively tested. The EMBL parser simply expects to find a normal
binomial scientific name in every entry.
I'll try to fix this for viruses. The current EMBL parser generates this
kind of structure:
$VAR1 = bless( {
    '_sub_species' => 'virus',
    '_classification' => [
        'immunodeficiency',
        'Human',
        'Primate lentivirus group',
        'Lentivirus',
        'Retroviridae',
        'Retroid viruses',
        'Viruses'
    ]
}, 'Bio::Species' );
I've now changed my copy of the parser to produce:
$VAR1 = bless( {
    '_classification' => [
        'Human immunodeficiency virus',
        'Primate lentivirus group',
        'Lentivirus',
        'Retroviridae',
        'Retroid viruses',
        'Viruses'
    ]
}, 'Bio::Species' );
There is no subspecies, and the whole OS line is in the first item of
the array. I am reasonably happy with this. My only gripe is that if you
call binomial() on this object, you get:
'Primate lentivirus group Human immunodeficiency virus'
while genus() gives:
'Primate lentivirus group'
Is this good enough, or can anyone suggest a better solution?
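
For reference, the binomial() result above follows from how the
classification array is read. Here is a minimal sketch in plain Perl (no
Bioperl needed), assuming element 0 is treated as the species, element 1
as the genus, and that binomial() simply joins genus and species. The
helper names species_of, genus_of and binomial_of are hypothetical
stand-ins, not the Bio::Species implementation:

```perl
use strict;
use warnings;

# Classification as produced by the modified parser, most specific first.
my @classification = (
    'Human immunodeficiency virus',
    'Primate lentivirus group',
    'Lentivirus',
    'Retroviridae',
    'Retroid viruses',
    'Viruses',
);

# Hypothetical stand-ins for species(), genus() and binomial().
# Assumption: species is element 0 and genus is element 1 of the array.
sub species_of  { my @c = @_; return $c[0] }
sub genus_of    { my @c = @_; return $c[1] }
sub binomial_of { my @c = @_; return join ' ', genus_of(@c), species_of(@c) }

print 'genus:    ', genus_of(@classification),    "\n";
print 'binomial: ', binomial_of(@classification), "\n";
```

Under that assumption, any name stored whole in the first slot will be
prefixed by whatever sits in the second slot, which is why the virus name
comes out with 'Primate lentivirus group' in front of it.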
In addition to EMBL, I'll try to make sure that the GenBank and
SWISS-PROT parsers treat viruses similarly.
-Heikki
P.S. I could not find any EMBL entries with PCC6803 in the OS line, but
given an OS line like 'Synechocystis sp. PCC6803', 'PCC6803' should end
up in subspecies().
-H
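
The intended behaviour for the two kinds of OS line discussed above can
be sketched roughly as follows. This is only an illustration, not the
code in embl.pm: split_os_line is a hypothetical helper, and the rule
"keep the whole line if it contains the word virus" is an assumption made
for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split an OS line into (genus, species, subspecies).
# Assumption: any name containing the word "virus" is kept whole as the
# species, with no genus or subspecies.
sub split_os_line {
    my ($os) = @_;
    return (undef, $os, undef) if $os =~ /\bvirus\b/i;

    # Otherwise: first word is the genus, second the species, and
    # anything left over is treated as the subspecies.
    my ($genus, $species, $sub_species) = split ' ', $os, 3;
    return ($genus, $species, $sub_species);
}

for my $os ('Drosophila melanogaster',
            'Synechocystis sp. PCC6803',
            'Human immunodeficiency virus') {
    my ($genus, $species, $sub_species) = split_os_line($os);
    printf "%-30s genus=%s species=%s subspecies=%s\n",
        $os, $genus // '-', $species // '-', $sub_species // '-';
}
```

With this rule 'PCC6803' lands in the subspecies slot, while a virus name
stays in one piece.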
On Thu, 2003-11-06 at 16:49, Neil Rawlings wrote:
> I am trying to use Bio::SeqIO::EMBL to parse EMBL database entries,
> but am having problems retrieving the organism name whenever the
> EMBL entry is for a viral sequence. I am using the embl.pm
> module and a line such as:
>
> my ($spec, $genus) = $entry->species->classification();
>
> But for a virus (which doesn't have a species name - for example "apple
> chlorotic leaf spot virus") I get "Apple chlorotic" as the organism
> name. I'm not just interested in viruses, so I'm happy when the name
> comes back as "Drosophila melanogaster". The problem is also apparent
> for some bacteria, especially something like Synechocystis sp. PCC6803
> in which PCC6803 is lost (probably assumed to be a subspecies name).
>
> If a solution exists to this problem, please let me know.
>
> ========================================================================
> Neil D. Rawlings
> Sanger Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambs CB10 1SA, UK
>
> Tel: +1223 495330
> Fax: +1223 494919
> E-mail: ndr at sanger.ac.uk
> ========================================================================
> Please visit the MEROPS database for peptidase classification. The URL
> is: merops.sanger.ac.uk
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l