[Bioperl-l] EMBL format field

Jason Stajich jason at bioperl.org
Tue Jun 10 23:55:56 UTC 2008


What version of BioPerl are you using? It works for me: with this code I get
'CB271253' printed out.

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-format => 'embl', -file => shift);
while( my $seq = $in->next_seq ) {
  print $seq->id,"\n";
}
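
For the PA line asked about at the bottom of this thread, a parser-independent
fallback is to read the raw EMBL text directly. This is only a rough sketch:
parent_accessions is a made-up helper name, not a BioPerl method, and it
assumes the PA line always has the form "PA   ACCESSION.VERSION" as in the
quoted example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the accessions found on
# PA lines of raw EMBL text, with any trailing ".version" suffix stripped.
sub parent_accessions {
    my ($text) = @_;
    my @acc;
    for my $line (split /\n/, $text) {
        # PA lines are assumed to look like "PA   AB000170.1"
        next unless $line =~ /^PA\s+(\S+)/;
        (my $acc = $1) =~ s/\.\d+$//;
        push @acc, $acc;
    }
    return @acc;
}

# Sample record taken from the original question in this thread.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

print join(",", parent_accessions($record)), "\n";
```

For the sample record this prints AB000170. It does not replace a fix in the
embl parser, but it gets at the value without depending on which fields the
sequence object keeps.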

On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote:

> That's weird. I also ran into this problem. I tried an EMBL-format file
> like this:
>
> ID   CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
> XX
> AC   CB271253;
> XX
> DT   24-FEB-2003 (Rel. 74, Created)
> DT   24-FEB-2003 (Rel. 74, Last updated, Version 1)
> XX
> DE   taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
> DE   SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.
>
> from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw
>
> the $seq object's ->id and ->display_id are "unknown id" ...
>
>
>
> ZQ Ye
>
> 2008/6/9 Hilmar Lapp <hlapp at gmx.net>:
>> If this is the case with the latest version of BioPerl it should be filed
>> as a bug report for the embl parser. The ID ought to be reported in
>> $seq->get_secondary_accessions() (which returns an array). If it doesn't,
>> it sounds like a bug to me.
>>
>>        -hilmar
>>
>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>
>>> Hi Wen,
>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>> that the PA EMBL field is not saved into the object. However, you will
>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>> the seqid of the location object. I don't know whether that is always
>>> the case, but it is in your particular example.
>>> So, to get your hands on that value you have to do:
>>>
>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>> my $parent_id = $cds->location->seq_id;
>>>
>>> HTH,
>>> Marc
>>>
>>> Marc Logghe
>>> Senior Bioinformatician
>>> Ablynx nv
>>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>> To: bioperl-l at lists.open-bio.org
>>>> Subject: [Bioperl-l] EMBL format field
>>>>
>>>> Hi all,
>>>>
>>>> I have an EMBL file from which I want to extract one of the lines
>>>>
>>>> ###file###
>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>> XX
>>>> PA   AB000170.1
>>>> XX
>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>> XX
>>>> OS   Sus scrofa (pig)
>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>> OX   NCBI_TaxID=9823;
>>>> .........
>>>>
>>>> I want the accession number in the line that starts with PA, AB000170
>>>> in this example.
>>>>
>>>> Can anybody kindly help and tell me which module and method I should use?
>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>> get_secondary_id, etc., but they did not work...
>>>>
>>>> Thanks a lot!
>>>>
>>>> Wen
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>



