[Bioperl-l] EMBL format field

Wed Jun 11 00:19:55 UTC 2008

PA is an odd field; it isn't described in the EMBL user manual:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

but it appears in mRNA files, so I'm guessing it stands for the (p)rotein
(a)ccession.  I don't think this should be stored as a primary/secondary
accession, but maybe as a DBLink annotation?
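
Until the parser handles PA, one stopgap is to read the line straight out
of the raw record text. What follows is only a rough sketch in plain Perl,
not BioPerl API: the helper name pa_line_value is made up for illustration,
and it assumes the PA line carries a single whitespace-free value, as in
Wen's example.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): scan raw EMBL text and return
# the value of the first PA line, or undef if the record has none.
sub pa_line_value {
    my ($embl_text) = @_;
    for my $line (split /\n/, $embl_text) {
        # The two-letter line code is followed by whitespace, then the value.
        return $1 if $line =~ /^PA\s+(\S+)/;
        # Stop scanning at the end-of-record marker.
        last if $line =~ m{^//};
    }
    return;
}

# Header lines copied from the example record in this thread.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
EMBL

my $pa = pa_line_value($record);

# Strip the trailing ".1" sequence version to get the bare accession.
(my $accession = $pa) =~ s/\.\d+$//;

print "$pa $accession\n";
```

On the example record this prints "AB000170.1 AB000170". Whether that
value then belongs in a DBLink annotation is a separate question.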

chris

On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:

> PA is a field that we don't currently parse; that should be filed as
> a bug on Bugzilla.
> Would you be able to do this?
>
> -jason
> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>
>> Hilmar,
>>
>> I tried that, but it did not work. Marc's approach does work.
>>
>> Thanks,
>> Wen
>>
>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>
>>> If this is the case with the latest version of BioPerl it should  
>>> be filed as a bug report for the embl parser. The ID ought to be  
>>> reported in $seq->get_secondary_accessions() (which returns an  
>>> array). If it doesn't, it sounds like a bug to me.
>>>
>>> 	-hilmar
>>>
>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>> Hi Wen,
>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>> that the PA EMBL field is not saved into the object. However, you will
>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>> in the seq_id of the location object. I don't know whether that is always
>>>> the case, but it is in your particular example.
>>>> So, to get your hands on that value you have to do:
>>>>
>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>> my $parent_id = $cds->location->seq_id;
>>>>
>>>> HTH,
>>>> Marc
>>>>
>>>> Marc Logghe
>>>> Senior Bioinformatician
>>>> Ablynx nv
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have an EMBL file from which I want to extract one of the lines:
>>>>>
>>>>> ###file###
>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>> XX
>>>>> PA   AB000170.1
>>>>> XX
>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>> XX
>>>>> OS   Sus scrofa (pig)
>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>> OX   NCBI_TaxID=9823;
>>>>> .........
>>>>>
>>>>> I want the accession number in the line that starts with PA, AB000170
>>>>> in this example.
>>>>>
>>>>> Can anybody kindly tell me which module and method I should use?
>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>> get_secondary_id, etc., but they did not work.
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Wen
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign