[Bioperl-l] EMBL format field
Chris Fields
cjfields at uiuc.edu
Wed Jun 11 00:19:55 UTC 2008
PA is an odd field; it isn't described in the EMBL user manual:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
but appears in mRNA files, so I'm guessing it stands for the (p)rotein
(a)ccession. I don't think this should be stored as primary/secondary
accession, but maybe as a DBLink annotation?
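
Until the parser handles PA, one stopgap is to pull the value straight from the raw record text. A minimal sketch, assuming the PA line has the form shown in Wen's example (`PA AB000170.1`); `extract_pa` is a hypothetical helper, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return (accession, version)
# from the first PA line of an EMBL record held in a string.
sub extract_pa {
    my ($record) = @_;
    # Assumes the form "PA ACCESSION.VERSION"; the version part is optional.
    if ($record =~ /^PA\s+([A-Za-z0-9_]+)(?:\.(\d+))?\s*$/m) {
        return ($1, $2);
    }
    return;    # no PA line found
}

# Sample record trimmed from the example quoted below.
my $record = <<'EMBL';
ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA AB000170.1
XX
DE Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

my ($acc, $version) = extract_pa($record);
print "$acc\n" if defined $acc;    # prints AB000170
```

This only reads the text; it does not attach the value to the sequence object, so where it should be stored (for example as a DBLink annotation) is still an open question.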
chris
On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
> PA is a field that we don't currently parse, something that should
> be filed as a bug on bugzilla.
> Would you be able to do this?
>
> -jason
> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>
>> Hilmar,
>>
>> I tried that, but it did not work. Marc's approach does work.
>>
>> Thanks,
>> Wen
>>
>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>
>>> If this is the case with the latest version of BioPerl it should
>>> be filed as a bug report for the embl parser. The ID ought to be
>>> reported in $seq->get_secondary_accessions() (which returns an
>>> array). If it doesn't, it sounds like a bug to me.
>>>
>>> -hilmar
>>>
>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>> Hi Wen,
>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>> that the PA EMBL field is not saved into the object. However, you will
>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>> in the seq_id of the location object. I don't know whether that is always
>>>> the case, but it is in your particular example.
>>>> So, to get your hands on that value you have to do:
>>>>
>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>> my $parent_id = $cds->location->seq_id;
>>>>
>>>> HTH,
>>>> Marc
>>>>
>>>> Marc Logghe
>>>> Senior Bioinformatician
>>>> Ablynx nv
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have an EMBL file from which I want to extract one line:
>>>>>
>>>>> ###file###
>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>> XX
>>>>> PA AB000170.1
>>>>> XX
>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>> XX
>>>>> OS Sus scrofa (pig)
>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>> OX NCBI_TaxID=9823;
>>>>> .........
>>>>>
>>>>> I want the accession number in the line that starts with PA,
>>>>> AB000170 in this example.
>>>>>
>>>>> Can anybody kindly tell me which module and method I should use?
>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>> get_secondary_id, etc., but they did not work...
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Wen
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign