[Bioperl-l] EMBL format field
Hilmar Lapp
hlapp at gmx.net
Wed Jun 11 01:35:50 UTC 2008
On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
> I agree if it isn't the accession # it shouldn't be stored there.
> I guess it is a DBlink, but it is going to be hacky to round-trip
> this as you'll have to have a special case for records that are
> mRNAs...
I think I agree with that - didn't realize it is the accession of the
(translated) protein. It would be ideal to convert this into a DBLink
annotation indeed, but that's an opinion and an interpretation of the
file (even if a very useful one). As such, I believe it should be the
job of a SeqProcessor.
Hmm - except that at that point the information has been lost already
so there's actually nothing that the SeqProcessor could massage.
So what if the line were simply stored as a Bio::Annotation::SimpleValue
with 'PA' as key and the accession# as value? That wouldn't be an
interpretation, and yet would make the value available to a
SeqProcessor for converting into a DBLink.
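A minimal sketch of this proposal, assuming BioPerl's Bio::Annotation classes are installed. The first step stands in for what the EMBL parser would store (the raw PA value under a 'PA' key, with no interpretation). The loop stands in for a SeqProcessor; it is hypothetical, not an existing BioPerl class. The 'EMBL' database name and the stripping of the version suffix are illustrative choices only.

```perl
use strict;
use warnings;
use Bio::Annotation::Collection;
use Bio::Annotation::SimpleValue;
use Bio::Annotation::DBLink;

# Parser side (hypothetical): keep the PA value verbatim under the key 'PA'.
my $ac = Bio::Annotation::Collection->new;
$ac->add_Annotation('PA',
    Bio::Annotation::SimpleValue->new(-tagname => 'PA', -value => 'AB000170.1'));

# Processor side (hypothetical): remove the raw PA values and re-add them
# as DBLink annotations. 'EMBL' as database name is an assumption.
for my $pa ($ac->remove_Annotations('PA')) {
    my ($acc) = split /\./, $pa->value;    # drop the ".1" version suffix
    $ac->add_Annotation('dblink',
        Bio::Annotation::DBLink->new(-database => 'EMBL', -primary_id => $acc));
}

my ($link) = $ac->get_Annotations('dblink');
print $link->database, ':', $link->primary_id, "\n";    # EMBL:AB000170
```

On a parsed sequence the same calls would apply to the collection returned by `$seq->annotation`.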
-hilmar
>
> -jason
> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>
>> PA is an odd field; it isn't described in the EMBL user manual:
>>
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>
>> but appears in mRNA files, so I'm guessing it stands for the
>> (p)rotein (a)ccession. I don't think this should be stored as
>> primary/secondary accession, but maybe as a DBLink annotation?
>>
>> chris
>>
>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>
>>> PA is a field that we don't currently parse, something that
>>> should be filed as a bug on Bugzilla.
>>> Would you be able to do this?
>>>
>>> -jason
>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>
>>>> Hilmar,
>>>>
>>>> I tried that, but it did not work. Marc's way does work.
>>>>
>>>> Thanks,
>>>> Wen
>>>>
>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>
>>>>> If this is the case with the latest version of BioPerl it
>>>>> should be filed as a bug report for the embl parser. The ID
>>>>> ought to be reported in $seq->get_secondary_accessions() (which
>>>>> returns an array). If it doesn't, it sounds like a bug to me.
>>>>>
>>>>> -hilmar
>>>>>
>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>> Hi Wen,
>>>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>>>> that the PA EMBL field is not saved in the object. However, you will
>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more
>>>>>> precisely as the seq_id of its location object. I don't know whether
>>>>>> that is always the case, but it is in your particular example.
>>>>>> So, to get your hands on that value you can do:
>>>>>>
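>>>>>> # $seq is the Bio::Seq object read via Bio::SeqIO (-format => 'embl')
>>>>>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>>>>>> my $parent_id = $cds->location->seq_id;   # 'AB000170.1' in this example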
>>>>>>
>>>>>> HTH,
>>>>>> Marc
>>>>>>
>>>>>> Marc Logghe
>>>>>> Senior Bioinformatician
>>>>>> Ablynx nv
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have an EMBL file from which I want to extract one line:
>>>>>>>
>>>>>>> ###file###
>>>>>>> ID BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>> XX
>>>>>>> PA AB000170.1
>>>>>>> XX
>>>>>>> DE Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>> XX
>>>>>>> OS Sus scrofa (pig)
>>>>>>> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>> OC Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>> OX NCBI_TaxID=9823;
>>>>>>> .........
>>>>>>>
>>>>>>> I want the accession number from the line that starts with PA
>>>>>>> (AB000170 in this example).
>>>>>>>
>>>>>>> Can anybody kindly tell me which module and method I should use?
>>>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>>>> get_secondary_id, etc., but they did not work.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Wen
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
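For the original question, a parser-independent fallback is to read the PA line straight from the EMBL text, which does not depend on whether Bio::SeqIO stores the field. This is a minimal sketch in plain Perl; the helper name `pa_accessions` is hypothetical, and it assumes each PA line carries a single accession.version token after the tag.

```perl
use strict;
use warnings;

# Hypothetical helper: return every PA value found in EMBL-formatted text.
# Assumes the tag "PA" is followed by whitespace and one accession.version token.
sub pa_accessions {
    my ($text) = @_;
    my @acc;
    for my $line (split /\n/, $text) {
        push @acc, $1 if $line =~ /^PA\s+(\S+)/;
    }
    return @acc;
}

my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
EMBL

my ($pa) = pa_accessions($record);
(my $acc = $pa) =~ s/\.\d+$//;    # strip the sequence version
print "$pa $acc\n";               # AB000170.1 AB000170
```

For a file on disk, the same regex can be applied line by line inside a `while (<$fh>)` loop instead of splitting a string.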
>>>>>>
>>>>>
>>>>> --
>>>>> ===========================================================
>>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>>>>> ===========================================================
>>>>>
>>>>>
>>>>>
>>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>