[Biopython] missing fields in SeqIO EMBL parser?
Wim De Smet
Wim.DeSmet at UGent.be
Fri May 7 14:36:09 UTC 2010
On 07-05-10 15:23, Peter wrote:
> On Fri, May 7, 2010 at 2:04 PM, Wim De Smet <Wim.DeSmet at ugent.be> wrote:
>> Hi,
>>
>> I'm trying to parse an embl file using Bio.SeqIO but I'm missing some
>> metadata fields in the parsed object. For one, I can't find any reference to
>> the DT (date) fields or any of the database cross references. I'm using
>> biopython 1.53.
>>
>> Is this simply not implemented yet or are there options to include this data
>> in the SeqRecord object returned?
>
> The DT lines are currently ignored, please file an enhancement bug.
> This is complicated by the fact that GenBank files have only one date,
> and the EMBL parser shares a lot of code with the GenBank parser.
Okay, thanks for your help. I'll file a bug for it then.
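Until the parser keeps the DT lines, one stopgap is to read them straight from the flat file. The following is a minimal sketch in plain Python, not part of Biopython. parse_embl_dates is a hypothetical helper name, and the code assumes the usual EMBL layout of a DT line code followed by a DD-MON-YYYY date and an optional parenthesised note, e.g. "DT   28-APR-2009 (Rel. 100, Created)" (an illustrative line, not taken from a real entry).

```python
from datetime import datetime


def parse_embl_dates(lines):
    """Collect (date, note) pairs from the DT lines of an EMBL flat file.

    Hypothetical helper. Assumes the usual layout: the 'DT' line code,
    a DD-MON-YYYY date, then an optional parenthesised note.
    """
    dates = []
    for line in lines:
        if line[:2] != "DT":
            continue  # only date lines are of interest here
        content = line[2:].strip()
        date_text, _, note = content.partition(" ")
        # %b matches English month abbreviations under the default C locale;
        # strptime matches them case-insensitively, so 'APR' is accepted.
        when = datetime.strptime(date_text, "%d-%b-%Y").date()
        dates.append((when, note.strip().strip("()")))
    return dates
```

An open file handle can be passed directly, since iterating over a handle yields its lines; for a file with several entries, the lines would need splitting at each "//" terminator first.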
> Could you be a bit more precise about missing database cross references?
> i.e. What line type are you looking for?
Sure, take this record:
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+EntryPage+-id+7BIdF1bEbRt+-e+[EMBL:FJ904258]+-vn+2
I'm looking for the data from the database cross reference lines (DR), i.e.:
DR   RFAM; RF00177; SSU_rRNA_5.
DR   SILVA-SSU; FJ904258.
I assumed this would be in the record.dbxrefs field, but it's empty when
I parse this file. It's more of a nice-to-have than anything else at
this point, but I'll have to find another way to get hold of
these elements.
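One way to get at these DR lines in the meantime is to scan the file directly. Below is a minimal sketch in plain Python, not Biopython code. parse_embl_dbxrefs is a hypothetical helper name, and the "DATABASE:ID" output format is an assumption chosen to resemble the strings a SeqRecord's dbxrefs list usually holds. It assumes each DR line has the form "DR   DATABASE; PRIMARY_ID; SECONDARY_ID." with the secondary identifier optional.

```python
def parse_embl_dbxrefs(lines):
    """Turn the DR lines of an EMBL flat file into 'DATABASE:ID' strings.

    Hypothetical helper. Assumes 'DR   DATABASE; PRIMARY_ID; SECONDARY_ID.'
    with the secondary identifier optional; only the first two are kept.
    """
    xrefs = []
    for line in lines:
        if line[:2] != "DR":
            continue  # skip every line that is not a cross-reference
        # Drop the line code and surrounding whitespace, then the final full stop.
        content = line[2:].strip().rstrip(".")
        fields = [field.strip() for field in content.split(";")]
        if len(fields) < 2:
            continue  # malformed line: no identifier after the database name
        xrefs.append(f"{fields[0]}:{fields[1]}")
    return xrefs
```

For the two DR lines quoted above it returns ['RFAM:RF00177', 'SILVA-SSU:FJ904258'], and that list could then be assigned to record.dbxrefs by hand. As with any line scan, a multi-record file would need splitting at each "//" terminator so that cross references from different entries are not mixed.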
cheers,
Wim
--
Wim De Smet
http://www.straininfo.net/