[Biopython] missing fields in SeqIO EMBL parser?

Peter biopython at maubp.freeserve.co.uk
Fri May 7 15:10:01 UTC 2010


On Fri, May 7, 2010 at 3:59 PM, Wim De Smet <Wim.DeSmet at ugent.be> wrote:
>
> On 07-05-10 16:50, Peter wrote:
>>
>> That was also left as a TODO - the dbxrefs list is normally used for
>> single identifiers - here it would be "RFAM:RF00177" and
>> "SILVA-SSU:FJ904258" for consistency with the other parsers. At the
>> time I was undecided on how to handle any secondary identifier. Would
>> you need/want this too? Maybe as "RFAM:RF00177:SSU_rRNA_5"?
>
> I don't really need it as such, I'm just parsing the file and dropping the
> fields in the database, so they could be in there verbatim for all I care.
> (I'm not even sure what the secondary identifier means in this case.)
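
For illustration, here is a minimal sketch of the DR-to-dbxrefs mapping discussed above, turning "RFAM; RF00177; SSU_rRNA_5." into "RFAM:RF00177". It is not Biopython's parser code. The helper name dr_to_dbxrefs and its keep_secondary flag are hypothetical. Keeping the secondary identifier as a third colon-separated field is only the option floated in this thread, not a settled convention.

```python
# Hypothetical helper, NOT part of Biopython: converts the contents of EMBL
# "DR" lines into "DATABASE:IDENTIFIER" strings, the dbxrefs style used by
# the other parsers. Optionally appends the secondary identifier as a third
# colon-separated field (an assumption, as discussed in the thread).

def dr_to_dbxrefs(dr_lines, keep_secondary=False):
    """Convert DR line contents such as 'RFAM; RF00177; SSU_rRNA_5.'
    (database; primary id; optional secondary id, trailing full stop)
    into dbxref strings."""
    dbxrefs = []
    for line in dr_lines:
        # Remove the trailing full stop, then split on the semicolons.
        parts = [p.strip() for p in line.strip().rstrip(".").split(";")]
        parts = [p for p in parts if p]
        if len(parts) < 2:
            continue  # no database/identifier pair on this line; skip it
        db, primary = parts[0], parts[1]
        if keep_secondary and len(parts) > 2:
            dbxrefs.append("%s:%s:%s" % (db, primary, parts[2]))
        else:
            dbxrefs.append("%s:%s" % (db, primary))
    return dbxrefs


dr = ["RFAM; RF00177; SSU_rRNA_5.", "SILVA-SSU; FJ904258"]
print(dr_to_dbxrefs(dr))                       # ['RFAM:RF00177', 'SILVA-SSU:FJ904258']
print(dr_to_dbxrefs(dr, keep_secondary=True))  # ['RFAM:RF00177:SSU_rRNA_5', 'SILVA-SSU:FJ904258']
```

Dropping the secondary identifier by default keeps each entry a single identifier, which is how the dbxrefs list is normally used.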

Are you using BioSQL or some other schema?

> For what I'm doing the easiest fix would really be if the parser took these
> lines it didn't understand and just added them to the record anyway as extra
> 'stuff' that I can extract the rest out of.
>
> For example, for those DR lines it might look a bit like this:
>>>> print record.unknown['DR']
> ('RFAM; RF00177; SSU_rRNA_5.', 'SILVA-SSU; FJ904258')
>
> That way, you'd be (sorta) Future Proof(TM). Just a suggestion anyway.
> Thanks for taking the time to respond.

I'm not keen on that approach.

Peter
