[Bioperl-l] Fix for fasta loading into bioperl-db
Hilmar Lapp
hlapp@gnf.org
Wed, 12 Jun 2002 09:27:03 -0700
> -----Original Message-----
> From: Elia Stupka [mailto:elia@fugu-sg.org]
> Sent: Wednesday, June 12, 2002 12:43 AM
> To: Bioperl
> Subject: [Bioperl-l] Fix for fasta loading into bioperl-db
>
>
> It turns out that loading simple fasta files into bioperl-db hadn't
> really been checked so far.
>
> I made a few changes to make it work. The first one is
> non-controversial: a fasta-parsed seq does not come back as a RichSeqI
> object and thus does not have a seq_version. I simply set seq_version
> to zero if the object is not a RichSeqI-compliant object.
>
> The second one is trickier, and the fix is temporary. All fasta-parsed
> sequences come back with accession "unknown"; they just have a
> display_id and a description.
>
> Possible solutions:
>
> 1) Decide in bioperl-live that when parsing fasta files, the
> accession_number is set to the display_id (this kind of makes sense
> because if you load a genbank file and dump it as fasta, the accession
> number gets put as display_id at the beginning of the header). I
> didn't go ahead because I wanted to hear what people thought before
> touching our sacred parsers.
>
> 2) In bioperl-db, when trying to store a sequence that has accession
> unknown, change the accession to the display_id. This is my temporary
> fix.
IMO this is the right one. Bioperl (the semantics, I mean) should not be
driven by bioperl-db or biosql (though they may reveal bugs). This should
happen in the adaptors.
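
To make the adaptor-level approach concrete, here is a minimal sketch in Python rather than Perl. It is illustrative only and is not the Bioperl or bioperl-db API: the ParsedSeq class, the storage_key function, and the "unknown" placeholder constant are all hypothetical names chosen for this example. The point it tries to show is that both fallbacks (seq_version defaulting to zero, accession falling back to display_id) can live in the storage layer without changing what the parser returns.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: the parser marks a missing accession with this placeholder.
UNKNOWN_ACCESSION = "unknown"


@dataclass
class ParsedSeq:
    """Hypothetical stand-in for a parsed sequence object (not a Bioperl class)."""
    display_id: str
    description: str = ""
    accession: str = UNKNOWN_ACCESSION
    # None models an object that is not RichSeqI-compliant (no version at all).
    seq_version: Optional[int] = None


def storage_key(seq: ParsedSeq, division: str = "") -> Tuple[str, int, str]:
    """Derive the (accession, version, division) key in the adaptor layer.

    The sequence object itself is left untouched, so parser semantics
    are not driven by what the database needs.
    """
    accession = seq.accession
    if accession == UNKNOWN_ACCESSION:
        # Temporary fix described in the thread: fall back to display_id.
        accession = seq.display_id
    # Objects without a version get version zero.
    version = seq.seq_version if seq.seq_version is not None else 0
    return (accession, version, division)


fasta_seq = ParsedSeq(display_id="AB000123", description="example record")
key = storage_key(fasta_seq)
```

Because storage_key only reads from the sequence object, the same parsed sequence still reports its accession as unknown to any other code that asks.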
>
> 3) Change the actual sql constraint in bioperl-db. This one is not
> easy because there is no easy way to tell mysql to put the constraint
> on accession, version and division, OR (if accession is unknown) on
> display_id, version and division. So from the SQL point of view all we
> could do is remove the constraint and rely on checking it only in the
> code.
I'd vote against that.
There is no easy solution for guessing missing attributes.
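
For readers who want to see why the constraint bites, the following is a small sketch using Python's sqlite3 module rather than mysql. The table name and column layout are assumptions made for the example and are not the actual bioperl-db schema. It only demonstrates the general behaviour: with a unique key on (accession, version, division), a second record that also has accession "unknown" and version zero is rejected, while records keyed by distinct display_ids are not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the unique key mirrors what the thread describes.
conn.execute(
    "CREATE TABLE entry ("
    " accession TEXT NOT NULL,"
    " version INTEGER NOT NULL,"
    " division TEXT NOT NULL,"
    " UNIQUE (accession, version, division))"
)


def try_store(accession, version=0, division=""):
    """Insert one row; return False if the unique constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO entry VALUES (?, ?, ?)",
            (accession, version, division),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Two fasta records that both carry the placeholder accession collide.
results_unknown = [try_store("unknown"), try_store("unknown")]

# With the display_id used as accession, each record gets its own key.
results_fallback = [try_store("seqA"), try_store("seqB")]
```

Dropping the constraint would make both cases succeed, which is exactly what moves the uniqueness check out of the database and into the code.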
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------