[Bioperl-l] Fix for fasta loading into bioperl-db

Elia Stupka elia@fugu-sg.org
Wed, 12 Jun 2002 15:42:31 +0800 (SGT)


It turns out that loading simple fasta files into bioperl-db hadn't really
been checked so far.

I made a few changes to make it work. The first one is non-controversial:
a sequence parsed from fasta does not come back as a RichSeqI object and
thus does not have a seq_version. I simply set seq_version to zero if the
object is not a RichSeqI compliant object.
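
As a rough illustration of that first change, here is a minimal sketch in
Python rather than the actual bioperl-db Perl code. The classes PlainSeq and
RichSeq and the function version_for_storage are hypothetical stand-ins, not
names from bioperl-live or bioperl-db.

```python
class PlainSeq:
    """Hypothetical stand-in for a sequence parsed from fasta (no version)."""

    def __init__(self, display_id, desc=""):
        self.display_id = display_id
        self.desc = desc


class RichSeq(PlainSeq):
    """Hypothetical stand-in for a RichSeqI-style object with a seq_version."""

    def __init__(self, display_id, seq_version, desc=""):
        super().__init__(display_id, desc)
        self.seq_version = seq_version


def version_for_storage(seq):
    """Return the version to store: the real one if available, else zero."""
    if isinstance(seq, RichSeq):
        return seq.seq_version
    # Not a rich sequence object, so there is no seq_version to read.
    return 0
```

The point is only that the storage code asks "is this a rich object?" before
touching seq_version, instead of assuming every sequence has one.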

The second one is trickier, and the fix is temporary. All fasta parsed
sequences come back with accession "unknown"; they just have a display_id
and a description.
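
To make the situation concrete, here is a simplified model of what comes out
of a fasta header. It is a sketch in Python, not the bioperl-live parser: the
function name parse_fasta_header and the dictionary layout are assumptions
made for illustration, and the real parser handles more cases than this.

```python
def parse_fasta_header(line):
    """Split a fasta header into display_id and description.

    Simplified model: the first whitespace-delimited token is the
    display_id, the rest is the description, and the accession is
    left at the placeholder value "unknown".
    """
    header = line[1:].strip() if line.startswith(">") else line.strip()
    parts = header.split(None, 1)
    display_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    return {
        "display_id": display_id,
        "desc": desc,
        "accession_number": "unknown",  # nothing in the header sets this
    }
```

For example, a header such as ">AB000001 Homo sapiens mRNA" yields
display_id "AB000001" and description "Homo sapiens mRNA", while the
accession stays "unknown".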

Possible solutions:

1) Decide in bioperl-live that when parsing fasta files, the
accession_number is set to the display_id (this kind of makes sense
because if you load a genbank file and dump it as fasta, the accession
number gets put as display_id at the beginning of the header). I didn't go
ahead because I wanted to hear what people thought before touching our
sacred parsers.

2) In bioperl-db, when trying to store a sequence that has accession
unknown, change the accession to the display_id. This is my temporary fix.

3) Change the actual SQL constraint in bioperl-db. This one is not easy
because there is no easy way to tell MySQL to put the constraint on
accession, version and division, OR (if accession is unknown) on display_id,
version, division. So from the SQL point of view all we could do is remove
the constraint and rely on checking it only in the code.
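
The interaction between the constraint and option 2 can be sketched as
follows. This is a hedged illustration in Python using an in-memory SQLite
database as a stand-in for MySQL. The table name "entry", its columns, the
division value "UNK" and all function names are hypothetical and are not the
real bioperl-db schema or code. It assumes the problem is that several fasta
sequences with accession "unknown" and version 0 collide on the unique key.

```python
import sqlite3


def accession_for_storage(accession, display_id):
    """Option 2: fall back to the display_id when the accession is unknown."""
    return display_id if accession == "unknown" else accession


def make_db():
    """Create a toy table with a unique key on accession, version, division."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        " accession TEXT NOT NULL,"
        " display_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " division TEXT NOT NULL,"
        " UNIQUE (accession, version, division))"
    )
    return conn


def load(records, use_fallback):
    """Insert (accession, display_id) pairs; return how many were stored."""
    conn = make_db()
    stored = 0
    for accession, display_id in records:
        if use_fallback:
            accession = accession_for_storage(accession, display_id)
        try:
            conn.execute(
                "INSERT INTO entry (accession, display_id, version, division)"
                " VALUES (?, ?, ?, ?)",
                (accession, display_id, 0, "UNK"),
            )
            stored += 1
        except sqlite3.IntegrityError:
            # The unique key rejected the row as a duplicate.
            pass
    conn.close()
    return stored


# Two fasta-style records: both arrive with accession "unknown".
fasta_records = [("unknown", "seq1"), ("unknown", "seq2")]
```

With these two records, load(fasta_records, use_fallback=False) stores only
the first one, because the second hits the unique key. With
use_fallback=True both are stored, since each gets its display_id as
accession. A real accession is left untouched by the fallback.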

I favour 1. So far I have implemented 2, because I was tired of carrying an
internal hack...

Elia

-- 
********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 874 1467         *
* mobile: +65 90307613         *
* fax:    +65 777 0402         *
********************************