[Bioperl-l] Fix for fasta loading into bioperl-db
Elia Stupka
elia@fugu-sg.org
Wed, 12 Jun 2002 15:42:31 +0800 (SGT)
It turns out that loading plain fasta files into bioperl-db had not really
been tested until now.
I made a few changes to make it work. The first is uncontroversial: a
sequence parsed from fasta is not a RichSeqI object, so it has no
seq_version. I simply set seq_version to zero if the object is not
RichSeqI-compliant.
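The seq_version fallback comes down to a type check. Below is a rough Python sketch of that logic. It is not the actual bioperl-db code, which is Perl. PrimarySeq, RichSeq and version_for_storage are hypothetical names that stand in for the BioPerl interfaces:

```python
class PrimarySeq:
    """Hypothetical stand-in for a plain sequence (e.g. parsed from fasta)."""
    def __init__(self, display_id, accession_number="unknown"):
        self.display_id = display_id
        self.accession_number = accession_number


class RichSeq(PrimarySeq):
    """Hypothetical stand-in for a RichSeqI object, which has a seq_version."""
    def __init__(self, display_id, accession_number, seq_version):
        super().__init__(display_id, accession_number)
        self.seq_version = seq_version


def version_for_storage(seq):
    # Only RichSeqI-like objects carry a seq_version; anything else gets 0.
    if isinstance(seq, RichSeq):
        return seq.seq_version
    return 0


print(version_for_storage(PrimarySeq("seq1")))  # prints 0
```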
The second one is trickier, and the fix is temporary. All sequences parsed
from fasta come back with accession "unknown"; they only have a display_id
and a description. This clashes with the bioperl-db constraint on
accession, version and division.
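The accession comes back as "unknown" because a fasta header carries only an identifier and some free text. The following minimal Python sketch assumes a simplified parser that takes the first word as display_id and the rest as description. parse_fasta_header is a hypothetical helper, not Bio::SeqIO::fasta:

```python
def parse_fasta_header(header_line):
    """Split a '>id description' header the way a simple fasta parser might."""
    text = header_line.lstrip(">").strip()
    parts = text.split(None, 1)  # first word, then everything else
    display_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    # Nothing in the header is marked as an accession, so it stays "unknown".
    return {
        "display_id": display_id,
        "description": description,
        "accession_number": "unknown",
    }


print(parse_fasta_header(">seq1 hypothetical protein"))
```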
Possible solutions:
1) Decide in bioperl-live that, when parsing fasta files, accession_number
is set to the display_id. This kind of makes sense, because if you load a
GenBank file and dump it as fasta, the accession number is written as the
display_id at the beginning of the header. I didn't go ahead with this
because I wanted to hear what people thought before touching our sacred
parsers.
2) In bioperl-db, when storing a sequence whose accession is unknown,
change the accession to the display_id. This is my temporary fix.
3) Change the actual SQL constraint in bioperl-db. This one is not easy,
because there is no simple way to tell MySQL to put the constraint on
accession, version and division OR, if the accession is unknown, on
display_id, version and division. So from the SQL point of view, all we
could do is remove the constraint and rely on checking it only in the code.
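Under option 3 the constraint would have to live in code. The following rough Python sketch shows what that check could look like, assuming an in-memory store. unique_key, SeqStore and the field names are hypothetical:

```python
def unique_key(seq):
    """Key standing in for the dropped SQL constraint (option 3)."""
    accession = seq["accession_number"]
    # Use display_id only when no real accession is known.
    first = seq["display_id"] if accession == "unknown" else accession
    return (first, seq["version"], seq["division"])


class SeqStore:
    """Hypothetical in-memory store that enforces the constraint in code."""
    def __init__(self):
        self._keys = set()
        self.rows = []

    def store(self, seq):
        key = unique_key(seq)
        if key in self._keys:
            raise ValueError(f"duplicate sequence key: {key}")
        self._keys.add(key)
        self.rows.append(seq)
```

This puts accessions and display_ids in one key space, so a fasta display_id equal to some GenBank accession would count as a duplicate. Keeping the two apart would need an extra tag in the key.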
I favour 1, but so far I have implemented 2, because I was tired of
keeping it as an internal hack...
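The temporary fix (option 2) amounts to substituting the display_id for an unknown accession just before storing. Below is a Python sketch under the same caveats; StoredSeq and with_fallback_accession are hypothetical names. The sketch returns a modified copy rather than changing the object in place. That is a choice made for the sketch and not necessarily what bioperl-db does:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoredSeq:
    """Hypothetical minimal record for a sequence about to be stored."""
    display_id: str
    accession_number: str = "unknown"
    description: str = ""


def with_fallback_accession(seq: StoredSeq) -> StoredSeq:
    # Option 2: if the accession is "unknown", reuse the display_id instead.
    if seq.accession_number == "unknown":
        return replace(seq, accession_number=seq.display_id)
    return seq


print(with_fallback_accession(StoredSeq("seq1")).accession_number)  # prints seq1
```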
Elia
--
********************************
* http://www.fugu-sg.org/~elia *
* tel: +65 874 1467 *
* mobile: +65 90307613 *
* fax: +65 777 0402 *
********************************