[Bioperl-l] genpept/swiss
Ewan Birney
birney@ebi.ac.uk
Mon, 4 Sep 2000 17:45:01 +0100 (GMT)
On Mon, 4 Sep 2000 hilmar.lapp@pharma.Novartis.com wrote:
>
> I didn't follow what you said, so I don't know. Part of the problem
> may be that I don't know much about how complete the bioperl parsers are.
>
> Unfortunately, there are other very sad points, for instance some
> types of location (compound locations with cross-references, fuzzy
> locations) cannot be handled because the data model is not yet
> prepared for them. (This means that you e.g. lose the translation tag
> for those sequences, and since the CDS coordinates are not handled
> either, you basically cannot tell the correct translation.)
There are two questions here:
(a) Is our data/object model rich enough to cope with what we want
to do? Possibly not completely yet, but it is heading that way.
(b) Is that data model compatible with the
EMBL/GenBank/Swissprot/Whatever data model?
(a) is the work we have to do. (b) is a decision we have to make.
Both are open to people arguing one way or the other and, more importantly,
providing *code*.
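
As a rough illustration of question (a), here is a minimal sketch of a
location model that can represent the cases mentioned in the quoted text
above (fuzzy locations, and compound locations with cross-references to
other entries). It is written in Python purely for illustration and is not
BioPerl code; the class names (Position, SimpleLocation, CompoundLocation)
and the parse_location helper are hypothetical. It assumes the feature-table
style of location string, e.g. "join(<1..100,J00194.1:200..>300)", and it
deliberately ignores complement() and nested operators.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Position:
    """One coordinate; fuzzy is "", "<" (before) or ">" (after)."""
    coord: int
    fuzzy: str = ""


@dataclass(frozen=True)
class SimpleLocation:
    """A start..end span, optionally living on another entry (remote_id)."""
    start: Position
    end: Position
    remote_id: Optional[str] = None


@dataclass(frozen=True)
class CompoundLocation:
    """Several spans combined by an operator such as join or order."""
    operator: str
    parts: List[SimpleLocation]


# Optional "ACCESSION:" prefix, then a fuzzy start, then an optional fuzzy end.
_SPAN = re.compile(r"^(?:([A-Za-z0-9_.]+):)?([<>]?)(\d+)(?:\.\.([<>]?)(\d+))?$")


def parse_span(text: str) -> SimpleLocation:
    m = _SPAN.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    remote, start_fuzzy, start, end_fuzzy, end = m.groups()
    start_pos = Position(int(start), start_fuzzy)
    # A single base such as "42" starts and ends at the same position.
    end_pos = Position(int(end), end_fuzzy) if end is not None else start_pos
    return SimpleLocation(start_pos, end_pos, remote)


def parse_location(text: str) -> Union[SimpleLocation, CompoundLocation]:
    text = text.strip()
    m = re.match(r"^(join|order)\((.*)\)$", text)
    if m:
        parts = [parse_span(p) for p in m.group(2).split(",")]
        return CompoundLocation(m.group(1), parts)
    return parse_span(text)
```

With something of this shape, a parser would not need to drop a CDS feature
just because one of its exons is fuzzy or sits on another entry; the caller
can inspect fuzzy and remote_id and decide what to do.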
>
> Maybe it's a good time to bring up this painful discussion again: What
> do people think about a rewrite of the SeqIO parsers? What should the
> re-design provide for? Given the current maturity of XML
> representations of the major databanks (can anyone comment on this,
> that is, what is the maturity?), does it make sense to go directly for
> an XML mapping? Do the advantages of such an approach justify the
> price in overhead (performance-wise)? Would it be realistic to limit
> future support (meaning maintenance) in BioPerl to XML dumps provided
> by the major database providers?
>
I *don't* think a re-write of the SeqIO system is warranted yet.
I would be supportive of someone who wanted to *reorganise* the
EMBL/GenBank/Swissprot parsing to be more flexible, if they so wished.
I think XML read/write fits in perfectly fine with the current SeqIO
system at the moment, but we have to bootstrap ourselves into this problem:
learning to read the XML formats of others, and dumping XML formats too,
to be sure.
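
To make the "XML fits into a format-dispatching IO layer" point concrete,
here is a minimal sketch of the idea, again in Python for illustration only
and not the actual SeqIO code. All names (SimpleSeq, WRITERS, write_seq,
render) are hypothetical, and the XML element names are made up for the
example; they do not follow GAME or any other real schema.

```python
import io
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class SimpleSeq:
    """Stand-in for a basic sequence object (id, description, residues)."""
    display_id: str
    desc: str
    seq: str


def write_fasta(record, out):
    out.write(f">{record.display_id} {record.desc}\n")
    # Wrap the residues at 60 characters per line.
    for i in range(0, len(record.seq), 60):
        out.write(record.seq[i:i + 60] + "\n")


def write_xml(record, out):
    # Illustrative element names only; escape text so the output stays well-formed.
    out.write(f"<sequence id={quoteattr(record.display_id)}>\n")
    out.write(f"  <description>{escape(record.desc)}</description>\n")
    out.write(f"  <residues>{escape(record.seq)}</residues>\n")
    out.write("</sequence>\n")


# Adding a format means adding one entry here; callers do not change.
WRITERS = {"fasta": write_fasta, "xml": write_xml}


def write_seq(record, fmt, out):
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt}") from None
    writer(record, out)


def render(record, fmt):
    """Convenience wrapper that returns the formatted record as a string."""
    buf = io.StringIO()
    write_seq(record, fmt, buf)
    return buf.getvalue()
```

The design point is only that the XML writer sees the same record object as
the flat-file writers, so it can be added alongside them without a rewrite.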
> And last but not least: who would be volunteering to do what?
>
Indeed...
> Have the Ensembl people done some work in this direction that could be
> back-ported?
Ensembl makes heavy use of EMBL/GenBank dumping "with all the bells and
whistles". This goes via the Bio::SeqI interface (Ensembl "sequences" are
Bio::SeqI compliant).
We also have a GAME dumper in development, mainly waiting for a bunch of
people wanting to use it. The GAME dumper works directly off Ensembl, not
via the Bio::SeqI interface, as both Ensembl and GAME are richer than the
basic Sequence (Bio::Seq etc.), for example with supporting evidence.
>
> I guess Ewan wants to comment on these questions...
>
Have done...
> Hilmar
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------