[Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProt file

Brian Osborne brian_osborne at cognia.com
Wed Jan 26 21:20:53 EST 2005


Chris and Kenny,

Bio::Index::Swissprot has an id_parser() method now but the uniqueness of
the key will be a concern, yes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Chris Mungall
Sent: Friday, January 21, 2005 12:33 PM
To: Brian Osborne
Cc: Daily, Kenneth Michael; bioperl-l at portal.open-bio.org
Subject: RE: [Bioperl-l] Reading all sequences using Bio::DB::Flat
in SwissProt file



Brian,

Unfortunately, the id_parser method isn't supported in
Bio::Index::Swissprot.

Even if it were, I don't think it would be sufficient here: Kenny needs to
index using the feature fields, which implies that the search key wouldn't
be unique. Bio::Index::Abstract requires a unique key for the index.
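
To illustrate the uniqueness point, here is a minimal sketch of an index in which one feature key maps to many entries. It is written in plain Python rather than BioPerl and is not part of any library; the function names (build_feature_index, fetch_entry) and the two sample records are made up for illustration, and the parsing assumes the classic flat-file layout (FT lines with the feature key starting in column 6, entries terminated by "//").

```python
import io
from collections import defaultdict

# Two made-up records in Swiss-Prot flat-file style (illustrative only).
SAMPLE = b"""\
ID   AAA_HUMAN   STANDARD;   PRT;   100 AA.
AC   P00001;
FT   DOMAIN       10     50       Example domain.
FT   SIGNAL        1      9
//
ID   BBB_HUMAN   STANDARD;   PRT;   80 AA.
AC   P00002;
FT   DOMAIN        5     40       Another example domain.
FT                                continued description.
//
"""


def build_feature_index(handle):
    """Map each FT feature key to the byte offsets of all entries that carry it."""
    index = defaultdict(list)
    entry_start = handle.tell()
    keys_in_entry = set()
    while True:
        line = handle.readline()
        if not line:
            break
        # A feature line has its key in column 6 onward; continuation
        # lines have a blank there and are skipped.
        if line.startswith(b"FT   ") and len(line) > 5 and not line[5:6].isspace():
            keys_in_entry.add(line[5:13].split()[0].decode("ascii"))
        elif line.startswith(b"//"):
            # End of entry: record its offset once per distinct key.
            for key in keys_in_entry:
                index[key].append(entry_start)
            keys_in_entry = set()
            entry_start = handle.tell()
    return index


def fetch_entry(handle, offset):
    """Seek to a stored offset and read one entry, up to and including '//'."""
    handle.seek(offset)
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line)
        if line.startswith(b"//"):
            break
    return b"".join(lines).decode("ascii")


handle = io.BytesIO(SAMPLE)
index = build_feature_index(handle)
# One key, several entries: exactly what a unique-key index cannot hold.
for offset in index["DOMAIN"]:
    print(fetch_entry(handle, offset).splitlines()[0])
```

The file is scanned once to build the index; later searches only seek to the stored offsets instead of re-reading everything. A real solution would also need to persist the index and handle large files, which is where the generic tools come in.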

Flexible indexing and retrieval such as this is best handled using some
generic, non-BioPerl-specific solution: RDB, XMLDB, SRS, Lucene, LuceGene,
etc.

I forgot to mention Don Gilbert's LuceGene in my original reply. It's a
fairly sane open-source alternative to SRS. It handles lots of
bioinformatics file formats (not sure about SwissProt, but I'm sure it
could be added).

See:
http://www.gmod.org/lucegene/index.shtml

Cheers
Chris

On Fri, 21 Jan 2005, Brian Osborne wrote:

> Kenny,
>
> Did you take a look at Bio/Index/Swissprot.pm? What's important for you
> will be building the index using the keys you're interested in as opposed
> to the default key, using the id_parser method. See the Bio::Index section
> in the bptutorial for an example.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Daily,
> Kenneth Michael
> Sent: Wednesday, January 19, 2005 11:49 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
> SwissProt file
>
>
> I want to work with a local copy of the SwissProt database, and need to
> search through all of the entries. I only see methods to return sequences
> by accession. However, I cannot use just the FASTA format of the SwissProt
> records, as I need to use the feature fields. What I need to learn is how
> to do a DB search on the features field of the SwissProt records, if it's
> possible. Would there be any advantage to doing it with the DB instead of
> just using SeqIO as an input stream? I think there might be, since every
> time I want to do a search I must read in the entire file again, which is
> very costly. Thank you.
>
> Kenny Daily
> Indiana University
> School of Informatics
> kmdaily [at] indiana [dot] edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l



