[Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProt file

Chris Mungall cjm at fruitfly.org
Wed Jan 19 15:37:25 EST 2005


This is a good solution. If you're after something a bit more lightweight
than a relational solution, which typically involves a lot of admin and
(often slow) database loading (although this isn't a problem here, as the
UCSC folks are nice enough to make their SP db available), then you may
want to look into an XML database solution.

For example, you can download the Swiss-Prot XML from the EBI and stick it into
something like Apache Xindice, then grab the sequences you want using an
arbitrary XPath query, and transform the results with something like XSLT
or XML::Twig. There's more of an initial learning curve but the same
solution pattern is reusable in lots of other contexts.
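
To give a rough idea of the query step, here is a minimal sketch in Python
using only the standard library's limited XPath support, rather than Xindice
or XML::Twig. The XML layout, the accessions and the function name
sequences_with_feature are all made up for illustration; the real EBI files
use an XML namespace and many more elements, so the paths would need
adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for Swiss-Prot XML.
# Accessions and sequences are invented for this example.
SAMPLE_XML = """\
<uniprot>
  <entry>
    <accession>P00001</accession>
    <feature type="signal peptide"/>
    <feature type="chain"/>
    <sequence>MKTAYIAKQR</sequence>
  </entry>
  <entry>
    <accession>P00002</accession>
    <feature type="chain"/>
    <sequence>MSDNELLQGA</sequence>
  </entry>
  <entry>
    <accession>P00003</accession>
    <feature type="transmembrane region"/>
    <feature type="signal peptide"/>
    <sequence>MLLPVVGTAW</sequence>
  </entry>
</uniprot>
"""


def sequences_with_feature(xml_text, feature_type):
    """Return {accession: sequence} for entries having the given feature type."""
    root = ET.fromstring(xml_text)
    hits = {}
    for entry in root.findall("entry"):
        # XPath attribute predicate: any <feature> child with a matching type.
        if entry.find(f"feature[@type='{feature_type}']") is not None:
            accession = entry.findtext("accession")
            hits[accession] = entry.findtext("sequence").strip()
    return hits


if __name__ == "__main__":
    # Print the matching entries in FASTA-like form.
    for acc, seq in sequences_with_feature(SAMPLE_XML, "signal peptide").items():
        print(f">{acc}\n{seq}")
```

This parses the whole document into memory on every call, so it only
illustrates the query itself; an XML database would keep the data loaded and
indexed between queries.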

XPath isn't as powerful as SQL, but on the other hand the admin & coding
overhead is lower. It's very similar to the Bio::Index solution, with the
additional advantage of more flexible queries & indexing.

There's also SRS, which gives you fairly flexible querying
capabilities. YMMV.

Cheers
Chris

On Wed, 19 Jan 2005, Sean Davis wrote:

> Kenny,
>
> If this is something you are going to be doing often, you might want to
> look at bioperl-db.  Alternatively, UCSC maintains a fully-relational
> swissprot database
> (http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/) that
> you could pretty easily load into a MySQL server.  You can also access
> their MySQL server directly (let me know if you want to do this), but
> if you are running any kind of batch query, I would suggest you
> download the tables and load them yourself (really pretty easy to do).
>
> Sean
>
> On Jan 19, 2005, at 11:48 AM, Daily, Kenneth Michael wrote:
>
> > I want to work with a local copy of the SwissProt database, and need
> > to search through all of the entries. I only see methods to return
> > sequences by accession. However, I cannot use just FASTA format of the
> > SwissProt records, as I need to use the feature fields. What I need to
> > learn is how to do a DB search on the features field of the SwissProt
> > records, if it's possible. Would there be any advantage to doing it
> > with the DB instead of just using SeqIO as an input stream? I think it
> > might, since every time I want to do a search I must read in the
> > entire file again, which is very costly. Thank you.
> >
> > Kenny Daily
> > Indiana University
> > School of Informatics
> > kmdaily [at] indiana [dot] edu
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l

