[Bioperl-l] GeneDB Question
Heikki Lehvaslaiho
heikki at nildram.co.uk
Tue Sep 2 04:31:42 EDT 2003
Markus,
Since screen-scraping is what is needed, the absolutely easiest way to do
it is to use WWW::Mechanize. If you want to be a bit more compatible with
most installations, you can use the bioperl module Bio::WebAgent, which is
built on top of LWP::UserAgent. Incidentally, WWW::Mechanize is a subclass
of LWP::UserAgent, too, so you could test for its availability and
sneakily bless Bio::WebAgent into WWW::Mechanize!
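
A minimal sketch of the availability test, assuming nothing beyond the three
module names mentioned above. The helper name first_loadable is made up for
this example and is not part of bioperl or LWP. Note that it constructs the
chosen class directly instead of re-blessing an existing object, which is a
simpler substitute for the trick described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first class in the list that can be
# loaded on this installation, or undef if none of them can.
sub first_loadable {
    my @candidates = @_;
    for my $class (@candidates) {
        # Turn Foo::Bar into Foo/Bar.pm so that require takes a file name.
        (my $file = "$class.pm") =~ s{::}{/}g;
        return $class if eval { require $file; 1 };
    }
    return;
}

# Order of preference: most convenient first, most widely installed last.
my @preferred = qw(WWW::Mechanize Bio::WebAgent LWP::UserAgent);

my $class = first_loadable(@preferred);
if (defined $class) {
    my $agent = $class->new;
    print "Using $class\n";
    # All three classes inherit get() from LWP::UserAgent, so
    # $agent->get($url) can be called whichever one was picked.
} else {
    print "No LWP-based user agent is installed\n";
}
```

Only code that sticks to the LWP::UserAgent interface will run unchanged with
all three classes; form-handling conveniences are specific to WWW::Mechanize.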
Have a look at Bio::DB::MeSH for examples. I got carried away and
included code based on several different modules. (The MeSH module will
be renamed at some point.)
-Heikki
On Mon, 2003-09-01 at 17:47, Keith James wrote:
> >>>>> "Markus" == Markus Kador <markus at kador.de> writes:
>
> Markus> Hi, I would like to get sequence data from GeneDB
> Markus> (http://www.genedb.org/) in my perl script. Since there
> Markus> is no module available I wanted to ask if anyone has ever
> Markus> done that or has any pointers on how to achieve
> Markus> that. Specifically the blast server would be interesting.
>
> As I'm at Sanger I've just been round to the genedb office to ask
> about this.
>
> I think that you will have to try screen-scraping the omniblast page
> (rather than the individual organism blast pages). This way you can
> search all the data but only have to maintain your script to mirror
> the changes to one submission web page. However, that page is subject
> to periodic changes in formatting and in the number and labelling of
> radio buttons and checkboxes.
>
> As you know, there is no public server or API. There is no likelihood
> of these becoming available in the foreseeable future, so a web-scraper
> may be worth the effort.
>
> I also asked about ftp availability of the data because I think that
> if you have the resources (disk space & local blast) your best option
> is to ftp the data to your local machine. Due to ongoing data-release
> policy issues the ftp site data is not complete for some
> organisms. You would need to contact the genedb people directly about
> that.
>
> HTH
>
> Keith
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________