[Bioperl-l] GeneDB Question

Heikki Lehvaslaiho heikki at nildram.co.uk
Tue Sep 2 04:31:42 EDT 2003


Markus,

Since screen-scraping is what is needed, by far the easiest way to do
it is to use WWW::Mechanize. If you want to be compatible with more
installations, you can use the bioperl module Bio::WebAgent, which is
built on top of LWP::UserAgent. Incidentally, WWW::Mechanize is a subclass
of LWP::UserAgent, too, so you could test for its availability and
sneakily bless a Bio::WebAgent object into WWW::Mechanize!
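
To illustrate the "test for availability" part, here is a minimal sketch.
It does not rebless anything; it simply picks the first of the three
classes that loads and uses only what they share through LWP::UserAgent.
The helper name first_loadable_class is made up for this example, and
the GeneDB fetch is left commented out as an untested assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first class in the list that can be
# loaded on this installation, or undef if none can.
sub first_loadable_class {
    my @candidates = @_;
    for my $class (@candidates) {
        (my $file = "$class.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        return $class if eval { require $file; 1 };
    }
    return;
}

# Prefer WWW::Mechanize, then Bio::WebAgent, then plain LWP::UserAgent.
my $class = first_loadable_class(
    'WWW::Mechanize', 'Bio::WebAgent', 'LWP::UserAgent'
);

if ($class) {
    my $agent = $class->new;    # all three are LWP::UserAgent subclasses
    print "Using $class\n";

    # get() comes from LWP::UserAgent, so it works whichever class won.
    # Assumption: this URL is only a placeholder for the page to scrape.
    # my $response = $agent->get('http://www.genedb.org/');
    # print $response->decoded_content if $response->is_success;
}
else {
    warn "No LWP-based user agent is installed\n";
}
```

Form handling with WWW::Mechanize methods would still need a check that
$agent->isa('WWW::Mechanize') before it is used.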

Have a look at Bio::DB::MeSH for examples. I got carried away and
included code based on several different modules. (The MeSH module will
be renamed at some point.)


	-Heikki

On Mon, 2003-09-01 at 17:47, Keith James wrote:
> >>>>> "Markus" == Markus Kador <markus at kador.de> writes:
> 
>     Markus> Hi, I would like to get sequence data from GeneDB
>     Markus> (http://www.genedb.org/) in my perl script.  Since there
>     Markus> is no module available I wanted to ask if anyone has ever
>     Markus> done that or has any pointers on how to achieve
>     Markus> that. Specifically the blast server would be interesting.
> 
> As I'm at Sanger I've just been round to the genedb office to ask
> about this.
> 
> I think that you will have to try screen-scraping the omniblast page
> (rather than the individual organism blast pages). This way you can
> search all the data but only have to maintain your script to mirror
> the changes to one submission web page. However, that page is subject
> to periodic changes in formatting and in the number and labelling of
> radio buttons and checkboxes.
> 
> As you know, there is no public server or API. There is no likelihood
> of these becoming available in the foreseeable future, so a web-scraper
> may be worth the effort.
> 
> I also asked about ftp availability of the data because I think that
> if you have the resources (disk space & local blast) your best option
> is to ftp the data to your local machine. Due to ongoing data-release
> policy issues the ftp site data is not complete for some
> organisms. You would need to contact the genedb people directly about
> that.
> 
> HTH
> 
> Keith
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


