[Biopython-dev] Possible Contribution: UCSC Blat and Ensembl SSAHA Sequence Locator

Jeffrey Chang jchang at smi.stanford.edu
Fri Feb 7 15:42:29 EST 2003


On Fri, Feb 07, 2003 at 10:43:29AM +0200, Anthony Metzidis wrote:
[I've reordered some paragraphs...]
> Hello,
> We've developed a Python API for the UCSC 
> BLAT (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start) and Ensembl 
> SSAHA (http://www.ensembl.org/Homo_sapiens/ssahaview) genome search tools.

> We would like to contribute this to BioPython, if you think there would 
> be an interest in it.

Yes, there would definitely be interest in it!

> If so, could you offer advice about other existing BioPython interfaces 
> that we should model ours after?  I would like the interface to be as 
> consistent as possible with the rest of BioPython.

There are a few data types that should be supported.  More below...

> Using our tool, you can input a series of DNA sequences in Fasta format 
> and then get the results back as dictionaries, indexed by the Fasta 
> title, of dictionaries indexed by the fields presented by the web 
> interfaces.

The DNA sequences should be Bio.Seq objects, and not require FASTA
format.  Also, the results should be in defined and documented objects
(for an example, see Bio.Blast.Record), rather than dictionaries.
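
As a rough sketch of what "defined and documented objects" could look
like in place of nested dictionaries: the class names (Hit, Record),
the field names, and the record_from_dict helper below are all made up
for illustration and are not an existing Biopython API.  The query is
only required to support len(), so a plain string or a Bio.Seq object
would both work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hit:
    """One alignment reported by the search tool (hypothetical fields)."""
    chromosome: str
    start: int
    end: int
    strand: str
    score: float


@dataclass
class Record:
    """All hits for one query sequence."""
    query_name: str
    query_length: int
    hits: List[Hit] = field(default_factory=list)

    def best_hit(self):
        """Return the highest-scoring hit, or None if there are no hits."""
        return max(self.hits, key=lambda h: h.score, default=None)


def record_from_dict(query_name, query, rows):
    """Build a Record from dictionary rows keyed by field name.

    `query` may be a string or any sequence object that supports len().
    """
    record = Record(query_name=query_name, query_length=len(query))
    for row in rows:
        record.hits.append(Hit(
            chromosome=row["chromosome"],
            start=int(row["start"]),
            end=int(row["end"]),
            strand=row["strand"],
            score=float(row["score"]),
        ))
    return record
```

With attributes instead of dictionary keys, the available fields are
visible in the class definition and can carry docstrings.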

> The http connection and parsing of the HTML results pages are handled by 
> our tool.

Also, make sure that these are decoupled.  That is, you should be able
to use the tool to make HTTP connections and save the HTML results for
processing later.  Separately, you should be able to take HTML results
(saved to disk, in a database, etc.) and parse them into an
appropriate object.
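
A minimal sketch of that separation, using only the standard library.
The function names (fetch_results, parse_results) are hypothetical, and
the assumption that the hits sit as whitespace-separated rows inside a
<pre> block is purely for illustration; the real pages may be laid out
differently.

```python
import io
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def fetch_results(url, params):
    """Submit a query and return a handle on the raw HTML; no parsing here."""
    data = urllib.parse.urlencode(params).encode("ascii")
    response = urllib.request.urlopen(url, data)
    return io.StringIO(response.read().decode("utf-8", errors="replace"))


class _PreText(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def parse_results(handle):
    """Parse HTML from any file-like object; no network access here."""
    parser = _PreText()
    parser.feed(handle.read())
    parser.close()
    text = "".join(parser.chunks)
    # One list of fields per non-empty line of the <pre> block.
    return [line.split() for line in text.splitlines() if line.strip()]
```

Because parse_results only needs a handle, the same code works on a
fresh response, a file opened from disk, or a string pulled from a
database and wrapped in io.StringIO.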

Be sure to check out the FAQ, which gives some guidelines on
submitting code.  Basically, you have to agree to license your code
under the Biopython license, and also that you can legally do that!

Jeff

