[Bioperl-l] getting upstream regions

Thu Mar 6 21:44:11 EST 2003

> I was wondering what the current standard way is to get the sequence
> just upstream of a gene in bioperl.  We're mostly using the UCSC
> dataset/tools at the moment.
> 
> At the moment, we're using a Perl module which calls the "nibFrag"
> program from UCSC.  If people think this would be useful, I'd be
> happy to contribute it, although I don't know bioperl's object
> system terribly well, so it would probably need some rewriting.
> (I gather there's some C code to do this, but the "nibFrag" program
> itself is quite fast, and calling it avoids making native calls,
> which is nice, although it is slower.)
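
Whatever tool does the fetching, "sequence just upstream of a gene" comes down to strand-aware coordinate arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates and a strand of +1 or -1. The helpers `upstream_region` and `reverse_complement` are hypothetical and belong neither to bioperl nor to the UCSC tools. For a plus-strand gene the region lies to the left of its start. For a minus-strand gene it lies to the right of its end and has to be reverse-complemented.

```python
def upstream_region(gene_start, gene_end, strand, length, chrom_length=None):
    """Return (start, stop) of the `length` bp immediately 5' of a gene.

    All coordinates are 1-based and inclusive; strand is +1 or -1.
    The result is clipped to position 1 and, when given, to chrom_length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if strand == 1:
        # Plus-strand gene: its 5' end is gene_start, so upstream lies to the left.
        start, stop = gene_start - length, gene_start - 1
    elif strand == -1:
        # Minus-strand gene: its 5' end is gene_end, so upstream lies to the right
        # and must be reverse-complemented to read 5' -> 3'.
        start, stop = gene_end + 1, gene_end + length
    else:
        raise ValueError("strand must be +1 or -1")
    start = max(1, start)
    if chrom_length is not None:
        stop = min(chrom_length, stop)
    if stop < start:
        raise ValueError("gene has no upstream sequence on this chromosome")
    return start, stop


# Complement table for DNA; IUPAC ambiguity codes other than N are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for minus-strand upstream regions)."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `upstream_region(1000, 2000, 1, 500)` gives `(500, 999)`, and the same gene on the minus strand gives `(2001, 2500)`. UCSC tables such as refGene store 0-based, half-open starts, so add 1 to `txStart` before passing it in.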

See <http://bugzilla.bioperl.org/show_bug.cgi?id=1405> for my
naive take on it.

The efetch <http://www.ncbi.nih.gov/entrez/query/static/efetchseq_help.html>
interface to GenBank supports arbitrary subsequence retrieval through the
seq_start and seq_stop parameters. Bio::DB::GenBank uses efetch but does
not support these parameters, so I hacked them in. See the patches on the
bugzilla page.
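
The following minimal sketch shows how a subsequence request to efetch is put together. It only builds the URL and makes no network call. It assumes the current E-utilities endpoint rather than the 2003 URL above, and it uses efetch's `strand` parameter, where 1 is the plus strand and 2 is the minus strand, in addition to `seq_start`/`seq_stop`. `efetch_subseq_url` is a hypothetical helper. It is neither the bugzilla patch nor the Bio::DB::GenBank API, and the accession in the usage line is only an example.

```python
from urllib.parse import urlencode

# Current NCBI E-utilities efetch endpoint (the URL in this post is historical).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_subseq_url(accession, seq_start, seq_stop, strand=1, db="nuccore"):
    """Build an efetch URL returning FASTA for bases seq_start..seq_stop.

    Coordinates are 1-based and inclusive. strand=1 requests the plus strand;
    strand=-1 is mapped to efetch's strand=2 (minus strand).
    """
    if seq_start < 1 or seq_stop < seq_start:
        raise ValueError("need 1 <= seq_start <= seq_stop")
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": seq_start,
        "seq_stop": seq_stop,
        "strand": 1 if strand == 1 else 2,
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Example accession only; fetch the result with urllib.request.urlopen(url).
    print(efetch_subseq_url("NC_000913.3", 1, 500))
```

Passing the coordinates straight to efetch fetches only the region you need rather than the whole record. NCBI's usage guidelines ask clients to identify themselves (the `tool` and `email` parameters) and to limit their request rate, so a real client should add both.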

I didn't test this very much because I just ended up downloading the
entire genomes that I needed.

-- 
Mark Wagner mark at lanfear.net