[Bioperl-l] bioperl newcomer's questions

Jason Stajich jason@chg.mc.duke.edu
Tue, 21 Aug 2001 09:15:36 -0400 (EDT)

Jason -

You can do both of these with some logic around the Bio::Tools::BPlite
module, which parses your BLAST reports so you can pull out the
accession numbers.

To retrieve sequences you can either retrieve from a local index
(recommended) using a Bio::Index:: module or a remote db using one of the 
Bio::DB:: modules (less efficient as network latency will come into play
big time).  
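To give a feel for why a local index is fast, here is a minimal sketch of
the idea (written in Python only to keep it self-contained; it is not the
Bio::Index API or its on-disk format). It assumes a FASTA file where the
record ID is the first word after '>', records each header's byte offset,
and then seeks straight to a record on request. The function names and the
toy IDs are made up for illustration.

```python
import io

def build_fasta_index(handle):
    """Map each record ID (first word after '>') to the byte offset of its header."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
    return index

def fetch_sequence(handle, index, seq_id):
    """Seek to a record via the index and return its full sequence as one string."""
    handle.seek(index[seq_id])
    handle.readline()  # skip the header line
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        parts.append(line.strip().decode())
    return "".join(parts)

# Toy two-record database standing in for a real FASTA file opened in binary mode.
fasta = io.BytesIO(b">P12345 example protein\nMKTAYIAK\nQRQISFVK\n>Q67890 another\nMSHHWGYG\n")
index = build_fasta_index(fasta)
print(fetch_sequence(fasta, index, "P12345"))
```

The index is built once; after that each lookup is a single seek plus a
short read, with no network round trip.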

I already have a script that essentially does all of this for you;
download the developer release 0.9.0 from the website. The script is in

You can also get the script from CVSweb at

You will need a local copy of the blast db that you are blasting your
query against in order to run this script.

Task 2 is related to task 1: take a look at the mechanisms for running
task 1 and then see if you can make some of the connections. For finding
codons upstream you may need to do some pattern matching with Perl.
Basically I'm dodging the bullet here for now until you can get task 1
solved.
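As a rough sketch of the in-frame search for task 2 (in Python to keep it
self-contained; the same codon-stepping logic ports to Perl with substr),
the function below walks upstream from the HSP three bases at a time until
it hits an in-frame stop, then walks downstream to the first in-frame stop.
The assumptions are mine: forward strand only, 0-based coordinates with an
exclusive end, hsp_start on a codon boundary, the standard start codon ATG
and stops TAA/TAG/TGA, and the most upstream ATG is taken (longest ORF).
The name extend_to_orf is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def extend_to_orf(genome, hsp_start, hsp_end):
    """Return (orf_start, orf_end) enclosing the HSP, or None if no ORF is found.

    Coordinates are 0-based, end exclusive, forward strand only.
    hsp_start must lie on a codon boundary of the HSP's reading frame.
    """
    genome = genome.upper()

    # Walk upstream codon by codon; keep the most upstream ATG seen
    # before an in-frame stop codon ends the search.
    orf_start = None
    pos = hsp_start
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == START_CODON:
            orf_start = pos
        pos -= 3
    if orf_start is None:
        return None

    # Walk downstream from the last codon boundary at or before hsp_end
    # until the first in-frame stop codon.
    pos = hsp_start + 3 * ((hsp_end - hsp_start) // 3)
    while pos + 3 <= len(genome):
        if genome[pos:pos + 3] in STOP_CODONS:
            return orf_start, pos + 3  # include the stop codon
        pos += 3
    return None

# Toy sequence: GG | TAA ATG AAA CCC GGG TTT TGA | CC, with the HSP on CCCGGG.
genome = "GGTAAATGAAACCCGGGTTTTGACC"
orf = extend_to_orf(genome, 11, 17)
print(orf, genome[orf[0]:orf[1]])
```

For hits on the reverse strand you would reverse-complement the region
first and run the same search.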

I'm sure various questions about perl/bioperl will come up if you are
unfamiliar with bioperl/perl so feel free to ask.  I'd suggest going
through the online perldoc for a module at:
and checking out Peter Schattner's tutorial at:
for more detailed info about how to use a module.

Hope this helps.


On Mon, 20 Aug 2001, Jason Raymond wrote:

> Greetings, I'm fairly new to Perl and brand new to Bioperl but I'm
> excited about what I've seen so far.  Specifically, I want to learn
> Bioperl to perform two immediate tasks (which will hopefully be
> elaborated upon in the long run).  I have checked quite a few of the
> news archives and am not sure if these are current tasks or perhaps
> readily available scripts; if not any pointers on how to get started
> are greatly appreciated! thanks in advance, JR
> task 1: full sequence (not HSP) retrieval from online db's; so that
> given a query sequence, bioperl would blast (for example) the ncbi
> database, extract all accession numbers above a given threshold, and
> then (rather than just parse and return HSP's as this is frustrating
> in sequence alignment) return the entire protein or gene corresponding
> to that accession number.
> task 2 (perhaps computationally related to task 1): local sequence
> retrieval given a local genome database and a query sequence; given a
> query sequence, blast against an organism's genome (or multiple
> organisms' genomes) and, upon finding the best hits above a certain
> threshold, attempt to extract the gene coding for this match by
> finding, in frame with the HSP, an upstream start codon and a
> downstream stop codon.  Once the full genes are extracted it would be
> good to do a quick pairwise alignment of them versus the query so that
> false positives can thereby be eliminated.