[Bioperl-l] Hit length using length_aln()

Ken kjgraham@ucdavis.edu
Mon, 15 Jul 2002 13:29:12 -0700


A little background on what I'm trying to do will make things clearer.

Fortunately for this specific application I am working with bacterial genomes 
(2 different strains of the same species) but I know that I'll want to keep 
everything as general as possible (Human genome applications are right around 
the corner).

The overall goal is to design PCR primers that will amplify a gene in both 
strains (if it is present in both strains) and will not amplify other genes.

I have results from one strain BLASTed against itself and the other strain 
for every gene. Obviously the first hit in each report is the gene itself, 
followed by the same gene in the other strain (if it is present), and 
finally other genes in either strain that produce a hit. 
I need to know how closely, and where, the other strain matches on a 
gene-by-gene basis. 

I've got the basic code working, more or less in this order: design primers 
for the gene in species A, read in the BLAST report, find mismatches in 
species B, find matches on other genes (not to amplify), and list primers 
that are in both species but not in other genes. This works if everything is 
simple. 
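
The last step of that list (keep primers that match in both species but not 
in other genes) can be sketched roughly as below. This is a minimal 
illustration in plain Python, not the actual code and not a BioPerl API. The 
function names, the 1-based inclusive coordinates on the species A gene, and 
the strict rule that any mismatch or any off-target overlap rejects a primer 
are all assumptions made for the example.

```python
def primer_ok(primer, mismatches_b, offtarget_regions):
    """Return True if a primer is usable under the assumed strict rules.

    primer            -- (start, end) on the species A gene, 1-based inclusive
    mismatches_b      -- positions where species B differs from species A
    offtarget_regions -- (start, end) stretches that also match other genes
    """
    start, end = primer
    # Reject primers that cover a position where species B mismatches.
    if any(start <= pos <= end for pos in mismatches_b):
        return False
    # Reject primers that overlap a region matching some other gene.
    if any(start <= r_end and r_start <= end
           for r_start, r_end in offtarget_regions):
        return False
    return True


def usable_primers(primers, mismatches_b, offtarget_regions):
    """Filter candidate primers, preserving their original order."""
    return [p for p in primers
            if primer_ok(p, mismatches_b, offtarget_regions)]


# Hypothetical data: three candidate primers on one gene.
candidates = [(1, 20), (50, 70), (100, 120)]
kept = usable_primers(candidates, mismatches_b={10},
                      offtarget_regions=[(60, 80)])
```

Here the first primer is dropped because of the mismatch at position 10 and 
the second because it overlaps the off-target region, leaving only 
(100, 120). A real design would probably tolerate some mismatches away from 
the 3' end, so the strict rule is only a starting point.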

But now I'm working on the gory details, such as a hit (possibly the gene in 
question) in the other species that has gaps, so that it is reported as 
separate HSPs. I want to treat the individual HSPs as one entity for 
purposes of this application.
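
To make "one entity" concrete, here is a minimal sketch of the idea in plain 
Python. It is not a BioPerl API, and the names and data layout are 
hypothetical: each HSP is reduced to its (start, end) range on the query, 
1-based inclusive, and overlapping or adjacent ranges are merged so the hit 
can be described by its overall span, the number of query positions covered, 
and the uncovered gaps between HSPs.

```python
def merge_hsps(hsps):
    """Merge overlapping or adjacent (start, end) query ranges."""
    merged = []
    for start, end in sorted(hsps):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def summarize_hit(hsps):
    """Describe all HSPs of one hit as a single entity."""
    if not hsps:
        raise ValueError("a hit needs at least one HSP")
    merged = merge_hsps(hsps)
    return {
        # Outermost query coordinates touched by any HSP.
        "span": (merged[0][0], merged[-1][1]),
        # Query positions covered by at least one HSP.
        "covered": sum(end - start + 1 for start, end in merged),
        # Uncovered stretches between consecutive merged ranges.
        "gaps": [(a[1] + 1, b[0] - 1) for a, b in zip(merged, merged[1:])],
    }


# Hypothetical hit with three HSPs, two of which overlap.
summary = summarize_hit([(200, 300), (1, 100), (90, 150)])
```

For the three example HSPs this gives a span of 1 to 300, 251 covered 
positions, and one gap from 151 to 199. The sketch only looks at query 
coordinates; it ignores strand, hit coordinates, and HSPs that appear in a 
different order on the hit than on the query.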

In response to Brian's post asking whether I want the entire hit or just the 
sequence that matches the query: for well-annotated bacterial genomes I can 
work with the hit. However, your point about a hit potentially being 5 Mb is 
right. I'm not at that point yet but I probably will be sooner than I expect. 
We're starting to look at some human gene families and I'm just now getting 
used to the annotations and biology of the human genome (I'm used to S. 
cerevisiae and bacteria).

So I guess my question is: what is the easiest/best way to work with the 
HSPs of a hit as a single entity? What objects would you recommend I use?
Sorry for the length of this post.
Thanks again,
Ken