[Bioperl-l] Local bl2seq

Tue Aug 23 07:50:36 EDT 2005

Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
                             -format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",

-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");

$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);

I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records. Using perl I am extracting the
sequence part for file of format given below.
SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG

code for extracting PROBE_SEQUENCE looks like this

$NemSeq =<STDIN>;

chomp $NemSeq;

unless (open(seqfile, $NemSeq)) {
print "Cannot open file \n";
exit;
}
@NemSeq = <seqfile> ;

close seqfile;

for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
    #print $k, $NemSeq[$k];
    @Nem =split(/\t/,$NemSeq[$k]);
    $input= $Nem[3];

    #print scalar(@Nem);
    #print $Nem[3], "\n";

}

@Nem =split(/\t/,$NemSeq)

$input2 = substr(@NemSeq,4,32);

So far I could successfully use bioperl(bl2seq) to compare whole genome
with single probe. 
I want to compare all the 200000 thousand probes. I am interested only
in mismatches, for this particular scenario my assumption is that more
than 90% of them will match. I want to send only the mismatches to
output file and discard the matches. I would like to classify the
mismatches based on the percentage dissimilarity, is there a way in
Bioperl for this? Thanks a lot for the reply. Please help me with this.
Thanks
Usha

----- Original Message -----
From: Barry Moore <barry.moore at genetics.utah.edu>
Date: Monday, August 22, 2005 11:45 pm
Subject: Re: [Bioperl-l] bl2seq

> Usha,
> 
> The best advice I can give you is that you need to focus your 
> question a 
> bit more.  What method are you using to compare your probe to your 
> fasta?  Regex, BLAST, Needle, RNAHybrid...?  You say your sequence 
> is 
> working fine for single sequence.  Are you using Bioperl for that?  
> Can 
> you tell us exactly what isn't working for you or what questions 
> you 
> have about working with multiple sequences?  Are you already using 
> Bioperl with your single sequence comparison? Can you show us some 
> code?
> Barry
> 
> Usha Rani Reddi wrote:
> 
> >Hi,
> >I am trying to compare two hundred thousand probes(each one of 
> them) to
> >another genome. Format of the file containing probes is like this
> >SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
> >NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
> >NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
> >NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
> >NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
> >NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
> >NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
> >NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
> >NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> >What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> >Streptococcus agalactiae. I am trying to loop through the probes 
> but not
> >sure how to proceed. My program is working fine for single 
> sequence. One
> >more thing is I am not interested in matches, I want to display only
> >mismatches. I am new to Bioperl, some one please help me with this.
> >Thanks
> >Usha
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >  
> >
> 
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
> 
> 
> 
>