[Bioperl-l] Local bl2seq
Usha Rani Reddi
ureddi at emich.edu
Tue Aug 23 07:50:36 EDT 2005
Hi,
I am trying to use BLAST to compare the sequences. I did the program in
Bioperl. Below is my piece of code
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
$seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
-format => "fasta" );
$seq_obj = $seqio_obj->next_seq;
$input2 = Bio::Seq->new(-id=>"testquery2",
-seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
$factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
'blastn','outfile' => 'bl2seq1.out');
$blast_report = $factory->bl2seq($seq_obj, $input2);
I need help for looping input2. I want to extract this part of sequence
from a file containing 200000 records. Using perl I am extracting the
sequence part for file of format given below.
SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
code for extracting PROBE_SEQUENCE looks like this
$NemSeq =<STDIN>;
chomp $NemSeq;
unless (open(seqfile, $NemSeq)) {
print "Cannot open file \n";
exit;
}
@NemSeq = <seqfile> ;
close seqfile;
for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
#print $k, $NemSeq[$k];
@Nem =split(/\t/,$NemSeq[$k]);
$input= $Nem[3];
#print scalar(@Nem);
#print $Nem[3], "\n";
}
@Nem =split(/\t/,$NemSeq)
$input2 = substr(@NemSeq,4,32);
So far I could successfully use bioperl(bl2seq) to compare whole genome
with single probe.
I want to compare all the 200000 thousand probes. I am interested only
in mismatches, for this particular scenario my assumption is that more
than 90% of them will match. I want to send only the mismatches to
output file and discard the matches. I would like to classify the
mismatches based on the percentage dissimilarity, is there a way in
Bioperl for this? Thanks a lot for the reply. Please help me with this.
Thanks
Usha
----- Original Message -----
From: Barry Moore <barry.moore at genetics.utah.edu>
Date: Monday, August 22, 2005 11:45 pm
Subject: Re: [Bioperl-l] bl2seq
> Usha,
>
> The best advice I can give you is that you need to focus your
> question a
> bit more. What method are you using to compare your probe to your
> fasta? Regex, BLAST, Needle, RNAHybrid...? You say your sequence
> is
> working fine for single sequence. Are you using Bioperl for that?
> Can
> you tell us exactly what isn't working for you or what questions
> you
> have about working with multiple sequences? Are you already using
> Bioperl with your single sequence comparison? Can you show us some
> code?
> Barry
>
> Usha Rani Reddi wrote:
>
> >Hi,
> >I am trying to compare two hundred thousand probes(each one of
> them) to
> >another genome. Format of the file containing probes is like this
> >SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
> >NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
> >NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
> >NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
> >NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC
> >NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA
> >NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG
> >NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG
> >NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA
> >What I am trying to do is compare PROBE_SEQUENCE to fasta file of
> >Streptococcus agalactiae. I am trying to loop through the probes
> but not
> >sure how to proceed. My program is working fine for single
> sequence. One
> >more thing is I am not interested in matches, I want to display only
> >mismatches. I am new to Bioperl, some one please help me with this.
> >Thanks
> >Usha
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> --
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
>
>
>
More information about the Bioperl-l
mailing list