[Bioperl-l] Re: Thanks

Wed Aug 24 12:39:21 EDT 2005

Thanks Barry for this excellent answer. It couldn't have been written  
better. -hilmar

On Aug 24, 2005, at 5:30 AM, Barry Moore wrote:

> Usha-
>
> It is important that you keep Bioperl related discussions on the  
> bioperl list, that way others can benefit from the discussion in the  
> future by searching the archives.  Having said that, I am constantly  
> accidentally replying directly to people and not to the list because I  
> hit reply instead of reply all, so I'm not really a good one to talk.
>
> It seems from nature of your questions to this list that you might be  
> quite new to perl programming.  You participation on the list is still  
> welcome, but this is the kind of problem that you want to learn how to  
> solve yourself.  You want to to avoid at all cost giving the  
> impression that you are asking the list to do your debugging for you   
> - otherwise people will just stop replying to your messages.  An very  
> valuable article about asking questions to forums like this one is  
> found at http://www.catb.org/~esr/faqs/smart-questions.html#answers  
> <http://www.catb.org/%7Eesr/faqs/smart-questions.html#answers>.  Read  
> it.  Live it.  O.K.  enough preaching...
>
> If you have Programming Perl  
> <http://www.amazon.com/exec/obidos/tg/detail/-/0596000278/ 
> qid=1124883861/sr=8-1/ref=pd_bbs_1/102-6339742-6061723? 
> v=glance&s=books&n=507846> read the first half of chapter 20.  If you  
> don't have the Perl book, you can get the same info from  
> http://www.perldoc.com/perl5.8.4/pod/perldebug.html.  After you  
> understand the perl debugger (or if you already do) look at the error  
> message that you got.  In the first line it reports "seq doesn't  
> validate".  So something is wrong with some sequence that you are  
> trying to use in your script.  Since perl itself isn't' aware of  
> sequences, and since the stack trace below shows that the exception  
> occurred while in Bio::PrimarySeq you can conclude that some sequence  
> that your are sending to bioperl is bad.  Now fire up perl running  
> your script with the debugger like this perl -d yourscript.pl.  Use  
> 'n' to step through your code to the open command on line 14.  No  
> errors yet?  You've just loaded your sequences.fasta sequence, so that  
> sequence must be OK.  Continue stepping through your code until you  
> get the error again.  Where exactly did the error occur?  When you try  
> to set $input2 as a new Bio::Seq object?  Run the debugger again, and  
> step through to the line just before where the error occurred.  Use  
> the debugger's x command to see what the values of  $seq_id,  
> $probe_id, $position, $probe_sequence are.  This should give you a  
> clue s to what your problem is.  One more clue comes from the error  
> message.  It says "Attempting to set the sequence to [PROBE_SEQUENCE]  
> which does not look healthy".  The error message says that you are  
> trying to set the sequence to PROBE_SEQUENCE.  Try to figure out why  
> this error is occurring and how to solve it.  If you're still stuck  
> let us know what you've tried and ask us again.
> Barry
>
>
> Usha Rani Reddi wrote:
>
>> Hi,
>> Thanks a lot for your help. When I tried to run the given code I got  
>> the
>> following message.
>>
>> MSG: seq doesn't validate, mismatch is 1
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION  -------------
>> MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not
>> look healthy
>> STACK Bio::PrimarySeq::seq
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268
>> STACK Bio::PrimarySeq::new
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217
>> STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498
>> STACK toplevel barr:23
>>
>> What should I do next? Please help me.
>> Thanks
>> Usha.
>>
>> ----- Original Message -----
>> From: Barry Moore <bmoore at genetics.utah.edu>
>> Date: Tuesday, August 23, 2005 2:40 pm
>> Subject: [Bioperl-l] RE: Local bl2seq
>>
>>
>>> Usha-
>>>
>>> I think the code below will wrap your existing code in the loop you
>>> need.  You will want to get a copy of a good perl programming book  
>>> likeProgramming Perl from O'Reilly.  It will help you out with all  
>>> thoselittle perl details like loop structures etc.
>>>
>>> Barry
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>>
>>> my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta",
>>>                               -format => "fasta" );
>>>
>>> my $seq_obj = $seqio_obj->next_seq;
>>>
>>> open (IN, " location/of/your/probe/file") or die "Can't open IN";
>>>
>>> while (my $row = <IN>) {
>>>   chomp $row;
>>>   #Assuming your file is tab delimited...
>>>   my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/,
>>> $row;
>>>
>>>   my $input2 = Bio::Seq->new(-id=>"testquery2",
>>>                              -seq=> $probe_sequence
>>>                           );
>>>
>>>   my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>>> 'blastn',
>>>                                                       'outfile' =>
>>> 'bl2seq1.out');
>>>
>>>   my $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>>   #Here is where you want to throw out good matches.  You'll need to
>>> determine
>>>   #what method you want to do that.  Maybe since you want there to be
>>> no good
>>>   #hits you would just call $blast_report->max_significance and make
>>> sure it's
>>>   #value is too high to be significant.
>>>   if ($blast_report->max_significance > 0.01) {
>>>       print "$row\n";
>>>   }
>>> }
>>>
>>> -----Original Message-----
>>> From: Usha Rani Reddi [mailto:ureddi at emich.edu] Sent: Tuesday,  
>>> August 23, 2005 5:51 AM
>>> To: Barry Moore
>>> Cc: bioperl-l at portal.open-bio.org
>>> Subject: Local bl2seq
>>>
>>> Hi,
>>> I am trying to use BLAST to compare the sequences. I did the program  
>>> in
>>> Bioperl. Below is my piece of code
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>> $seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
>>>                            -format => "fasta" );
>>> $seq_obj = $seqio_obj->next_seq;
>>> $input2 = Bio::Seq->new(-id=>"testquery2",
>>>                                 
>>> -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
>>>         $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>>> 'blastn','outfile' => 'bl2seq1.out');
>>> $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>> I need help for looping input2. I want to extract this part of  
>>> sequencefrom a file containing 200000 records. Using perl I am  
>>> extracting the
>>> sequence part for file of format given below.
>>> SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
>>> NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
>>> NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>> NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>
>>> code for extracting PROBE_SEQUENCE looks like this
>>>
>>> $NemSeq =<STDIN>;
>>>
>>> chomp $NemSeq;
>>>
>>> unless (open(seqfile, $NemSeq)) {
>>> print "Cannot open file \n";
>>> exit;
>>> }
>>> @NemSeq = <seqfile> ;
>>>
>>> close seqfile;
>>>
>>> for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
>>>   #print $k, $NemSeq[$k];
>>>   @Nem =split(/\t/,$NemSeq[$k]);
>>>   $input= $Nem[3];
>>>
>>>   #print scalar(@Nem);
>>>   #print $Nem[3], "\n";
>>>   }
>>>
>>>
>>> @Nem =split(/\t/,$NemSeq)
>>>
>>> $input2 = substr(@NemSeq,4,32);
>>>
>>> So far I could successfully use bioperl(bl2seq) to compare whole  
>>> genomewith single probe. I want to compare all the 200000 thousand  
>>> probes. I am interested only
>>> in mismatches, for this particular scenario my assumption is that  
>>> more
>>> than 90% of them will match. I want to send only the mismatches to
>>> output file and discard the matches. I would like to classify the
>>> mismatches based on the percentage dissimilarity, is there a way in
>>> Bioperl for this? Thanks a lot for the reply. Please help me with  
>>> this.Thanks
>>> Usha
>>>
>>>
>>> ----- Original Message -----
>>> From: Barry Moore <barry.moore at genetics.utah.edu>
>>> Date: Monday, August 22, 2005 11:45 pm
>>> Subject: Re: [Bioperl-l] bl2seq
>>>
>>>
>>>> Usha,
>>>>
>>>> The best advice I can give you is that you need to focus your  
>>>> question a bit more.  What method are you using to compare your  
>>>> probe to
>>> your
>>>> fasta?  Regex, BLAST, Needle, RNAHybrid...?  You say your
>>> sequence
>>>> is working fine for single sequence.  Are you using Bioperl for
>>> that?
>>>> Can you tell us exactly what isn't working for you or what  
>>>> questions you have about working with multiple sequences?  Are you  
>>>> already
>>> using
>>>> Bioperl with your single sequence comparison? Can you show us
>>> some
>>>> code?
>>>> Barry
>>>>
>>>> Usha Rani Reddi wrote:
>>>>
>>>>
>>>>> Hi,
>>>>> I am trying to compare two hundred thousand probes(each one of
>>>> them) to
>>>>
>>>>> another genome. Format of the file containing probes is like this
>>>>> SEQ_ID	PROBE_ID	POSITION	PROBE_SEQUENCE
>>>>> NC_004116	1	1	AATTAACATTGTTGATTTTATTCTTCAACATC
>>>>> NC_004116	3	13	TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>>>> NC_004116	5	25	TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>>> NC_004116	7	37	GAAAACTTTATTTTTTTATGGTACAATATAAC
>>>>> NC_004116	9	49	TTTTTATGGTACAATATAACAATAATTATCCA
>>>>> NC_004116	11	61	AATATAACAATAATTATCCACAAGACAATAAG
>>>>> NC_004116	13	73	ATTATCCACAAGACAATAAGGAAGAAGCTATG
>>>>> NC_004116	15	85	ACAATAAGGAAGAAGCTATGACGGAAAACGAA
>>>>> What I am trying to do is compare PROBE_SEQUENCE to fasta file of
>>>>> Streptococcus agalactiae. I am trying to loop through the probes
>>>> but not
>>>>
>>>>> sure how to proceed. My program is working fine for single
>>>> sequence. One
>>>>
>>>>> more thing is I am not interested in matches, I want to display
>>> only> >mismatches. I am new to Bioperl, some one please help me with  
>>> this.
>>>
>>>>> Thanks
>>>>> Usha
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> -- 
>>>> Barry Moore
>>>> Dept. of Human Genetics
>>>> University of Utah
>>>> Salt Lake City, UT
>>>>
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------