[Bioperl-l] Re: Thanks
Hilmar Lapp
hlapp at gmx.net
Wed Aug 24 12:39:21 EDT 2005
Thanks Barry for this excellent answer. It couldn't have been written
better. -hilmar
On Aug 24, 2005, at 5:30 AM, Barry Moore wrote:
> Usha-
>
> It is important that you keep Bioperl related discussions on the
> bioperl list, that way others can benefit from the discussion in the
> future by searching the archives. Having said that, I am constantly
> accidentally replying directly to people and not to the list because I
> hit reply instead of reply all, so I'm not really a good one to talk.
>
> It seems from nature of your questions to this list that you might be
> quite new to perl programming. You participation on the list is still
> welcome, but this is the kind of problem that you want to learn how to
> solve yourself. You want to to avoid at all cost giving the
> impression that you are asking the list to do your debugging for you
> - otherwise people will just stop replying to your messages. An very
> valuable article about asking questions to forums like this one is
> found at http://www.catb.org/~esr/faqs/smart-questions.html#answers
> <http://www.catb.org/%7Eesr/faqs/smart-questions.html#answers>. Read
> it. Live it. O.K. enough preaching...
>
> If you have Programming Perl
> <http://www.amazon.com/exec/obidos/tg/detail/-/0596000278/
> qid=1124883861/sr=8-1/ref=pd_bbs_1/102-6339742-6061723?
> v=glance&s=books&n=507846> read the first half of chapter 20. If you
> don't have the Perl book, you can get the same info from
> http://www.perldoc.com/perl5.8.4/pod/perldebug.html. After you
> understand the perl debugger (or if you already do) look at the error
> message that you got. In the first line it reports "seq doesn't
> validate". So something is wrong with some sequence that you are
> trying to use in your script. Since perl itself isn't' aware of
> sequences, and since the stack trace below shows that the exception
> occurred while in Bio::PrimarySeq you can conclude that some sequence
> that your are sending to bioperl is bad. Now fire up perl running
> your script with the debugger like this perl -d yourscript.pl. Use
> 'n' to step through your code to the open command on line 14. No
> errors yet? You've just loaded your sequences.fasta sequence, so that
> sequence must be OK. Continue stepping through your code until you
> get the error again. Where exactly did the error occur? When you try
> to set $input2 as a new Bio::Seq object? Run the debugger again, and
> step through to the line just before where the error occurred. Use
> the debugger's x command to see what the values of $seq_id,
> $probe_id, $position, $probe_sequence are. This should give you a
> clue s to what your problem is. One more clue comes from the error
> message. It says "Attempting to set the sequence to [PROBE_SEQUENCE]
> which does not look healthy". The error message says that you are
> trying to set the sequence to PROBE_SEQUENCE. Try to figure out why
> this error is occurring and how to solve it. If you're still stuck
> let us know what you've tried and ask us again.
> Barry
>
>
> Usha Rani Reddi wrote:
>
>> Hi,
>> Thanks a lot for your help. When I tried to run the given code I got
>> the
>> following message.
>>
>> MSG: seq doesn't validate, mismatch is 1
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Attempting to set the sequence to [PROBE_SEQUENCE] which does not
>> look healthy
>> STACK Bio::PrimarySeq::seq
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:268
>> STACK Bio::PrimarySeq::new
>> /usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:217
>> STACK Bio::Seq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:498
>> STACK toplevel barr:23
>>
>> What should I do next? Please help me.
>> Thanks
>> Usha.
>>
>> ----- Original Message -----
>> From: Barry Moore <bmoore at genetics.utah.edu>
>> Date: Tuesday, August 23, 2005 2:40 pm
>> Subject: [Bioperl-l] RE: Local bl2seq
>>
>>
>>> Usha-
>>>
>>> I think the code below will wrap your existing code in the loop you
>>> need. You will want to get a copy of a good perl programming book
>>> likeProgramming Perl from O'Reilly. It will help you out with all
>>> thoselittle perl details like loop structures etc.
>>>
>>> Barry
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>>
>>> my $seqio_obj = Bio::SeqIO->new(-file => " sequences.fasta",
>>> -format => "fasta" );
>>>
>>> my $seq_obj = $seqio_obj->next_seq;
>>>
>>> open (IN, " location/of/your/probe/file") or die "Can't open IN";
>>>
>>> while (my $row = <IN>) {
>>> chomp $row;
>>> #Assuming your file is tab delimited...
>>> my ($seq_id, $probe_id, $position, $probe_sequence) = split /\t/,
>>> $row;
>>>
>>> my $input2 = Bio::Seq->new(-id=>"testquery2",
>>> -seq=> $probe_sequence
>>> );
>>>
>>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>>> 'blastn',
>>> 'outfile' =>
>>> 'bl2seq1.out');
>>>
>>> my $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>> #Here is where you want to throw out good matches. You'll need to
>>> determine
>>> #what method you want to do that. Maybe since you want there to be
>>> no good
>>> #hits you would just call $blast_report->max_significance and make
>>> sure it's
>>> #value is too high to be significant.
>>> if ($blast_report->max_significance > 0.01) {
>>> print "$row\n";
>>> }
>>> }
>>>
>>> -----Original Message-----
>>> From: Usha Rani Reddi [mailto:ureddi at emich.edu] Sent: Tuesday,
>>> August 23, 2005 5:51 AM
>>> To: Barry Moore
>>> Cc: bioperl-l at portal.open-bio.org
>>> Subject: Local bl2seq
>>>
>>> Hi,
>>> I am trying to use BLAST to compare the sequences. I did the program
>>> in
>>> Bioperl. Below is my piece of code
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Seq;
>>> $seqio_obj = Bio::SeqIO->new(-file => "sequences.fasta",
>>> -format => "fasta" );
>>> $seq_obj = $seqio_obj->next_seq;
>>> $input2 = Bio::Seq->new(-id=>"testquery2",
>>>
>>> -seq=>"ggacccgatgactagccccttgatcgtagcagtggcaagtca");
>>> $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>>> 'blastn','outfile' => 'bl2seq1.out');
>>> $blast_report = $factory->bl2seq($seq_obj, $input2);
>>>
>>> I need help for looping input2. I want to extract this part of
>>> sequencefrom a file containing 200000 records. Using perl I am
>>> extracting the
>>> sequence part for file of format given below.
>>> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
>>> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
>>> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>
>>> code for extracting PROBE_SEQUENCE looks like this
>>>
>>> $NemSeq =<STDIN>;
>>>
>>> chomp $NemSeq;
>>>
>>> unless (open(seqfile, $NemSeq)) {
>>> print "Cannot open file \n";
>>> exit;
>>> }
>>> @NemSeq = <seqfile> ;
>>>
>>> close seqfile;
>>>
>>> for (my $k = 0 ; $k < scalar @NemSeq ; ++$k) {
>>> #print $k, $NemSeq[$k];
>>> @Nem =split(/\t/,$NemSeq[$k]);
>>> $input= $Nem[3];
>>>
>>> #print scalar(@Nem);
>>> #print $Nem[3], "\n";
>>> }
>>>
>>>
>>> @Nem =split(/\t/,$NemSeq)
>>>
>>> $input2 = substr(@NemSeq,4,32);
>>>
>>> So far I could successfully use bioperl(bl2seq) to compare whole
>>> genomewith single probe. I want to compare all the 200000 thousand
>>> probes. I am interested only
>>> in mismatches, for this particular scenario my assumption is that
>>> more
>>> than 90% of them will match. I want to send only the mismatches to
>>> output file and discard the matches. I would like to classify the
>>> mismatches based on the percentage dissimilarity, is there a way in
>>> Bioperl for this? Thanks a lot for the reply. Please help me with
>>> this.Thanks
>>> Usha
>>>
>>>
>>> ----- Original Message -----
>>> From: Barry Moore <barry.moore at genetics.utah.edu>
>>> Date: Monday, August 22, 2005 11:45 pm
>>> Subject: Re: [Bioperl-l] bl2seq
>>>
>>>
>>>> Usha,
>>>>
>>>> The best advice I can give you is that you need to focus your
>>>> question a bit more. What method are you using to compare your
>>>> probe to
>>> your
>>>> fasta? Regex, BLAST, Needle, RNAHybrid...? You say your
>>> sequence
>>>> is working fine for single sequence. Are you using Bioperl for
>>> that?
>>>> Can you tell us exactly what isn't working for you or what
>>>> questions you have about working with multiple sequences? Are you
>>>> already
>>> using
>>>> Bioperl with your single sequence comparison? Can you show us
>>> some
>>>> code?
>>>> Barry
>>>>
>>>> Usha Rani Reddi wrote:
>>>>
>>>>
>>>>> Hi,
>>>>> I am trying to compare two hundred thousand probes(each one of
>>>> them) to
>>>>
>>>>> another genome. Format of the file containing probes is like this
>>>>> SEQ_ID PROBE_ID POSITION PROBE_SEQUENCE
>>>>> NC_004116 1 1 AATTAACATTGTTGATTTTATTCTTCAACATC
>>>>> NC_004116 3 13 TGATTTTATTCTTCAACATCTGTGGAAAACTT
>>>>> NC_004116 5 25 TCAACATCTGTGGAAAACTTTATTTTTTTATG
>>>>> NC_004116 7 37 GAAAACTTTATTTTTTTATGGTACAATATAAC
>>>>> NC_004116 9 49 TTTTTATGGTACAATATAACAATAATTATCCA
>>>>> NC_004116 11 61 AATATAACAATAATTATCCACAAGACAATAAG
>>>>> NC_004116 13 73 ATTATCCACAAGACAATAAGGAAGAAGCTATG
>>>>> NC_004116 15 85 ACAATAAGGAAGAAGCTATGACGGAAAACGAA
>>>>> What I am trying to do is compare PROBE_SEQUENCE to fasta file of
>>>>> Streptococcus agalactiae. I am trying to loop through the probes
>>>> but not
>>>>
>>>>> sure how to proceed. My program is working fine for single
>>>> sequence. One
>>>>
>>>>> more thing is I am not interested in matches, I want to display
>>> only> >mismatches. I am new to Bioperl, some one please help me with
>>> this.
>>>
>>>>> Thanks
>>>>> Usha
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> --
>>>> Barry Moore
>>>> Dept. of Human Genetics
>>>> University of Utah
>>>> Salt Lake City, UT
>>>>
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>
> --
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the Bioperl-l
mailing list