[Bioperl-l] SimpleAlign Query
Jun Yin
yinjun111 at gmail.com
Thu Jun 20 16:24:56 UTC 2013
Hi, Chris Field,
I have tried your script. It worked well in my computer. There should not
be any problem with it.
For your question, in BioPerl, the sequence object always has the start
coordinate smaller than the end coordinate. It is not a bug.
The strand information is kept in the sequence object. You can retrieve
the strand information by $seq->strand .
As suggested by Chris J Fields, It is more common to use SearchIO to deal
with blast output. You can try that module too.
Cheers,
--
Jun Yin, Ph.D.
Postdoc Associate
James Noonan Group
Department of Genetics
Yale University School of Medicine
Sterling Hall of Medicine, I-120A
333 Cedar Street
New Haven, CT 06510
http://www.yale.edu/noonanlab/Jun.html
On 6/19/13 9:15 AM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
>The alignments from BLAST reports are generally parsed via Bio::SearchIO.
> There is a HOWTO on SearchIO:
>
>http://www.bioperl.org/wiki/HOWTO:SearchIO
>
>Tiling in BioPerl uses these objects, but you can get Bio::SimpleAlign
>from the HSPs.
>
>chris
>
>On Jun 18, 2013, at 10:49 AM, "Yin, Jun" <jun.yin at yale.edu>
> wrote:
>
>> Hi, Chris,
>>
>> No problem. I can help to look at what is the problem. Yes, usually in
>> Bio::SimpleAlign even if the sequence is reverse complementary, the
>>start
>> coordinate should be smaller than the end coordinate. But it seems
>> Bio::AlignIO::bl2seq is a wrapper for Bio::SearchIO::blast. It is a bit
>> different from the other Bio::AlignIO modules, so there may be some
>> inconsistency.
>>
>> Can you send me the input file for your script? So I can try to figure
>>out
>> where the problem is.
>>
>> I would like to recommend you to ask in the bioperl mail list. There
>>will
>> be more experts in BioPerl there.
>>
>>
>> Cheers,
>> --
>> Jun Yin, Ph.D.
>> Postdoc Associate
>>
>> James Noonan Group
>> Department of Genetics
>> Yale University School of Medicine
>> Sterling Hall of Medicine, I-120A
>> 333 Cedar Street
>> New Haven, CT 06510
>> http://www.yale.edu/noonanlab/Jun.html
>>
>>
>>
>>
>> On 6/18/13 11:23 AM, "Christopher Field" <chris.field at unibas.ch> wrote:
>>
>>> Dear Dr. Yin,
>>>
>>> I've got a problem with SimpleAlign.pm, and since you are responsible
>>>for
>>> the most recent updates on this module, I hope it's ok to email and ask
>>> for your help.
>>> To summarise my short program, it uses bl2seq to throw some sanger
>>>reads
>>> against a genome, and then I'm tiling the reported hits into a visually
>>> accessible format.
>>> The problem has arisen, that sometimes the alignment is with the
>>>reverse
>>> complement, but the start and end of both sequences in the Align object
>>> are such that the start is always lower than the end. The strand flag
>>> also appears uninitialised. Is this a straightforward bug, or a result
>>>of
>>> something I'm doing in my code?
>>> Any help would be greatly appreciated.
>>>
>>> Chris Field
>>> Group van Nimwegen
>>> Biozentrum der Universität Basel
>>> 4056-CH Basel
>>>
>>> Code:
>>>
>>> #!/usr/bin/env perl;
>>>
>>> use v5.8.1;
>>> use warnings;
>>> use strict;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> use Bio::AlignIO;
>>> use Bio::SimpleAlign;
>>> use Bio::Tools::Run::StandAloneBlast;
>>>
>>> my $usage = "align.pl reference query\n";
>>>
>>> open (my $fo,'>',"align.out",) or die $!;
>>>
>>> my $rfile = shift or die $usage;
>>> my $qfile = shift or die $usage;
>>>
>>> my $rstream = Bio::SeqIO->new(-file => $rfile);
>>> my $qstream = Bio::SeqIO->new(-file => $qfile);
>>>
>>> my $rseq = $rstream->next_seq;
>>>
>>> my $align = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastn',
>>> -outfile => 'bl2seq.out');
>>>
>>> while (my $qseq = $qstream->next_seq){
>>> $align->bl2seq($qseq,$rseq);
>>> my $astream = Bio::AlignIO->new(-file => 'bl2seq.out', -format =>
>>> 'bl2seq');
>>> my @seq = ("n") x $qseq->length;
>>> my @match = (" ") x $qseq->length;
>>> my @pos = (" ") x $qseq->length;
>>> while (my $align = $astream->next_aln){
>>> my $qhit = $align->get_seq_by_pos(1);
>>> my $rhit = $align->get_seq_by_pos(2);
>>> if(!grep(/[acgt]/, at seq[$qhit->start-1 .. $qhit->end-1])){
>>> print $rhit->start,":",$rhit->end,"\n";
>>> @seq[$qhit->start-1 .. $qhit->end-1] =
>>> split("",$rhit->seq);
>>> @match[$qhit->start-1 .. $qhit->end-1] =
>>> split("",$align->match_line);
>>> @pos[$qhit->start-1 .. $qhit->end-1] =
>>> ("<","-","-",split("",$rhit->start),("-") x
>>>
>>>($qhit->length-(length($rhit->start)+length($rhit->end)+6)),split("",$rh
>>>it
>>> ->end),"-","-",">");
>>> }
>>> }
>>> print $fo $qseq->display_id,"\n";
>>> print $fo $qseq->seq,"\n";
>>> print $fo @match,"\n";
>>> print $fo @seq,"\n";
>>> print $fo @pos,"\n\n";
>>> }
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list