[Bioperl-l] needle parser in bioperl?

Jason Stajich jason at bioperl.org
Fri Dec 15 14:48:32 UTC 2006


I get the impression you are trying to use the wrong tool for the  
job.  Can you explain a little more generally what you want to do?

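The 'fasta' format in Bio::SearchIO means something quite different
from 'fasta' in Bio::AlignIO: SearchIO expects a report from the FASTA
search programs, while AlignIO expects aligned sequences in multi-FASTA
format. We explain this on the wiki; please have a look at the FASTA
page.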

  do not use Bio::SearchIO to parse multi-fasta alignment output;
Bio::SearchIO is for pairwise alignment reports.
  use Bio::AlignIO for a multi-fasta format or for msf; you just
provide a different value to '-format'.
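
As a rough sketch of the Bio::AlignIO route (not a tested recipe for your data): the file name and format are whatever you pass on the command line, e.g. 1.out and msf, and summarize_alignment is just an illustrative name. It assumes BioPerl is installed.

```perl
use strict;
use warnings;
use Bio::AlignIO;

# Print a short summary of every alignment found in a file.
# $file and $format come from the caller; 'msf' and 'fasta' are both
# format names that Bio::AlignIO understands.
sub summarize_alignment {
    my ($file, $format) = @_;
    my $in = Bio::AlignIO->new(-file => $file, -format => $format);
    while ( my $aln = $in->next_aln ) {
        my @seqs = $aln->each_seq;
        printf "alignment of %d columns, %d sequences\n",
            $aln->length, scalar @seqs;
        for my $seq (@seqs) {
            # start/end are each sequence's own residue numbering as read
            # from the file, not the position of one sequence on the other
            printf "  %s %d-%d\n", $seq->display_id, $seq->start, $seq->end;
        }
    }
}

# Usage: perl summarize.pl 1.out msf
summarize_alignment(@ARGV) if @ARGV == 2;
```

The start/end values printed here are per-sequence numbering only, which is why this alone does not answer the "where does it align" question.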

But none of that is going to help you get the start/end for your
alignment, because those coordinates are not part of the output format.
Do the experiment of looking at the file and figuring out which fields
you actually want. If they don't exist, then you either have a format
that won't work for your question, or you will have to calculate the
additional values yourself. If you are trying to align transcripts to
a genome, please consider tools that are built for it and referenced on
the wiki (such as Sim4, est2genome, exonerate, BLAT).
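
If you do end up calculating the coordinates yourself, one way is to walk the two gapped rows column by column. A minimal sketch in plain Perl, assuming you have already pulled the two gapped strings out of the alignment (for instance via $seq->seq on each row from Bio::AlignIO). aligned_range_on_ref is a made-up helper, not a BioPerl method, and the toy rows are invented for illustration.

```perl
use strict;
use warnings;

# Given two gapped rows of equal length from one pairwise alignment,
# return the 1-based (start, end) on the reference covered by the query,
# i.e. the reference positions of the first and last columns in which
# both rows hold a residue. '-', '.' and '~' are treated as gaps.
sub aligned_range_on_ref {
    my ($ref_row, $query_row) = @_;
    die "rows differ in length\n"
        unless length($ref_row) == length($query_row);

    my ($ref_pos, $start, $end) = (0, undef, undef);
    for my $col ( 0 .. length($ref_row) - 1 ) {
        my $r = substr($ref_row,   $col, 1);
        my $q = substr($query_row, $col, 1);
        my $ref_has_residue = $r !~ /[-.~]/;
        $ref_pos++ if $ref_has_residue;    # count reference residues only
        next unless $ref_has_residue && $q !~ /[-.~]/;
        $start = $ref_pos unless defined $start;
        $end   = $ref_pos;
    }
    return ($start, $end);    # both undef if no column pairs two residues
}

# Toy rows, not taken from the attached file
my ($start, $end) = aligned_range_on_ref('ACGTACGTAC', '---TAC-TA-');
print "query aligns to reference $start-$end\n";
```

For the toy rows this reports 4-9. With a global aligner like needle the leading gaps in the shorter sequence are what push the start coordinate along the reference.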

-jason
On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:

> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
>
> When I run the following :-
>
> use Bio::SearchIO;
>
> my $io = Bio::SearchIO->new(-file   => "1.out",
>                             -format => "fasta" );
>
> while ( my $result = $io->next_result() )
> {
>     while ( my $hit = $result->next_hit )
>     {
>         print "yes\n";
>     }
> }
>
>
> It says :-
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> What should I do?
>
> ~Neeti.
>
> On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>>
>> Neeti,
>>
>> In lieu of a response from a BioPerl guru... why not use Needle to
>> generate your pairwise alignment in fasta format, rather than msf format?
>> The sequence you want should correspond to a single HSP which you can get
>> directly from the fasta alignment with Bio::SearchIO:
>> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use
>> Bio::AlignIO at all.
>>
>> Derek.
>>
>>
>> -----Original Message-----
>> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
>> Sent: 15 December 2006 05:22
>> To: Fairley, Derek; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> Hi,
>>
>> Thanks a lot for your response.
>> I ran needle like this
>> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
>> It gave me the output in format msf.
>> But now my problem is, if I use the Bio::AlignIO module of Bioperl, how
>> can I get the alignment start and stop coordinates on the sequence? I mean
>> something like hsp->query->start, which gives us the alignment start
>> position on the query sequence in a blast output when using Bio::SearchIO.
>> Please help.
>> Like I explained with an example in my previous mail, I want the
>> coordinate where the alignment starts on the sequence.
>>
>> ~Neeti.
>> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>> Neeti,
>>
>> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>>
>> "The results can be output in one of several styles by using the
>> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
>> the required format. Some of the alignment formats can cope with an
>> unlimited number of sequences, while others are only for pairs of sequences.
>>
>> The available multiple alignment format names are: unknown, multiple,
>> simple, fasta, msf, trace, srs
>>
>> The available pairwise alignment format names are: pair, markx0, markx1,
>> markx2, markx3, markx10, srspair, score
>>
>> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
>> information on alignment formats."
>>
>> Not sure based on this whether you can get a pairwise alignment in .msf
>> format; can't think of a good reason why not. The BioPerl Bio::AlignIO
>> module will allow you to parse alignments in .msf format.
>>
>> HTH,
>>
>> Derek.
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:
>> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
>> Sent: 14 December 2006 08:03
>> To: Chris Fields; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> How do I run needle specifying that I want the MSF format, on a linux box?
>> The help doesn't show me any format option. Is there anything available to
>> parse MSF format?
>> Please find an example alignment file attached. Here the seq_of_contig
>> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
>> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
>> output alignment; how can I parse the result to get this?
>>
>> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
>> >
>> >
>> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>> >
>> > > Hi,
>> > >
>> > > Does anyone know of a bioperl parser for needle output? Basically I
>> > > want to know where the target sequence aligns on the template (i.e.
>> > > the coordinate on the template where the target aligns).
>> > >
>> > > --
>> > > -Neeti
>> > > Even my blood says, B positive
>> >
>> > I answered this a number of months back:
>> >
>> > http://tinyurl.com/yzlbx5
>> >
>> > Basically, newer versions of EMBOSS have changed the output for the
>> > AlignIO::emboss parser (which parses needle). I don't believe the
>> > parser has been fixed to deal with that, but Jason has pointed out
>> > you can use MSF output when running needle, then parse using AlignIO
>> > with the format set to 'msf'.
>> >
>> > chris
>> >
> <1.out>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html




