[Bioperl-l] Bio::LocatableSeq warning
Chris Fields
cjfields at illinois.edu
Wed Dec 17 22:59:41 UTC 2008
It is a problem with clustalw parsing not accounting for frameshifts,
mapping, and other odd bits coming from this format. That could be
added in; LocatableSeq now allows this by specifying position => shift
in frameshifts() as a hash ref.
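To make the discrepancy concrete, here is a minimal sketch in plain Perl (no BioPerl needed) that counts the pseudogene row from the warning quoted below in two ways. The helper names count_letters and count_residues are made up for this example, and treating '*' and digits as countable residues is my assumption about what produces the 183; it is not copied from the LocatableSeq source.

```perl
use strict;
use warnings;

# Aligned row copied from the warning quoted further down in this thread.
my $row = join '',
    '----LNCIVNDSQKMGIIRNGDLP*PQLKNKF2-',
    'FQRMTTPSSAEGKENLVFLIRKNWFSITEKNQPLKYIINLVVSRESKEPPQRPPFLD',
    '*SLGDALKRIEQLKLANKQDVFFTVGGSSVYKESMN*-',
    'DHFKLFVTWIMQDFQSDTFFS4EGDLEKYKLLPEYPQGVVSDVEEEKGIKYKFEVYEKND';

# Parser-style count: drop everything that is not a letter, as the
# s/[^A-Za-z]//g that Roy describes does.
sub count_letters {
    my ($seq) = @_;
    (my $copy = $seq) =~ s/[^A-Za-z]//g;
    return length $copy;
}

# Assumed residue-style count: also keep '*' (stop codons) and digits
# (Genewise-style frameshift marks); only gap characters are dropped.
sub count_residues {
    my ($seq) = @_;
    (my $copy = $seq) =~ s/[^A-Za-z0-9*]//g;
    return length $copy;
}

printf "letters only:              %d\n", count_letters($row);
printf "letters, stops and digits: %d\n", count_residues($row);
```

The two numbers differ by five (three '*' plus the digits 2 and 4), which is the same gap as between 178 and 183 in the warning.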
There are other significant problems with LocatableSeq, though.
Gaps, frameshifts, and residues are set and checked via globals and
aren't set per instance (which is definitely not optimal). I'll file
a bug report on this to track it.
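A rough sketch of why globals are a problem here. Both classes are hypothetical and written only for this mail (GlobalSymbols, InstanceSymbols, and the count method are not LocatableSeq code); they just show a symbol set shared through a package variable versus one stored on each object.

```perl
use strict;
use warnings;

# Hypothetical class: the residue set lives in a package variable,
# so every object sees the same setting.
package GlobalSymbols;
our $RESIDUES = 'A-Za-z';
sub new   { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
sub count {
    my $self = shift;
    my $n = () = $self->{seq} =~ /[$RESIDUES]/g;   # count matching characters
    return $n;
}

# Hypothetical class: each object carries its own residue set.
package InstanceSymbols;
sub new {
    my ($class, $seq, $residues) = @_;
    return bless { seq => $seq, residues => $residues // 'A-Za-z' }, $class;
}
sub count {
    my $self = shift;
    my $n = () = $self->{seq} =~ /[$self->{residues}]/g;
    return $n;
}

package main;

# Globals: widening the set for one sequence changes the count of the other.
my $plain_glob  = GlobalSymbols->new('MK*LV');
my $marked_glob = GlobalSymbols->new('MK*L2V');
$GlobalSymbols::RESIDUES = 'A-Za-z0-9*';
printf "global:   plain=%d marked=%d\n", $plain_glob->count, $marked_glob->count;

# Per instance: each sequence keeps the set it was created with.
my $plain_inst  = InstanceSymbols->new('MK*LV');
my $marked_inst = InstanceSymbols->new('MK*L2V', 'A-Za-z0-9*');
printf "instance: plain=%d marked=%d\n", $plain_inst->count, $marked_inst->count;
```

With the shared variable, the plain sequence is counted with the widened set as a side effect; with per-instance storage it keeps the letters-only count.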
chris
On Dec 17, 2008, at 12:41 PM, Mark A. Jensen wrote:
> Yes, my bad: if the sequence name contains a range, then clustalw
> does use this range for the length; if it does not, it counts as Roy
> says.
> I agree; this is probably an AlignIO bug by now, eh?
> ----- Original Message -----
> From: "Roy Chaudhuri" <roy.chaudhuri at gmail.com>
> To: "Sendu Bala" <bix at sendu.me.uk>
> Cc: "bioperl-l" <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 17, 2008 1:05 PM
> Subject: Re: [Bioperl-l] Bio::LocatableSeq warning
>
>
>> Depends what you mean by valid. Your file contains asterisks and
>> digits, representing stop codons and frameshifts (using Genewise
>> notation according to the Pal2Nal paper). Bio::AlignIO::clustalw
>> ignores those by doing an s/[^A-Za-z]//g before calculating the
>> sequence length. Bio::LocatableSeq notices the discrepancy and
>> corrects the length while issuing a warning. Bio::AlignIO::clustalw
>> would need to be fixed if you want it to parse files with
>> non-letter residues correctly. I think ClustalW itself removes
>> non-letter residues from the input data, so it will never output
>> such files.
>>
>> Roy.
>> --
>> Dr. Roy Chaudhuri
>> Department of Veterinary Medicine
>> University of Cambridge, U.K.
>>
>> Sendu Bala wrote:
>>> I've just committed a test alignment file to bioperl-run t/data,
>>> and Bio::LocatableSeq spurts up a warning about it:
>>>
>>> perl -MBio::AlignIO -e '$ai = Bio::AlignIO->new(-file => "t/data/pal2nal.aln"); $aln = $ai->next_aln;'
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: In sequence pseudogene residue count gives end value 183.
>>> Overriding value [178] with value 183 for Bio::LocatableSeq::end().
>>> ----LNCIVNDSQKMGIIRNGDLP*PQLKNKF2-
>>> FQRMTTPSSAEGKENLVFLIRKNWFSITEKNQPLKYIINLVVSRESKEPPQRPPFLD
>>> *SLGDALKRIEQLKLANKQDVFFTVGGSSVYKESMN*-
>>> DHFKLFVTWIMQDFQSDTFFS4EGDLEKYKLLPEYPQGVVSDVEEEKGIKYKFEVYEKND
>>> ---------------------------------------------------
>>>
>>>
>>> Is there simply something wrong with the alignment file (quite
>>> possible), and this warning means something?
>>>
>>> Or is this just normal behaviour now for valid alignment files?
>>> What is this warning supposed to mean to the user? What should I
>>> do about it? Why do I need to see it?
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l