[Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/1065
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Mon, 21 Jan 2002 15:43:36 +0000
Steve,
Let's move this to bioperl-l, where it belongs...
OK, I think I've got phylip.pm to work properly now. The last problems all
came from $name =~ /.{10}/ not matching anything when the sequence name was
shorter than 10 characters. We did not have any such names in our test
suite. Thanks for spotting this.
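
To illustrate the failure, here is a minimal sketch. It is not the code in
phylip.pm: phylip_name() is a hypothetical helper, and using sprintf to pad
and truncate is just one possible way to get a fixed 10-character name.

```perl
use strict;
use warnings;

# Hypothetical helper (not the actual phylip.pm code): return a name that
# is exactly 10 characters wide, the name width PHYLIP expects.
sub phylip_name {
    my ($name) = @_;
    # "%-10.10s" left-justifies, pads with spaces up to 10 characters,
    # and truncates anything longer than 10 characters.
    return sprintf("%-10.10s", $name);
}

for my $name (qw(H122_HMM MtNP212753 MtTC4140912341234 MtTC28)) {
    # The pattern from the message above: /.{10}/ needs at least 10
    # characters, so it fails outright on shorter names.
    my $matched = ($name =~ /^(.{10})/) ? $1 : '';
    printf "%-18s regex='%s' padded='%s'\n",
        $name, $matched, phylip_name($name);
}
```

Names of 10 or more characters come out the same either way; the two
approaches differ only for the short names, where the regex captures
nothing.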
In summary: if you need phylip format output that works with Joe
Felsenstein's PHYLIP programs (a reasonable request!), you need the latest
version (v.1.7) from CVS. Also, I forgot to tell Steve that the spurious
warnings printed when importing gapped sequences are generated in
Bio::LocatableSeq. The warnings are now silenced by default.
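
As a rough sketch of why gapped input triggers the warning (an assumption
based on the warning text, not the Bio::LocatableSeq implementation): the
end position taken from the aligned string counts gap columns, while the
residue count does not. residue_count() and $verbose below are hypothetical
stand-ins.

```perl
use strict;
use warnings;

# Hypothetical helper: count residues in an aligned string, skipping
# '-' gap characters.
sub residue_count {
    my ($aligned) = @_;
    # tr/// with an empty replacement list only counts the matches.
    my $gaps = ($aligned =~ tr/-//);
    return length($aligned) - $gaps;
}

my $verbose     = 0;                   # stand-in for the object's verbose level
my $aligned     = 'rclpwglff--gtc-';   # tail of one sequence from the test file
my $claimed_end = length $aligned;     # end position if gaps are counted
my $end         = residue_count($aligned);

# Only report the mismatch when verbosity has been raised.
warn "Overriding value [$claimed_end] with value $end\n"
    if $verbose > 0 && $end != $claimed_end;
```

With $verbose left at 0 the mismatch is corrected silently, which is the
default behaviour described above.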
-heikki
Steven Cannon wrote:
>
> On Friday, January 18, 2002, at 03:30 AM, Heikki Lehvaslaiho wrote:
>
> > Steve,
> >
> > Thanks for the bug report.
> >
> >> First, phylip.pm is placing three line returns between sequence blocks.
> >> Felsenstein's programs in the Phylip suite can't deal with this -- they
> >> require two returns between blocks (that is, one blank line rather than
> >> two; illustrated below).
> >
> > That is easy enough to change.
> >
> >> Second (just an annoyance), when converting from, say, fasta to phylip
> >> format, any dashes in the fasta-format alignment generate STDIO
> >> warnings -- one warning per sequence (annoying, since any decent
> >> alignment will have gaps, usually indicated by dashes). Typical warning:
> >>
> >> -------------------- WARNING ---------------------
> >> MSG: In sequence MtTC36450 residue count gives value 64.
> >> Overriding value [65] with value 64 for Bio::LocatableSeq::end().
> >> ---------------------------------------------------
> >
> > Hmm. That warning proved useful when debugging parsers, so let's turn it
> > into a proper debugging statement. From now on the warning will be
> > printed only if $locatableseq->verbose > 0.
> >
> >> Third (just an annoyance), some garbage is inserted into the
> >> phylip-formatted sequence names, in the form of truncated "start-end
> >> position" numbers. For example, if the original sequence name has the 7
> >> characters 'ABCDEFG', three characters indicating the start position of
> >> the sequence will be added to the name, bringing the name to the allowed
> >> 10-character phylip name length: 'ABCDEFG/1-'. This added information is
> >> never useful in the 10-character names, and will usually have to be
> >> subsequently stripped out.
> >
> > You are right. Ten characters is too short to hold "start-end position",
> > so let's drop them.
> >
> > All these changes are in the phylip.pm file, so once I've updated the CVS
> > repository, you can go into WebCVS and copy the file over the old one. If
> > you do that, could you let me know whether everything works?
> >
> > Yours,
> > -Heikki
>
> Heikki -
>
> I did a test, using phylip.pm from CVS, and it looks like the fixes
> introduced some new problems. Here are my test file and output:
>
> >H122_HMM
> tyvklatlavfmltqflivqtknveagqcpragracsqaesnacgdieecicvsegshydggick
> >MtNP212753
> tyvklatlavfmltqflivqtknveegqcpfagrvcsqyesnacgdseecicvsewshydggick
> >MtTC30424
> -----------------------iearecpsfgtvcsilrsnscgniieyiciphwih--ggick
> >MtTC4140912341234
> tyvklailavlhltiflifqtknveaascpnvgavcspfetkpcgnvkdcrclpwglff--gtc-
> >MtTC28
> tyvklitlalflvttllmfqtknveaefcssvgsfcspfntnpcgylgncrcvpy--ylyggtce
>
> 5 65
> tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc
> MtNP212753 tyvklatlav fmltqflivq tknveegqcp fagrvcsqye snacgdseec
> tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc
> MtTC414091 tyvklailav lhltiflifq tknveaascp nvgavcspfe tkpcgnvkdc
> tyvklitlal flvttllmfq tknveaefcs svgsfcspfn tnpcgylgnc
>
> rcvpy--yly ggtce
> icvsewshyd ggick
> rcvpy--yly ggtce
> rclpwglff- -gtc-
> rcvpy--yly ggtce
>
> So, the line return problem is fixed (3 -> 2 between interleaved
> blocks), and start-end information is being omitted from the shortened
> name (good), but sequence names shorter than 10 characters are being
> dropped (bad!).
>
> I'm also still getting, e.g.,
> -------------------- WARNING ---------------------
> MSG: In sequence MtTC36450 residue count gives value 64.
> Overriding value [65] with value 64 for Bio::LocatableSeq::end().
> ---------------------------------------------------
>
> I don't know if you were suggesting that I set $locatableseq->verbose > 0?
>
> Steve
>
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________