[Gmod-gbrowse] Re: [Bioperl-l] Windows bug in Bio::DB::Fasta?

Lincoln Stein lstein at cshl.edu
Wed Aug 24 17:30:16 EDT 2005


Glad it fixed the problem. Many thanks to Scott, who correctly diagnosed the
problem.

Lincoln

On Tuesday 23 August 2005 04:03 pm, Chris Fields wrote:
> That did the trick!  Everything looks fine now.  Thanks Lincoln!
>
> Chris
>
> At 05:18 PM 8/22/2005, Lincoln Stein wrote:
> >I've just looked into this. The bug occurs when Windows opens the FASTA
> > file in text mode rather than binary mode. In text mode, the "\r\n"
> > sequence is invisibly mapped to "\n" during readline operations, so
> > Bio::DB::Fasta thinks that it is dealing with a Unix-format file. Then,
> > when the module tries to seek() to the proper position, Windows doesn't
> > do the line-end mapping, so it seeks to the wrong offset.  (sound
> > of hairs being pulled)
> >
> >I've fixed the problem by explicitly calling binmode() on all filehandles
> >that Bio::DB::Fasta opens. The new version of Fasta.pm is in both bioperl
> >CVS and the gbrowse 1.63 CVS version. It ought to fix Chris' GC content
> >weirdness.
> >
> >Lincoln
> >
> >On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> > > Just to follow up on my own email with a little more information: in
> > > Fasta.pm, line 697:
> > >
> > >   $termination_length ||= /\r\n$/ ? 2 : 1;  # account for crlf-terminated Windows files
> > >
> > > The pattern match is failing on DOS formatted files; I don't know why.
> > > Does anyone else?
> > >
> > > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > > > Hello all,
> > > >
> > > > I am investigating a bug in GBrowse that seems to only surface when
> > > > people are using the memory (ie, file) adaptor on Windows systems.
> > > > Here's the bug report:
> > > >
> > > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> > > >
> > > > I've tracked the problem down to Bio::DB::Fasta. When the file is DOS
> > > > formatted (that is, its lines end with both a carriage return and a
> > > > line feed), BDF returns the wrong string when a subsequence is
> > > > requested, but when the file is Unix formatted (i.e. LF only), it
> > > > returns the right string.  I wrote the very simple test script below
> > > > and stepped it through the perl debugger.  It looks like the bug is
> > > > in the caloffset method, as it returns the same offsets regardless of
> > > > the file type, which then makes the subsequent seek into the file go
> > > > to the wrong coordinates in DOS-formatted files.
> > > >
> > > > Unfortunately, I don't really know what is going on in caloffset, so
> > > > I don't know how to fix it, but it presumably has to check the format
> > > > of the file somewhere and take that into account.
> > > >
> > > > Thanks,
> > > > Scott
> >
> >--
> >Lincoln D. Stein
> >Cold Spring Harbor Laboratory
> >1 Bungtown Road
> >Cold Spring Harbor, NY 11724
> >FOR URGENT MESSAGES & SCHEDULING,
> >PLEASE CONTACT MY ASSISTANT,
> >SANDRA MICHELSEN, AT michelse at cshl.edu
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
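
The text-mode mismatch described in the quoted thread can be reproduced on any platform with a small sketch. This is not the code from Fasta.pm: the helper names termination_length and base_at, the sample record, and the fixed ten-base line width are all assumptions made for illustration. PerlIO's :crlf layer stands in for Windows text mode and :raw stands in for a handle after binmode().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a CRLF-terminated FASTA file in binary mode so the bytes on disk are
# exactly what is listed here, whatever the platform.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} ">seq1\r\n", "ACGTACGTAC\r\n", "GGGGCCCCTT\r\n";
close($out) or die "close failed: $!";

# Hypothetical helper: apply the thread's pattern match to the first line as
# read through the given PerlIO layer (':crlf' mimics Windows text mode,
# ':raw' mimics a handle after binmode()).
sub termination_length {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "open failed: $!";
    my $line = <$fh>;
    close($fh);
    return $line =~ /\r\n$/ ? 2 : 1;
}

# Hypothetical helper: fetch the base at a 1-based position by computing a
# byte offset from an assumed terminator length and fixed-width lines.
# This is a simplified stand-in, not Bio::DB::Fasta's caloffset.
sub base_at {
    my ($pos, $term_len) = @_;
    my $bases_per_line = 10;
    my $header_bytes   = length('>seq1') + $term_len;
    my $full_lines     = int(($pos - 1) / $bases_per_line);
    my $column         = ($pos - 1) % $bases_per_line;
    my $offset = $header_bytes
               + $full_lines * ($bases_per_line + $term_len)
               + $column;
    open(my $fh, '<:raw', $path) or die "open failed: $!";
    seek($fh, $offset, 0) or die "seek failed: $!";
    read($fh, my $base, 1);
    close($fh);
    return $base;
}

my $text_len = termination_length(':crlf');
my $raw_len  = termination_length(':raw');
printf "terminator length seen via :crlf = %d, via :raw = %d\n",
    $text_len, $raw_len;

# Base 11 is the first base of the second sequence line.
printf "base 11 using the :raw length: %s\n", base_at(11, $raw_len);
printf "base 11 using the :crlf length is a carriage return: %s\n",
    base_at(11, $text_len) eq "\r" ? "yes" : "no";
```

With the terminator length taken from the :raw read, the computed offset lands on the first base of the second line. With the length taken from the :crlf read, each line is undercounted by one byte, so the same calculation lands on a carriage return instead of a base.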

