[Bioperl-l] LocusLink IO
Allen Day
allenday@ucla.edu
Tue, 3 Dec 2002 02:59:02 -0800 (PST)
looks like a \r\n problem. did you happen to untar this file on a windows
or non-os-x-mac box, or do something else with it on one of these
platforms? i got nipped by this problem using winzip to decompress a
.tar.gz once. when i transferred the contents (a bunch of c source files)
over to a linux box, the compiler had problems reading them b/c the
newlines were mangled.
i would try doing the whole decompress/test process on a linux box, or see
if you can find a way to do a binary decompress on the files if you're
stuck doing it on old-mac/windows. good luck.
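
to check whether the newlines really are the problem before redoing the whole download, something along these lines might help. this is only a rough sketch and not anything from bioperl itself: the sub names (count_line_endings, normalize_line_endings, slurp_raw) are made up for illustration, and it assumes the file is small enough to read into memory in one go.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count line-ending styles in a string of raw bytes.
# Returns a hash ref with the keys crlf, lf and cr.
sub count_line_endings {
    my ($data) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    # \r\n is listed first so a CRLF pair is counted once, not as CR plus LF.
    while ($data =~ /(\r\n|\r|\n)/g) {
        if    ($1 eq "\r\n") { $count{crlf}++ }
        elsif ($1 eq "\n")   { $count{lf}++ }
        else                 { $count{cr}++ }
    }
    return \%count;
}

# Convert CRLF and bare CR line endings to plain LF.
sub normalize_line_endings {
    my ($data) = @_;
    $data =~ s/\r\n?/\n/g;
    return $data;
}

# Read a whole file as raw bytes. binmode stops Perl on Windows from
# silently translating \r\n to \n while reading.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;            # slurp mode: read everything at once
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Report the counts for a file named on the command line, if any.
if (@ARGV) {
    my $counts = count_line_endings(slurp_raw($ARGV[0]));
    printf "CRLF: %d  LF: %d  CR: %d\n", @{$counts}{qw(crlf lf cr)};
}
```

run it with the data file as the only argument. a mix of CRLF and LF counts, or any bare CR at all, would point at the file having been mangled somewhere along the way.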
-ad
On Mon, 2 Dec 2002, Paul Boutros wrote:
> I followed the suggestion (I think from Allen Day) of extracting &
> diff'ing record 27 from the file. This is what I got:
>
> ===========================
> pcboutro@engmail[5] diff testLL.txt LL-sample.seq | more
> 43a44
> > UNIGENE: Hs.75741
> 46d46
> < UNIGENE: Hs.75741
> 52a53,54
> > BUTTON: homol.gif
> > LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=26[loc]&TAXID=9606
> 133a136
> > UNIGENE: Hs.121521
> 136d138
> < UNIGENE: Hs.121521
> 144a147,148
> > BUTTON: homol.gif
> > LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=27[loc]&TAXID=9606
> ==========================
>
> The file does indeed terminate with a >> but I didn't see any empty lines
> after that. I'll submit this as a bug report along with everything I've
> tested so far.
>
> Paul
>
> On Mon, 2 Dec 2002, Hilmar Lapp wrote:
>
> > Maybe an end of file (recognition-) problem. Could be pretty simple.
> > If you visit the end of your offending input file, are there strange
> > things or excessive empty lines? Does it terminate with a record
> > delimiter (>>)?
> >
> > I may not get a chance to investigate this before Wednesday. Can you
> > submit it as a bug report to make sure it's in the queue?
> >
> > -hilmar
> >
> > On Friday, November 29, 2002, at 02:52 PM, Paul Boutros wrote:
> >
> > > Hi again,
> > >
> > > I don't encounter any problems parsing the test file:
> > > t\data\LL-sample.seq
> > >
> > > If I run the LocusLink test
> > > c:\perl\bioperl-live> perl -w t\LocusLink.t
> > > 1..23
> > > ok 1
> > > ok 2
> > > Use of uninitialized value in pattern match (m//) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 384, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 600, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 603, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 604, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 605, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 606, <GEN0> line 2.
> > > Use of uninitialized value in length at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 618, <GEN0> line 2.
> > > Use of uninitialized value in substr at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 633, <GEN0> line 2.
> > > Use of uninitialized value in pattern match (m//) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 384, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 600, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 603, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 604, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 605, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 606, <GEN0> line 3.
> > > Use of uninitialized value in length at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 618, <GEN0> line 3.
> > > Use of uninitialized value in substr at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 633, <GEN0> line 3.
> > > ok 3
> > >
> > > and okay through the rest of the tests.
> > >
> > > Visually the two files look very similar, and there are no obvious
> > > formatting differences. And it does take quite a few seconds of running
> > > before the two "Deep Recursion" warnings come up, then a few more before
> > > I get the exception.
> > >
> > > When I run:
> > >
> > > use Bio::SeqIO;
> > > use strict;
> > > my $file = $ARGV[0];
> > > my $seqio = Bio::SeqIO->new(
> > >     -format => 'locuslink',
> > >     -file   => $file
> > > );
> > >
> > > while (my $seq = $seqio->next_seq()) {
> > >     my $acc = $seq->annotation();
> > >     print $seq->accession(), "\n";
> > > }
> > >
> > > The two deep recursion warnings come as:
> > >
> > > 15601
> > > Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > > 15731
> > > Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > > 15874
> > > 15890
> > >
> > > And the exception is thrown as:
> > > 24785
> > > 24786
> > > 24787
> > >
> > > ------------- EXCEPTION -------------
> > > MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > > STACK Bio::SeqIO::locuslink::next_seq
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > > STACK toplevel testLL.pl:11
> > >
> > > --------------------------------------
> > >
> > > On Fri, 29 Nov 2002, Hilmar Lapp wrote:
> > >
> > >> I will check what's happening. There is a test case and a sample
> > >> file in the repository; while you're at it, do you see what the
> > >> fundamental difference is between the sample and your input
> > >> file? (Or does the test fail as well for you?)
> > >>
> > >> -hilmar
> > >>
> > >>> -----Original Message-----
> > >>> From: Paul Boutros [mailto:pcboutro@engmail.uwaterloo.ca]
> > >>> Sent: Friday, November 29, 2002 9:21 AM
> > >>> To: Hilmar Lapp
> > >>> Cc: bioperl-l@bioperl.org
> > >>> Subject: Re: [Bioperl-l] LocusLink IO
> > >>>
> > >>>
> > >>> I tried again on today's (11/29/2002) LL_tmpl file and same error:
> > >>>
> > >>> C:\paul\dev\LocusLink>perl -w testLL.pl
> > >>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > >>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > >>>
> > >>> ------------- EXCEPTION -------------
> > >>> MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > >>> STACK Bio::SeqIO::locuslink::next_seq
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > >>> STACK toplevel testLL.pl:8
> > >>>
> > >>> --------------------------------------
> > >>>
> > >>>
> > >>> On Thu, 28 Nov 2002, Hilmar Lapp wrote:
> > >>>
> > >>>> The input file needs to be the LL_tmpl or in that format. Does your
> > >>>> input file satisfy this? (NCBI releases several files for LL. Many
> > >>>> are in tab-format; the LL_tmpl format is a tagged-line format.)
> > >>>>
> > >>>> -hilmar
> > >>>>
> > >>>> On Thursday, November 28, 2002, at 03:05 PM, Paul Boutros wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I'm using the LocusLink SeqIO parser with a download of LocusLink from
> > >>>>> NCBI today (LL3_021128.txt). When I just parse through the file doing
> > >>>>> nothing except for checking the organism annotation with:
> > >>>>>
> > >>>>> use Bio::SeqIO;
> > >>>>>
> > >>>>> my $seqio = Bio::SeqIO->new(
> > >>>>>     -format => 'locuslink',
> > >>>>>     -file   => 'LL3_021128.txt'
> > >>>>> );
> > >>>>>
> > >>>>> while (my $acc = $seqio->next_seq()->annotation()) {
> > >>>>>     if ($acc->get_Annotations('ORGANISM') =~ /rattus norvegicus/i) {
> > >>>>>         print "Rat!\n";
> > >>>>>     }
> > >>>>> }
> > >>>>>
> > >>>>> I get:
> > >>>>>
> > >>>>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > >>>>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > >>>>>
> > >>>>>
> > >>>>> ------------- EXCEPTION -------------
> > >>>>> MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > >>>>> STACK Bio::SeqIO::locuslink::next_seq
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > >>>>> STACK toplevel testLL.pl:10
> > >>>>> --------------------------------------
> > >>>>>
> > >>>>> Interestingly enough I also don't get any output from the ORGANISM
> > >>>>> check, so I must be doing that wrong, too. I notice that the thing
> > >>>>> processes a fair chunk of time before spitting out the two "Deep
> > >>>>> recursion" warnings, and then a fair bit longer before hitting the
> > >>>>> exception.
> > >>>>>
> > >>>>> Any ideas if I'm doing something unusual, or if maybe I should submit
> > >>>>> this as a bug report?
> > >>>>>
> > >>>>> Paul
> > >>>>>
> > >>>>> OS: Win XP SP 1 and Win2K SP2
> > >>>>> Perl: 5.8.0 and 5.6.1
> > >>>>> BioPerl: CVS yesterday
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioperl-l mailing list
> > >>>>> Bioperl-l@bioperl.org
> > >>>>> http://bioperl.org/mailman/listinfo/bioperl-l
> > >>>>>
> > >>>> --
> > >>>> -------------------------------------------------------------
> > >>>> Hilmar Lapp email: lapp at gnf.org
> > >>>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > >>>> -------------------------------------------------------------
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> > >
> > >
> >
>
>