[Bioperl-l] LocusLink IO

Allen Day allenday@ucla.edu
Tue, 3 Dec 2002 02:59:02 -0800 (PST)


Looks like a \r\n problem.  Did you happen to untar this file on a Windows
or pre-OS X Mac box, or do something else with it on one of these
platforms?  I got nipped by this problem once using WinZip to decompress a
.tar.gz.  When I transferred the contents (a bunch of C source files)
over to a Linux box, the compiler had trouble reading them because the
line endings were mangled.
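One quick way to confirm or rule out the \r\n theory before re-extracting anything is to count how many lines in the file end in a carriage return plus newline. Below is a minimal standalone sketch. It is not part of BioPerl, and count_crlf_lines is a hypothetical helper name. It opens the file with the :raw layer because on Windows Perl's default :crlf layer would otherwise turn \r\n into \n on input and hide the problem.

```perl
use strict;
use warnings;

# Count the lines of a file that end in "\r\n" (DOS/Windows endings).
# The :raw layer stops Perl's own newline translation (the default
# :crlf layer on Windows) from hiding the carriage returns.
sub count_crlf_lines {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my ($crlf, $total) = (0, 0);
    while (my $line = <$fh>) {
        $total++;
        $crlf++ if $line =~ /\r\n\z/;
    }
    close($fh);
    return ($crlf, $total);
}

# Usage: perl count_crlf.pl LL_tmpl
if (@ARGV) {
    my ($crlf, $total) = count_crlf_lines($ARGV[0]);
    print "$crlf of $total lines end in CRLF\n";
}
```

A nonzero count would fit the line-ending theory. A count of zero would point somewhere else.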

I would try doing the whole decompress/test process on a Linux box.  If
you're stuck doing it on a pre-OS X Mac or Windows, see if you can find a
way to do a binary decompress of the files.  Good luck.
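If re-extracting on another platform isn't an option, a different workaround is to rewrite the file with Unix line endings after the fact. This is a post-hoc fix, not a binary-mode decompress. The sketch below assumes that only CRLF pairs need fixing. normalize_newlines is a hypothetical name. Both handles are opened :raw so that no I/O layer adds the \r back on output.

```perl
use strict;
use warnings;

# Copy $in to $out, replacing each CRLF line ending with a bare LF.
# Both handles use :raw, so no I/O layer re-adds "\r" on output.
sub normalize_newlines {
    my ($in, $out) = @_;
    open(my $ifh, '<:raw', $in)  or die "Cannot open $in: $!";
    open(my $ofh, '>:raw', $out) or die "Cannot write $out: $!";
    my $fixed = 0;
    while (my $line = <$ifh>) {
        $fixed++ if $line =~ s/\r\n\z/\n/;
        print {$ofh} $line;
    }
    close($ifh);
    close($ofh) or die "Cannot close $out: $!";
    return $fixed;    # number of lines that were converted
}

# Usage: perl fix_newlines.pl LL_tmpl LL_tmpl.unix
if (@ARGV == 2) {
    my $n = normalize_newlines(@ARGV);
    print "converted $n lines\n";
}
```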

-ad

On Mon, 2 Dec 2002, Paul Boutros wrote:

> I followed the suggestion (I think from Allen Day) of extracting &
> diff'ing record 27 from the file.  This is what I got:
> 
> ===========================
> pcboutro@engmail[5] diff testLL.txt LL-sample.seq | more
> 43a44
> > UNIGENE: Hs.75741
> 46d46
> < UNIGENE: Hs.75741
> 52a53,54
> > BUTTON: homol.gif
> > LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=26[loc]&TAXID=9606
> 133a136
> > UNIGENE: Hs.121521
> 136d138
> < UNIGENE: Hs.121521
> 144a147,148
> > BUTTON: homol.gif
> > LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=27[loc]&TAXID=9606
> ==========================
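Pulling out a single record for a diff like the one above can be scripted. This sketch assumes that every record is introduced by a line beginning with >>, which is the record delimiter discussed in this thread. extract_record is a hypothetical helper, and it counts records by position in the file, not by LOCUSID.

```perl
use strict;
use warnings;

# Return the $n-th record (1-based, counted by position in the file)
# from a file where each record is introduced by a line starting with
# ">>". The delimiter line itself is included in the returned text.
sub extract_record {
    my ($path, $n) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    my @lines;
    while (my $line = <$fh>) {
        if ($line =~ /^>>/) {
            $count++;
            last if $count > $n;    # past the wanted record: stop reading
        }
        push @lines, $line if $count == $n;
    }
    close($fh);
    return join('', @lines);
}

# Usage: perl extract_record.pl LL_tmpl 27 > record27.txt
if (@ARGV == 2) {
    print extract_record(@ARGV);
}
```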
> 
> The file does indeed terminate with a >> but I didn't see any empty lines
> after that.  I'll submit this as a bug report along with everything I've
> tested so far.
> 
> Paul
> 
> On Mon, 2 Dec 2002, Hilmar Lapp wrote:
> 
> > Maybe an end of file (recognition-) problem. Could be pretty simple. 
> > If you visit the end of your offending input file, are there strange 
> > things or excessive empty lines? Does it terminate with a record 
> > delimiter (>>)?
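The end-of-file check suggested here can be automated. The minimal sketch below uses a hypothetical helper, check_file_end. It reports how many blank or whitespace-only lines trail the last real line, and whether that last line is a >> delimiter. It reads in :raw mode so that a stray \r stays visible, and it still treats such a line as blank.

```perl
use strict;
use warnings;

# Report how a file ends: the number of blank (whitespace-only) lines
# after the last non-blank line, and whether that last non-blank line
# is a ">>" record delimiter. :raw keeps any stray "\r" visible; it is
# matched by \s, so "\r\n" lines still count as blank.
sub check_file_end {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $last_nonblank  = '';
    my $trailing_blank = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^\s*$/) {
            $trailing_blank++;
        }
        else {
            $last_nonblank  = $line;
            $trailing_blank = 0;
        }
    }
    close($fh);
    my $ends_with_delim = $last_nonblank =~ /^>>/ ? 1 : 0;
    return ($trailing_blank, $ends_with_delim);
}

if (@ARGV) {
    my ($blank, $delim) = check_file_end($ARGV[0]);
    print "trailing blank lines: $blank\n";
    print 'ends with >>: ', ($delim ? 'yes' : 'no'), "\n";
}
```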
> > 
> > I may not get a chance to investigate this before Wednesday. Can you 
> > submit it as a bug report to make sure it's in the queue?
> > 
> > 	-hilmar
> > 
> > On Friday, November 29, 2002, at 02:52 PM, Paul Boutros wrote:
> > 
> > > Hi again,
> > >
> > > I don't encounter any problems parsing the test file:
> > > t\data\LL-sample.seq
> > >
> > > If I run the LocusLink test
> > > c:\perl\bioperl-live> perl -w t\LocusLink.t
> > > 1..23
> > > ok 1
> > > ok 2
> > > Use of uninitialized value in pattern match (m//) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 384, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 600, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 603, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 604, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 605, <GEN0> line 2.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 606, <GEN0> line 2.
> > > Use of uninitialized value in length at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 618, <GEN0> line 2.
> > > Use of uninitialized value in substr at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 633, <GEN0> line 2.
> > > Use of uninitialized value in pattern match (m//) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 384, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 600, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 603, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 604, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 605, <GEN0> line 3.
> > > Use of uninitialized value in transliteration (tr///) at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 606, <GEN0> line 3.
> > > Use of uninitialized value in length at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 618, <GEN0> line 3.
> > > Use of uninitialized value in substr at
> > > C:/Perl/site/lib/Bio\SeqIO\embl.pm line 633, <GEN0> line 3.
> > > ok 3
> > >
> > > and okay through the rest of the tests.
> > >
> > > Visually the two files look very similar, and there are no obvious
> > > formatting differences.  And it does take quite a few seconds of running
> > > before the two "Deep Recursion" warnings come up, then a few more before
> > > I get the exception.
> > >
> > > When I run:
> > >
> > > use Bio::SeqIO;
> > > use strict;
> > > my $file = $ARGV[0];
> > > my $seqio = Bio::SeqIO->new(
> > > 			-format	=> 'locuslink',
> > > 			-file	=> $file
> > > 			);
> > >
> > > while (my $seq = $seqio->next_seq()) {
> > > 	my $acc = $seq->annotation();
> > > 	print $seq->accession(), "\n";
> > > 	}
> > >
> > > The two deep recursion warnings come as:
> > >
> > > 15601
> > > Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > > 15731
> > > Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > > 15874
> > > 15890
> > >
> > > And the exception is thrown as:
> > > 24785
> > > 24786
> > > 24787
> > >
> > > ------------- EXCEPTION  -------------
> > > MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > > STACK Bio::SeqIO::locuslink::next_seq
> > > C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > > STACK toplevel testLL.pl:11
> > >
> > > --------------------------------------
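The exception complains about a record whose first line lacks LOCUSID, so one way to find the offending record directly is to scan the raw file. This sketch assumes that records are introduced by lines starting with >> and that the next line should start with "LOCUSID:". It mirrors the wording of the error message, not BioPerl's actual parsing logic. find_bad_records is a hypothetical name.

```perl
use strict;
use warnings;

# Scan a tagged-line file record by record (records introduced by a
# ">>" line) and return the line numbers of the ">>" lines whose next
# line does not start with "LOCUSID:". A trailing ">>" at end of file
# is treated as a terminator, not as a bad record.
sub find_bad_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @bad;
    my $expect_locusid = 0;    # true right after a ">>" line
    my $start          = 0;    # line number of the current ">>" line
    while (my $line = <$fh>) {
        if ($line =~ /^>>/) {
            $expect_locusid = 1;
            $start          = $.;
            next;
        }
        if ($expect_locusid) {
            push @bad, $start unless $line =~ /^LOCUSID:/;
            $expect_locusid = 0;
        }
    }
    close($fh);
    return @bad;
}

if (@ARGV) {
    print "suspect record starting at line $_\n" for find_bad_records($ARGV[0]);
}
```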
> > >
> > > On Fri, 29 Nov 2002, Hilmar Lapp wrote:
> > >
> > >> I will check what's happening. There is a test case and a sample
> > >> file in the repository; while you're at it, do you see what the
> > >> fundamental difference is between the sample and your input
> > >> file? (Or does the test fail for you as well?)
> > >>
> > >> 	-hilmar
> > >>
> > >>> -----Original Message-----
> > >>> From: Paul Boutros [mailto:pcboutro@engmail.uwaterloo.ca]
> > >>> Sent: Friday, November 29, 2002 9:21 AM
> > >>> To: Hilmar Lapp
> > >>> Cc: bioperl-l@bioperl.org
> > >>> Subject: Re: [Bioperl-l] LocusLink IO
> > >>>
> > >>>
> > >>> I tried again on today's (11/29/2002) LL_tmpl file and same error:
> > >>>
> > >>> C:\paul\dev\LocusLink>perl -w testLL.pl
> > >>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > >>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > >>>
> > >>> ------------- EXCEPTION  -------------
> > >>> MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > >>> STACK Bio::SeqIO::locuslink::next_seq
> > >>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > >>> STACK toplevel testLL.pl:8
> > >>>
> > >>> --------------------------------------
> > >>>
> > >>>
> > >>> On Thu, 28 Nov 2002, Hilmar Lapp wrote:
> > >>>
> > >>>> The input file needs to be the LL_tmpl or in that format. Does your
> > >>>> input file satisfy this? (NCBI releases several files for LL. Many
> > >>>> are in tab-format; the LL_tmpl format is a tagged-line format.)
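The sketch below illustrates the tagged-line layout. Each line of a record is a TAG: value pair, as in the UNIGENE, BUTTON and LINK lines of the diff earlier in this thread. The sketch keeps values as lists on the assumption that a tag may repeat within a record. parse_tagged_record is hypothetical, and the sample record is made up. A tab-delimited line yields no tags, which makes the helper a crude format check.

```perl
use strict;
use warnings;

# Split one tagged-line record ("TAG: value" per line) into a hash that
# maps each tag to an array ref of its values in file order. Values are
# kept as lists on the assumption that a tag may repeat in a record.
# Lines that are not "TAG: value" (such as the ">>" delimiter) are skipped.
sub parse_tagged_record {
    my ($text) = @_;
    my %tags;
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /^([A-Z0-9_]+):\s*(.*?)\s*$/;
        push @{ $tags{$1} }, $2;
    }
    return \%tags;
}

# Made-up sample record, not real LL_tmpl data.
my $rec = parse_tagged_record(">>\nLOCUSID: 1\nORGANISM: Homo sapiens\n");
print "$_ => @{ $rec->{$_} }\n" for sort keys %$rec;
```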
> > >>>>
> > >>>> 	-hilmar
> > >>>>
> > >>>> On Thursday, November 28, 2002, at 03:05 PM, Paul Boutros wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I'm using the LocusLink SeqIO parser with a download of LocusLink from
> > >>>>> NCBI today (LL3_021128.txt).  When I just parse through the file doing
> > >>>>> nothing except for checking the organism annotation with:
> > >>>>>
> > >>>>> use strict;
> > >>>>> use Bio::SeqIO;
> > >>>>>
> > >>>>> my $seqio = Bio::SeqIO->new(
> > >>>>> 			-format	=> 'locuslink',
> > >>>>> 			-file	=> 'LL3_021128.txt'
> > >>>>> 			);
> > >>>>>
> > >>>>> # loop on next_seq() itself so the loop ends cleanly at end of file
> > >>>>> while (my $seq = $seqio->next_seq()) {
> > >>>>> 	my $acc = $seq->annotation();
> > >>>>> 	if ($acc->get_Annotations('ORGANISM') =~ /rattus norvegicus/i) {
> > >>>>> 		print "Rat!\n";
> > >>>>> 	}
> > >>>>> }
> > >>>>>
> > >>>>> I get:
> > >>>>>
> > >>>>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15465.
> > >>>>> Deep recursion on subroutine "Bio::SeqIO::locuslink::next_seq" at
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm line 456, <GEN0> chunk 15595.
> > >>>>>
> > >>>>>
> > >>>>> ------------- EXCEPTION  -------------
> > >>>>> MSG: No LOCUSID in first line of record. Not LocusLink in my book.
> > >>>>> STACK Bio::SeqIO::locuslink::next_seq
> > >>>>> C:/Perl/site/lib/Bio\SeqIO\locuslink.pm:435
> > >>>>> STACK toplevel testLL.pl:10
> > >>>>> --------------------------------------
> > >>>>>
> > >>>>> Interestingly enough, I also don't get any output from the ORGANISM
> > >>>>> check, so I must be doing that wrong, too.  I notice that the thing
> > >>>>> processes for a fair chunk of time before spitting out the two "Deep
> > >>>>> recursion" warnings, and then a fair bit longer before hitting the
> > >>>>> exception.
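A likely reason the ORGANISM check never prints is that the =~ operator evaluates its left side in scalar context. A sub that ends in `return @list` then yields the number of elements rather than the elements themselves. The pure-Perl sketch below reproduces this pitfall with a hypothetical stand-in, get_values, on the assumption that get_Annotations returns its results as a list in the same way. In BioPerl the elements would also be annotation objects rather than plain strings, so a real fix would have to compare each object's text (for example via as_text) instead of the object itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values.
# Real BioPerl annotation objects are not plain strings.
sub get_values {
    my @values = ('Rattus norvegicus');
    return @values;
}

# Pitfall: =~ puts the call in scalar context, so "return @values"
# yields the element count (1), and "1" never matches the pattern.
my $scalar_hit = (get_values() =~ /rattus norvegicus/i) ? 1 : 0;

# Fix: test each element of the list instead, e.g. with grep.
my $list_hit = (grep { /rattus norvegicus/i } get_values()) ? 1 : 0;

print "scalar: $scalar_hit, list: $list_hit\n";
```

Running it prints `scalar: 0, list: 1`.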
> > >>>>>
> > >>>>> Any ideas if I'm doing something unusual, or if maybe I should submit
> > >>>>> this as a bug report?
> > >>>>>
> > >>>>> Paul
> > >>>>>
> > >>>>> OS: Win XP SP 1 and Win2K SP2
> > >>>>> Perl: 5.8.0 and 5.6.1
> > >>>>> BioPerl: CVS yesterday
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioperl-l mailing list
> > >>>>> Bioperl-l@bioperl.org
> > >>>>> http://bioperl.org/mailman/listinfo/bioperl-l
> > >>>>>
> > >>>> --
> > >>>> -------------------------------------------------------------
> > >>>> Hilmar Lapp                            email: lapp at gnf.org
> > >>>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > >>>> -------------------------------------------------------------
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> > >
> > >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> > 
> 
> 
>