[Bioperl-l] SeqIO out of memory

Brian Osborne brian_osborne at cognia.com
Mon Mar 3 12:24:57 EST 2003


Ewan,

Here's the relevant section:

     misc_feature    1553..1564
                     /gene="aap-1"
                     /note="Region: [PSORT] ER membrane domain: KSIP"
     misc_feature    bond(174,175)
                     /gene="aap-1"
                     /note="Intron length 47 bp, type gt_ag"
     exon            175..434
                     /gene="aap-1"
                     /note="Exon 2 length 260 bp"
     misc_feature    bond(434,435)
                     /gene="aap-1"
                     /note="Intron length 181 bp, type gt_ag"
     exon            435..700
                     /gene="aap-1"
                     /note="Exon 3 length 266 bp"
     misc_feature    bond(700,701)
                     /gene="aap-1"
                     /note="Intron length 227 bp, type gt_ag"
     exon            701..1132
                     /gene="aap-1"
                     /note="Exon 4 length 432 bp"
     misc_feature    bond(1132,1133)
                     /gene="aap-1"
                     /note="Intron length 80 bp, type gt_ag"
     exon            1133..1802
                     /gene="aap-1"
                     /note="Exon 5 length 670 bp"
     3'UTR           1571..1802
                     /gene="aap-1"
                     /evidence=experimental
     polyA_signal    1789..1794
                     /gene="aap-1"
                     /note="variant  attaaa"
BASE COUNT      560 a    368 c    355 g    519 t
ORIGIN
        1 aatgagcaca acacctggaa ctcctcatgg agtcactcat agtcttatgg aacaaggatg


Yes, it should not be there but there it is. I'm guessing that the thought
is that "bond" is useful for mRNAs, indicating the positions of the exons on
the sequence. In the RefSeq file I have there are >100,000 entries and some
7,000 misc_feature's with "bond", all of them <num>,<num>, not surprisingly.
The documentation you've provided shows that the general format
(<num>,<num>) is a valid one, whether to handle "bond" is another
question....

Brian O.


-----Original Message-----
From: bioperl-l-bounces at bioperl.org [mailto:bioperl-l-bounces at bioperl.org]On
Behalf Of Ewan Birney
Sent: Monday, March 03, 2003 11:20 AM
To: Brian Osborne
Cc: Hilmar Lapp; Bioperl
Subject: RE: [Bioperl-l] SeqIO out of memory

On Mon, 3 Mar 2003, Brian Osborne wrote:

> Ewan and Hilmar,
> >> Aha. This a better explanation. ;)
>
> Not so fast there my friends, I now have a _real_ bug!    ;-)
> Check this one out:
> ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> Bio::SeqIO->new(-file=>"aap-1.gb",-
> format => "genbank" ); open MYOUT,">aap-1.fa"; while ( $seq =
> $in->next_seq ){ p
> rint MYOUT ">",$seq->accession_number,"\n",$seq->seq,"\n"; }'
>
> -------------------- WARNING ---------------------
> MSG: exception while parsing location line [bond(174,175)] in reading
> EMBL/GenBa
> nk/SwissProt, ignoring feature misc_feature (seqid=aap-1):
> ------------- EXCEPTION  -------------
> MSG: operator "bond" unrecognized by parser


Well... is this a bug?


http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#op
erators

bond is not in there. Can you send the entire line?





> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl5/site_perl/5.8.
> 0/Bio/Factory/FTLocationFactory.pm:160
> STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
> STACK Bio::SeqIO::FTHelper::_generic_seqfeature
> /usr/lib/perl5/site_perl/5.8.0/B
> io/SeqIO/FTHelper.pm:123
> STACK Bio::SeqIO::genbank::next_seq
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/gen
> bank.pm:394
> STACK toplevel -e:1
>
> So Factory/FTLocationFactory only wants to see the operators "complement",
> "join", or "order" in Genbank files, but I have a Genbank file with "bond"
> in that same position (NM_059121, C. elegans aap-1):
>
> ~/data/refseq>grep bond aap-1.gb
>      misc_feature    bond(174,175)
>      misc_feature    bond(434,435)
>      misc_feature    bond(700,701)
>      misc_feature    bond(1132,1133)
>
> It's easy to change line 160 in FTLocationFactory from
> "} elsif(($op eq "join") || ($op eq "order") ) {"
> to
> "} elsif(($op eq "join") || ($op eq "order") ) || ($op eq "bond")) {
> and now the file is parsed without complaint, but does SplitLocation
> correctly handle "(<num>,<num>)" in addition to
> "(<num>..<num>,<num>..<num>)"?
>
> Brian O.
>
>
>
> >       -hilmar
> >
> > On Friday, February 28, 2003, at 01:06  PM, Brian Osborne wrote:
> >
> > > Bioperl-l,
> > > Check out this one-liner, where the input file is rscu.gbff, a
> > > Genbank-formatted file with 111,220 entries. The fasta file that's
made
> > > contains only 42,451 entries. Is "Out of memory" the expected result
> > > for an
> > > input file this size?
> > >
> > > ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> > > Bio::SeqIO->new(-file=>"rscu.gbff",
> > > -format=>"genbank"); open MYOUT,">rscu.fa"; while ( $seq =
> > > in->next_seq ){ print
> > >  MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
> > >
> > > Out of memory during "large" request for 33558528 bytes, total sbrk()
> > > is
> > > 3822837
> > > 76 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line
114,
> > > <GEN0> l
> > > ine 6433958.
> > >
> > > Brian O.
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
>
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at ebi.ac.uk>.
-----------------------------------------------------------------

_______________________________________________
Bioperl-l mailing list
Bioperl-l at bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list