[Bioperl-l] Bug in genbank parsing: CONTIG gaps

Chris Fields cjfields at uiuc.edu
Thu May 4 22:27:57 UTC 2006


Here's another odd bit.  This is what I get for the CONTIG line when I
pass a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped off the 'join' here as well.
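
For anyone who wants to check other records for the same difference, here
is a minimal sketch in plain Python (not BioPerl) that pulls the CONTIG
value out of flat-file text and reports whether it is still wrapped in
join().  The function names contig_value and has_join are made up for
illustration, and the sketch assumes that continuation lines of a long
CONTIG entry are indented, as in a standard GenBank flat file.

```python
import re

def contig_value(record_text):
    """Return the CONTIG location string from GenBank flat-file text, or None."""
    parts = []
    in_contig = False
    for line in record_text.splitlines():
        if line.startswith("CONTIG"):
            in_contig = True
            parts.append(line[len("CONTIG"):].strip())
        elif in_contig and line.startswith(" ") and line.strip():
            # Assumption: indented, non-empty lines continue the CONTIG entry.
            parts.append(line.strip())
        elif in_contig:
            # A blank line, "//" or a new keyword ends the CONTIG entry.
            break
    return "".join(parts) if parts else None

def has_join(value):
    """True if the whole CONTIG value is wrapped in join(...)."""
    return value is not None and re.fullmatch(r"join\(.*\)", value) is not None

# The two CONTIG lines quoted above, reduced to the relevant tail.
ORIGINAL = "CONTIG      join(AADB02014027.1:1..8541)\n//\n"
ROUND_TRIPPED = "CONTIG      AADB02014027.1:1..8541\n\n//\n"

for label, text in (("original", ORIGINAL), ("round-tripped", ROUND_TRIPPED)):
    value = contig_value(text)
    print(label, value, has_join(value))
```

This only compares the text of the CONTIG line; it says nothing about
where in the parse/write path the 'join' goes missing.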

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, May 04, 2006 1:41 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> Are you using the CONTIG record or the full GenBank file?  I see
> problems with both (using bioperl-live), which seem unrelated to one
> another.  The full file seems to run a bit slowly because the full
> GenBank record is huge (~55 MB), but the CONTIG file does exactly what
> you said (runs out of memory).
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > Sent: Tuesday, May 02, 2006 10:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> >
> > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> > genbank files that contain CONTIG entries with gaps.  One such record is
> > NW_925173.
> >
> > When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> > infinite loop and spin until it runs out of memory.
> >
> > I'm pretty certain it relates to this bug:
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> > that genbank records with CONTIG gaps are not valid and can't be parsed.
> > But this bug actually claims to be fixed, which is strange, since looking
> > at the code for FTLocationFactory (where the loop is) it's still right
> > there.  I assume that this may be fixed in other contexts but is still
> > not fixed in Bio::SeqIO::genbank?  Or am I doing something wrong?
> >
> > I think that this should probably be filed as an open bug.  I would think
> > that even if bioperl isn't interested in parsing this type of file via
> > SeqIO, certainly you'd want to ensure that no finite input file would
> > send the parser into an infinite loop.  Have others encountered this
> > problem?  Is there any plan to address it?
> >
> > Thanks very much for any information or help!
> >
> > -Mike
> >
> > P.S.  I've played around with my version of FTLocationFactory and it
> > seems to actually work and parse the gaps.  I'm not sure if I've created
> > other bugs or if it works in all cases, but at least the parser doesn't
> > die.  I also don't know that my hacky code is appropriate for putting
> > back in to BioPerl, but I'm happy to provide it if someone wants to check
> > it out and/or consider it for checkin.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l