[Bioperl-l] Bug in genbank parsing: CONTIG gaps
Chris Fields
cjfields at uiuc.edu
Thu May 4 22:27:57 UTC 2006
Here's another odd bit. This is what I get for the CONTIG line when I
pass a simple contig file (NW_925062, with one join) through Bio::SeqIO:
-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541
//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------
Looks like it lopped out the 'join' here as well.
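
For anyone who wants to check other records for the same symptom, here is a minimal sketch in Python (not BioPerl code) that pulls the CONTIG field out of a GenBank flat-file record and reports whether the join() wrapper survived. The function names contig_value and has_join are made up for illustration, and the sketch assumes CONTIG continuation lines are indented, as in the standard flat-file layout.

```python
def contig_value(record_text):
    """Return the CONTIG field of a GenBank flat-file record as one string, or None."""
    collected = []
    in_contig = False
    for line in record_text.splitlines():
        if line.startswith("CONTIG"):
            in_contig = True
            collected.append(line[len("CONTIG"):].strip())
        elif in_contig:
            # Assumption: continuation lines are indented; a new keyword or '//' ends the field.
            if line.startswith(" ") and line.strip():
                collected.append(line.strip())
            else:
                break
    return "".join(collected) if in_contig else None


def has_join(value):
    """True if the CONTIG value is still wrapped in join(...)."""
    return value is not None and value.startswith("join(") and value.endswith(")")


# The two CONTIG lines from this message: the original and what Bio::SeqIO wrote.
original = "CONTIG      join(AADB02014027.1:1..8541)\n//\n"
written = "CONTIG      AADB02014027.1:1..8541\n//\n"

print(has_join(contig_value(original)), has_join(contig_value(written)))
```

Running both the input file and the rewritten file through a check like this makes it easy to spot records where the operator was dropped.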
Chris
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, May 04, 2006 1:41 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>
> Are you using the CONTIG record or the full GenBank file? I see
> problems with both (using bioperl-live) which seem unrelated to one
> another.
> The full file seems to be running a bit slow b/c the full GenBank record is
> huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
> memory).
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > Sent: Tuesday, May 02, 2006 10:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> >
> > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> > genbank files that contain CONTIG entries with gaps. One such record is
> > NW_925173.
> >
> > When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> > infinite loop and spin until it runs out of memory.
> >
> > I'm pretty certain it relates to this bug:
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> > that genbank records with CONTIG gaps are not valid and can't be parsed.
> > But this bug actually claims to be fixed, which is strange, since looking
> > at the code for FTLocationFactory (where the loop is) it's still right
> > there. I assume that this may be fixed in other contexts but is still not
> > fixed in Bio::SeqIO::genbank? Or am I doing something wrong?
> >
> > I think that this should probably be filed as an open bug. I would think
> > that even if bioperl isn't interested in parsing this type of file via
> > SeqIO, certainly you'd want to ensure that no finite input file would send
> > the parser into an infinite loop. Have others encountered this problem?
> > Is there any plan to address it?
> >
> > Thanks very much for any information or help!
> >
> > -Mike
> >
> > P.S. I've played around with my version of FTLocationFactory and it seems
> > to actually work and parse the gaps. I'm not sure if I've created other
> > bugs or if it works in all cases, but at least the parser doesn't die. I
> > also don't know that my hacky code is appropriate for putting back into
> > BioPerl, but I'm happy to provide it if someone wants to check it out
> > and/or consider it for checkin.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
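
On the infinite loop described in the quoted report: one general way to make sure no finite CONTIG string can spin the parser is to require every pass through the loop to consume at least one character, and to raise an error otherwise. The following is a minimal Python sketch of that idea only. It is not the FTLocationFactory code and not the patch mentioned above. The name split_contig and the element grammar (accession:start..end, optionally inside complement(), plus gap(), gap(N) and gap(unkN)) are assumptions for illustration, and the input is assumed to contain no whitespace.

```python
import re

# Hypothetical grammar for a single CONTIG element (an assumption, not BioPerl's).
ELEMENT = re.compile(
    r"gap\((?:unk)?\d*\)"                 # gap(), gap(100), gap(unk100)
    r"|complement\([\w.]+:\d+\.\.\d+\)"   # complement(ACC.1:1..100)
    r"|[\w.]+:\d+\.\.\d+"                 # ACC.1:1..100
)


def split_contig(value):
    """Split a CONTIG value into elements, raising instead of looping on bad input."""
    inner = value
    if value.startswith("join(") and value.endswith(")"):
        inner = value[len("join("):-1]
    elements = []
    pos = 0
    while pos < len(inner):
        if inner[pos] == ",":
            pos += 1  # separator: always advances
            continue
        match = ELEMENT.match(inner, pos)
        if match is None:
            # Fail loudly rather than retrying the same position forever.
            raise ValueError(
                f"unparseable CONTIG text at offset {pos}: {inner[pos:pos + 20]!r}"
            )
        elements.append(match.group(0))
        pos = match.end()  # every alternative is non-empty, so pos strictly increases
    return elements


print(split_contig("join(AADB02014027.1:1..8541)"))
```

Because pos either increases or an exception is raised on every iteration, the loop terminates on any finite input, whether or not the gap syntax is one the parser understands.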