[Bioperl-l] Bug in genbank parsing: CONTIG gaps
Hilmar Lapp
hlapp at gmx.net
Thu May 4 22:39:05 UTC 2006
The two notations are equivalent and syntactically correct, or so I
believe ... I don't think 100% verbatim preservation should be the
goal. Or am I missing the point?
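
To illustrate the equivalence claim, here is a minimal Python sketch. It is not BioPerl code and not how FTLocationFactory works; the names parse_contig and _ELEMENT are made up for this example, and it only understands plain ACCESSION:start..end spans and gap(...) elements. Under those assumptions, a one-element join(...) and the bare location parse to the same thing, and any element the sketch does not recognize raises an error instead of being retried.

```python
import re

# Hypothetical helper, for illustration only (not part of BioPerl).
# One CONTIG element: either a gap or an ACCESSION:start..end span.
_ELEMENT = re.compile(
    r"""
    (?P<gap>gap\((?P<gap_len>unk\d+|\d*)\))                    # gap(), gap(100), gap(unk100)
    |
    (?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)     # ACC.1:1..8541
    """,
    re.VERBOSE,
)


def parse_contig(text):
    """Parse a CONTIG location string into a list of tuples.

    Spans become (accession, start, end); gaps become ("gap", length_text).
    """
    text = text.strip()
    # A join(...) wrapper only groups its elements, so strip it if present.
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    elements = []
    # Each comma-separated part is examined exactly once, so the loop
    # always finishes on finite input.
    for part in text.split(","):
        m = _ELEMENT.fullmatch(part.strip())
        if m is None:
            # Fail loudly on anything this sketch does not understand.
            raise ValueError(f"unrecognized CONTIG element: {part!r}")
        if m.group("gap"):
            elements.append(("gap", m.group("gap_len")))
        else:
            elements.append(
                (m.group("acc"), int(m.group("start")), int(m.group("end")))
            )
    return elements


if __name__ == "__main__":
    with_join = parse_contig("join(AADB02014027.1:1..8541)")
    without_join = parse_contig("AADB02014027.1:1..8541")
    print(with_join == without_join)
```

Comparing the parsed forms rather than the raw strings is the design choice here: a round trip can then be checked for equivalence without requiring verbatim preservation of the original text.
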
On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> Here's another odd bit. This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through
> Bio::SeqIO:
>
> -----------------------------------
> ....
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /db_xref="taxon:9606"
>                      /mol_type="genomic DNA"
>                      /chromosome="11"
>                      /organism="Homo sapiens"
> CONTIG      AADB02014027.1:1..8541
>
> //
> -----------------------------------
> Here's the original:
> -----------------------------------
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014027.1:1..8541)
> //
> -----------------------------------
>
> Looks like it lopped out the 'join' here as well.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, May 04, 2006 1:41 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>
>> Are you using the CONTIG record or the full GenBank file? I see
>> problems with both (using bioperl-live) which seem unrelated to one
>> another.
>> The full file seems to be running a bit slow b/c the full GenBank
>> record is huge (~55 MB) but the CONTIG file does exactly what you said
>> (runs out of memory).
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
>>> Sent: Tuesday, May 02, 2006 10:32 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>>
>>>
>>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
>>> certain genbank files that contain CONTIG entries with gaps. One such
>>> record is NW_925173.
>>>
>>> When I try to parse this file using Bio::SeqIO::genbank, it will enter
>>> an infinite loop and spin until it runs out of memory.
>>>
>>> I'm pretty certain it relates to this bug:
>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
>>> indicate that genbank records with CONTIG gaps are not valid and can't
>>> be parsed. But this bug actually claims to be fixed, which is strange,
>>> since looking at the code for FTLocationFactory (where the loop is)
>>> it's still right there. I assume that this may be fixed in other
>>> contexts but is still not fixed in Bio::SeqIO::genbank? Or am I doing
>>> something wrong?
>>>
>>> I think that this should probably be filed as an open bug. I would
>>> think that even if bioperl isn't interested in parsing this type of
>>> file via SeqIO, certainly you'd want to ensure that no finite input
>>> file would send the parser into an infinite loop. Have others
>>> encountered this problem? Is there any plan to address it?
>>>
>>> Thanks very much for any information or help!
>>>
>>> -Mike
>>>
>>> P.S. I've played around with my version of FTLocationFactory and it
>>> seems to actually work and parse the gaps. I'm not sure if I've
>>> created other bugs or if it works in all cases, but at least the
>>> parser doesn't die. I also don't know that my hacky code is
>>> appropriate for putting back in to BioPerl, but I'm happy to provide
>>> it if someone wants to check it out and/or consider it for checkin.
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================