[Bioperl-l] Bug in genbank parsing: CONTIG gaps

Hilmar Lapp hlapp at gmx.net
Thu May 4 22:39:05 UTC 2006


The two notations are equivalent and syntactically correct, or so I
believe ... I don't think 100% verbatim preservation should be the
goal. Or am I missing the point?
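
To make the equivalence concrete, here is a minimal sketch in Python (not
BioPerl code; contig_segments and same_location are hypothetical names
made up for illustration). It assumes a simple CONTIG value with no gaps
and no complement() elements, and compares two values by their segments
rather than by their text:

```python
import re

# Hypothetical helper, not part of BioPerl: matches "ACCESSION.VERSION:start..end".
_SEGMENT = re.compile(r"^(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def contig_segments(location):
    """Return a list of (accession, start, end) tuples for a simple CONTIG value."""
    text = location.strip()
    # A join() wrapper only groups segments, so strip it if present.
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = _SEGMENT.match(part.strip())
        if match is None:
            # Gaps, complement() and anything else are out of scope for this sketch.
            raise ValueError(f"unsupported segment: {part!r}")
        segments.append((match["acc"], int(match["start"]), int(match["end"])))
    return segments


def same_location(a, b):
    """True when both CONTIG values describe the same ordered segments."""
    return contig_segments(a) == contig_segments(b)
```

With this, "join(AADB02014027.1:1..8541)" and "AADB02014027.1:1..8541"
from Chris's example compare as the same location, which is the sense in
which I'd call a round trip correct.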

On May 4, 2006, at 6:27 PM, Chris Fields wrote:

> Here's another odd bit.  This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through
> Bio::SeqIO:
>
> -----------------------------------
> ....
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /db_xref="taxon:9606"
>                      /mol_type="genomic DNA"
>                      /chromosome="11"
>                      /organism="Homo sapiens"
> CONTIG      AADB02014027.1:1..8541
>
> //
> -----------------------------------
> Here's the original:
> -----------------------------------
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014027.1:1..8541)
> //
> -----------------------------------
>
> Looks like it lopped out the 'join' here as well.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, May 04, 2006 1:41 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>
>> Are you using the CONTIG record or the full GenBank file? I see
>> problems with both (using bioperl-live) which seem unrelated to one
>> another. The full file seems to be running a bit slow b/c the full
>> GenBank record is huge (~55 MB) but the CONTIG file does exactly what
>> you said (runs out of memory).
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
>>> Sent: Tuesday, May 02, 2006 10:32 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>>
>>>
>>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
>>> certain genbank files that contain CONTIG entries with gaps.  One such
>>> record is NW_925173.
>>>
>>> When I try to parse this file using Bio::SeqIO::genbank, it will enter
>>> an infinite loop and spin until it runs out of memory.
>>>
>>> I'm pretty certain it relates to this bug:
>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
>>> indicate that genbank records with CONTIG gaps are not valid and can't
>>> be parsed.  But this bug actually claims to be fixed, which is strange,
>>> since looking at the code for FTLocationFactory (where the loop is)
>>> it's still right there.  I assume that this may be fixed in other
>>> contexts but is still not fixed in Bio::SeqIO::genbank?  Or am I doing
>>> something wrong?
>>>
>>> I think that this should probably be filed as an open bug.  I would
>>> think that even if bioperl isn't interested in parsing this type of
>>> file via SeqIO, certainly you'd want to ensure that no finite input
>>> file would send the parser into an infinite loop.  Have others
>>> encountered this problem?  Is there any plan to address it?
>>>
>>> Thanks very much for any information or help!
>>>
>>> -Mike
>>>
>>> P.S.  I've played around with my version of FTLocationFactory and it
>>> seems to actually work and parse the gaps.  I'm not sure if I've
>>> created other bugs or if it works in all cases, but at least the parser
>>> doesn't die.  I also don't know that my hacky code is appropriate for
>>> putting back in to BioPerl, but I'm happy to provide it if someone
>>> wants to check it out and/or consider it for checkin.
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
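
On the infinite loop Mike describes: one way to guarantee that no finite
CONTIG value can hang a parser is to make every pass through the loop
either consume input or fail loudly. The following is only a sketch in
Python under stated assumptions, not the FTLocationFactory code and not
Mike's patch. The function name parse_contig, the tuple layout, and the
accessions in the usage note are hypothetical, and it only covers plain
segments, complement(segment), and gap(N) with a numeric length:

```python
import re

# Assumed element grammar for this sketch: gap(N), ACC:start..end,
# or complement(ACC:start..end). The conditional group requires the
# closing parenthesis only when "complement(" was matched.
_ELEMENT = re.compile(
    r"gap\((?P<gap>\d+)\)"
    r"|(?P<comp>complement\()?"
    r"(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)(?(comp)\))"
)


def parse_contig(location):
    """Parse a CONTIG value into a list of segment and gap tuples."""
    text = "".join(location.split())  # drop whitespace and line wraps
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    elements = []
    pos = 0
    while pos < len(text):
        match = _ELEMENT.match(text, pos)
        if match is None:
            # Unknown syntax: stop with an error instead of retrying forever.
            raise ValueError(f"cannot parse CONTIG element at offset {pos}")
        if match["gap"] is not None:
            elements.append(("gap", int(match["gap"])))
        else:
            strand = -1 if match["comp"] else 1
            elements.append(
                ("segment", match["acc"], int(match["start"]), int(match["end"]), strand)
            )
        pos = match.end()  # always moves forward: every match is non-empty
        if pos < len(text):
            if text[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1
    return elements
```

For a made-up value such as "join(X.1:1..10,gap(100),complement(Y.2:5..20))"
this returns two segments with a gap of length 100 between them. The point
of the design is that pos strictly increases on each iteration, so the loop
runs at most len(text) times, and anything the pattern does not recognise
raises ValueError rather than spinning.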

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================