[Bioperl-l] One more odd little parsing problem for your list

Hilmar Lapp hlapp at gnf.org
Sun Jun 1 01:32:47 EDT 2003


Ouch, a space in the locus identifier. Why do they do this to us?

I'm afraid I have to kludge this. We can't afford the GenBank parser
dying on GenBank's random pranks. Isn't this fun.
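
To illustrate the sort of kludge this calls for, here is a minimal sketch
in Python. It is not the Bioperl code and not the actual fix. The sample
LOCUS line is hypothetical, put together from the details in the report
below (the name 'PSMAL/GCP III' and 1992 bp), and the function names are
made up for the example. The idea is to anchor on the "<digits> bp" token
rather than count whitespace-separated fields, so a space inside the
locus name no longer shifts the length field.

```python
import re

# Lazily capture the name, then require "<digits> bp" (or "aa" for proteins).
# The lazy name group lets the locus name contain internal spaces.
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<name>\S.*?)\s+(?P<length>\d+)\s+(?P<unit>bp|aa)\b"
)

def naive_length(line):
    """Field-position approach: assume the length is the third token."""
    return line.split()[2]

def parse_locus(line):
    """Return (name, length), tolerating spaces inside the locus name."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("unrecognized LOCUS line: %r" % line)
    return m.group("name"), int(m.group("length"))

# Hypothetical LOCUS line with a space in the name (not copied from the record).
sample = "LOCUS       PSMAL/GCP III    1992 bp    mRNA            PRI       01-JUN-2003"

print(naive_length(sample))   # the field-position approach picks up 'III'
print(parse_locus(sample))    # the anchored pattern recovers the name and 1992
```

A name that itself contains something like "123 bp" would still fool the
pattern, so this remains a kludge rather than a real grammar for the
LOCUS line.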

	-hilmar

On Saturday, May 31, 2003, at 12:00  PM, Michael Muratet wrote:

> Greetings
>
> I was parsing CDS features in RefSeq human (hs.gbff.gz) when it died on
> PSMAL/GCP III (NM_153696). The CDS was 527..1855. The error was from
> Bio::PrimarySeq::subseq: 'You have to start positive....sequence
> [527:1855] Total III'. The parser in SeqIO is picking up the length from
> the LOCUS line (as I recall), and for this record it sees 'III' and not
> '1992' bp. It seems a lot to ask Bioperl to figure out every possible
> configuration; maybe GenBank needs to have rules about whitespace in
> gene names?
>
> Cheers
>
> Mike
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


