[Bioperl-l] SeqIO embl parser bug?

Jason Stajich jason@cgt.mc.duke.edu
Thu, 17 Oct 2002 14:43:33 -0400 (EDT)


On Thu, 17 Oct 2002, Hilmar Lapp wrote:

> Thanks for spotting this. I know from combing through the genbank parser that there were (and hopefully no longer are, or at least fewer :) lurking bugs caused by multi-line tags being assumed to be single-line. The embl parser would have been next on my agenda, but I haven't reached it yet.
>
> Just to make sure it doesn't get lost, can you submit this to our bugzilla? (Jason, that is up, I believe, right?)
>
Yes - http://bugzilla.bioperl.org


> 	-hilmar
>
> > -----Original Message-----
> > From: Sam Griffiths-Jones [mailto:sgj@sanger.ac.uk]
> > Sent: Thursday, October 17, 2002 9:24 AM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] SeqIO embl parser bug?
> >
> >
> >
> > Eeek -- just been bitten badly by this one.
> >
> > <confession> We in Team Pfam are stuck with an old version of bioperl
> > for legacy reasons (not sure why but this must be Ewan's fault :)
> > </confession>, but after a quick cvs update it seems that bioperl-live
> > still has the same behaviour. Apologies if I'm wrong and this has been
> > fixed.
> >
> > Anyway -- embl parser does:
> >
> >        #accession number
> >        if( /^AC\s+(.*)?/ ) {
> >            my @accs = split(/[; ]+/, $1); # allow space in addition
> >            $params{'-accession_number'} = shift @accs;
> >            $params{'-secondary_accessions'} = \@accs;
> >        }
> >
> > This gets it wrong when there's more than one AC line - eg:
> >
> > ID   ECAPAH02   standard; DNA; PRO; 111408 BP.
> > XX
> > AC   D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;
> > AC   V00259; X04711; X54847; X54945; X55034; X56742;
> > XX
> > SV   D10483.2
> > ..
> >
> > The primary accession gets called as V00259, with 5 secondary
> > accessions.  This is particularly nasty in this case as there's
> > another EMBL entry with primary id V00259 and different sequence .....
> > :(
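
One way to avoid the overwrite is to accumulate accessions across every AC line and only then take the first one as primary. The following is a minimal sketch, not the actual Bio::SeqIO::embl code: parse_accessions is a hypothetical helper, the entry is passed in as a plain list of lines, and it assumes the "M10611; M12544;" fragment above is just mail wrapping of the first AC line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::embl): collect accessions
# from every AC line of one EMBL entry, keeping the first as primary.
sub parse_accessions {
    my @lines = @_;
    my @accs;
    for my $line (@lines) {
        # Skip anything that is not an AC line.
        next unless $line =~ /^AC\s+(.*)/;
        # Each AC line may carry several ';'-separated accessions;
        # append them instead of replacing what earlier lines gave us.
        push @accs, grep { length } split /[; ]+/, $1;
    }
    my $primary = shift @accs;    # undef if the entry has no AC line
    return ( $primary, \@accs );
}

# Example entry from the report, with the wrapped AC line rejoined.
my @entry = (
    'ID   ECAPAH02   standard; DNA; PRO; 111408 BP.',
    'XX',
    'AC   D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;',
    'AC   V00259; X04711; X54847; X54945; X55034; X56742;',
    'XX',
    'SV   D10483.2',
);

my ( $primary, $secondary ) = parse_accessions(@entry);
print "primary: $primary\n";                      # primary: D10483
print "secondary: ", scalar(@$secondary), "\n";   # secondary: 14
```

With this approach V00259 ends up among the secondary accessions, and the primary agrees with the SV line (D10483.2). In the real parser the same idea would presumably mean pushing onto a list kept across AC lines and filling in the accession parameters once the header has been read.
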
> >
> > Sam
> >
> > --------------------------------------------------------------------
> > Sam Griffiths-Jones                              sgj@sanger.ac.uk
> > http://www.sanger.ac.uk/Users/sgj                +44 (0)1223 834244
> >
> > Wisdom #4885:  It's always darkest before dawn, so if you're going
> > to steal your neighbour's newspaper, that's the time to do it.
> > --------------------------------------------------------------------
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu