[Bioperl-l] SeqIO embl parser bug?

Hilmar Lapp hlapp@gnf.org
Thu, 17 Oct 2002 10:55:41 -0700


Thanks for spotting this. I know from combing through the genbank parser that there were (and hopefully no longer are, or at least are fewer :) lurking bugs caused by multi-line tags being assumed to fit on a single line. The embl parser would have been next on my agenda, but I haven't reached it yet.

Just to make sure it doesn't get lost, can you submit this to our bugzilla? (Jason, that is up, I believe, right?)
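
For illustration only, here is a minimal sketch of one way the multi-line AC case could be handled: accumulate accessions across every AC line of an entry and take the primary from the first one seen, instead of resetting the list on each line. The helper name parse_accessions and the standalone layout are hypothetical and are not taken from the bioperl code; an actual fix inside the embl parser would likely look different.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): collect accessions from
# all AC lines of a single EMBL entry. The first accession seen is
# treated as primary, the rest as secondary.
sub parse_accessions {
    my @lines = @_;
    my @accs;
    for my $line (@lines) {
        # Append to the running list rather than overwriting it,
        # so a second AC line cannot replace the primary accession.
        if ( $line =~ /^AC\s+(.*\S)/ ) {
            push @accs, grep { length } split /[; ]+/, $1;
        }
    }
    my $primary = shift @accs;
    return ( $primary, \@accs );
}

# The entry from the report, with the wrapped AC line rejoined.
my @entry = (
    'ID   ECAPAH02   standard; DNA; PRO; 111408 BP.',
    'XX',
    'AC   D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;',
    'AC   V00259; X04711; X54847; X54945; X55034; X56742;',
    'XX',
    'SV   D10483.2',
);

my ( $primary, $secondary ) = parse_accessions(@entry);
print "primary: $primary\n";
print "secondary count: ", scalar(@$secondary), "\n";
```

With the entry above, this should report D10483 as primary with 14 secondary accessions, and V00259 ends up among the secondaries.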

	-hilmar

> -----Original Message-----
> From: Sam Griffiths-Jones [mailto:sgj@sanger.ac.uk]
> Sent: Thursday, October 17, 2002 9:24 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] SeqIO embl parser bug?
> 
> 
> 
> Eeek -- just been bitten badly by this one.
> 
> <confession> We in Team Pfam are stuck with an old version of bioperl
> for legacy reasons (not sure why but this must be Ewan's fault :)
> </confession>, but after a quick cvs update it seems that bioperl-live
> still has the same behaviour. Apologies if I'm wrong and this has been
> fixed.
> 
> Anyway -- embl parser does:
> 
>        #accession number
>        if( /^AC\s+(.*)?/ ) {
>            my @accs = split(/[; ]+/, $1); # allow space in addition
>            $params{'-accession_number'} = shift @accs;
>            $params{'-secondary_accessions'} = \@accs;
>        }
> 
> This gets it wrong when there's more than one AC line - eg:
> 
> ID   ECAPAH02   standard; DNA; PRO; 111408 BP.
> XX
> AC   D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;
> AC   V00259; X04711; X54847; X54945; X55034; X56742;
> XX
> SV   D10483.2
> ..
> 
> The primary accession gets called as V00259, with 5 secondary
> accessions.  This is particularly nasty in this case as there's
> another EMBL entry with primary id V00259 and different sequence .....
> :(
> 
> Sam
> 
> --------------------------------------------------------------------
> Sam Griffiths-Jones                              sgj@sanger.ac.uk
> http://www.sanger.ac.uk/Users/sgj                +44 (0)1223 834244
> 
> Wisdom #4885:  It's always darkest before dawn, so if you're going
> to steal your neighbour's newspaper, that's the time to do it.
> --------------------------------------------------------------------