[Bioperl-l] SeqIO embl parser bug?
Sam Griffiths-Jones
sgj@sanger.ac.uk
Thu, 17 Oct 2002 17:23:43 +0100 (BST)
Eeek -- just been bitten badly by this one.
<confession> We in Team Pfam are stuck with an old version of bioperl
for legacy reasons (not sure why but this must be Ewan's fault :)
</confession>, but after a quick cvs update it seems that bioperl-live
still has the same behaviour. Apologies if I'm wrong and this has been
fixed.
Anyway -- embl parser does:
    #accession number
    if( /^AC\s+(.*)?/ ) {
        my @accs = split(/[; ]+/, $1); # allow space in addition
        $params{'-accession_number'} = shift @accs;
        $params{'-secondary_accessions'} = \@accs;
    }
This gets it wrong when there's more than one AC line, e.g.:
ID ECAPAH02 standard; DNA; PRO; 111408 BP.
XX
AC D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;
AC V00259; X04711; X54847; X54945; X55034; X56742;
XX
SV D10483.2
..
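The primary accession gets called as V00259, with 5 secondary
accessions, because the second AC line overwrites whatever was parsed
from the first. The right answer is D10483, with the other 14 as
secondaries. This is particularly nasty in this case as there's
another EMBL entry with primary ID V00259 and a different sequence .....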
:(
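
For what it's worth, here is a minimal sketch of one possible fix: collect the accessions from every AC line first, and only then pick the primary. This is not the actual bioperl code; collect_accessions is a hypothetical standalone helper made up for illustration, and the real parser would need the equivalent change inside its own line loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): gather accessions from all
# AC lines of one entry, so a later AC line extends the list instead of
# overwriting what an earlier line provided.
sub collect_accessions {
    my @lines = @_;
    my @accs;
    for my $line (@lines) {
        next unless $line =~ /^AC\s+(.*)/;
        # Split on semicolons and whitespace; grep drops any empty fields.
        push @accs, grep { length } split /[;\s]+/, $1;
    }
    my %params;
    if (@accs) {
        # The first accession on the first AC line is the primary one.
        $params{'-accession_number'}     = shift @accs;
        $params{'-secondary_accessions'} = \@accs;
    }
    return %params;
}

# The header lines of the entry quoted above.
my @entry = (
    'ID ECAPAH02 standard; DNA; PRO; 111408 BP.',
    'XX',
    'AC D10483; J01597; J01683; J01706; K01298; K01990; M10420; M10611; M12544;',
    'AC V00259; X04711; X54847; X54945; X55034; X56742;',
    'XX',
);

my %params = collect_accessions(@entry);
print "primary: ", $params{'-accession_number'}, "\n";
print "secondary: ", scalar @{ $params{'-secondary_accessions'} }, "\n";
```

On the entry above this should give D10483 as the primary accession with 14 secondaries, rather than V00259 with 5.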
Sam
--------------------------------------------------------------------
Sam Griffiths-Jones sgj@sanger.ac.uk
http://www.sanger.ac.uk/Users/sgj +44 (0)1223 834244
Wisdom #4885: It's always darkest before dawn, so if you're going
to steal your neighbour's newspaper, that's the time to do it.
--------------------------------------------------------------------