[Bioperl-l] Memory requirements for conversion from embl to genbank
Sendu Bala
bix at sendu.me.uk
Thu Aug 31 17:34:44 UTC 2006
Chris Fields wrote:
> Sendu, Martin,
>
> This has been the problem with these particular example sequences. The
> issue is that they do NOT conform to the EMBL standard or any sane sequence
> format standard. Not that we stick to a standard vehemently ourselves, but
> we expect some sane formatting. IMHO, (as I have repeatedly stated) we
> should not be responsible for trying to 'fix' broken sequence formats unless
> it is sanely possible and doesn't degrade performance/quality.
>
> Saying that, I do believe we should at the least have a warning or throw the
> appropriate error. So if duplicate species are present, shouldn't there be
> a thrown error?
Bio::DB::Taxonomy::list should have been throwing an error before; it
does now. It would be nicer really if embl.pm stopped adding to the
classification array when it finds the end of one species
classification, but then it's just guessing about how broken one
particular file is.
I think the throw is good enough; let the user correct the file.
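To illustrate the fail-fast approach, here is a minimal sketch in Python. It is not the actual Bio::DB::Taxonomy::list code. The function name check_classification and the exception class DuplicateTaxonError are hypothetical, and it assumes the classification arrives as a plain list of names with exact string matching after trimming whitespace.

```python
class DuplicateTaxonError(ValueError):
    """Raised when a classification lists the same name more than once."""


def check_classification(names):
    """Return a copy of names, or raise if any name appears twice.

    Hypothetical helper: it makes no attempt to repair the input,
    it only refuses to continue with a classification that repeats a name.
    """
    seen = set()
    for name in names:
        key = name.strip()
        if key in seen:
            # Fail loudly instead of guessing how the file is broken.
            raise DuplicateTaxonError(
                f"duplicate name {name!r} in classification; "
                "the input record is probably malformed"
            )
        seen.add(key)
    return list(names)
```

The check deliberately does not try to truncate the list at the first repeat, since that would be guessing at how one particular file is broken; the caller gets an error and can correct the input.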