[Bioperl-l] SeqIO alters Genbank files

Chris Fields cjfields at illinois.edu
Thu Aug 25 16:42:30 UTC 2011


Brian,

I think you should comment out the code; our baked-in validation is only half-correct anyway, and it's probably a good idea to move towards separating format validation from parsing (they're two related but different concerns).

To tell the truth, I think we should eschew using FTHelper altogether and just use a Bio::SeqFeatureI-based class directly.  I haven't quite grasped the reasoning behind FTHelper.pm, and I would bet removing it as a middleman across the board would help parsing speed.  Does anyone have an objection to that, or at least an explanation of what generating tons of FTHelper instances achieves that couldn't be handled by a Factory?

chris

On Aug 25, 2011, at 9:35 AM, Brian Osborne wrote:

> bioperl-l,
> 
> I need to run something by you before I commit code and tests. I have code that takes a Genbank file as input and creates another Genbank file as output. I noticed that SeqIO - specifically FTHelper.pm - was taking a tag like this in the input file:
> 
> /score=100.1
> 
> And adding a "note" tag, so the output file contains this:
> 
> /score=100.1
> /note="score=100.1"
> 
> I'm assuming that the code does this because NCBI will not accept score tags and values, even though Bioperl, generally speaking, does not treat NCBI as the authority on the fine details of Genbank format.
> 
> On the other hand, I don't like the idea that SeqIO is altering the content. It also turns out that if you have code that does multiple round-trips, you end up with text like this:
> 
> /score=100.1
> /note="score=100.1"
> /note="score=100.1"
> /note="score=100.1"
> /note="score=100.1"
> 
> Should I comment out the code that's doing these edits or not?
> 
> Thanks again,
> 
> Brian O.
> 
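
The round-trip behaviour described above, and the suggested separation of validation from parsing, can be illustrated with a small self-contained sketch. This is a simplified model written in Python for illustration only; it is not BioPerl code and does not reproduce FTHelper.pm. The function names (parse_qualifiers, write_with_note, write_verbatim, validate, round_trip) and the KNOWN_TAGS subset are all hypothetical.

```python
import re

# Illustrative subset of qualifier names only; NOT the full list of
# qualifiers accepted by NCBI. "score" is deliberately absent.
KNOWN_TAGS = {"note", "gene", "product"}


def parse_qualifiers(lines):
    """Parse lines such as '/score=100.1' into (tag, raw_value) pairs.

    The raw value is kept verbatim, including any surrounding quotes.
    """
    pairs = []
    for line in lines:
        m = re.match(r"^/([A-Za-z_]+)=(.*)$", line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


def write_with_note(pairs):
    """Model of the reported behaviour: every 'score' tag also gets a note."""
    out = []
    for tag, value in pairs:
        out.append(f"/{tag}={value}")
        if tag == "score":
            out.append(f'/note="score={value}"')
    return out


def write_verbatim(pairs):
    """Writer that emits exactly what was parsed, with no extra tags."""
    return [f"/{tag}={value}" for tag, value in pairs]


def validate(pairs):
    """Separate validation step: report unknown tags without altering data."""
    return [f"non-standard qualifier: {tag}"
            for tag, _ in pairs if tag not in KNOWN_TAGS]


def round_trip(lines, writer, times):
    """Parse and re-write the same qualifier lines 'times' times."""
    for _ in range(times):
        lines = writer(parse_qualifiers(lines))
    return lines


if __name__ == "__main__":
    start = ["/score=100.1"]
    print("\n".join(round_trip(start, write_with_note, 4)))
    print("\n".join(round_trip(start, write_verbatim, 4)))
    print(validate(parse_qualifiers(start)))
```

In this model the note-adding writer accumulates one extra note per round-trip, while the verbatim writer leaves the qualifiers unchanged however many times the file is re-read. The non-standard tag is still reported, but by a validation step that the caller runs separately.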