[BioPython] Cannot parse ApE plasmid editor GenBank file

Chris Fields cjfields at uiuc.edu
Fri Jun 8 12:57:55 UTC 2007


On Jun 8, 2007, at 5:31 AM, Martin MOKREJŠ wrote:

...

>
> In principle I do agree with you, but let me emphasize that I fully
> agree with Wayne, who wrote to me yesterday that the GenBank format
> is the way to write down your data, and that we often really do not
> need all the fields required for data submission into the GenBank
> database:

...

It does make sense to leave some of those fields out except where
they are needed (with the exception of the '.' fields like KEYWORDS),
but it never made sense to me to have completely blank fields or to
leave out the locus name.  My guess is that most format parsers don't
look for empty fields (or complain when they encounter one) because
they haven't come across them before; such fields were always left out
completely.  What would work best for everyone would be optional
validation warnings, or a separate validation module for anyone
worried about compliance with the GenBank format.  That hasn't
happened yet (and I don't have time to code it!).
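
To make the idea of optional validation warnings concrete, here is a
rough sketch in plain Python.  It is not part of Biopython or BioPerl;
the function names find_empty_fields and warn_on_empty_fields are made
up for illustration, and it assumes header keywords start in the first
column and that the header ends at FEATURES, ORIGIN or '//'.

```python
import warnings


def find_empty_fields(record_text):
    """Return top-level header keywords that are present but have no value.

    Hypothetical helper: only lines starting in column 0 are checked;
    indented sub-keywords and continuation lines are skipped.
    """
    empty = []
    for line in record_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header section is over
        if not line or line[0].isspace():
            continue  # blank, continuation, or sub-keyword line
        keyword, _, value = line.partition(" ")
        if not value.strip():
            empty.append(keyword)
    return empty


def warn_on_empty_fields(record_text):
    """Emit one warning per empty field instead of failing the parse."""
    empty = find_empty_fields(record_text)
    for keyword in empty:
        warnings.warn(f"GenBank field {keyword} is present but empty")
    return empty


# A made-up header with a blank LOCUS and a blank ACCESSION line.
SAMPLE = "\n".join([
    "LOCUS       ",
    "DEFINITION  .",
    "ACCESSION",
    "KEYWORDS    .",
    "FEATURES             Location/Qualifiers",
    "ORIGIN",
    "//",
])
```

Calling find_empty_fields(SAMPLE) reports LOCUS and ACCESSION, while
the '.' fields pass.  Because the check only warns, a parser could run
it as an opt-in step without rejecting files it reads today.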

Wayne, I would say use Martin's advice for the locus name (file name
w/o extension), and if a field allows '.' then add it in; otherwise
it's probably easier to leave the blank fields out completely,
GenBank compliance or not.  There are several questionably compliant
files in the GenBank test suite in BioPerl, so this wouldn't be the
first one, and if someone wants a validation system they can try
building one until we have time to do it.
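
On the writing side, the advice above could look roughly like the
following sketch.  The names locus_name_from_path, tidy_header_line
and DOT_FIELDS are hypothetical, as is the file name in the usage
note; the set of fields that accept '.' is an assumption seeded with
KEYWORDS only.  A real LOCUS line also carries length, molecule type
and date, which this sketch does not attempt to fill in.

```python
import os

# Assumption: fields that may be written as "." when there is nothing
# to say. Only KEYWORDS is taken from the discussion; extend as needed.
DOT_FIELDS = {"KEYWORDS"}


def locus_name_from_path(path):
    """Use the file name without its extension as the locus name."""
    return os.path.splitext(os.path.basename(path))[0]


def tidy_header_line(line, path):
    """Return a repaired top-level header line, or None to drop it.

    Non-empty lines are returned unchanged. An empty LOCUS gets a name
    derived from the file, an empty '.' field gets a '.', and any other
    empty field is left out completely.
    """
    keyword, _, value = line.partition(" ")
    if value.strip():
        return line
    if keyword == "LOCUS":
        return f"LOCUS       {locus_name_from_path(path)}"
    if keyword in DOT_FIELDS:
        return f"{keyword:<12}."  # values start in column 13
    return None
```

For example, tidy_header_line("LOCUS       ", "plasmids/pUC19.ape")
gives a LOCUS line named pUC19, and an empty ACCESSION line comes back
as None so the writer can skip it.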

chris




