[Bioperl-l] Genbank file : bad features (tag) order with /translation

Chris Fields cjfields at illinois.edu
Wed Aug 3 17:10:31 UTC 2011


IMHO GenBank is too unwieldy, but it's nice to know the output works for NCBI submission.

chris

On Aug 3, 2011, at 12:06 PM, Brian Osborne wrote:

> Peter,
> 
> I currently use BioPerl and SeqIO::genbank to create the *gbf files for NCBI submission, and they've always accepted them. In fact, I don't think they even use them; I believe they use the *tbl, *fsa, and *agp files and the ASN file as data sources.
> 
> Brian O
> 
> On Aug 3, 2011, at 12:52 PM, Chris Fields wrote:
> 
>> On Aug 3, 2011, at 11:00 AM, Peter Cock wrote:
>> 
>>> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>>>>> 
>>>>> Why do you care about the order?
>>>>> 
>>>> 
>>>> Hi Peter,
>>>> 
>>>> I care about the order for the submission to NCBI.
>>> 
>>> Do the NCBI have some guidelines which ask for a particular order?
>> 
>> No; beyond the feature table, there is no specification that I am aware of that requires a particular order.  Submitted data is tabular; Sequin is a nicer GUI for getting data into a useful format for submission to NCBI, where I believe the data is converted to ASN.1.
>> 
>>>> But I guess they
>>>> will reformat the file before getting it in their database.
>>> 
>>> They seem to generate the official GenBank files from their
>>> database - so I doubt the input order matters.
>> 
>> Yep, that's correct.  If NCBI ruled the world everyone would be using ASN.1 (b/c that's what they use internally).
>> 
>>>> It's also
>>>> visually better when the translation of the protein comes in the end
>>>> of the annotation for the CDS and not before /product, /note ....
>>> 
>>> I do see your point, but if that were the only motivation I wouldn't
>>> want to make generating GenBank output any more complicated
>>> than it already is.
>> ...
>>>> Anyway, maybe I'll reformat the file as a Sequin table for a direct
>>>> submission to NCBI with Sequin.
>>>> 
>>>> Thank you.
>>>> 
>>>> Max
>>> 
>>> Peter
>> 
>> 
>> Maxime, I find most users try to avoid using GenBank format except when absolutely needed.  There is a very good reason Sequin and tbl2asn are used by NCBI for submissions; they end up generating simple tabular data that is easier to feed into their internal ASN.1 format.  GenBank is a nice human-readable format, but structure-wise I find it a pain to deal with, not to mention the variant third-party 'genbank' data that users want us to handle.
>> 
>> We try to support generation of output within reason, but that's never been our primary goal.  As long as the output generated is capable of being re-read by our parsers with the data intact and generates sane data we're pretty happy.
>> 
>> That said, any additions to deal with this are perfectly welcome (I pointed out one mechanism that could be used), but they would have to address the concerns Peter and I alluded to previously, and it would be nice to evaluate how any changes affect performance.  You are more than welcome to submit this as a feature request using our Redmine server (including patches if you do this yourself):
>> 
>> https://redmine.open-bio.org/
>> 
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




