[Bioperl-l] GenBank accession bug?
    Chris Fields 
    cjfields at uiuc.edu
       
    Thu Feb 22 21:01:03 UTC 2007
    
    
  
On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:
>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say). Do you mean it might
> be tricky to support both the GenBank standard and Dmitry's scheme
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually, the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave
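A minimal sketch of the behaviour Dave describes, using Python's `re` module as a stand-in for Perl (these particular patterns mean the same thing in both). It assumes the version regex is applied to an `accession.version` token such as `AF123456.2`. The hyphenated ID `ABC-123` and the relaxed `\S+` variant are hypothetical illustrations, not BioPerl code.

```python
import re

# Patterns transcribed from the quoted message (Perl syntax; identical here).
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S.*\S)")
VERSION_RE = re.compile(r"^\w+\.(\d+)")
# Hypothetical relaxed variant: accept any non-space run before the dot.
VERSION_RELAXED_RE = re.compile(r"^\S+\.(\d+)")

def parse_accession(line):
    """Return the accession text after 'ACCESSION', or None."""
    m = ACCESSION_RE.match(line)
    return m.group(1) if m else None

def parse_version(token, pattern=VERSION_RE):
    """Return the version number from an 'accession.version' token, or None."""
    m = pattern.match(token)
    return m.group(1) if m else None

print(parse_accession("ACCESSION   ABC-123"))             # ABC-123
print(parse_version("AF123456.2"))                         # 2
print(parse_version("ABC-123.2"))                          # None: \w+ stops at '-'
print(parse_version("ABC-123.2", VERSION_RELAXED_RE))      # 2
```

The accession pattern already tolerates the hyphen because `\S.*\S` matches anything between two non-space characters; only the `\w+` prefix in the version pattern rejects it.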
You're right; the argument comes down to whether we support \S+ or just
\w+. I'm neutral on this myself, but I wonder how allowing \S+ would
affect other modules (for instance, indexing for a flat db), where one
might use just \w+ for accessions, expecting them to be GenBank- or
EMBL-like alphanumerics. The fact that \S+ was supported in the past (as
indicated in the bug report) and then wasn't after 1.2 makes me think
someone had a reason for changing it, but that was before my time on the
group.
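The indexing concern can be illustrated with a hypothetical flat-file index keyed by a regex match. This is a sketch only: the key extractors and IDs below are invented for illustration and are not taken from any BioPerl indexing module. It shows that code assuming `\w+` would silently truncate hyphenated IDs, so distinct records can end up under the same key.

```python
import re

# Hypothetical key extractors for a flat-file index (illustrative only).
WORD_KEY_RE = re.compile(r"^(\w+)")
NONSPACE_KEY_RE = re.compile(r"^(\S+)")

def build_index(ids, key_re):
    """Map each extracted key to the list of IDs that produced it."""
    index = {}
    for seq_id in ids:
        m = key_re.match(seq_id)
        key = m.group(1) if m else None
        index.setdefault(key, []).append(seq_id)
    return index

ids = ["ABC-123", "ABC-456", "AF123456"]
word_index = build_index(ids, WORD_KEY_RE)
nonspace_index = build_index(ids, NONSPACE_KEY_RE)

print(word_index)      # {'ABC': ['ABC-123', 'ABC-456'], 'AF123456': ['AF123456']}
print(nonspace_index)  # {'ABC-123': ['ABC-123'], 'ABC-456': ['ABC-456'], 'AF123456': ['AF123456']}
```

With `\w+` the two hyphenated records collide under `ABC`, while plain GenBank-style accessions are unaffected either way. That collision is the kind of downstream breakage worth checking for before widening the pattern.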
I'll have a look at the CVS history when I have time to see what I can
dig up.
chris