[Bioperl-l] GenBank accession bug?
Chris Fields
cjfields at uiuc.edu
Thu Feb 22 21:01:03 UTC 2007
On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:
>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say). Do you mean it might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down to whether we support \S+ or
just \w+. I'm neutral on this myself, but I wonder how allowing \S+
would affect other modules (for instance, indexing for a flat db),
where one might use just \w+ for accessions, expecting them to be
GenBank- or EMBL-like alphanumerics. The fact that \S+ was supported
in the past (as indicated in the bug report) and then wasn't after 1.2
makes me think someone had a reason for going in and changing it, but
that was before my time on the group.
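
To make the trade-off concrete, here is a rough sketch of how the two quoted patterns treat a hyphenated ID. It is written in Python rather than Perl, with the patterns transcribed for Python's re module; for these ASCII inputs they should behave the same way. The sample ID "MY-SEQ_01", the relaxed \S+ version pattern, and the helper names parse_version and index_key are all made up for illustration and are not taken from any BioPerl module. The sketch also assumes the version pattern is applied to an "accession.version" token.

```python
import re

# Patterns quoted in the thread, transcribed for Python's re module.
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S.*\S)")
VERSION_RE = re.compile(r"^\w+\.(\d+)")

# Hypothetical relaxed variant: any non-space run, then ".digits".
RELAXED_VERSION_RE = re.compile(r"^\S+\.(\d+)")


def parse_version(token, pattern=VERSION_RE):
    # Return the numeric version from an "accession.version" token,
    # or None when the pattern does not match.
    m = pattern.match(token)
    return int(m.group(1)) if m else None


def index_key(identifier):
    # Hypothetical indexer that assumes accessions are a single \w+ run
    # and keeps only the leading word characters.
    m = re.match(r"\w+", identifier)
    return m.group(0) if m else None


# A GenBank-style accession.version works with the current pattern.
print(parse_version("NM_000546.5"))                      # 5

# A hyphenated ID is rejected by \w+ but accepted by \S+.
print(parse_version("MY-SEQ_01.2"))                      # None
print(parse_version("MY-SEQ_01.2", RELAXED_VERSION_RE))  # 2

# The accession pattern already captures the hyphenated ID whole.
print(ACCESSION_RE.match("ACCESSION   MY-SEQ_01").group(1))  # MY-SEQ_01

# An indexer built on \w+ would silently truncate the same ID.
print(index_key("MY-SEQ_01"))                            # MY
```

The last line is the case I'm worried about: if the parser accepts \S+ but something downstream still assumes \w+, the ID gets cut at the hyphen instead of failing loudly.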

I'll have a look at the CVS history when I have time to see what I
can dig up.

chris