[EMBOSS] Support for multi-line annotation in ig format
Peter Rice
ricepeterm at yahoo.co.uk
Wed Sep 19 15:14:15 UTC 2012
Dear Daniel,
On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear structured, that part of the information is not reliable.
Thanks, I'll take a look.
We usually index an "accession number" in addition to the identifier. Do
the parts of the ID naming carry any significance that could be used as
an accession or a sequence version?
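To make the question concrete, here is an illustrative sketch only (Python, not EMBOSS code). It assumes every ID has the same layout as the quoted examples "US-123456789-1" and "US-123456789-2", and that the text before the last hyphen could act as an accession while the trailing number could act as a version; neither assumption is confirmed for the USPTO data. The function name split_uspto_id is hypothetical.

```python
def split_uspto_id(seq_id):
    """Split an ID such as 'US-123456789-1' into (prefix, number).

    Assumption: the trailing number is separated by the LAST hyphen.
    Whether the prefix is usable as an accession is an open question.
    """
    # rpartition splits on the last hyphen only, so hyphens inside
    # the prefix are preserved.
    prefix, sep, number = seq_id.rpartition("-")
    if not sep or not prefix or not number.isdigit():
        raise ValueError(f"unexpected ID layout: {seq_id!r}")
    return prefix, int(number)


print(split_uspto_id("US-123456789-1"))  # ('US-123456789', 1)
```

If the trailing number only counts sequences within one document rather than revisions of a sequence, it would not be a true sequence version, and only the full ID would be worth indexing.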
> As for the name, how about something like "iguspto"?
Thanks. I may just use USPTO but it's not important.
> Lastly, do you think the patch with this change would be made available for EMBOSS 6.4?
Yes ... it is a fairly straightforward extension to dbxflat, so I could
send you a copy, but for general release I would prefer to distribute it
only from 6.5 onwards.
regards,
Peter Rice
EMBOSS Team