[EMBOSS] Support for multi-line annotation in ig format

Rozenbaum, Daniel (Biocceleration Inc) daniel.rozenbaum at USPTO.GOV
Wed Sep 19 15:23:59 UTC 2012


Dear Peter,

At least within the context of USPTO, the sequence identifier is the only consistently present piece of information that uniquely identifies the sequence. Does the absence of an accession number field make the task of adding support for this in EMBOSS more complex?
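
To make the question about the parts of the identifier concrete, here is a minimal Python sketch of how a name such as "US-123456789-1" might be split into a candidate accession and a sequence index. The pattern, the function name split_identifier, and the interpretation of the parts are assumptions for illustration only; nothing in the sample file confirms that the parts carry this meaning, and this is not how dbxflat parses anything.

```python
import re

# Hypothetical pattern: two-letter prefix, document number, sequence index.
# Assumes identifiers look like the examples quoted in this thread.
ID_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]{2})-(?P<document>\d+)-(?P<index>\d+)$"
)


def split_identifier(seq_id):
    """Return (accession_candidate, index), or None if the id does not match."""
    match = ID_PATTERN.match(seq_id.strip())
    if match is None:
        return None
    # Candidate accession: everything before the trailing sequence index.
    accession = f"{match.group('prefix')}-{match.group('document')}"
    return accession, int(match.group("index"))


print(split_identifier("US-123456789-1"))
```

Under this assumed split, every sequence from the same document would share the candidate accession, so only the full identifier would remain unique per sequence.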

Thank you,
Daniel

On Sep 19, 2012, at 11:14 AM, "Peter Rice" <ricepeterm at yahoo.co.uk> wrote:

> Dear Daniel,
> 
> On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> 
>> I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc.). While the annotation fields appear structured, that part of the information is not reliable.
> 
> Thanks, I'll take a look.
> 
> We usually index an "accession number" in addition to the identifier. Is 
> there some significance in the parts of the id naming that could be used 
> as an accession or a sequence version?
> 
>> As for the name, how about something like "iguspto"?
> 
> Thanks. I may just use USPTO but it's not important.
> 
>> Lastly, do you think the patch with this change would be made available for EMBOSS 6.4?
> 
> Yes ... it is a fairly straightforward extension to dbxflat, so I could 
> send you a copy, but for general release I would prefer to distribute it 
> only from 6.5 onwards.
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 



