[EMBOSS] Support for multi-line annotation in ig format

Peter Rice ricepeterm at yahoo.co.uk
Wed Sep 19 15:14:15 UTC 2012


Dear Daniel,

On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear structured, that part of the information is not reliable.

Thanks, I'll take a look.

We usually index an "accession number" in addition to the identifier. Is
there some significance in the parts of the ID naming that could be used
as an accession or a sequence version?

> As for the name, how about something like "iguspto"?

Thanks. I may just use USPTO but it's not important.

> Lastly, do you think the patch with this change would be made available for EMBOSS 6.4?

Yes ... it is a fairly straightforward extension to dbxflat, so I could
send you a copy, but for general release I would prefer to distribute it
only from 6.5 onwards.

regards,

Peter Rice
EMBOSS Team


