[EMBOSS] Support for multi-line annotation in ig format

Rozenbaum, Daniel (Biocceleration Inc) daniel.rozenbaum at USPTO.GOV
Wed Sep 19 14:45:35 UTC 2012


A quick addition to the information on this format: while the example I sent has the records separated by a couple of newlines and a form feed (^L, 0x0c), in the most general case the first line of the next record (a line that starts with a semicolon) could appear immediately after the last sequence data line of the previous record. Empty lines between records are ignored.
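
To make the record-boundary rule concrete, here is a rough Python sketch of how a reader might split such a file. It is only an illustration, not EMBOSS code: the function name split_ig_records is made up, and it assumes that semicolon lines occur only in the annotation block at the start of a record, so a semicolon line seen after any non-semicolon line opens the next record.

```python
def split_ig_records(text):
    """Split IG-style text into records, each a list of non-empty lines.

    Assumption (not confirmed by any spec): ';' lines appear only at the
    start of a record, so a ';' line that follows a non-';' line marks
    the beginning of the next record.
    """
    records = []
    current = []
    in_body = False  # True once the current record has a non-';' line

    for raw in text.splitlines():
        # Drop form feeds (0x0c) and trailing whitespace; skip empty lines.
        line = raw.replace("\x0c", "").rstrip()
        if not line:
            continue
        if line.startswith(";") and in_body:
            # Previous record is complete; start a new one.
            records.append(current)
            current = []
            in_body = False
        if not line.startswith(";"):
            in_body = True
        current.append(line)

    if current:
        records.append(current)
    return records
```

With this approach it makes no difference whether two records are separated by blank lines, a form feed, or nothing at all; multi-line annotation stays with the record that follows it.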

On Sep 19, 2012, at 10:09 AM, "Rozenbaum, Daniel (Biocceleration Inc)" <daniel.rozenbaum at USPTO.GOV> wrote:

> Dear Peter,
> 
> This is most wonderful news that's going to make a bunch of users really happy!
> 
> I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc.). While the annotation fields appear structured, that part of the information is not reliable.
> 
> As for the name, how about something like "iguspto"?
> 
> Lastly, do you think the patch with this change would be made available for EMBOSS 6.4? 
> 
> With gratitude,
> Daniel
> 
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria, VA 22314
> 
> -----Original Message-----
> From: Peter Rice [mailto:ricepeterm at yahoo.co.uk] 
> Sent: Wednesday, September 19, 2012 6:48 AM
> To: Rozenbaum, Daniel (Biocceleration Inc)
> Cc: emboss at lists.open-bio.org
> Subject: Re: [EMBOSS] Support for multi-line annotation in ig format
> 
> Dear Daniel,
> 
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>> Greetings again,
>> 
>> If I may, another question on the issue of IG format: how difficult would it be to support database indexing for this format?
> 
> Very easy, a 1-day job including testing and documentation.
> 
> Could you please make some example data available, indicate which fields could be indexed (including any information in formatted descriptions or in naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)?
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 
> 
> <ig_uspto_sample.txt>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss



