[EMBOSS] Support for multi-line annotation in ig format
Rozenbaum, Daniel (Biocceleration Inc)
daniel.rozenbaum at USPTO.GOV
Wed Sep 19 14:45:35 UTC 2012
A quick addition to the information on this format: the example I sent has its records separated by a couple of newlines and a form feed (^L, 0x0C), but in the most general case the first line of the next record (a line that starts with a semicolon) can appear immediately after the last sequence data line of the previous record. Empty lines between records are ignored.
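In case it helps to see the record-boundary rule spelled out, here is a rough Python sketch of how I would split such a file into records. It is only an illustration, not EMBOSS code: the function name split_ig_records is made up, and it assumes that every record begins with at least one semicolon line and that a form feed never carries meaning other than as a separator.

```python
from typing import Iterable, Iterator, List


def split_ig_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each record as a list of its non-empty lines (hypothetical helper)."""
    record: List[str] = []
    seen_data = False  # True once the current record has a non-';' line
    for raw in lines:
        # Drop form feeds (0x0C) and surrounding whitespace; skip empty lines.
        line = raw.replace("\x0c", "").strip()
        if not line:
            continue
        if line.startswith(";") and seen_data:
            # A ';' line after name/sequence lines opens the next record,
            # whether or not any separator preceded it.
            yield record
            record, seen_data = [], False
        if not line.startswith(";"):
            seen_data = True
        record.append(line)
    if record:
        yield record
```

Fed a file where one record is followed by blank lines and a form feed and the next is followed directly by a semicolon line, this yields the same kind of record list in both cases, which is the behaviour described above.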
On Sep 19, 2012, at 10:09 AM, "Rozenbaum, Daniel (Biocceleration Inc)" <daniel.rozenbaum at USPTO.GOV> wrote:
> Dear Peter,
>
> This is most wonderful news that's going to make a bunch of users really happy!
>
> I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc.). While the annotation fields appear structured, that part of the information is not reliable.
>
> As for the name, how about something like "iguspto"?
>
> Lastly, do you think the patch with this change would be made available for EMBOSS 6.4?
>
> With gratitude,
> Daniel
>
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria, VA 22314
>
> -----Original Message-----
> From: Peter Rice [mailto:ricepeterm at yahoo.co.uk]
> Sent: Wednesday, September 19, 2012 6:48 AM
> To: Rozenbaum, Daniel (Biocceleration Inc)
> Cc: emboss at lists.open-bio.org
> Subject: Re: [EMBOSS] Support for multi-line annotation in ig format
>
> Dear Daniel,
>
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>> Greetings again,
>>
>> If I may, another question on the issue of IG format: how difficult would it be to support database indexing for this format?
>
> Very easy, a 1-day job including testing and documentation.
>
> Could you please make some example data available, indicate which fields could be indexed (including any information in formatted descriptions or in naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)?
>
> regards,
>
> Peter Rice
> EMBOSS Team
>
>
> <ig_uspto_sample.txt>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss