[EMBOSS] Support for multi-line annotation in ig format

Rozenbaum, Daniel (Biocceleration Inc) daniel.rozenbaum at USPTO.GOV
Tue Sep 18 02:00:58 UTC 2012


Greetings again,

If I may, another question on the issue of IG format: how difficult would it be to support database indexing for this format?
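
As a rough illustration of what such indexing might involve, here is a minimal Python sketch. It is not EMBOSS code and says nothing about how EMBOSS would implement this. It assumes the IG layout seen in the examples quoted below: zero or more ';' annotation lines, one ID line, then sequence lines whose last line ends in '1' or '2'. The names index_ig and fetch_ig are hypothetical.

```python
import io


def index_ig(handle):
    """Map each record ID to the byte offset where its record starts.

    Assumed layout: ';' annotation lines, an ID line, then sequence
    lines, the last of which ends in '1' or '2'. `handle` must be
    opened in binary mode so that tell() returns exact byte offsets.
    """
    index = {}
    record_start = None   # offset of the first line of the current record
    in_sequence = False   # True once the ID line has been seen
    while True:
        offset = handle.tell()
        raw = handle.readline()
        if not raw:
            break
        text = raw.decode("ascii", errors="replace").strip()
        if not text:
            continue
        if not in_sequence:
            if record_start is None:
                record_start = offset
            if not text.startswith(";"):
                # First non-annotation line is taken to be the ID.
                index[text.split()[0]] = record_start
                in_sequence = True
        elif text.endswith(("1", "2")):
            # Terminator reached: the next line starts a new record.
            in_sequence = False
            record_start = None
    return index


def fetch_ig(handle, offset):
    """Return one record as text, stopping at its own terminator line."""
    handle.seek(offset)
    lines = []
    seen_id = False
    for raw in iter(handle.readline, b""):
        text = raw.decode("ascii", errors="replace")
        lines.append(text)
        stripped = text.strip()
        if not stripped:
            continue
        if not seen_id:
            if not stripped.startswith(";"):
                seen_id = True
        elif stripped.endswith(("1", "2")):
            break
    return "".join(lines)


sample = (b";, 10 bases\nEMBOSS_001\nhcsptpstas1\n"
          b";, 10 bases\nEMBOSS_002\nrdgwcvmtrm1\n")
handle = io.BytesIO(sample)
index = index_ig(handle)
print(fetch_ig(handle, index["EMBOSS_001"]), end="")
```

Because the record boundary is taken from the terminator line rather than from the next ID, a lookup in this sketch stops before the following record's annotation line.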

With best regards,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

On Sep 14, 2012, at 9:36 AM, "Rozenbaum, Daniel (Biocceleration Inc)" <daniel.rozenbaum at USPTO.GOV> wrote:

> Hello Peter and everyone,
> 
> I was wondering if I could revive the discussion about support for the IG format. I'm helping deploy EMBOSS at the US Patent and Trademark Office, where this format, in its multi-line sequence annotation form, is used extensively.
> 
> Here's an example of an additional issue I've run into when trying to work with IG format in EMBOSS:
> 
> % makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname ig1
> 
> % cat ig1.ig
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> EMBOSS_002
> rdgwcvmtrm1
> ;, 10 bases
> EMBOSS_003
> fgtifgdgid1
> <snip>
> 
> % entret  -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> 
> In the entret result above, the first annotation line of the subsequent record is returned as part of the requested record.
> 
> Many thanks,
> Daniel
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division 
> 600 Dulany St.
> Alexandria VA 22314
> 
> -------------------------
> On 15/08/2012 17:57, Daniel Rozenbaum wrote:
>> Dear list,
>> 
>> (Peter, many thanks for your prompt reply to my previous inquiry!)
>> 
>> We need to work with extensive databases in IntelliGenetics format in which each record carries multiple annotation lines. It appears, however, that EMBOSS concatenates all annotation lines into a single line when building its internal representation of the sequence description:
>> 
>> % cat /tmp/IGSEQ.ig
>> ; Annotation line 1
>> ; Annotation line 2
>> ; Annotation line 3
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>> 
>> 
>> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>> 
>> 
>> % cat /tmp/IGSEQ.emboss_ig2ig.ig
>> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>> 
>> Are there any plans to support multi-line annotation in this format?
> 
> Interesting thought. We will take a look. It will need some care to 
> maintain compatibility with other formats that have single (FASTA) or 
> multiple (swissprot) descriptions.
> 
> Which package is using this IG format?
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 
> 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
> 
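
To make the behaviour requested in the thread above concrete, here is a minimal Python sketch of an IG reader and writer that keeps each annotation line separate instead of joining them. It is an illustration only, not EMBOSS code. It assumes the layout shown in the quoted examples (';' annotation lines, an ID line, sequence lines ending in '1' or '2'), and the names IGRecord, parse_ig and format_ig are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class IGRecord:
    comments: List[str]     # one entry per ';' annotation line
    name: str               # the ID line
    sequence: str           # residues without the trailing terminator
    circular: bool = False  # assumed meaning: '2' terminator = circular


def parse_ig(lines: Iterable[str]) -> Iterator[IGRecord]:
    """Yield records, preserving every annotation line separately.

    A trailing record with no '1'/'2' terminator is silently dropped.
    """
    comments, name, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if name is None:
            if line.startswith(";"):
                comments.append(line[1:].strip())
            else:
                name = line
            continue
        done = line[-1] in "12"
        chunks.append(line[:-1] if done else line)
        if done:
            yield IGRecord(comments, name, "".join(chunks),
                           circular=(line[-1] == "2"))
            comments, name, chunks = [], None, []


def format_ig(rec: IGRecord) -> str:
    """Write a record back with one ';' line per annotation line."""
    out = [f"; {c}" for c in rec.comments]
    out.append(rec.name)
    out.append(rec.sequence + ("2" if rec.circular else "1"))
    return "\n".join(out) + "\n"


text = ("; Annotation line 1\n; Annotation line 2\n; Annotation line 3\n"
        "IGSEQ\nACGCATCGCATCAGACTACGC1\n")
for record in parse_ig(text.splitlines()):
    print(format_ig(record), end="")
```

Storing the annotation as a list rather than a single string is the design choice that lets the three-line example round-trip unchanged, and since each record ends at its terminator line, the next record's annotation is never attached to the previous one.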



