[EMBOSS] Support for multi-line annotation in ig format
Rozenbaum, Daniel (Biocceleration Inc)
daniel.rozenbaum at USPTO.GOV
Tue Sep 18 02:00:58 UTC 2012
Greetings again,
If I may, another question on the issue of IG format: how difficult would it be to support database indexing for this format?
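
To make the question concrete, here is a minimal sketch of the kind of index I have in mind: a map from each record ID to the byte offset and length of that record in a flat IG file. It is only an illustration and not a description of how EMBOSS indexes databases. It assumes the usual IG layout (optional ';' comment lines, an ID line, then sequence lines ending in '1' or '2'), and the names build_ig_index and fetch_record are hypothetical.

```python
from typing import Dict, Tuple


def build_ig_index(path: str) -> Dict[str, Tuple[int, int]]:
    """Map each record ID to (byte offset, byte length) in an IG file.

    Assumption: a record is zero or more ';' comment lines, one ID line,
    then sequence lines, the last of which ends in '1' or '2'.
    """
    index: Dict[str, Tuple[int, int]] = {}
    start = None   # byte offset of the first line of the current record
    name = None    # ID of the current record, once its ID line is seen
    offset = 0     # byte offset of the line being examined
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.strip()
            if line:
                if start is None:
                    start = offset
                if name is None:
                    # Still in the header: comment lines, then the ID line.
                    if not line.startswith(b";"):
                        name = line.decode("ascii")
                elif line.endswith((b"1", b"2")):
                    # Terminator reached: the record ends with this line.
                    index[name] = (start, offset + len(raw) - start)
                    start, name = None, None
            offset += len(raw)
    return index


def fetch_record(path: str, index: Dict[str, Tuple[int, int]], name: str) -> str:
    """Return the raw text of one record using the byte-offset index."""
    start, length = index[name]
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length).decode("ascii")
```

With an index like this, retrieving a single entry needs one seek and one read, and the annotation lines come back exactly as stored in the file.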
With best regards,
Daniel
--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314
On Sep 14, 2012, at 9:36 AM, "Rozenbaum, Daniel (Biocceleration Inc)" <daniel.rozenbaum at USPTO.GOV> wrote:
> Hello Peter and everyone,
>
> I was wondering if I could revive the discussion about support for IG format. I'm helping deploy EMBOSS at the US Patent and Trademark Office, where this format, in its multi-line sequence annotation form, is used extensively.
>
> Here's an example of an additional issue I've run into when trying to work with IG format in EMBOSS:
>
> % makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname ig1
>
> % cat ig1.ig
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> EMBOSS_002
> rdgwcvmtrm1
> ;, 10 bases
> EMBOSS_003
> fgtifgdgid1
> <snip>
>
> % entret -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
>
> In the entret result above, the first annotation line of the subsequent record is returned as part of the requested record.
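
For reference, a minimal sketch of the record boundaries I would expect, assuming each IG record is a run of ';' annotation lines, then an ID line, then sequence lines whose last line ends in '1' or '2'. This is not EMBOSS code; IGRecord and parse_ig are hypothetical names used only for illustration. Annotation lines are kept as a list, so multi-line annotation is not concatenated, and a ';' line that follows a terminator starts the next record.

```python
from typing import Iterator, List, NamedTuple


class IGRecord(NamedTuple):
    comments: List[str]  # one entry per ';' annotation line, in file order
    name: str            # the ID line
    sequence: str        # residues without the trailing terminator
    terminator: str      # '1' or '2', or '' if the file ended early


def parse_ig(text: str) -> Iterator[IGRecord]:
    """Split IG-formatted text into records (assumed layout, see above)."""
    comments: List[str] = []
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if name is None:
            # Header part: annotation lines come before the ID line.
            if line.startswith(";"):
                comments.append(line[1:].strip())
            else:
                name = line
            continue
        if line[-1] in "12":
            # The terminator closes this record; later lines belong
            # to the next one.
            chunks.append(line[:-1])
            yield IGRecord(comments, name, "".join(chunks), line[-1])
            comments, name, chunks = [], None, []
        else:
            chunks.append(line)
    if name is not None:
        # Unterminated final record: return what was read.
        yield IGRecord(comments, name, "".join(chunks), "")
```

Applied to the ig1.ig file above, the first record would carry exactly one annotation line (", 10 bases"), and the second ";, 10 bases" line would be attached to EMBOSS_002 rather than EMBOSS_001.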
>
> Many thanks,
> Daniel
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria VA 22314
>
> -------------------------
> On 15/08/2012 17:57, Daniel Rozenbaum wrote:
>> Dear list,
>>
>> (Peter, many thanks for your prompt reply to my previous inquiry!)
>>
>> We need to deal with extensive databases in Intelligenetics format in which each record has multiple annotation lines. It appears, however, that EMBOSS concatenates all annotation lines into a single line when building its internal representation of the sequence description:
>>
>> % cat /tmp/IGSEQ.ig
>> ; Annotation line 1
>> ; Annotation line 2
>> ; Annotation line 3
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>>
>>
>> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>>
>>
>> % cat /tmp/IGSEQ.emboss_ig2ig.ig
>> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>>
>> Are there any plans to support multi-line annotation in this format?
>
> Interesting thought. We will take a look. It will need some care to
> maintain compatibility with other formats that have single (FASTA) or
> multiple (swissprot) descriptions.
>
> Which package is using this IG format?
>
> regards,
>
> Peter Rice
> EMBOSS Team
>
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>