[EMBOSS] Working with Geneseq databases

Rozenbaum, Daniel (Biocceleration Inc) daniel.rozenbaum at USPTO.GOV
Thu Feb 7 14:27:57 UTC 2013


Hi Isabelle,

Thanks a lot for your feedback. I might be doing something wrong, but even after I remove the extra SQ lines, dbiflat still fails with an error like this:

   EMBOSS An error in embdbi.c at line 1569:
Error in embDbiSortWriteFields, expected entry  not found, last was 'BAD10932'

Have you seen anything like this, by any chance? Judging by the ".acnum_sort" file that dbiflat creates, dbiflat seems to misinterpret some of the lines with non-standard line type codes, such as "PA", and to treat the strings on those lines as accession numbers. I'm not sure whether my reading of this is correct, or whether it is what causes the error above.

What I tried was simply adding the "CC" line type code at the beginning of every line whose line type code is not part of the EMBL standard. This seems to have done the trick, but I'm wondering whether it's the best way to go.
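
The "CC" workaround can be sketched roughly as follows. This is only an illustration, not the script actually used: the function name demote_nonstandard_lines is hypothetical, the set of line codes is an assumption based on the EMBL user manual and may need adjusting (for example, Geneseq may reuse a code such as "PR" with a different meaning), and the sample entry in the comments is made up.

```python
# Two-letter line codes assumed to be valid in EMBL flat files
# (an assumption; check against the EMBL user manual before relying on it).
EMBL_CODES = frozenset(
    "ID AC PR DT DE KW OS OC OG RN RC RP RX RG RA RT RL "
    "DR CC AH AS FH FT XX SQ CO".split()
)


def demote_nonstandard_lines(lines, known_codes=EMBL_CODES):
    """Yield lines, turning lines with unknown codes into CC comment lines."""
    for line in lines:
        is_terminator = line.startswith("//")
        # Sequence data lines begin with blanks; empty lines are left alone too.
        is_data_or_blank = not line.strip() or line[:1].isspace()
        if is_terminator or is_data_or_blank or line[:2] in known_codes:
            yield line
        else:
            # Keep the original text, but hide it behind a comment line code.
            yield "CC   " + line


# Made-up sample entry, for illustration only.
sample = [
    "ID   XXX00001 standard; DNA; 4 BP.\n",
    "PA   (EXAMPLE) EXAMPLE CORP.\n",
    "SQ   Sequence 4 BP;\n",
    "     acgt\n",
    "//\n",
]
converted = list(demote_nonstandard_lines(sample))
```

Only the "PA" line of the sample is rewritten (to a line starting with "CC   PA   "); the other lines pass through unchanged. The original line code is kept inside the comment, so the information is not lost.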

If it matters, we're using EMBOSS 6.4.0.

Many thanks,
Daniel
________________________________________
From: emboss-bounces at lists.open-bio.org [emboss-bounces at lists.open-bio.org] On Behalf Of Wells, Isabelle [isabelle.wells at roche.com]
Sent: Thursday, February 07, 2013 5:14 AM
To: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] Working with Geneseq databases

Hi Daniel,
We index the Geneseq databases in the EMBL format. The problem with the Geneseq format is that each entry has several lines starting with "SQ   ". To make this work, you just need to write a program that prints only the first line starting with "SQ   " in each entry and skips the SQ lines that follow.
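
A minimal sketch of such a filter, assuming that every entry ends with a "//" terminator line as in EMBL flat files. The function name keep_first_sq_line is hypothetical and not part of EMBOSS, and the sample lines are made up.

```python
def keep_first_sq_line(lines):
    """Yield lines, dropping every "SQ   " line after the first in each entry."""
    seen_sq = False
    for line in lines:
        if line.startswith("//"):
            # End of entry: reset so the next entry keeps its first SQ line.
            seen_sq = False
            yield line
        elif line.startswith("SQ   "):
            if not seen_sq:
                seen_sq = True
                yield line
            # Later SQ lines in the same entry are skipped.
        else:
            yield line


# Made-up sample with two entries, for illustration only.
sample = [
    "ID   XXX00001\n",
    "SQ   Sequence 4 BP;\n",
    "SQ   extra line\n",
    "     acgt\n",
    "//\n",
    "ID   XXX00002\n",
    "SQ   Sequence 4 BP;\n",
    "     ttga\n",
    "//\n",
]
filtered = list(keep_first_sq_line(sample))
```

To run it as a stream filter, something like sys.stdout.writelines(keep_first_sq_line(sys.stdin)) should work after importing sys.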
Hope this helps!

Best regards,
Isabelle Wells
F. Hoffmann-La Roche Ltd


-----Original Message-----
From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Rozenbaum, Daniel (Biocceleration Inc)
Sent: Wednesday, 6. February 2013 16:50
To: emboss at lists.open-bio.org
Subject: [EMBOSS] Working with Geneseq databases

Dear all,

Does anyone have experience getting EMBOSS to work with the Geneseq database distributed by Thomson Reuters ( http://thomsonreuters.com/products_services/science/science_products/a-z/geneseq/ )? This database comes in an "EMBL-like" format that uses some line codes not defined in the EMBL format proper. In our experiments this has caused problems when, for example, we tried to index these databases as EMBL-formatted.
--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/Office of Application Engineering and Development/Patent System Division
600 Dulany St., Alexandria, VA 22314


