[EMBOSS] FW: Reducing a FASTA repository, new user

Marvin Stodolsky marvin.stodolsky at gmail.com
Thu Feb 17 02:07:51 UTC 2011

Thanks to all for the suggestions.  A solution to the GeneBegin..GeneEnd
problem has been worked out, per the attachment, for those interested.

But for me the more important problem is making a FASTA repository that
is a subset of the gene files in a much larger Repository.  This is
desirable both before and after using Usearch to select a minimally
homologous gene set of a species.  RNA genes, cryptic viruses, and
SINE/LINE genes are among the undesirables to be eliminated.

Specifically, is there a command using ENTRET or its relatives that accepts
a list of entries for extraction and repacking into a single smaller
Repository?

If not, could you recommend a software tool/suite for this type of job?
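
To make the request concrete, here is a minimal Python sketch of the kind of subsetting described above.  It assumes the ID of each record is the first whitespace-delimited word of its FASTA header line; the function names and the demo records are hypothetical, and this is not an EMBOSS command.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) per record; header excludes the '>'."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line[1:], []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def subset_fasta(lines, wanted_ids):
    """Return FASTA text with only the records whose ID is in wanted_ids.

    Assumption: the ID is the first word of the header line.
    """
    wanted = set(wanted_ids)
    out = []
    for header, seq in read_fasta(lines):
        parts = header.split()
        if parts and parts[0] in wanted:
            out.append(">" + header)
            out.extend(seq)
    return "\n".join(out) + ("\n" if out else "")


# Hypothetical demo records, for illustration only.
demo = [
    ">geneA 10..20\n", "ACGT\n",
    ">geneB 30..40\n", "GGCC\n", "TTAA\n",
    ">geneC\n", "AAAA\n",
]
print(subset_fasta(demo, ["geneA", "geneC"]), end="")
```

For a real Repository the same function could be fed an open file handle in place of the demo list, with the wanted IDs read one per line from a text file.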


On Tue, Feb 15, 2011 at 3:59 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 14/02/2011 23:35, Marvin Stodolsky wrote:
>> This is elementary I’m sure, but I’ve been unable to work out the
>> syntax from the documentation.
>> A more minor issue:
>> When using infoseq to extract all the FASTA headers from a sequence
>> Repository, the GeneBegin..GeneEnd (like 234466..234589) often fails to
>> come out as a uniform field/fields in the resultant spreadsheet.  Is
>> there a fix for this?
> I don't see the genebegin and geneend in EMBOSS infoseq output. Are they
> part of the sequence ID in the FASTA file?
> You can use a delimiter between items for infoseq using:
>  -nocolumn
> on the command line.
> For import into a spreadsheet you can set the delimiter to be tab with:
>  -nocolumn -delimiter "\t"
> on the command line. That should then import nicely into a spreadsheet.
> Hope that helps
> Peter Rice
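
If the GeneBegin..GeneEnd pair is embedded in the header or description text, one way to get uniform spreadsheet columns is to pull it out with a regular expression after export.  The following is a minimal Python sketch under that assumption; the function names and the sample headers are hypothetical, and only plain `begin..end` ranges are handled (not forms such as `complement(...)`).

```python
import re

# Matches a plain coordinate range such as 234466..234589.
COORD_RE = re.compile(r"(\d+)\.\.(\d+)")


def split_coords(header):
    """Return (header_without_coords, begin, end).

    begin and end are empty strings when no range is found, so every
    row still has the same number of fields.
    """
    m = COORD_RE.search(header)
    if m is None:
        return header.strip(), "", ""
    rest = header[:m.start()] + header[m.end():]
    rest = " ".join(rest.split())  # collapse the gap left by the range
    return rest, m.group(1), m.group(2)


def to_tsv_row(header):
    """Format one header as a tab-delimited row: text, begin, end."""
    return "\t".join(split_coords(header))


# Hypothetical headers, for illustration only.
print(to_tsv_row("gene42 234466..234589 hypothetical protein"))
print(to_tsv_row("geneX no coordinates"))
```

Each output row then has exactly three tab-separated fields, which should import into a spreadsheet as three consistent columns.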
