[EMBOSS] FW: Reducing a FASTA repository, new user
marvin.stodolsky at gmail.com
Thu Feb 17 02:07:51 UTC 2011
Thanks to all for the suggestions. A solution to the GeneBegin..GeneEnd
problem has been worked out, per the Attachment, for those interested.
But for me the more important problem is making a FASTA repository
that is a subset of the gene files in a much larger repository. This
is desirable before and after using Usearch to select a minimally
homologous gene set for a species.
Elimination of RNA genes, cryptic viruses, SINE/LINE genes are among
Specifically, is there a command, using ENTRET or its relatives, that accepts a list like
for extraction and repacking into a single smaller Repository?
If not, could you recommend a software tool or suite for this type of job?
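
For readers with the same subsetting problem, here is a minimal sketch in plain Python (standard library only, not an EMBOSS command) of the extract-and-repack step. It assumes that each record's ID is the first whitespace-delimited word of its FASTA header and that the wanted IDs are supplied as a simple list; the function names read_fasta and subset_fasta are hypothetical, chosen for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(in_handle, wanted_ids, out_handle, width=60):
    """Copy records whose ID is in wanted_ids to out_handle; return the count.

    Assumption: the ID is the first word of the header line.
    """
    wanted = set(wanted_ids)
    kept = 0
    for header, seq in read_fasta(in_handle):
        parts = header.split(None, 1)
        seq_id = parts[0] if parts else ""
        if seq_id in wanted:
            out_handle.write(">" + header + "\n")
            # Re-wrap the sequence at a fixed line width.
            for i in range(0, len(seq), width):
                out_handle.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up records.
    source = io.StringIO(">geneA first\nACGT\n>geneB second\nTTGA\n")
    target = io.StringIO()
    subset_fasta(source, ["geneB"], target)
    print(target.getvalue(), end="")
```

With real files, the two handles would be opened with open() on the large repository and on the new, smaller one, and the ID list would be read from a text file with one ID per line.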
On Tue, Feb 15, 2011 at 3:59 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 14/02/2011 23:35, Marvin Stodolsky wrote:
>> This is elementary I’m sure, but I’ve been unable to work out the
>> syntax from the documentation.
>> More minor issue.
>> When using infoseq to extract all the fasta Headers from a sequence
>> Repository, the GeneBegin..GeneEnd (like 234466..234589) often fails to
>> come as a uniform field/fields in a resultant spreadsheet. Is there a Fix
>> for this?
> I don't see the genebegin and geneend in EMBOSS infoseq output. Are they
> part of the sequence ID in the FASTA file?
> You can use a delimiter between items for infoseq using:
> on the command line.
> For import into a spreadsheet you can set the delimiter to be tab with:
> -nocolumn -delimiter "\t"
> on the command line. That should then import nicely into a spreadsheet.
> Hope that helps
> Peter Rice
> EMBOSS Team
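
On the GeneBegin..GeneEnd question raised in the quoted exchange, one possible approach is to post-process the headers so that the coordinates always land in their own columns. The sketch below is an illustration only, not the solution from the attachment: it assumes the coordinates appear somewhere in the header as two integers joined by "..", and the names header_to_row and headers_to_tsv are hypothetical.

```python
import re

# Two runs of digits separated by "..", e.g. 234466..234589
COORD_RE = re.compile(r"(\d+)\.\.(\d+)")


def header_to_row(header):
    """Return [id, begin, end, full header text] for one FASTA header.

    begin and end are empty strings when no coordinates are found, so
    every row has the same number of fields.
    """
    text = header.lstrip(">").strip()
    parts = text.split(None, 1)
    seq_id = parts[0] if parts else ""
    match = COORD_RE.search(text)
    begin, end = (match.group(1), match.group(2)) if match else ("", "")
    return [seq_id, begin, end, text]


def headers_to_tsv(headers):
    """Join the rows for all headers into tab-delimited text."""
    return "\n".join("\t".join(header_to_row(h)) for h in headers)
```

Because every row has exactly four tab-separated fields, the result should import into a spreadsheet with the begin and end values in fixed columns.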