[EMBOSS] how to find unique DNA sequences from a large database

yun zheng mincloud at gmail.com
Fri Dec 8 18:50:40 UTC 2006


Dear All,

Many thanks for your reply.

Best regards.

sincerely
zheng, yun


On 12/8/06, pmr at ebi.ac.uk <pmr at ebi.ac.uk> wrote:
>
> Dear Yun Zheng,
>
> > Are there any tools for finding unique sequences in a large database?
> > Many thanks.
> >
> > I need to find unique DNA sequences in a large database. A short piece
> > is given as follows.
> >
>
> > All these seem to be the same sequence, since BLASTN gives very small
> > e-values for their alignments.
>
> Remember that BLASTN is a local alignment tool. The small e-values
> indicate that some part of your 001 query sequence is similar to some part
> of a sequence in the database.
>
> You need to check what is matching in the alignments reported by BLASTN.
> One useful test is whether the whole length of your query matches any of
> the sequences in the database. For DNA, also check whether it matches in
> one or both directions (as sequences can have biologically significant
> inverted repeats).
>
> There are tools (not in EMBOSS) available for building non-redundant
> databases - excluding sequences which are subsequences of others in the
> database, or selecting one of a set of sequences that match closely over
> their whole length. But you do have to decide what you mean by redundancy
> and make sure that the methods you apply are appropriate.
>
> Hope that helps,
>
> Peter Rice
>
>
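
As a rough illustration of one possible definition of redundancy from the
reply above (a sequence is redundant if it, or its reverse complement, is
an exact subsequence of another sequence in the set), here is a minimal
Python sketch. It is not part of EMBOSS, the function names are
hypothetical, and it only handles exact matches, so near-identical
sequences that differ by a mismatch or gap are all kept. The pairwise
comparison is quadratic, so it suits a small test set rather than a large
database.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def non_redundant(records):
    """Keep one representative per group of redundant sequences.

    records: dict mapping sequence ID -> DNA string.
    A sequence is dropped if it, or its reverse complement, occurs as an
    exact substring of a sequence that has already been kept.
    """
    kept = {}
    # Process longest first, so shorter sequences are tested against
    # longer ones; ties are broken by ID to make the result deterministic.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for seq_id, seq in ordered:
        forward = seq.upper()
        reverse = reverse_complement(forward)
        if any(forward in k or reverse in k for k in kept.values()):
            continue  # contained in a kept sequence, on either strand
        kept[seq_id] = forward
    return kept
```

Called on a dict of ID to sequence, non_redundant returns the subset that
survives under this definition. A different definition of redundancy (for
example, close matches over the whole length) would need an alignment-based
method instead.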


