[Bioperl-l] limit the number of blast output per query
Ross KK Leung
ross at cuhk.edu.hk
Thu Jul 14 23:05:27 UTC 2011
blastall -p blastn -F F -b 50 -v 50 -i query.fna -d genomes.fna -e 1e-30 -o
output.tab -m 8 -a 8
as what Hans-Rudolf Hotz commented, I guess there are just too many HSPs for
hitting human genome and all my downstream analysis programs rely on
blastall m8 output, now I just don't know how to adapt to other new formats
shortly.
-----Original Message-----
From: Frank Schwach [mailto:fs5 at sanger.ac.uk]
Sent: 2011年7月14日 23:09
To: Hans-Rudolf Hotz
Cc: Ross KK Leung; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] limit the number of blast output per query
-b limits the number of output alignments, not sure why it isn't working
for you. How many HSPs do you actually get? Is there a chance that the
output is just so large because of a large number of queries and you are
not seeing 10k+ HSPs but results?
Can you post the entire blastall command? The large number of hits looks
like you are blasting short sequences, there are better algorithms for
short sequence alignments, so if that's what you are doing I could give
you some hints for alternative software to try.
Frank
On Thu, 2011-07-07 at 11:02 +0200, Hans-Rudolf Hotz wrote:
> Hi
>
> just double checking: are you really talking abut "10,000+ hits"? or do
> you mean "10,000+ HSPs" ('high-scoring segment pairs')?
>
> I don't know how your genome database looks like, but assuming you have
> one sequence per chromosome, then you will get just 24 hits (ie each
> chromosome) and then depending on your query each hit will have a lot of
> HSPs.
>
> As far as as I know, there is no way to limit the number of HSPs (you
> might try playing with the E value).
>
> You can try using the tabular output format (this will reduce the file
> size) - or may be BLAST is not the right search tool for your task?
>
>
> Regards, Hans
>
>
> On 07/07/2011 04:43 AM, Ross KK Leung wrote:
> > I know this question should submit to BLAST help but it seems they have
> > already been overwhelmed by incoming emails. I wonder any bioperl users
> > happen to know how to limit the number of blast output per query. For
> > example, for human genome as a database to blast against, a single query
can
> > generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but
> > obviously the blastall from blast2.2.22 does not "obey" my instruction.
> >
> > The output files generated are usually larger than 100G+ but indeed the
> > final ones that I want usually are only of 10M-. Is there any way to
help
> > save our Earth (Not exaggerated, energy is WASTED in a meaningless
manner)?
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioperl-l
mailing list