[Bioperl-l] problems with blast parser

Chris Fields cjfields at uiuc.edu
Thu Apr 6 17:42:16 UTC 2006


I didn't think of that, but it makes sense, considering he mentioned that the
file is huge and the process is killed off.  I agree with Jason that tabular
output is probably the best way to go here.
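
To illustrate why tabular output helps, here is a minimal sketch (in Python
rather than Perl, and not taken from anyone's actual script) that streams a
-m 8 / -m 9 report one line at a time and keeps only a small tuple per hit
instead of a full HSP object.  It assumes the standard 12-column tabular
layout; the names Hit, iter_hits and filter_hits and the default thresholds
are made up for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """Lightweight record holding only the fields needed for filtering."""
    query: str
    subject: str
    pct_identity: float
    aln_length: int
    bit_score: float


def iter_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line of BLAST tabular output.

    Assumed column order (12 tab-separated fields): query id, subject id,
    % identity, alignment length, mismatches, gap opens, q. start, q. end,
    s. start, s. end, e-value, bit score.
    """
    for line in lines:
        line = line.rstrip("\n")
        # -m 9 output adds comment lines starting with '#'; skip them.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a data line in the assumed layout
        yield Hit(
            query=fields[0],
            subject=fields[1],
            pct_identity=float(fields[2]),
            aln_length=int(fields[3]),
            bit_score=float(fields[11]),
        )


def filter_hits(
    lines: Iterable[str],
    min_identity: float = 90.0,  # hypothetical threshold
    min_length: int = 100,       # hypothetical threshold
) -> Iterator[Hit]:
    """Yield hits passing both cutoffs; only one line is held at a time."""
    for hit in iter_hits(lines):
        if hit.pct_identity >= min_identity and hit.aln_length >= min_length:
            yield hit
```

Passing an open file handle as the lines argument keeps memory use roughly
constant regardless of file size, since nothing is accumulated unless the
caller stores the hits.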

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Thursday, April 06, 2006 12:30 PM
> To: Chris Fields; Alessandro S. Nascimento
> Cc: BioPerl list
> Subject: Re: [Bioperl-l] problems with blast parser
> 
> I'm pretty sure for thousands of HSPs this can be an out of memory
> problem.  I've explained workarounds before on the list, but they
> basically mean building a new listener object that creates simple
> hashes (or arrays) instead of full-blown HSP objects.  Personally I
> use a hybrid approach depending on the dataset  - SearchIO can be too
> slow and too memory intensive for the cases where I am just getting
> top hits or summary stats, but if I want the alignment strings, more
> stats, etc then I use SearchIO.
> 
> 
> The question is - do you really want to be parsing a huge file, or can
> you get away with using tabular output (-m 8 or -m 9) from BLAST?  If
> you are balking at re-running the BLAST, something like blast2table is
> a simple pure-Perl way to generate -m 8 tabular output from a BLAST
> report very efficiently.  This is discussed on the bioperl BLAST wiki
> page, I believe.
> 
> 
> -jason
> On Apr 6, 2006, at 11:56 AM, Chris Fields wrote:
> 
> > Alessandro,
> >
> > We need to know a few things first:
> >
> > 1)  What version of Bioperl?
> > 2)  BLAST version?
> > 3)  What OS?
> > 4)  Perl version?
> > 5)  Exactly how large is your file?
> >
> > It would also be nice to see at least a chunk of your script to rule out a
> > logic error there.  If you want, you can also submit your script by filing
> > this as a bug in Bugzilla and attaching your script.
> >
> > http://www.bioperl.org/wiki/Bugs
> >
> > If you have an older version of Bioperl (such as 1.4), consider upgrading to
> > 1.5.1 or CVS.  Lots of fixes have been incorporated since 1.4, including to
> > SearchIO.
> >
> > Chris
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Alessandro S. Nascimento
> >> Sent: Tuesday, April 04, 2006 10:28 AM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] problems with blast parser
> >>
> >> Hi all
> >>
> >> I'm trying to parse a standalone BLAST (blastpgp) result file and
> >> filter some sequences using length and identity. The script used to
> >> work, but this time, after several minutes running at 99.9% CPU, I
> >> get a "killed" message with no more information. The blast file is
> >> very large. Does anyone have any clue?
> >>
> >> Thanks in advance
> >>
> >> Alessandro
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12




