[Bioperl-l] BLAST parsing Question

Brian Osborne brian_osborne@cognia.com
Thu, 9 May 2002 11:49:34 -0400


Simon,

>> In other words, I don't want any sequence in the database to be hit
>> more than once.

I'm guessing you'll have to code this yourself; I don't think this is built
into any of the Blast modules in Bioperl. Perhaps there's some example code
in there somewhere. Here's a possible starting point from the Perl FAQ
(perlfaq4), but I haven't thought about it in the context of BPlite:

How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is
unique in a given array:
    my (@union, @intersection, @difference);
    my %count;
    # Count how many of the two arrays each element appears in
    foreach my $element (@array1, @array2) { $count{$element}++ }
    foreach my $element (keys %count) {
        push @union, $element;
        # Seen twice means it is in both arrays; once means only one of them
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
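To tie this back to the BLAST question: once you know which query kept each database sequence, the same hash trick lists the queries that matched nothing. Put every query name in @array1 and the winning queries (each listed once) in @array2. Because @array2 is then a subset of @array1, the FAQ's @difference is exactly the set of unmatched queries. This is a minimal sketch; the sequence names and the %winner_of hash are hypothetical, taken from Simon's seq A/B/C example.

```perl
use strict;
use warnings;

# Hypothetical input based on Simon's example: all query names, and the
# query that kept each database sequence (seq B lost seq 1 to seq A).
my @queries   = ('seq A', 'seq B', 'seq C');
my %winner_of = ('seq 1' => 'seq A');    # db sequence => best query

# The FAQ code assumes each element is unique within an array, so dedupe.
my %won    = map { $_ => 1 } values %winner_of;
my @array1 = @queries;
my @array2 = sort keys %won;

# The perlfaq4 technique: count how many arrays each element appears in.
my (@union, @intersection, @difference);
my %count;
foreach my $element (@array1, @array2) { $count{$element}++ }
foreach my $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

# @array2 is a subset of @array1, so @difference holds the unmatched queries.
my @no_match = sort @difference;
print "Matched nothing: @no_match\n";    # prints "Matched nothing: seq B seq C"
```

Note that the FAQ's @difference is a symmetric difference (elements in only one of the two arrays); it only equals "queries with no hit" because every winner is also a query.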


Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On Behalf Of Simon K. Chan
Sent: Thursday, May 09, 2002 11:14 AM
To: Wiepert, Mathieu; 'Leonardo Marino-Ramirez'
Cc: Bioperl Help
Subject: RE: [Bioperl-l] BLAST parsing Question

Mathieu, Leonardo, and Brian,

Thanks for responding.  Leonardo, I have been using
BPlite.  I know that it is a BLAST parser.  However, I
think you misunderstood my question.

The sequences in the FASTA file will no doubt hit more
than one sequence in the database made by formatdb.  I
would like to find out which database sequence(s) each
sequence in the file hits (if any).  So, if two
different sequences in the file hit the same sequence
in the database, I only want the one with the better
P value.  In other words, I don't want any sequence in
the database to be hit more than once.
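This requirement maps directly onto the hash layout Simon describes in his first message (keys are database sequences, values hold the P value and query name). Below is a minimal sketch, assuming the hits have already been pulled out of the BLAST report as [query, database sequence, P value] records; the @hits and @queries data are hypothetical, taken from Simon's seq A/B/C example, and %best and %kept are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical hits from Simon's example: [query, database sequence, P value].
# In practice these would come from a BLAST parser such as BPlite or Bio::SearchIO.
my @hits = (
    ['seq A', 'seq 1', 10e-100],
    ['seq B', 'seq 1', 2e-15],
);
my @queries = ('seq A', 'seq B', 'seq C');

# Key: database sequence. Value: [best P value so far, query that produced it].
my %best;
foreach my $hit (@hits) {
    my ($query, $db_seq, $p) = @$hit;
    # Keep this hit only if the db sequence is new or this P value is smaller.
    if (!exists $best{$db_seq} || $p < $best{$db_seq}[0]) {
        $best{$db_seq} = [$p, $query];
    }
}

# Invert the table: which database sequence(s) each query kept.
my %kept;
foreach my $db_seq (sort keys %best) {
    push @{ $kept{ $best{$db_seq}[1] } }, $db_seq;
}

foreach my $query (@queries) {
    my $matches = $kept{$query} ? join(', ', @{ $kept{$query} }) : 'nothing';
    print "$query matches $matches\n";
}
# prints:
#   seq A matches seq 1
#   seq B matches nothing
#   seq C matches nothing
```

On a tie the first hit seen is kept; change `<` to `<=` to keep the last one instead. Queries whose hits were all beaten simply never appear in %kept, which is what makes the "matched nothing" case fall out for free.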


But maybe I missed something, so I'll carefully look
at the sites you guys mentioned.

Mathieu,  thanks for the sample script!
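Mathieu's Bio::SearchIO pointer (quoted below) can supply the query / database hit / P value data needed here. A minimal sketch, assuming BioPerl with Bio::SearchIO is installed: next_result, query_name, next_hit, name and significance are Bio::SearchIO methods, while hits_from_searchio and open_blast_report are hypothetical helper names. The walker only relies on the next_result/next_hit interface, and Bio::SearchIO is loaded only when a report is actually opened.

```perl
use strict;
use warnings;

# Walk a Bio::SearchIO stream and return [query, hit name, significance]
# records. Any object with the same next_result/next_hit interface will do.
sub hits_from_searchio {
    my ($searchio) = @_;
    my @records;
    while (my $result = $searchio->next_result) {
        my $query = $result->query_name;
        while (my $hit = $result->next_hit) {
            # significance() is the E or P value reported for this hit.
            push @records, [$query, $hit->name, $hit->significance];
        }
    }
    return @records;
}

# Open a BLAST report; Bio::SearchIO is only required when this is called.
sub open_blast_report {
    my ($file) = @_;
    require Bio::SearchIO;
    return Bio::SearchIO->new(-format => 'blast', -file => $file);
}

# Usage (assumes BioPerl is installed and 'blast.out' exists):
#   my @records = hits_from_searchio(open_blast_report('blast.out'));
```

Keeping the parsing separate from the "best query per database sequence" bookkeeping means the same bookkeeping code works whether the records come from Bio::SearchIO or from BPlite.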


Thanks for your help, guys!
--- "Wiepert, Mathieu" <Wiepert.Mathieu@mayo.edu> wrote:
> Hi,
>
> I believe the class to use is now Bio::SearchIO?
> I'll send you a long-winded program that has an
> example, or check out
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/SearchIO.html
>
> use Bio::SearchIO;
> my $searchio = Bio::SearchIO->new(-format => 'blast',
>                                   -file   => 'blast.out');
>
> -Mat
>
> -----Original Message-----
> From: Leonardo Marino-Ramirez [mailto:marino@tofu.tamu.edu]
> Sent: Thursday, May 09, 2002 6:34 AM
> To: Simon K. Chan
> Cc: Bioperl Help
> Subject: Re: [Bioperl-l] BLAST parsing Question
>
>
> Dear Simon,
>
> I encourage you to read the bioperl tutorial
>
> http://bio.perl.org/Core/bptutorial.html
>
> The module to use is BPlite. For usage see:
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/Tools/BPlite.html
>
> Leonardo
>
> On Wed, 8 May 2002, Simon K. Chan wrote:
>
> > My fellow BioPerl-ers,
> >
> > It took me a couple of hours to realize that what I
> > need to do is EXTREMELY common and that it's already
> > been done.  D'oh!! :-0
> >
> > So, I made a fasta database by using the formatdb
> > command.  I have a fasta file with a bunch of
> > sequences.
> >
> > Example:
> >
> > fasta file: seq A, seq B, seq C
> > db made with formatdb: seq 1, seq 2, seq 3
> >
> > I blast the sequences in the file against the
> > database. So, let's say that seq A only hits seq 1
> > with a P value of 10e-100 and seq B only hits seq 1
> > with a P value of 2e-15.  Seq C hits nothing.  How
> > would I pull out that seq A matches seq 1 and that
> > seq C AND seq B match nothing?
> >
> > I tried to do this with hashes where the keys were
> > the matches from the db and the values were arrays
> > with the P value and query name. But it got really
> > messy. Yuck!  Does anyone know of a module or some
> > other quick way of doing this?
> >
> > Thanks, all.
> >
> > =====
> >
> > #################
> >
> > Warmest Regards,
> >
> > Simon K. Chan - bioinformatics_rocks@yahoo.com
> >
> > "Great spirits have always encountered violent
> > opposition from mediocre minds."
> >
> > - Albert Einstein
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Shopping - Mother's Day is May 12th!
> > http://shopping.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
>
> Leonardo Marino-Ramirez            lmarino@tamu.edu
> Biochemistry Department, Texas A&M University
> 2128 TAMU, College Station, TX 77843-2128, USA
> Voice: (979) 862-4055   Fax: (979) 845-9274
>

