[Bioperl-l] BLAST parsing Question
Brian Osborne
brian_osborne@cognia.com
Thu, 9 May 2002 11:49:34 -0400
Simon,
>> In other words, I don't want any sequence in the database to be hit more
>> than once.
I'm guessing you're going to have to code this yourself; I don't think this
is built into any of the Blast modules in Bioperl. Perhaps there's some
example code in there somewhere. Here's a possible starting point, from the
Perl FAQ (perlfaq4), though I haven't thought about it in the context of
BPlite:
How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is
unique in a given array:
    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
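
Applied to the question in this thread, the same hash idea might look like
the minimal sketch below. It assumes the hits have already been pulled out of
the BLAST report as [query, database sequence, P value] triples. The sub name
best_query_per_subject and the sample data are hypothetical and are not part
of BPlite or any other Bioperl module.

```perl
use strict;
use warnings;

# For each database sequence, keep only the query with the lowest P value.
# Each hit is an array ref: [query name, database sequence name, P value].
sub best_query_per_subject {
    my @hits = @_;
    my %best;
    for my $hit (@hits) {
        my ($query, $subject, $p) = @$hit;
        if (!exists $best{$subject} or $p < $best{$subject}{p}) {
            $best{$subject} = { query => $query, p => $p };
        }
    }
    return \%best;
}

# Sample data taken from the example in this thread (seq C has no hits).
my @queries = qw(seqA seqB seqC);
my @hits    = (
    [ 'seqA', 'seq1', 10e-100 ],
    [ 'seqB', 'seq1', 2e-15 ],
);

my $best = best_query_per_subject(@hits);

# Invert the result: list the database sequences each winning query keeps.
my %kept;
for my $subject (sort keys %$best) {
    push @{ $kept{ $best->{$subject}{query} } }, $subject;
}

# Report every query, including those left with no database sequence.
for my $query (@queries) {
    my $subjects = $kept{$query};
    print "$query matches ", ($subjects ? join(', ', @$subjects) : 'nothing'), "\n";
}
```

With the sample data this reports that seqA matches seq1 and that seqB and
seqC match nothing. Keying the hash on the database sequence is what
guarantees that no database sequence is reported more than once.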
Brian O.
-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
Behalf Of Simon K. Chan
Sent: Thursday, May 09, 2002 11:14 AM
To: Wiepert, Mathieu; 'Leonardo Marino-Ramirez'
Cc: Bioperl Help
Subject: RE: [Bioperl-l] BLAST parsing Question
Mathieu, Leonardo, and Brian,
Thanks for responding. Leonardo, I have been using
BPlite. I know that it is a blast parser. However, I
think you misunderstood my question.
The sequences in the fasta file will no doubt hit more
than one sequence in the database made by formatdb. I
would like to get what each sequence in the file hits
(if at all). So, if 2 different sequences in the file
hit the same sequence in the database, I only want the
one with the better P value. In other words, I don't
want any sequence in the database to be hit more than
once.
But maybe I missed something, so I'll carefully look
at the sites you guys mentioned.
Mathieu, thanks for the sample script!
Thanks for your help, guys!
--- "Wiepert, Mathieu" <Wiepert.Mathieu@mayo.edu>
wrote:
> Hi,
>
> I believe the class to use is now Bio::SearchIO?
> I'll send you a longwinded program that has an example, or check out
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/SearchIO.html
>
> my $searchio = new Bio::SearchIO(-format => 'blast',
>                                  -file   => 'blast.out');
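
A minimal sketch of how a Bio::SearchIO object like the one above might be
read, assuming BioPerl is installed and that the file is a standard BLAST
report. The sub name blast_triples is hypothetical; next_result, query_name,
next_hit, name and significance are methods documented for Bio::SearchIO
results and hits.

```perl
use strict;
use warnings;

# Return one [query name, hit name, significance] triple per hit in a
# BLAST report. Hypothetical helper, not part of BioPerl itself.
sub blast_triples {
    my ($file) = @_;

    # Loaded lazily so this file still compiles where BioPerl is absent.
    require Bio::SearchIO;

    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my @triples;
    while (my $result = $in->next_result) {      # one result per query
        while (my $hit = $result->next_hit) {    # one hit per database sequence
            push @triples,
                [ $result->query_name, $hit->name, $hit->significance ];
        }
    }
    return @triples;
}

# Usage, assuming a report named blast.out exists in the current directory:
# printf "%s\t%s\t%s\n", @$_ for blast_triples('blast.out');
```

A query that hits nothing simply contributes no triples, so the list of query
names has to be kept separately if those queries must be reported too.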
>
> -Mat
>
> -----Original Message-----
> From: Leonardo Marino-Ramirez
> [mailto:marino@tofu.tamu.edu]
> Sent: Thursday, May 09, 2002 6:34 AM
> To: Simon K. Chan
> Cc: Bioperl Help
> Subject: Re: [Bioperl-l] BLAST parsing Question
>
>
> Dear Simon,
>
> I encourage you to read the bioperl tutorial
>
> http://bio.perl.org/Core/bptutorial.html
>
> The module to use is BPlite. For usage see:
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/Tools/BPlite.html
>
> Leonardo
>
> On Wed, 8 May 2002, Simon K. Chan wrote:
>
> > My fellow BioPerl-ers,
> >
> > It took me a couple of hours to realize that what I
> > need to do is EXTREMELY common and that it's already
> > been done. D'oh!! :-0
> >
> > So, I made a fasta database by using the formatdb
> > command. I have a fasta file with a bunch of
> > sequences.
> >
> > Example:
> >
> > fasta file: seq A, seq B, seq C
> > db made with formatdb: seq 1, seq 2, seq 3
> >
> > I blast the sequences in the file against the
> > database.
> > So, let's say that seq A only hits seq 1 with a P
> > value of 10e-100 and seq B only hits seq 1 with a P
> > value of 2e-15. Seq C hits nothing. How would I pull
> > out that Seq A matches Seq1 and that Seq C AND Seq B
> > match nothing?
> >
> > I tried to do this with hashes where the keys were the
> > matches from the db and the values were arrays with
> > the P value and query name. But it got real messy.
> > Yuck! Does anyone know of a module or some other
> > quick way of doing this?
> >
> > Thanks, all.
> >
> > =====
> >
> > #################
> >
> > Warmest Regards,
> >
> > Simon K. Chan - bioinformatics_rocks@yahoo.com
> >
> > "Great spirits have always encountered violent opposition from mediocre
> > minds."
> >
> > - Albert Einstein
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Shopping - Mother's Day is May 12th!
> > http://shopping.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
> Leonardo Marino-Ramirez
> lmarino@tamu.edu
> Biochemistry Department, Texas A&M University
> 2128 TAMU, College Station, TX 77843-2128, USA
> Voice: (979) 862-4055 Fax: (979) 845-9274
>