[Bioperl-l] need BLAT parse code
Henk van den Toorn
h.w.p.vandentoorn at bio.uu.nl
Tue Nov 29 13:08:24 EST 2005
Hello Neeti,
Using the Bio::SearchIO module for this kind of output is just what Bioperl
is for!
Try the following code, for every hit you will get a result object. Because
psl can have several hits for each result (different fragments of the same
result), you will get several 'hit' objects for each result.
It is not the most beautiful code, but it shows the point.
I think it is easier this way than parsing the file, because you don't have
to be bothered with which column has which meaning. Moreover it handles the
different hits per query result, which you would otherwise have to do
yourself.
Greetings, Henk
---- snip ----
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Error qw(:try);
if (!defined $ARGV[0] || !-e $ARGV[0]) # Test if you have added a filename
at the command line
{
die ("Usage: $0 filename.psl\n");
}
my $pslfile = $ARGV[0];
my $parser = new Bio::SearchIO(-file => $pslfile, -format => 'psl');
my $result;
while ($result = $parser->next_result())
{
print $result->query_name()."\t";
print $result->query_length()."\t";
print $result->num_hits()."\n";
my $matches;
foreach my $hit ($result->hits())
{
print "\t".$hit->name()."\t";
try
{
$matches = $hit->matches('identities');
}
catch Bio::Root::Exception with
{
no warnings 'all';
$matches = "ERROR";
};
print $matches."\t";
print $hit->score()."\n";
}
}
-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sean Davis
Sent: 29 November 2005 18:19
To: bioperl-ml List
Subject: Re: [Bioperl-l] need BLAT parse code
Neeti,
You could simply put the file in a text editor and take out the header.
Alternatively, type:
blat
without any arguments. You will notice that there are many options to blat,
one of which is -noHead, which suppresses the header. Or, look at only
lines that begin with a number using a regular expression.
Ultimately, I think that it will serve you well to read a perl book, though,
as parsing a text file is an important and basic topic to grasp if you want
to use perl for data analysis.
Sean
On 11/29/05 9:43 AM, "Jason Stajich" <jason.stajich at duke.edu> wrote:
>
>
> Begin forwarded message:
>
>> From: neeti somaiya <neetisomaiya at gmail.com>
>> Date: November 29, 2005 1:27:27 AM EST
>> To: Jason Stajich <jason.stajich at duke.edu>
>> Subject: Re: [Bioperl-l] need BLAT parse code
>>
>> I use the following code :
>>
>> open(FH,"output.psl");
>> while(<FH>)
>> {
>> if( /^psLayout/ )
>> {
>> for( 1..4 ) { <> }
>> }
>> my @line = split;
>> my ( $matches,$mismatches,$rep_matches,$n_count,
>> $q_num_insert,$q_base_insert,
>> $t_num_insert, $t_base_insert,
>> $strand, $q_name, $q_length, $q_start,
>> $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>> $block_sizes, $q_starts, $t_starts
>> ) = split;
>>
>>
>> print $t_start;
>> print "\n";
>> print $t_end;
>>
>> }
>>
>> for output.psl file :
>>
>> match mis- rep. N's Q gap Q gap T gap T gap
>> strand Q Q Q Q T
>> T T T block blockSizes qStarts tStarts
>> match match count bases count
>> bases name size start end
>> name size start end count
>> ---------------------------------------------------------------------
>> -
>> ---------------------------------------------------------------------
>> -
>> -------------------
>> 27025 0 0 0 0 0 0 0
>> + query_sequence3 27025 0 27025
>> database_sequence3 57701691 132995 160020 1
>> 27025, 0, 132995,
>> ~
>>
>>
>> It gave me output :
>>
>> Q
>> Q
>>
>> 132995
>> 160020
>>
>> What is the Q? Cant I obtain the coordinates (132995, 160020) alone?
>>
>> Please let me know.
>> Thanks.
>>
>> On 11/28/05, Jason Stajich <jason.stajich at duke.edu> wrote:
>> Bio::SearchIO::psl can parse psl output.
>>
>> or more simply:
>>
>> while(<>) {
>> if( /^psLayout/ ) { # if there is a header
>> for( 1..4 ) { <> } # take next 4 lines to skip the header
>> }
>> my @line = split;
>> my ( $matches,$mismatches,$rep_matches,$n_count,
>> $q_num_insert,$q_base_insert,
>> $t_num_insert, $t_base_insert,
>> $strand, $q_name, $q_length, $q_start,
>> $q_end, $t_name, $t_length,$t_start, $t_end,
>> $block_count,
>> $block_sizes, $q_starts, $t_starts
>> ) = split;
>>
>> # query aln vals are $q_start, and $q_end values
>> # hit aln vals are $t_start, $t_end }
>>
>> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>>
>>> Hi,
>>>
>>> I am using BLAT in a project.I am having simple .psl output files
>>> after running BLAT of a gene sequences against full chromosomal
>>> sequences.Doesanyone have a simple BLAT parse code. I am only
>>> interested in obtaining the alignment start and end positions on the
>>> target.
>>> --
>>> -Neeti
>>> Even my blood says, B positive
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list