[Bioperl-l] need BLAT parse code

Jason Stajich jason.stajich at duke.edu
Tue Nov 29 11:41:16 EST 2005


You probably need to change this line,
>  for( 1..4 ) { <> }
to
>  for( 1..4 ) { <FH> }

If your BLAT result file doesn't start with psLayout, then you need to
move that if statement BEFORE the while(<FH>) loop starts; it is what
strips off the header lines. The unskipped header lines are where the
mysterious 'Q' is coming from in your output!
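
Another way to avoid the stray header fields is to ignore any line that does not look like a data row. The following is only a sketch: parse_psl_line is a made-up helper name, and it assumes the usual 21-column PSL layout in which every data row begins with the integer match count. The sample text is modelled on the output.psl quoted below.

```perl
use strict;
use warnings;

# parse_psl_line is a hypothetical helper, not part of BioPerl.
# It returns [t_start, t_end] for a PSL data line and undef for anything else.
sub parse_psl_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    # Assumption: data lines have exactly 21 columns and start with an integer.
    return unless @f == 21 && $f[0] =~ /^\d+$/;
    return [ @f[15, 16] ];    # tStart and tEnd
}

# Sample input modelled on the output.psl quoted in this thread.
my $sample = join "\n",
    "match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts",
    "      match match count bases count bases name size start end name size start end count",
    "-" x 80,
    "27025 0 0 0 0 0 0 0 + query_sequence3 27025 0 27025 database_sequence3 57701691 132995 160020 1 27025, 0, 132995,",
    "";

# An in-memory filehandle keeps the sketch self-contained.
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
while (my $line = <$fh>) {
    my $coords = parse_psl_line($line) or next;    # skip header and blank lines
    print join("\t", @$coords), "\n";              # prints 132995 and 160020, tab-separated
}
close($fh);
```

To run this on a real file, open 'output.psl' instead of the in-memory string. Checking the column count means the header is skipped whether or not the file starts with psLayout.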

You can also run blat to just return output without the header and  
you don't need those skipping steps at all.

From the blat options:
  -noHead     suppress .psl header (so it's just a tab-separated file)
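
With the header suppressed, every line is a plain tab-separated record, so a hash slice can name the columns. This is a rough sketch rather than a definitive parser: psl_record and @PSL_COLS are hypothetical names, and the column names simply follow the variables used elsewhere in this thread.

```perl
use strict;
use warnings;

# The 21 PSL columns in file order (names chosen to match this thread).
my @PSL_COLS = qw(
    matches mismatches rep_matches n_count
    q_num_insert q_base_insert t_num_insert t_base_insert
    strand q_name q_length q_start q_end
    t_name t_length t_start t_end
    block_count block_sizes q_starts t_starts
);

# psl_record is a hypothetical helper: one tab-separated line in, one hash ref out.
sub psl_record {
    my ($line) = @_;
    chomp $line;
    my %rec;
    @rec{@PSL_COLS} = split /\t/, $line;    # hash slice assigns fields by name
    return \%rec;
}

# The data row from the quoted output.psl, rebuilt with tab separators.
my $line = join "\t",
    27025, 0, 0, 0, 0, 0, 0, 0, '+',
    'query_sequence3', 27025, 0, 27025,
    'database_sequence3', 57701691, 132995, 160020,
    1, '27025,', '0,', '132995,';

my $rec = psl_record($line);
print "$rec->{t_start}\t$rec->{t_end}\n";    # prints 132995 and 160020, tab-separated
```

Naming the fields this way avoids counting positions by hand when only the target start and end are needed.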


If parsing tab-delimited files seems difficult, go buy a book on
Perl programming or look through some of the freely available
online documentation, and read about the split function.

-jason

On Nov 29, 2005, at 1:27 AM, neeti somaiya wrote:

> I use the following code :
>
> open(FH,"output.psl");
> while(<FH>)
> {
>     if( /^psLayout/ )
>     {
>           for( 1..4 ) { <FH> }
>       }
>      my @line = split;
>      my ( $matches,$mismatches,$rep_matches,$n_count,
>             $q_num_insert,$q_base_insert,
>             $t_num_insert, $t_base_insert,
>             $strand, $q_name, $q_length, $q_start,
>             $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>             $block_sizes,  $q_starts,      $t_starts
>             ) = split;
>
>
>       print $t_start;
>       print "\n";
>       print $t_end;
>
> }
>
> for output.psl file :
>
> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap    
> strand  Q               Q       Q       Q       T                
> T       T       T       block   blockSizes      qStarts  tStarts
>         match   match           count   bases   count    
> bases           name            size    start   end      
> name            size    start   end     count
> ---------------------------------------------------------------------- 
> ---------------------------------------------------------------------- 
> -------------------
> 27025   0       0       0       0       0       0       0        
> +       query_sequence3 27025   0       27025    
> database_sequence3      57701691        132995  160020  1        
> 27025,  0,      132995,
> ~
>
>
> It gave me output :
>
> Q
> Q
>
> 132995
> 160020
>
> What is the Q? Can't I obtain the coordinates (132995, 160020) alone?
>
> Please let me know.
> Thanks.
>
> On 11/28/05, Jason Stajich <jason.stajich at duke.edu> wrote:
> Bio::SearchIO::psl can parse psl output.
>
> or more simply:
>
> while(<>) {
>    if( /^psLayout/ ) { # if there is a header
>    for( 1..4 ) { <> }  # take next 4 lines to skip the header
>    }
>   my @line = split;
>   my ( $matches,$mismatches,$rep_matches,$n_count,
>              $q_num_insert,$q_base_insert,
>              $t_num_insert, $t_base_insert,
>              $strand, $q_name, $q_length, $q_start,
>              $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
>              $block_sizes,  $q_starts,      $t_starts
>              ) = split;
>
>   #  query aln vals are  $q_start, and $q_end values
>   # hit aln vals are $t_start, $t_end
> }
>
> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > I am using BLAT in a project. I have simple .psl output files
> > after running BLAT on gene sequences against full chromosomal
> > sequences. Does anyone have simple BLAT parsing code? I am only
> > interested in obtaining the alignment start and end positions
> > on the target.
> > --
> > -Neeti
> > Even my blood says, B positive
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
>
>
> -- 
> -Neeti
> Even my blood says, B positive

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



