[Bioperl-l] need BLAT parse code
Jason Stajich
jason.stajich at duke.edu
Tue Nov 29 11:41:16 EST 2005
You probably need to change this line,
> for( 1..4 ) { <> }
to
> for( 1..4 ) { <FH> }
If your BLAT result file doesn't start with psLayout then you need to
move that if statement BEFORE the while(<FH>) loop starts. Its job is
to strip off the header lines; since they are not being stripped, they
get parsed as data (and hence where the mysterious 'Q' is coming from
in your output!).
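
As a rough sketch of one way around this (a variation on moving the if
statement, not the exact change described above): skip every line that
does not look like a data row, so it no longer matters whether the file
starts with psLayout. The sub name psl_target_coords is made up for
illustration, and the check assumes every data row begins with the
integer match count.

```perl
use strict;
use warnings;

# Hypothetical helper: return one [tStart, tEnd] pair per alignment row
# of a PSL file, whether or not the file carries a header.
sub psl_target_coords {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my @coords;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Assumption: data rows start with the integer match count.
        # Header text, the dashed rule and blank lines do not, so skip them.
        next unless $line =~ /^\d+\s/;
        my @f = split ' ', $line;
        push @coords, [ @f[ 15, 16 ] ];    # columns 16 and 17: tStart, tEnd
    }
    close $fh;
    return @coords;
}

# Usage sketch:
#   print "$_->[0]\n$_->[1]\n" for psl_target_coords('output.psl');
```

With the header lines filtered out, only the coordinate pairs from real
alignment rows come back.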
You can also run blat to just return output without the header and
you don't need those skipping steps at all.
From the blat options:
-noHead suppress .psl header (so it's just a tab-separated file)
If parsing tab delimited files seems difficult go buy a book on
programming perl or read some of the myriad of freely available
online documentation and read about the split function.
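
For what it's worth, a minimal sketch of the split approach on
headerless (-noHead) output might look like the following. The names
@PSL_COLS and parse_psl_line are invented for this example; the column
order follows the 21 fields used in the variable list quoted below.

```perl
use strict;
use warnings;

# Column names in PSL order (21 tab-separated fields per row).
my @PSL_COLS = qw(
    matches mismatches rep_matches n_count
    q_num_insert q_base_insert t_num_insert t_base_insert
    strand q_name q_size q_start q_end
    t_name t_size t_start t_end
    block_count block_sizes q_starts t_starts
);

# Hypothetical helper: turn one headerless PSL row into a hash reference.
# Returns nothing if the line does not have exactly 21 fields.
sub parse_psl_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return unless @fields == @PSL_COLS;
    my %row;
    @row{@PSL_COLS} = @fields;    # hash slice: name each column
    return \%row;
}

# Usage sketch:
#   while ( my $l = <$fh> ) {
#       my $r = parse_psl_line($l) or next;
#       print "$r->{t_start}\t$r->{t_end}\n";
#   }
```

Naming the fields through a hash slice avoids counting positions by
hand when you only want t_start and t_end.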
-jason
On Nov 29, 2005, at 1:27 AM, neeti somaiya wrote:
> I use the following code :
>
> open(FH,"output.psl");
> while(<FH>)
> {
> if( /^psLayout/ )
> {
> for( 1..4 ) { <FH> }
> }
> my @line = split;
> my ( $matches,$mismatches,$rep_matches,$n_count,
> $q_num_insert,$q_base_insert,
> $t_num_insert, $t_base_insert,
> $strand, $q_name, $q_length, $q_start,
> $q_end, $t_name, $t_length,$t_start, $t_end, $block_count,
> $block_sizes, $q_starts, $t_starts
> ) = split;
>
>
> print $t_start;
> print "\n";
> print $t_end;
>
> }
>
> for output.psl file :
>
> match mis- rep. N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts
> match match count bases count bases name size start end name size start end count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 27025 0 0 0 0 0 0 0 + query_sequence3 27025 0 27025 database_sequence3 57701691 132995 160020 1 27025, 0, 132995,
> ~
>
>
> It gave me output :
>
> Q
> Q
>
> 132995
> 160020
>
> What is the Q? Can't I obtain the coordinates (132995, 160020) alone?
>
> Please let me know.
> Thanks.
>
> On 11/28/05, Jason Stajich <jason.stajich at duke.edu> wrote:
> Bio::SearchIO::psl can parse psl output.
>
> or more simply:
>
> while(<>) {
> if( /^psLayout/ ) { # if there is a header
> for( 1..4 ) { <> } # take next 4 lines to skip the header
> }
> my @line = split;
> my ( $matches,$mismatches,$rep_matches,$n_count,
> $q_num_insert,$q_base_insert,
> $t_num_insert, $t_base_insert,
> $strand, $q_name, $q_length, $q_start,
> $q_end, $t_name, $t_length,$t_start, $t_end,
> $block_count,
> $block_sizes, $q_starts, $t_starts
> ) = split;
>
> # query aln vals are $q_start, and $q_end values
> # hit aln vals are $t_start, $t_end
> }
>
> On Nov 28, 2005, at 8:06 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > I am using BLAT in a project. I have simple .psl output files after
> > running BLAT of gene sequences against full chromosomal sequences.
> > Does anyone have simple BLAT parsing code? I am only interested in
> > obtaining the alignment start and end positions on the target.
> > --
> > -Neeti
> > Even my blood says, B positive
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
>
>
> --
> -Neeti
> Even my blood says, B positive
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12