[Bioperl-l] Sim4 output parsing using Bioperl
Jason Stajich
jason at cgt.duhs.duke.edu
Tue Jul 1 17:53:44 EDT 2003
Now that you've described the problem, I see it: the current code can't
distinguish between an empty report and the end of the file, so a loop like
while( my $exonset = $parser->next_exonset ) {
}
will end as soon as it hits an empty report.
You can short circuit this if you know ahead of time how many reports
there are in your file:
for( my $i = 0; $i < $maxnum; $i++ ) {
    my $exonset = $parser->next_exonset;
    # OR
    # my @exons = $parser->parse_next_alignment;
}
The better solution is for the Sim4 parser to return gene objects (which I
thought it did), which would have 0 exons for a no-alignment report, and to
return undef only when it gets to the end of the file.
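To make that idea concrete, here is a minimal, self-contained sketch of the
behaviour I mean. It does not use BioPerl at all: ToyParser and next_gene are
made-up names for illustration only, and the "reports" are just lists of exon
strings. The point is that an object with zero exons is still a true value,
so only a real end of input stops the loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Sim4 parser; ToyParser and next_gene are
# illustrative names, not part of BioPerl.
package ToyParser;

sub new {
    my ($class, @reports) = @_;
    # Each report is an array ref of exon strings; an empty array ref
    # stands for a report with no alignment.
    return bless { reports => [@reports] }, $class;
}

sub next_gene {
    my ($self) = @_;
    # undef only when nothing is left to read (end of file).
    return undef unless @{ $self->{reports} };
    my $exons = shift @{ $self->{reports} };
    # Always a hash ref, which is true even when it holds zero exons.
    return { exons => [@$exons] };
}

package main;

my $parser = ToyParser->new(
    [ '1-100', '200-300' ],   # first EST: two exons
    [],                       # second EST: no alignment
    [ '50-75' ],              # third EST: one exon
);

my @exon_counts;
while ( defined( my $gene = $parser->next_gene ) ) {
    # An empty report shows up as a gene with 0 exons instead of ending the loop.
    push @exon_counts, scalar @{ $gene->{exons} };
}

print join(',', @exon_counts), "\n";   # prints 2,0,1
```

With this shape the caller can simply skip genes that have no exons and keep
going until the parser really runs out of input.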
That's the best I got right now.
I have written SearchIO::sim4, which doesn't have this problem. It is only
on the main trunk so far (check out via CVS), and it returns Bio::Search
objects, not Exons.
-jason
On Tue, 1 Jul 2003, Arnaud Kerhornou wrote:
> Quoting Jason Stajich <jason at cgt.duhs.duke.edu>:
> >
> > > --->
> > > 138736-138759 (192-218) 75%
> > > <---
> > >
> > > If this line is missing, it stops.
> > >
> > > Am I misusing the parser ?
> >
> > If the line is missing then there is no alignment so it can't build a
> > gene/exon for you.
>
> The sim4 output reports the (potential) alignments against an EST database. If
> the first EST doesn't align with the genomic sequence, it can't build an exon,
> but the parsing should carry on and parse the information about the second EST
> and so on, until the whole set of EST data has been processed.
>
> Arnaud
>
> > -jason
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu