[Bioperl-l] Bio::Tools::Glimmer
Chris Fields
cjfields at uiuc.edu
Wed Feb 7 18:04:33 UTC 2007
On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:
> Well, each format has some unique features. If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice. I'll take a look.
> I can do all the parsing in one function; in fact I have, just to see how
> nasty it would end up being. I just can't stomach having the code that
> tightly coupled and hard to read. In the end it'll probably be three
> functions, since GlimmerM/HMM are pretty close. Maybe two: Glimmer2 and
> Glimmer3 aren't *that* different, either.
I don't see a problem with handing the parse off to a dedicated class
method, either right away or mid-parse. I'm doing something like this
with a revamped GenBank parser:
# declare local to module: maps each format to the method that parses it
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    # ... others if needed
    '_DEFAULT_'  => '_parseabnormal',
);
...
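As a minimal, self-contained sketch of the dispatch-table idea (the package name My::GlimmerDispatch and the stub parser bodies are hypothetical, not BioPerl code), a method name stored as a string in the hash can be called directly on the object with $self->$method(), with '_DEFAULT_' as the fallback for unknown or missing formats:

```perl
use strict;
use warnings;

package My::GlimmerDispatch;    # hypothetical demo class, not part of BioPerl

# Map each format name to the name of the method that parses it.
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    '_DEFAULT_'  => '_parseabnormal',
);

sub new {
    my ($class, %args) = @_;
    return bless { format => $args{-format} }, $class;
}

# Resolve the parser method name, falling back to '_DEFAULT_'.
sub parser_for {
    my ($self) = @_;
    my $format = $self->{format};
    return (defined $format && exists $GLIMMER_METHODS{$format})
        ? $GLIMMER_METHODS{$format}
        : $GLIMMER_METHODS{'_DEFAULT_'};
}

sub parse {
    my ($self) = @_;
    my $method = $self->parser_for;
    return $self->$method();    # call the method whose name is in $method
}

# Stub parsers (hypothetical): each just reports which one ran.
sub _parsehmm      { return 'hmm' }
sub _parsenormal   { return 'normal' }
sub _parseabnormal { return 'default' }

package main;

my $p      = My::GlimmerDispatch->new(-format => 'GlimmerHMM');
my $result = $p->parse;
print "$result\n";    # hmm
```

Keeping only method *names* in the hash means a new format needs one new hash entry plus one new method, and the dispatching code never changes.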
Then either pre-parse part of the file using _readline() to determine the
format, or use -format and bypass the pre-parsing:
sub next_thingy {
    my $self = shift;
    ...
    if (!$format) {
        # peek ahead; lines read before the match are consumed, not pushed back
        while (my $line = $self->_readline()) {
            if ($line =~ m{(something)}) {
                $format = $1;
                $self->_pushback($line);
                last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
        ? $GLIMMER_METHODS{$format}
        : $GLIMMER_METHODS{'_DEFAULT_'};    # fallback to this one
    return $self->$method();    # hand off parsing flow to the proper parser
    ...
}
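One way to address the "scanning the input file twice" concern is to peek and then push back *every* line read, not only the matching one. The sketch below assumes a hypothetical "program: NAME" marker line (real Glimmer output differs by version), and My::LineReader is a hypothetical in-memory stand-in for the _readline()/_pushback() buffering that Bio::Root::IO provides, so the example runs on its own:

```perl
use strict;
use warnings;

package My::LineReader;    # hypothetical stand-in, not Bio::Root::IO itself

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], buffer => [] }, $class;
}

# Return the next line, preferring anything that was pushed back.
sub _readline {
    my ($self) = @_;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    return shift @{ $self->{lines} };
}

# Put lines back so the following _readline calls return them, in order.
sub _pushback {
    my ($self, @lines) = @_;
    unshift @{ $self->{buffer} }, @lines;
    return;
}

# Read until a (hypothetical) "program: NAME" line is found, then push
# every line read back so the real parser still sees the whole input.
sub guess_format {
    my ($self) = @_;
    my (@seen, $format);
    while (defined(my $line = $self->_readline())) {
        push @seen, $line;
        if ($line =~ m{^program:\s*(\S+)}) {
            $format = $1;
            last;
        }
    }
    $self->_pushback(@seen);
    return $format;    # undef if no marker line was found
}

package main;

my $in     = My::LineReader->new("program: GlimmerHMM\n", "gene 1 ...\n");
my $format = $in->guess_format;
print "$format\n";    # GlimmerHMM
my $first = $in->_readline;
print $first;         # program: GlimmerHMM
```

Because nothing is lost, the chosen parser starts from the very first line and the file is only read once; the cost is holding the peeked lines in memory until the marker appears.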
# all parser variants would have this structure:
sub _parsehmm {
    my $self = shift;
    # ... init stuff here
    while (my $line = $self->_readline()) {
        # ... do stuff until END of next prediction/report
    }
    # ... return data if any
}
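A filled-in version of that skeleton might look like the following. It is only a sketch: the record layout (a ">name" header followed by "id start end" lines) is hypothetical rather than real GlimmerHMM output, and My::ReportParser supplies its own minimal _readline()/_pushback() over a filehandle. Each call returns one report, pushes the next header back for the following call, and returns undef at end of input:

```perl
use strict;
use warnings;

package My::ReportParser;    # hypothetical; mirrors the _parse* skeleton

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buffer => [] }, $class;
}

sub _readline {
    my ($self) = @_;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    my $fh = $self->{fh};
    return scalar <$fh>;    # undef at end of file
}

sub _pushback { my ($self, @l) = @_; unshift @{ $self->{buffer} }, @l; return }

# Read one report: a ">name" header plus "id start end" gene lines,
# ending at the next header (pushed back) or at end of input.
sub _parsehmm {
    my $self = shift;
    my $report;    # init: nothing collected yet
    while (defined(my $line = $self->_readline())) {
        chomp $line;
        if ($line =~ m{^>(\S+)}) {
            if ($report) {                     # header of the NEXT report:
                $self->_pushback("$line\n");   # leave it for the next call
                last;
            }
            $report = { name => $1, genes => [] };
        }
        elsif ($report && $line =~ m{^(\S+)\s+(\d+)\s+(\d+)}) {
            push @{ $report->{genes} }, { id => $1, start => $2, end => $3 };
        }
    }
    return $report;    # undef when the input is exhausted
}

package main;

my $text = ">seq1\norf1 10 300\norf2 400 900\n>seq2\norf3 5 50\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $parser = My::ReportParser->new($fh);
while (my $r = $parser->_parsehmm) {
    printf "%s: %d gene(s)\n", $r->{name}, scalar @{ $r->{genes} };
}
# seq1: 2 gene(s)
# seq2: 1 gene(s)
```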
chris
> On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> I definitely vote for 1) - worst case you have 4 separate methods if there
>> is no good way to condense the parsing for each format and require the user
>> to specify the format.
>>
>> I have no problem with requiring user to specify what program she used -
>> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
>> - then that's icing.
>>
>> -jason
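Jason's "require the user to specify the format" option could be enforced in the constructor. This is a hedged sketch, not the real Bio::Tools::Glimmer interface: the package name My::GlimmerIO, the program() accessor, and the accepted name list are assumptions. It rejects a missing or unknown -format and normalizes case, leaving room to add guessing later:

```perl
use strict;
use warnings;

package My::GlimmerIO;    # hypothetical; not the real Bio::Tools::Glimmer

# Accepted program names, looked up case-insensitively (assumed list).
my %KNOWN = map { ( lc($_) => $_ ) } qw(Glimmer2 Glimmer3 GlimmerM GlimmerHMM);

sub new {
    my ($class, %args) = @_;
    my $format = $args{-format};
    die "-format is required (one of: ", join(', ', sort values %KNOWN), ")\n"
        unless defined $format;
    my $canonical = $KNOWN{ lc $format }
        or die "Unknown Glimmer format '$format'\n";
    return bless { program => $canonical }, $class;
}

sub program { return $_[0]{program} }

package main;

my $io   = My::GlimmerIO->new(-format => 'glimmerhmm');
my $name = $io->program;
print "$name\n";    # GlimmerHMM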
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign