[Bioperl-l] Bio::Tools::Glimmer

Chris Fields cjfields at uiuc.edu
Wed Feb 7 18:04:33 UTC 2007


On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two; Glimmer2 and
> Glimmer3 aren't *that* different, either.

I don't see a problem with passing off the parse to a defined class  
method either right off or mid-parse.  I'm doing something like this  
with a revamped GenBank parser:

# declare local to module

my %GLIMMER_METHODS = (
     'GlimmerHMM' => '_parsehmm',
     'Glimmer'    => '_parsenormal',
     # ... others if needed
     '_DEFAULT_'  => '_parseabnormal',
);

...

Then either pre-parse part of the file using _readline() to determine the
format, or use -format and bypass pre-parsing:

sub next_thingy {
    my $self = shift;
    # ... $format comes from -format, if the user supplied one
    if (!$format) {
        while (defined(my $line = $self->_readline())) {
            if ($line =~ m{(something)}) {
                $format = $1; $self->_pushback($line); last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};    # fallback to this one

    return $self->$method();    # hand off parsing flow to the proper parser
}

# all parser variants would have this structure:

sub _parsehmm {
    my $self = shift;
    # ... init stuff here
    while (defined(my $line = $self->_readline())) {
        # ... do stuff until END of next prediction/report
    }
    # ... return data if any
}
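A self-contained sketch of the same dispatch-table pattern follows, assuming an in-memory line source in place of Bio::Root::IO. The package name GlimmerSketch, the header regex and the '//' end-of-report marker are hypothetical illustrations, not real Glimmer output or BioPerl API. Unlike the outline above, it pushes back every line read during format detection, so the chosen parser still sees the whole input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Root::IO-based parser: it keeps its
# input in memory and mimics only _readline() and _pushback().
package GlimmerSketch;

# Format name => parser method name; '_DEFAULT_' is the fallback.
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    '_DEFAULT_'  => '_parseabnormal',
);

sub new {
    my ($class, %args) = @_;
    return bless {
        lines  => [ @{ $args{-lines} || [] } ],
        buffer => [],                 # lines pushed back by _pushback()
        format => $args{-format},     # undef means "detect it"
    }, $class;
}

# Next line of input, preferring anything pushed back earlier.
sub _readline {
    my $self = shift;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    return shift @{ $self->{lines} };
}

sub _pushback {
    my ($self, $line) = @_;
    unshift @{ $self->{buffer} }, $line;
    return;
}

sub next_report {
    my $self   = shift;
    my $format = $self->{format};
    if (!defined $format) {
        my @seen;
        while (defined(my $line = $self->_readline())) {
            push @seen, $line;
            # Hypothetical detection rule: a header line naming the program.
            if ($line =~ m{^(GlimmerHMM|Glimmer)\b}) {
                $format = $1;
                last;
            }
        }
        # Restore every scanned line so the chosen parser sees all input.
        $self->_pushback($_) for reverse @seen;
        $format = '_DEFAULT_' unless defined $format;
        $self->{format} = $format;    # detect only once per stream
    }
    my $method = exists $GLIMMER_METHODS{$format}
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};
    return $self->$method();    # hand parsing off to the chosen parser
}

# Shared body for the sketch parsers: read one report up to a
# hypothetical '//' terminator or end of input.
sub _collect {
    my ($self, $label) = @_;
    my @body;
    while (defined(my $line = $self->_readline())) {
        last if $line eq '//';
        push @body, $line;
    }
    return @body ? +{ parser => $label, lines => \@body } : undef;
}

sub _parsehmm      { return $_[0]->_collect('hmm') }
sub _parsenormal   { return $_[0]->_collect('normal') }
sub _parseabnormal { return $_[0]->_collect('default') }

package main;
```

Caching the detected format means a stream is scanned for its header only once, and passing -format skips the scan entirely, which fits Jason's suggestion of requiring the format and treating detection as optional icing.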

chris

> On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> I definitely vote for 1) - worst case you have 4 separate methods if there
>> is no good way to condense the parsing for each format, and require the
>> user to specify the format.
>>
>> I have no problem with requiring the user to specify what program she
>> used - if we can be fancy and guess the format later (i.e. guess format
>> in SeqIO) - then that's icing.
>>
>> -jason
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
