[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
Aaron J. Mackey
amackey at pcbi.upenn.edu
Fri Jul 16 07:16:54 EDT 2004
On second (or third) thought, perhaps it would be better to formulate
it like this:
# oops, just read the first line of fasta record ...
my $fasta = $line;
while ( # still fasta, not gff # ) {
$fasta .= $line;
}
# convert to seq:
$seq = Bio::SeqIO->new(-fh     => IO::String->new($fasta),
                       -format => "fasta")->next_seq;
Otherwise, no matter how we fiddle with _readline, it's going to be
ugly to "share" $self->{_readline} between two distinct objects.
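
For anyone who wants to try the accumulate-then-parse idea outside BioPerl, here is a minimal core-Perl sketch. The helper name read_fasta_record and the stop condition (a new ">" header, a "##" directive, or end of file) are assumptions made for illustration; the message above deliberately leaves the "still fasta, not gff" test open. An in-memory filehandle stands in for the parser's real input stream.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: collect the remaining lines of
# the fasta record whose first line has already been read.
# Assumption: the record ends at the next ">" header, at a "##" directive,
# or at end of file.
sub read_fasta_record {
    my ($fh, $first_line) = @_;
    my $fasta = $first_line;
    my $pushback;                       # the line that ended the record, if any
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^(?:>|##)/) {     # no longer part of this record
            $pushback = $line;
            last;
        }
        $fasta .= $line;
    }
    return ($fasta, $pushback);
}

my $data = ">ctg1 test\nACGT\nTTGA\n>ctg2\nGGCC\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $first = <$fh>;                      # "oops, just read the first line"
my ($fasta, $pushback) = read_fasta_record($fh, $first);
print $fasta;

# $fasta could then be handed to a fresh parser, as in the message above:
#   Bio::SeqIO->new(-fh => IO::String->new($fasta), -format => "fasta")->next_seq;
```

The line that terminated the record is returned rather than dropped, so the caller can push it back for whichever parser reads next.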
-Aaron
On Jul 15, 2004, at 7:00 PM, Aaron J Mackey wrote:
>
> On Thu, 15 Jul 2004, Chris Mungall wrote:
>
>> However, the fasta parser sets the input record separator $/=">\n", so I
>> actually have to read in up to but NOT including the next ^\> (or end of
>> file). Which means I actually have to switch $/ within the GFF parser!
>
> Hmm, don't you mean it switches to "\n>"? Regardless, why should you
> have to worry about it? You _pushback, you send off to the next
> parser; if it changes what $/ is, then Bio::Root::IO::_readline (or
> maybe just a fasta.pm overridden version) could/should be savvy to it
> (comments from the gallery?):
>
> Index: IO.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v
> retrieving revision 1.51
> diff -r1.51 IO.pm
> 420,422c420,422
> < Note also that the current implementation does not handle pushed
> < back input correctly unless the pushed back input ends with the
> < value of $/.
> ---
>> Note also that the current implementation does handle
>> pushed back input correctly when the pushed back input
>> doesn't end with whatever is the local value of $/.
> 441a442,455
>>
>> # If $/ has changed since the push back occurred, we may need to
>> # adjust the buffering ...
>> if (defined($line) && defined($/) && $line =~ m!$/!) {
>> # $/ is defined (not in file-slurp mode); does our current
>> # line have too much stuff already?
>> if (length($')) {
>> $line = "$`$/";
>> unshift @{$self->{'_readbuffer'}}, $';
>> }
>> } elsif (!eof($fh)) {
>> # need to read some more ...
>> $line .= <$fh>;
>> }
>
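
The buffer adjustment in the quoted patch can be tried in isolation with plain strings. The sketch below is an illustration only, not the code in Bio::Root::IO: split_on_separator is a hypothetical helper, and it uses index() and substr() rather than a regex match on $/, which sidesteps any regex metacharacters in the separator.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Root::IO: split a pushed-back chunk
# into the first complete record (ending in the separator) and the leftover
# text that would go back onto the read buffer.
# Returns ($chunk, undef) when the separator is absent, meaning the caller
# would need to read more input before it has a full record.
sub split_on_separator {
    my ($chunk, $sep) = @_;
    my $pos = index($chunk, $sep);
    return ($chunk, undef) if $pos < 0;
    my $end = $pos + length $sep;
    return (substr($chunk, 0, $end), substr($chunk, $end));
}

# A chunk pushed back while $/ was "\n", later re-read with $/ = "\n>".
my ($record, $rest) = split_on_separator(">seq1\nACGT\n>seq2\nTTGA\n", "\n>");

# A chunk that does not yet contain the separator.
my ($partial, $none) = split_on_separator(">seq1\nAC", "\n>");
```

Here $record holds everything up to and including the first "\n>", and $rest holds the text that belongs to the following record.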
>
>> The simple solution is to force everyone to precede the fasta section with
>> a ##FASTA directive - however, the spec says this is optional.
>
> Nah, the simple solution is to fix BioPerl ;)
>
>> Of course, I could just go back to my own 8-line fasta parsing code
>> within GFF.pm.....
>
> No, then you'd need to worry about it keeping in sync with
> SeqIO/fasta.pm, which is what we're trying to avoid, if possible.
>
> I repeat: thanks for the hard work!
>
> -Aaron
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey at pcbi.upenn.edu
415 S. University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697