[Bioperl-l] Bio::SeqIO::new possible wierdness

Jason Stajich jason at cgt.duhs.duke.edu
Wed Jan 28 16:33:45 EST 2004


The bioperl list is searchable - just not the bioperl-guts though -
http://search.open-bio.org
and/or google works fine for me


This is the change Lincoln made, though (I ran cvs log on Bio/Root/IO.pm
and found the last commit by Lincoln).  I had put the \*ARGV in there so
that we could use the magic <> operator (which allows STDIN or a list of
files to be used transparently as input).  This caused some problems with
the GFF, SeqFeature, and Registry tests.

Here is his log message
revision 1.50
date: 2003/11/21 03:03:38;  author: lstein;  state: Exp;  lines: +2 -2
The following regression tests now pass: GFF, SeqFeature, Registry

--jason

jason at jason $ cvs diff -r 1.49 Bio/Root/IO.pm
Index: Bio/Root/IO.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v
retrieving revision 1.49
diff -r1.49 IO.pm
1c1
< # $Id: IO.pm,v 1.49 2003/10/28 21:58:54 jason Exp $
---
> # $Id: IO.pm,v 1.50 2003/11/21 03:03:38 lstein Exp $
435c435
<     my $fh = $self->_fh || \*ARGV;
---
>     my $fh = $self->_fh or return;
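
As a rough illustration of what that one-line diff changes, here is a
minimal sketch. It is not the actual Root::IO code: the subs read_line_old
and read_line_new are hypothetical names, and $stored_fh is an assumed
stand-in for whatever $self->_fh returns.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the changed line in Bio/Root/IO.pm.
# $stored_fh plays the role of $self->_fh.

# Revision 1.49 style: with no stored handle, fall back to the magic ARGV
# handle, which reads the files named in @ARGV, or STDIN if @ARGV is empty.
sub read_line_old {
    my ($stored_fh) = @_;
    my $fh = $stored_fh || \*ARGV;
    return scalar <$fh>;
}

# Revision 1.50 style: with no stored handle, return immediately instead of
# reading from the command-line files or waiting on STDIN.
sub read_line_new {
    my ($stored_fh) = @_;
    my $fh = $stored_fh or return;
    return scalar <$fh>;
}

# Usage: an in-memory handle keeps the example self-contained.
my $data = "ACGT\n";
open my $in, '<', \$data or die "cannot open in-memory handle: $!";
my $line = read_line_new($in);      # reads from the supplied handle
my @none = read_line_new(undef);    # no handle, so nothing is read
```

With the old form, a caller that forgot to set a handle would quietly read
from @ARGV or STDIN; with the new form it gets nothing back and can decide
what to do.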


On Wed, 28 Jan 2004, Peter van Heusden wrote:

> Jason Stajich wrote:
>
> >On Wed, 28 Jan 2004, Donald G. Jackson wrote:
> >
> >
> >
> >>Personally, I like the fall-back but agree that $ARGV[0] shouldn't be it.
> >>I'd suggest STDIN - if somebody calls new without a file/handle I think
> >>they're more likely to be reading.  OTOH, guessing format would be tough.
> >>
> >>
> >
> >the guess format is trying to read off the top of the file I think - we
> >support a 'peek' type of reading into the file, by having the _pushback
> >functionality in Root::IO.
> >
> >I would like to see something like this go into Root::IO rather than in
> >SeqIO - and have Root::IO give back a filename if it knows what it is.
> >
> >Also the Root::IO code could also do something like this:
> > $file = "-" unless defined $file;
> > open my $fh => $file or die $!;
> >
> >Which will then read from stdin if no filename is sent in - right now we
> >don't really support that anymore because it was causing clog-ups in some
> >of the DB::GFF code/tests I think.
> >
> >Maybe we localize this to 'FormattedReaderWriters' -- all the
> >XXXIO(-format => 'XXX') modules so as to avoid the problems Lincoln saw.
> >
> >
> >
> >
> Can you point to where Lincoln "saw" this problem? The BioPerl mailing list
> archive is not searchable, and searching via Google doesn't turn
> anything up.
>
> Anyway, I'll look into Root::IO tomorrow and see what I come up with.
>
> Peter
>
> >
> >
> >>At the very least a warning would be appropriate, perhaps indicating the
> >>course of action.
> >>
> >>For xml handlers we can check the dtd and throw an error.  I will modify
> >>my SeqIO::tinyseq::tinyseqHandler to do so.
> >>
> >>Don Jackson
> >>
> >>
> >>
> >>Peter van Heusden wrote:
> >>
> >>
> >>
> >>>My review of the Bio::SeqIO::new method shows the following behaviour:
> >>>
> >>>Missing both -file and -fh arguments: falls back to using $ARGV[0]
> >>>(the first command line argument) as sequence filename. If this fails,
> >>>gives an exception about 'Unknown format'.
> >>>-file argument (without -fh argument):
> >>>  * given, but file unreadable: throws exception
> >>>  * undefined: reads $ARGV[0], as above.
> >>>-fh argument (without -file argument):
> >>>  * given, but not a filehandle: gives exception
> >>>  * given, but an invalid filehandle (not open): gives exception
> >>>  * undefined: reads $ARGV[0], as above.
> >>>-format argument: if the sequence file doesn't correspond to the given
> >>>format, some parsers give an error (e.g. EMBL), while others do not
> >>>(GenBank) and instead silently give wrong results.
> >>>-format argument without -file argument: Silently creates a SeqIO
> >>>object which writes to STDOUT.
> >>>
> >>>I don't think that this $ARGV[0] shortcut should be in there - it
> >>>causes unnecessary potential confusion. Imagine a situation where -fh
> >>>or -file is specified (using a variable), but that variable somehow
> >>>does not get defined. In that case, the $ARGV[0] fallback behaviour
> >>>would be used, which might lead to a non-obvious error behaviour.
> >>>
> >>>I'd like to propose that either -file or -fh should be specified,
> >>>otherwise an exception is thrown. While I'm about it, I'm thinking of
> >>>migrating the exceptions to the new 'typed exceptions' that BioPerl
> >>>now provides - is there any consensus on exception type names?
> >>>
> >>>Peter
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at portal.open-bio.org
> >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
> >
> >
> >
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

