[Bioperl-l] Bio::SeqIO::new possible wierdness

Jason Stajich jason at cgt.duhs.duke.edu
Wed Jan 28 14:51:42 EST 2004


On Wed, 28 Jan 2004, Donald G. Jackson wrote:

> Personally, I like the fall-back but agree that $ARGV[0] shouldn't be it.
> I'd suggest STDIN - if somebody calls new without a file/handle I think
> they're more likely to be reading.  OTOH, guessing format would be tough.

I think the format guessing tries to read off the top of the file - we
support a 'peek' type of reading into the file through the _pushback
functionality in Root::IO.
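
A rough sketch of that peek idea, for illustration only: PeekReader,
next_line, push_back and guess_format are made-up names and the first-line
heuristics are assumptions, not the actual Root::IO or SeqIO code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the pushback idea; not the real Bio::Root::IO code.
package PeekReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buffer => [] }, $class;
}

# Return the next line, preferring lines that were pushed back earlier.
sub next_line {
    my ($self) = @_;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    my $fh = $self->{fh};
    return scalar <$fh>;
}

# Put a line back so that the next call to next_line() returns it again.
sub push_back {
    my ($self, $line) = @_;
    unshift @{ $self->{buffer} }, $line if defined $line;
    return;
}

package main;

# Guess the format from the first line only, then push that line back so
# the real parser still sees the whole stream (assumed heuristics).
sub guess_format {
    my ($reader) = @_;
    my $first = $reader->next_line;
    $reader->push_back($first);
    return 'unknown' unless defined $first;
    return 'fasta'   if $first =~ /^>/;
    return 'genbank' if $first =~ /^LOCUS\s/;
    return 'embl'    if $first =~ /^ID\s/;
    return 'unknown';
}

# Demonstrate on an in-memory filehandle.
my $data = ">seq1 test\nACGT\n";
open my $fh, '<', \$data or die $!;
my $reader = PeekReader->new($fh);
my $format = guess_format($reader);
my $line   = $reader->next_line;    # the peeked line is still available
print "format=$format first line=$line";
```

Because the line goes back into a buffer rather than the handle being
rewound, this also works on streams that cannot seek, such as STDIN.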

I would like to see something like this go into Root::IO rather than in
SeqIO - and have Root::IO give back a filename if it knows what it is.

The Root::IO code could also do something like this:
 $file = "-" unless defined $file;
 open my $fh => $file or die $!;

Which will then read from STDIN if no filename is sent in - right now we
don't really support that anymore because it was causing clog-ups in some
of the DB::GFF code/tests I think.
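
Spelled out a bit more as a standalone sketch (open_input is a made-up
helper name, not an existing Root::IO method, and the temp file is only
there to make the example runnable):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $file for reading, or STDIN when no name is given.
sub open_input {
    my ($file) = @_;
    $file = '-' unless defined $file;
    # Two-argument open is deliberate: only this form treats "-" as STDIN.
    # It also interprets mode characters in $file, so untrusted names need care.
    open(my $fh, $file) or die "Cannot open '$file': $!";
    return $fh;
}

# Demonstrate the named-file path with a throwaway temp file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq1\nACGT\n";
close $tmp_fh or die $!;

my $in    = open_input($tmp_name);
my $first = <$in>;
close $in;
print "first line: $first";

# open_input() with no argument would hand back a handle on STDIN instead.
```

The catch is the same one mentioned above: a caller who simply forgot to
pass a filename ends up waiting on STDIN without any hint as to why.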

Maybe we localize this to 'FormattedReaderWriters' -- all the
XXXIO(-format => 'XXX') modules so as to avoid the problems Lincoln saw.



> At the very least a warning would be appropriate, perhaps indicating the
> course of action.
>
> For xml handlers we can check the dtd and throw an error.  I will modify
> my SeqIO::tinyseq::tinyseqHandler to do so.
>
> Don Jackson
>
>
>
> Peter van Heusden wrote:
>
> > My review of the Bio::SeqIO::new method shows the following behaviour:
> >
> > Missing both -file and -fh arguments: falls back to using $ARGV[0]
> > (the first command line argument) as sequence filename. If this fails,
> > gives an exception about ‘Unknown format’.
> > -file argument (without -fh argument):
> > · given, but file unreadable: throws exception
> > · undefined: reads $ARGV[0], as above.
> > -fh argument (without -file argument):
> > · given, but not a filehandle: gives exception
> > · given, but an invalid filehandle (not open): gives exception
> > · undefined: reads $ARGV[0], as above.
> > -format argument: if the sequence file doesn’t correspond to the given
> > format, some parsers give an error (e.g. EMBL), while others do not
> > (GenBank) and instead silently give wrong results.
> > -format argument without -file argument: Silently creates a SeqIO
> > object which writes to STDOUT.
> >
> > I don't think that this $ARGV[0] shortcut should be in there - it
> > causes unnecessary potential confusion. Imagine a situation where -fh
> > or -file is specified (using a variable), but that variable somehow
> > does not get defined. In that case, the $ARGV[0] fallback behaviour
> > would be used, which might lead to a non-obvious error behaviour.
> >
> > I'd like to propose that either -file or -fh should be specified,
> > otherwise an exception is thrown. While I'm about it, I'm thinking of
> > migrating the exceptions to the new 'typed exceptions' that BioPerl
> > now provides - is there any consensus on exception type names?
> >
> > Peter
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


