[Bioperl-l] Bio::SeqIO::new possible weirdness

Chris Dagdigian dag at bioteam.net
Fri Jan 30 09:43:51 EST 2004



We've tried over and over again to get good search going for open-bio.org. 
First we tried ht://dig and then I tried integrating it directly into the 
mailman archives (lots of python hacking).

Nothing works well. htdig takes forever to run, eats disk space, and still
does not provide great results.

Right now Google is probably best. We may want to discontinue all of our
own search methods, or we could buy the Google appliance thingie that is meant
for corporate intranets :)


-chris


On Fri, 30 Jan 2004, Jason Stajich wrote:

> I dunno then. Chris has graciously set it up, so either ht://dig is not
> doing its job very well or something is misconfigured. We have tried
> to make the lists searchable at http://search.open-bio.org/; if it isn't
> working properly, that is another issue. A Google search for
> site:open-bio.org pipermail bioperl-l your-term
> also works pretty well.
> 
> It really is a major job making sure all of the website/cvs/server
> components work correctly all the time. I wish there was a way to give
> Chris more of a hand with these things, as he has a full-time consulting gig
> to keep him around in the first place.
> 
> --jason
> 
> On Wed, 28 Jan 2004, Brian Osborne wrote:
> 
> > Jason,
> >
> > I'm a bit suspicious of search.open-bio.org. I enter a term like 'Root' or
> > 'GFF' and get back a dozen hits or so. It's inconceivable to me that there
> > are only 12 messages in bioperl-l since 1999 containing the string 'GFF'.
> > Something's wrong, either with the search or the display. And if there are
> > no matches I see only a blank page, which is a bit inscrutable. Then if I
> > select 'no restriction', which I guess means everything in the selectable
> > list, I don't see the Bioperl matches anymore; I just see a dozen or so
> > Biojava matches.
> >
> > Brian O.
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
> > Sent: Wednesday, January 28, 2004 4:34 PM
> > To: Peter van Heusden
> > Cc: bioperl-l at bioperl.org
> > Subject: Re: [Bioperl-l] Bio::SeqIO::new possible weirdness
> >
> > The bioperl list is searchable (just not bioperl-guts, though) at
> > http://search.open-bio.org,
> > and/or Google works fine for me.
> >
> >
> > This is the change Lincoln made, though (I ran cvs log on Bio/Root/IO.pm
> > and found Lincoln's last commit). I had put the \*ARGV in there so
> > that we could use the magic <> operator, which allows STDIN or a list of
> > files to be used transparently as input. This caused some problems with
> > the GFF, SeqFeature, and Registry tests.
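For readers who have not met it, the magic <> operator reads through every file named in @ARGV in turn (or STDIN when @ARGV is empty) via the global ARGV filehandle, which is what made \*ARGV a convenient default. A minimal sketch, not BioPerl code; `count_headers` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: count FASTA header lines across the given files.
# Localizing @ARGV lets the magic <> operator (which reads via the global
# ARGV handle) walk through each named file in turn; if @ARGV were empty,
# <> would read from STDIN instead.
sub count_headers {
    my @files = @_;
    local @ARGV = @files;
    my $n = 0;
    while (my $line = <>) {
        $n++ if $line =~ /^>/;
    }
    return $n;
}
```

Called with no arguments, `count_headers()` would read STDIN, which is exactly the kind of implicit input the rest of this thread debates.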
> >
> > Here is his log message:
> > revision 1.50
> > date: 2003/11/21 03:03:38;  author: lstein;  state: Exp;  lines: +2 -2
> > The following regression tests now pass: GFF, SeqFeature, Registry
> >
> > --jason
> >
> > jason at jason $ cvs diff -r 1.49 Bio/Root/IO.pm
> > Index: Bio/Root/IO.pm
> > ===================================================================
> > RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v
> > retrieving revision 1.49
> > diff -r1.49 IO.pm
> > 1c1
> > < # $Id: IO.pm,v 1.49 2003/10/28 21:58:54 jason Exp $
> > ---
> > > # $Id: IO.pm,v 1.50 2003/11/21 03:03:38 lstein Exp $
> > 435c435
> > <     my $fh = $self->_fh || \*ARGV;
> > ---
> > >     my $fh = $self->_fh or return;
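The one-line change in the diff can be seen in isolation below. `fh_rev149` and `fh_rev150` are hypothetical stand-ins for line 435 of Bio/Root/IO.pm, with `$fh` playing the role of `$self->_fh`; this is a sketch of the two idioms, not the real method.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for line 435 of Bio/Root/IO.pm; $fh plays the role
# of $self->_fh. Neither sub is real BioPerl code.
sub fh_rev149 {
    my ($fh) = @_;
    my $in = $fh || \*ARGV;    # no handle: fall back to ARGV, as <> does
    return $in;
}

sub fh_rev150 {
    my ($fh) = @_;
    my $in = $fh or return;    # no handle: return undef (empty list in list context)
    return $in;
}

my $old = fh_rev149(undef);
my $new = fh_rev150(undef);
print "rev 1.49 gives ", ref($old), "\n";
print "rev 1.50 gives ", (defined $new ? 'a handle' : 'undef'), "\n";
```

With `or return`, a missing handle becomes the caller's problem (it gets undef back) rather than a silent read from ARGV.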
> >
> >
> > On Wed, 28 Jan 2004, Peter van Heusden wrote:
> >
> > > Jason Stajich wrote:
> > >
> > > >On Wed, 28 Jan 2004, Donald G. Jackson wrote:
> > > >
> > > >
> > > >
> > > >>Personally, I like the fall-back but agree that $ARGV[0] shouldn't be it.
> > > >>I'd suggest STDIN: if somebody calls new without a file/handle, I think
> > > >>they're more likely to be reading. OTOH, guessing the format would be tough.
> > > >>
> > > >>
> > > >
> > > >The format guessing tries to read off the top of the file, I think; we
> > > >support a 'peek' type of reading into the file by having the _pushback
> > > >functionality in Root::IO.
> > > >
> > > >I would like to see something like this go into Root::IO rather than
> > > >SeqIO, and have Root::IO give back a filename if it knows what it is.
> > > >
> > > >The Root::IO code could also do something like this:
> > > > $file = "-" unless defined $file;
> > > > open my $fh => $file or die $!;
> > > >
> > > >That will then read from STDIN if no filename is passed in. Right now we
> > > >don't really support that anymore, because I think it was causing
> > > >clog-ups in some of the DB::GFF code/tests.
> > > >
> > > >Maybe we localize this to the 'FormattedReaderWriters' (all the
> > > >XXXIO(-format => 'XXX') modules) so as to avoid the problems Lincoln saw.
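Jason's STDIN suggestion relies on Perl's two-argument open treating a filename of "-" as standard input. A minimal sketch; `open_input` is a hypothetical helper, not a Bio::Root::IO method:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Root::IO: open the named file for
# reading, or STDIN when no name is given. This depends on two-argument
# open treating "-" as standard input; the three-argument form
# open($fh, '<', '-') would look for a file literally named "-".
sub open_input {
    my ($file) = @_;
    $file = '-' unless defined $file;
    open(my $fh, $file) or die "Could not open '$file': $!\n";
    return $fh;
}
```

Two-argument open also interprets leading and trailing characters such as <, > and | in the name, so this shortcut is only safe with trusted filenames.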
> > > >
> > > >
> > > >
> > > >
> > > Can you point me to where Lincoln "saw" this problem? The BioPerl mailing list
> > > archive is not searchable, and searching via Google doesn't turn
> > > anything up.
> > >
> > > Anyway, I'll look into Root::IO tomorrow and see what I come up with.
> > >
> > > Peter
> > >
> > > >
> > > >
> > > >>At the very least a warning would be appropriate, perhaps indicating the
> > > >>course of action.
> > > >>
> > > >>For XML handlers we can check the DTD and throw an error. I will modify
> > > >>my SeqIO::tinyseq::tinyseqHandler to do so.
> > > >>
> > > >>Don Jackson
> > > >>
> > > >>
> > > >>
> > > >>Peter van Heusden wrote:
> > > >>
> > > >>
> > > >>
> > > >>>My review of the Bio::SeqIO::new method shows the following behaviour:
> > > >>>
> > > >>>Missing both -file and -fh arguments: falls back to using $ARGV[0]
> > > >>>(the first command line argument) as the sequence filename. If this
> > > >>>fails, gives an exception about "Unknown format".
> > > >>>-file argument (without -fh argument):
> > > >>>  * given, but file unreadable: throws exception
> > > >>>  * undefined: reads $ARGV[0], as above.
> > > >>>-fh argument (without -file argument):
> > > >>>  * given, but not a filehandle: gives exception
> > > >>>  * given, but an invalid filehandle (not open): gives exception
> > > >>>  * undefined: reads $ARGV[0], as above.
> > > >>>-format argument: if the sequence file doesn't correspond to the given
> > > >>>format, some parsers give an error (e.g. EMBL), while others (GenBank)
> > > >>>do not and instead silently give wrong results.
> > > >>>-format argument without -file argument: silently creates a SeqIO
> > > >>>object which writes to STDOUT.
> > > >>>
> > > >>>I don't think that this $ARGV[0] shortcut should be in there; it
> > > >>>causes unnecessary potential confusion. Imagine a situation where -fh
> > > >>>or -file is specified (using a variable), but that variable somehow
> > > >>>does not get defined. In that case, the $ARGV[0] fallback behaviour
> > > >>>would be used, which might lead to a non-obvious error.
> > > >>>
> > > >>>I'd like to propose that either -file or -fh should be specified,
> > > >>>otherwise an exception is thrown. While I'm about it, I'm thinking of
> > > >>>migrating the exceptions to the new 'typed exceptions' that BioPerl
> > > >>>now provides - is there any consensus on exception type names?
> > > >>>
> > > >>>Peter
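Peter's proposal amounts to a guard at the top of the constructor. A minimal sketch under stated assumptions: `My::SeqReader` is a hypothetical class standing in for Bio::SeqIO, and it uses plain die rather than guessing at names for BioPerl's typed exceptions.

```perl
use strict;
use warnings;

package My::SeqReader;    # hypothetical class, not Bio::SeqIO

# Require -file or -fh; never fall back to $ARGV[0].
sub new {
    my ($class, %args) = @_;
    my ($file, $fh) = @args{qw(-file -fh)};
    die "Must specify -file or -fh\n"
        unless defined($file) || defined($fh);
    if (defined $file) {
        open(my $in, '<', $file) or die "Could not read '$file': $!\n";
        $fh = $in;
    }
    return bless { fh => $fh }, $class;
}

package main;

# An undefined -file variable now fails loudly instead of reading $ARGV[0].
my $file_var;    # imagine this was meant to be set earlier
my $ok = eval { My::SeqReader->new(-file => $file_var); 1 };
print($ok ? "created reader\n" : "error: $@");
```

Inside BioPerl itself the die calls would presumably become `$self->throw(...)` or one of the typed exceptions Peter asks about.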
> > > >>>_______________________________________________
> > > >>>Bioperl-l mailing list
> > > >>>Bioperl-l at portal.open-bio.org
> > > >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >
> > > >--
> > > >Jason Stajich
> > > >Duke University
> > > >jason at cgt.mc.duke.edu
> > > >
> > > >
> > > >
> > >
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 

-- 
Chris Dagdigian, <dag at sonsorol.org>
BioTeam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net


