From M.W.E.J.Fiers@plant.wag-ur.nl Tue Jan 2 12:52:57 2001 Date: Tue, 02 Jan 2001 13:52:57 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] Computation object
Hi, concerning the computation.pm object: I seem to have made a rather stupid mistake, as I failed to do an actual commit last time. So I've given it another try. If somebody feels like it, please take a look. I didn't implement the structure Ewan proposed. If people like my implementation of this object, I will do it. Mark Fiers Plant Research International
From jason@chg.mc.duke.edu Tue Jan 2 15:58:18 2001 Date: Tue, 2 Jan 2001 10:58:18 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] call for more tests
As part of the continued effort to check every module in our distribution before 0.7 is released: does anyone use Bio::SeqIO::scf? I need some test files for it. Thanks. Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From jason@chg.mc.duke.edu Tue Jan 2 17:19:38 2001 Date: Tue, 2 Jan 2001 12:19:38 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] test framework
While I'm messing with it, does anyone have objections to using the built-in perl Test module (available since perl 5.004) rather than our own test() sub, shown below? I agree it is wasted time to constantly move things from one test suite to another (I already tried to standardize our existing ones as best as possible). But a nice standard makes it easier for new people to write tests and make them fit. Any comments?

sub test ($$;$) {
    my ($num, $true, $msg) = @_;
    print($true ? "ok $num\n" : "not ok $num $msg\n");
}

[ from perldoc Test ]

use strict;
use Test;

# use a BEGIN block so we print our plan before MyModule is loaded
BEGIN { plan tests => 14, todo => [3,4] }

# load your module...
use MyModule;

ok(0); # failure
ok(1); # success
ok(0); # ok, expected failure (see todo list, above)
ok(1); # surprise success!
ok(0,1);             # failure: '0' ne '1'
ok('broke','fixed'); # failure: 'broke' ne 'fixed'
ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
ok('fixed',qr/x/);   # success: 'fixed' =~ qr/x/
ok(sub { 1+1 }, 2);  # success: '2' eq '2'
ok(sub { 1+1 }, 3);  # failure: '2' ne '3'
ok(0, int(rand(2))); # (just kidding :-)

my @list = (0,0);
ok @list, 3, "\@list=".join(',',@list);   #extra diagnostics
ok 'segmentation fault', '/(?i)success/'; #regex match

skip($feature_is_missing, ...); #do platform specific test

Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From krbou@pgsgent.be Tue Jan 2 21:21:36 2001 Date: Tue, 2 Jan 2001 22:21:36 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
[ I know there are some SWISS-PROT specialists on this list, so I might make a fool of myself, but here goes ]

When chasing down the reason why swiss.pm was not able to read a SWISS-PROT formatted file it had written itself, I found the following things in write_seq() which look suspicious:

- at line 356 there is
    $mol = $seq->molecule;
  I think this should be $seq->moltype, as ->molecule only looks for {'molecule'}, which is not set by ->new. Bio::Seq->new only sets {'moltype'}. We should change the 'protein' of ->moltype to 'PRT' to conform to the standard. B.T.W., do we want to allow SWISS-PROT to try to write out DNA/RNA sequences?

- around line 369 the whole else block should be changed. We should make sure we have a division ($div) in the ID part. The previous version of the code, which is now commented out, made a better attempt at this. Looking at next_seq() we see why we're not able to read this (the entry name must contain an underscore, section 3.1.1 of the SWISS-PROT manual):

    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
        || $self->throw("swissprot stream with no ID. Not swissprot in my book");
    $name = $1."_".$2;
    $seq->primary_id($1);
    $seq->division($2);

How standard-compliant do we want to be with this? If we want to be very strict, we should e.g. make sure the 'entry name' (first item on the ID line) is not more than 10 characters.

P.S. (very) minor issue: the division we choose, 'UNK', for sequences which don't have a division set is not in the standard (speclist.txt); it only contains UNKP.

Should I try to adapt swiss.pm to the thoughts I (tried to) put out, or are there major objections?

Kris,
From lapp@gnf.org Tue Jan 2 23:45:28 2001 Date: Tue, 02 Jan 2001 15:45:28 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] SWISS-PROT writing
Kris Boulez wrote: > > > - at line 356 there is > $mol = $seq->molecule; > I think this should be $seq->moltype; as ->molecule only looks for > {'molecule'} which is not set by ->new. Bio::Seq->new only sets > {'moltype'}. > We should change the 'protein' of ->moltype to 'PRT' to conform to the > standard. moltype() is internal to BioPerl. Whenever there is an attribute synonymous to moltype() but defined by a databank, molecule() should be used for that. So the code is correct, I think. Bio::Seq->new() indeed only sets moltype(), because at this point there is no databank specificity. molecule() should be set by the parser. If you want to instantiate a swissprot seq from memory and have it written in swissprot format, the way we want to go is to have dedicated classes under Bio::Seq::*. If there is a need for a swissprot-dedicated class, that one probably would also set molecule() at instantiation time. > > B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA > sequences ? In my opinion there's no need for that, but others may think differently. > > - around line 369 the whole else block should be changed. We should make > sure we have a division ($div) in the ID part. The previous version of > the code which is now commented out did a better try at this. Looking at > next_seq() we why we're not able to read this (entry name must contain > an underscore section 3.1.1 of the SWISS-PROT manual). > > $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/ > || $self->throw("swissprot stream with no ID. Not swissprot in my > book"); > $name = $1."_".$2; > $seq->primary_id($1); > $seq->division($2); > If this is the code you're referring to (sorry, I don't have it at hand right now), it does ensure that there is a division part. I'm probably missing something. > How standard compliant do we want to be with this. If we want to be very > strict we should e.g. make sure the 'entry name' (first item on the ID > line) is not more then 10 characters. > > P.S.
(very) minor issue: the division we choose 'UNK' for sequences > which don't have a division set is not in the standard (speclist.txt), > it only contains UNKP > Sure, can (should) be changed. > Should I try to adopt swiss.pm to the thoughts I (tried to) put out or > are there major objections ? > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy. If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects of that class. Apart from this, Lorenz may wish to comment. He's been our Swissprot cruncher for a while, but we haven't heard from him for some time. Lorenz, still out there? Happy new year to all. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------
From schattner@alum.mit.edu Wed Jan 3 02:26:20 2001 Date: Tue, 02 Jan 2001 18:26:20 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] call for more tests
Jason Stajich wrote: > > In the continued effort to check every module in our distribution before > 0.7 is released. I wondered if anyone does use Bio::SeqIO::scf? I need > some test files for it. > Thanks. I can't help you with Bio::SeqIO::scf, but I can add a couple of other missing tests to your list: Bio::Tools::SeqPattern does not have a "t" file. (By the way, seq_pattern.pl in the examples directory crashes; I just submitted a bug report.) Bio::Tools::SeqStats currently only has one very simple test (located in Tools.t); previously there were several more tests that seem to have disappeared. I can upload the additional tests again if you like. Peter
From schattner@alum.mit.edu Wed Jan 3 02:31:14 2001 Date: Tue, 02 Jan 2001 18:31:14 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] A couple of CVS questions.
A couple of CVS questions.

1. How can one access earlier releases of bioperl? I haven't been able to find them on CVS or elsewhere. Where should I be looking?

2. Some modules were moved to different directories within the CVS structure recently (e.g. Bio::Tools::Alignment::Clustalw.pm was moved to Bio::Tools::Run::Alignment::Clustalw.pm). Since then, I don't seem to be able to find the versions of the modules made prior to the date that the modules were moved. Can someone tell me if these older versions are accessible and, if so, how to find them?

Thanks Peter Schattner
From lapp@gnf.org Wed Jan 3 04:16:02 2001 Date: Tue, 02 Jan 2001 20:16:02 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] A couple of CVS questions.
Peter Schattner wrote: > > A couple of CVS questions. > > 1. How can one access earlier releases of bioperl? I haven't been able > to find them on CVS or elsewhere. Where should I be looking? > You can check out based on a version, a tag, or a date. You very likely don't want to check out a release by version, as each file has a different version number. There is a tag for the 0.6.x release branch, and also for other releases. If you want to check out the whole development trunk as of an earlier point, the most sensible way is probably to go by date (option -D). For individual modules you can go either way. Do you have the cvs manpages? They're actually poor compared to the info files cvs comes with. On a Unix box with info installed you should be able to type 'info cvs'. > 2. Some modules were moved to different directories within the CVS > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to > Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to > be able to find the versions of the modules made prior to the date that > the modules were moved. Can someone tell me if these older versions are > accessible and if so how to find them. The files were moved without retaining the revision history (cvs is bad at moving and renaming files; you have to mess with the repository in order to have the cvs history preserved in this case). The version at the former location was deleted, so you can restore it at the former place only. The file at the new location has lost all of its revision history from before the move. Hope this helps. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------
From dagdigian@ComputeFarm.com Wed Jan 3 06:12:04 2001 Date: Wed, 03 Jan 2001 01:12:04 -0500 From: Chris Dagdigian dagdigian@ComputeFarm.com Subject: [Bioperl-l] A couple of CVS questions.
ftp://bioperl.org/pub/DIST/ All of our old 'official' bioperl release tarballs can be found there. Regards, Chris At 06:31 PM 1/2/01 -0800, Peter Schattner wrote: >A couple of CVS questions. > >1. How can one access earlier releases of bioperl? I haven't been able >to find them on CVS or elsewhere. Where should I be looking?
From krbou@pgsgent.be Wed Jan 3 07:29:43 2001 Date: Wed, 3 Jan 2001 08:29:43 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
Quoting Hilmar Lapp (lapp@gnf.org): > Kris Boulez wrote: > > > > > > - at line 356 there is > > $mol = $seq->molecule; > > I think this should be $seq->moltype; as ->molecule only looks for > > {'molecule'} which is not set by ->new. Bio::Seq->new only sets > > {'moltype'}. > > We should change the 'protein' of ->moltype to 'PRT' to conform to the > > standard. > > moltype() is internal to BioPerl. Whenever there is an attribute synonymous > to moltype() but defined by a databank, molecule() should be used for that. > So the code is correct I think. > Then the documentation for Bio::Seq->molecule() should be extended a bit. It currently reads:

 molecule
  Title   : molecule
  Usage   : $obj->molecule($newval)
  Function:
  Returns : type of molecule (DNA, mRNA)
  Args    : newvalue (optional)

> Bio::Seq->new() indeed only sets moltype(), because at this point there is > no databank specificity. molecule() should be set by the parser. If you > want to instantiate a swissprot seq from memory and have it written in > swissprot format, the way we want to go is have dedicated classes under > Bio::Seq::*. If there is need for a swissprot-dedicated class, that one > probably would also set molecule() at instantiation time. > > > > > B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA > > sequences ? > > In my opinion there's no need for that, but others may think differently. > > > > > - around line 369 the whole else block should be changed. We should make > > sure we have a division ($div) in the ID part. The previous version of > > the code which is now commented out did a better try at this. Looking at > > next_seq() we why we're not able to read this (entry name must contain > > an underscore section 3.1.1 of the SWISS-PROT manual). > > > > $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/ > > || $self->throw("swissprot stream with no ID.
Not swissprot in my > > book"); > > $name = $1."_".$2; > > $seq->primary_id($1); > > $seq->division($2); > > > > If this is the code you're referring to (sorry, don't have at hand right > now), it does ensure that there is a division part. I'm probably missing > something. > Sorry, I obviously wasn't clear on this one. The code I pasted is from next_seq(). What I was referring to is the code in write_seq(). In there we do not enforce that there is a division part (I think we should at least check whether $seq->display_id() returns a name with an underscore in a reasonable position). The code reads:

    } else {
        #$temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
        #                      $seq->primary_id()."_".$div,$mol,$len);
        # Reconstructing the ID relies heavily upon the input source having
        # been in a format that is parsed as this routine expects it -- that is,
        # by this module itself. This is bad, I think, and immediately breaks
        # if e.g. the Bio::DB::GenPept module is used as input.
        # Hence, switch to display_id(); _every_ sequence is supposed to have
        # this. HL 2000/09/03
        $temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
                              $seq->display_id(), $mol, $len);
    }

> > How standard compliant do we want to be with this. If we want to be very > > strict we should e.g. make sure the 'entry name' (first item on the ID > > line) is not more then 10 characters. > > > > P.S. (very) minor issue: the division we choose 'UNK' for sequences > > which don't have a division set is not in the standard (speclist.txt), > > it only contains UNKP > > > > Sure, can (should) be changed. > > > Should I try to adopt swiss.pm to the thoughts I (tried to) put out or > > are there major objections ? > > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy. > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects > of that class. > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq.
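The write-side check Kris asks for (an underscore in a reasonable position, plus the strict 10-character limit on the entry name) could be sketched roughly as below. The helper name valid_entry_name is hypothetical and not part of Bio::SeqIO::swiss; its pattern simply reuses the CODE_DIVISION split from the next_seq() regex quoted above, so anything it accepts should be something the reader side can parse back. The 10-character limit is the strict reading discussed in this thread and is an assumption, not a settled policy.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::swiss): returns 1 if a
# display_id() value looks usable as a SWISS-PROT entry name, else 0.
sub valid_entry_name {
    my ($name) = @_;
    return 0 unless defined $name;
    # strict mode assumed here: the entry name is at most 10 characters
    return 0 if length($name) > 10;
    # exactly one underscore with a non-empty part on either side, i.e. the
    # same ([^\s_]+)_([^\s_]+) split that next_seq() applies when reading
    return $name =~ /^[^\s_]+_[^\s_]+\z/ ? 1 : 0;
}

# Possible use in write_seq(), before $temp_line is built (illustrative only):
#   $self->throw("display_id '" . $seq->display_id() . "' is not a valid entry name")
#       unless valid_entry_name($seq->display_id());

for my $id (qw(CYC_HUMAN CYCHUMAN CYTOCHROME_HUMAN)) {
    printf "%-16s %s\n", $id, valid_entry_name($id) ? 'ok' : 'rejected';
}
```

Throwing is only one option; write_seq() could instead fall back to appending a default division such as the UNKP code mentioned above, which would keep output readable by next_seq() at the cost of silently changing the ID.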
Do you plan on having a Bio::Seq::* class for every (complex) sequence type? Kris,
From jason@chg.mc.duke.edu Wed Jan 3 14:17:01 2001 Date: Wed, 3 Jan 2001 09:17:01 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] A couple of CVS questions.
On Tue, 2 Jan 2001, Hilmar Lapp wrote: > Peter Schattner wrote: > > > > A couple of CVS questions. > > > > 1. How can one access earlier releases of bioperl? I haven't been able > > to find them on CVS or elsewhere. Where should I be looking? > > > > You can checkout based on one of version, tag, or date. You very likely > don't want to checkout a release by version, as each file has a different > version. There is a tag for the 0.6.x release branch, and also for other > releases. If you want to checkout the whole development trunk in an earlier > version, the most sensible way is probably to go by date (option -D). For > individual modules you can go either way. > > Do you have the manpages of cvs? They're actually poor compared to the > info-files cvs comes with. On a Unix box with info installed you should be > able to type 'info cvs'. > > > 2. Some modules were moved to different directories within the CVS > > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to > > Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to > > be able to find the versions of the modules made prior to the date that > > the modules were moved. Can someone tell me if these older versions are > > accessible and if so how to find them. > > The files were moved without retaining the revision history (cvs is bad at > file moving and renaming; you have to mess with the repository in order to > have cvs history preserved in this case). The version at the former > location was deleted, so you can restore it at the former place only. The > file at the new location has lost all its revision information before the > move. Many apologies; this was my own stupidity for not moving the files the correct way. I wish I had waited for Hilmar's email... Learned my lesson, though. I didn't realize we could move the RCS files (itchy trigger finger) before I moved the src files.
If you look at the first date in the history of Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast, you can see when the move occurred, and then check out with -D using a day or time before then. > > Hope this helps. > > Hilmar > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp@gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From birney@ebi.ac.uk Wed Jan 3 14:50:53 2001 Date: Wed, 3 Jan 2001 14:50:53 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] A couple of CVS questions.
On Wed, 3 Jan 2001, Jason Stajich wrote: > On Tue, 2 Jan 2001, Hilmar Lapp wrote: > > > Peter Schattner wrote: > > > > > > A couple of CVS questions. > > > > > > 1. How can one access earlier releases of bioperl? I haven't been able > > > to find them on CVS or elsewhere. Where should I be looking? > > > > > > > You can checkout based on one of version, tag, or date. You very likely > > don't want to checkout a release by version, as each file has a different > > version. There is a tag for the 0.6.x release branch, and also for other > > releases. If you want to checkout the whole development trunk in an earlier > > version, the most sensible way is probably to go by date (option -D). For > > individual modules you can go either way. > > > > Do you have the manpages of cvs? They're actually poor compared to the > > info-files cvs comes with. On a Unix box with info installed you should be > > able to type 'info cvs'. > > > > > 2. Some modules were moved to different directories within the CVS > > > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to > > > Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to > > > be able to find the versions of the modules made prior to the date that > > > the modules were moved. Can someone tell me if these older versions are > > > accessible and if so how to find them. > > > > The files were moved without retaining the revision history (cvs is bad at > > file moving and renaming; you have to mess with the repository in order to > > have cvs history preserved in this case). The version at the former > > location was deleted, so you can restore it at the former place only. The > > file at the new location has lost all its revision information before the > > move. > > Many apologies, this was my stupidness for not moving the files the > correct way. I wish I had waited for Hilmar's email.... Learned my > lesson though.... 
I didn't realize we could move the RCS files (itchy > trigger finger) before I moved the src files. If you look at the > first date in Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast > you can see when the move occurred and then checkout with -D as some day > or time before then. It is, in my book, bad form to move the actual repository files. If you move them, then CVS checkouts of old versions screw up, with sometimes disastrous effects. Doing a cvs remove and a cvs add is "The Right Way" (tm) in my book. > > > > > Hope this helps. > > > > Hilmar > > -- > > ------------------------------------------------------------- > > Hilmar Lapp email: lapp@gnf.org > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > ------------------------------------------------------------- > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From jason@chg.mc.duke.edu Wed Jan 3 17:20:17 2001 Date: Wed, 3 Jan 2001 12:20:17 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] named parameters
There is a bit of inconsistency in how we pass parameters to new() in some of the bioperl modules. Whenever we don't use named parameters (i.e. -file => 'filename'), we are inconsistent with the fact that all modules inherit from Bio::Root::RootI. This is because Bio::Root::RootI will parse a couple of special parameters, specifically:

  -verbose, -strict, -name, -obj, -record_err

Now, we really don't use these that much; however, in the case of Bio::Species one would call

  my @classification = qw( sapiens Homo Hominidae
                           Catarrhini Primates Eutheria
                           Mammalia Vertebrata Chordata
                           Metazoa Eukaryota );

  my $sp = new Bio::Species(@classification);

but if one also wanted debugging turned on, one might call this:

  my $sp = new Bio::Species(-verbose=>1, @classification);

This won't bother RootI, but Bio::Species expects all the parameters to be part of the classification array, so -verbose and 1 would be taken as classification levels too.

A solution is to change Bio::Species to expect named parameters, so that the classification is passed as an array ref:

  $sp = new Bio::Species(-verbose=>1, -classification => \@classification );

What are people's reactions to this? If we can agree that this is expected, then we can add this to our programming conventions wiki page.

-Jason
From birney@ebi.ac.uk Wed Jan 3 17:31:25 2001 Date: Wed, 3 Jan 2001 17:31:25 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] test failures on main trunk
perl 5.004_04 is failing again. Some I can fix, others Peter/Jason might want to take a peek at. They are:

Failed Test     Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/Clustalw.t                     9    1  11.11%  4
t/DB.t               0    11    ??   ??       %  ??
t/Index.t            2   512     8    3  37.50%  6-8
t/SeqFeature.t                  21   ??       %  ??
t/TCoffee.t                      9    1  11.11%  4
Failed 5/48 test scripts, 89.58% okay. -1/594 subtests failed, 100.17% okay.
make: *** [test_dynamic] Error 29

riker:~/src/bioperl-live> perl t/DB/
riker:~/src/bioperl-live> perl t/DB.t
IO::String not installed. This means the Bio::DB::* modules are not usable. Skipping tests.
1..1
ok 1
Segmentation fault
riker:~/src/bioperl-live> perl t/Clustalw.t
1..9
Clustalw program not found as /clustalw or not executable.
Clustalw can be obtained from eg- http://corba.ebi.ac.uk/Biocatalog/Alignment_Search_software.html/
ok 1
-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/Clustalw.t
STACK: Bio::Tools::Run::Alignment::Clustalw::AUTOLOAD(308)
main::t/Clustalw.t(52)
---------------------------------------------------
riker:~/src/bioperl-live> perl t/SeqFeature.t
1..21
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15
ok 16
ok 17
not ok 18
ok 19
not ok 20
ok 21
ok 22
ok 23
ok 24
ok 25
ok 26
ok 27
riker:~/src/bioperl-live> perl t/TCoffee.t
1..9
TCoffee program not found as /t_coffee or not executable.
TCoffee can be obtained from eg- http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html
ok 1
-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/TCoffee.t
STACK: Bio::Tools::Run::Alignment::TCoffee::AUTOLOAD(561)
main::t/TCoffee.t(55)
---------------------------------------------------

I'll start to work on TCoffee/Clustalw...
-----------------------------------------------------------------
Ewan Birney.
Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From birney@ebi.ac.uk Wed Jan 3 17:39:19 2001 Date: Wed, 3 Jan 2001 17:39:19 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] named parameters
On Wed, 3 Jan 2001, Jason Stajich wrote: > This is a bit on inconsistency when we specify parameters to new in some > of the bioperl modules. Whenever we don't use named parameters (ie > -file=> 'filename'), we are inconsistent with the fact that all modules > inherit from Bio::Root::RootI. This is because Bio::Root::RootI will > parse a couple of special parameters - specifically > -verbose, -strict, -name, -obj, -record_err > > now we really don't use these that much, however, in the case of > Bio::Species > > one would call > my @classification = qw( sapiens Homo Hominidae > Catarrhini Primates Eutheria > Mammalia Vertebrata Chordata > Metazoa Eukaryota ) > > my $sp = new Bio::Species(@classification); > > but if one also wanted debugging turned on, one might call this > my $sp = new Bio::Species(-verbose=>1, @classification); > > This won't bother RootI, but Bio::Species expects all the parameters to be > part of the classification array. > > A solution is to change Bio::Species to expect named parameters so an > array ref is > > $sp = new Bio::Species(-verbose=>1, -classification => \@classification ); > > What are people's reactions to this? If we can agree that this is > expected then we can add this to our programming conventions wiki page. I think we should stick to named parameters throughout and have it as a programming convention... > > -Jason > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From birney@ebi.ac.uk Wed Jan 3 17:52:50 2001 Date: Wed, 3 Jan 2001 17:52:50 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] test failures on main trunk
Ok. My mistake: we are failing tests, but not in the way that I described...

TCoffee/ClustalW are waiting on the RootI reorganisation, currently being led by Jason.

SeqFeature was a trivial matter of raising the number of tests to run from 21 to 27 for the new computation object.

Index has a weird dependency on IO::String. Why is this? Who needs IO::String in Index?

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From hlapp@gmx.net Wed Jan 3 17:53:53 2001 Date: Wed, 03 Jan 2001 09:53:53 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] A couple of CVS questions.
Ewan Birney wrote: > > It is, in my book, bad form to move the actual files. If you move files > then CVS checkouts on old versions screw up with sometimes disasterous > effects. > > The removal and cvs add is "The Right Way" tm in my book. > Well, I'm certainly not a CVS expert, but when I wrote that you can move the repository files I only quoted the recommendation given in the CVS documentation (the info files that come with it). If you think applying this recommendation can have disastrous effects, you should probably write to the CVS people and ask them to take it out of their documentation, or better yet, to put in a warning. I'm still not sure what could cause the disastrous effect, as the revision file does not keep any directory information (I may be wrong here, though, but I haven't seen any dir info in such files yet), and there is no 'central database' that keeps track of which file is where. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------
From birney@ebi.ac.uk Wed Jan 3 17:57:17 2001 Date: Wed, 3 Jan 2001 17:57:17 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] A couple of CVS questions.
On Wed, 3 Jan 2001, Hilmar Lapp wrote: > Ewan Birney wrote: > > > > It is, in my book, bad form to move the actual files. If you move files > > then CVS checkouts on old versions screw up with sometimes disasterous > > effects. > > > > The removal and cvs add is "The Right Way" tm in my book. > > > > Well, I'm certainly not a CVS expert but when I wrote that you can > move the repository files I only quoted the recommendation given > in the CVS documentation (the info files that come with it). If > you think applying this recommendation can have disastrous effects > you should probably write to the CVS people to take this out of > their documentation, or better yet, to put in a warning. > > I'm still not sure what could cause the disastrous effect, as the > revision file does not keep any directory information (I may be > wrong here though, but I haven't seen any dir info in such files > yet), and there is no 'central database' that keeps track of which > file is where. Yeah, but then what happens is this. In

  OldRelease (real)

    StableFile XX::YY says use AA::BB
    File AA::BB is there

We now move AA::BB to CC::BB *in the repository*. If we check out the old release, we get

    StableFile XX::YY says use AA::BB
    File AA::BB ** IS NOT THERE **
    File CC::BB is there, but is named wrong!

So it is ok from a cvs perspective, but it sucks from a code management perspective! If you cvs remove, cvs add, this does not happen. Traditionally you note in your log on the cvs add that it has just come from XXXX, allowing people to track the history ... > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>.
-----------------------------------------------------------------
From hlapp@gmx.net Wed Jan 3 18:19:48 2001 Date: Wed, 03 Jan 2001 10:19:48 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] A couple of CVS questions.
Ewan Birney wrote: > > Yeah, but then what happens is that in > > OldRelease (real) > > StableFile XX::YY says use AA:BB > File AA::BB is there > > We now move AA:BB to CC:BB *in the repository* > > if we checkout the old release we get > > StableFile XX::YY says use AA:BB > File AA::BB ** IS NOT THERE ** > File CC::BB is there, but is named wrong! > > So it is ok from a cvs perspective, but it sucks from a code management > perspective! > > if you cvs remove, cvs add this does not happen. Traditionally you put in > your log on the cvs add that is has just come from XXXX, allowing people > to track the history ... > I see. You could still copy the repository file to the new location, and then cvs remove it from the old. But then, you probably don't want people to be able to restore a previous version at a place where that version didn't sit. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------
From hlapp@gmx.net Wed Jan 3 18:21:40 2001 Date: Wed, 03 Jan 2001 10:21:40 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] named parameters
Jason Stajich wrote:
> 
> 
> A solution is to change Bio::Species to expect named parameters so an
> array ref is
> 
> $sp = new Bio::Species(-verbose=>1, -classification => \@classification );
> 
> What are people's reactions to this? If we can agree that this is
> expected then we can add this to our programming conventions wiki page.
> 

Yes, certainly.

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From jason@chg.mc.duke.edu Wed Jan 3 19:12:07 2001 Date: Wed, 3 Jan 2001 14:12:07 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] RootI migration and other changes
Hilmar, Ewan, and I came up with a scheme for handling Bio::Root::RootI and all these obnoxious initializations. My apologies for not keeping the list more in the loop, but this was actually really boring. I have now checked in changes that should meet this new spec. Some parts were a little tricky, but all the tests pass, so the behaviour appears to be consistent. In addition to making the changes necessary for the move to a chained new() rather than a chained _initialize(), I revamped some modules that needed updating. Here is a summary, to the best of my recollection.

t/ - I updated some of the tests, on an ad hoc basis, to use the perl Test module (see perldoc Test for more info). I hope this will make test writing even easier, so that those interested can jump in and write a test. (This might be a good way to get acquainted with a module if you want to contribute to the project.)

Bio::Tools::Run - this new directory is for modules that serve as wrappers to call outside programs. We should try to have all modules that execute external programs reside in this dir or its subdirs. I added some code using File::Spec to standardize how pathnames to executables are located. I am not sure we can expect File::Spec to always be installed in a perl distribution (IT SHOULD BE!), so I revert to the original way of constructing paths (assuming Unix-style '/' directory separators). I also did some cleaning up and standardization. Actually, we need to write a module Bio::Tools::Run.pm that will serve as a framework for all modules that execute external programs; there is much code redundancy in these modules right now.

Bio::Species - now uses a named parameter for classification. This required updates to a test and to some of the SeqIO modules.

Bio::SeqFeature::* - I worked on Mark's Computation object a little to take advantage of inheritance. There are still some noises being made in t/SeqFeature with the new tests Ewan added, so I'll try to track those down.
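A minimal sketch of the path-construction fallback described in the Bio::Tools::Run entry, assuming only core Perl. File::Spec is loaded at run time if it is present; otherwise paths are joined with '/'. The helper names exe_path and find_executable are hypothetical. This is not the actual Bio::Tools::Run code.

```perl
use strict;
use warnings;

# Load File::Spec at run time; if it is missing, fall back to '/'-joined paths.
my $HAVE_FILE_SPEC = eval { require File::Spec; 1 } ? 1 : 0;

# Hypothetical helper: build the path to a program inside a directory.
sub exe_path {
    my ($dir, $prog) = @_;
    return $HAVE_FILE_SPEC
        ? File::Spec->catfile($dir, $prog)   # platform-appropriate separator
        : join('/', $dir, $prog);            # assumes Unix-style '/'
}

# Hypothetical helper: return the first executable file named $prog found
# in a directory listed in $ENV{PATH}, or undef if there is none.
sub find_executable {
    my ($prog) = @_;
    my $sep = $^O eq 'MSWin32' ? ';' : ':';
    for my $dir (split /\Q$sep\E/, defined $ENV{PATH} ? $ENV{PATH} : '') {
        next unless length $dir;
        my $path = exe_path($dir, $prog);
        return $path if -f $path && -x _;    # '_' reuses the stat from -f
    }
    return;
}

# Example: look for a clustalw binary before trying to run it.
my $clustalw = find_executable('clustalw');
print defined $clustalw ? "found $clustalw\n" : "clustalw not found on PATH\n";
```

Keeping the lookup in one helper is one way to reduce the code redundancy mentioned above, but the real framework module may be organized differently.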
I also did some work so that feature1 and feature2 of SimilarityPair always return something valid even if you have not initialized them. This was necessary because of the order in which parameters are set when a subclass is instantiated (look at the Bio::Tools::Sim4::Exon hierarchy and trace the calls to new() and you'll start to see what was happening). This was due to our move to chained new(), but it works now so no worries.

Bio::AlignIO::clustalw - now supports reading and writing of clustalw alignments; it only supported writing before. This should work for both clustal 1.4 and 1.8.

Bio::SearchDist - I added a test for this. I have not actually had luck loading it on my machine lately, so I have written a very simple test that will skip if it cannot load the Bio::Ext::Align module.

Bio::SeqIO - genbank/embl/swiss: I added the verbose parameter to new Bio::FTHelper(-verbose => $self->verbose) and to the new Seq when instantiating it, so that they will not print warnings when verbose is set to -1 for the SeqIO object.

Bio::DB::GDB - a new module that will query the website www.gdb.org and return simple things (what I needed, which was, for a marker name, the PCR primers and the length of the product). This will get much improved later on as we develop objects for storing Markers and other information. This will fail if you overload the GDB server (Trust me I know...). I'm still tinkering with it, so the tests may not pass 100% of the time. We can decide if it is good enough to include in the release (I'm not sure yet). It's hairy HTML parsing in there.

There are some modules I did not touch - UnivAln and Bio::Tools::Blast, which depend on Bio::Root::Object. We're going to have to decide what we want to do here in the future, but that may not be a job we try to complete for the 0.7 release.
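A minimal sketch of the verbose pass-through described in the Bio::SeqIO entry, assuming a simplified stand-in for Bio::Root::RootI. The package names My::VerboseObj and My::Parser and the method make_helper are hypothetical. The parent passes its own level to the child it creates through a named -verbose parameter. The child's warn() method stays silent when the level is negative.

```perl
use strict;
use warnings;

package My::VerboseObj;    # hypothetical stand-in for a RootI-style base class

sub new {
    my ($class, %args) = @_;                 # named parameters, e.g. -verbose => -1
    my $self = bless {}, $class;
    $self->verbose(defined $args{-verbose} ? $args{-verbose} : 0);
    return $self;
}

# Get/set the verbosity level: negative silences warnings, 0 is the default.
sub verbose {
    my ($self, $level) = @_;
    $self->{_verbose} = $level if defined $level;
    return $self->{_verbose};
}

# Object-level warning that honours the verbosity level.
sub warn {
    my ($self, $msg) = @_;
    return if $self->verbose < 0;
    CORE::warn("WARNING: $msg\n");
}

package My::Parser;        # hypothetical stand-in for a SeqIO parser
our @ISA = ('My::VerboseObj');

# Create a helper object (think FTHelper) that inherits the parser's verbosity.
sub make_helper {
    my ($self) = @_;
    return My::VerboseObj->new(-verbose => $self->verbose);
}

package main;

my $quiet = My::Parser->new(-verbose => -1);
$quiet->make_helper->warn('suppressed: verbose is -1');    # prints nothing

my $chatty = My::Parser->new;
$chatty->make_helper->warn('printed: default verbosity');  # goes to STDERR
```

Because the helper receives the level when it is constructed, changing the parser's verbosity later does not affect helpers that already exist. A real implementation might instead look the level up through a parent reference.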
-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/
From lapp@gnf.org Wed Jan 3 20:33:32 2001 Date: Wed, 03 Jan 2001 12:33:32 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] SWISS-PROT writing
Kris Boulez wrote:
> 
> > 
> > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > of that class.
> > 
> The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> 

Yes, we plan to have a specialized class for every databank, for which the attributes its entries carry are not sufficiently reflected in Bio::Seq.pm or an already existing class under Bio::Seq::*. This enables us to free the basic Seq object from definitions that only pertain to databanks and don't make up the essentials of a biological sequence.

So, molecule(), division() etc will be eventually moved away from Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of 2, meaning that we want it, but we may decide to skip it this time in order to get the release out of the door.

Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
From birney@ebi.ac.uk Wed Jan 3 21:12:54 2001 Date: Wed, 3 Jan 2001 21:12:54 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] SWISS-PROT writing
On Wed, 3 Jan 2001, Hilmar Lapp wrote:

> Kris Boulez wrote:
> > 
> > > 
> > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > of that class.
> > > 
> > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > 
> 
> Yes, we plan to have a specialized class for every databank, for which the
> attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> or an already existing class under Bio::Seq::*. This enables us to free the
> basic Seq object from definitions that only pertain to databanks and don't
> make up the essentials of a biological sequence.
> 
> So, molecule(), division() etc will be eventually moved away from
> Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> 2, meaning that we want it, but we may decide to skip it this time in order
> to get the release out of the door.

For GenBank/EMBL I have prototype code to check in over here. Looks fine
to me. Swissprot probably needs its own class.


there is a valid debate about whether swissprot and genbank/embl should
inherit from a common base class of "rich database sequence objects" (eg,
division is the same) or we should just say that they are different enough
not to stretch this. I have not done anything on swissprot.

> 
> Hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp email: lapp@gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney.
Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>.
-----------------------------------------------------------------
From wfish82@hotmail.com Thu Jan 4 00:59:55 2001 Date: Thu, 04 Jan 2001 00:59:55 -0000 From: Fish Fish wfish82@hotmail.com Subject: [Bioperl-l] Bio::Tools::Blast;
Hi,

I am trying to pick out those blast results saying "***** No hits found *****", among many other things. But I can't get it to work with Bio::Tools::Blast. Can somebody point out what is wrong in the following code? Also, it seems that if the first record of a multi-record blast file is a "No hits found", then the 2nd record will be skipped.

Thanks in advance!

wfish82

**********************************
#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Tools::Blast qw(:obj);

my $blastn=$ARGV[0];
my %blastParam=(
    -file           => $blastn,
    -parse          => 1,
    -filt_func      => \&filter,
    -min_len        => 50,
    -check_all_hits => 0,
    -strict         => 0,
    -stats          => 0,
    -best           => 0,
    -share          => 0,
    -exec_func      => \&process_blast,
);

$Blast->parse(%blastParam);

sub filter{
    my $hit=shift;
    if(! defined $hit){
        print "blahblah...\n";
    }else{
        return 1;
    }
}

sub process_blast{
    my $blastObj=shift;
    if(! defined $blastObj->hit){
        printf "BLAHBLAH...\n";
    }
    $blastObj->destroy;
}
#######################################
# end
#############################
From krbou@pgsgent.be Thu Jan 4 08:04:08 2001 Date: Thu, 4 Jan 2001 09:04:08 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
Quoting Ewan Birney (birney@ebi.ac.uk):
> On Wed, 3 Jan 2001, Hilmar Lapp wrote:
> 
> > Kris Boulez wrote:
> > > 
> > > > 
> > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > > of that class.
> > > > 
> > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > > 
> > 
> > Yes, we plan to have a specialized class for every databank, for which the
> > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > or an already existing class under Bio::Seq::*. This enables us to free the
> > basic Seq object from definitions that only pertain to databanks and don't
> > make up the essentials of a biological sequence.
> > 
> > So, molecule(), division() etc will be eventually moved away from
> > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > 2, meaning that we want it, but we may decide to skip it this time in order
> > to get the release out of the door.
> 
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class.
> 
> 
> there is a valid debate about whether swissprot and genbank/embl should
> inherit from a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
> 
> 

Last night I thought a bit more about this and have some questions.

- will these objects also inherit from Bio::Seq ?

- if yes, will these objects be created like

    my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');

  or

    my $swiss_seq = Bio::Seq::swiss->new( .. );

- will it be possible to 'promote' a Bio::Seq object to one of these new
  objects ?
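Below is a minimal sketch of the design these questions are about. It uses plain Perl classes, not the real Bio::Seq hierarchy, and the package names My::Seq, My::RichSeq and My::Seq::Swiss are hypothetical. Attributes shared by the databanks, such as division(), live in a common rich-sequence base class. Here, "promoting" an existing object means re-blessing it into the subclass. That is only one possible approach, and it changes the class of the original object in place.

```perl
use strict;
use warnings;

package My::Seq;           # hypothetical stand-in for Bio::Seq
sub new {
    my ($class, %args) = @_;                 # named parameters
    return bless { seq => $args{-seq}, id => $args{-id} }, $class;
}
sub seq { $_[0]{seq} }
sub id  { $_[0]{id} }

package My::RichSeq;       # hypothetical common base for databank classes
our @ISA = ('My::Seq');

# Generate simple get/set accessors for attributes shared by
# EMBL/GenBank/SWISS-PROT style entries.
for my $field (qw(division molecule)) {
    no strict 'refs';
    *{"My::RichSeq::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

# 'Promote' a plain sequence object in place by re-blessing it into $class.
sub promote {
    my ($class, $seq) = @_;
    return bless $seq, $class;
}

package My::Seq::Swiss;    # hypothetical SWISS-PROT specific class
our @ISA = ('My::RichSeq');

package main;

my $plain = My::Seq->new(-id => 'TEST_HUMAN', -seq => 'MKV');
my $swiss = My::Seq::Swiss->promote($plain);
$swiss->division('HUM');
print ref($swiss), ' ', $swiss->id, ' ', $swiss->division, "\n";
# prints: My::Seq::Swiss TEST_HUMAN HUM
```

Re-blessing keeps every existing field. A copy-constructing promote() would leave the original object untouched, at the cost of duplicating its data.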
Kris,
From birney@ebi.ac.uk Thu Jan 4 09:26:59 2001 Date: Thu, 4 Jan 2001 09:26:59 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] SWISS-PROT writing
On Thu, 4 Jan 2001, Kris Boulez wrote:

> Quoting Ewan Birney (birney@ebi.ac.uk):
> > On Wed, 3 Jan 2001, Hilmar Lapp wrote:
> > 
> > > Kris Boulez wrote:
> > > > 
> > > > > 
> > > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > > > of that class.
> > > > > 
> > > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > > > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > > > 
> > > 
> > > Yes, we plan to have a specialized class for every databank, for which the
> > > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > > or an already existing class under Bio::Seq::*. This enables us to free the
> > > basic Seq object from definitions that only pertain to databanks and don't
> > > make up the essentials of a biological sequence.
> > > 
> > > So, molecule(), division() etc will be eventually moved away from
> > > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > > 2, meaning that we want it, but we may decide to skip it this time in order
> > > to get the release out of the door.
> > 
> > For GenBank/EMBL I have prototype code to check in over here. Looks fine
> > to me. Swissprot probably needs its own class.
> > 
> > 
> > there is a valid debate about whether swissprot and genbank/embl should
> > inherit from a common base class of "rich database sequence objects" (eg,
> > division is the same) or we should just say that they are different enough
> > not to stretch this. I have not done anything on swissprot.
> > 
> > 
> 
> Last night I thought a bit more about this and have some questions.
> 
> - will these objects also inherit from Bio::Seq ?

yes.

> 
> - if yes, will these objects be created like
>   my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');
> 

No.
They will be created, though, from

  my $swiss_seq_io = Bio::SeqIO->new( -format => 'swiss' ) ;
  my $swiss_seq = $swiss_seq_io->next_seq;

> or
> 
>   my $swiss_seq = Bio::Seq::swiss->new( .. );
> 

This will be achievable.

> - will it be possible to 'promote' a Bio::Seq object to one of these new
>   objects ?
> 

yes....

> 
> 
> Kris,
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------
From heikki@ebi.ac.uk Thu Jan 4 10:38:43 2001 Date: Thu, 04 Jan 2001 10:38:43 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] test framework
Since it is already in perl 5.004, there should be no reason not to use it. I tried it yesterday and it really cleans up test code nicely. I am going to use it in the future.

-Heikki

Jason Stajich wrote:
> 
> while I'm messing with it, does anyone have objections to using the built
> in perl Test module available since perl 5.004 rather than our
> 
> I agree it is wasted time to constantly move things from one test suite to
> another ( I already tried to standardize our existing ones as best as
> possible). But a nice standard makes it easier for new people to write
> tests and make them fit. Any comments?
> 
> sub test ($$;$) {
>     my($num, $true,$msg) = @_;
>     print($true ? "ok $num\n" : "not ok $num $msg\n");
> }
> 
> [ from perldoc Test ]
> 
>   use strict;
>   use Test;
> 
>   # use a BEGIN block so we print our plan before MyModule is loaded
>   BEGIN { plan tests => 14, todo => [3,4] }
> 
>   # load your module...
>   use MyModule;
> 
>   ok(0); # failure
>   ok(1); # success
> 
>   ok(0); # ok, expected failure (see todo list, above)
>   ok(1); # surprise success!
> 
>   ok(0,1);             # failure: '0' ne '1'
>   ok('broke','fixed'); # failure: 'broke' ne 'fixed'
>   ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
>   ok('fixed',qr/x/);   # success: 'fixed' =~ qr/x/
> 
>   ok(sub { 1+1 }, 2);  # success: '2' eq '2'
>   ok(sub { 1+1 }, 3);  # failure: '2' ne '3'
>   ok(0, int(rand(2));  # (just kidding :-)
> 
>   my @list = (0,0);
>   ok @list, 3, "\@list=".join(',',@list);      #extra diagnostics
>   ok 'segmentation fault', '/(?i)success/';    #regex match
> 
>   skip($feature_is_missing, ...); #do platform specific test
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
> 

-- 
______ _/ _/_____________________________________________________
 _/ _/ http://www.ebi.ac.uk/mutations/
 _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
 _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
 _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/ Cambs. CB10 1SD, United Kingdom
 _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From hlapp@gmx.net Thu Jan 4 17:33:52 2001 Date: Thu, 04 Jan 2001 09:33:52 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] SWISS-PROT writing
Ewan Birney wrote:
> 
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class.
> 
> there is a valid debate about whether swissprot and genbank/embl should
> inherit from a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
> 

There are probably enough attributes shared (division, molecule, date, secondary accessions, maybe revision of the sequence, ...) to justify creating a rich sequence base class. This would also help others who wish to add another rich seq class to get started quickly.

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From hlapp@gmx.net Mon Jan 8 09:42:20 2001 Date: Mon, 08 Jan 2001 01:42:20 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] make test
make test presently reveals the following problems (I'm running Perl 5.005003 on Linux 2.2.10).

t/Chain.............Warning chain2string: argument LAST:6 overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.

Does this have any significance?

There were a couple of others which I (and Ewan and Jason) could fix.

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From heikki@ebi.ac.uk Mon Jan 8 10:18:01 2001 Date: Mon, 08 Jan 2001 10:18:01 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: make test
Hilmar,

The warning is intentional, but I agree it looks alarming to anyone installing bioperl. Test code uses a value outside existing positions. Can you think of a way of rewriting the test so that it does not print it out?

-Heikki

Hilmar Lapp wrote:
> 
> make test presently reveals the following problems (I'm running
> Perl 5.005003 on Linux 2.2.10).
> 
> t/Chain.............Warning chain2string: argument LAST:6
> overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.
> 
> Does this have any significance?
> 
> There were a couple of others which I (and Ewan and Jason) could
> fix.
> 
> Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

-- 
______ _/ _/_____________________________________________________
 _/ _/ http://www.ebi.ac.uk/mutations/
 _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
 _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
 _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/ Cambs. CB10 1SD, United Kingdom
 _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From insana@ebi.ac.uk Mon Jan 8 13:33:13 2001 Date: Mon, 8 Jan 2001 13:33:13 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] Re: make test
> The warning is intentional, but I agree it looks alarming to anyone
> installing bioperl. Test code uses a value outside existing positions.
> Can you think of a way of rewriting the test so that it does not print it
> out?

Ok, I will change that test not to create the warning.
But the whole point of that test was to get that warning and see it was
working as expected.

Jos
From jason@chg.mc.duke.edu Mon Jan 8 13:57:33 2001 Date: Mon, 8 Jan 2001 08:57:33 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Re: make test
If you made your warnings come from bioperl objects (ie $obj->warn() ) we can turn them off by setting the verbose level on the object (ie $obj->verbose(-1) turns off all warnings). This means your objects have to inherit from Bio::Root::RootI. I didn't change the LiveSeq or Variation objects when I updated the other modules in the repository for the Bio::Root::RootI chained new(), because I didn't know what your feelings were on this.

Do you want to check to see that the error is thrown or just that the routine returns the correct value?

-Jason

On Mon, 8 Jan 2001, Joseph Insana wrote:

> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
> 
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
> 
> Jos
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/
From insana@ebi.ac.uk Mon Jan 8 14:09:35 2001 Date: Mon, 8 Jan 2001 14:09:35 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] Re: make test
> (ie $obj->verbose(-1) turns off all warnings). This means your objects
> have to inherit from Bio::Root::RootI.

I don't want my objects to inherit from RootI.
They are independent and I'd like to have them stay independent.

> Do you want to check to see that the error is thrown or just that the
> routine returns the correct value?

I wanted to check that the third argument ("last") would always override
the second argument ("length") since that is the way the method is supposed
to work.
I am now going to commit a version that won't produce the warning
and will check something else.

Joseph
From birney@ebi.ac.uk Mon Jan 8 15:08:57 2001 Date: Mon, 8 Jan 2001 15:08:57 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: make test
On Mon, 8 Jan 2001, Joseph Insana wrote:

> > (ie $obj->verbose(-1) turns off all warnings). This means your objects
> > have to inherit from Bio::Root::RootI.
> 
> I don't want my objects to inherit from RootI.
> They are independent and I'd like to have them stay independent.

This is cool (I completely understand). I think we should consider moving
the variation code into its own cvs module, which means that Joseph and Heikki
are not tied to the bioperl release schedule etc.

This is for post 0.7 branching in my view (Hilmar to make the call).

> 
> > Do you want to check to see that the error is thrown or just that the
> > routine returns the correct value?
> 
> I wanted to check that the third argument ("last") would always override
> the second argument ("length") since that is the way the method is supposed
> to work.
> I am now going to commit a version that won't produce the warning
> and will check something else.
> 
> Joseph
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------
From hlapp@gmx.net Mon Jan 8 18:51:16 2001 Date: Mon, 08 Jan 2001 10:51:16 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Joseph Insana wrote:
> 
> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
> 
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
> 

As I understand from your and Heikki's replies, in your test you wanted the overriding to happen, be accepted (even though a warning was triggered), and the code to be able to handle it. I'm not sure what you did by your change of the test, but it looks like you simply don't test that feature anymore.

If you do want to keep the warning in the code (rather than turning it into an exception; to me a warning means that the call itself may indicate an error on the client side, but in some cases may be totally sensible), what if you print a message before the test saying that a warning should be expected? If you feel confident about removing the warning message, what if you test afterwards that your code dealt with the overriding as you expected it to?

Just my two pennies. I didn't want to suggest that anyone turn off a test of his code. I just think that a warning message being printed is not really a measurable test result (i.e., it should be either 'passed' or 'failed').

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From hlapp@gmx.net Mon Jan 8 19:04:04 2001 Date: Mon, 08 Jan 2001 11:04:04 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Ewan Birney wrote:
> 
> On Mon, 8 Jan 2001, Joseph Insana wrote:
> 
> > > (ie $obj->verbose(-1) turns off all warnings). This means your objects
> > > have to inherit from Bio::Root::RootI.
> > 
> > I don't want my objects to inherit from RootI.
> > They are independent and I'd like to have them stay independent.
> 
> This is cool (I completely understand). I think we should consider moving
> the variation code into its own cvs module, which means that Joseph and Heikki
> are not tied to the bioperl release schedule etc.
> 
> This is for post 0.7 branching in my view (Hilmar to make the call).
> 

I'm not sure what you mean by post-0.7 branching. I agree that under these premises the Variation code should probably go into its own module, even though it's a pity.

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------
From schattner@alum.mit.edu Mon Jan 8 19:44:08 2001 Date: Mon, 08 Jan 2001 11:44:08 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Hello all I have committed an initial draft of an introductory bioperl tutorial (called "bptutorial.pl") to the bioperl-live (main) repository. The draft tutorial pretty much follows the outline from my proposal: http://bioperl.org/pipermail/bioperl-l/2000-December/001972.html One addition to the original proposal is that I have included an "appendix" which is a working script that demonstrates most of the bioperl features described in the tutorial. (The script is largely cut-and-pasted from various test and example files with print statements added to make it clearer as to what is going on). I believe that having a clear and accurate tutorial could make bioperl more accessible and widely used. On the other hand, if the tutorial is confusing or contains mistakes, it will turn people away from trying bioperl (and probably be worse than not having one at all). So I have a request. I would appreciate it if some of you would read the tutorial and give me feedback in terms of clarity and accuracy. I am interested in both general comments (eg "this section is too long - cut out such-and-such" or "this module description fits better in this section" or "this module will not be included in the 0.7 release so don't include it" ) and specific places where there are errors or misleading or confusing statements. (If you think that the tutorial is clear and/or that specific parts are particularly helpful I'd of course be happy to get that feedback too :--). Suggestions on improving the formatting would also be appreciated. I would definitely like feedback from people who have written modules which are in the 0.7 release to make sure that I have captured your intent and the proper usage of your module(s). I would also like comments from folks who are simply bioperl users and, ideally, from a few people who haven't used bioperl much before to see in what ways the tutorial makes it easier to use or get started using bioperl (or doesn't). 
Feel free to write to me directly at schattner@alum.mit.edu or via this list. Thanks.

If you just want to look at the tutorial, you can view it through the web-browsable CVS at: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.1&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl. (Note: you may need to view the tutorial through a word processor to get the lines to wrap properly and to get rid of extra '^M's. If someone can tell me how I need to reformat the file so this is not necessary I'd be grateful.)

If you also want to run the tutorial script, you will need to have a copy of CVS "bioperl-live". The tutorial script will *not* work with release 0.6. (Note that the contents of bioperl-live are being updated often, so some of the demo scripts may fail. They're working for me now, and if they start failing I'd appreciate finding out.)

Cheers

Peter
From jason@chg.mc.duke.edu Mon Jan 8 21:10:26 2001 Date: Mon, 8 Jan 2001 16:10:26 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] ORF identification/prediction
To the best of my knowledge, we don't currently have bioperl modules that predict/identify (depending on your confidence in the software =) Open Reading Frames. Eric and I were thinking of working on a bioperl module for this. Any suggestions, known pitfalls, etc are welcomed.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/
From lapp@gnf.org Mon Jan 8 22:55:10 2001 Date: Mon, 08 Jan 2001 14:55:10 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] ORF identification/prediction
Jason Stajich wrote:
> 
> To the best of my knowledge, we don't currently have bioperl modules that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl module
> for this. Any suggestions, known pitfalls, etc are welcomed.
> 

There is the Bio::Tools::ESTScan module, which obviously relies on ESTScan as the ORF predicting external tool. If you plan to implement a full-fledged ORF prediction algorithm in perl, that module is not what you want. (BTW ESTScan consists of a driver layer in Perl; the core of the algorithm is written in C. One could try to integrate/rewrite the driver layer into/in Bioperl.)

Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
From fernan@iib.unsam.edu.ar Mon Jan 8 22:58:17 2001 Date: Mon, 8 Jan 2001 19:58:17 -0300 From: Fernan Aguero fernan@iib.unsam.edu.ar Subject: [Bioperl-l] ORF identification/prediction
Currently I am calling getorf (from the EMBOSS package) in my scripts to do this for me.

[fernan@iib4 fernan]$ getorf -h
   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outseq]            seqoutall  Output sequence(s) USA

   Optional qualifiers:
   -table              list       Code to use
   -minsize            integer    Minimum nucleotide size of ORF to report
   -find               list       This is a small menu of possible output
                                  options. The first four options are to
                                  select either the protein translation or
                                  the original nucleic acid sequence of the
                                  open reading frame. There are two possible
                                  definitions of an open reading frame: it
                                  can either be a region that is free of STOP
                                  codons or a region that begins with a START
                                  codon and ends with a STOP codon. The last
                                  three options are probably only of interest
                                  to people who wish to investigate the
                                  statistical properties of the regions
                                  around potential START or STOP codons. The
                                  last option assumes that ORF lengths are
                                  calculated between two STOP codons.

   Advanced qualifiers:
   -[no]methionine     bool       START codons at the beginning of protein
                                  products will usually code for Methionine,
                                  despite what the codon will code for when
                                  it is internal to a protein. This qualifier
                                  sets all such START codons to code for
                                  Methionine by default.
   -circular           bool       Is the sequence circular
   -[no]reverse        bool       Set this to be false if you do not wish to
                                  find ORFs in the reverse complement of the
                                  sequence.
   -flanking           integer    If you have chosen one of the options of
                                  the type of sequence to find that gives the
                                  flanking sequence around a STOP or START
                                  codon, this allows you to set the number of
                                  nucleotides either side of that codon to
                                  output. If the region of flanking
                                  nucleotides crosses the start or end of the
                                  sequence, no output is given for this codon.

What I find annoying about EMBOSS apps is that the -h (-help) option prints only limited information: unless the options are 'boolean' or 'integer', you don't know what to put there. You have to go to the EMBOSS web site to look for extended help!
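As a rough illustration of the second ORF definition in the getorf help (a region that begins with a START codon and ends with a STOP codon), here is a minimal Perl sketch. The subroutine name find_orfs, its hash-ref return format and the 30 nt default minimum are assumptions made for this example. It is not a bioperl API and not how getorf is implemented. It uses only the standard code (ATG start; TAA, TAG, TGA stops), reports the longest ORF from the first ATG after each stop on both strands, and ignores circular sequences, alternative start codons and the stop-to-stop definition.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl or EMBOSS: find ATG..STOP open
# reading frames on both strands of a DNA string, using the standard code.
# Returns hash refs with strand, frame (offset 0-2 on that strand), and
# 1-based forward-strand start/end coordinates (stop codon included).
sub find_orfs {
    my ($seq, $minsize) = @_;
    $minsize = 30 unless defined $minsize;   # arbitrary default, in nucleotides
    $seq = uc $seq;
    my $len = length $seq;

    my $rc = reverse $seq;                   # scalar context reverses the string
    $rc =~ tr/ACGT/TGCA/;                    # complement it
    my %strand_seq = ('+' => $seq, '-' => $rc);

    my @orfs;
    for my $strand ('+', '-') {
        my $s = $strand_seq{$strand};
        for my $frame (0 .. 2) {
            my $start;                       # first ATG seen since the last stop
            for (my $i = $frame; $i + 3 <= $len; $i += 3) {
                my $codon = substr($s, $i, 3);
                if (!defined $start) {
                    $start = $i if $codon eq 'ATG';
                }
                elsif ($codon eq 'TAA' || $codon eq 'TAG' || $codon eq 'TGA') {
                    my $end = $i + 3;        # offset just past the stop codon
                    if ($end - $start >= $minsize) {
                        # map reverse-strand offsets back to forward coordinates
                        my ($from, $to) = $strand eq '+'
                            ? ($start + 1, $end)
                            : ($len - $end + 1, $len - $start);
                        push @orfs, { strand => $strand, frame => $frame,
                                      start  => $from,   end   => $to };
                    }
                    undef $start;            # look for a new ATG after this stop
                }
            }
        }
    }
    return @orfs;
}

# Example: a single 15-nt ORF (ATG AAA TTT GGG TAA) in forward frame 2
for my $orf (find_orfs('CCATGAAATTTGGGTAACC', 15)) {
    printf "%s strand, frame %d: %d..%d\n", @{$orf}{qw(strand frame start end)};
}
```

The example prints "+ strand, frame 2: 3..17". Scanning codon by codon in each frame keeps the logic easy to check. A real module would also have to deal with IUPAC ambiguity codes and with ORFs that run off the end of the sequence, which this sketch silently drops.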
Hope this helps,
Fernan

--
Fernan Aguero, Bioinformatics, IIB-UNSAM
fernan@iib.unsam.edu.ar, ICQ 100325972

From nirav@public.arl.Arizona.EDU Mon Jan 8 23:27:11 2001 Date: Mon, 08 Jan 2001 16:27:11 -0700 (MST) From: nirav@public.arl.Arizona.EDU nirav@public.arl.Arizona.EDU Subject: [Bioperl-l] EMBOSS -h Was : ORF identification/prediction
Quoting Fernan Aguero <fernan@iib.unsam.edu.ar>:

>
> What i find annoying about EMBOSS apps is that the -h (-help) option
> prints limited information (unless the options are 'boolean' or
> 'integer' you don't know what to put there). You have to go to EMBOSS
> web site to look for extended help!
>

Use tfm <prog name> for detailed help in EMBOSS.

regards,
Nirav

From dblock@gene.pbi.nrc.ca Tue Jan 9 07:50:42 2001 Date: Tue, 9 Jan 2001 01:50:42 -0600 (CST) From: David Block dblock@gene.pbi.nrc.ca Subject: [Bioperl-l] [Poop-group] RELEASE: Alzabo 0.20 (fwd)
Just something to think about. Has anybody played with this? Would it work with BioPerl objects? Have we been reinventing a wheel here? Up late, thinking out loud.

--
David Block dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

---------- Forwarded message ----------
Date: Tue, 9 Jan 2001 00:18:32 -0600 (CST)
From: Dave Rolsky <autarch@urth.org>
To: poop-group@lists.sourceforge.net, poop-scoop@lists.sourceforge.net
Subject: [Poop-group] RELEASE: Alzabo 0.20 (fwd)

Alzabo is a data modelling tool and OO-RDBMS mapper written in Perl. This release includes a lot of changes, both internal and external. Users who have older schemas saved to disk will need the eg/convert.pl utility included with this release. Existing users should also make sure to note the deprecations and incompatibilities detailed at the bottom of the change list.

Among the most visible changes/updates are a fairly large amount of documentation revamping and support for Postgres.

Alzabo is available from either CPAN or http://www.sourceforge.net/projects/alzabo/

The Alzabo homepage is at http://alzabo.sourceforge.net/. The documentation can be read online at http://alzabo.sourceforge.net/docs/. This is a good place to start for those curious about what Alzabo does.

Changes
--------------
0.20

- Preliminary Postgres support. There is no support yet for constraints or foreign keys when reverse engineering or making SQL. There is also no support for large objects (I'm hoping that 7.1 will be released soon so I won't have to think about this). Otherwise, the support is about at the same level as MySQL support, though less mature.
- Added Alzabo::MethodMaker module. This can be used to auto-generate useful methods for your schema/table/row objects based on the properties of your objects themselves.
- Reworking/expanding/clarifying/editing of the docs.
- Add sort_by and limit options whenever creating a cursor.
- Method documentation POD from the Alzabo::* modules is merged into the relevant Alzabo::Create::* and Alzabo::Runtime::* modules during install. This should make it easier to find what you need since the average user will only need to look at a few modules in Alzabo::Runtime::*.
- Reworked exceptions so they are all now Alzabo::Exception::Something.
- Added default as a column attribute (thus there are now Alzabo::Column->default and Alzabo::Create::Column->set_default methods).
- Added length & precision attributes for columns. Both are set through the Alzabo::Create::Column->set_length method.
- This release includes a script in eg/ called convert.pl to convert older schemas.
- Alzabo::Schema->tables & Alzabo::Table->columns now take an optional list of tables/columns as an argument and return a list of matching objects.
- Added Alzabo::Column->has_attribute method.
- The data browser has actually lost some functionality (the filtering). Making this more powerful is a fairly low priority at the moment.
- Fix bugs where extra params passed to Alzabo::Runtime::Table->insert were not making it to the Alzabo::Runtime::Row->new method.
- Fix for Alzabo::Runtime::Table->set_prefetch method.
- Fixed bug in handling of deleted object in Alzabo::ObjectCacheIPC (they were never reported as deleted).
- Fix bug that caused schema to get bigger every time it was saved.
- Finally switched to regular hashes for objects.
- Added Alzabo::SQLMaker classes to handle generating SQL in a cross-platform compatible way.

DEPRECATIONS:

- Parameters for Alzabo::Create::Column->new: 'null' parameter is now 'nullable'. The use of the parameter 'null' is deprecated.
- Alzabo::Column->null & Alzabo::Column->set_null methods are now Alzabo::Column->nullable & Alzabo::Column->set_nullable. The old methods are deprecated.
- Alzabo::Create::ForeignKey->new no longer requires table_from & table_to params (it took me this long to realize I can get that from the column passed in. doh!)
INCOMPATIBILITIES:

- Alzabo::Runtime::Table->rows_where parameters have changed. The from parameter has been removed (use the Alzabo::Runtime::Schema->join method instead). The where parameter expects something different now.
- Alzabo::Runtime::Table->rows_by_where_clause method has been removed.
- Alzabo::Runtime::Schema->join method's where parameter expects something different.

/*==================
www.urth.org
We await the New Sun
==================*/

_______________________________________________
Poop-group mailing list
Poop-group@lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/poop-group

From heikki@ebi.ac.uk Tue Jan 9 09:29:23 2001 Date: Tue, 09 Jan 2001 09:29:23 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: make test
Ewan probably means that the Variation code should be part of the main bioperl CVS but should form a separate module after 0.7 is out. I do not think this is a good idea. I'd like to keep the Variation and LiveSeq namespaces within the main Bioperl distribution. There is the issue of Ensembl needing a copy of the Variation code, which would favour moving things over to a separate module, but that can be handled by other means, e.g. by copying the objects over temporarily.

-Heikki

Hilmar Lapp wrote:
>
> Ewan Birney wrote:
> >
> > On Mon, 8 Jan 2001, Joseph Insana wrote:
> >
> > > > (ie $obj->verbose(-1) turns off all warnings). This means you objects
> > > > have to inherit from Bio::Root::RootI.
> > >
> > > I don't want my objects to inherit from RootI.
> > > They are independent and I'd like to have them stay independent.
> >
> > This is cool (I completely understand). I think we should consider moving
> > the variation into its own cvs module, which means that Joseph and Heikki
> > are not tied to the bioperl release schedule etc.
> >
> > This is for post 0.7 branching in my view (Hilmar to make the call).
> >
>
> I'm not sure what you mean by post-0.7 branching. I agree that
> under these premises the Variation code should probably better go
> into into its own module, even though it's a pity.
>
> Hilmar

--
Heikki Lehvaslaiho heikki@ebi.ac.uk http://www.ebi.ac.uk/mutations/
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468

From birney@ebi.ac.uk Tue Jan 9 09:30:28 2001 Date: Tue, 9 Jan 2001 09:30:28 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: make test
On Tue, 9 Jan 2001, Heikki Lehvaslaiho wrote:

> Ewan propbly means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this a good idea. I'd like to keep Variation and LiveSeq
> namespaces within Bioperl main distribution.

I am cool with this as well. <grin>

From heikki@ebi.ac.uk Tue Jan 9 10:28:29 2001 Date: Tue, 09 Jan 2001 10:28:29 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Dear Peter,

Wonderful! Thank you very much for writing the tutorial. Before any of us goes into details, I thought it best to wrap the lines and remove the ^Ms for easier viewing. CVS is happier with short lines, too. This was easy enough to do in emacs.

Thanks again,

-Heikki

From heikki@ebi.ac.uk Tue Jan 9 10:45:56 2001 Date: Tue, 09 Jan 2001 10:45:56 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Peter,

Running any part of the script depends on the bioperl-ext package. Since I do not have it, I cannot run any of the demos. A workaround is needed.

-Heikki

odo ~/src/bioperl-live> perl -w bptutorial.pl 0
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please install the bioperl-ext package
odo ~/src/bioperl-live> perl -w bptutorial.pl 4
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please install the bioperl-ext package
odo ~/src/bioperl-live>

From heikki@ebi.ac.uk Tue Jan 9 10:50:49 2001 Date: Tue, 09 Jan 2001 10:50:49 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
P.S. The URL for the wrapped version of the text is:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.2&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl

With new versions coming in shortly, it is best to use:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?cvsroot=bioperl
and select the latest revision from there.

-Heikki

From birney@ebi.ac.uk Tue Jan 9 11:45:33 2001 Date: Tue, 9 Jan 2001 11:45:33 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] bptutorial
Many thanks to Peter for an excellent tutorial. It is well worth a read: I have spotted no obvious errors, but I will reread it more carefully.

The dependency problem can be solved with a conditional require and then run-time skipping of sections. I agree with Heikki that this would be a good thing. I will see what I can do here.

People may have noticed as well that Jason, Hilmar and I have been struggling through the refactoring of the main trunk towards 0.7. Much praise goes to Jason for doing the lion's share of the work. I have only one module failing for unexplained reasons. Today, on my transatlantic flight, I am planning to write the RichSeq-style interfaces.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Tue Jan 9 11:46:08 2001 Date: Tue, 9 Jan 2001 11:46:08 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] spoke too soon...
Just cvs update'd and ran the tests... SeqStats has disappeared. Is this deliberate?

From jason@chg.mc.duke.edu Tue Jan 9 15:38:59 2001 Date: Tue, 9 Jan 2001 10:38:59 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] spoke too soon...
It appears you might have... It is now in Bio::Tools::SeqStats. I have updated the test module to reflect this and split the tests into separate ok statements so we can tell which ones are failing. It appears some are, and I am not sure whether the error is in the tests or in the module.

31 helix ../bio/bioperl/bioperl-live> cvs log Bio/SeqStats.pm

RCS file: /home/repository/bioperl/bioperl-live/Bio/Attic/SeqStats.pm,v
Working file: Bio/SeqStats.pm
head: 1.3
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 3; selected revisions: 3
description:
----------------------------
revision 1.3
date: 2000/03/21 11:47:55; author: birney; state: dead; lines: +0 -0
moved SeqStats, added SeqWords
----------------------------
revision 1.2
date: 2000/03/01 15:36:42; author: birney; state: Exp; lines: +148 -156
Refactored RootI to get exception throwing cleanly out
Fixed minor issues in multifile.pm
Minor fix to IUPAC
added exception test
tidied up SeqStats.pm
----------------------------
revision 1.1
date: 2000/02/27 11:36:14; author: birney; state: Exp;
added multi_1 test and SeqStats
==========================================================================

From insana@ebi.ac.uk Tue Jan 9 19:17:46 2001 Date: Tue, 9 Jan 2001 19:17:46 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] make tests
> As I understand from your and Heikki's replies in your test you
> wanted the overriding thing to happen, be accepted (even though a
> warning was triggered), and the code be able to handle it.

Exactly.

> if you print a message before the test that a warning should be
> expected?

This is a nice proposal. But that is not such an important feature that it absolutely needs to be tested, to the point of forcing people to read a pre-warning message so as not to be confused by the warning itself...

So I just changed the code to test something closely related, i.e. checking that the code works, skipping only the check that the "override" of the two parameters takes effect (it should anyway).

Thank you.
Joseph Insana

From hlapp@gmx.net Tue Jan 9 19:28:46 2001 Date: Tue, 09 Jan 2001 11:28:46 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Heikki Lehvaslaiho wrote:
>
> Ewan propbly means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this a good idea. I'd like to keep Variation and LiveSeq
> namespaces within Bioperl main distribution.
>

Even better. I see I haven't understood the issue, so you guys thrash this out.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From ajm6q@virginia.edu Wed Jan 10 00:31:01 2001 Date: Tue, 9 Jan 2001 19:31:01 -0500 (EST) From: Aaron J Mackey ajm6q@virginia.edu Subject: [Bioperl-l] make tests
Why don't you trap the warning in an eval/$SIG{__WARN__}? I don't see why you can't test for proper warnings, if that's what you were trying to do.

-Aaron

From hlapp@gmx.net Wed Jan 10 09:55:32 2001 Date: Wed, 10 Jan 2001 01:55:32 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::SearchDist, Bio::Ext::Align
I thought that for completeness I would install the Bioperl XS modules in Bio::Ext::*, so I downloaded bioperl-ext-0.6.tar.gz, which is advertised as the latest version.

Installation went fine, but now the t/SearchDist.t tests get executed. This revealed a couple of bugs in Bio::SearchDist, some of them related to the RootI transition. Others consist of calls to functions that simply do not exist under that name in the extension module. I tried to fix them all, but now there is a complaint about a missing parameter in fit_EVD (it expects two but gets only one, hardcoded), which I don't know how to fix.

Does anyone currently use this module (and if so, why does it work for you)? Did I grab the wrong version?

Hilmar

From heikki@ebi.ac.uk Wed Jan 10 12:26:53 2001 Date: Wed, 10 Jan 2001 12:26:53 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
I noticed that it is not possible to use three-letter codes for amino acids in any bioperl sequence object. I think it should be possible, at least, to output in three-letter code. Mapping three-letter code back to one-letter code is not too hard either, but is it a good idea to have?

I propose to put a method 'seq3' into PrimarySeq.pm, which would be called from Seq.pm, too.

=head2 seq3

 Title   : seq3
 Usage   : $string = $obj->seq3()
 Function: Read only method that returns the amino acid sequence as a
           string of three letter codes. moltype has to be 'protein'.
           Output follows the IUPAC standard plus 'Ter' for terminator.
 Returns : A scalar
 Args    : character used for stop, optional, defaults to '*'
           character used for unknown, optional, defaults to 'X'

=cut

Any opinions?

-Heikki

From gert.thijs@esat.kuleuven.ac.be Wed Jan 10 15:35:48 2001 Date: Wed, 10 Jan 2001 16:35:48 +0100 From: gert thijs gert.thijs@esat.kuleuven.ac.be Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
I have been using Bio::SeqIO::genbank.pm quite frequently lately and I stumbled upon a small parsing problem. Sometimes there is no TITLE field defined in the REFERENCE, and this makes the parsing of the record fail such that no features are detected. To solve this problem I have added one extra check in Bio::SeqIO::genbank.pm at line 602:

    if (/^ AUTHORS\s+(.*)/) {
        $au .= $1;
        while ( defined($_ = $self->_readline) ) {
            /^ TITLE/   && last;
            /^ JOURNAL/ && last;   ### when no title is given ###
            /^\s+(.*)/  && do {
                $au .= $1;
                $au =~ s/\,(\S)/ $1/g;
                $au .= " ";
                next;
            };
        }
    }

--
+ Gert Thijs
+
+ email: gert.thijs@esat.kuleuven.ac.be
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+
+ K.U.Leuven
+ ESAT-SISTA
+ Kasteelpark Arenberg 10
+ B-3001 Leuven-Heverlee
+ Belgium
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70
From birney@ebi.ac.uk Wed Jan 10 13:35:46 2001 Date: Wed, 10 Jan 2001 13:35:46 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: Bio::SearchDist, Bio::Ext::Align
On Wed, 10 Jan 2001, Hilmar Lapp wrote:

> I thought for completeness I install the Bioperl XS modules in
> Bio::Ext::*, and downloaded bioperl-ext-0.6.tar.gz, which is
> advertised as the latest version.
>
> Installation went fine, but now the t/SearchDist.t tests get
> executed. This revealed a couple of bugs in Bio::SearchDist, some
> of which are related to the RootI transition. Others consist of
> calling functions which are simply not present by that name in the
> extension module. I tried to fix them all, but now there is a
> complaint about a missing parameter in fit_EVD (expects two, but
> gets only 1 hardcoded parameter), which I don't know how to fix.

This is my bug to fix. I will look at it. I don't think anyone has used SearchDist before, including me. Doh!

>
> Does anyone use this module currently (and if so, why does it work
> for you?)? Did I grab the wrong version?
>
> Hilmar

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------
From birney@ebi.ac.uk Wed Jan 10 13:46:01 2001 Date: Wed, 10 Jan 2001 13:46:01 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
On Wed, 10 Jan 2001, Heikki Lehvaslaiho wrote:

>
>
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
>
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
>
> =head2 seq3
>
>  Title   : seq3
>  Usage   : $string = $obj->seq3()
>  Function: Read only method that returns the amino acid sequence
>            as a string of three letter codes. moltype has to be
>            'protein'. Output follows the IUPAC standard plus
>            'Ter' for terminator.
>  Returns : A scalar
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
>
> =cut
>
> Any opinions?

Do you really want this? I guess so. There could be an argument for making a SeqUtils class and moving this sort of function in there, which would let us mess less with the objects/interfaces. It would be

    $seq3 = Bio::SeqUtils->seq3($seq);

>
> -Heikki
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
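Ewan's `Bio::SeqUtils->seq3($seq)` calling convention can be sketched in plain Perl as below. This is a minimal, hypothetical illustration, not Bioperl code: the package name `My::SeqUtils` is invented, the method takes a plain one-letter string rather than a sequence object, and the lookup table covers only the 20 standard residues plus the IUPAC ambiguity codes B (Asx) and Z (Glx). Following Heikki's specification, the optional second and third arguments name the one-letter characters used for stop and unknown, which are always written out as 'Ter' and 'Xaa'; the optional fourth argument is the 'join' string Hilmar suggests later in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::SeqUtils;    # hypothetical package name, not part of Bioperl

# IUPAC three-letter codes: the 20 standard amino acids plus B and Z.
my %THREE = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
    B => 'Asx', Z => 'Glx',
);

# Class method: My::SeqUtils->seq3($string, $stop, $unknown, $join)
sub seq3 {
    my ($class, $seq, $stop, $unknown, $join) = @_;
    $stop    = '*' unless defined $stop;       # one-letter stop character
    $unknown = 'X' unless defined $unknown;    # one-letter unknown character
    $join    = ''  unless defined $join;       # separator between codes

    my @codes;
    for my $aa (split //, $seq) {
        if    ($aa eq $stop)          { push @codes, 'Ter' }
        elsif ($aa eq $unknown)       { push @codes, 'Xaa' }
        elsif (exists $THREE{uc $aa}) { push @codes, $THREE{uc $aa} }
        else                          { push @codes, 'Xaa' }   # anything unrecognised
    }
    return join $join, @codes;
}

package main;

print My::SeqUtils->seq3('MKT*'), "\n";                 # MetLysThrTer
print My::SeqUtils->seq3('MK?', '*', '?', '-'), "\n";   # Met-Lys-Xaa

# Getting an array back from the joined string only needs a
# three-character match, as Hilmar points out:
my @triplets = (My::SeqUtils->seq3('MKT') =~ /(.{3})/g);   # ('Met', 'Lys', 'Thr')
print "@triplets\n";
```

Returning a string keeps the method consistent with `$seq->seq()`, while the join argument and the regex split cover the display and array use cases discussed in the thread.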
-----------------------------------------------------------------
From hlapp@gmx.net Wed Jan 10 19:10:51 2001 Date: Wed, 10 Jan 2001 11:10:51 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
Submitted to the Bioperl bug-tracker. (BTW, whenever you feel quite sure that your complaint addresses a bug, you can submit it directly to bioperl-bugs@bio.perl.org. If you don't feel sure, you can still do so. The bug-tracking system is the best way of keeping track of such things.)

Hilmar
From lorrie@oreilly.com Wed Jan 10 19:31:27 2001 Date: Wed, 10 Jan 2001 14:31:27 -0500 From: Lorrie LeJeune lorrie@oreilly.com Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
At 05:33 AM 1/9/2001 -0500, Peter Schattner wrote:
>I have committed an initial draft of an introductory bioperl tutorial
>(called "bptutorial.pl") to the bioperl-live (main) repository.

Peter (and fellow BioPerlers):

I think the tutorial is a great idea. BioPerl needs good documentation in a big way, and I promised Ewan at BOSC that I'd be willing to volunteer some time to the cause. So I'd be happy to sign on as your editor and help you get it ship-shape. I'm also a beginning Perl programmer, so I'm sure it'll help me learn more about both the language and BioPerl.

I'm in the process of finishing up O'Reilly's first bioinformatics book: Developing Bioinformatics Computer Skills. I'd like to put a pointer to the tutorial in the book, but the URL is way too long. D'ya think we might convince the webmaster to give it a shorter link that's suitable for publication?

Cheers,

--Lorrie

------------------------------------------------------
Lorrie LeJeune
Editor, Web Technologies and Bioinformatics
O'Reilly & Associates
90 Sherman Street, Cambridge, MA 02140
Tel: 617-499-7472; FAX: 617-661-1116
www.oreilly.com
------------------------------------------------------
From hlapp@gmx.net Wed Jan 10 19:35:09 2001 Date: Wed, 10 Jan 2001 11:35:09 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] three letter codes for amino acids?
Heikki Lehvaslaiho wrote:
>
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
>
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
>
> =head2 seq3
>
>  Title   : seq3
>  Usage   : $string = $obj->seq3()
>  Function: Read only method that returns the amino acid sequence
>            as a string of three letter codes. moltype has to be
>            'protein'. Output follows the IUPAC standard plus
>            'Ter' for terminator.
>  Returns : A scalar
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
>
> =cut
>
> Any opinions?
>

Considering sequence atoms as symbols seems the most natural concept to me. Having single letters representing each symbol makes symbol arrays and strings more or less equivalent in Perl. This might not hold for multi-letter representations, so in the first place I'd expect an array to be returned. However, this is inconsistent with $seq->seq(), and reportedly inefficient due to Perl's array implementation.

I know you could still split at every 3rd letter as a simple way to get an array. I'd nevertheless accept a third optional parameter denoting the 'join' character, with a default of ''.

Just my few thoughts.

Hilmar
From paul-christophe.varoutas@curie.fr Thu Jan 11 00:48:13 2001 Date: Thu, 11 Jan 2001 01:48:13 +0100 From: Paul-Christophe Varoutas paul-christophe.varoutas@curie.fr Subject: [Bioperl-l] Emerging from obscurity
Hi everybody,

I am writing because I would like to start contributing to the bioperl project. But first of all let me introduce myself: I am finishing my PhD at the Curie Institute in Paris, France. My subject is molecular genetics in yeast, and more particularly the study of the initiation of meiotic recombination.

Apart from molecular genetics, I have a rather strong background in algorithmics and programming, which I developed by studying alone and interacting a lot with people studying computer science. One of my favorite fields is OOP: I have read a lot of books on OOP design and have experience in designing projects using UML and implementing them in C++. I started using C++ in 1992 and ever since I have implemented lots of sexually attractive object classes, such as various types of neural networks (backpropagation nets, BAMs (bidirectional associative memories) and FAMs (like BAMs but integrating fuzzy logic)), various cryptographic algorithms, and a basic collection of bioinformatics objects (that was before I discovered bioperl ;-) ), which I then used to develop some small applications, the coolest one probably being a program doing ORF prediction using Fourier transforms.

As for Perl, I started learning it in 1995 (that was the year of the 5.001 release). Slowly but steadily it has become my favorite language. I use it extensively to do virtually *everything*, including solving my small everyday problems, such as doing file management, automating various internet activities (from more low-level operations using the IO::Socket:: modules to FTP/telnet sessions and web stuff), and automating the very few biocomputing needs I've had for my PhD project. I also use perl for CGI scripting.
I am fascinated by the power of regular expressions (I am reading "Mastering Regular Expressions" for the second time, and I am even more fascinated than the first time I read it; I'm still discovering astonishing details and realizing there is so much to learn about them), and I try to use them whenever/wherever I can B-) .

I discovered the bioperl project two years ago and I have been following the discussion groups with great interest for almost a year. Many times I wanted to just jump into the discussions, but I didn't because I knew I would have no time to deal with it on top of my other activities.

So, after this rather long introduction, here is the subject of my mail: like all of you, I want to participate in making bioperl better. As I mentioned above, I am finishing my PhD, so I don't have much time for the moment. But I will have finished the experimental part of my PhD by the end of January, so I will have some time to spare. I will probably spend large amounts of time in front of a Macintosh writing my article and thesis. I *hate* Macs (my favorite Mac software is telnet, for logging in to our unix servers or to my home PC), and participating in the bioperl project will keep me from going insane :-).

I was thinking about participating in the discussions about the OOP design of bioperl, or in the biocorba interoperability project, but for the moment I would prefer starting with something "smoother"; after all, I am not (yet) familiar with all the bioperl modules. So doing something that makes me more familiar with the whole set of bioperl modules should be a good start. I figured that I can help with some aspects of bioperl that can contribute to the growth of the bioperl community. So, here is what I propose to do:

- Help figure out bioperl 0.7 cross-platform compatibility with the MacOS platform.
  Almost all French labs use Macintosh computers, and our lab has a lot of Mac boxes with various types of processors and various versions of MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already pointed out that there are a lot of compatibility problems; their posts are going to be my starting point. I would like to make a list of all problems, figure out which ones can be solved reasonably easily, and make at least a subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is what most Mac people use, and most probably will continue using.

- Contribute to Shelly's effort for compatibility with the Windows NT/2000 platform.

- Participate in the documentation project of bioperl. I know that there are already people working on various aspects of the documentation, so I would like Ewan / Hilmar to tell me which you prefer: participate in one of the ongoing projects or initiate another project to do something that is missing.

I am very glad to contribute to the bioperl group; you are doing some exceptionally good work out there.

(For those who are reading this line, thank you for reading this far :-) ).

Paul-Christophe

--------------------------------------
Paul-Christophe Varoutas
Institut Curie - Section de Recherche - UMR144
Laboratoire de Genetique Moleculaire de la Recombinaison
26, rue d'Ulm
75248 Paris cedex 05
FRANCE
Tel: 01.42.34.66.36
Fax: 01.42.34.66.44
----------------------------------------
From jason@chg.mc.duke.edu Thu Jan 11 01:33:12 2001 Date: Wed, 10 Jan 2001 20:33:12 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Emerging from obscurity
Paul -

We are very happy to have you aboard the project and to add your skills to the team. I look forward to you getting acquainted with the modules and helping us in the design (and redesign) of many of the objects.

The tasks you outline below sound like a very good starting point and are sorely needed, as many of the developers are only plugged into UNIX on a regular basis. The documentation will be a good starting point too, but I strongly suggest you try to use the bioperl modules to solve a task you have in your lab. (I guess tackling Mac portability will give you this experience - but really try to use them to manipulate some of your yeast data.) This will give you the chance both to get used to the modules and to help write documentation for people who are new to bioperl. The developers who are familiar with the code too often skip over the important details when writing docs.

If you are unsure of how to use a module, feel free to use the list for questions; I know there are many more people who are looking for ways to get comfortable with the modules.

I'd also like to see us consider moving some of the documentation/tutorials to the wiki web site so that more people can contribute to it. Perhaps some 'scenario writing' which describes a problem and how bioperl was used to solve it.

Again, welcome aboard; we look forward to your contributions.

Jason

On Thu, 11 Jan 2001, Paul-Christophe Varoutas wrote:

[snip]
>
>
> - Help figuring out bioperl 0.7 cross-platform compatibility with the MacOS
> platform. Almost all french labs use macintosh computers, and our lab has a
> lot of mac boxes with various types of processors and various versions of
> MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already
> pointed out that there are a lot of compatibility problems, their posts are
> going to be my starting point.
> I would like to make a list of all problems,
> figure out which ones can be solved reasonably easily, and make at least a
> subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is
> what most Mac people use, and most probably will continue using.
>
> - Contribute to Shelly's effort for compatibility with the Windows NT/2000
> platform.
>
> - Participate in the documentation project of bioperl. I know that there
> are already people working on various aspects of the documentation, so I
> would like Ewan / Hilmar to tell me what you prefer: participate in one of
> the ongoing projects or initiate another project to do something that is
> missing.
>
> I am very glad to contribute to the bioperl group, you are doing some
> exceptionally good work out there.
>
> (For those who are reading this line, thank you for reaching so far :-) ).
>
>
> Paul-Christophe

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/
From schattner@alum.mit.edu Thu Jan 11 09:04:19 2001 Date: Thu, 11 Jan 2001 01:04:19 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
Lorrie LeJeune wrote:
>
> Peter Schattner wrote:
>
> >I have committed an initial draft of an introductory bioperl tutorial
> >(called "bptutorial.pl") to the bioperl-live (main) repository.

> So I'd be happy to sign on as your editor and help you
> get it ship-shape.

Thanks for your offer. I look forward to getting your feedback and recommendations regarding the tutorial.

Peter
From heikki@ebi.ac.uk Thu Jan 11 10:09:48 2001 Date: Thu, 11 Jan 2001 10:09:48 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
Dear Adrian,

I guess I was not too clear here. I'll post the reply to the list, as others might have misunderstood, too.

The translate method in PrimarySeqI defaults to '*' and 'X' for stop and unknown ("any") residues in its output, but there are arguments to the method that allow you to change them. As a result, the protein sequence object may store other characters for stop and unknown in its one-letter code. The same arguments are needed in the seq3 method so that the corresponding three-letter codes are always 'Ter' and 'Xaa' (IUPAC standard).

-Heikki

Adrian Goldman wrote:
>
> Heikki,
>
> I am not very good at listserv etiquette. Anyway, here is my 2c.. if you want to post it further on to the list server, it's OK by me. Or else you can just ignore what follows as my own personal opinion.
>
> I don't think it makes much sense to use * as the default character for stop in 3-letter codes, nor X as the default for unknown, for the optional arguments you mention below. Ter (as you propose) for the termination codon and ?XXX for unknown make more sense to me.
>
> Adrian
>
> At 12:03 pm -0500 10/1/2001, bioperl-l-request@bioperl.org wrote:
>
> [snip]
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
> [snip]
>
> --
>
> Professor Adrian Goldman,  | Phone: 358-(0)9-191 58923
> Structural Biology Group,  | FAX: 358-(0)9-191 58952
> Institute of Biotechnology | Sec: 358-(0)9-191 58921
> University of Helsinki,    | Mobile: 358-(0)50-336 8960
> PL 56                      | Home: 358-(0)9-728 7103
> 00014 Helsinki             | email: Adrian.Goldman@Helsinki.fi
>
> -- on sabbatical at Brookhaven National labs, June 2000-June 2001
> Adrian Goldman, Biology Department, Building 463 50 Bell Ave., Brookhaven National Lab., Upton NY 11973. Phone: 631-344-2671 (off) 631-344-3417 (lab), 631-344-3407 (FAX). email: agoldman@bnl.gov
From heikki@ebi.ac.uk Thu Jan 11 12:00:26 2001 Date: Thu, 11 Jan 2001 12:00:26 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
> Considering sequence atoms as symbols seems the most natural
> concept to me. Having single letters representing each symbol
> makes symbol arrays and strings more or less equivalent in Perl.
> This might not hold for multi-letter representations, so in the
> first place I'd expect an array to be returned. However, this is
> inconsistent with $seq->seq(), and reportedly inefficient due to
> Perl's array implementation.
>
> I know you could still split at every 3rd letter as a simple way
> to get an array. I'd nevertheless accept a third optional
> parameter denoting the 'join' character, with a default of ''.

Can be done.

In my mind the main use of this function is in displaying translations on top of nucleotide sequences. Gaps inside codons are clearer with the three-letter coding.

-Heikki
From dalke@acm.org Thu Jan 11 12:32:21 2001 Date: Thu, 11 Jan 2001 05:32:21 -0700 From: Andrew Dalke dalke@acm.org Subject: [Bioperl-l] looking for datafile parsers
Hello,

I'm working on a parser generator as part of the Biopython development. It's getting towards completion, which means it's time to start writing papers about it. :) Indeed, my paper was accepted for a talk at the upcoming Python conference. One of the reviewers wanted more information comparing my work to others in the field, so I've been digging up related projects. I figure on writing another paper for Bioinformatics which will include some more of this information.

The most similar program is SRS, which is also a parser generator, although SRS parsers are context-free while my parser is (mostly) regular. I tried to get a copy of the reference paper (from Meth. Enzy.) from the library but it was checked out. I would love it if someone would offer to answer a few questions for me about it, and to run some benchmarks to see how fast it parses swissprot38, say, as compared to how long it takes the bioperl code to parse the same file. Any takers?

There are a few projects which allow users to specify a format using a configuration description, which can roughly be classified as a regular-expression pattern matcher sitting on top of a line-type recognizer. This includes Biopy and BioDB-Loader as well as the current Biopython parser. Another class of projects uses a common data structure, then implements readers/writers to the different formats at the expense of throwing away some data, such as bioperl and SeqIO. Swissknife is an example of a library which reads/writes from a single format into a data format tailored specifically to that format. A few are special-case programs (grep, NiceProt, sp2fasta) which do one and only one thing, although in the case of sw2xml that one thing converts the format (SWISS-PROT) to another format (XML) for which many tools are readily available. Most of the packages throw away formatting information and only store the physical data, although get-sprot-entry is a nice example of why keeping presentation information is useful.
The program creates an HTML page which looks the same as the original format except that various fields are marked up with hyperlinks.

Finally, the project I've been working on, Martel, lets you develop parsers which handle most, if not all, of these cases.

I want to make sure I covered everything, so I've been searching for SWISS-PROT parsers as my prototypical example. A description of what I found is below. If something major is missing, please tell me. If you can provide assistance with the SRS, GCG, Java or Lisp parts, also please tell me.

Here's a key to some of the notation I use in the listings below:

  count    == count the number of records in a database
  offset   == generate offsets into the file for fast indexing
  fasta    == extract data for FASTA (ID, AC and SQ fields)
  generic  == extract generic sequence data, usually as a data structure
              containing fields common to multiple formats but ignoring
              some SWISS-PROT specific fields
  all      == extract all fields
  validate == validate that a record is in the correct format
  markup   == identifies fields and saves the layout data so as to allow
              HTML markup without otherwise changing the format
              (timings not given for markup since it will depend on the
              specific markup requested, and because only Martel and
              get-sprot-entry preserve markup)

Performance is measured against the 80,000 records of swissprot38.

grep - http://www.gnu.org/gnulist/production/grep.html
  written in C
  count (when used as "grep ^ID | wc")
    takes 0m:57s to parse sprot38
  offset (when used as "grep -b ^ID")
  cannot be used for fasta, generic, all, validate, markup

one really large regular expression (here as a bit of humor)
  written in C
  cannot be used for count, offset, fasta, generic, all, markup
  can be used for validate in theory, but I haven't tested it

bioperl - http://www.bioperl.org/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    takes 30m:13s to parse sprot38
  cannot be used for index (?), all, validate, markup
biopython - http://www.biopython.org/
  written in Python
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    takes 28m:55s to parse sprot38
  validate
  cannot be used for index(?), markup

biojava - http://www.biojava.org/
  written in Java
  unknown (have source but need to figure it out)
  performance unknown (don't know how to code in Java)

Martel - http://www.biopython.org/~dalke/Martel/
  written in Python with a C extension
  count
    RecordReader.StartsWith "ID" takes 1m28s to parse sprot38
  index
  fasta (standard format def. but only using the ID and SQ tags)
    takes 9m:23s to parse sprot38
  generic (as a special case of all)
  all
    takes 23m:29s to parse sprot38
  validate
    with no callbacks takes 6m:41s
  markup

SRS - http://www.lionbio.co.uk/
  written in C (?)
  have never used it, but it can definitely do count, fasta, generic and all.
  The standard swissprot format definition
    http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-id+01FXMii+-lib+SWISSPROT
  cannot be used to validate, although SRS itself can.
  I think SRS can be used to generate HTML markup but I can't begin to
  guess how that might be done.
    *** I really want to ask someone questions about SRS ***
    *** Any takers? ***
  I don't think it can be used to create your own indices - you must
  use its offset tables.

swissknife - ftp://ftp.ebi.ac.uk/pub/software/swissprot/
  written in Perl
  count
    lazy reader takes 1m:48s to parse sprot38
  fasta (getting the ->ID and ->SQ attributes)
    takes 8m:47s to parse sprot38
  generic (as a special case of all)
  all
    takes 38m:21s to parse sprot38
  cannot be used to validate, markup

Biopy - http://shag.embl-heidelberg.de:8000/Biopy/
  written in Python
  count (as a special case of all)
  index (by "position += length($_)")
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields (it only
    identifies lines) so I actually wouldn't count this as a full parser.
    * takes roughly 25m to parse
  cannot be used to validate, markup

Darwin - http://cbrg.inf.ethz.ch/Darwinshome.html
  is its own language and set of libraries
  contains a converter from SWISS-PROT to its own format.
  I don't have access to the source code, so the following is based on
  the example parser at
    http://www.inf.ethz.ch/personal/hallett/drive/node92.html
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields,
    although the real implementation may contain all of that.
  given example cannot be used to index, validate, markup
  (Why does http://www.inf.ethz.ch/personal/hallett/drive/drive.html say
  that SWISS-PROT 38 has only 77,977 records when my copy has exactly
  80,000?)

SeqIO - http://www.cs.ucdavis.edu/~gusfield/seqio.tar.gz
  written in C
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used to index, all, validate, markup

readseq (C) - http://iubio.bio.indiana.edu/soft/molbio/readseq/version1/readseq.shar
  written in C
  doesn't have swissprot, and I need to test whether embl works instead
  to be tested

readseq (Java) - http://iubio.bio.indiana.edu/soft/molbio/readseq/java/readseq-source.zip
  written in Java
  have not yet explored (see above where I need help on how to write a
  good test program in Java.)

Boulder - http://stein.cshl.org/software/boulder/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used for index, all, validate, markup

molbio++ - ftp://ftp.ebi.ac.uk/pub/software/unix/molbio.tar.Z
  written in (now obsolete) C++ which doesn't compile
  I think it can be classified as
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get at subfields
    of a data line
    * will not be benchmarking since I don't want to spend the effort to
      get it to compile.
  cannot be used for index, all, validate, markup

BioDB-Loader - http://www.franz.com/services/conferences_seminars/ismb2000/biodb1.tar.Z
  written in Common Lisp (Help! I know even less Lisp than Java!)
  I'm guessing it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get at the
    subfields of a data line
    * have not benchmarked, although I have downloaded the Allegro
      Common Lisp demo version.
  cannot be used for all, validate, markup

GCG - http://www.gcg.com/products/wis-package.html
  written in C (?)
  never used it. Betting it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic
    have not benchmarked since I'm not spending that much money just to
    test the performance.
  cannot be used for all, validate, markup

sp2fasta - part of ftp://ftp.ncbi.nlm.nih.gov/toolkit/ ?
  Can't seem to find it in the current distribution. Various web pages
  imply it is a C program to convert SWISS-PROT/EMBL to FASTA.
  count (if used together with grep and wc)
  fasta
    have not benchmarked since I cannot find the code
  cannot be used for index, generic, all, validate, markup

sw2xml - http://www.vsms.nottingham.ac.uk/biodom/software/protsuite-user-dist/sw2xml-protbot.pl
  written in Perl. It is a translation program from SWISS-PROT to XML,
  so some additional, though minor, XML coding is needed to do the
  following.
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    have not yet benchmarked
  cannot be used to index, validate, markup (because of the 'tidy')

NiceProt - used at ExPASy
  implementation information not available
  only used to parse a single record
  parses the data file but doesn't build a data structure (?), so
  creation of fasta, generic and all requires some modifications.
cannot be used to count, index, validate(?), markup get-sprot-entry - used at ExPASy implementation not available can be used to markup a record (eg, see http://expasy.cbr.nrc.ca/cgi-bin/get-sprot-entry?P52930 ) doesn't build data structures or convert to another format so it cannot be used for anything else (true?) Whew! I'ld be surprised if I really did miss some other major style of parsing. Actually, I did - there are no lex/yacc grammers for SWISS-PROT but I'm not surprised because the lexing is strongly position dependent which calls for tight, explicit, tricky communications with the parser. Any other suggestions? Sincerely, Andrew Dalke dalke@acm.orgFrom simon.brocklehurst@CambridgeAntibody.com Thu Jan 11 13:59:22 2001 Date: Thu, 11 Jan 2001 13:59:22 +0000 From: Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com Subject: [Bioperl-l] Re: [Biojava-l] looking for datafile parsers
Hi Andrew,

You might be interested to know that CAT has contributed to biojava a
SAX2-compliant, event-based parsing framework for dealing with
bioinformatics data files.

Essentially, by using a SAX2 model, the framework allows users to build
arbitrary XML Content Handlers for dealing with data from bioinformatics
files in arbitrary ways. The framework generates SAX2 events from
bioinformatics format files, i.e. the input data isn't XML, nor is it
converted into XML internally.

It's a reasonable implementation of SAX2, e.g. users can:

o Set properties on SAX parsers, e.g. configuration of various features,
  namespace reporting etc.
o Handle infinitely large files, because it works like a SAXParser should,
  i.e. it doesn't keep the whole file in memory.
o Deal with InputSources, i.e. essentially various flavours of streams.

A couple of neat benefits of an implementation of SAX2:

o It's trivial to create XML versions of files, with which you can do
  whatever you want, e.g. using XSLT.
o By stringing together biojava SAXParsers, which are non-validating, with
  validating SAXParsers from e.g. IBM, you can create parsers that validate
  against DTDs and/or XML Schemas that we produce for the data formats
  supported by the framework. Because the bioinformatics data is modelled
  in a strongly typed way by the framework, you can get genuinely useful
  benefits from validation.

We haven't put SwissProt support into this framework as of yet - biojava
already had ways of handling SwissProt data before we put the SAX2
framework in. Currently we have OK support in there for NCBI Blast and
WU-Blast, and improving support for HMMER and PDB data.

Hope this info is useful...

Simon
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com

From ajm6q@virginia.edu Thu Jan 11 13:52:32 2001
Date: Thu, 11 Jan 2001 08:52:32 -0500 (EST)
From: Aaron J Mackey ajm6q@virginia.edu
Subject: [Bioperl-l] looking for datafile parsers
On Thu, 11 Jan 2001, Andrew Dalke wrote:

> Finally, the project I've been working on, Martel,
> lets you develop parsers which handle most, if not all, of
> these cases.

Excellent, I look forward to seeing your work. Parsing is the meat and
potatoes of bioinformatics, and it's beginning to taste very stale (I
dunno, maybe it's been stale for a while now). My own secret wish list is
focused more on result file parsing; I once spent a fair amount of time
building a "robust" FASTA result file parser, but found myself constantly
needing to tweak it to keep up with fasta development changes. You don't
have that problem with SwissProt or other static file formats.

> grep - http://www.gnu.org/gnulist/production/grep.html
>   written in C
>   count (when used as "grep ^ID | wc")
>     takes 0m:57s to parse sprot38
>   offset (when used as "grep -b ^ID")
>   cannot be used for fasta, generic, all, validate, markup

I've actually found that I now use grep and a small mix of perl more than
any other parsing routine (mainly because of the predicament I mention
above: when a format changes, I have to fix the entire parser, even if I
just want to pull out a few relevant fields at the moment). My result
file "parsers" often take a few 'grep swipes' at the file (since the
second grep on the same file is commonly much faster than the first),
and as you show, it's very fast to begin with.

The one extension to grep that I'd dearly like to see (perhaps I'll
submit a patch) would be to extend the -A and -B (after-context and
before-context) flags to take regexps themselves (i.e. instead of
printing N lines after the first match, continue printing until the
second regexp is matched, or other possibilities depending on specified
flags). Then you could start using (multiple) greps to get 'fasta',
'generic', 'all' satisfied.

Use the shell, Luke.

-Aaron

--
 o ~ ~ ~ ~ ~ ~ o
/ Aaron J Mackey \
\ Dr. Pearson Laboratory /
\ University of Virginia \
/ (804) 924-2821 \
\ amackey@virginia.edu /
 o ~ ~ ~ ~ ~ ~ o

From insana@ebi.ac.uk Thu Jan 11 18:15:04 2001
Date: Thu, 11 Jan 2001 18:15:04 +0000 (GMT)
From: Joseph Insana insana@ebi.ac.uk
Subject: [Bioperl-l] make tests
> Why don't you trap the warning in an eval/$SIG{__WARN__} - I don't see why
> you can't test for proper warnings, if that's what you were trying to do.

I didn't know that. Now that I understand what you meant, and have read in
the manual how to apply it, I see it's the perfect solution.

Thank you very much,
Joseph

From birney@ebi.ac.uk Fri Jan 12 22:10:36 2001
Date: Fri, 12 Jan 2001 22:10:36 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] RootI detachment proposal.
[Ewan recovers from rereading the Bio::Root:: stuff...]

This is *mainly* for Jason and Hilmar, but in case there are other
people who want to chip in:

I want to completely detach RootI from the other Root::Objects (in
particular Err). This means a heavy refactoring of RootI - mainly in
removing the code.

I will keep ->throw and ->warn but not ->verbose as a real method. (jason
- do you mind this?) (I will have a "deprecation warning" on verbose)

I am planning to do this on my local copy now and see how it pans out...

Bio::Root::Object in its full glory will still be there for modules we
have not migrated to Bio::Root::RootI

thoughts anyone?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From lapp@gnf.org Fri Jan 12 22:32:11 2001
Date: Fri, 12 Jan 2001 14:32:11 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] RootI detachment proposal.
Ewan Birney wrote:
>
> [Ewan recovers from rereading the Bio::Root:: stuff...]
>
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
>
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in
> removing the code.
>
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)
>
> I am planning to do this on my local copy now and see how it pans out...
>
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
>
> thoughts anyone?
>

verbose() is being made use of heavily as far as I saw some code and code
migrations from Jason. I do think that it is beneficial and desirable to
have a central mechanism for regulating 'verbosity' (e.g., what happens
upon a warning being issued). I also don't see yet why having verbose() in
RootI hampers disentangling RootI from the other objects, or where this
should interfere. (People who don't want that feature simply override it
with a stub.)

Maybe I'm missing something. Ideally I don't have to come up with a
SeqIO-specific mechanism concerning client-side regulation of the severity
of warnings.

    Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Fri Jan 12 22:51:21 2001
Date: Fri, 12 Jan 2001 22:51:21 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] RootI detachment proposal.
On Fri, 12 Jan 2001, Hilmar Lapp wrote:

>
> verbose() is being made use of heavily as far as I saw some code and code
> migrations from Jason. I do think that it is beneficial and desirable to
> have a central mechanism for regulating 'verbosity' (e.g., what happens
> upon a warning being issued). I also don't see yet why having verbose() in
> RootI hampers disentangling RootI from the other objects, or where this
> should interfere. (People who don't want that feature simply override it
> with a stub.)
>
> Maybe I'm missing something. Ideally I don't have to come up with a
> SeqIO-specific mechanism concerning client-side regulation of the severity
> of warnings.

Yeah. I know. I guess I am thinking with my C-extension hat on again.

Ok. verbose stays.

>
> Hilmar
>
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Fri Jan 12 23:43:20 2001
Date: Fri, 12 Jan 2001 18:43:20 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] RootI detachment proposal.
On Fri, 12 Jan 2001, Ewan Birney wrote:

>
> [Ewan recovers from rereading the Bio::Root:: stuff...]
>
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
>
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in
> removing the code.
>
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)

well, actually verbose makes me happy because we can choose whether or not
warn will actually print out msgs. Can it just be a get/set method and
warn can check to see if verbose > 0 before printing? I like to use it as
a debugging flag as well so we can have object-specific debugging flags.

>
> I am planning to do this on my local copy now and see how it pans out...
>
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
>
> thoughts anyone?
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From birney@ebi.ac.uk Sat Jan 13 01:18:34 2001
Date: Sat, 13 Jan 2001 01:18:34 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] refactoring RootI
I have finished a very serious refactoring of RootI. This detaches
RootI from the other Root:: objects completely. verbose I think it handled
nicer. I would venture to say that the code is more readable.

I have changed the formatting somewhat of the stack trace in the
throw/warn statements. Your milage may vary here...

Jason, Hilmar - check it out and tell me what you think.

I am now a little exhausted although the final product I think is vastly
improved...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From heikki@ebi.ac.uk Sat Jan 13 16:56:10 2001
Date: Sat, 13 Jan 2001 16:56:10 +0000
From: Heikki Lehvaslaiho heikki@ebi.ac.uk
Subject: [Bioperl-l] RootI detachment proposal.
Jason Stajich wrote:
>
> On Fri, 12 Jan 2001, Ewan Birney wrote:
>
> >
> > [Ewan recovers from rereading the Bio::Root:: stuff...]
> >
> > This is *mainly* for Jason and Hilmar, but in case there are other
> > people who want to chip in:
> >
> > I want to completely detach RootI from the other Root::Objects (in
> > particular Err). This means a heavy refactoring of RootI - mainly in
> > removing the code.
> >
> > I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> > - do you mind this?) (I will have a "deprecation warning" on verbose)
>
> well, actually verbose makes me happy because we can choose whether or not
> warn will actually print out msgs. Can it just be a get/set method and
> warn can check to see if verbose > 0 before printing? I like to use it as
> a debugging flag as well so we can have object specific debugging flags.

I'd like to use the verbose function, but the RootI documentation is a bit
hard to read at the moment. I have not followed the discussion about the
RootI object too closely, but once this restructuring is done, it would be
great to have a few clear examples of what RootI can do and what the
options are.

For example, I was pleasantly surprised that I could ignore the
constructor method for a simple class which inherits from
Bio::Root::RootI. I was not sure if it worked before trying.

    -Heikki
--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho      heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki@ebi.ac.uk Sat Jan 13 17:38:16 2001
Date: Sat, 13 Jan 2001 17:38:16 +0000
From: Heikki Lehvaslaiho heikki@ebi.ac.uk
Subject: [Bioperl-l] three letter codes for amino acids?
I just committed the first version(s) of Bio::SeqUtils. Add to it any
method you'd like Bio::PrimarySeqI-compliant objects to have.

I put two methods in it: ->seq3 and ->seq3in. seq3in (since we now do not
have to worry about messing with interfaces) translates three-letter amino
acid codes into one-letter codes and stores the result in the current
sequence object. It throws an exception when it sees a code it does not
know, although it probably should only warn and let -verbosity decide
what to do.

As an extra feature, both methods know about selenocysteine (Sel, U).

Have fun,

    -Heikki
--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho      heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From schattner@alum.mit.edu Sat Jan 13 21:41:42 2001
Date: Sat, 13 Jan 2001 13:41:42 -0800
From: Peter Schattner schattner@alum.mit.edu
Subject: [Bioperl-l] Molecular weight calculations
I've recently been revisiting the DNA & protein molecular weight
calculations in SeqStats.pm and realize I have a few related questions I
would like to pose to the more biochemically oriented folks on the list.

In nucleic acid weight calculations:

1. Should SeqStats use the charged or the neutral molecular weight of the
sugar-phosphate backbone? Given that these groups are charged at
physiological pH, it seems reasonable to me - and to the one biochemist
with whom I spoke - to use the charged values. However, at least one
commercial package (VectorNTI) uses neutral weights, so I am unsure.
(The difference is ~0.5% - 1%.)

2. For the initial (5') and final (3') sugar phosphate, should SeqStats
add an extra OH and an extra H respectively? Again, adding the weight of
the additional water seems reasonable to me, but it is not always how the
weight calculation is performed. (The difference here is 18 Da, which is
negligible except when computing molecular weights of very short oligos.)

In protein weight calculations:

3. Should SeqStats use the charged or the neutral molecular weights of the
acidic and basic amino acid residues (e.g. aspartate, glutamate,
histidine, arginine, lysine) in its computations? Given that these amino
acids are charged at physiological pH, it seems reasonable to use charged
values. However, again, VectorNTI uses neutral weights, so I am unsure.
(The difference is ~0.5% - 1% times the fraction of amino acids in the
protein which are acidic or basic.)

Although the difference in calculated weights is small, my understanding
is that with mass spectrometry becoming increasingly important for
protein and nucleic acid analysis, having more precise molecular weights
might be useful (but if that's not really true, I'd like to know that
too). It's easy enough to implement the calculation in any of these ways;
I just want to do it in the way that seems most useful.

Thanks for the help.
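For concreteness, here is a minimal Perl sketch of the protein half of the calculation, under stated assumptions: it sums standard average residue masses for the neutral (uncharged) forms, i.e. the neutral option in question 3, and then adds one water (about 18.02 Da) for the terminal H and OH, the protein analogue of question 2. The `protein_mw` function and its optional `$add_termini` flag are hypothetical illustrations, not the SeqStats.pm interface; using charged forms would only mean swapping in a different mass table.

```perl
use strict;
use warnings;

# Assumed values: standard average masses (Da) of amino acid *residues*
# (free amino acid minus one water), neutral forms. Not taken from SeqStats.pm.
my %RESIDUE_MASS = (
    G =>  57.0519, A =>  71.0788, S =>  87.0782, P =>  97.1167,
    V =>  99.1326, T => 101.1051, C => 103.1388, L => 113.1594,
    I => 113.1594, N => 114.1038, D => 115.0886, Q => 128.1307,
    K => 128.1741, E => 129.1155, M => 131.1926, H => 137.1411,
    F => 147.1766, R => 156.1875, Y => 163.1760, W => 186.2132,
);

# One water: the extra H on the N-terminus plus the extra OH on the C-terminus.
my $WATER = 18.01528;

# Hypothetical helper: molecular weight of a one-letter protein sequence.
# $add_termini defaults to true; pass 0 to omit the terminal water.
sub protein_mw {
    my ($seq, $add_termini) = @_;
    $add_termini = 1 unless defined $add_termini;

    my $mw = 0;
    for my $aa (split //, uc $seq) {
        die "unknown residue '$aa'\n" unless exists $RESIDUE_MASS{$aa};
        $mw += $RESIDUE_MASS{$aa};
    }
    $mw += $WATER if $add_termini;
    return $mw;
}

printf "%.2f\n", protein_mw('GG');       # 132.12 (Gly-Gly, with terminal water)
printf "%.2f\n", protein_mw('GG', 0);    # 114.10 (residues only)
```

Dropping the water changes Gly-Gly from 132.12 to 114.10 Da, a large relative shift for a dipeptide but negligible for a full-length protein, which matches the point that the terminal correction mainly matters for very short sequences.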
Peter

(The only downside of all this is that my revisiting of these
calculations was triggered by Keith James discovering a bug in the
molecular weight calculations in the current (0.6) version of
SeqStats.pm which causes it to return inaccurate values :--(. Everything
is fixed for the - hopefully soon - 0.7 release, but in the meantime the
molecular weight routines of SeqStats should be avoided. The other
methods of SeqStats.pm are fine.)

From birney@ebi.ac.uk Sun Jan 14 12:39:36 2001
Date: Sun, 14 Jan 2001 12:39:36 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] Re: pSW problem
On Sat, 13 Jan 2001, Peter Schattner wrote:

> Hi Ewan
>
> I just noticed that the demo of pSW in bptutorial.pl no longer works on
> my machine.
> Nor does examples/pSW.pl. In either case I get an error message like
> that shown below. I can't
> tell what's going on. Any ideas what may have changed?
>

I will track this down. I spotted this as well ;)

> Peter
>
>
> [peter@pschattner examples]$ perl -w psw.pl
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Warning Error
> Passed in NULL objects into Align_Sequences_ProteinSmithWaterman!
>
> -------------------- EXCEPTION --------------------
> MSG: Unable to build an alignment
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: psw.pl
> STACK:
> Bio::Tools::pSW::align_and_show(299)
> main::psw.pl(89)
> ---------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Sun Jan 14 12:51:17 2001
Date: Sun, 14 Jan 2001 12:51:17 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] updated
A couple of days ago I updated the task list for 0.7

  http://bio.perl.org/wiki/html/BioPerl/TaskList.html

which is getting much more "green". Hilmar - I think we should drop some
of the more unlikely things to make it into 0.7 (NetIO class for
example?) and concentrate on the last important features ...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From birney@ebi.ac.uk Sun Jan 14 12:52:27 2001
Date: Sun, 14 Jan 2001 12:52:27 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [Bioperl-l] RootI detachment proposal.
On Sat, 13 Jan 2001, Heikki Lehvaslaiho wrote:

> I'd like to use the verbose function, but the RootI documentation is a bit
> hard to read at the moment. I have not followed the discussion about the
> RootI object too closely, but once this restructuring is done, it would be
> great to have a few clear examples of what RootI can do and what the
> options are.

Have you cvs updated recently? I think RootI is looking in much better
shape at the moment...

>
> For example, I was pleasantly surprised that I could ignore the
> constructor method for a simple class which inherits from
> Bio::Root::RootI. I was not sure if it worked before trying.
>
> -Heikki

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 16 19:02:47 2001
Date: Tue, 16 Jan 2001 11:02:47 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] refactoring RootI
Ewan Birney wrote:
>
> I have finished a very serious refactoring of RootI. This detaches
> RootI from the other Root:: objects completely. verbose I think it handled
> nicer. I would venture to say that the code is more readable.
>
> I have changed the formatting somewhat of the stack trace in the
> throw/warn statements. Your milage may vary here...
>
> Jason, Hilmar - check it out and tell me what you think.
>
> I am now a little exhausted although the final product I think is vastly
> improved...
>

Well, that was radical surgery :) Even though SteveC won't be excited
about it, it looks like we now have a relatively clear and straightforward
code base there. It also seems that Err.pm is now superfluous, so we may
want to deprecate it.

We should also build a test for $obj->throw() to check that it really
prints a meaningful stack trace. In addition, there should be a test
demonstrating that $obj->verbose(2) really turns warn() into throw().

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 16 19:53:58 2001
Date: Tue, 16 Jan 2001 11:53:58 -0800
From: Hilmar Lapp hlapp@gmx.net
Subject: [Bioperl-l] Status 0.7
Ewan Birney wrote:
>
> A couple of days ago I updated the task list for 0.7
>
>   http://bio.perl.org/wiki/html/BioPerl/TaskList.html
>
> which is getting much more "green". Hilmar - I think we should drop some
> of the more unlikely things to make it into 0.7 (NetIO class for
> example?) and concentrate on the last important features ...
>

I think we should stick to our goal of finalizing the 0.7 release by the
end of January. The situation actually doesn't look bad. As I see it, the
major things remaining to be addressed are the following.

1) Fuzzy locations coverage. This is probably the most significant
   hurdle. Jason's already elaborating an interface outline. If anyone
   has suggestions/views/experience, feel encouraged to post. You may also
   want to check out Ewan's proposal
   (http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html).

2) With the preceding being addressed, a review of SeqFeatureI and
   BioCorba interoperability may go hand in hand. Jason, Brad, is BioCorba
   0.2 interoperability still within sight?

3) BPlite update. Lorenz seems to have abandoned the list, or is too busy
   with other things. It's priority 2, but I think that while we are
   phasing out support for Blast.pm we need to increase support for
   BPlite. Anyone out there who would volunteer to assume responsibility?

4) SeqAnalysisParserI needs more elaboration, according to a discussion
   we (Jason, Ewan, I) had in December. It'll probably be the three of us
   who thrash this out.

5) Bio::SeqFeature::Transcript object. This will be related to
   GeneStructure, and the concept has been worked out between Ewan and
   myself. Still, I'll need to put it into Perl code :)

6) Bugs reported on Incoming. (!) (These tend to be forgotten, but I'm
   sure they won't be fixed in a matter of minutes.)

7) The rest, I think (I hope :), is smaller fixups, some of which I need
   to address myself.
We'll probably have to drop Root::StreamIO (priority 3), and probably also
fixing Blast.pm bugs, unless SteveC finds the time to fix them. It seems
that almost all priority 2 tasks will make it into 0.7, BioCorba 0.2 being
the only one not started yet.

Since more or less all of us can do BioPerl work only on weekends, I
suggest that we freeze the code on a Monday. I'll be off to San Jose (is
anyone else going to attend the Microarray Meeting at BiOS?) the next
weekend, so I propose to schedule the 0.7 code freeze for Feb. 5th (one
week earlier would be Jan 29th). Note that once this is agreed upon, it
will be a firm deadline.

    Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 20:36:22 2001
Date: Tue, 16 Jan 2001 15:36:22 -0500
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] refactoring RootI
----- Original Message -----
From: "Hilmar Lapp" <hlapp@gmx.net>
To: "Ewan Birney" <birney@ebi.ac.uk>
Cc: <bioperl-l@bioperl.org>
Sent: Tuesday, January 16, 2001 2:02 PM
Subject: Re: [Bioperl-l] refactoring RootI

> Ewan Birney wrote:
> >
> > I have finished a very serious refactoring of RootI. This detaches
> > RootI from the other Root:: objects completely. verbose I think it handled
> > nicer. I would venture to say that the code is more readable.
> >
> > I have changed the formatting somewhat of the stack trace in the
> > throw/warn statements. Your milage may vary here...
> >
> > Jason, Hilmar - check it out and tell me what you think.
> >
> > I am now a little exhausted although the final product I think is vastly
> > improved...
> >
>
> Well, that was radical surgery :) Even though SteveC won't be excited
> about it, it looks like we now have a relatively clear and straightforward
> code base there. It also seems that Err.pm is now superfluous, so we may
> want to deprecate it.

I am very impressed as well; it should be a lot simpler. I did notice
that warn/throw changed to accept only 1 parameter, while I think they
accepted 2 before - the 1st parameter was printed as MSG: $_[0], the
second as NOTE: $_[1]. But I don't think it is seriously important.

>
> We should also build a test for $obj->throw() to check that it really
> prints a meaningful stack trace. In addition, there should be a test
> demonstrating that $obj->verbose(2) really turns warn() into throw().

Did that in t/RootI.t I think, but it may not be extremely complete. I
tried to make it catch all the thrown errors in eval, but I didn't play
with the $SIG{__WARN__} settings enough to try and catch errors on warn
when verbose == 1.

>
> Hilmar

From lapp@gnf.org Tue Jan 16 22:34:33 2001
Date: Tue, 16 Jan 2001 14:34:33 -0800
From: Hilmar Lapp lapp@gnf.org
Subject: [Bioperl-l] Refactor mercilessly
I found some thoughts about code refactoring at
http://www.extremeprogramming.org/rules/refactor.html. As we are
experiencing something similar with Bio::Root::*, what do people think
about the points made there with particular regard to Bioperl? I enclose
some quotes from that page.

    Hilmar

<quote>
We computer programmers hold onto our
software designs long after they have become
unwieldy. We continue to use and reuse code that is
no longer maintainable because it still works in some
way and we are afraid to modify it.
[...]
Refactor mercilessly to keep the design
simple as you go and to avoid needless clutter and
complexity. Keep your code clean and concise so it
is easier to understand, modify, and extend. Make
sure everything is expressed once and only once.
[...]
There is a certain amount of Zen to
refactoring. It is hard at first because you must be
able to let go of that perfect design you have
envisioned and accept the design that was
serendipitously discovered for you by refactoring.
You must realize that the design you envisioned was
a good guide post, but is now obsolete.
</quote>

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 22:54:20 2001
Date: Tue, 16 Jan 2001 17:54:20 -0500 (EST)
From: Jason Stajich jason@chg.mc.duke.edu
Subject: [Bioperl-l] what to do about Blast.pm, parsing
On the refactor front - I think BPlite is a good way to go for moving functionality from Blast.pm; however, things like to_html/from_html are very nice and I'd like to see them migrated along. Perhaps we could get a poll or priority list of the Blast.pm features we actually use, so we can be sure those are migrated first. Another alternative is to go for a clean code base and write a module like what I've started locally called YABP (Yet Another Blast Parser). I'd like us to really identify the functions we want before starting to write it, since porting all of Blast.pm to a new module is sort of silly if we aren't going to see significant benefit in functionality or speed. I do see the value in having a lightweight module to accomplish some tasks and a heavyweight one for doing others. I also have been playing with Parse::RecDescent some. While writing a grammar is not the most fun I've ever had, I've been able to write a parser for GenBank files and get at least accession,locus, and sequence lines parsed (I know, big deal). Feature table will be a bit more fun, but I think it may be a useful exercise whether or not we will really just write grammars for seqformats I don't know. Perhaps a grammar could be written for blast files - might be more trouble than it's worth... Just some thoughts rattling around... Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From jason@chg.mc.duke.edu Tue Jan 16 23:00:14 2001 Date: Tue, 16 Jan 2001 18:00:14 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Refactor mercilessly
On Tue, 16 Jan 2001, Hilmar Lapp wrote: > I found some thoughts about code refactoring at > http://www.extremeprogramming.org/rules/refactor.html. As we are > experiencing something similar with Bio::Root::*, what do people think > about the points made there with particular regard to Bioperl? I enclose > some quotes from that page. > > Hilmar > I like XP for bioperl, but I ask: who are our users, since users are supposed to drive the product? It seems to me the users are also the system developers. So I think we have to stop occasionally and ask - what do I want to be able to do with this system/API? This is where some of the list subscribers who don't want to develop code can really help out, by identifying areas that bioperl needs to focus on or where needs aren't being met. > <quote> > We computer programmers hold onto our > software designs long after they have become > unwieldy. We continue to use and reuse code that is > no longer maintainable because it still works in some > way and we are afraid to modify it. > [...] > Refactor mercilessly to keep the design > simple as you go and to avoid needless clutter and > complexity. Keep your code clean and concise so it > is easier to understand, modify, and extend. Make > sure everything is expressed once and only once. > [...] > There is a certain amount of Zen to > refactoring. It is hard at first because you must be > able to let go of that perfect design you have > envisioned and accept the design that was > serendipitously discovered for you by refactoring. > You must realize that the design you envisioned was > a good guide post, but is now obsolete. > </quote> > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp@gnf.org > GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 > ------------------------------------------------------------- Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From ajm6q@virginia.edu Tue Jan 16 23:25:13 2001 Date: Tue, 16 Jan 2001 18:25:13 -0500 (EST) From: Aaron J Mackey ajm6q@virginia.edu Subject: [Bioperl-l] what to do about Blast.pm, parsing
On Tue, 16 Jan 2001, Jason Stajich wrote: > I also have been playing with Parse::RecDescent some. While writing a > grammar is not the most fun I've ever had, I've been able to write a > parser for GenBank files and get at least accession,locus, and sequence > lines parsed (I know, big deal). Feature table will be a bit more fun, > but I think it may be a useful exercise whether or not we will really just > write grammars for seqformats I don't know. Perhaps a grammar could be > written for blast files - might be more trouble than it's worth... I've often thought the same (and then stepped back and wondered if blast/fasta/hmmer output could be expressed in BNF [ Backus-Naur Form ]). It seems like an excellent project for an undergrad CS major who wanted to cross over into bioinformatics. There's too much grunt work involved for any of us to want to do it, though ;) Maybe we should take this off-list, Jason, but do you have any comments on Parse::RecDescent vs. Parse::Yapp utility? -Aaron -- o ~ ~ ~ ~ ~ ~ o / Aaron J Mackey \ \ Dr. Pearson Laboratory / \ University of Virginia \ / (804) 924-2821 \ \ amackey@virginia.edu / o ~ ~ ~ ~ ~ ~ o
From jason@chg.mc.duke.edu Wed Jan 17 02:34:11 2001 Date: Tue, 16 Jan 2001 21:34:11 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Status 0.7
On Tue, 16 Jan 2001, Hilmar Lapp wrote: > Ewan Birney wrote: > > > > A couple of days ago I updated the task list for 0.7 > > > > http://bio.perl.org/wiki/html/BioPerl/TaskList.html > > > > which is getting much more "green". Hilmar - I think we should drop some of the > > more unlikely things to make it into 0.7 (NetIO class for example?) and > > concentrate on the last important features ... > > > > I think we should stick to our goal of finalizing the 0.7 release > by the end of January. The situation actually doesn't look bad. > Major things remaining to be addressed as I see it basically > comprise the following. > > 1) Fuzzy locations coverage. This is probably the most significant > hurdle. Jason's already elaborating an interface outline. If > anyone has suggestions/views/experience, feel encouraged to post. > You may also want to check out Ewan's proposal > (http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html). Hopefully I will have something by the end of the week or early next week. > > 2) With the preceding being addressed, a review of SeqFeatureI and > BioCorba interoperability may go hand in hand. Jason, Brad, is > BioCorba 0.2 interoperability still within sight? I haven't played with this much; I was planning on doing it after the SeqFeatureI - LocationI stuff was settled. > > 3) BPlite update. Lorenz seems to have abandoned the list, or is > too busy with other things. It's priority 2, but I think at the > same time as we are phasing out support for Blast.pm we need to > increase support for BPlite. Anyone out there who would volunteer > to assume responsibility? > > 4) SeqAnalysisParserI needs more elaboration, according to a > discussion we (Jason, Ewan, I) had in December. It'll probably be > the three of us who thrash this out. Hmm, we need to determine what the future of SeqFeatureProducerI is as well in this context. > > 5) Bio::SeqFeature::Transcript object. This will be related to
This will be related to > GeneStructure and the concept has been worked out between Ewan and > myself. Still, I'll need to put it into Perl code :) > > 6) Bugs reported on Incoming. (!) (These tend to be forgotten, but > I'm sure they won't be fixed in a matter of minutes.) > > 7) The rest I think (I hope :) is smaller fixups, some of which I > need to address myself. > > We'll probably have to drop Root::StreamIO (priority 3), and > probably also fixing Blast.pm bugs, unless SteveC finds the time > to fix them. It seems that almost all priority 2 tasks will make > it into 0.7, BioCorba 0.2 being the only one not started yet. I wanted to wait until code was stable before working on BioCorba stuff since it is entirely dependant on the bioperl modules api. > > Since more or less all of us can do BioPerl work only on weekends, > I suggest that we freeze the code on a Monday. I'll be off to San > Jose (is anyone else going to attend the Microarray Meeting at > BiOS?) the next weekend, so I propose to schedule the 0.7 code > freeze for Feb. 5th (one week earlier would be Jan 29th). Note > that once this is agreed upon, it will be a firm deadline. Yes. Feb 5 is reasonable. Let's see how close we are the week before and take stock. Thanks for being the lead on this! > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/From SnyderEE@pbrc.edu Wed Jan 17 02:43:48 2001 Date: Tue, 16 Jan 2001 20:43:48 -0600 From: Eric Snyder SnyderEE@pbrc.edu Subject: [Bioperl-l] Map Manipulation and Genetic Analysis
Hello BioPerl Folks, I was thumbing through the BioPerl modules list and noticed that there was not any coverage in the area of processing (non-sequence) maps and genetic data. I am working on some programs for processing physical and genetic maps, as well as genotypic and phenotypic data. I was wondering, is there any interest in these areas in the BioPerl community or, have I overlooked previous work on these things? I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH mapping, etc.) but I have not seen anything in the form of reusable software components for basic map manipulation, comparison, etc. Nor am I aware of modules for manipulating raw data for genetic analysis. I am fairly new to working with genetic data. I would be interested in hearing of leads in this area. However, if it is not already done, I would be willing to write it in the context of BioPerl. Cheers, Eric E. Snyder Associate Professor Pennington Biomedical Research Center 6400 Perkins Road Baton Rouge, LA 70808-4124 USA Phone: (225) 763-3185 Fax: (225) 763-2525 Cell: (225) 235-6271 Email: eesnyder@pbrc.edu ICBM: N 30 24'14.0", W 91 07'20.0"
From jason@chg.mc.duke.edu Wed Jan 17 22:07:29 2001 Date: Wed, 17 Jan 2001 17:07:29 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Map Manipulation and Genetic Analysis
Eric - Heikki and I had batted around the idea of a MarkerI interface for describing markers, which can be used to build maps. I have some code that I am using for some analysis which I am happy to donate when it is finished. It doesn't do much to represent maps other than assume that markers with the same mapid are part of the same map (data is stored in db). But I think a good representation of Markers first and then Maps would be very good for bioperl and those trying to bridge the gap between genetic analysis, maps, and sequence based investigation. -Jason On Tue, 16 Jan 2001, Eric Snyder wrote: > Hello BioPerl Folks, > > I was thumbing through the BioPerl modules list and noticed that there > was not any coverage in the area of processing (non-sequence) maps and > genetic data. I am working on some programs for processing physical > and genetic maps, as well as genotypic and phenotypic data. I was > wondering, is there any interest in these areas in the BioPerl > community or, have I overlooked previous work on these things? > > I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH > mapping, etc.) but I have not seen anything in the form of reusable > software components for basic map manipulation, comparison, etc. Nor > am I aware of modules for manipulating raw data for genetic analysis. > I am fairly new to working with genetic data. I would be interested > in hearing of leads in this area. However, if it is not already done, > I would be willing to write it in the context of BioPerl. > > Cheers, > > > Eric E.
Snyder > Associate Professor > Pennington Biomedical Research Center > 6400 Perkins Road > Baton Rouge, LA 70808-4124 > USA > Phone: (225) 763-3185 > Fax: (225) 763-2525 > Cell: (225) 235-6271 > Email: eesnyder@pbrc.edu > ICBM: N 30 24'14.0", W 91 07'20.0" > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From imre.vastrik@helsinki.fi Thu Jan 18 10:29:58 2001 Date: Thu, 18 Jan 2001 12:29:58 +0200 From: Imre Vastrik imre.vastrik@helsinki.fi Subject: [Bioperl-l] BPlite bug
Don't know if this one is for Lorenz or Jason: BPlite seems to be unaware of ' Frame = ...' lines in NCBI TBLASTN etc. reports. Consequently, parsing of the alignment lines does not work properly. The bug does not show up with the current test, since it is a BLASTP report (which lacks Frame lines). A quick hack would be to introduce the following line between lines 115 and 120: elsif ($_ =~ /^\s*Frame/) {next} However, the frame info will, of course, be lost. Bug report filed. Rgds., imre
From jason@chg.mc.duke.edu Thu Jan 18 17:55:53 2001 Date: Thu, 18 Jan 2001 12:55:53 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html Please look it over, I didn't describe the detail of the fuzzy feature methods because I'm not sure there will be extra methods, just overriding things like start,end to be remapped. The different feature types need to be differentiated so that Bio::SeqIO::FTHelper can handle them differently when parsing/writing. Ewan, Let me know what I've left off. Hilmar does this sound reasonable, straightforward enough to you? Some may have a beef about the name - SplitSeqFeature - you are welcome to propose a better one. Send your comments or make corrections to the wiki (send a courtesy note to let us know to check the webpage). Thanks for your help. Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From hlapp@gmx.net Thu Jan 18 19:11:57 2001 Date: Thu, 18 Jan 2001 11:11:57 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: LocationI
Jason Stajich wrote: > > Interfaces: > > Bio::LocationI -> ISA RangeI > Purpose: capture location information - such as in an EMBL/GenBank > feature > /source 1..345 > Methods: RangeI methods, and ...? [start/end/strand] > > Questions: How is a LocationI object going to be different from the > vanilla SeqFeatureI or should we migrate some methods from > SeqFeature (start/end/strand) to LocationI and make > SeqFeaturesI more about tags (primary/source/has_tag/each_tag) > and gff stuff? In principle I think yes. SeqFeatureI could still keep start/end/strand and map these to calls into the location object. Or SeqFeatureI drops them (i.e., they're no longer mandatory), but for simplicity SeqFeature::Generic keeps them. > > Bio::ComplexLocationI -> ISA Bio::LocationI > Purpose: capture location information for features that are not linear > as in an EMBL/Genbank join > CDS join(544..589,688..1032) > > Methods: > - sub_Locations() -> a list of LocationI objects that indicate > start/stop boundaries for this object must override overlap, > contains, etc from RangeI since coordinates are not > contiguous > > Objects: > Bio::SeqFeature::Generic -> ISA Bio::SeqFeatureI, Bio::LocationI > add the location() method to this object, the LocationI object > returned will be a reference to $self. > > Bio::SeqFeature::Complex -> ISA Bio::SeqFeatureI, Bio::ComplexLocationI > Purpose: implementation to handle those join() statements This is the outline you pretty much follow in the proposal on the Wiki. The point I'm not so happy with is that purely location-specific issues change the class (type) of a SeqFeature. > > I'm still not clear on what a fuzzy location is supposed to represent > ie - does that mean we know that the feature is located somewhere > in the range, but we don't know the exact start/stop? Exactly. At least to my understanding. > Why can't you treat > it like real start/stop since we don't have any more information? Or
Or > would union/intersection calculations need to behave differently? > Well, biologically you can't, because annotating a sequence with such a feature without indicating the uncertainty of start and end is deceptive. For cDNA entries this is sometimes crucial: <1..100 as CDS location means that the entry doesn't even contain the start of the CDS, and it's totally unclear where that is. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------From hlapp@gmx.net Thu Jan 18 19:26:57 2001 Date: Thu, 18 Jan 2001 11:26:57 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Jason Stajich wrote: > > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html > > Please look it over, I didn't describe the detail of the fuzzy feature > methods because I'm not sure there will be extra methods, just overriding > things like start,end to be remapped. The different feature types need to > be differentiated so that Bio::SeqIO::FTHelper can handle them differently > when parsing/writing. > > Ewan, Let me know what I've left off. Hilmar does this sound reasonable, > straightforward enough to you? > You didn't include actual interface definitions, did you? Just wondering whether I missed the link. As mentioned before, what bothers me is that in this layout location-specific issues impact the class (type) of a SeqFeature. Why should any SeqFeature change its type only because its location becomes uncertain or compound, and vice-versa? I'd rather favor uncoupling a feature and its location, with features having a reference to a location object which will give further details if the application cares. An application that doesn't do anything with the coordinates wouldn't notice a change, but an application that e.g. draws features on sequences will have to decide what to do if the location object says that the coordinates are not well determined. Retrieving the sequence part the feature refers to on its attached seq will also be affected: doing so for a feature with an uncertain location will result in an exception being thrown. Separating SeqFeatureI and LocationI allows also for the following: assume a feature with uncertain start and end. If you're satisfied with an average start and end, you can substitute the location object by a Range with certain start and end, and voila - drawing, sequence excision etc will just work fine on the very same feature object. Maybe I'm missing something. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca.
92122 phone: +1 858 812 1757 -----------------------------------------------------------------
From jason@chg.mc.duke.edu Thu Jan 18 19:41:51 2001 Date: Thu, 18 Jan 2001 14:41:51 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Hilmar Lapp wrote: > Jason Stajich wrote: > > > > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html > > > > Please look it over, I didn't describe the detail of the fuzzy feature > > methods because I'm not sure there will be extra methods, just overriding > > things like start,end to be remapped. The different feature types need to > > be differentiated so that Bio::SeqIO::FTHelper can handle them differently > > when parsing/writing. > > > > Ewan, Let me know what I've left off. Hilmar does this sound reasonable, > > straightforward enough to you? > > > > You didn't include actual interface definitions, did you? Just > wondering whether I missed the link. No - didn't describe actual interfaces since we are still struggling through this. Will do that when we agree enough. > > As mentioned before, what bothers me is that in this layout > location-specific issues impact the class (type) of a SeqFeature. > Why should any SeqFeature change its type only because its > location becomes uncertain or compound, and vice-versa? Ewan and I had decoupled the LocationI from SeqFeature but there was no seen advantage, just interface mish-mash, perhaps we were too hasty? What you suggest above could be done as: Bio::SeqFeatureI ISA RangeI method : location desc : Get/Set method args : LocationI object returns: LocationI object method : start() desc : start location of seqfeature sub start { my($self) = @_; return $self->location->start() } ... similar for end ... Bio::LocationI ISA RangeI Bio::SplitLocationI ISA Bio::LocationI method: sub_SeqFeatures() desc : method for obtaining list of sub Locations - they could be SeqFeature::Exons, SeqFeature::Generic, or LocationI's? returns: list of LocationI or SeqFeatureI objects?
Bio::FuzzyLocationI ISA Bio::LocationI method: get_embl_fuzzy_string() desc : possible method to return location as an embl string for a fuzzy location returns: string Does this seem more agreeable - location is decoupled from SeqFeature, but we have to support backwards compatibility with SeqFeatureI ISA RangeI which means all SeqFeatures have a start/end... > > I'd rather favor uncoupling a feature and its location, with > features having a reference to a location object which will give > further details if the application cares. An application that > doesn't do anything with the coordinates wouldn't notice a change, > but an application that e.g. draws features on sequences will have > to decide what to do if the location object says that the > coordinates are not well determined. Retrieving the sequence part > the feature refers to on its attached seq will also be affected: > doing so for a feature with an uncertain location will result in > an exception being thrown. Separating SeqFeatureI and LocationI > allows also for the following: assume a feature with uncertain > start and end. If you're satisfied with an average start and end, > you can substitute the location object by a Range with certain > start and end, and voila - drawing, sequence excision etc will > just work fine on the very same feature object. > > Maybe I'm missing something. > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/
From hlapp@gmx.net Thu Jan 18 20:34:24 2001 Date: Thu, 18 Jan 2001 12:34:24 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Jason Stajich wrote: > > What you suggest above could be done as: > > Bio::SeqFeatureI ISA RangeI > > method : location > desc : Get/Set method > args : LocationI object > returns: LocationI object > > method : start() > desc : start location of seqfeature > > sub start { > my($self) = @_; > return $self->location->start() > } > Note that as one of the few noticeable changes in the SeqFeatureI API this call should be allowed to throw an exception if 1) the start location is uncertain 2) the start location does not refer to the attached seq (to be disputed) > ... similar for end ... > > Bio::LocationI ISA RangeI > > Bio::SplitLocationI ISA Bio::LocationI > > method: sub_SeqFeatures() > desc : method for obtaining list of sub Locations - they could be > SeqFeature::Exons, SeqFeature::Generic, or LocationI's? > returns: list of LocationI or SeqFeatureI objects? > Yeah, that's the really hairy case. We probably should first define what we would like to be able to do with compound locations. This is a strong call for feedback: what do people out there using the package intend to do with compound locations? E.g. if you draw annotations, would you just draw the part referring to the attached seq? Ensembl people, any experience/wishlists for this? An obvious requirement is the ability to recover the original GenEmbl location string, so all the information necessary should be present. A compound location indeed is somewhat a hybrid between a location and a feature, because a sublocation clearly only makes sense if you also know the sequence it refers to. The sequence could be identified either by its name (but then which name? The name in the location line as given in GenBank?) or by an object reference. The latter can be very expensive, because the sequence can be quite long, and if there are many such sublocations, you quickly eat up your memory. 
You could also construct the seq object as sort of a dummy, without really holding the seq string. Not really convincing.
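The delegation quoted at the top of this message (a feature that holds a location object and forwards start() to it) might look like the following minimal Perl sketch. My::SimpleLocation and My::Feature are hypothetical stand-ins, not BioPerl classes, and no exception handling for uncertain ends is shown; the point is only that swapping the location object changes the coordinates without changing the feature's class, which is the decoupling argued for in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain location with certain coordinates
# (not a BioPerl class).
package My::SimpleLocation;

sub new {
    my ($class, %args) = @_;
    my $self = { start => $args{start}, end => $args{end}, strand => $args{strand} || 0 };
    return bless $self, $class;
}
sub start  { return $_[0]{start} }
sub end    { return $_[0]{end} }
sub strand { return $_[0]{strand} }

# Hypothetical feature: it stores no coordinates itself, only a location object.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    my $self = { primary_tag => $args{primary_tag}, location => $args{location} };
    return bless $self, $class;
}

# Get/set accessor for the attached location object.
sub location {
    my ($self, $loc) = @_;
    $self->{location} = $loc if defined $loc;
    return $self->{location};
}
sub primary_tag { return $_[0]{primary_tag} }

# Coordinate methods kept for backwards compatibility; they just delegate.
sub start  { return $_[0]->location->start }
sub end    { return $_[0]->location->end }
sub strand { return $_[0]->location->strand }

package main;

my $cds = My::Feature->new(
    primary_tag => 'CDS',
    location    => My::SimpleLocation->new(start => 544, end => 589, strand => 1),
);
printf "%s %d..%d\n", $cds->primary_tag, $cds->start, $cds->end;   # CDS 544..589

# Replace the location; the feature object and its class stay the same.
$cds->location(My::SimpleLocation->new(start => 688, end => 1032, strand => 1));
printf "%s %d..%d\n", $cds->primary_tag, $cds->start, $cds->end;   # CDS 688..1032
```

Keeping start()/end() on the feature as thin wrappers is one way to preserve the "SeqFeatureI ISA RangeI" compatibility Jason mentions while the real coordinate logic lives in the location class.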
So why not the simple case: a CompoundLocation has a method sub_Locations(). Each sublocation has a method seqname() (or seq_id() or whatever you prefer), which returns the same string as $feature->seqname() for subfeatures lying on the same seq, and a different name for those referring to other seqs. $feature->seq() for features with a compound location throws an exception, unless all sublocations are on the same (attached) sequence. Too simple? > Bio::FuzzyLocationI ISA Bio::LocationI > > method: get_embl_fuzzy_string() > desc : possible method to return location as an embl string for a fuzzy > location > returns: string > min_start()/max_start() etc should also be included. start() and end() in an implementation are overridden and throw exceptions, depending on which end is uncertain (at least they should be expected to throw exceptions). A certain end can be determined by min_start() == max_start() (or .._end(), resp.). > Does this seem more agreeable - location is decoupled from SeqFeature, but > we have to support backwards compatibility with SeqFeatureI ISA RangeI > which means all SeqFeatures have a start/end... > I indeed like the decoupled approach much better. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------
From birney@ebi.ac.uk Thu Jan 18 23:27:53 2001 Date: Thu, 18 Jan 2001 23:27:53 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Jason Stajich wrote: > On Thu, 18 Jan 2001, Hilmar Lapp wrote: > > > Jason Stajich wrote: > > > > > > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html > > > > > > Please look it over, I didn't describe the detail of the fuzzy feature > > > methods because I'm not sure there will be extra methods, just overriding > > > things like start,end to be remapped. The different feature types need to > > > be differentiated so that Bio::SeqIO::FTHelper can handle them differently > > > when parsing/writing. > > > > > > Ewan, Let me know what I've left off. Hilmar does this sound reasonable, > > > straightforward enough to you? > > > > > > > You didn't include actual interface definitions, did you? Just > > wondering whether I missed the link. > > No - didn't describe actual interfaces since we are still struggling > through this. Will do that when we agree enough. > > > > > As mentioned before, what bothers me is that in this layout > > location-specific issues impact the class (type) of a SeqFeature. > > Why should any SeqFeature change its type only because its > > location becomes uncertain or compound, and vice-versa? > > > Ewan and I had decoupled the LocationI from SeqFeature but there was no > seen advantage, just interface mish-mash, perhaps we were too hasty? Just to chime in, my original proposal had locations separate from SeqFeatures, but at the end of the day we seemed to be making two parallel interface hierarchies with no real gain in abstraction or understanding, and with the potential for generating a lot of confusion. So - I guess to flip the question around - what do we gain from hanging location "off" seqfeature rather than merging the interfaces? (Remember, interface definitions can be implemented with any number of objects or object collections if so desired...)
e.
From birney@ebi.ac.uk Thu Jan 18 23:37:24 2001 Date: Thu, 18 Jan 2001 23:37:24 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Hilmar Lapp wrote: > > Note that as one of the few noticeable changes in the SeqFeatureI > API this call should be allowed to throw an exception if > 1) the start location is uncertain > 2) the start location does not refer to the attached seq > (to be disputed) My feeling is that seqfeature->start should still be well defined. It is up to the SeqFeature implementing class to "make a sensible decision" about start/end points. If it is fuzzy/complex/strange, the client can test. If the client does not want to test and just wants to "draw it", I think insisting that start/end/seqname return *something* is valid. Otherwise the client has no real option to figure out what to do with these things... If we let the implementation objects get away with not implementing this, the interface becomes less useful... </snip> > annotations, would you just draw the part referring to the > attached seq? Ensembl people, any experience/wishlists for this? Experience on our side is that 90% of things are either SeqFeatures or FeaturePairs and fit the simple seqfeature interface just fine; the remaining 10% are genes and could be handled via some sort of complex location thing. As genes have transcripts, which have exons, a simple mapping to complex locations is not on. For other internal reasons, Ensembl is very likely to stick with specialised adaptor classes which map Ensembl genes to Bioperl SeqFeatures, so we are flexible here... > > An obvious requirement is the ability to recover the original > GenEmbl location string, so all the information necessary should > be present. Right. > </snip> > > min_start()/max_start() etc should also be included. start() and > end() in an implementation are overridden and throw exceptions, > depending on which end is uncertain (at least they should be > expected to throw exceptions). A certain end can be determined by > min_start() == max_start() (or .._end(), resp.).
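The quoted min_start()/max_start() proposal, and the disagreement over whether start() may throw, can be illustrated with a minimal Perl sketch. My::FuzzyLocation and its strict flag are hypothetical, not BioPerl API. With strict => 1 an uncertain end throws (the position quoted above); with strict => 0 the implementation silently picks the widest extent, which is only one possible reading of "the implementation decides".

```perl
use strict;
use warnings;

# Hypothetical fuzzy location: each end is known only as a [min, max] range.
package My::FuzzyLocation;

sub new {
    my ($class, %args) = @_;
    my $self = { strict => 0, %args };   # strict defaults to off
    return bless $self, $class;
}
sub min_start { return $_[0]{min_start} }
sub max_start { return $_[0]{max_start} }
sub min_end   { return $_[0]{min_end} }
sub max_end   { return $_[0]{max_end} }

# An end is certain when its lower and upper bounds coincide.
sub start_is_certain { my $s = shift; return $s->min_start == $s->max_start }
sub end_is_certain   { my $s = shift; return $s->min_end == $s->max_end }

# strict => 1: throw on an uncertain end.
# strict => 0: the implementation picks the widest extent instead.
sub start {
    my $self = shift;
    return $self->min_start if $self->start_is_certain;
    die "start is uncertain\n" if $self->{strict};
    return $self->min_start;
}
sub end {
    my $self = shift;
    return $self->max_end if $self->end_is_certain;
    die "end is uncertain\n" if $self->{strict};
    return $self->max_end;
}

package main;

# A start known only to lie between 10 and 20; a certain end at 100.
my %coords = (min_start => 10, max_start => 20, min_end => 100, max_end => 100);

my $lenient = My::FuzzyLocation->new(%coords);
print $lenient->start, "\n";                                             # 10
print $lenient->end_is_certain() ? "end certain\n" : "end uncertain\n";  # end certain

my $strict = My::FuzzyLocation->new(%coords, strict => 1);
my $start  = eval { $strict->start };
print defined($start) ? "start=$start\n" : "caught: $@";                 # caught: start is uncertain
```

Either way the client can check min_start() == max_start() before trusting start(); the two policies differ only in what happens when it does not check.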
I would be in favour of min_start/max_start but against letting start throw an exception. The implementation has to decide how to "become a hard feature" from being Fuzzy. It is up to the implementation. As long as this is documented, this is no more arbitrary than letting the client decide. > > > Does this seem more agreeable - location is decoupled from SeqFeature, but > > we have to support backwards compatibility with SeqFeatureI ISA RangeI > > which means all SeqFeatures have a start/end... > > > > I indeed like the decoupled approach much better. > If we go for a decoupled approach I am keen on it being justified by more than just "it feels good". We are increasing the complexity here a lot and we need justification... > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------
From lapp@gnf.org Fri Jan 19 01:28:09 2001 Date: Thu, 18 Jan 2001 17:28:09 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Ewan Birney wrote: > > > > > > min_start()/max_start() etc should also be included. start() and > > end() in an implementation are overridden and throw exceptions, > > depending on which end is uncertain (at least they should be > > expected to throw exceptions). A certain end can be determined by > > min_start() == max_start() (or .._end(), resp.). > > I would be in favour of min_start/max_start but against letting start > throw an exception. The implementation has to decide how to "become a hard > feature" from being Fuzzy. It is up to the implementation. As long as this > is documented, this is no more arbitrary than letting the client decide. > I think it is more arbitrary, and I'll tell you why. There is more than one interpretation of fuzzy locations. I'll name two for which I think the BioPerl core is not in a position to take the decision away from the client, which is why it shouldn't pretend that it is: 1) Uncertainty about the real location, that is, it is clear that the described feature sits at a particular position, but for one reason or another the producer of the feature can only give an estimated range for start and/or end. Now, we can implement (and document) the rule that in such cases $feature->start() and $feature->end() will always return the widest (or smallest, or average, make your choice) possible range. A client is then free to rely on it, thinking that what the BioPerl developers decided on is probably the wisest choice you can make. That's already catch #1. Catch #2 happens if there is a user of the client program who, because he's a good user, read the documentation of the client program, but not that of BioPerl. Do we expect users of programs that use BioPerl to read through the BioPerl documentation as well? 2) The location is undefined. A location such as <1..100 is undefined for that feature in its biological meaning. You're not supposed to make up a value for an undefined value.
If you had an interface dividing two integers and returning an integer (to prevent you from responding NAN or INF), and the denominator is zero, what would you return? I strongly believe that every client that does something sensible with the feature coordinates should know, and should be required to make sure in order to be safe from an exception, what type of coordinates it is dealing with. It is not the task of BioPerl to relieve the client of thinking, but it is its task to provide all the information the client needs for making an educated decision. You can always divide by a number without checking for zero, but by doing so you accept the risk that some day you might get an exception. The same holds for clients calling $feature->start() instead of obtaining the location object and examining it for its capabilities. Maybe I'm missing an important point in having $feature->start() guaranteed to be exception-free. > > > > I indeed like the decoupled approach much better. > > > > If we go for a decoupled approach I am keen on it being justified by more > than just "it feels good". We are increasing the complexity here a lot and > we need justification... > First, for clarification: I thought we agreed that we have different interfaces, that is, SeqFeatureI (ISA RangeI) and LocationI (ISA RangeI), didn't we? Regarding complexity, the question is whether we had better have subinterfaces for each of FuzzyLocation, CompoundLocation, etc. (what is etc?), or whether we pack everything into one interface. I have a preference for the first, because it lets you find out the type of location by checking $loc->isa('Bio::SomeLocationInterface'). I may be missing another equally elegant way if everything's in one interface. The increase in complexity is fairly small, I think. All interfaces can be put into their own subdirectory (Bio::Loc?).
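The isa()-based check described above can be sketched in Perl as follows. The class names (My::Location, My::SimpleLocation, My::FuzzyLocation) and the describe_start() helper are hypothetical, not BioPerl modules. The point is that the client, not the library, chooses the policy for fuzzy locations, and that skipping the check means accepting a possible exception, as with an unchecked division.

```perl
use strict;
use warnings;

# Hypothetical location classes for illustration; not BioPerl modules.
package My::Location;
sub new { my ($class, %args) = @_; return bless { %args }, $class }

package My::SimpleLocation;
our @ISA = ('My::Location');
sub start { return $_[0]->{start} }

package My::FuzzyLocation;
our @ISA = ('My::Location');
sub min_start { return $_[0]->{min_start} }
sub max_start { return $_[0]->{max_start} }
# Under the proposal, start() throws rather than inventing a value.
sub start { die "start is uncertain\n" }

package main;

# A client that checks the location type itself, so start() is only
# called where it is known to be safe. The fuzzy branch is this
# client's own policy, not a library default.
sub describe_start {
    my ($loc) = @_;
    if ($loc->isa('My::FuzzyLocation')) {
        return $loc->min_start . '-' . $loc->max_start;
    }
    return $loc->start;
}

my @locs = (
    My::SimpleLocation->new(start => 5),
    My::FuzzyLocation->new(min_start => 1, max_start => 10),
);
print describe_start($_), "\n" for @locs;

# A client that skips the check accepts the risk of an exception,
# much like dividing without testing the denominator for zero.
my $unchecked = eval { $locs[1]->start };
print "caught: $@" unless defined $unchecked;
```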
The only people really concerned with it are those who want to deal with the coordinates in a very reliable way (that is, avoid exceptions and deal with any possible sort of location type). And these people really should care what types of location they could encounter, and what they mean. Everyone else could simply use LocationI, which in essence is probably the same as RangeI. Regarding your point that there can be many implementations of an interface, sure, that's true. In principle I have no problem with $feature->location() returning $self, assuming that the SeqFeature object implements LocationI itself. But I do think it's bad if a SeqFeature implements every type of location interface itself, because if I wanted to change the type of a feature's location I would end up instantiating a SeqFeature and passing it to a SeqFeature as its location object, which is weird, isn't it? I say weird because it's not lightweight. No more of those beast-like classes, please. I don't think the reduction in hierarchy complexity achieved by beast classes makes them easier to learn, or to use. You may ask why I would wish to change the type of a location. Consider a client program that draws features. When it encounters a feature with a FuzzyLocation, it may want to ask the user what to do. The user may even be able to set a preference like 'always take the widest possible range'. Then the client program simply replaces the FuzzyLocation with a Range object denoting the widest possible range and passes the feature on to the drawing module. No code change is necessary there. And the user knows what he's doing; it's not just an arbitrary decision of a backend library. So, I still think that having not only individual interfaces, but also individual implementations for the different location types is justified, doesn't add too much complexity (in fact, it reduces hidden complexity), and provides a clear API for programmers. Long mail, sorry for taking up your time reading it, but you asked.
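A Perl sketch of the drawing-client scenario described above: a user preference ('always take the widest possible range') makes the client swap a fuzzy location for a simple one before handing the feature to the drawing code. All class and sub names (My::Feature, My::FuzzyLocation, My::SimpleLocation, apply_widest_range) are hypothetical, and the sketch assumes a feature whose location() accessor can also set a new location object.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration; not BioPerl modules.
package My::SimpleLocation;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub start { return $_[0]->{start} }
sub end   { return $_[0]->{end} }

package My::FuzzyLocation;
sub new       { my ($class, %args) = @_; return bless { %args }, $class }
sub min_start { return $_[0]->{min_start} }
sub max_start { return $_[0]->{max_start} }
sub min_end   { return $_[0]->{min_end} }
sub max_end   { return $_[0]->{max_end} }

package My::Feature;
sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Get/set accessor for the feature's location object.
sub location {
    my $self = shift;
    $self->{location} = shift if @_;
    return $self->{location};
}

package main;

# Client-side policy chosen by the user ("always take the widest possible
# range"): swap a fuzzy location for a simple one before drawing.
sub apply_widest_range {
    my ($feature) = @_;
    my $loc = $feature->location;
    return unless $loc->isa('My::FuzzyLocation');
    $feature->location(
        My::SimpleLocation->new(start => $loc->min_start, end => $loc->max_end)
    );
    return;
}

my $feature = My::Feature->new(
    location => My::FuzzyLocation->new(
        min_start => 1, max_start => 10, min_end => 200, max_end => 220,
    ),
);
apply_widest_range($feature);

# The drawing code only ever sees a simple location.
printf "%d..%d\n", $feature->location->start, $feature->location->end;
```

The drawing module itself needs no change: the decision is made once, in the client, on the user's instruction.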
Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------From birney@ebi.ac.uk Fri Jan 19 08:45:58 2001 Date: Fri, 19 Jan 2001 08:45:58 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more...
Ok. Hilmar and I are now probably into the "code aesthetics" part of this debate, which definitely is worth having, but someone sometime has to make a decision. I suggest that we keep bashing this out on the list for a couple more days (please... other people... if you have a view, do chip in). If Hilmar and I still disagree on aesthetics, I would like to nominate Jason to tie-break on the way to go (is this ok with you, Hilmar and Jason...?) We have two points of contention: (a) Explicit Location objects or not. Hilmar suggests an explicit location object: SeqFeatureI has-a LocationI, and LocationI is subclassed for Split (join statements) and Fuzzies. Benefits - (a) easy to mix and match implementations of locations to different feature objects, and (b) if mixing and matching locations to features is common, more realistic. Hilmar argues that it is clearer as well. Against - more objects, and in fact the majority of seqfeatures are little more than the location and two extra strings. For backwards compatibility, I think SeqFeatureI->start would *have* to be delegated to SeqFeatureI->location->start - otherwise too much code will break... (of course, this delegation could just be for a while as we move code and people over to using "proper" locations) People might be interested that I originally argued for an explicit location object about 1 month ago. I don't now... I am suggesting that SeqFeatures do not have an explicit location object, but that we subclass SeqFeatures into Split, Simple and Fuzzy, all inheriting from a common SeqFeature interface. Benefits - (a) fewer objects, (b) only one place where the client gets the information and (c) more backwardly compatible. Effectively my main argument is that there will always be a pretty clear-cut relationship that "this type of SeqFeature" is always "this class of location", so splitting the location away from the SeqFeature just suggests a mix-and-match world which doesn't actually exist.
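Ewan's alternative keeps one SeqFeature interface and lets each subclass supply its own default start()/end(). A minimal Perl sketch, assuming hypothetical class names (My::SeqFeature::Simple, ::Split, ::Fuzzy) that are not BioPerl modules. The Split and Fuzzy defaults follow the rules of thumb listed later in this message (outer edges of a join; smallest and largest hard positions of a fuzzy location); parsing of location strings is deliberately left out.

```perl
use strict;
use warnings;
use List::Util ();

# Hypothetical classes for illustration; not BioPerl modules.
package My::SeqFeature;
sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Simple feature: start/end are stored directly.
package My::SeqFeature::Simple;
our @ISA = ('My::SeqFeature');
sub start { return $_[0]->{start} }
sub end   { return $_[0]->{end} }

# Split feature (a join): the default start/end are the outer edges
# of its parts; the parts stay available to smarter clients.
package My::SeqFeature::Split;
our @ISA = ('My::SeqFeature');
sub sub_features { return @{ $_[0]->{parts} } }
sub start { my $self = shift; return List::Util::min(map { $_->start } $self->sub_features) }
sub end   { my $self = shift; return List::Util::max(map { $_->end }   $self->sub_features) }

# Fuzzy feature: keeps every hard position mentioned in the location;
# the default start/end are the smallest and largest of them.
package My::SeqFeature::Fuzzy;
our @ISA = ('My::SeqFeature');
sub positions { return @{ $_[0]->{positions} } }
sub start { my $self = shift; return List::Util::min($self->positions) }
sub end   { my $self = shift; return List::Util::max($self->positions) }

package main;

my $exon1 = My::SeqFeature::Simple->new(start => 10, end => 50);
my $exon2 = My::SeqFeature::Simple->new(start => 80, end => 120);
my @features = (
    $exon1,
    My::SeqFeature::Split->new(parts => [ $exon1, $exon2 ]),
    My::SeqFeature::Fuzzy->new(positions => [ 1, 10, 200 ]),
);

# A drawing or dumping client can treat all of them the same way.
printf "%d..%d\n", $_->start, $_->end for @features;
```

Because every subclass answers start() and end(), the loop needs no isa() checks; a client that wants the real fuzziness can still call positions() or sub_features().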
It is simpler and stronger to go for the combined interface in my view. (b) ->start ->end throwing exceptions or not. Hilmar says that for at least Fuzzies and possibly Splits the client should figure out, by rooting around the object, how to map these more complex locations to a simple start,end. The interface should allow exceptions to be thrown on ->start/->end indicating that the client should be treating this seqfeature somehow differently... Basically we pass the buck to the client. I say that the implementation objects have to provide a default mapping of whatever ->start and ->end are. This means that clients can live in this happy world of "I have well-defined start/ends" if they so wish without writing extra code. Smart clients are encouraged to root around in the objects for their "real" interpretation of the fuzziness. There are three reasons why I favour this: (a) Clients for dumping/drawing/manipulation have to treat large numbers of sequence features as a pretty homogeneous mass. If we make seqfeatures less homogeneous then every client is going to have to figure out how to "homogenize" the seqfeatures - this will differ from client to client, although for the main case they just want a "default way" of handling them. We are encouraging a diversity of views when our clients really want us to solve the problems for them. (b) As 99% of features are nice, well-behaved "hard features", many pieces of client code written with the bioperl libraries will just assume ->start,->end do not throw exceptions. When this piece of code is used by another user with a fuzzy feature, there will be a rather deep exception thrown by bioperl through the client code. I think both the user and the client, with some justification, will blame bioperl for this, no matter how much we say "you should have read the documentation and written 3 different subroutines". Every time you go if( $one->start == $two->start ) it gets replaced by if( &my_exact_function($one,$two) ) { } ...
sub my_exact_function {
    my ($one, $two) = @_;
    # one of many if statements...
    if( $one->isa('Bio::FuzzyFeatureI') && $two->isa('Bio::SimpleFeatureI') ) {
        ...
    }
}
(c) Long experience with seqfeatures has made me claim that the following rules are generally just what people want:
- simple features - easy
- join statements - ignore leading and trailing '<' '>' and take the edge start/end points on the sequence you are looking at
- fuzzy features - either skip them or, if you have to draw/compare them, take start/end as the minimum hard location mentioned and the maximum hard location mentioned, regardless of the internal grammar.
I reckon bioperl would be better off implementing the (c) method by default, without preventing smart clients from making their own decisions. Another long email, but worth it, I think, to know where we disagree... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From gert.thijs@esat.kuleuven.ac.be Fri Jan 19 12:14:53 2001 Date: Fri, 19 Jan 2001 13:14:53 +0100 From: gert thijs gert.thijs@esat.kuleuven.ac.be Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Hilmar Lapp wrote: > > Yeah, that's the really hairy case. We probably should define > first what we would like to be able to do with compound locations. > This is a strong call for feedback: what do people out there using > the package intend to do with compound locations? E.g. if you draw > annotations, would you just draw the part referring to the > attached seq? Ensembl people, any experience/wishlists for this? > I hope you do not mind me giving some comments on this issue. I am writing some programs to automatically extract genes and intergenic regions from DNA sequences. So, I am mostly interested in the type of a feature and also its start and end position in the sequence. The main problem I am facing is that sometimes a feature is not extracted from the sequence because it has a fuzzy location. E.g. if the location of a CDS is described as "join(AL101010.1:1..201,123..245)", this CDS is not added to the list of features and it is impossible for me to do anything useful with this sequence. In my opinion, it is important that a feature is created even if the location is fuzzy. When there is a problem, it should be possible to access the description of the location. Gert -- + Gert Thijs + + email: gert.thijs@esat.kuleuven.ac.be + homepage: http://www.esat.kuleuven.ac.be/~thijs + + K.U.Leuven + ESAT-SISTA + Kasteelpark Arenberg 10 + B-3001 Leuven-Heverlee + Belgium + Tel : +32 16 32 18 84 + Fax : +32 16 32 19 70From arek@ebi.ac.uk Fri Jan 19 09:58:45 2001 Date: Fri, 19 Jan 2001 09:58:45 +0000 (GMT) From: Arek Kasprzyk arek@ebi.ac.uk Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl]
On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote: Hi guys, I have not been following this discussion very closely but thought you may find it useful to poke around a set of ensembl modules which are called ensembl-map. I think that some of the ideas you are talking about have been implemented there. Arek > -------- Original Message -------- > Subject: Re: marker manipulation in bioperl > Date: Thu, 18 Jan 2001 13:06:26 -0500 (EST) > From: Jason Stajich <jason@chg.mc.duke.edu> > To: Heikki Lehvaslaiho <heikki@ebi.ac.uk> > CC: Eric Snyder <SnyderEE@pbrc.edu> > > Heikki - yes I think going via Variation::VariantI is a good way - I am > not as familiar as I'd like to be with the Variation objects, but this > makes sense and I could imagine actually having ways to handle alleles > later on which might become useful. > > I'd still like to have an interface describe a Marker so we can do some > fun inheritance things later with different types of markers. So I'd make > a MarkerI and it would subclass VariantI and add the methods pcr_fwd, > pcr_rev (or a more appropriate function name). > > Eric [ might want to read below first ] does the OO stuff make sense here? > If we make MarkerI with basic methods pcrprimers, chrom, sequence location > then a concrete implementation of this can be GenericMarker, and various > subclasses - RhMarker, STSMarker, MicrosatteliteMarker or GeneticMarker, > RhMarker, ... depending on how you want to describe them, if they have > specific attributes or methods that are particular to that type of marker. > > Then on the Maps front, something like a > LinkageMap could then be built using GeneticMarkers or STSMarkers > as they implemented a function like get_genetic_location... or > get_location('cM'); > > Am I too far out there in interface land for you? > > -jason > On Thu, 18 Jan 2001, Heikki Lehvaslaiho wrote: > > > > Jason, > > > > I finally found my notes on upgrading the Ensembl Variation class.
> > The problem there is that the SNP with an ID can have several > > locations in a genome. At the moment, when several locations are needed, > > I simply return several Variation objects with the same ID. Not very > > pretty, but the interface requires me to return SeqFeature objects, not > > something that contains them. > > > > So, your needs. You said that you need the following methods: > > > > fwd_primer, rev_primer, length, genetic_location, marker_sequence > > > > The following lists where they could go (+) and which are already in Variation > > classes (%): > > > > Bio::Variation::VariantI > > subclassed by DNAMutation, RNAChange, AAChange > > > > + fwd_primer, (moltype not protein) > > + rev_primer, (moltype not protein) > > % length, > > % add_DBLink > > % each_DBLink > > % status > > > > Bio::Variation::SeqDiff (VariantI holder class) > > % chromosome > > + genetic_location, (for strings like 12p13.3 ) > > > > Bio::Variation::Allele > > isa Bio::PrimarySeq > > % marker_sequence > > ->seq > > has additional methods repeat_unit and repeat_count > > to describe the sequence: e.g. (CA)5 > > > > > > Separately, these are the methods that I have in Variation: > > > > Bio::Ensembl::ExternalData::Variation > > ------------------------------------- > > same inheritance as in VariantI > > > > in addition: > > > > start_in_clone_coord > > end_in_clone_coord > > (status) > > alleles (string as opposed to Allele object in VariantI) > > (upStreamSeq) (same as in VariantI) > > (dnStreamSeq) (same as in VariantI) > > > > > > So, it seems to me almost everything can be accommodated within > > VariantI implementing objects. > > > > Do you want to say if marker is defined on DNA or RNA? > > moltype method? > > What additional methods can you think of having?
> > > > > > It might be enough just to have a > > Bio::Variation::Marker class (isa Bio::Variation::VariantI) > > add > > + fwd_primer, (moltype not protein) > > + rev_primer, (moltype not protein) > > into Bio::Variation::VariantI > > > > and have a method for genetic_location and override the status method to > > accept any scalar (it is now restricted to values 'suspected'/'proven'). It > > might be a good idea to have a separate chromosome method a la GenBank/EMBL? > > > > + chromosome > > + genetic_location > > + status > > > > You could use the Allele class and VariantI methods to manipulate the > > sequence data, or you could come up with a simpler implementation or interface. > > > > What do you think? > > > > Yours, > > > > -Heikki > > > > > > > > Jason Stajich wrote: > > > > > > I won't be writing anything substantial until the holidays are over, I have > > > just been thinking about this and had some time to play last week as > > > things were slow for me. I guessed you would have some ideas and insight. > > > Let's see if we start coming up with an interface or extensions to > > > VariationI after Jan 1st. > > > > > > Happy holidays. > > > -jason > > > > > > On Sat, 23 Dec 2000, Heikki Lehvaslaiho wrote: > > > > > > > Hi Jason, > > > > > > > > Sorry I have not answered. I am on holiday and Christmas is in a day > > > > or two. > > > > > > > > > > > > Jason Stajich wrote: > > > > > > > > > > I'm trying to write some code that allows me to manipulate marker > > > > > information (SNPs, Microsatellites, STS). Thought it might be a useful > > > > > bioperl object. Right now I want to associate the following data with a > > > > > marker name - fwd_primer, rev_primer, length, genetic_location, > > > > > marker_sequence. I am also querying GDB, genbank, and local databases for > > > > > this and thought it would make sense to create a reusable object. Does > > > > > any/all of this fit into any of the Variation modules? I feel like if > > > > > > > > It fits fine.
You could also have a look at what I have put into > > > > ensembl-external as a Variation class. That is a rough and dirty class > > > > for holding SNP information. > > > > > > > > I have plans somewhere to extend it .... (I cannot find the text I > > > > wrote...have to look with more time on my hands.... ) > > > > > > > > > there isn't one already this should somehow fall into the Variation > > > > > category. I have already written many throwaway scripts to manipulate > > > > > the information, but it seems to me that this should be an object. I can > > > > > relate the information to physical sequence via blast and the > > > > > marker_sequence or e-PCR and the primers, but often I might want to > > > > > process the markers for something else. > > > > > > > > > > Bio::Variation::GeneticMarker? A SNP would be a sequence change, but also > > > > > a marker ... I imagine this working on multiple levels - sequence, maps, > > > > > etc. > > > > > > > > I think we should see what could be put into an interface file and what > > > > into an instantiable class. > > > > > > > > Bio::Variation::MarkerI > > > > Bio::Variation::Marker > > > > > > > > Alternatively, Bio::Variation::VariationI is already there and can be > > > > extended. > > > > > > > > I have to go... > > > > Are you going to write this right now or can we think about this > > > > over the holidays? > > > > > > > > -Heikki > > > > > > > > > Jason Stajich > > > > > jason@chg.mc.duke.edu > > > > > Center for Human Genetics > > > > > Duke University Medical Center > > > > > http://www.chg.duke.edu/ > > > > > > > > -- > > > > ______ _/ _/_____________________________________________________ > > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > > _/ _/ _/ Cambs.
CB10 1SD, United Kingdom > > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > > ___ _/_/_/_/_/________________________________________________________ > > > > > > > > > > Jason Stajich > > > jason@chg.mc.duke.edu > > > Center for Human Genetics > > > Duke University Medical Center > > > http://www.chg.duke.edu/ > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > ------------------------------------------------------------------------------- Dr Arek Kasprzyk EMBL-European Bioinformatics Institute. Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. Tel: +44-(0)1223-494606 Fax: +44-(0)1223-494468 -------------------------------------------------------------------------------From heikki@ebi.ac.uk Fri Jan 19 14:05:14 2001 Date: Fri, 19 Jan 2001 14:05:14 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl]
Arek Kasprzyk wrote: > > On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote: > > Hi guys, > I have not been following this discussion very closely but > thought you may find useful to poke around a set of ensembl modules which > called ensembl-map. I think that some of the ideas you are talking > about have been implemented there. The URL is: http://www.ensembl.org/cgi-bin/cvsweb/cvsweb.cgi/ensembl-map/modules/Bio/EnsEMBL/Map/ -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________From jason@chg.mc.duke.edu Fri Jan 19 15:00:46 2001 Date: Fri, 19 Jan 2001 10:00:46 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bio::Index::Abstract & bug #860
Looking through this bug - I had 'fixed' it by adding use DB_File; at the top, but now I realize that may not be the best since it still causes an error when -type is specified as 'SDBM_File'. Could just add both in the 'use' but what if DB_File is not present... The code for the method dbm_package assumes that if you specify a package it will have been already 'included'. What to do... Try and require both in the BEGIN block so they are explicitly loaded no matter what? Trap errors if DB_File is not present and user asks for it? From Bio::Index::Abstract:

sub dbm_package {
    my( $self, $value ) = @_;

    if ($value) {
        $self->{'_dbm_package'} = $value;
    }
    elsif (! $self->{'_dbm_package'}) {
        if ($USE_DBM_TYPE) {
            $self->{'_dbm_package'} = $USE_DBM_TYPE;
        } else {
            my( $type );
            # DB_File isn't available on all systems
            eval {
                require DB_File;
                DB_File->import("$DB_HASH");
            };
            if ($@) {
                require SDBM_File;
                $type = 'SDBM_File';
            } else {
                $type = 'DB_File';
            }
            $USE_DBM_TYPE = $self->{'_dbm_package'} = $type;
        }
    }
    return $self->{'_dbm_package'};
}

Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ ---------- Forwarded message ---------- Date: Fri, 19 Jan 2001 09:55:29 +0000 (GMT) From: K Howe <klh@sanger.ac.uk> To: Jason Stajich <jason@chg.mc.duke.edu> Subject: Re: biperl bug #860 Hi Jason, We use the following command:

bpindex.pl -fmt EMBL -dir /nfs/disk92/Pfam/index -type DB_File pfamseq.index <embl flat file>

where /nfs/disk92/Pfam/index is the intended location of the index file, and pfamseq.index is the name of it. The key thing is that we explicitly give the type as DB_File, and when this happens, it dies (when you don't specify type, and it has to make a guess as to which dbm type to use, it works, but this is not scalable for us, since the default dbm file in bioperl may change from DB_File in the future). Hope this is enough information.
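Jason's last option, requiring the named package at run time and trapping the error if it is missing, can be sketched as a small standalone helper. This is a minimal sketch under stated assumptions: `load_dbm_package` is a hypothetical name, not part of Bio::Index::Abstract; it assumes the caller passes whatever was given with -type; and SDBM_File is the fallback because it ships with Perl.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Index::Abstract: load the DBM
# package named by -type at run time, so an explicit type works even
# if no earlier 'use' statement has loaded that module.
sub load_dbm_package {
    my ($requested) = @_;

    if (defined $requested) {
        # Accept only plain package names before turning one into a path.
        die "Invalid DBM package name '$requested'\n"
            unless $requested =~ /^\w+(?:::\w+)*$/;
        (my $file = "$requested.pm") =~ s{::}{/}g;

        # Trap a missing module and report it clearly.
        eval { require $file; 1 }
            or die "Requested DBM package $requested is not available: $@";
        return $requested;
    }

    # No explicit type: prefer DB_File, fall back to SDBM_File, which
    # ships with Perl.
    return 'DB_File' if eval { require DB_File; 1 };
    require SDBM_File;
    return 'SDBM_File';
}

# Usage: the caller ties its hash with whatever package comes back.
my $package = load_dbm_package('SDBM_File');
print "Using $package\n";
```

With this shape, -type SDBM_File works without any earlier use statement, and -type DB_File on a system without DB_File fails with a message naming the missing package instead of dying later inside tie.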
Best, Kevin On Thu, 18 Jan 2001, Jason Stajich wrote: > Kevin - I'm trying to track down a bug you submitted for > Bio::Index::Abstract - I may have fixed it, but I want to be sure. Can > you give me an example of how to invoke bpfetch/bpindex so to throw an > error due to a potentially missing require. > > Thanks. > -JasonFrom birney@ebi.ac.uk Fri Jan 19 16:51:12 2001 Date: Fri, 19 Jan 2001 16:51:12 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::Index::Abstract & bug #860
On Fri, 19 Jan 2001, Jason Stajich wrote: > Looking through this bug - I had 'fixed' it by adding > use DB_File; at the top, but now I realize that may not be the best > since it still causes an error when -type is specified as 'SDBM_File'. > Could just add both in the 'use' but what if DB_File is not present... > > The code for the method dbm_package assumes that if you specify a package > it will have been already 'included'. What to do... Try and require both > in the BEGIN block so they are explictly loaded no matter what? Trap > errors if DB_file is not present and user asks for it? Go for a require run-time load.... check out pSW.pm for an example or the SeqIO.pm for another run-time load. ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 <birney@ebi.ac.uk>. -----------------------------------------------------------------From hlapp@gmx.net Fri Jan 19 19:13:57 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Fri, 19 Jan 2001 11:13:57 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References:
If the "<" symbol precedes a base span, the sequence is partial on the 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the sequence is partial on the 3' end (e.g., CDS 435..915>).

From http://www.ncbi.nlm.nih.gov/collab/FT/index.html
     CDS             <1..>336
                     /codon_start=1
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"
     V_region        <1..>336
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"

From the BNF grammar definition of the feature table, to be found at http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur
local_location ::=The sample record link seems to be pretty new, but I'm not sure. Shall we simply build upon the BNF? Maybe we should ask someone from NCBI. > > Questions: > 1. Do we need to override the famous pocock RangeI contains/overlaps > methods for a Split location to take into account where the pieces > of the contained LocationI are? > Or do we take the easy route and just use min_start/max_end? I think > that right now start/end return 0 for a split location since they are > not explictly set, should they default to delegating to > min_start/max_start? I think so. > > What about in Fuzzy, do we want to throw exceptions or do we just use > the best information we have and do some logic and coordinate > gymnastics to try and return a reasonable value or else throw an > exception? > As I understood the comments from users, exceptions should be avoided here whenever possible. However, since there are different policies one can think of, a mechanism should be provided to switch between them. > 2. Deep Split/Fuzziness - [copying famous artwork from Ewan's latest > email] > > LocationI > ^ > | > ------------------------ > SingleLocationI SplitLocationI > | sub_Locations defined to return SingleLocationI array > | > ----------------- > SimpleLocationI FuzzyLocationI > > > (does the above crappy ascii art make sense to you?) > > I guess this says that all FuzzyLocations can be made as combination of > a single SplitLocation with a set of FuzzyLocations. > > [ end Ewan's included message ] > > This is exactly what I have assumed. I see SplitLocation as simply a > Collection of LocationI objects some of which may be fuzzy. The only > problem is how to define min_start/max_end for a > SplitLocation when the beginning and end of the locations are fuzzy? 
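The delegation Jason proposes for the question just above (a split location's start/end falling back to the outer bounds of its pieces) can be sketched as follows. This is a minimal sketch of one possible policy, not the Bio::Location API: the package names Sketch::SingleLocation and Sketch::SplitLocation are hypothetical, fuzzy ends are reduced to plain numbers, and it does not settle how a fuzzy outer end should be reported. Only the sub_Locations view is flattened; the nested objects themselves are kept.

```perl
use strict;
use warnings;

# Hypothetical classes, not the Bio::Location API: each single piece
# knows its outer bounds, and a split location delegates to its pieces.

package Sketch::SingleLocation;

# A single (possibly fuzzy) piece, reduced to the two numbers the
# min_start/max_end question needs.
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub min_start     { $_[0]{min_start} }
sub max_end       { $_[0]{max_end} }
sub sub_Locations { ( $_[0] ) }    # a single piece is its own only piece

package Sketch::SplitLocation;

sub new {
    my ($class, @pieces) = @_;
    return bless { pieces => [@pieces] }, $class;
}

# Nested splits, as in join(...join(...)), come back as one flat list
# of single pieces, much as Perl flattens nested lists.
sub sub_Locations {
    my ($self) = @_;
    return map { $_->sub_Locations } @{ $self->{pieces} };
}

sub min_start {
    my ($self) = @_;
    my ($min) = sort { $a <=> $b } map { $_->min_start } $self->sub_Locations;
    return $min;
}

sub max_end {
    my ($self) = @_;
    my ($max) = sort { $b <=> $a } map { $_->max_end } $self->sub_Locations;
    return $max;
}

# start/end fall back to the outer bounds instead of returning 0.
sub start { $_[0]->min_start }
sub end   { $_[0]->max_end }

package main;

# Roughly join(1..100,join(200..300,400..500)).
my $split = Sketch::SplitLocation->new(
    Sketch::SingleLocation->new(min_start => 1, max_end => 100),
    Sketch::SplitLocation->new(
        Sketch::SingleLocation->new(min_start => 200, max_end => 300),
        Sketch::SingleLocation->new(min_start => 400, max_end => 500),
    ),
);
my @pieces = $split->sub_Locations;
printf "%d..%d in %d pieces\n", $split->start, $split->end, scalar @pieces;
# prints "1..500 in 3 pieces"
```

Keeping the nested objects and flattening only on request leaves room to write the original join(...join()) form back out.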
> > As for deep SplitLocation (ie SplitLocation containing Location objects > that are SplitLocations), this will work in a very gross way just like > perl flattens arrays, except I don't plan to simplify the join(...join()) > code into a single join() unless you guys think its worth it. It wouldn't > be hard, just let perl collapse the arrays... > Be aware that you don't lose information you need for recovering the original location entry upon writing. If that seems to inflate the object tree unnecessarily, we can also store the original location string as a property. Not beautiful, but KISS is not a bad principle. > Any other problems you guys can think of. > > So close... I wonder if we should include Alan on this so we can see if > the biocorba IDL will really handle all of this now? I guess I could To my understanding BioCorba and BioPerl pretty much affect each other, don't they? If so, we should definitely get a comment from him. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lapp@gnf.org Thu Jan 25 21:18:44 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 13:18:44 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References:| | base_position ::= | | | low_base_bound ::= > high_base_bound ::= < two_base_bound ::= . between_position ::= ^ base_range ::= ..
> If the "<" symbol precedes a base span, the sequence is partial on the > 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, > the > sequence is partial on the 3' end (e.g., CDS 435..915>). >> > >From http://www.ncbi.nlm.nih.gov/collab/FT/index.html > >
> CDS <1..>336 > /codon_start=1 > /gene="IGHV1" > /product="immunoglobulin heavy chain variable region" > V_region <1..>336 > /gene="IGHV1" > /product="immunoglobulin heavy chain variable region" >> > >From the BNF grammar definition of the feature table, to be found at > http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur > >
> local_location ::=>

I just looked for an example at NCBI and found this:
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=234355&dopt=GenBank
As you can see, the symbol '>' does end up BEFORE the position it is modifying, which is consistent with the BNF. Hope this helps...

LOCUS       S52564         10 bp    DNA             PRI       05-APR-1999
DEFINITION  Homo sapiens phenylalanine hydroxylase (PAH) gene, partial cds.
ACCESSION   S52564
VERSION     S52564.1  GI:234355
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /note="missense mutation"
                     /codon_start=2
                     /product="phenylalanine hydroxylase"
                     /protein_id="AAD14912.2"
                     /db_xref="GI:4559419"
                     /translation="HGV"
     variation       5..7
                     /gene="PAH"
                     /note="Gly for Glu221"
BASE COUNT        3 a      2 c      3 g      2 t
ORIGIN
        1 ccatggagta
//

Mark Dalphin email: mdalphin@amgen.com Mail Stop: 29-2-A phone: +1-805-447-4951 (work) One Amgen Center Drive +1-805-375-0680 (home) Thousand Oaks, CA 91320 fax: +1-805-499-9955 (work) From mwilkinson@gene.pbi.nrc.ca Thu Jan 25 21:44:36 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Thu, 25 Jan 2001 15:44:36 -0600 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References:| | > base_position ::= | | | > > > low_base_bound ::= > > > high_base_bound ::= < > > two_base_bound ::= . > > between_position ::= ^ > > base_range ::= .. >
Why not have Bio::SeqAnalysisParserFactoryI $parser = $factory->create_parser(-fh => \*FILE); Bio::SeqAnalyisParserI while( $next_feature = $parser->next_feature ) { } same number of functions defined. Twice the number of interfaces, but these are the interfaces I would argue we want. An implementation could implement ParserFactoryI and ParserI in the same module if so wished. ------For the factory interface I propose to open a new directory Bio::Factory, first to avoid cluttering of other directories, and second because there are many places in BioPerl that can eventually take advantage of a factory design (basically, wherever hard-coded object creation occurs, e.g. in SeqIO::* etc), so that directory hopefully won't stay empty for long. Any objections? If not, I'll give it a go soon. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From birney@ebi.ac.uk Wed Jan 31 09:26:51 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 09:26:51 +0000 (GMT) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To:Message-ID: On Tue, 30 Jan 2001, David Block wrote: > On Tue, 30 Jan 2001, Hilmar Lapp wrote: > > > Mark Wilkinson wrote: > > > > > > Because this module doesn't really "fit" anywhere in the current BioPerl > > > structure, and because the .xml files that it is based on are still > > > quite fluid (and thus the module will likely have to be tweaked quite > > > extensively until things settle down), I don't feel that it is worth > > > adding into the BioPerl repository at this time. However, I would be > > > glad to share it with anyone who might find it useful, with all the > > > usual disclaimers :-) > > > > > > Let me know, > > > > > > > Wouldn't it make sense to add it to bioperl-gui? 
> > > > Hilmar > > > Inasmuch as it is completely separate from SeqCanvas, and we are still > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater > than SeqCanvas, maybe. Mark? I think it would be okay. Sounds like the right place to me.... > > -- > David Block > dblock@gene.pbi.nrc.ca > http://bioinfo.pbi.nrc.ca/dblock/wiki > Plant Biotechnology Institute > National Research Council of Canada > Saskatoon, Saskatchewan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From birney@ebi.ac.uk Wed Jan 31 09:32:43 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 09:32:43 +0000 (GMT) Subject: [Bioperl-l] Bio::Factory In-Reply-To: <3A77D61E.CF9D1413@gmx.net> Message-ID: On Wed, 31 Jan 2001, Hilmar Lapp wrote: > In an attempt to address revisit/finalization of the > SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept > the design change Ewan proposed couple of weeks ago: > > ------ > Why not have > > Bio::SeqAnalysisParserFactoryI > > $parser = $factory->create_parser(-fh => \*FILE); > > Bio::SeqAnalyisParserI > > while( $next_feature = $parser->next_feature ) { > > } > > same number of functions defined. Twice the number of interfaces, > but > these are the interfaces I would argue we want. > > An implementation could implement ParserFactoryI and ParserI in > the > same module if so wished. > ------> > For the factory interface I propose to open a new directory > Bio::Factory, first to avoid cluttering of other directories, and > second because there are many places in BioPerl that can > eventually take advantage of a factory design (basically, wherever > hard-coded object creation occurs, e.g. 
in SeqIO::* etc), so that > directory hopefully won't stay empty for long. > > Any objections? If not, I'll give it a go soon. This sounds really good.... Definitely needed/wanted... > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420. ----------------------------------------------------------------- From jason@chg.mc.duke.edu Wed Jan 31 13:56:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 08:56:21 -0500 (EST) Subject: [Bioperl-l] Bio::Factory In-Reply-To: <3A77D61E.CF9D1413@gmx.net> Message-ID: On Wed, 31 Jan 2001, Hilmar Lapp wrote: > In an attempt to address revisit/finalization of the > SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept > the design change Ewan proposed couple of weeks ago: > > ------ > Why not have > > Bio::SeqAnalysisParserFactoryI > > $parser = $factory->create_parser(-fh => \*FILE); > > Bio::SeqAnalyisParserI > > while( $next_feature = $parser->next_feature ) { > > } > > same number of functions defined. Twice the number of interfaces, > but > these are the interfaces I would argue we want. > > An implementation could implement ParserFactoryI and ParserI in > the > same module if so wished. > ------> > For the factory interface I propose to open a new directory > Bio::Factory, first to avoid cluttering of other directories, and > second because there are many places in BioPerl that can > eventually take advantage of a factory design (basically, wherever > hard-coded object creation occurs, e.g. 
in SeqIO::* etc), so that > directory hopefully won't stay empty for long. > > Any objections? If not, I'll give it a go soon. Great idea and it is a good place to put these things and can help cleanup some of the clutter for sure. > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From jason@chg.mc.duke.edu Wed Jan 31 14:01:23 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 09:01:23 -0500 (EST) Subject: [Bioperl-l] RichSeqI In-Reply-To:Message-ID: On Tue, 30 Jan 2001, Ewan Birney wrote: > > To prove to hilmar that I am doing the RichSeqI stuff, I have committed > the interface. Basically this is a trivial recasting of the "additional > support" currently in Seq.pm which I will move out into > Bio::Seq::RichSeq.pm > > > currently the interface looks like... > > > =head1 NAME > > Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated > sequences > > =head1 SYNOPSIS > > @secondary = $richseq->get_secondary_accessions; > $division = $richseq->division; > $mol = $richseq->molecule; > @dates = $richseq->get_dates; > $seq_version = $richseq->seq_version; > > > =head1 DESCRIPTION > > This interface extends the Bio::SeqI interface to give additional > functionality to sequences with richer data sources, in particular from > database sequences (EMBL, GenBank and Swissprot). > > > Kris, Jason, Hilmar --- comments? We have static set of methods for handling the fields you describe above as well as a set of dynamic methods (via AUTOLOAD) to deal with things like PID (bug #160), genbankid. 
Or does most of that get wrapped into secondary_accessions? I guess the question is: are there any other fields we are missing? > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From heikki@ebi.ac.uk Wed Jan 31 15:33:57 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:33:57 +0000 Subject: [Bioperl-l] more fuzziness checked in References: Message-ID: <3A783065.9A629C67@ebi.ac.uk> Jason Stajich wrote: > > more robust fuzzy and split feature handling checked in. > > FTHelper will try and see if start==end, if it does and there is no > splitlocation delimiter then the code will return just a single number > representing the location ie > > variation 500 > /allele="C" > /allele="T" > I am just back from a one-week holiday. I'll catch up with the list in a day or two. Jason, in case you really are going to use the above format, it is not valid according to the DDBJ/EMBL/GenBank Feature Table Definition. The allele qualifier gives a common name of the allele in free text, e.g.: /allele="adh1-1" In general there is the rule that there should not be identical feature keys on the same location, but 'variation' is an exception. When we are dealing with SNPs, we do not generally know which of the alleles is present in the particular sequence the SNP is mapped to (unless you want to check the sequence). The correct way to represent diallelic variation in the DDBJ/EMBL/GenBank feature table is to repeat the feature key for each allele and use the /replace qualifier.
     variation       500
                     /replace="C"
     variation       500
                     /replace="T"

It is ugly, but that's what they (EMBL database people) told me to do a few weeks ago when I was writing the to_FTHelper method for SNPs in EnsEMBL. -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From paul-christophe.varoutas@curie.fr Wed Jan 31 15:19:06 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Wed, 31 Jan 2001 16:19:06 +0100 Subject: [Bioperl-l] Re: LiveSeq tests warn In-Reply-To: References: <3A7718EC.736A4F85@gmx.net> Message-ID: <5.0.2.1.2.20010131160802.00a62a98@mailhost.curie.fr> I guess you are talking about the small bug I fixed yesterday in /Bio/LiveSeq/SeqI.pm and Bio/LiveSeq/Gene.pm: http://bioperl.org/pipermail/bioperl-guts-l/2001-January/002957.html (I committed after Hilmar's mail and before Joseph's answer). Paul-Christophe At 00:08 31/01/2001 +0000, Joseph Insana wrote: > > Just to let you know, I'm getting warnings on my machine from > > LiveSeq.t and Mutator.t. Could you check whether this might > > indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.) > >Strange, I have nothing like that. >Hmmmm. It seems it's complaining because I used "ne" instead of "!=" >to test for something to be -1 or not -1. >My perl is not complaining. >I am running perl v5.6.0 on linux 2.4.0. > >Please try putting "!=" instead of "ne" and see if it gets fixed. > >Joseph At 11:41 30/01/2001 -0800, Hilmar Lapp wrote: >Just to let you know, I'm getting warnings on my machine from >LiveSeq.t and Mutator.t. Could you check whether this might >indicate an error?
(I'm running Perl 5.005_03 on Linux 2.2.10.) > > Hilmar > >t/LiveSeq...........Argument "LiveSeq" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1202. >Argument "ARRAY" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1207. >Argument "HASH" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1215. >Argument "LiveSeq" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 380. >Argument "ARRAY" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 385. >Argument "HASH" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 393. >ok > >t/Mutator...........Argument "LiveSeq" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1202. >Argument "ARRAY" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1207. >Argument "HASH" isn't numeric in ne at >blib/lib/Bio/LiveSeq/SeqI.pm line 1215. >Argument "LiveSeq" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 380. >Argument "ARRAY" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 385. >Argument "HASH" isn't numeric in ne at >blib/lib/Bio/LiveSeq/Gene.pm line 393. >ok > > >-- >----------------------------------------------------------------- >Hilmar Lapp email: hlapp@gmx.net >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 >----------------------------------------------------------------- >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@bioperl.org >http://bioperl.org/mailman/listinfo/bioperl-l From mwilkinson@gene.pbi.nrc.ca Wed Jan 31 15:25:51 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Wed, 31 Jan 2001 09:25:51 -0600 Subject: [Bioperl-l] GO ontology browser module available References: Message-ID: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca> Ewan Birney wrote: > > > Wouldn't it make sense to add it to bioperl-gui? 
> > > > > > Hilmar > > > > > Inasmuch as it is completely separate from SeqCanvas, and we are still > > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater > > than SeqCanvas, maybe. Mark? I think it would be okay. > > Sounds like the right place to me.... indeed - that was where I intended to put it when it was a little more "polished"... I am just hesitant to use the BioPerl CVS repository to store my half-baked code. There are several things which "don't work right" (tm). I think a lot of this has to do with the fact that I cannot get my hands on the GO.dtd - it isn't available on the GO website, though all of the other XML files are (yet they reference the DTD in these same XML files). Neither do I receive a response to inquiries sent to the consortium e-mail address. The consequence is that XML::Parser doesn't know what to do with the HTML-like formatting tags that they are using in some of their "free text", and in some cases tries to treat them as sub-level tags (for example, what should be a subscript or superscript will become a sub-element of the preceding word, so Carbon 14 parses as $GO->{Carbon}->{14}... which is ridiculous of course....). In addition they use HTML designations for the Greek alpha, beta, gamma, and so on, preceded by an ampersand and ending with a semicolon. These cannot be parsed by XML::Parser *at all* unless it is specifically told that these are going to be #CDATA elements... which requires a DTD.... which I don't have. So, GO_Browser (for the time being) hacks away at the XML in its first parsing pass, replacing these tags with things that will not break XML::Parser, and then reads from this hacked data. As a result, what you get is not "strict" GO ontology, but a slightly modified version of the same.... which effectively defeats the purpose of GO, which is that everyone should use a consensus nomenclature.
:-( In any case, after all that griping, I am perfectly willing to cvs add this module to bioperl-gui, so long as I am not judged too harshly by it - I know it's a hack!! :-) I'll get on to that later this afternoon. b.t.w. If anyone can assist me in getting hold of a GO.dtd please speak up! It would make my miserable life a bit brighter!! -- --- Dr. Mark Wilkinson Bioinformatics Group National Research Council of Canada Plant Biotechnology Institute 110 Gymnasium Place Saskatoon, SK Canada From heikki@ebi.ac.uk Wed Jan 31 15:43:53 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:43:53 +0000 Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal References: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr> Message-ID: <3A7832B9.ED647AAC@ebi.ac.uk> Paul-Christophe, Please have a look at Bio::Variation::VariantI::restriction_changes, too. I would have preferred to use Bio::Tools::RestrictionEnzyme but decided not to depend on it as I found it too complicated. It would be great not to have to duplicate restriction enzyme lists and functionality. If you come up with a solution I'd be happy to remove or modify the restriction_changes method. -Heikki Paul-Christophe Varoutas wrote: > > Yesterday I studied RestrictionEnzyme.pm more in depth. I haven't yet added > the methods I wanted to, because in my opinion it is far more urgent for > this module to get some redesigning. > > The module somewhat suffers from poor design, and just adding methods to it > will just worsen the situation. > > RestrictionEnzyme has methods which are proper to the restriction enzymes: > - seq() is the accessor method to the enzyme's recognition sequence. > - cut_seq() "cuts" a Bio::Seq-derived object and generates an array of > restriction site fragments. > - cuts_seq_at() does the same but this time generates an array of > restriction site coordinates. > > and methods which are proper to the list of enzymes: > - is_available() says if a particular enzyme is in the list.
> - available_list() gives the list of all enzymes or list of n-base cutters. > > Steve Chervitz already suggested in the module's documentation that > is_available() "may be more appropriate for a REData.pm class", and I share > his opinion. From a conceptual point of view, the existing > RestrictionEnzyme.pm module corresponds to two object classes, not one. > > Here is an outline of my proposal: > > Separate RestrictionEnzyme in two classes: > > RestrictionEnzymeDBase (or whatever more appropriate): > - members: the list of restriction enzymes. > - methods: > - constructor using hardwired list of enzymes OR user file OR URL. > - add/remove enzyme to/from list (adding will be the equivalent of > _make_custom() ). > - member accessor methods: already existing methods: is_available(), > available_list(). > > RestrictionEnzyme: > - members: the same as now (_name, _seq, _site, _cuts_after). > - methods: > - constructor (equivalent to the constructor calling the > _make_standard() sub). > - already existing accessor methods. > - already existing methods: cut_seq, cuts_seq_at, etc. > > This design, apart from being more "correct", will facilitate any future > extensions of the two modules. The drawback in separating RestrictionEnzyme > in two classes is that all code using RestrictionEnzyme.pm will have to be > modified. > > Perhaps we should take advantage of the imminent release of the 0.7 version > and decide to proceed in the redesigning. If we change the design this will > also be the opportunity to slightly change/extend its public interface to > add small new functionalities such as being able to add and use asymmetric > cutters and enzymes which cut outside the recognition site (perhaps just > incorporating small changes now in order to be in time for the 0.7 release > and leaving extensions for afterwards, especially if I do this alone based > on what we decide). 
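The proposed split can be sketched as two small Perl classes: one enzyme object that knows its site and where it cuts, and one list object that answers is_available() and available_list(). This is a minimal sketch under stated assumptions, not the Bioperl API: the package names Sketch::RestrictionEnzyme and Sketch::RestrictionEnzymeDB, the -cuts_after convention, and get_enzyme() are hypothetical, and cuts_seq_at() here scans a plain forward-strand string rather than a Bio::Seq-derived object.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed split; names are illustrative only.

package Sketch::RestrictionEnzyme;

# One enzyme: its name, recognition site, and how many bases of the
# site lie before the cut (EcoRI, G^AATTC, cuts after 1).
sub new {
    my ($class, %args) = @_;
    return bless {
        _name       => $args{-name},
        _site       => uc $args{-site},
        _cuts_after => $args{-cuts_after},
    }, $class;
}

sub name       { $_[0]{_name} }
sub site       { $_[0]{_site} }
sub cuts_after { $_[0]{_cuts_after} }

# 1-based position of the last base before each cut, forward strand only.
sub cuts_seq_at {
    my ($self, $seq) = @_;
    my ($site, $dna, @cuts) = ($self->site, uc $seq);
    my $pos = index($dna, $site);
    while ($pos >= 0) {
        push @cuts, $pos + $self->cuts_after;
        $pos = index($dna, $site, $pos + 1);
    }
    return @cuts;
}

package Sketch::RestrictionEnzymeDB;

# The enzyme list lives here rather than inside each enzyme object.
sub new {
    my ($class, @enzymes) = @_;
    my $self = bless { _enzymes => {} }, $class;
    $self->add_enzyme($_) for @enzymes;
    return $self;
}

sub add_enzyme {
    my ($self, $enzyme) = @_;
    $self->{_enzymes}{ $enzyme->name } = $enzyme;
    return $enzyme;
}

sub remove_enzyme {
    my ($self, $name) = @_;
    return delete $self->{_enzymes}{$name};
}

sub get_enzyme {
    my ($self, $name) = @_;
    return $self->{_enzymes}{$name};
}

sub is_available {
    my ($self, $name) = @_;
    return exists $self->{_enzymes}{$name} ? 1 : 0;
}

# All enzyme names, or only those with an $n-base recognition site.
sub available_list {
    my ($self, $n) = @_;
    my @names = sort keys %{ $self->{_enzymes} };
    return @names unless defined $n;
    return grep { length($self->{_enzymes}{$_}->site) == $n } @names;
}

package main;

my $db = Sketch::RestrictionEnzymeDB->new(
    Sketch::RestrictionEnzyme->new(-name => 'EcoRI',  -site => 'GAATTC', -cuts_after => 1),
    Sketch::RestrictionEnzyme->new(-name => 'HaeIII', -site => 'GGCC',   -cuts_after => 2),
);
my @six_cutters = $db->available_list(6);                        # ('EcoRI')
my @cuts = $db->get_enzyme('EcoRI')->cuts_seq_at('AAGAATTCTT');  # (3)
```

Because the list object only stores enzyme objects, a user file or URL loader would only need to build enzymes and call add_enzyme(), leaving the per-enzyme cutting code untouched.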
> > Tell me what you think about it: > - First of all, is redesigning possible or are we obliged to maintain > compatibility? In the latter case I will just add functionality, > maintaining the poor design of the module. > - If redesigning is possible, please make comments/suggestions. > > Paul-Christophe > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki@ebi.ac.uk Wed Jan 31 15:51:18 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:51:18 +0000 Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm References:Message-ID: <3A783476.5DE01D5D@ebi.ac.uk> I read "A Really Good Book" about CVS recently and found out that you can put a .cvsrc file in your home directory with, for example, the following lines:

update -d
cvs -q -z9

After that, 'cvs update' is automatically expanded to 'cvs -q -z9 update -d'! -Heikki Ewan Birney wrote: > > On Mon, 29 Jan 2001, Mark Wilkinson wrote: > > > Dear Group, > > > > I just cvs-updated and noticed that SeqFeature::Generic does not appear > > to be functional anymore. It is calling on Bio/Location/Simple.pm > > (line 122), which apparently does not exist. Is it just my installation > > which is wonky, or is this a genuine bug? > > cvs update -d > > > > > any advice appreciated. > > > > cheers all! > > > > M > > > > > > -- > > --- > > Dr.
Mark Wilkinson > > Bioinformatics Group > > National Research Council of Canada > > Plant Biotechnology Institute > > 110 Gymnasium Place > > Saskatoon, SK > > Canada > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > -----------------------------------------------------------------

From heikki@ebi.ac.uk Wed Jan 31 16:52:20 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 16:52:20 +0000 Subject: [Bioperl-l] Incompatibility with Perl v5.6.0 [Fwd: XML::Parse test fails] Message-ID: <3A7842C4.6F05ED9D@ebi.ac.uk>

It might be worth adding this into release notes of the upcoming 0.7 release. As a result Bio::Variation XML input and output does not work under Perl v5.6.0. We have to pray that 5.6.1 will be out soon. -Heikki David Megginson wrote: > > Heikki Lehvaslaiho writes: > > > I recently upgraded to Perl v5.6.0. As result the XML::Parse test > > script fails and CPAN does not install it: > > There is a known bug in Perl 5.6 when passing array references.
> > All the best, > > David > > -- > David Megginson david@megginson.com > http://www.megginson.com/

From heikki@ebi.ac.uk Wed Jan 31 17:11:26 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 17:11:26 +0000 Subject: [Bioperl-l] Re: Bio::Root::Object cleanup References: <3A771792.DB06ACA6@gmx.net> Message-ID: <3A78473E.554A78C1@ebi.ac.uk>

Hilmar Lapp wrote: ...
> In addition, the Variation code contains the line
>
>     Bio/Variation/IO.pm:    return Bio::Root::Object::new($class, %param);
>
> Heikki, I don't know about the context, just wanted to make sure this is indispensable.

It is not. I copied it over from Bio::SeqIO at some point. Removed. -Heikki > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > -----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 31 17:58:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 09:58:44 -0800 Subject: [Bioperl-l] RichSeqI References: Message-ID: <3A785254.7E3A11BD@gmx.net>

Ewan Birney wrote:
>
> =head1 SYNOPSIS
>
>     @secondary = $richseq->get_secondary_accessions;
>     $division = $richseq->division;
>     $mol = $richseq->molecule;
>     @dates = $richseq->get_dates;
>     $seq_version = $richseq->seq_version;
>

What about species()? Just popped into my head. Right now a class implementing both SeqI and RichSeqI doesn't have to have that, even though it's present in probably most 'rich' databanks. What do you think about moving it, too? (It's now in Seq.pm.) Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------

From cjm@fruitfly.bdgp.berkeley.edu Wed Jan 31 18:02:05 2001 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Wed, 31 Jan 2001 10:02:05 -0800 (PST) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca> Message-ID:

Hi Mark

Sorry you haven't heard back from us GO people, all the GO developers are working full time on another project at the moment, just keep at us and we'll respond eventually. We should fix the problem of the SGML embedded within XML - Brad, can you see to this? In the meantime, have you tried using either the flat files or the mysql database? there are perl modules for using either of these in the GO repository. As to where you deposit your code, I'd love to keep all the GO code together in one cvs repository. Unfortunately, the stanford cvs server is highly restricted.
I was considering moving the perl software portion of GO away from the stanford cvs server into the Berkeley one, for this reason. Another option would be to use bioperl cvs for all of GO-perl, if people are willing. If anyone's interested, the GO module docs are here: http://www.fruitfly.org/annot/go/database/modules/GO::AppHandle.html

On Wed, 31 Jan 2001, Mark Wilkinson wrote: > Ewan Birney wrote: > > > > > Wouldn't it make sense to add it to bioperl-gui? > > > > > > > > Hilmar > > > > > > > Inasmuch as it is completely separate from SeqCanvas, and we are still > > > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater > > > than SeqCanvas, maybe. Mark? I think it would be okay. > > > > Sounds like the right place to me.... > > indeed - that was where I intended to put it when it was a little more > "polished"... I am just hesitant to use the BioPerl CVS repository to store my > half-baked code. > > There are several things which "don't work right" (tm). I think a lot of this > has to do with the fact that I can not get my hands on the GO.dtd - it isn't > available on the GO website, though all of the other XML files are (yet they > reference the DTD in these same XML files). Neither do I receive a response to > inquiries sent to the consortium e-mail address. > > The consequence is that XML::Parser doesn't know what to do with the HTML-like > formatting tags that they are using in some of their "free text", and in some > cases tries to treat them as sub-level tags (for example, what should be a > subscript or superscript will become a sub-element of the preceding word, so > Carbon 14 parses as $GO->{Carbon}->{14}... which is ridiculous of > course....). In addition they use HTML designations for the greek alpha, beta, > gamma, and so on, preceded by an ampersand and ending with a semicolon. These > can not be parsed by XML::Parser *at all* unless it is specifically told that > these are going to be #CDATA elements...
which requires a DTD.... which I don't > have. > > So, GO_Browser (for the time being) hacks away at the XML in its first parsing > pass, replacing these tags with things that will not break XML::Parser, and then > reads from this hacked data. As a result, what you get is not "strict" GO > ontology, but a slightly modified version of the same.... which effectively > defeats the purpose of GO which is that everyone should use a consensus > nomenclature. :-( > > In any case, after all that griping, I am perfectly willing to cvs add this > module to bioperl-gui, so long as I am not judged too harshly by it - I know it's > a hack!! :-) > > I'll get on to that later this afternoon. > > b.t.w. If anyone can assist me in getting ahold of a GO.dtd please speak up! It > would make my miserable life a bit brighter!! > > > -- > --- > Dr. Mark Wilkinson > Bioinformatics Group > National Research Council of Canada > Plant Biotechnology Institute > 110 Gymnasium Place > Saskatoon, SK > Canada >

From birney@ebi.ac.uk Wed Jan 31 18:10:21 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 18:10:21 +0000 (GMT) Subject: [Bioperl-l] RichSeqI In-Reply-To: <3A785254.7E3A11BD@gmx.net> Message-ID:

On Wed, 31 Jan 2001, Hilmar Lapp wrote:
> Ewan Birney wrote:
> >
> > =head1 SYNOPSIS
> >
> >     @secondary = $richseq->get_secondary_accessions;
> >     $division = $richseq->division;
> >     $mol = $richseq->molecule;
> >     @dates = $richseq->get_dates;
> >     $seq_version = $richseq->seq_version;
> >
>
> What about species()? Just popped into my head. Right now a class implementing both SeqI and RichSeqI doesn't have to have that, even though it's present in probably most 'rich' databanks. What do you think about moving it, too? (It's now in Seq.pm.)

Hmmmm. I would guess it would go to SeqI. It should be somewhere. I'm agnostic.
If we move it out to RichSeq, genbank/embl IO have to be able to generate dummy Species lines... > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 31 19:10:30 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 11:10:30 -0800 Subject: [Bioperl-l] Bio::Factory::SeqAnalysisParserFactoryI Message-ID: <3A786326.D0DCBFFE@gmx.net>

Interface committed. Check out the documentation. If you approve it, I'll add the implementation. The obvious question with regard to SeqFeatureProducer is what will happen to the add_features() method. In principle the implementation is simple enough to just dismiss it; as we already felt a couple of times it doesn't really add that much value. So, let me know what you think. Hilmar

-------- Original Message -------- Subject: Bio::Factory Date: Wed, 31 Jan 2001 01:08:46 -0800 From: Hilmar Lapp Organization: Nereis 4 To: Bioperl

In an attempt to address revisit/finalization of the SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept the design change Ewan proposed couple of weeks ago:

------
Why not have

    Bio::SeqAnalysisParserFactoryI

        $parser = $factory->create_parser(-fh => \*FILE);

    Bio::SeqAnalysisParserI

        while( $next_feature = $parser->next_feature ) {
        }

same number of functions defined. Twice the number of interfaces, but these are the interfaces I would argue we want. An implementation could implement ParserFactoryI and ParserI in the same module if so wished.
------

For the factory interface I propose to open a new directory Bio::Factory, first to avoid cluttering of other directories, and second because there are many places in BioPerl that can eventually take advantage of a factory design (basically, wherever hard-coded object creation occurs, e.g. in SeqIO::* etc), so that directory hopefully won't stay empty for long. Any objections? If not, I'll give it a go soon. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 31 19:15:37 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 19:15:37 +0000 (GMT) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: <3A786326.D0DCBFFE@gmx.net> Message-ID:

On Wed, 31 Jan 2001, Hilmar Lapp wrote: > Interface committed. Check out the documentation. If you approve > it, I'll add the implementation. > > The obvious question with regard to SeqFeatureProducer is what > will happen to the add_features() method. In principle the > implementation is simple enough to just dismiss it; as we already > felt a couple of times it doesn't really add that much value. So, > let me know what you think. > I don't like the add_features method much myself... Jason? ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 31 20:17:10 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 15:17:10 -0500 (EST) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: Message-ID:

kill it, that's fine. We should instead be providing better example scripts rather than wrapping something that simple into an object since all the work is done by the Seq object.
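Jason's point that the add_features() step is simple enough to live in an example script can be illustrated with a small sketch of the factory/parser split Ewan proposed, written in plain Perl so it runs without BioPerl. The package names ToyParserFactory and ToyFeatureParser and the tab-separated input format are hypothetical; only create_parser(-fh => ...) and next_feature() follow the quoted proposal, and this is not the committed Bio::Factory::SeqAnalysisParserFactoryI interface.

```perl
use strict;
use warnings;

# Hypothetical sketch of the factory/parser split quoted above. ToyParserFactory
# and ToyFeatureParser are illustrative names, not BioPerl classes; only
# create_parser(-fh => ...) and next_feature() mirror the proposal.

package ToyFeatureParser;

# Reads "name<TAB>start<TAB>end" lines and returns one feature per call.
sub new {
    my ($class, %args) = @_;
    return bless { _fh => $args{-fh} }, $class;
}

sub next_feature {
    my ($self) = @_;
    my $fh = $self->{_fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($name, $start, $end) = split /\t/, $line;
        return { name => $name, start => $start, end => $end };
    }
    return;                          # end of input
}

package ToyParserFactory;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Callers ask the factory for a parser instead of naming a parser class.
sub create_parser {
    my ($self, %args) = @_;
    return ToyFeatureParser->new(%args);
}

package main;

# An in-memory filehandle stands in for \*FILE.
my $input = "exon1\t10\t50\n\ngene\t5\t120\n";
open(my $fh, '<', \$input) or die "cannot open input: $!";

my $parser = ToyParserFactory->new->create_parser(-fh => $fh);

# The whole "add_features" step is this loop; a real script would instead
# attach real feature objects to a sequence (e.g. via Bio::Seq's add_SeqFeature).
my @features;
while (my $feature = $parser->next_feature) {
    push @features, $feature;
}
close $fh;
```

Because callers only see create_parser(), the factory is free to choose a different parser class for each input format, which is the kind of hard-coded object creation Hilmar suggests moving into Bio::Factory.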
On Wed, 31 Jan 2001, Ewan Birney wrote: > On Wed, 31 Jan 2001, Hilmar Lapp wrote: > > > Interface committed. Check out the documentation. If you approve > > it, I'll add the implementation. > > > > The obvious question with regard to SeqFeatureProducer is what > > will happen to the add_features() method. In principle the > > implementation is simple enough to just dismiss it; as we already > > felt a couple of times it doesn't really add that much value. So, > > let me know what you think. > > > > I don't like the add_features method much myself... Jason? > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From krbou@pgsgent.be Wed Jan 31 21:43:09 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 22:43:09 +0100 Subject: [Bioperl-l] Cruft in module documentation ? Message-ID: <20010131224309.B24431@gryzo.pgsgent.be>

In testing the documentation (SYNOPSIS) part I already fixed some errors (more to come during the coming days), but I don't know what to do with this one (I guess it can be removed). The SYNOPSIS for Bio::Annotation contains

    [ ...]
    #
    # Making an annotation object from scratch
    #

    $ann = Bio::Pfam::Annotation->new();

    $ann->description("Description text");
    print "Annotation description is ", $ann->description, "\n";

I can't find any reference to Bio::Pfam::Annotation, is this a remainder of history ? Kris,

From birney@ebi.ac.uk Wed Jan 31 22:03:29 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 22:03:29 +0000 (GMT) Subject: [Bioperl-l] Cruft in module documentation ?
In-Reply-To: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID:

On Wed, 31 Jan 2001, Kris Boulez wrote:
> In testing the documentation (SYNOPSIS) part I already fixed some errors (more to come during the coming days), but I don't know what to do with this one (I guess it can be removed).
> The SYNOPSIS for Bio::Annotation contains
>
>     [ ...]
>     #
>     # Making an annotation object from scratch
>     #
>
>     $ann = Bio::Pfam::Annotation->new();
>
>     $ann->description("Description text");
>     print "Annotation description is ", $ann->description, "\n";
>
>
> I can't find any reference to Bio::Pfam::Annotation, is this a remainder of history ?

This is historical cruft. s/Pfam:://g;

>
> Kris,

----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From krbou@pgsgent.be Wed Jan 31 22:32:21 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 23:32:21 +0100 Subject: [Bioperl-l] Cruft in module documentation ? In-Reply-To: ; from birney@ebi.ac.uk on Wed, Jan 31, 2001 at 10:03:29PM +0000 References: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID: <20010131233221.A24783@gryzo.pgsgent.be>

Quoting Ewan Birney (birney@ebi.ac.uk): > On Wed, 31 Jan 2001, Kris Boulez wrote: > > > > > > > I can't find any reference to Bio::Pfam::Annotation, is this a remainder > > of history ? > > This is historical cruft. s/Pfam:://g; > Done. Kris,

From Cox, Greg"

I know that there are some people on the BioPerl list who ran into the same trouble and managed to have some success. Please reply directly to Greg, as it wasn't me who had the question. Hilmar -------- Original Message -------- Subject: [Biojava-l] WinCVS and SSH Date: Wed, 31 Jan 2001 14:08:06 -0500 From: "Cox, Greg"