From M.W.E.J.Fiers@plant.wag-ur.nl Tue Jan 2 12:52:57 2001 Date: Tue, 02 Jan 2001 13:52:57 +0100 From: Fiers, M.W.E.J. M.W.E.J.Fiers@plant.wag-ur.nl Subject: [Bioperl-l] Computation object
Hi

Concerning the computation.pm object: I seem to have made a rather stupid
mistake and failed to do an actual commit last time, so I've given it
another try. If somebody feels like it, please take a look.
I didn't implement the structure Ewan proposed. If people like my
implementation of this object, I will do it.

Mark Fiers
Plant Research International


From jason@chg.mc.duke.edu Tue Jan 2 15:58:18 2001 Date: Tue, 2 Jan 2001 10:58:18 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] call for more tests
As part of the continuing effort to check every module in our distribution
before 0.7 is released: does anyone use Bio::SeqIO::scf? I need some test
files for it.
Thanks.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From jason@chg.mc.duke.edu Tue Jan 2 17:19:38 2001 Date: Tue, 2 Jan 2001 12:19:38 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] test framework
While I'm messing with it, does anyone have objections to using the
built-in Perl Test module (available since Perl 5.004) rather than our own
test() sub?

sub test ($$;$) {
    my ($num, $true, $msg) = @_;
    $msg = '' unless defined $msg;    # avoid undef warnings when no message is given
    print($true ? "ok $num\n" : "not ok $num $msg\n");
}

I agree it is wasted time to constantly move things from one test suite to
another (I already tried to standardize our existing ones as best as
possible). But a nice standard makes it easier for new people to write
tests and make them fit. Any comments?

[ from perldoc Test ]

       use strict;
       use Test;

       # use a BEGIN block so we print our plan before MyModule is loaded
       BEGIN { plan tests => 14, todo => [3,4] }

       # load your module...
       use MyModule;

       ok(0); # failure
       ok(1); # success

       ok(0); # ok, expected failure (see todo list, above)
       ok(1); # surprise success!

       ok(0,1);             # failure: '0' ne '1'
       ok('broke','fixed'); # failure: 'broke' ne 'fixed'
       ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
       ok('fixed',qr/x/);   # success: 'fixed' =~ qr/x/

       ok(sub { 1+1 }, 2);  # success: '2' eq '2'
       ok(sub { 1+1 }, 3);  # failure: '2' ne '3'
       ok(0, int(rand(2))); # (just kidding :-)

       my @list = (0,0);
       ok @list, 3, "\@list=".join(',',@list);      #extra diagnostics
       ok 'segmentation fault', '/(?i)success/';    #regex match

       skip($feature_is_missing, ...);    #do platform specific test



Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From krbou@pgsgent.be Tue Jan 2 21:21:36 2001 Date: Tue, 2 Jan 2001 22:21:36 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
[ I know there are some specialists on SWISS-PROT on this list, so I
might make a fool of myself, but here goes ]

When chasing down the reason why swiss.pm was not able to read a
SWISS-PROT formatted file it wrote itself I found the following things
which look suspicious in write_seq()

- at line 356 there is 
   $mol = $seq->molecule;
I think this should be $seq->moltype; as ->molecule only looks for
{'molecule'} which is not set by ->new. Bio::Seq->new only sets
{'moltype'}.
We should change the 'protein' of ->moltype to 'PRT' to conform to the
standard.

BTW, do we want to allow SWISS-PROT to try to write out DNA/RNA
sequences?


- around line 369 the whole else block should be changed. We should make
sure we have a division ($div) in the ID part. The previous version of
the code, which is now commented out, did a better job of this. Looking at
next_seq() we see why we're not able to read this (the entry name must
contain an underscore; see section 3.1.1 of the SWISS-PROT manual).

    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
        || $self->throw("swissprot stream with no ID. Not swissprot in my book");
    $name = $1."_".$2;
    $seq->primary_id($1);
    $seq->division($2);

How standards-compliant do we want to be with this? If we want to be very
strict we should e.g. make sure the 'entry name' (first item on the ID
line) is not more than 10 characters.
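A minimal sketch of the write_seq() check Kris describes, written as a standalone Perl sub. The name swiss_id_line() and its argument list are hypothetical (nothing like it exists in swiss.pm); it assumes the entry name arrives through display_id() in ENTRY_DIVISION form, applies the same underscore rule as the next_seq() regex plus the 10-character limit mentioned above, and reuses the ID-line sprintf format from write_seq().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of swiss.pm: build the ID-line text for
# write_seq() and refuse entry names that next_seq() could not read back.
sub swiss_id_line {
    my ($display_id, $mol, $len) = @_;
    # Same rule as next_seq(): exactly one '_' separating entry and
    # division, and no whitespace in either part.
    die "entry name '$display_id' has no _DIVISION part\n"
        unless $display_id =~ /^[^\s_]+_[^\s_]+$/;
    # Stricter check suggested above: entry name at most 10 characters.
    die "entry name '$display_id' is longer than 10 characters\n"
        if length($display_id) > 10;
    # Same layout string that write_seq() passes to sprintf.
    return sprintf("%10s     STANDARD;      %3s;   %d AA.",
                   $display_id, $mol, $len);
}

print swiss_id_line('ACTB_HUMAN', 'PRT', 375), "\n";
```

With these illustrative values the call returns "ACTB_HUMAN     STANDARD;      PRT;   375 AA."; a name such as 'ACTB' with no division part dies instead of silently producing an ID line that next_seq() would reject.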

P.S. (very) minor issue: the division we choose, 'UNK', for sequences
which don't have a division set is not in the standard (speclist.txt),
which only contains UNKP.

Should I try to adapt swiss.pm to the thoughts I (tried to) put out, or
are there major objections?


Kris,

From lapp@gnf.org Tue Jan 2 23:45:28 2001 Date: Tue, 02 Jan 2001 15:45:28 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] SWISS-PROT writing
Kris Boulez wrote:
> 
> 
> - at line 356 there is
>    $mol = $seq->molecule;
> I think this should be $seq->moltype; as ->molecule only looks for
> {'molecule'} which is not set by ->new. Bio::Seq->new only sets
> {'moltype'}.
> We should change the 'protein' of ->moltype to 'PRT' to conform to the
> standard.

moltype() is internal to BioPerl. Whenever there is an attribute synonymous
to moltype() but defined by a databank, molecule() should be used for that.
So the code is correct I think.

Bio::Seq->new() indeed only sets moltype(), because at this point there is
no databank specificity. molecule() should be set by the parser. If you
want to instantiate a swissprot seq from memory and have it written in
swissprot format, the way we want to go is have dedicated classes under
Bio::Seq::*. If there is need for a swissprot-dedicated class, that one
probably would also set molecule() at instantiation time.

> 
> B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA
> sequences ?

In my opinion there's no need for that, but others may think differently.

> 
> - around line 369 the whole else block should be changed. We should make
>   sure we have a division ($div) in the ID part. The previous version of
> the code which is now commented out did a better try at this. Looking at
> next_seq() we see why we're not able to read this (entry name must contain
> an underscore section 3.1.1 of the SWISS-PROT manual).
> 
>     $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
>      || $self->throw("swissprot stream with no ID. Not swissprot in my
> book");
>    $name = $1."_".$2;
>    $seq->primary_id($1);
>    $seq->division($2);
> 

If this is the code you're referring to (sorry, I don't have it at hand
right now), it does ensure that there is a division part. I'm probably
missing something.

> How standard compliant do we want to be with this. If we want to be very
> strict we should e.g. make sure the 'entry name' (first item on the ID
> line) is not more than 10 characters.
> 
> P.S. (very) minor issue: the division we choose 'UNK' for sequences
> which don't have a division set is not in the standard (speclist.txt),
> it only contains UNKP
> 

Sure, can (should) be changed.

> Should I try to adapt swiss.pm to the thoughts I (tried to) put out or
> are there major objections ?
>

See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
of that class.

Apart from this, Lorenz may wish to comment. He's been our Swissprot
cruncher for a while, but I haven't heard from him for some time. Lorenz,
still out there?

Happy new year to all.

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From schattner@alum.mit.edu Wed Jan 3 02:26:20 2001 Date: Tue, 02 Jan 2001 18:26:20 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] call for more tests
Jason Stajich wrote:
> 
> In the continued effort to check every module in our distribution before
> 0.7 is released.  I wondered if anyone does use Bio::SeqIO::scf?  I need
> some test files for it.
> Thanks.

I can't help you with Bio::SeqIO::scf, but I can add a couple of other
missing tests to your list: 
Bio::Tools::SeqPattern does not have a "t" file. (By the way,
seq_pattern.pl in the examples directory crashes - I just submitted a
bug report).
Bio::Tools::SeqStats currently only has one very simple test (located in
Tools.t)  Previously there were several more tests that seem to have
disappeared.  I can upload the additional tests again if you like.

Peter

From schattner@alum.mit.edu Wed Jan 3 02:31:14 2001 Date: Tue, 02 Jan 2001 18:31:14 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] A couple of CVS questions.
A couple of CVS questions.  

1. How can one access earlier releases of bioperl?  I haven't been able
to find them on CVS or elsewhere.  Where should I be looking?

2. Some modules were moved to different directories within the CVS
structure recently (e.g. Bio::Tools::Alignment::Clustalw.pm was moved to
Bio::Tools::Run::Alignment::Clustalw.pm). Since then, I don't seem to
be able to find the versions of the modules made prior to the date that
the modules were moved.  Can someone tell me if these older versions are
accessible and if so how to find them.

Thanks

Peter Schattner

From lapp@gnf.org Wed Jan 3 04:16:02 2001 Date: Tue, 02 Jan 2001 20:16:02 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] A couple of CVS questions.
Peter Schattner wrote:
> 
> A couple of CVS questions.
> 
> 1. How can one access earlier releases of bioperl?  I haven't been able
> to find them on CVS or elsewhere.  Where should I be looking?
> 

You can check out based on one of version, tag, or date. You very likely
don't want to check out a release by version, as each file has a different
version. There is a tag for the 0.6.x release branch, and also for other
releases. If you want to check out the whole development trunk in an earlier
version, the most sensible way is probably to go by date (option -D). For
individual modules you can go either way.

Do you have the manpages of cvs? They're actually poor compared to the
info-files cvs comes with. On a Unix box with info installed you should be
able to type 'info cvs'.

> 2. Some modules were moved to different directories within the CVS
> structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> Bio::Tools::Run::Alignment::Clustalw.pm ).  Since then, I don't seem to
> be able to find the versions of the modules made prior to the date that
> the modules were moved.  Can someone tell me if these older versions are
> accessible and if so how to find them.

The files were moved without retaining the revision history (cvs is bad at
file moving and renaming; you have to mess with the repository in order to
have cvs history preserved in this case). The version at the former
location was deleted, so you can restore it at the former place only. The
file at the new location has lost all its revision information before the
move.

Hope this helps.

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From dagdigian@ComputeFarm.com Wed Jan 3 06:12:04 2001 Date: Wed, 03 Jan 2001 01:12:04 -0500 From: Chris Dagdigian dagdigian@ComputeFarm.com Subject: [Bioperl-l] A couple of CVS questions.
ftp://bioperl.org/pub/DIST/

All of our old 'official' bioperl release tarballs can be found there.

Regards,
Chris


At 06:31 PM 1/2/01 -0800, Peter Schattner wrote:
>A couple of CVS questions.
>
>1. How can one access earlier releases of bioperl?  I haven't been able
>to find them on CVS or elsewhere.  Where should I be looking?


From krbou@pgsgent.be Wed Jan 3 07:29:43 2001 Date: Wed, 3 Jan 2001 08:29:43 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
Quoting Hilmar Lapp (lapp@gnf.org):
> Kris Boulez wrote:
> > 
> > 
> > - at line 356 there is
> >    $mol = $seq->molecule;
> > I think this should be $seq->moltype; as ->molecule only looks for
> > {'molecule'} which is not set by ->new. Bio::Seq->new only sets
> > {'moltype'}.
> > We should change the 'protein' of ->moltype to 'PRT' to conform to the
> > standard.
> 
> moltype() is internal to BioPerl. Whenever there is an attribute synonymous
> to moltype() but defined by a databank, molecule() should be used for that.
> So the code is correct I think.
> 
Then the documentation for Bio::Seq->molecule() should be extended a bit. It
currently reads:

       molecule


        Title   : molecule
        Usage   : $obj->molecule($newval)
        Function:
        Returns : type of molecule (DNA, mRNA)
        Args    : newvalue (optional)


> Bio::Seq->new() indeed only sets moltype(), because at this point there is
> no databank specificity. molecule() should be set by the parser. If you
> want to instantiate a swissprot seq from memory and have it written in
> swissprot format, the way we want to go is have dedicated classes under
> Bio::Seq::*. If there is need for a swissprot-dedicated class, that one
> probably would also set molecule() at instantiation time.
> 
> > 
> > B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA
> > sequences ?
> 
> In my opinion there's no need for that, but others may think differently.
> 
> > 
> > - around line 369 the whole else block should be changed. We should make
> >   sure we have a division ($div) in the ID part. The previous version of
> > the code which is now commented out did a better try at this. Looking at
> > next_seq() we see why we're not able to read this (entry name must contain
> > an underscore section 3.1.1 of the SWISS-PROT manual).
> > 
> >     $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
> >      || $self->throw("swissprot stream with no ID. Not swissprot in my
> > book");
> >    $name = $1."_".$2;
> >    $seq->primary_id($1);
> >    $seq->division($2);
> > 
> 
> If this is the code you're referring to (sorry, don't have at hand right
> now), it does ensure that there is a division part. I'm probably missing
> something.
> 
Sorry, I obviously wasn't clear on this one. The code I pasted is from
next_seq(). What I was referring to is the code in write_seq(). In there
we do not enforce that there is a division part (I think we should at
least check whether $seq->display_id() contains an underscore in a
reasonable position). The code reads:

   } else {
       #$temp_line = sprintf ("%10s     STANDARD;      %3s;   %d AA.",
       #                     $seq->primary_id()."_".$div,$mol,$len);
       # Reconstructing the ID relies heavily upon the input source having
       # been in a format that is parsed as this routine expects it -- that
       # is, by this module itself. This is bad, I think, and immediately
       # breaks if e.g. the Bio::DB::GenPept module is used as input.
       # Hence, switch to display_id(); _every_ sequence is supposed to have
       # this. HL 2000/09/03
       $temp_line = sprintf ("%10s     STANDARD;      %3s;   %d AA.",
                             $seq->display_id(), $mol, $len);
   }


> > How standard compliant do we want to be with this. If we want to be very
> > strict we should e.g. make sure the 'entry name' (first item on the ID
> > line) is not more than 10 characters.
> > 
> > P.S. (very) minor issue: the division we choose 'UNK' for sequences
> > which don't have a division set is not in the standard (speclist.txt),
> > it only contains UNKP
> > 
> 
> Sure, can (should) be changed.
> 
> > Should I try to adapt swiss.pm to the thoughts I (tried to) put out or
> > are there major objections ?
> >
> 
> See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
> of that class.
> 
The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
you plan on having a Bio::Seq::* class for every (complex) sequence type?

Kris,

From jason@chg.mc.duke.edu Wed Jan 3 14:17:01 2001 Date: Wed, 3 Jan 2001 09:17:01 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] A couple of CVS questions.
On Tue, 2 Jan 2001, Hilmar Lapp wrote:

> Peter Schattner wrote:
> > 
> > A couple of CVS questions.
> > 
> > 1. How can one access earlier releases of bioperl?  I haven't been able
> > to find them on CVS or elsewhere.  Where should I be looking?
> > 
> 
> You can checkout based on one of version, tag, or date. You very likely
> don't want to checkout a release by version, as each file has a different
> version. There is a tag for the 0.6.x release branch, and also for other
> releases. If you want to checkout the whole development trunk in an earlier
> version, the most sensible way is probably to go by date (option -D). For
> individual modules you can go either way.
> 
> Do you have the manpages of cvs? They're actually poor compared to the
> info-files cvs comes with. On a Unix box with info installed you should be
> able to type 'info cvs'.
> 
> > 2. Some modules were moved to different directories within the CVS
> > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> > Bio::Tools::Run::Alignment::Clustalw.pm ).  Since then, I don't seem to
> > be able to find the versions of the modules made prior to the date that
> > the modules were moved.  Can someone tell me if these older versions are
> > accessible and if so how to find them.
> 
> The files were moved without retaining the revision history (cvs is bad at
> file moving and renaming; you have to mess with the repository in order to
> have cvs history preserved in this case). The version at the former
> location was deleted, so you can restore it at the former place only. The
> file at the new location has lost all its revision information before the
> move.

Many apologies, this was my stupidity for not moving the files the
correct way. I wish I had waited for Hilmar's email... Learned my
lesson though. I didn't realize we could move the RCS files until after I
had already moved the source files (itchy trigger finger). If you look at
the first date in Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast
you can see when the move occurred and then check out with -D set to some
day or time before then.

> 
> Hope this helps.
> 
> 	Hilmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From birney@ebi.ac.uk Wed Jan 3 14:50:53 2001 Date: Wed, 3 Jan 2001 14:50:53 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] A couple of CVS questions.
On Wed, 3 Jan 2001, Jason Stajich wrote:

> On Tue, 2 Jan 2001, Hilmar Lapp wrote:
> 
> > Peter Schattner wrote:
> > > 
> > > A couple of CVS questions.
> > > 
> > > 1. How can one access earlier releases of bioperl?  I haven't been able
> > > to find them on CVS or elsewhere.  Where should I be looking?
> > > 
> > 
> > You can checkout based on one of version, tag, or date. You very likely
> > don't want to checkout a release by version, as each file has a different
> > version. There is a tag for the 0.6.x release branch, and also for other
> > releases. If you want to checkout the whole development trunk in an earlier
> > version, the most sensible way is probably to go by date (option -D). For
> > individual modules you can go either way.
> > 
> > Do you have the manpages of cvs? They're actually poor compared to the
> > info-files cvs comes with. On a Unix box with info installed you should be
> > able to type 'info cvs'.
> > 
> > > 2. Some modules were moved to different directories within the CVS
> > > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> > > Bio::Tools::Run::Alignment::Clustalw.pm ).  Since then, I don't seem to
> > > be able to find the versions of the modules made prior to the date that
> > > the modules were moved.  Can someone tell me if these older versions are
> > > accessible and if so how to find them.
> > 
> > The files were moved without retaining the revision history (cvs is bad at
> > file moving and renaming; you have to mess with the repository in order to
> > have cvs history preserved in this case). The version at the former
> > location was deleted, so you can restore it at the former place only. The
> > file at the new location has lost all its revision information before the
> > move.
> 
> Many apologies, this was my stupidity for not moving the files the
> correct way.  I wish I had waited for Hilmar's email.... Learned my
> lesson though.... I didn't realize we could move the RCS files (itchy
> trigger finger) before I moved the src files.  If you look at the
> first date in Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast
> you can see when the move occurred and then checkout with -D as some day
> or time before then.

It is, in my book, bad form to move the actual files. If you move files
then CVS checkouts of old versions screw up, with sometimes disastrous
effects.

A cvs remove followed by a cvs add is "The Right Way"(tm) in my book.



> 
> > 
> > Hope this helps.
> > 
> > 	Hilmar
> > 
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 3 17:20:17 2001 Date: Wed, 3 Jan 2001 12:20:17 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] named parameters
There is a bit of inconsistency in how we specify parameters to new() in
some of the bioperl modules. Whenever we don't use named parameters (i.e.
-file => 'filename'), we are inconsistent with the fact that all modules
inherit from Bio::Root::RootI. This is because Bio::Root::RootI will
parse a couple of special parameters, specifically
-verbose, -strict, -name, -obj, -record_err

Now, we really don't use these that much. However, in the case of
Bio::Species one would call

my @classification = qw( sapiens Homo Hominidae
                         Catarrhini Primates Eutheria
                         Mammalia Vertebrata Chordata
                         Metazoa Eukaryota );

my $sp = new Bio::Species(@classification);

but if one also wanted debugging turned on, one might call this
my $sp = new Bio::Species(-verbose=>1, @classification);

This won't bother RootI, but Bio::Species expects all the parameters to be
part of the classification array, so '-verbose' and '1' would end up being
treated as taxa.

A solution is to change Bio::Species to expect named parameters, so that
the classification is passed as an array ref:

my $sp = new Bio::Species(-verbose => 1, -classification => \@classification);

What are people's reactions to this?  If we can agree that this is
expected then we can add this to our programming conventions wiki page.
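A minimal sketch of the named-parameter style proposed above, in plain Perl. My::Species is a hypothetical stand-in for Bio::Species, and the hand-rolled key normalisation is an assumption for illustration only, not how Bio::Root::RootI actually parses its arguments. The point it shows is that -verbose and -classification can be mixed freely once the taxa travel inside a single array reference.

```perl
package My::Species;    # hypothetical stand-in for Bio::Species
use strict;
use warnings;

sub new {
    my ($class, @args) = @_;
    die "expected -name => value pairs\n" if @args % 2;
    my %raw = @args;
    # Normalise keys: drop the leading '-' and lowercase, so -Verbose also works.
    my %p;
    for my $key (keys %raw) {
        (my $name = lc $key) =~ s/^-//;
        $p{$name} = $raw{$key};
    }
    my $classification = $p{classification} || [];
    die "-classification must be an array reference\n"
        unless ref $classification eq 'ARRAY';
    my $self = {
        verbose        => $p{verbose} || 0,
        classification => [@$classification],    # copy the caller's array
    };
    return bless $self, $class;
}

sub verbose        { return $_[0]{verbose} }
sub classification { return @{ $_[0]{classification} } }

package main;

my @classification = qw( sapiens Homo Hominidae Catarrhini Primates Eutheria
                         Mammalia Vertebrata Chordata Metazoa Eukaryota );
my $sp = My::Species->new(-verbose => 1, -classification => \@classification);
print join(' ', ($sp->classification)[0, 1]), "\n";    # prints "sapiens Homo"
```

Because every argument is a key/value pair, a constructor written this way can hand unrecognised keys (such as -verbose) to a parent class without confusing them with taxon names.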

-Jason



From birney@ebi.ac.uk Wed Jan 3 17:31:25 2001 Date: Wed, 3 Jan 2001 17:31:25 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] test failures on main trunk
perl 5.004_04 is failing again. Some I can fix, others Peter/Jason might
want to take a peek at. They are

Failed Test  Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/Clustalw.t                  9    1  11.11%  4
t/DB.t            0    11    ??   ??       %  ??
t/Index.t         2   512     8    3  37.50%  6-8
t/SeqFeature.t               21   ??       %  ??
t/TCoffee.t                   9    1  11.11%  4
Failed 5/48 test scripts, 89.58% okay. -1/594 subtests failed, 100.17%
okay.
make: *** [test_dynamic] Error 29
riker:~/src/bioperl-live> perl t/DB/



riker:~/src/bioperl-live> perl t/DB.t
IO::String not installed. This means the Bio::DB::* modules are not
usable. Skipping tests.
1..1
ok 1
Segmentation fault
riker:~/src/bioperl-live> perl t/Clustalw.t
1..9
Clustalw program not found as /clustalw or not executable.
  Clustalw can be obtained from eg-
http://corba.ebi.ac.uk/Biocatalog/Alignment_Search_software.html/
ok 1

-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/Clustalw.t
STACK:
Bio::Tools::Run::Alignment::Clustalw::AUTOLOAD(308)
main::t/Clustalw.t(52)
---------------------------------------------------


riker:~/src/bioperl-live> perl t/SeqFeature.t
1..21
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15
ok 16
ok 17
not ok 18
ok 19
not ok 20
ok 21
ok 22
ok 23
ok 24
ok 25
ok 26
ok 27


riker:~/src/bioperl-live> perl t/TCoffee.t
1..9
TCoffee program not found as /t_coffee or not executable.
  TCoffee can be obtained from eg-
http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html
ok 1

-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/TCoffee.t
STACK:
Bio::Tools::Run::Alignment::TCoffee::AUTOLOAD(561)
main::t/TCoffee.t(55)
---------------------------------------------------





I'll start to work on TCoffee/Clustalw...






-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 3 17:39:19 2001 Date: Wed, 3 Jan 2001 17:39:19 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] named parameters
On Wed, 3 Jan 2001, Jason Stajich wrote:

> This is a bit of inconsistency when we specify parameters to new in some
> of the bioperl modules.  Whenever we don't use named parameters (ie
> -file=> 'filename'), we are inconsistent with the fact that all modules
> inherit from  Bio::Root::RootI.  This is because Bio::Root::RootI will
> parse a couple of special parameters - specifically 
> -verbose, -strict, -name, -obj, -record_err
> 
> now we really don't use these that much, however, in the case of
> Bio::Species
> 
> one would call 
> my @classification = qw( sapiens Homo Hominidae
>                                    Catarrhini Primates Eutheria
>                                    Mammalia Vertebrata Chordata
>                                    Metazoa Eukaryota );
> 
> my $sp = new Bio::Species(@classification);
> 
> but if one also wanted debugging turned on, one might call this
> my $sp = new Bio::Species(-verbose=>1, @classification);
> 
> This won't bother RootI, but Bio::Species expects all the parameters to be
> part of the classification array.
> 
> A solution is to change Bio::Species to expect named parameters so an
> array ref is 
> 
> $sp = new Bio::Species(-verbose=>1, -classification => \@classification );
> 
> What are people's reactions to this?  If we can agree that this is
> expected then we can add this to our programming conventions wiki page.


I think we should stick to named parameters throughout and have it as a
programming convention...




> 
> -Jason
> 
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 3 17:52:50 2001 Date: Wed, 3 Jan 2001 17:52:50 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] test failures on main trunk
Ok, my mistake: we are failing tests, but not in the way that I
described...

TCoffee/ClustalW is waiting on the RootI reorganisation, currently being
led by Jason.

SeqFeature was a trivial addition of 21 --> 27 tests to run for the new
computation object.

Index has a weird dependency on IO::String - why is this? Who needs
IO::String in Index?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 3 17:53:53 2001 Date: Wed, 03 Jan 2001 09:53:53 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] A couple of CVS questions.
Ewan Birney wrote:
> 
> It is, in my book, bad form to move the actual files. If you move files
> then CVS checkouts on old versions screw up with sometimes disastrous
> effects.
> 
> The removal and cvs add is "The Right Way" tm in my book.
> 

Well, I'm certainly not a CVS expert but when I wrote that you can
move the repository files I only quoted the recommendation given
in the CVS documentation (the info files that come with it). If
you think applying this recommendation can have disastrous effects
you should probably write to the CVS people to take this out of
their documentation, or better yet, to put in a warning.

I'm still not sure what could cause the disastrous effect, as the
revision file does not keep any directory information (I may be
wrong here though, but I haven't seen any dir info in such files
yet), and there is no 'central database' that keeps track of which
file is where.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 17:57:17 2001 Date: Wed, 3 Jan 2001 17:57:17 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] A couple of CVS questions.
On Wed, 3 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> > 
> > It is, in my book, bad form to move the actual files. If you move files
> > then CVS checkouts on old versions screw up with sometimes disastrous
> > effects.
> > 
> > The removal and cvs add is "The Right Way" tm in my book.
> > 
> 
> Well, I'm certainly not a CVS expert but when I wrote that you can
> move the repository files I only quoted the recommendation given
> in the CVS documentation (the info files that come with it). If
> you think applying this recommendation can have disastrous effects
> you should probably write to the CVS people to take this out of
> their documentation, or better yet, to put in a warning.
> 
> I'm still not sure what could cause the disastrous effect, as the
> revision file does not keep any directory information (I may be
> wrong here though, but I haven't seen any dir info in such files
> yet), and there is no 'central database' that keeps track of which
> file is where.

Yeah, but then what happens is that in


   OldRelease (real)

          StableFile XX::YY says use AA::BB
          File       AA::BB is there

   We now move AA::BB to CC::BB *in the repository*


if we check out the old release we get

          StableFile XX::YY says use AA::BB
          File       AA::BB ** IS NOT THERE **
          File       CC::BB is there, but is named wrong!


So it is ok from a cvs perspective, but it sucks from a code management
perspective!


If you cvs remove, then cvs add, this does not happen. Traditionally you
note in your log message for the cvs add that it has just come from XXXX,
allowing people to track the history ...
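The cvs remove / cvs add sequence described above can be written down as a small Perl dry-run helper. This is only a sketch under assumptions: the helper name cvs_move_plan and the AA/BB.pm and CC/BB.pm paths are hypothetical, it assumes it is run from the top of a checked-out working copy whose target directory is already under CVS control, and it prints the commands rather than running them (replace the print with `system(@$cmd) == 0 or die` to execute them).

```perl
use strict;
use warnings;

# Hypothetical helper: build (but do not run) the command sequence for
# moving a file in a CVS working copy via "cvs remove" + "cvs add",
# leaving the old ,v file (and therefore old releases) untouched.
sub cvs_move_plan {
    my ($old, $new) = @_;
    my $msg = "Moved from $old";    # records where the file came from
    return (
        [ 'cp', $old, $new ],                        # copy the working file
        [ 'cvs', 'remove', '-f', $old ],             # delete old copy, schedule removal
        [ 'cvs', 'add', '-m', $msg, $new ],          # schedule the new file
        [ 'cvs', 'commit', '-m', $msg, $old, $new ], # commit both changes together
    );
}

# Dry run: show what would be executed.
for my $cmd (cvs_move_plan('AA/BB.pm', 'CC/BB.pm')) {
    print join(' ', @$cmd), "\n";
}
```

Because the old file is only marked as removed, a checkout of an old release still finds AA/BB.pm in its original place.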

> 
> 	Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 3 18:19:48 2001 Date: Wed, 03 Jan 2001 10:19:48 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] A couple of CVS questions.
Ewan Birney wrote:
> 
> Yeah, but then what happens is that in
> 
>    OldRelease (real)
> 
>           StableFile XX::YY says use AA::BB
>           File       AA::BB is there
>
>    We now move AA::BB to CC::BB *in the repository*
>
> if we check out the old release we get
>
>           StableFile XX::YY says use AA::BB
>           File       AA::BB ** IS NOT THERE **
>           File       CC::BB is there, but is named wrong!
> 
> So it is ok from a cvs perspective, but it sucks from a code management
> perspective!
> 
> if you cvs remove, cvs add this does not happen. Traditionally you put in
> your log on the cvs add that it has just come from XXXX, allowing people
> to track the history ...
> 

I see. You could still copy the repository file to the new
location, and then cvs remove it from the old. But then, you
probably don't want people to be able to restore a previous
version at a place where that version didn't sit.

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 3 18:21:40 2001 Date: Wed, 03 Jan 2001 10:21:40 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] named parameters
Jason Stajich wrote:
> 
> 
> A solution is to change Bio::Species to expect named parameters, so that an
> array ref is passed as a named argument:
> 
> $sp = new Bio::Species(-verbose=>1, -classification => \@classification );
> 
> What are people's reactions to this?  If we can agree that this is
> expected then we can add this to our programming conventions wiki page.
> 

Yes, certainly.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 3 19:12:07 2001 Date: Wed, 3 Jan 2001 14:12:07 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] RootI migration and other changes
Hilmar, Ewan, and I came up with the scheme for handling Bio::Root::RootI
and all these obnoxious initializations.  My apologies for not keeping the
list more in the loop, but this was actually really boring.  

So I have checked in changes that should meet this new spec.  There are
some parts that were a little tricky, but all the tests pass so the
behaviour appears to be consistent.

In addition to making the changes necessary for the move to a chained new()
rather than a chained _initialize(), I revamped some modules that needed
updating.  Here is a summary to the best of my recollection.
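To make the chained new() idea concrete, here is a minimal plain-Perl sketch. It does not reproduce the actual Bio::Root::RootI code; the classes My::Feature and My::Exon and their -start/-end/-phase parameters are hypothetical. Each subclass passes the whole argument list up to SUPER::new() first and then picks out the parameters it owns, so by the time a subclass sets its own fields the base-class fields already exist.

```perl
use strict;
use warnings;

package My::Feature;    # hypothetical base class

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, ref($class) || $class;
    # The base class only handles the parameters it knows about.
    $self->{start} = $args{-start};
    $self->{end}   = $args{-end};
    return $self;
}

package My::Exon;       # hypothetical subclass
use parent -norequire, 'My::Feature';

sub new {
    my ($class, %args) = @_;
    # Chain to the parent first: the object comes back already blessed
    # into My::Exon, with the base-class fields set.
    my $self = $class->SUPER::new(%args);
    # Then set the subclass-specific fields.
    $self->{phase} = defined $args{-phase} ? $args{-phase} : 0;
    return $self;
}

package main;

my $exon = My::Exon->new(-start => 10, -end => 99, -phase => 2);
print ref($exon), " $exon->{start}-$exon->{end} phase $exon->{phase}\n";
```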

t/ - I updated some of the tests, on an ad hoc basis, to use the perl Test
module (for more info on it, see perldoc Test).  I hope this will make test
writing even easier, so that those interested can jump in and write a test
(this might be a good way to get acquainted with a module if you are
wanting to contribute to the project).
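As an illustration of what a converted test can look like, the sketch below uses only the core Test module. The data and expected values are placeholders, not an actual bioperl test; the point is that plan() prints the 1..N header and ok() numbers the tests itself, so the hand-maintained test numbers of the old test() helper go away.

```perl
use strict;
use Test;    # core module, available since perl 5.004

# plan() prints the "1..3" header; doing it in BEGIN means it comes out
# even if a later "use" of the module under test dies.
BEGIN { plan tests => 3 }

my @classification = qw(sapiens Homo Hominidae);   # placeholder data

# Old style:  test 1, scalar(@classification) == 3, "wrong size";
# With Test, ok() numbers the tests itself:
ok(scalar(@classification), 3);                  # passes if got eq expected
ok($classification[1], 'Homo');
ok(join(' ', @classification), qr/^sapiens/);    # a qr// pattern is matched
```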

Bio::Tools::Run - this new directory is for modules that serve as wrappers
to call outside programs.  We should try and have all modules that execute
external programs residing in this dir or its subdirs.
I added some code using File::Spec to standardize how pathnames to
executables are located.  I am not sure we can expect File::Spec to
always be installed with perl (IT SHOULD BE!), so when it is not
available the code falls back to the original way of constructing paths
(assuming Unix-style '/' directory separators).  I also did some cleaning
up and standardization.
Actually we need to write a module Bio::Tools::Run.pm that will serve as a
framework for all modules that execute external programs.  There is much
code redundancy in these modules right now.
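One way to write the File::Spec approach with a Unix fallback is sketched below. The helper names exe_path and find_executable and the clustalw example are hypothetical and are not the interface of any existing Bio::Tools::Run module.

```perl
use strict;
use warnings;

# Prefer File::Spec for building paths; remember whether it could be loaded.
my $have_file_spec = eval { require File::Spec; 1 } ? 1 : 0;

# Hypothetical helper: join a directory and a program name.
sub exe_path {
    my ($dir, $program) = @_;
    return $have_file_spec
        ? File::Spec->catfile($dir, $program)   # platform's own separator
        : join('/', $dir, $program);            # old behaviour: assume Unix
}

# Hypothetical helper: look for an executable along PATH,
# returning its full path, or undef if it is not found.
sub find_executable {
    my ($program) = @_;
    my @dirs = $have_file_spec ? File::Spec->path
                               : split /:/, ($ENV{PATH} || '');
    for my $dir (@dirs) {
        my $candidate = exe_path($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

print exe_path('/usr/local/bin', 'clustalw'), "\n";
```

Putting helpers like these in one shared place is one way to cut down the redundancy mentioned above.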

Bio::Species - now uses a named parameter for classification; this required
updates to a test and some of the SeqIO modules.
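A hedged sketch of how a constructor can accept named parameters, including an array reference for the classification. My::Species is a hypothetical stand-in, not the real Bio::Species implementation, and the validation rules are assumptions.

```perl
use strict;
use warnings;

package My::Species;    # hypothetical stand-in for a Bio::Species-like class

sub new {
    my ($class, @args) = @_;
    die "new() expects -name => value pairs\n" if @args % 2;
    my %args;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        $key =~ s/^-//;            # accept -classification or classification
        $args{lc $key} = $value;
    }
    my $self = bless { verbose => $args{verbose} || 0 }, $class;
    if (defined $args{classification}) {
        die "-classification must be an array reference\n"
            unless ref $args{classification} eq 'ARRAY';
        # Copy the list so later changes to the caller's array don't leak in.
        $self->{classification} = [ @{ $args{classification} } ];
    }
    return $self;
}

sub classification { @{ $_[0]{classification} || [] } }

package main;

my @classification = qw(sapiens Homo Hominidae);
my $sp = My::Species->new(-verbose => 1, -classification => \@classification);
print join(', ', $sp->classification), "\n";
```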

Bio::SeqFeature::* - I worked on Mark's Computation object a little to
take advantage of inheritance; there are still some noises being made in
t/SeqFeature with the new tests Ewan added, so I'll try and track those
down.  I also did some work so that feature1 and feature2 of
SimilarityPair always return something valid even if you have not
initialized them.  This was necessary because of the order in which
parameters are set when a subclass is instantiated (ie look at the
Bio::Tools::Sim4::Exon hierarchy and trace the calls to new() and you'll
start to see what was happening).  This was due to our move to chained
new(), but it works now so no worries.

Bio::AlignIO::clustalw - now supports reading and writing of clustalw
alignments (only writing was supported before).  This should work for both
clustal 1.4 and 1.8.

Bio::SearchDist - I added a test for this - I have not actually had luck
loading it on my machine lately so I have written a very simple test that
will skip if it cannot load the Bio::Ext::Align module.
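A test that skips cleanly when an optional module is missing might look like the following, using the skip() function of the core Test module. Only the module name Bio::Ext::Align comes from the text above; the checked values are placeholders.

```perl
use strict;
use Test;

BEGIN { plan tests => 2 }

# Is the optional compiled extension loadable?
my $no_ext = eval { require Bio::Ext::Align; 1 } ? 0 : 1;

ok(1);    # tests that need nothing optional still run

# When the first argument is true, skip() reports the test as skipped
# instead of running it, so the suite still passes without the extension.
skip($no_ext, sub { 2 + 2 }, 4);
```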

Bio::SeqIO - genbank/embl/swiss: I added the verbose parameter to
new Bio::FTHelper(-verbose => $self->verbose) and to the instantiation
of the new Seq, so that they will not print warnings when verbose is set
to -1 for the SeqIO object.
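The verbose mechanism can be sketched in plain Perl as follows. The classes My::Base, My::Parser and My::Helper are hypothetical and only mimic the idea (a warn() method that stays quiet at verbose level -1, and a parser that hands its own level to the objects it creates); they are not the actual Bio::Root::RootI, SeqIO or FTHelper code.

```perl
use strict;
use warnings;

package My::Base;    # hypothetical root class

sub new {
    my ($class, %args) = @_;
    return bless { verbose => $args{-verbose} || 0 }, $class;
}

sub verbose {
    my $self = shift;
    $self->{verbose} = shift if @_;
    return $self->{verbose};
}

# Object-level warn(): silent when verbose is negative.
sub warn {
    my ($self, $msg) = @_;
    return if $self->verbose < 0;
    CORE::warn("WARNING: $msg\n");
}

package My::Helper;
use parent -norequire, 'My::Base';

package My::Parser;
use parent -norequire, 'My::Base';

# Pass our own verbose level on to every helper object we create.
sub next_helper {
    my $self = shift;
    return My::Helper->new(-verbose => $self->verbose);
}

package main;

my $parser = My::Parser->new(-verbose => -1);
my $helper = $parser->next_helper;
$helper->warn('odd feature table line');   # prints nothing
print 'helper verbose: ', $helper->verbose, "\n";
```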

Bio::DB::GDB - a new module that will query the website www.gdb.org and
return simple things (what I needed, which was, for a marker name, the
PCR primers and length of product).  This will get much improved later on
as we develop objects for storing Markers and other information.  This
will fail if you overload the GDB server (trust me, I know...).  I'm still
tinkering with it, so the tests may not pass 100% of the time.  We can
decide if it is good enough to include in the release (I'm not sure yet).
It's hairy HTML parsing in there.

There are some modules I did not touch - UnivAln, Bio::Tools::Blast,
which depend on Bio::Root::Object.  We're going to have to decide what we
want to do here in the future, but that may not be a job we try to
complete for 0.7 release.

-jason 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 

From lapp@gnf.org Wed Jan 3 20:33:32 2001 Date: Wed, 03 Jan 2001 12:33:32 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] SWISS-PROT writing
Kris Boulez wrote:
> 
> >
> > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > of that class.
> >
> The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> 

Yes, we plan to have a specialized class for every databank, for which the
attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
or an already existing class under Bio::Seq::*. This enables us to free the
basic Seq object from definitions that only pertain to databanks and don't
make up the essentials of a biological sequence.

So, molecule(), division() etc will be eventually moved away from
Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
2, meaning that we want it, but we may decide to skip it this time in order
to get the release out of the door.

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 21:12:54 2001 Date: Wed, 3 Jan 2001 21:12:54 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] SWISS-PROT writing
On Wed, 3 Jan 2001, Hilmar Lapp wrote:

> Kris Boulez wrote:
> > 
> > >
> > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > of that class.
> > >
> > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > 
> 
> Yes, we plan to have a specialized class for every databank, for which the
> attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> or an already existing class under Bio::Seq::*. This enables us to free the
> basic Seq object from definitions that only pertain to databanks and don't
> make up the essentials of a biological sequence.
> 
> So, molecule(), division() etc will be eventually moved away from
> Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> 2, meaning that we want it, but we may decide to skip it this time in order
> to get the release out of the door.

For GenBank/EMBL I have prototype code to check in over here. Looks fine
to me. Swissprot probably needs its own class. 


there is a valid debate about whether swissprot and genbank/embl should
inherit off a common base class of "rich database sequence objects" (eg,
division is the same) or we should just say that they are different enough
not to stretch this. I have not done anything on swissprot.


> 
> 	Hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From wfish82@hotmail.com Thu Jan 4 00:59:55 2001 Date: Thu, 04 Jan 2001 00:59:55 -0000 From: Fish Fish wfish82@hotmail.com Subject: [Bioperl-l] Bio::Tools::Blast;
Hi,

  I am trying to pick out those blast results saying
"***** No hits found *****", among many other things,
but I can't get it to work with Bio::Tools::Blast.  Can
somebody point out what is wrong in the following
code?  Also, it seems that if the first report in a
multi-report blast file is a "No hits found", then the
2nd report is skipped.

  Thanks in advance!

wfish82

**********************************

#!/usr/local/bin/perl -w

use strict;
use Bio::SeqIO;
use Bio::Tools::Blast qw(:obj);

my $blastn=$ARGV[0];

my %blastParam=(
                -file           => $blastn,
                -parse          => 1,
                -filt_func      => \&filter,
                -min_len        => 50,
                -check_all_hits => 0,
                -strict         => 0,
                -stats          => 0,
                -best           => 0,
                -share          => 0,
                -exec_func      => \&process_blast,
);

$Blast->parse(%blastParam);

sub filter{
  my $hit=shift;
  if(! defined $hit){
    print "blahblah...\n";
  }else{
    return 1;
  }
}
sub process_blast{
  my $blastObj=shift;
  if(! defined $blastObj->hit){
    printf "BLAHBLAH...\n";
  }
  $blastObj->destroy;
}

#######################################
# end
#############################


From krbou@pgsgent.be Thu Jan 4 08:04:08 2001 Date: Thu, 4 Jan 2001 09:04:08 +0100 From: Kris Boulez krbou@pgsgent.be Subject: [Bioperl-l] SWISS-PROT writing
Quoting Ewan Birney (birney@ebi.ac.uk):
> On Wed, 3 Jan 2001, Hilmar Lapp wrote:
> 
> > Kris Boulez wrote:
> > > 
> > > >
> > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > > of that class.
> > > >
> > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > > 
> > 
> > Yes, we plan to have a specialized class for every databank, for which the
> > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > or an already existing class under Bio::Seq::*. This enables us to free the
> > basic Seq object from definitions that only pertain to databanks and don't
> > make up the essentials of a biological sequence.
> > 
> > So, molecule(), division() etc will be eventually moved away from
> > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > 2, meaning that we want it, but we may decide to skip it this time in order
> > to get the release out of the door.
> 
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class. 
> 
> 
> there is a valid debate about whether swissprot and genbank/embl should
> inherit off a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
> 
> 
Last night I thought a bit more about this and have some questions.

- will these objects also inherit from Bio::Seq ?

- if yes, will these objects be created like
    my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');

  or 

   my $swiss_seq = Bio::Seq::swiss->new( .. );

- will it be possible to 'promote' a Bio::Seq object to one of these new
  objects ?



Kris,

From birney@ebi.ac.uk Thu Jan 4 09:26:59 2001 Date: Thu, 4 Jan 2001 09:26:59 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] SWISS-PROT writing
On Thu, 4 Jan 2001, Kris Boulez wrote:

> Quoting Ewan Birney (birney@ebi.ac.uk):
> > On Wed, 3 Jan 2001, Hilmar Lapp wrote:
> > 
> > > Kris Boulez wrote:
> > > > 
> > > > >
> > > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > > give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> > > > > of that class.
> > > > >
> > > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > > > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > > > 
> > > 
> > > Yes, we plan to have a specialized class for every databank, for which the
> > > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > > or an already existing class under Bio::Seq::*. This enables us to free the
> > > basic Seq object from definitions that only pertain to databanks and don't
> > > make up the essentials of a biological sequence.
> > > 
> > > So, molecule(), division() etc will be eventually moved away from
> > > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > > 2, meaning that we want it, but we may decide to skip it this time in order
> > > to get the release out of the door.
> > 
> > For GenBank/EMBL I have prototype code to check in over here. Looks fine
> > to me. Swissprot probably needs its own class. 
> > 
> > 
> > there is a valid debate about whether swissprot and genbank/embl should
> > inherit off a common base class of "rich database sequence objects" (eg,
> > division is the same) or we should just say that they are different enough
> > not to stretch this. I have not done anything on swissprot.
> > 
> > 
> Last night I thought a bit more about this and have some questions.
> 
> - will these objects also inherit from Bio::Seq ?

yes.

> 
> - if yes, will these objects be created like
>     my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');
> 

No. They will be created though from

      my $swiss_seq_io = Bio::SeqIO->new( -format => 'swiss' ) ;
      my $swiss_seq = $swiss_seq_io->next_seq;

>   or 
> 
>    my $swiss_seq = Bio::Seq::swiss->new( .. );
> 

This will be achievable.

> - will it be possible to 'promote' a Bio::Seq object to one of these new
>   objects ?
> 

yes....

> 
> 
> Kris,
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From heikki@ebi.ac.uk Thu Jan 4 10:38:43 2001 Date: Thu, 04 Jan 2001 10:38:43 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] test framework
Since it has been part of perl since 5.004, there should be no reason not
to use it. I tried it yesterday and it really cleans up test code nicely.
I am going to use it in the future.

	-Heikki

Jason Stajich wrote:
> 
> while I'm messing with it, does anyone have objections to using the built
> in perl Test module available since perl 5.004 rather than our
> 
> I agree it is wasted time to constantly move things from one test suite to
> another ( I already tried to standardize our existing ones as best as
> possible).  But a nice standard makes it easier for new people to write
> tests and make them fit.  Any comments?
> 
> sub test ($$;$) {
>     my($num, $true,$msg) = @_;
>     print($true ? "ok $num\n" : "not ok $num $msg\n");
> }
> 
> [ from perldoc Test ]
> 
>       use strict;
>        use Test;
> 
>        # use a BEGIN block so we print our plan before MyModule is loaded
>        BEGIN { plan tests => 14, todo => [3,4] }
> 
>        # load your module...
>        use MyModule;
> 
>        ok(0); # failure
>        ok(1); # success
> 
>        ok(0); # ok, expected failure (see todo list, above)
>        ok(1); # surprise success!
> 
>        ok(0,1);             # failure: '0' ne '1'
>        ok('broke','fixed'); # failure: 'broke' ne 'fixed'
>        ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
>        ok('fixed',qr/x/);   # success: 'fixed' =~ qr/x/
> 
>        ok(sub { 1+1 }, 2);  # success: '2' eq '2'
>        ok(sub { 1+1 }, 3);  # failure: '2' ne '3'
>        ok(0, int(rand(2)));  # (just kidding :-)
> 
>        my @list = (0,0);
>        ok @list, 3, "\@list=".join(',',@list);      #extra diagnostics
>        ok 'segmentation fault', '/(?i)success/';    #regex match
> 
>        skip($feature_is_missing, ...);    #do platform specific test
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
> 

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From hlapp@gmx.net Thu Jan 4 17:33:52 2001 Date: Thu, 04 Jan 2001 09:33:52 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] SWISS-PROT writing
Ewan Birney wrote:
> 
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class.
> 
> there is a valid debate about whether swissprot and genbank/embl should
> inherit off a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
> 

There are probably enough shared attributes (division, molecule,
date, secondary accessions, maybe revision of the sequence, ...)
to justify creating a rich sequence base class. This would also help
others wishing to add another rich seq class get started quickly.
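A rough sketch of such a hierarchy, including the "promotion" of a plain sequence object that Ewan said yes to earlier in the thread, could look like this. All class names are hypothetical, and re-blessing the existing object is just one possible way to promote; it is not necessarily what bioperl will adopt.

```perl
use strict;
use warnings;

package My::Seq;    # hypothetical plain sequence class
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} || '', id => $args{-id} }, $class;
}
sub seq { $_[0]{seq} }

# Attributes shared by databank entries live in one base class.
package My::RichSeq;
use parent -norequire, 'My::Seq';
sub division { my $s = shift; $s->{division} = shift if @_; $s->{division} }
sub molecule { my $s = shift; $s->{molecule} = shift if @_; $s->{molecule} }
sub secondary_accessions {
    my $self = shift;
    push @{ $self->{secondary} }, @_;
    return @{ $self->{secondary} };
}

# Promote a plain sequence object in place by re-blessing it.
sub promote {
    my ($class, $seq) = @_;
    die "not a My::Seq\n" unless $seq->isa('My::Seq');
    return bless $seq, $class;
}

# Databank-specific subclasses only add what is unique to them.
package My::Seq::Swiss;
use parent -norequire, 'My::RichSeq';

package main;

my $plain = My::Seq->new(-id => 'P12345', -seq => 'MKV');
my $swiss = My::Seq::Swiss->promote($plain);
$swiss->division('HUMAN');
print ref($swiss), ' ', $swiss->seq, ' ', $swiss->division, "\n";
```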

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Jan 8 09:42:20 2001 Date: Mon, 08 Jan 2001 01:42:20 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] make test
make test presently reveals the following problems (I'm running
Perl 5.005003 on Linux 2.2.10).

t/Chain.............Warning chain2string: argument LAST:6
overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.

Does this have any significance?

There were a couple of others which I (and Ewan and Jason) could
fix.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From heikki@ebi.ac.uk Mon Jan 8 10:18:01 2001 Date: Mon, 08 Jan 2001 10:18:01 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: make test
Hilmar,

The warning is intentional, but I agree it looks alarming to anyone
installing bioperl. Test code uses a value outside existing positions.
Can you think of a way of rewriting the test so that it does not print it
out?

	-Heikki

Hilmar Lapp wrote:
> 
> make test presently reveals the following problems (I'm running
> Perl 5.005003 on Linux 2.2.10).
> 
> t/Chain.............Warning chain2string: argument LAST:6
> overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.
> 
> Does this have any significance?
> 
> There were a couple of others which I (and Ewan and Jason) could
> fix.
> 
>         Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From insana@ebi.ac.uk Mon Jan 8 13:33:13 2001 Date: Mon, 8 Jan 2001 13:33:13 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] Re: make test
> The warning is intentional, but I agree it looks alarming to anyone
> installing bioperl. Test code uses a value outside existing positions.
> Can you think of a way of rewriting the test so that it does not print it
> out?

Ok, I will change that test not to create the warning.
But the whole point of that test was to get that warning and see it was
working as expected.

Jos


From jason@chg.mc.duke.edu Mon Jan 8 13:57:33 2001 Date: Mon, 8 Jan 2001 08:57:33 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Re: make test
If you made your warnings come from bioperl objects (ie $obj->warn() ) we
can turn them off by setting the verbose level on the object 
(ie $obj->verbose(-1) turns off all warnings).  This means your objects
have to inherit from Bio::Root::RootI.  I didn't change the LiveSeq or
Variation objects when I updated all for Bio::Root::RootI chained new
for the other modules in the repository because I didn't know what your
feelings were on this.

Do you want to check to see that the error is thrown or just that the
routine returns the correct value?

-Jason
On Mon, 8 Jan 2001, Joseph Insana wrote:

> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
> 
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
> 
> Jos
> 
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From insana@ebi.ac.uk Mon Jan 8 14:09:35 2001 Date: Mon, 8 Jan 2001 14:09:35 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] Re: make test
> (ie $obj->verbose(-1) turns off all warnings).  This means your objects
> have to inherit from Bio::Root::RootI.

I don't want my objects to inherit from RootI.
They are independent and I'd like to have them stay independent.

> Do you want to check to see that the error is thrown or just that the
> routine returns the correct value?

I wanted to check that the third argument ("last") would always override
the second argument ("length") since that is the way the method is supposed
to work.
I am now going to commit a version that won't produce the warning
and will check something else.

Joseph


From birney@ebi.ac.uk Mon Jan 8 15:08:57 2001 Date: Mon, 8 Jan 2001 15:08:57 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: make test
On Mon, 8 Jan 2001, Joseph Insana wrote:

> > (ie $obj->verbose(-1) turns off all warnings).  This means your objects
> > have to inherit from Bio::Root::RootI.
> 
> I don't want my objects to inherit from RootI.
> They are independent and I'd like to have them stay independent.

This is cool (I completely understand). I think we should consider moving
the variation into its own cvs module, which means that Joseph and Heikki
are not tied to the bioperl release schedule etc. 

This is for post 0.7 branching in my view (Hilmar to make the call).

>
> > Do you want to check to see that the error is thrown or just that the
> > routine returns the correct value?
> 
> I wanted to check that the third argument ("last") would always override
> the second argument ("length") since that is the way the method is supposed
> to work.
> I am now going to commit a version that won't produce the warning
> and will check something else.
> 
> Joseph
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Mon Jan 8 18:51:16 2001 Date: Mon, 08 Jan 2001 10:51:16 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Joseph Insana wrote:
> 
> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
> 
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
> 

As I understand from your and Heikki's replies in your test you
wanted the overriding thing to happen, be accepted (even though a
warning was triggered), and the code be able to handle it.

I'm not sure what you did by your change of the test, but it looks
like you simply don't test that feature anymore. If you do want to
keep the warning in the code (and not turn it into an exception,
which means to me that the call itself may indicate an error on
the client side, but in some cases may be totally sensible), what
if you print a message before the test that a warning should be
expected? If you feel confident with removing the warning message,
what if you test afterwards that your code dealt with the
overriding thing as you expected it to do?

Just my two pennies. I didn't want to suggest that anyone turns
off a test of his code. I just think that a warning message being
printed is not really a measurable test result (i.e., it should be
either 'passed' or 'failed').
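Hilmar's point can be made concrete with Perl's __WARN__ hook: the test captures the warning instead of printing it and then asserts on it, so the outcome is reported as pass or fail. The function range_end below is a hypothetical stand-in for a method whose "last" argument overrides "length"; it is not the LiveSeq Chain code.

```perl
use strict;
use warnings;
use Test;

BEGIN { plan tests => 3 }

# Hypothetical stand-in: the third argument ("last") overrides the
# second ("length"), with a warning when both are given.
sub range_end {
    my ($begin, $len, $last) = @_;
    if (defined $last) {
        warn "argument LAST:$last overriding LEN:$len!\n" if defined $len;
        return $last;
    }
    return $begin + $len - 1;
}

# Capture warnings instead of letting them reach the terminal.
my @warnings;
my $end;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $end = range_end(1, 4, 6);
}

ok($end, 6);                           # "last" won over "length"
ok(scalar(@warnings), 1);              # exactly one warning was issued ...
ok($warnings[0], qr/overriding LEN/);  # ... and it is the expected one
```

This keeps the overriding behaviour under test while nothing alarming is printed during make test.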

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Jan 8 19:04:04 2001 Date: Mon, 08 Jan 2001 11:04:04 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Ewan Birney wrote:
> 
> On Mon, 8 Jan 2001, Joseph Insana wrote:
> 
> > > (ie $obj->verbose(-1) turns off all warnings).  This means your objects
> > > have to inherit from Bio::Root::RootI.
> >
> > I don't want my objects to inherit from RootI.
> > They are independent and I'd like to have them stay independent.
> 
> This is cool (I completely understand). I think we should consider moving
> the variation into its own cvs module, which means that Joseph and Heikki
> are not tied to the bioperl release schedule etc.
> 
> This is for post 0.7 branching in my view (Hilmar to make the call).
> 

I'm not sure what you mean by post-0.7 branching. I agree that
under these premises the Variation code should probably better go
into its own module, even though it's a pity.

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From schattner@alum.mit.edu Mon Jan 8 19:44:08 2001 Date: Mon, 08 Jan 2001 11:44:08 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Hello all

I have committed an initial draft of an introductory bioperl tutorial
(called "bptutorial.pl") to the bioperl-live (main) repository.  The
draft tutorial pretty much follows the outline from my proposal:
http://bioperl.org/pipermail/bioperl-l/2000-December/001972.html
One addition to the original proposal is that I have included an
"appendix" which is a working script that demonstrates most of the
bioperl features described in the tutorial. (The script is largely
cut-and-pasted from various test and example files with print statements
added to make it clearer as to what is going on).

I believe that having a clear and accurate tutorial could make bioperl
more accessible and widely used.  On the other hand, if the tutorial is
confusing or contains mistakes, it will turn people away from trying
bioperl (and probably be worse than not having one at all).   So I have
a request.  I would appreciate it if some of you would read the tutorial
and give me feedback in terms of clarity and accuracy.  I am interested
in both general comments (e.g. "this section is too long - cut out
such-and-such" or "this module description fits better in this section"
or "this module will not be included in the 0.7 release so don't include
it" ) and specific places where there are errors or misleading or
confusing statements.  (If you think that the tutorial is clear and/or
that specific parts are particularly helpful I'd of course be happy to
get that feedback too :--).  Suggestions on improving the formatting
would also be appreciated.

I would definitely like feedback from people who have written modules
which are in the 0.7 release to make sure that I have captured your
intent and the proper usage of your module(s). I would also like
comments from folks who are simply bioperl users and, ideally, from a
few people who haven't used bioperl much before to see in what ways the
tutorial makes it easier to use or get started using bioperl (or
doesn't).  Feel free to write to me directly at schattner@alum.mit.edu
or via this list.  Thanks.

If you just want to look at the tutorial, you can view it through the
web-browsable CVS at:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.1&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl. 

(Note: you may need to view the tutorial through a word processor to get
the lines to wrap properly and to get rid of extra '^M's.  If someone
can tell me how I need to reformat the file so this is not necessary I'd
be grateful.)

If you want to also run the tutorial script, you will need to have a
copy of CVS "bioperl-live". The tutorial script will *not* work with
release 0.6. (Note that the contents of bioperl-live are being updated
often so some of the demo scripts may fail - they're working for me now
and if they start failing I'd appreciate finding out).

Cheers

Peter

From jason@chg.mc.duke.edu Mon Jan 8 21:10:26 2001 Date: Mon, 8 Jan 2001 16:10:26 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] ORF identification/prediction
To the best of my knowledge, we don't currently have bioperl modules that
predict/identify (depending on your confidence in the software =) Open
Reading Frames. Eric and I were thinking of working on a bioperl module
for this.  Any suggestions, known pitfalls, etc., are welcome.


Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 




From lapp@gnf.org Mon Jan 8 22:55:10 2001 Date: Mon, 08 Jan 2001 14:55:10 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] ORF identification/prediction
Jason Stajich wrote:
> 
> To the best of my knowledge, we don't currently have bioperl modules that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl module
> for this.  Any suggestions, known pitfalls, etc are welcomed.
> 

There is the Bio::Tools::ESTScan module, which obviously relies
on ESTScan as the ORF predicting external tool. If you plan to
implement a full-fledged ORF prediction algorithm in perl that
module is not what you want. (BTW ESTScan consists of a driver
layer in Perl; the core of the algorithm is written in C. One
could try to integrate/rewrite the driver layer into/in Bioperl.)

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From fernan@iib.unsam.edu.ar Mon Jan 8 22:58:17 2001 Date: Mon, 8 Jan 2001 19:58:17 -0300 From: Fernan Aguero fernan@iib.unsam.edu.ar Subject: [Bioperl-l] ORF identification/prediction
Currently I am calling getorf (from the EMBOSS package) in my scripts to
do this for me.

[fernan@iib4 fernan]$ getorf -h 
Mandatory qualifiers:
[-sequence]          seqall     Sequence database USA
[-outseq]            seqoutall  Output sequence(s) USA

Optional qualifiers:
-table              list       Code to use
-minsize            integer    Minimum nucleotide size of ORF to report
-find               list       This is a small menu of possible output
                               options. The first four options are to
                               select either the protein translation or
                               the original nucleic acid sequence of the
                               open reading frame. There are two possible
                               definitions of an open reading frame: it
                               can either be a region that is free of STOP
                               codons or a region that begins with a START
                               codon and ends with a STOP codon. The last
                               three options are probably only of interest
                               to people who wish to investigate the
                               statistical properties of the regions around
                               potential START or STOP codons. The last
                               option assumes that ORF lengths are
                               calculated between two STOP codons.

Advanced qualifiers:
-[no]methionine     bool       START codons at the beginning of protein
                               products will usually code for Methionine,
                               despite what the codon will code for when it
                               is internal to a protein. This qualifier
                               sets all such START codons to code for
                               Methionine by default.
-circular           bool       Is the sequence circular
-[no]reverse        bool       Set this to be false if you do not wish to
                               find ORFs in the reverse complement of the
                               sequence.
-flanking           integer    If you have chosen one of the options of the
                               type of sequence to find that gives the
                               flanking sequence around a STOP or START
                               codon, this allows you to set the number of
                               nucleotides either side of that codon to
                               output. If the region of flanking
                               nucleotides crosses the start or end of the
                               sequence, no output is given for this codon.
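The getorf help text above distinguishes two ORF definitions. As a possible starting point for the module Jason proposed, here is a minimal plain-Perl sketch of the START-to-STOP definition only. It is not Bioperl or EMBOSS code: find_orfs is a hypothetical name, and the sketch assumes the standard genetic code (stops TAA, TAG, TGA), ATG as the only start codon, and the forward strand of a linear sequence (no -reverse or -circular handling).

```perl
use strict;
use warnings;

# Standard-code stop codons (an assumption; getorf's -table can select others).
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper, not a Bioperl or EMBOSS API.
# Returns forward-strand ORFs as [frame, start, end] array refs,
# with a 0-based start and an exclusive end that includes the stop codon.
sub find_orfs {
    my ($seq, $minsize) = @_;
    $seq = uc $seq;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # position of the open ATG in this frame, if any
        for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if (!defined $start && $codon eq 'ATG') {
                $start = $i;                      # first ATG opens an ORF
            }
            elsif (defined $start && $STOP{$codon}) {
                my $end = $i + 3;                 # keep the stop codon
                push @orfs, [$frame, $start, $end]
                    if $end - $start >= $minsize; # similar in spirit to -minsize
                undef $start;                     # look for the next ATG
            }
        }
    }
    return @orfs;
}

for my $orf (find_orfs('CCATGTTTTGACC', 6)) {
    my ($frame, $start, $end) = @$orf;
    print "frame $frame: $start..$end\n";
}
```

Run as is, it prints `frame 2: 2..11`, the ATG...TGA stretch inside CCATGTTTTGACC. Internal ATGs inside an open ORF are treated as Methionine rather than as new starts, so each stop codon closes at most one ORF per frame.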

What I find annoying about EMBOSS apps is that the -h (-help) option
prints limited information (unless the options are 'boolean' or
'integer', you don't know what to put there). You have to go to the EMBOSS
web site to look for extended help!

Hope this helps,

Fernan

On Mon, 08 Jan 2001 18:10:26 Jason Stajich wrote:
> To the best of my knowledge, we don't currently have bioperl modules
> that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl
> module
> for this.  Any suggestions, known pitfalls, etc are welcomed.
> 
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.duke.edu/ 


-- 

# --------------------------------------------------------- #
#                                            _              #
#   Fernan Aguero            |              / \             #
#   Bioinformatics           |       ASCII  \ /  against    #
#   IIB-UNSAM                |      ribbon   /   HTML       #
#   fernan@iib.unsam.edu.ar  |    campaign  / \  email      #
#   ICQ 100325972            |             /   \            #
#                                                           #
# --------------------------------------------------------- #


From nirav@public.arl.Arizona.EDU Mon Jan 8 23:27:11 2001 Date: Mon, 08 Jan 2001 16:27:11 -0700 (MST) From: nirav@public.arl.Arizona.EDU nirav@public.arl.Arizona.EDU Subject: [Bioperl-l] EMBOSS -h Was : ORF identification/prediction
Quoting Fernan Aguero <fernan@iib.unsam.edu.ar>:

.
> 
> What i find annoying about EMBOSS apps is that the -h (-help) option
> prints limited information (unless the options are 'boolean' or
> 'integer' you don't know what to put there). You have to go to EMBOSS
> web site to look for extended help!
> 

Use tfm <prog name> for detailed help in EMBOSS.

regards,
Nirav



From dblock@gene.pbi.nrc.ca Tue Jan 9 07:50:42 2001 Date: Tue, 9 Jan 2001 01:50:42 -0600 (CST) From: David Block dblock@gene.pbi.nrc.ca Subject: [Bioperl-l] [Poop-group] RELEASE: Alzabo 0.20 (fwd)
Just something to think about.  Has anybody played with this?  Would it work
with BioPerl Objects?  Have we been reinventing the wheel here?

Up late, thinking out loud.
-- 
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


---------- Forwarded message ----------
Date: Tue, 9 Jan 2001 00:18:32 -0600 (CST)
From: Dave Rolsky <autarch@urth.org>
To: poop-group@lists.sourceforge.net, poop-scoop@lists.sourceforge.net
Subject: [Poop-group] RELEASE: Alzabo 0.20 (fwd)

Alzabo is a data modelling tool and OO-RDBMS mapper written in Perl.

This release includes a lot of changes, both internal and external.
Users who have older schemas saved to disk will need the eg/convert.pl
utility included with this release.  Existing users should also make sure
to note the deprecations and incompatibilities detailed at the bottom of
the change list.

Among the most visible changes/updates are a fairly large amount of
documentation revamping and support for Postgres.

Alzabo is available from either CPAN or
http://www.sourceforge.net/projects/alzabo/

The Alzabo homepage is at http://alzabo.sourceforge.net/.

The documentation can be read online at
http://alzabo.sourceforge.net/docs/.  This is a good place to start for
those curious about what Alzabo does.


Changes
--------------

0.20

- Preliminary Postgres support.  There is no support yet for
constraints or foreign keys when reverse engineering or making SQL.
There is also no support for large objects (I'm hoping that 7.1 will
be released soon so I won't have to think about this).  Otherwise, the
support is about at the same level as MySQL support, though less
mature.

- Added Alzabo::MethodMaker module.  This can be used to auto-generate
useful methods for your schema/table/row objects based on the
properties of your objects themselves.

- Reworking/expanding/clarifying/editing of the docs.

- Add sort_by and limit options whenever creating a cursor.

- Method documentation POD from the Alzabo::* modules is merged into
the relevant Alzabo::Create::* and Alzabo::Runtime::* modules during
install.  This should make it easier to find what you need since the
average user will only need to look at a few modules in
Alzabo::Runtime::*.

- Reworked exceptions so they are all now Alzabo::Exception::Something.

- Added default as a column attribute (thus there are now
Alzabo::Column->default and Alzabo::Create::Column->set_default
methods).

- Added length & precision attributes for columns.  Both are set
through the Alzabo::Create::Column->set_length method.

- This release includes a script in eg/ called convert.pl to convert
older schemas.

- Alzabo::Schema->tables & Alzabo::Table->columns now take an optional
list of tables/columns as an argument and return a list of matching
objects.

- Added Alzabo::Column->has_attribute method.

- The data browser has actually lost some functionality (the
filtering).  Making this more powerful is a fairly low priority at the
moment.

- Fix bugs where extra params passed to Alzabo::Runtime::Table->insert
were not making it to the Alzabo::Runtime::Row->new method.

- Fix for Alzabo::Runtime::Table->set_prefetch method.

- Fixed bug in handling of deleted objects in Alzabo::ObjectCacheIPC
(they were never reported as deleted).

- Fix bug that caused schema to get bigger every time it was saved.

- Finally switched to regular hashes for objects.

- Added Alzabo::SQLMaker classes to handle generating SQL
in a cross-platform compatible way.

DEPRECATIONS:

- Parameters for Alzabo::Create::Column->new: 'null' parameter is now
'nullable'.  The use of the parameter 'null' is deprecated.

- Alzabo::Column->null & Alzabo::Column->set_null methods are now
Alzabo::Column->nullable & Alzabo::Column->set_nullable.  The old
methods are deprecated.

- Alzabo::Create::ForeignKey->new no longer requires table_from &
table_to params (it took me this long to realize I can get that from
the column passed in.  doh!)

INCOMPATIBILITIES:

- Alzabo::Runtime::Table->rows_where parameters have changed.  The
from parameter has been removed (use the Alzabo::Runtime::Schema->join
method instead).  The where parameter expects something different now.

- Alzabo::Runtime::Table->rows_by_where_clause method has been
removed.

- Alzabo::Runtime::Schema->join method's where parameter expects
something different.


/*==================
www.urth.org
We await the New Sun
==================*/





From heikki@ebi.ac.uk Tue Jan 9 09:29:23 2001 Date: Tue, 09 Jan 2001 09:29:23 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: make test
Ewan probably means that Variation code should be part of the main
bioperl cvs but should form a separate module after 0.7 is out. I do
not think this is a good idea. I'd like to keep Variation and LiveSeq
namespaces within Bioperl main distribution. 

There is an issue of Ensembl needing a copy of Variation code, which
would favour moving things over to a separate module, but it can be
handled by other means, e.g. by copying the objects over temporarily.

	-Heikki

Hilmar Lapp wrote:
> 
> Ewan Birney wrote:
> >
> > On Mon, 8 Jan 2001, Joseph Insana wrote:
> >
> > > > (i.e. $obj->verbose(-1) turns off all warnings).  This means your objects
> > > > have to inherit from Bio::Root::RootI.
> > >
> > > I don't want my objects to inherit from RootI.
> > > They are independent and I'd like to have them stay independent.
> >
> > This is cool (I completely understand). I think we should consider moving
> > the variation into its own cvs module, which means that Joseph and Heikki
> > are not tied to the bioperl release schedule etc.
> >
> > This is for post 0.7 branching in my view (Hilmar to make the call).
> >
> 
> I'm not sure what you mean by post-0.7 branching. I agree that
> under these premises the Variation code should probably go
> into its own module, even though it's a pity.
> 
>         Hilmar
> 
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From birney@ebi.ac.uk Tue Jan 9 09:30:28 2001 Date: Tue, 9 Jan 2001 09:30:28 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: make test
On Tue, 9 Jan 2001, Heikki Lehvaslaiho wrote:

> Ewan probably means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this is a good idea. I'd like to keep Variation and LiveSeq
> namespaces within Bioperl main distribution. 

I am cool with this as well. <grin>. 



From heikki@ebi.ac.uk Tue Jan 9 10:28:29 2001 Date: Tue, 09 Jan 2001 10:28:29 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Dear Peter,

Wonderful! Thank you very much for writing the tutorial. Before any of
us goes into details, I thought it best to wrap the lines and remove the ^Ms
for easier viewing. CVS is happier with short lines, too. This was
easy enough to do in emacs.
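Peter asked how to get rid of the ^M characters without going through a word processor. Assuming they are carriage returns left by DOS-style (CR LF) line endings, a small Perl filter can do the same job as the emacs fix. This sketch only fixes line endings; it does not re-wrap long lines. The strip_cr helper and the script itself are illustrative, not part of bioperl-live.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert DOS line endings (CR LF, displayed as ^M) to Unix LF.
sub strip_cr {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;    # CR LF pairs become a plain LF
    $text =~ s/\r/\n/g;      # any remaining lone CR is treated as a line break
    return $text;
}

# Rewrite each file named on the command line in place.
for my $file (@ARGV) {
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;
    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} strip_cr($text);
    close $out or die "Cannot close $file: $!";
}
```

Saved as, say, strip_cr.pl, it would be run as `perl strip_cr.pl bptutorial.pl`. For the common CR LF case the one-liner `perl -pi -e 's/\r$//' bptutorial.pl` does the same.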

Thanks again,

	-Heikki


From heikki@ebi.ac.uk Tue Jan 9 10:45:56 2001 Date: Tue, 09 Jan 2001 10:45:56 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
Peter,

Running any part of the script depends on the bioperl-ext package.
Since I do not have it, I cannot run any of the demos. A workaround is
needed.

	-Heikki


odo ~/src/bioperl-live> perl -w  bptutorial.pl 0
 
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please install the bioperl-ext package
 
odo ~/src/bioperl-live> perl -w  bptutorial.pl 4
 
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align)
has not been installed.
 Please install the bioperl-ext package

odo ~/src/bioperl-live>


From heikki@ebi.ac.uk Tue Jan 9 10:50:49 2001 Date: Tue, 09 Jan 2001 10:50:49 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Initial draft of bioperl tutorial committed
P.S. The URL for the wrapped version of the text is:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.2&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl

With new versions coming in shortly it is best to use:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?cvsroot=bioperl

Then select the latest revision from there.

	-Heikki


From birney@ebi.ac.uk Tue Jan 9 11:45:33 2001 Date: Tue, 9 Jan 2001 11:45:33 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] bptutorial
Many thanks to Peter for an excellent tutorial. It is well worth a read:
I have spotted no obvious errors, but I will reread more carefully.


The dependency problem can be solved with a conditional require and then
run-time skipping of sections. I agree with Heikki that this will be a
good thing. I will see what I can do here.
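Here is a minimal sketch of the conditional-require idea, assuming the tutorial's demos can be wrapped as code blocks. The optional module is loaded with require inside eval at run time, and sections that need it are skipped with a message instead of aborting the whole script. module_available and run_section are hypothetical helpers, not existing Bioperl functions; Bio::Ext::Align is the module named in Heikki's error output.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded at run time.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Ext::Align -> Bio/Ext/Align.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Checked once, at run time, instead of a compile-time 'use'.
my $have_ext_align = module_available('Bio::Ext::Align');

# Hypothetical section runner: skips sections that need bioperl-ext
# when it is missing, instead of aborting the whole script.
sub run_section {
    my ($name, $needs_ext, $code) = @_;
    if ($needs_ext && !$have_ext_align) {
        print "Skipping '$name': Bio::Ext::Align (bioperl-ext) is not installed\n";
        return 0;
    }
    $code->();
    return 1;
}

run_section('sequence basics', 0, sub { print "running sequence basics\n" });
run_section('Smith-Waterman alignment', 1, sub { print "running alignment demo\n" });
```

Without bioperl-ext installed, the first section runs and the second prints a skip message, so demos that do not need the C extension remain usable.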


People may have noticed as well that Jason, Hilmar and I have been
struggling through the refactoring of the main trunk towards 0.7. Much
praise goes to Jason for doing the lion's share of the work. 

I have only one module failing for unexplained reasons. I am planning to
write the RichSeq-style interfaces on my transatlantic flight today.




-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 9 11:46:08 2001 Date: Tue, 9 Jan 2001 11:46:08 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] spoke too soon...
Just cvs update'd and ran the tests... SeqStats has disappeared. Is this
deliberate?





From jason@chg.mc.duke.edu Tue Jan 9 15:38:59 2001 Date: Tue, 9 Jan 2001 10:38:59 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] spoke too soon...
It appears you might have... It is now in Bio::Tools::SeqStats. I have
updated the test module to reflect this and split the tests into separate
ok statements so we can see which ones are failing.
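To illustrate why separate ok statements help, here is a small sketch in the style of the existing test() helper. The counts are made up rather than real SeqStats output, and the expected count for G is deliberately wrong so that something fails. The combined check only reports that something is wrong, while the per-base checks name the base that is off:

```perl
use strict;
use warnings;

my $n = 0;
# Print one "ok N" / "not ok N" line per check, like the test() helper
# already used in the test scripts.
sub check {
    my ($true, $msg) = @_;
    $n++;
    my $line = $true ? "ok $n\n" : "not ok $n $msg\n";
    print $line;
    return $true ? 1 : 0;
}

# Made-up numbers standing in for SeqStats results; the expected count
# for G is deliberately wrong so that one check fails.
my %count    = (A => 3, C => 2, G => 2, T => 1);
my %expected = (A => 3, C => 2, G => 1, T => 1);

# One combined check: a failure only says "something is wrong".
my $bad = grep { $count{$_} != $expected{$_} } keys %expected;
check($bad == 0, 'base counts');

# One check per base: the failing line names the base that is off.
check($count{$_} == $expected{$_}, "count of $_") for sort keys %expected;
```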

Some of them are, and I am not sure whether the error is in the tests or
in the module.

31 helix ../bio/bioperl/bioperl-live> cvs log Bio/SeqStats.pm

RCS file: /home/repository/bioperl/bioperl-live/Bio/Attic/SeqStats.pm,v
Working file: Bio/SeqStats.pm
head: 1.3
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 3;     selected revisions: 3
description:
----------------------------
revision 1.3
date: 2000/03/21 11:47:55;  author: birney;  state: dead;  lines: +0 -0
moved SeqStats, added SeqWords
----------------------------
revision 1.2
date: 2000/03/01 15:36:42;  author: birney;  state: Exp;  lines: +148 -156
Refactored RootI to get exception throwing cleanly out

Fixed minor issues in multifile.pm

Minor fix to IUPAC

added exception test

tidied up SeqStats.pm
----------------------------
revision 1.1
date: 2000/02/27 11:36:14;  author: birney;  state: Exp;
added multi_1 test and SeqStats
==========================================================================

On Tue, 9 Jan 2001, Ewan Birney wrote:

> 
> Just cvs update'd and run tests... SeqStats has disappeared. Is this
> deliberate?
> 
> 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From insana@ebi.ac.uk Tue Jan 9 19:17:46 2001 Date: Tue, 9 Jan 2001 19:17:46 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] make tests
> As I understand from your and Heikki's replies in your test you
> wanted the overriding thing to happen, be accepted (even though a
> warning was triggered), and the code be able to handle it.

Exactly.

> if you print a message before the test that a warning should be
> expected?

This is a nice proposal.

But it is not such an important feature that it absolutely needs to be
tested, to the point of forcing people to read both the pre-warning message
and the warning itself so as not to be confused by them...

So I just changed the code to test something closely related, i.e. checking
that the code works, and only dropped the check that the "override" of the
two parameters takes effect (it should anyway).

Thank you.
        Joseph Insana


From hlapp@gmx.net Tue Jan 9 19:28:46 2001 Date: Tue, 09 Jan 2001 11:28:46 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: make test
Heikki Lehvaslaiho wrote:
> 
> Ewan probably means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this is a good idea. I'd like to keep the Variation and LiveSeq
> namespaces within the main Bioperl distribution.
> 

Even better. I see I haven't understood the issue, so you guys
thrash this out.

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From ajm6q@virginia.edu Wed Jan 10 00:31:01 2001 Date: Tue, 9 Jan 2001 19:31:01 -0500 (EST) From: Aaron J Mackey ajm6q@virginia.edu Subject: [Bioperl-l] make tests
Why don't you trap the warning in an eval/$SIG{__WARN__} - I don't see why
you can't test for proper warnings, if that's what you were trying to do.
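A minimal sketch of what Aaron suggests. A local $SIG{__WARN__} handler collects the warning so the test can check both that it was raised and that the override still happened. make_location and its -start/-offset parameters are hypothetical stand-ins for the code under test. The hook only sees warnings issued through Perl's warn. For exceptions you would wrap the call in eval and inspect $@ instead:

```perl
use strict;
use warnings;

# Hypothetical routine standing in for the code under test: given two
# conflicting parameters it warns, then lets the second one win.
sub make_location {
    my %args = @_;
    if (defined $args{-start} && defined $args{-offset}) {
        warn "both -start and -offset given; -offset overrides\n";
        return $args{-offset};
    }
    return $args{-start};
}

# Collect warnings instead of letting them go to STDERR.
my @warnings;
my $result = do {
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    make_location(-start => 5, -offset => 7);
};

my $t1 = (@warnings == 1 && $warnings[0] =~ /overrides/) ? "ok 1\n" : "not ok 1\n";
my $t2 = $result == 7 ? "ok 2\n" : "not ok 2\n";
print $t1, $t2;
```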

-Aaron

On Tue, 9 Jan 2001, Joseph Insana wrote:

> > As I understand from your and Heikki's replies in your test you
> > wanted the overriding thing to happen, be accepted (even though a
> > warning was triggered), and the code be able to handle it.
>
> Exactly.
>
> > if you print a message before the test that a warning should be
> > expected?
>
> This is a nice proposal.
>
> But that one is not such an important feature that needs to be absolutely
> tested, to the point of forcing people to read the pre-warning message
> and the warning message not to get confused by them....
>
> So I just changed the code to test something closely related, i.e. checking
> that the code works, avoiding only to check that the "override" of the two
> parameters is acted (it should anyway).
>
> Thank you.
>         Joseph Insana
>
>


From hlapp@gmx.net Wed Jan 10 09:55:32 2001 Date: Wed, 10 Jan 2001 01:55:32 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::SearchDist, Bio::Ext::Align
I thought that, for completeness, I would install the Bioperl XS modules in
Bio::Ext::*, so I downloaded bioperl-ext-0.6.tar.gz, which is
advertised as the latest version.

Installation went fine, but now the t/SearchDist.t tests get
executed. This revealed a couple of bugs in Bio::SearchDist, some
of which are related to the RootI transition. Others consist of
calling functions that simply do not exist under that name in the
extension module. I tried to fix them all, but now there is a
complaint about a missing parameter in fit_EVD (it expects two, but
gets only one hardcoded parameter), which I don't know how to fix.

Does anyone use this module currently (and if so, why does it work
for you?)? Did I grab the wrong version?

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From heikki@ebi.ac.uk Wed Jan 10 12:26:53 2001 Date: Wed, 10 Jan 2001 12:26:53 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
I noticed that it is not possible to use three-letter codes for amino
acids in any bioperl sequence objects. I think it should be possible at
least to output in three-letter code. Mapping three-letter code back
to one-letter code is not too hard either, but is it a good idea to
have?

I propose to put a method 'seq3' into PrimarySeq.pm, which is called from
Seq.pm, too.

=head2 seq3

 Title   : seq3
 Usage   : $string = $obj->seq3()
 Function: Read only method that returns the amino acid sequence 
           as a string of three letter codes. moltype has to be 
           'protein'. Output follows the IUPAC standard plus 
           'Ter' for terminator.
 Returns : A scalar
 Args    : character used for stop, optional, defaults to '*'
           character used for unknown, optional, defaults to 'X'

=cut
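For illustration, a minimal sketch of the mapping such a method would need. It is written as a plain function (seq3_from_string is a hypothetical name) rather than as the proposed PrimarySeq method. It takes the same two optional arguments as the POD above, so whatever characters mark stop and unknown in the one-letter string come out as 'Ter' and 'Xaa'. Anything not in the table is also treated as unknown, which is an assumption of this sketch:

```perl
use strict;
use warnings;

# One-letter to three-letter IUPAC codes: the 20 standard residues
# plus the ambiguity codes B (Asx) and Z (Glx).
my %THREE = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
    B => 'Asx', Z => 'Glx',
);

# Hypothetical function form of the proposed method: the stop and
# unknown characters (defaults '*' and 'X') become Ter and Xaa.
sub seq3_from_string {
    my ($seq, $stop, $unknown) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    my $out = '';
    for my $aa (split //, $seq) {
        if    ($aa eq $stop)    { $out .= 'Ter' }
        elsif ($aa eq $unknown) { $out .= 'Xaa' }
        else                    { $out .= $THREE{ uc($aa) } || 'Xaa' }
    }
    return $out;
}

my $default = seq3_from_string('MKV*');             # MetLysValTer
my $custom  = seq3_from_string('MK?#', '#', '?');   # MetLysXaaTer
print "$default\n$custom\n";
```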

Any opinions?

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From gert.thijs@esat.kuleuven.ac.be Wed Jan 10 15:35:48 2001 Date: Wed, 10 Jan 2001 16:35:48 +0100 From: gert thijs gert.thijs@esat.kuleuven.ac.be Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
I have been using Bio::SeqIO::genbank.pm quite frequently lately and I
stumbled upon a small parsing problem. Sometimes no TITLE field is
defined in the REFERENCE, and this makes parsing of the record fail so
that no features are detected. To solve this problem I have added one
extra check in Bio::SeqIO::genbank.pm at line 602:

if (/^  AUTHORS\s+(.*)/) { 
   $au .= $1;   
   while ( defined($_ = $self->_readline) ) {
       /^  TITLE/ && last;
       /^  JOURNAL/ && last;   ### when no title is given ###
       /^\s+(.*)/ && do { $au .= $1; $au =~ s/\,(\S)/ $1/g;$au .= " ";next;};
   }    
}
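To see why the extra check matters, here is a self-contained sketch of the same AUTHORS loop run over a made-up REFERENCE block that has no TITLE. An array stands in for $self->_readline, which is an assumption made for illustration. Without the JOURNAL test, the indented JOURNAL line matches /^\s+(.*)/ and is appended to the authors, and the loop keeps reading past the end of the reference, presumably consuming the lines the feature parser needs. With the test, the loop stops with the JOURNAL line still in hand:

```perl
use strict;
use warnings;

# Made-up REFERENCE block with no TITLE line, the case that broke parsing.
my @input = (
    "  AUTHORS   Smith,J., Jones,K. and",
    "            Brown,L.",
    "  JOURNAL   Unpublished",
    "FEATURES             Location/Qualifiers",
);
# Stand-in for $self->_readline: next line, or undef at the end.
my $readline = sub { return shift @input };

my $au = '';
my $line = $readline->();
if ($line =~ /^  AUTHORS\s+(.*)/) {
    $au .= $1;
    while (defined($line = $readline->())) {
        last if $line =~ /^  TITLE/;
        last if $line =~ /^  JOURNAL/;   # no TITLE given: stop at JOURNAL
        if ($line =~ /^\s+(.*)/) {       # continuation of the author list
            $au .= $1;
            $au =~ s/,(\S)/ $1/g;
            $au .= ' ';
        }
    }
}
print "authors: $au\n";
print "stopped at: $line\n";
```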



-- 
+ Gert Thijs              
+ 
+ email: gert.thijs@esat.kuleuven.ac.be 
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+ 
+ K.U.Leuven
+ ESAT-SISTA 
+ Kasteelpark Arenberg 10 
+ B-3001 Leuven-Heverlee  
+ Belgium  
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70

From birney@ebi.ac.uk Wed Jan 10 13:35:46 2001 Date: Wed, 10 Jan 2001 13:35:46 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: Bio::SearchDist, Bio::Ext::Align
On Wed, 10 Jan 2001, Hilmar Lapp wrote:

> I thought for completeness I install the Bioperl XS modules in
> Bio::Ext::*, and downloaded bioperl-ext-0.6.tar.gz, which is
> advertised as the latest version.
> 
> Installation went fine, but now the t/SearchDist.t tests get
> executed. This revealed a couple of bugs in Bio::SearchDist, some
> of which are related to the RootI transition. Others consist of
> calling functions which are simply not present by that name in the
> extension module. I tried to fix them all, but now there is a
> complaint about a missing parameter in fit_EVD (expects two, but
> gets only 1 hardcoded parameter), which I don't know how to fix.

This is my bug to fix. I will look at it.

I don't think anyone has used SearchDist before, including me. Doh!


> 
> Does anyone use this module currently (and if so, why does it work
> for you?)? Did I grab the wrong version?
> 
> 	Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 10 13:46:01 2001 Date: Wed, 10 Jan 2001 13:46:01 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
On Wed, 10 Jan 2001, Heikki Lehvaslaiho wrote:

> 
> 
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
> 
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
> 
> =head2 seq3
> 
>  Title   : seq3
>  Usage   : $string = $obj->seq3()
>  Function: Read only method that returns the amino acid sequence 
>            as a string of three letter codes. moltype has to be 
>            'protein'. Output follows the IUPAC standard plus 
>            'Ter' for terminator.
>  Returns : A scalar
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
> 
> =cut
> 
> Any opinions?


Do you really want this? I guess so.


There could be an argument for making a SeqUtils class and moving this sort of
function into it, so that we mess with fewer objects/interfaces. It would be
called as:


   $seq3 = Bio::SeqUtils->seq3($seq);
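A minimal sketch of that design, assuming a hypothetical My::SeqUtils package (named so it is not confused with any real module) and a tiny stand-in sequence object that offers only seq and moltype accessors. The point is that the utility takes the sequence object as an argument, so PrimarySeq and Seq do not have to grow new methods:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a protein sequence object: just the two
# accessors the utility needs.
package Demo::ProteinSeq;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub seq     { return $_[0]{seq} }
sub moltype { return $_[0]{moltype} }

# Hypothetical utility class: the function lives here and takes the
# sequence object as an argument instead of being a method on it.
package My::SeqUtils;
my %THREE = map { /^(.)(\w{3})$/ } qw(
    AAla RArg NAsn DAsp CCys QGln EGlu GGly HHis IIle LLeu
    KLys MMet FPhe PPro SSer TThr WTrp YTyr VVal BAsx ZGlx
);

sub seq3 {
    my ($class, $seq, $stop) = @_;
    $stop = '*' unless defined $stop;
    die "seq3: not a protein sequence\n" unless $seq->moltype eq 'protein';
    return join '', map { $_ eq $stop ? 'Ter' : ($THREE{$_} || 'Xaa') }
                    split //, $seq->seq;
}

package main;
my $prot = Demo::ProteinSeq->new(seq => 'MKV*', moltype => 'protein');
my $s3   = My::SeqUtils->seq3($prot);    # MetLysValTer
print "$s3\n";
```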


> 
> 	-Heikki
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/                      http://www.ebi.ac.uk/mutations/
>      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
>     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
>    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
>   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
>      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 10 19:10:51 2001 Date: Wed, 10 Jan 2001 11:10:51 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
Submitted to the Bioperl bug-tracker.

(BTW whenever you feel quite sure that your complaint addresses a
bug, you can directly submit it to bioperl-bugs@bio.perl.org. If
you don't feel sure, you can still do so. The bug-tracking system
is the best way of keeping track of such things.)

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From lorrie@oreilly.com Wed Jan 10 19:31:27 2001 Date: Wed, 10 Jan 2001 14:31:27 -0500 From: Lorrie LeJeune lorrie@oreilly.com Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
At 05:33 AM 1/9/2001 -0500, Peter Schattner wrote:

>I have committed an initial draft of an introductory bioperl tutorial
>(called "bptutorial.pl") to the bioperl-live (main) repository.

Peter (and fellow BioPerlers):

I think the tutorial is a great idea. BioPerl needs good documentation in a 
big way, and I promised Ewan at BOSC that I'd be willing to volunteer some 
time to the cause. So I'd be happy to sign on as your editor and help you 
get it ship-shape. I'm also a beginning Perl programmer, so I'm sure it'll 
help me learn more about both the language and BioPerl.

I'm in the process of finishing up O'Reilly's first bioinformatics book: 
Developing Bioinformatics Computer Skills. I'd like to put a pointer to the 
tutorial in the book, but the URL is way too long. D'ya think we might
convince the webmaster to give it a shorter link that's suitable for publication?

Cheers,

--Lorrie

------------------------------------------------------
Lorrie LeJeune
Editor, Web Technologies and Bioinformatics
O'Reilly & Associates
90 Sherman Street, Cambridge, MA 02140
Tel: 617-499-7472;  FAX: 617-661-1116
www.oreilly.com
------------------------------------------------------

From hlapp@gmx.net Wed Jan 10 19:35:09 2001 Date: Wed, 10 Jan 2001 11:35:09 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] three letter codes for amino acids?
Heikki Lehvaslaiho wrote:
> 
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
> 
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
> 
> =head2 seq3
> 
>  Title   : seq3
>  Usage   : $string = $obj->seq3()
>  Function: Read only method that returns the amino acid sequence
>            as a string of three letter codes. moltype has to be
>            'protein'. Output follows the IUPAC standard plus
>            'Ter' for terminator.
>  Returns : A scalar
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
> 
> =cut
> 
> Any opinions?
> 

Considering sequence atoms as symbols seems the most natural
concept to me. Having single letters representing each symbol
makes symbol arrays and strings more or less equivalent in Perl.
This might not hold for multi-letter representations, so in the
first place I'd expect an array to be returned. However, this is
inconsistent with $seq->seq(), and reportedly inefficient due to
Perl's array implementation.

I know you could still split at every 3rd letter as a simple way
to get an array. I'd nevertheless accept a third optional
parameter denoting the 'join' character, with a default of ''.
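Either way, going from the joined string back to one code per element is cheap. A sketch of both the split and a separator argument defaulting to '' follows. codes3 and join3 are hypothetical names. A string whose length is not a multiple of three loses its trailing characters:

```perl
use strict;
use warnings;

# Split a three-letter-coded string into one code per element by
# taking three characters at a time.
sub codes3 {
    my ($seq3) = @_;
    return $seq3 =~ /(.{3})/g;
}

# Hypothetical 'join' parameter: re-join the codes with a separator,
# defaulting to '' so the result matches the plain string.
sub join3 {
    my ($seq3, $sep) = @_;
    $sep = '' unless defined $sep;
    return join $sep, codes3($seq3);
}

my @codes  = codes3('MetLysValTer');         # ('Met', 'Lys', 'Val', 'Ter')
my $dashed = join3('MetLysValTer', '-');     # 'Met-Lys-Val-Ter'
print "$dashed\n";
```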

Just my few thoughts.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From paul-christophe.varoutas@curie.fr Thu Jan 11 00:48:13 2001 Date: Thu, 11 Jan 2001 01:48:13 +0100 From: Paul-Christophe Varoutas paul-christophe.varoutas@curie.fr Subject: [Bioperl-l] Emerging from obscurity
Hi everybody,

I am writing because I would like to start contributing to the bioperl 
project. But first of all let me introduce myself:

I am finishing my PhD at the Curie Institute in Paris, France. My subject 
is molecular genetics in yeast, and more particularly the study of the 
initiation of meiotic recombination.

Apart from molecular genetics, I have a rather strong background in
algorithmics and programming, which I developed by studying on my own and
interacting a lot with people studying computer science. One of my favorite
fields is OOP: I have read a lot of books on OOP design and have experience
in designing projects using UML and implementing them in C++. I started
using C++ in 1992 and ever since I have implemented lots of sexually
attractive object classes, such as various types of neural networks
(backpropagation nets, BAMs (bidirectional associative memories) and FAMs
(like BAMs but integrating fuzzy logic)), various cryptographic algorithms,
and a basic collection of bioinformatics objects (that was before I
discovered bioperl ;-) ), which I then used to develop some small
applications, the coolest one probably being a program doing ORF prediction
using Fourier transforms.

As for Perl, I started learning it in 1995 (the year of the 5.001
release). Slowly but steadily it has become my favorite language. I use it
extensively to do virtually *everything*, including solving my small
everyday problems, such as file management, automating various
internet activities (from low-level operations using the IO::Socket::
modules to FTP/telnet sessions and web stuff) and automating the very few
biocomputing needs I've had for my PhD project. I also use Perl for CGI
scripting.
I am fascinated by the power of regular expressions (I am reading
"Mastering Regular Expressions" for the second time, and I am even more
fascinated than the first time I read it; I'm still discovering astonishing
details and realizing there is so much to learn about them), and try to use
them whenever/wherever I can  B-) .

I discovered the bioperl project two years ago and have been following the
discussion groups with great interest for almost a year. Many times I wanted to
just jump into the discussions, but I didn't because I knew I would have no
time to deal with it on top of my other activities.

So, after this rather long introduction, here is the subject of my mail: 
like all of you, I want to participate in making bioperl better.

As I mentioned above, I am finishing my PhD, so I don't have much time at
the moment. But I will have finished the experimental part of my PhD by the
end of January, so I will have some time to spare. I will probably spend
large amounts of time in front of a Macintosh writing my article and PhD thesis.
I *hate* Macs (my favorite Mac software is telnet, for logging in to our Unix
servers or to my home PC), and participating in the bioperl project will
keep me from going insane :-).

I was thinking about participating in the discussions about the OOP design
of bioperl, or in the biocorba interoperability project, but for the
moment I would prefer to start with something "smoother"; after all,
I am not (yet) familiar with all the bioperl modules. So doing something that
gets me more familiar with the whole set of bioperl modules
should be a good start.

I figured out that I can help with some aspects of bioperl that can 
contribute to the enlargement of the bioperl community.
So, here is what I propose to do:

- Help figure out bioperl 0.7 cross-platform compatibility with the MacOS
platform. Almost all French labs use Macintosh computers, and our lab has a
lot of Mac boxes with various types of processors and various versions of
MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already
pointed out that there are a lot of compatibility problems; their posts are
going to be my starting point. I would like to make a list of all problems,
figure out which ones can be solved reasonably easily, and make at least a
subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is
what most Mac people use and will most probably continue using.

- Contribute to Shelly's effort for compatibility with the Windows NT/2000 
platform.

- Participate in the documentation project of bioperl. I know that there
are already people working on various aspects of the documentation, so I
would like Ewan / Hilmar to tell me which you prefer: that I participate in one
of the ongoing projects or start another project to cover something that is
missing.

I am very glad to contribute to the bioperl group, you are doing some 
exceptionally good work out there.

(For those who are reading this line, thank you for getting this far  :-)   ).


Paul-Christophe


--------------------------------------
Paul-Christophe Varoutas
Institut Curie - Section de Recherche - UMR144
Laboratoire de Genetique Moleculaire de la Recombinaison
26, rue d'Ulm
75248 Paris cedex 05
FRANCE
Tel: 01.42.34.66.36
Fax: 01.42.34.66.44
----------------------------------------


From jason@chg.mc.duke.edu Thu Jan 11 01:33:12 2001 Date: Wed, 10 Jan 2001 20:33:12 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Emerging from obscurity
Paul - We are very happy to have you aboard the project, and glad to add
your skills to the team. I look forward to you getting
acquainted with the modules and helping us in the design (and redesign)
of many of the objects.

The tasks you outline below sound like a very good starting point and are
sorely needed, as many of the developers are only plugged into UNIX on a
regular basis. The documentation will be a good starting point too, but I
strongly suggest you try to use the bioperl modules to solve a task you
have in your lab. (I guess tackling Mac portability will give you
this experience, but really try to use them to manipulate some of your
yeast data.) This will give you the chance both to get used to the
modules and to help write documentation for people who are new to bioperl.
The developers who are familiar with the code too often skip over the
important details when writing docs. If you are unsure of how to use a
module, feel free to use the list for questions; I know there are many more
people who are looking for ways to get comfortable with the modules.

I'd also like to see us consider moving some of the
documentation/tutorials to the wiki web site so that more people can
contribute to it. Perhaps some 'scenario writing' that describes a
problem and how bioperl was used to solve it.

Again, welcome aboard and we look forward to your contributions.  

Jason

On Thu, 11 Jan 2001, Paul-Christophe Varoutas wrote:
[snip]
> 
> 
> - Help figuring out bioperl 0.7 cross-platform compatibility with the MacOS 
> platform. Almost all french labs use macintosh computers, and our lab has a 
> lot of mac boxes with various types of processors and various versions of 
> MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already 
> pointed out that there are a lot of compatibility problems, their posts are 
> going to be my starting point. I would like to make a list of all problems, 
> figure out which ones can be solved reasonably easily, and make at least a 
> subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is 
> what most Mac people use, and most probably will continue using.
> 
> - Contribute to Shelly's effort for compatibility with the Windows NT/2000 
> platform.
> 
> - Participate in the documentation project of bioperl. I know that there 
> are already people working on various aspects of the documentation, so I 
> would like Ewan / Hilmar to tell me what you prefer: participate in one of 
> the ongoing projects or initiate another project to do something that is 
> missing.
> 
> I am very glad to contribute to the bioperl group, you are doing some 
> exceptionally good work out there.
> 
> (For those who are reading this line, thank you for reaching so far  :-)   ).
> 
> 
> Paul-Christophe
> 
> 
> --------------------------------------
> Paul-Christophe Varoutas
> Institut Curie - Section de Recherche - UMR144
> Laboratoire de Genetique Moleculaire de la Recombinaison
> 26, rue d'Ulm
> 75248 Paris cedex 05
> FRANCE
> Tel: 01.42.34.66.36
> Fax: 01.42.34.66.44
> ----------------------------------------
> 
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From schattner@alum.mit.edu Thu Jan 11 09:04:19 2001 Date: Thu, 11 Jan 2001 01:04:19 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
Lorrie LeJeune wrote:
> 
> Peter Schattner wrote:
> 
> >I have committed an initial draft of an introductory bioperl tutorial
> >(called "bptutorial.pl") to the bioperl-live (main) repository.

> So I'd be happy to sign on as your editor and help you
> get it ship-shape. 

Thanks for your offer.  I look forward to getting your feedback and
recommendations regarding the tutorial.

Peter

From heikki@ebi.ac.uk Thu Jan 11 10:09:48 2001 Date: Thu, 11 Jan 2001 10:09:48 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
Dear Adrian,

I guess I was not too clear here. I'll post the reply to the list as
others might have misunderstood, too. 

The translate method in PrimarySeqI defaults to '*' and 'X' for stop
and unknown ('any') residues in its output, but there are arguments to
the method that allow you to change them. As a result, the protein
sequence object can have other characters for these in the one-letter
code stored in the object. The same arguments are needed in the seq3
method so that the corresponding three-letter codes are always 'Ter'
and 'Xaa' (the IUPAC standard).

	-Heikki

Adrian Goldman wrote:
> 
> Heikki,
> 
> I am not very good at listserv etiquette. Anyway, here is my 2c.. if you want to post it further on to the list server, it's OK by me. Or else you can just ignore what follows as my own personal opinion.
> 
> I don't think it makes much sense to use * as the default character for stop in 3-letter codes, nor X as the default for unknown, for the optional arguments you mention below. Ter (as you propose) for the termination codon and ?XXX for unknown make more sense to me.
> 
> Adrian
> 
> At 12:03 pm -0500 10/1/2001, bioperl-l-request@bioperl.org wrote:
> 
>      Message: 5
>      Date: Wed, 10 Jan 2001 12:26:53 +0000
>      From: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
>      Organization: EMBL - EBI
>      To: bioperl-l <bioperl-l@bioperl.org>
>      Subject: [Bioperl-l] three letter codes for amino acids?
> 
>      I noticed that it is not possible to use three letter codes for amino
>      acids in any bioperl sequence objects. I think should be possible at
>      least to output in three letter code. Mapping three letter code back
>      to one letter code is not too hard, either, but is it a good idea to
>      have?
> 
>      I propose to put method 'seq3' into PrimarySeq.pm which is called from
>      Seq.pm, too.
> 
>      =head2 seq3
> 
>      Title : seq3
>      Usage : $string = $obj->seq3()
>      Function: Read only method that returns the amino acid sequence
>      as a string of three letter codes. moltype has to be
>      'protein'. Output follows the IUPAC standard plus
>      'Ter' for terminator.
>      Returns : A scalar
>      Args : character used for stop, optional, defaults to '*'
>      character used for unknown, optional, defaults to 'X'
> 
>      =cut
> 
>      Any opinions?
> 
>      -Heikki
> 
>      -- 
> 
> Professor Adrian Goldman, | Phone: 358-(0)9-191 58923
> Structural Biology Group, | FAX: 358-(0)9-191 58952
> Institute of Biotechnology | Sec: 358-(0)9-191 58921
> University of Helsinki, | Mobile: 358-(0)50-336 8960
> PL 56 | Home: 358-(0)9-728 7103
> 00014 Helsinki | email: Adrian.Goldman@Helsinki.fi
> 
> -- on sabbatical at Brookhaven National labs, June 2000-June 2001
> Adrian Goldman, Biology Department, Building 463 50 Bell Ave., Brookhaven National Lab., Upton NY 11973. Phone: 631-344-2671 (off) 631-344-3417 (lab), 631-344-3407 (FAX). email: agoldman@bnl.gov

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki@ebi.ac.uk Thu Jan 11 12:00:26 2001 Date: Thu, 11 Jan 2001 12:00:26 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
> Considering sequence atoms as symbols seems the most natural
> concept to me. Having single letters representing each symbol
> makes symbol arrays and strings more or less equivalent in Perl.
> This might not hold for multi-letter representations, so in the
> first place I'd expect an array to be returned. However, this is
> inconsistent with $seq->seq(), and reportedly inefficient due to
> Perl's array implementation.
> 
> I know you could still split at every 3rd letter as a simple way
> to get an array. I'd nevertheless accept a third optional
> parameter denoting the 'join' character, with a default of ''.

Can be done.

In my mind the main use of this function is in displaying translations
on top of nucleotide sequences. Gaps inside codons are clearer with
three-letter coding.
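As a rough illustration of that use, here is a sketch that writes a three-letter translation above a gapped nucleotide line, one letter per base, so that a gap inside a codon visibly splits that codon's code. It assumes the translation is already available as a three-letter string in frame with the first base. protein_track is a hypothetical name:

```perl
use strict;
use warnings;

# Place each letter of a three-letter code over one base of its codon.
# Gap columns ('-') get a blank on the protein line, so a gap inside a
# codon splits that codon's code across the gap.
sub protein_track {
    my ($nuc, $seq3) = @_;
    my @letters = split //, $seq3;    # three letters per codon, in order
    my $track = '';
    for my $base (split //, $nuc) {
        if ($base eq '-') { $track .= ' '; next }
        my $letter = shift @letters;
        $track .= defined $letter ? $letter : ' ';
    }
    return $track;
}

my $nuc  = 'ATG-AAAG-TT';
my $prot = protein_track($nuc, 'MetLysVal');
print "$prot\n$nuc\n";
```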

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From dalke@acm.org Thu Jan 11 12:32:21 2001 Date: Thu, 11 Jan 2001 05:32:21 -0700 From: Andrew Dalke dalke@acm.org Subject: [Bioperl-l] looking for datafile parsers
Hello,

  I'm working on a parser generator as part of the
Biopython development.  It's getting towards completion
which means it's time to start writing papers about it.  :)
Indeed, my paper was accepted for a talk at the upcoming
Python conference.  One of the reviewers wanted more
information comparing my work to others in the field, so
I've been digging up related projects.  I figure on writing
another paper for Bioinformatics which will include some
more of this information.

  The most similar program is SRS, which is also a parser
generator, although they are context free while my parser is
(mostly) regular.  I tried to get a copy of the reference
paper (from Meth.Enzy.) from the library but it was checked
out.  I would love it if someone would offer to answer a
few questions for me about it, and to run some benchmarks
to see how fast it parses swissprot38, say, as compared to
how long it takes the bioperl code to parse the same file.
Any takers?

  There are a few projects which allow users to specify
a format using a configuration description which can roughly
be classified as a regular expression pattern matcher sitting
on top of a line-type recognizer.  This includes Biopy and
BioDB-Loader as well as the current Biopython parser.  Another
class of projects uses a common data structure then implements
readers/writers to the different formats at the expense of
throwing away some data, such as bioperl and SeqIO.  Swissknife
is an example of a library which reads/writes from a single
format into a data format tailored specifically to that format.
A few are special case programs (grep, NiceProt, sp2fasta)
which do one and only one thing, although in the case of
sw2xml that one thing converts the format (SWISS-PROT) to
another format (XML) for which many tools are readily available.
Most of the packages throw away formatting information and
only store the physical data, although get-sprot-entry is a nice
example of why keeping presentation information is useful.
The program creates an HTML page which looks the same as the
original format except that various fields are marked up with
hyperlinks.  Finally, the project I've been working on, Martel,
lets you develop parsers which handle most, if not all, of
these cases.
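The "regular expression pattern matcher sitting on top of a line-type recognizer" design described above can be sketched in Perl. The outer loop recognizes each line by its two-letter SWISS-PROT code, and small regexps pull out the fields. This is a toy that assumes only the ID, AC and SQ lines matter, which is roughly the "fasta" task in the key below. `parse_records` and its callback interface are hypothetical and are not the API of any package listed here.

```perl
use strict;
use warnings;

# Dispatch on the two-letter line code; call $on_record with a hash ref
# at each '//' terminator. Other line types are ignored in this sketch.
sub parse_records {
    my ($fh, $on_record) = @_;
    my %rec;
    my $in_seq = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^//}) {                  # end of record
            $on_record->(+{%rec});
            %rec    = ();
            $in_seq = 0;
        } elsif ($line =~ /^ID\s+(\S+)/) {      # entry name
            $rec{id} = $1;
        } elsif ($line =~ /^AC\s+(.*)/) {       # accessions, may span lines
            my $acc = $1;
            push @{ $rec{ac} }, grep { length } split /;\s*/, $acc;
        } elsif ($line =~ /^SQ\s/) {            # sequence header line
            $in_seq   = 1;
            $rec{seq} = '';
        } elsif ($in_seq && $line =~ /^\s/) {   # sequence data lines
            (my $chunk = $line) =~ s/\s+//g;
            $rec{seq} .= $chunk;
        }
    }
}

# Usage on an in-memory record:
my $text = "ID   TEST_HUMAN  STANDARD;  PRT;  5 AA.\n"
         . "AC   P12345; Q99999;\n"
         . "SQ   SEQUENCE   5 AA;\n"
         . "     MKVLA\n"
         . "//\n";
open my $fh, '<', \$text or die "cannot open string: $!";
parse_records($fh, sub {
    my $r = shift;
    print "$r->{id} @{ $r->{ac} } $r->{seq}\n";   # TEST_HUMAN P12345 Q99999 MKVLA
});
```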

I want to make sure I covered everything so I've been searching
for SWISS-PROT parsers as my prototypical example.  A
description of what I found is below.  If something major is
missing, please tell me.  If you can provide assistance with
the SRS, GCG, Java or Lisp parts, also please tell me.


 Here's a key to some of the notation I use in the listings below:
count == count the number of records in a database
offset == generate offsets into the file for fast indexing ("index" below)
fasta == extract data for FASTA (ID, AC and SQ fields)
generic == extract generic sequence data, usually as a
   data structure containing fields common to multiple formats
   but ignoring some SWISS-PROT specific fields
all == extract all fields
validate == validate that a record is in the correct format
markup == identifies fields and saves the layout data so as
    to allow HTML markup without otherwise changing the format
    (timings not given for markup since it will depend on the
     specific markup requested, and because only Martel and
     get-sprot-entry preserve markup)
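The count and offset tasks in this key need nothing more than a line loop. Below is a minimal Perl sketch. It assumes each record starts with an ID line, as in SWISS-PROT, and keeps a running byte position in the spirit of the "position += length($_)" idiom noted for Biopy, so the offsets can be passed to seek(). The sub name `index_records` and the file name in the usage comment are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count records and collect the byte offset of each
# line starting with "ID" (roughly what "grep -b ^ID" reports).
sub index_records {
    my ($fh) = @_;
    binmode $fh;                 # offsets are bytes, not characters
    my $pos = 0;
    my @offsets;
    while (my $line = <$fh>) {
        push @offsets, $pos if $line =~ /^ID/;
        $pos += length $line;    # running byte position
    }
    return (scalar @offsets, \@offsets);
}

# Usage (file name is hypothetical): jump straight to the second record.
# open my $fh, '<', 'sprot38.dat' or die "sprot38.dat: $!";
# my ($count, $offsets) = index_records($fh);
# seek($fh, $offsets->[1], 0);
```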

Performance is measured against the 80,000 records of
swissprot38.


grep - http://www.gnu.org/gnulist/production/grep.html
  written in C
  count (when used as "grep ^ID | wc")
     takes 0m:57s to parse sprot38
  offset (when used as "grep -b ^ID")
  cannot be used for fasta, generic, all, validate, markup

one really large regular expression  (here as a bit of humor)
  written in C
  cannot be used for count, offset, fasta, generic, all, markup
  can be used for validate in theory, but I haven't tested it

bioperl - http://www.bioperl.org/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    takes 30m:13s to parse sprot38
  cannot be used for index (?), all, validate, markup

biopython - http://www.biopython.org/
  written in Python
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    takes 28m:55s to parse sprot38
  validate
  cannot be used for index(?), markup

biojava - http://www.biojava.org/
  written in Java
  unknown (have source but need to figure it out)
  performance unknown (don't know how to code in Java)

Martel - http://www.biopython.org/~dalke/Martel/
  written in Python with a C extension
  count
    RecordReader.StartsWith "ID" takes 1m:28s to parse sprot38
  index
  fasta (standard format def. but only using the ID and SQ tags)
    takes 9m:23s to parse sprot38
  generic (as a special case of all)
  all
    takes 23m:29s to parse sprot38
  validate
    with no callbacks takes 6m:41s
  markup
  

SRS - http://www.lionbio.co.uk/
  written in C (?)
  have never used it, but it can definitely do count, fasta,
  generic and all.  The standard swissprot format definition
       http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?
       -page+LibInfo+-id+01FXMii+-lib+SWISSPROT
  cannot be used to validate although SRS itself can.  I
  think SRS can be used to generate HTML markup but I can't
  begin to guess how that might be done.
    *** I really want to ask someone questions about SRS ***
    *** Any takers? ***
  I don't think it can be used to create your own indices - 
    you must use its offset tables.

swissknife - ftp://ftp.ebi.ac.uk/pub/software/swissprot/
  written in Perl
  count
    lazy reader takes 1m:48s to parse sprot38
  fasta (getting the ->ID and ->SQ attributes)
    takes 8m:47s to parse sprot38
  generic (as a special case of all)
  all
    takes 38m:21s to parse sprot38
  cannot be used to validate, markup

Biopy - http://shag.embl-heidelberg.de:8000/Biopy/
  written in Python
  count (as a special case of all)
  index (by "position += length($_)")
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields
    (it only identifies lines) so I actually wouldn't count
    this as a full parser.
    * takes roughly 25m to parse sprot38
  cannot be used to validate, markup

Darwin - http://cbrg.inf.ethz.ch/Darwinshome.html
  is its own language and set of libraries
  contains a converter from SWISS-PROT to its own format.
  I don't have access to the source code so the following is based
  on the example parser at
    http://www.inf.ethz.ch/personal/hallett/drive/node92.html
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields
    although the real implementation may contain all of that.
  given example cannot be used to index, validate, markup
(Why does http://www.inf.ethz.ch/personal/hallett/drive/drive.html
say that SWISS-PROT 38 has only 77,977 record when my copy has
exactly 80,000?)

SeqIO - http://www.cs.ucdavis.edu/~gusfield/seqio.tar.gz
  written in C
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used to index, all, validate, markup

readseq (C) - http://iubio.bio.indiana.edu/soft/molbio/readseq/
                 version1/readseq.shar
  written in C
  doesn't have swissprot support; need to test if EMBL works instead
  to be tested  

readseq (Java) - http://iubio.bio.indiana.edu/soft/molbio/readseq/
                    java/readseq-source.zip
  written in Java
  have not yet explored (see above where I need help on how
   to write a good test program in Java.)

Boulder - http://stein.cshl.org/software/boulder/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used for index, all, validate, markup

molbio++ - ftp://ftp.ebi.ac.uk/pub/software/unix/molbio.tar.Z
  written in (now obsolete) C++ which doesn't compile
  I think it can be classified as
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get
     at subfields of a data line
     * will not be benchmarking since I don't want to spend
        the effort to get it to compile.
  cannot be used for index, all, validate, markup

BioDB-Loader - http://www.franz.com/services/conferences_seminars/
                 ismb2000/biodb1.tar.Z

  written in Common Lisp (Help! I know even less lisp than Java!)
  I'm guessing it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get
     at the subfields of a data line
     * have not benchmarked, although I have downloaded the Allegro
        common Lisp demo version.
  cannot be used for all, validate, markup

GCG - http://www.gcg.com/products/wis-package.html
  written in C (?)
  never used it.  Betting it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic
    have not benchmarked since I'm not spending that much
    money just to test the performance.
  cannot be used for all, validate, markup

sp2fasta - part of ftp://ftp.ncbi.nlm.nih.gov/toolkit/ ?
  Can't seem to find it in the current distribution.  Various
  web pages imply it is a C program to convert SWISS-PROT/EMBL
  to FASTA.
  count (if used together with grep and wc)
  fasta
    have not benchmarked since I cannot find code
  cannot be used for index, generic, all, validate, markup

sw2xml - http://www.vsms.nottingham.ac.uk/biodom/software/
             protsuite-user-dist/sw2xml-protbot.pl
  written in Perl.  It is a translation program from SWISS-PROT
  to XML so some additional, though minor, XML coding is needed
  to do the following.
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    have not yet benchmarked
  cannot be used to index, validate, markup (because of the 'tidy')

NiceProt - used at ExPASy
  implementation information not available
  only used to parse a single record
  parses the data file but doesn't build a data structure (?)
  so creation of fasta, generic and all require some modifications.
  cannot be used to count, index, validate(?), markup

get-sprot-entry - used at ExPASy
  implementation not available
  can be used to markup a record (eg, see
    http://expasy.cbr.nrc.ca/cgi-bin/get-sprot-entry?P52930 )
  doesn't build data structures or convert to another format
    so it cannot be used for anything else (true?)


Whew!  I'd be surprised if I really did miss some other
major style of parsing.  Actually, I did - there are no
lex/yacc grammars for SWISS-PROT but I'm not surprised
because the lexing is strongly position dependent which
calls for tight, explicit, tricky communications with the
parser.

Any other suggestions?

Sincerely,

                    Andrew Dalke
                    dalke@acm.org



From simon.brocklehurst@CambridgeAntibody.com Thu Jan 11 13:59:22 2001 Date: Thu, 11 Jan 2001 13:59:22 +0000 From: Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com Subject: [Bioperl-l] Re: [Biojava-l] looking for datafile parsers
Hi Andrew,

You might be interested to know that CAT has contributed to biojava a
SAX2-compliant, event-based parsing framework for dealing with
bioinformatics data files.

Essentially, by using a SAX2 model, the framework allows users to build
arbitrary XML Content Handlers for dealing with data from
bioinformatics files in arbitrary ways.  The framework generates SAX2
events from bioinformatics format files i.e. the input data isn't XML,
nor is it converted into XML internally.

It's a reasonable implementation of SAX2, e.g. users can:

o Set properties on SAX parsers, e.g. configure various features such as
namespace reporting.

o Handle infinitely large files, because it works like a SAXParser
should i.e. doesn't keep the whole file in memory etc.

o Use InputSources, i.e. essentially various flavours of streams.

A couple of neat benefits of implementing SAX2:

o It's trivial to create XML versions of files, which you can then
process however you want, e.g. using XSLT.

o By stringing together biojava SAXParsers which are non-validating,
with validating SAXParsers from e.g. IBM, you can create parsers that
validate against DTDs and/or XML Schemas that we produce for the data
formats supported by the framework.  Because the bioinformatics data
is modelled in a strongly typed way by the framework, you can get
genuinely useful benefits from validation.

We haven't put SwissProt support into this framework as of yet -
biojava already had ways of handling SwissProt data before we put the
SAX2 framework in.  Currently we have in there OK support for NCBI Blast
and WU-Blast, and improving support for HMMER, and PDB data.

Hope this info is useful...

Simon
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com



From ajm6q@virginia.edu Thu Jan 11 13:52:32 2001 Date: Thu, 11 Jan 2001 08:52:32 -0500 (EST) From: Aaron J Mackey ajm6q@virginia.edu Subject: [Bioperl-l] looking for datafile parsers
On Thu, 11 Jan 2001, Andrew Dalke wrote:

> Finally, the project I've been working on, Martel,
> lets you develop parsers which handle most, if not all, of
> these cases.

Excellent, I look forward to seeing your work.  Parsing is the meat and
potatoes of bioinformatics, and it's beginning to taste very stale (I
dunno, maybe it's been stale for awhile now).  My own secret wish list is
focused more on result file parsing; I once spent a fair amount of time
building a "robust" FASTA result file parser, but found myself constantly
needing to tweak it to keep up with fasta development changes.  You don't
have that problem with SwissProt or other static file formats.

> grep - http://www.gnu.org/gnulist/production/grep.html
>   written in C
>   count (when used as "grep ^ID | wc")
>      takes 0m:57s to parse sprot38
>   offset (when used as "grep -b ^ID")
>   cannot be used for fasta, generic, all, validate, markup

I've actually found that I now use grep and a small mix of perl more than
any other parsing routine (mainly because of the predicament I mention
above: when a format changes, I have to fix the entire parser, even if I
just want to pull out a few relevant fields at the moment).  My result
file "parsers" often take a few 'grep swipes' at the file (since the
second grep on the same file is commonly much faster than the first), and
as you show, it's very fast to begin.  The one extension to grep that I'd
dearly like to see (perhaps I'll submit a patch) would be to extend the -A
and -B (after-context and before-context flags) to take regexps
themselves (i.e. instead of printing N lines after the first match,
continue printing until the second regexp is matched, or other
possibilities depending on specified flags).  Then you could start using
(multiple) greps to get 'fasta', 'generic', 'all' satisfied.
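Until grep grows such an option, a few lines of Perl print "from one regexp until another". The sketch below uses an explicit in/out flag. The sub name `between` and the example patterns are illustrative only. As with Perl's `..` range operator, the stop pattern is also checked on the line that matched the start pattern.

```perl
use strict;
use warnings;

# Return every line from a match of $start up to and including the next
# match of $stop; repeats for each later $start match.
sub between {
    my ($start, $stop, @lines) = @_;
    my $inside = 0;
    my @out;
    for my $line (@lines) {
        $inside = 1 if !$inside && $line =~ $start;
        next unless $inside;
        push @out, $line;
        $inside = 0 if $line =~ $stop;   # same-line stop, like '..'
    }
    return @out;
}

# Usage: pull one hit block out of a (made-up) result report.
my @report = ('header', '>>query1', '  hit a', '', '>>query2', '  hit b', '');
print "$_\n" for between(qr/^>>query1/, qr/^$/, @report);
```

From the shell, the one-line equivalent uses the flip-flop operator directly: `perl -ne 'print if /START/ .. /STOP/' file`. A flag is used inside the sub because the flip-flop keeps hidden state per call site.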

Use the shell, Luke.

-Aaron

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o



From insana@ebi.ac.uk Thu Jan 11 18:15:04 2001 Date: Thu, 11 Jan 2001 18:15:04 +0000 (GMT) From: Joseph Insana insana@ebi.ac.uk Subject: [Bioperl-l] make tests
> Why don't you trap the warning in an eval/$SIG{__WARN__} - I don't see why
> you can't test for proper warnings, if that's what you were trying to do.

I didn't know that.
Now that I understood what you meant and read through the manual how
to apply it, I see it's the perfect solution.
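For the record, the trick discussed here looks roughly like this in a test script. A local $SIG{__WARN__} handler collects warnings instead of letting them reach STDERR, and eval catches exceptions. The helper name `capture_warnings` is illustrative. The output follows the "ok N" / "not ok N" style of the existing test scripts.

```perl
use strict;
use warnings;

# Run a code ref and return the warnings it emitted. local() restores the
# previous handler when the sub returns.
sub capture_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $code->();
    return @warnings;
}

# Test 1: the expected warning was issued.
my @w = capture_warnings(sub { warn "unknown residue 'B'\n" });
print((@w == 1 && $w[0] =~ /unknown residue/) ? "ok 1\n" : "not ok 1\n");

# Test 2: the expected exception was thrown.
my $ok = eval { die "boom\n"; 1 };
print((!$ok && $@ =~ /boom/) ? "ok 2\n" : "not ok 2\n");
```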

Thank you very much
                   Joseph


From birney@ebi.ac.uk Fri Jan 12 22:10:36 2001 Date: Fri, 12 Jan 2001 22:10:36 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] RootI detachment proposal.
[Ewan recovers from rereading the Bio::Root:: stuff...]

This is *mainly* for Jason and Hilmar, but in case there are other
people who want to chip in:


I want to completely detach RootI from the other Root::Objects (in
particular Err). This means a heavy refactoring of RootI - mainly in 
removing the code.

I will keep ->throw and ->warn but not ->verbose as a real method. (jason
- do you mind this?) (I will have a "deprecation warning" on verbose)


I am planning to do this on my local copy now and see how it pans out...


Bio::Root::Object in its full glory will still be there for modules we
have not migrated to Bio::Root::RootI


thoughts anyone?


-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From lapp@gnf.org Fri Jan 12 22:32:11 2001 Date: Fri, 12 Jan 2001 14:32:11 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] RootI detachment proposal.
Ewan Birney wrote:
> 
> [Ewan recovers from rereading the Bio::Root:: stuff...]
> 
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
> 
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in
> removing the code.
> 
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)
> 
> I am planning to do this on my local copy now and see how it pans out...
> 
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
> 
> thoughts anyone?
> 

verbose() is being made use of heavily as far as I saw some code and code
migrations from Jason. I do think that it is beneficial and desirable to
have a central mechanism for regulating 'verbosity' (e.g., what happens
upon a warning being issued). I also don't see yet why having verbose() in
RootI hampers disentangling RootI from the other objects, or where this
should interfere. (People who don't want that feature simply override it
with a stub.)

Maybe I'm missing something. Ideally I don't have to come up with a
SeqIO-specific mechanism concerning client-side regulation of the severity
of warnings.

	Hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Fri Jan 12 22:51:21 2001 Date: Fri, 12 Jan 2001 22:51:21 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] RootI detachment proposal.
On Fri, 12 Jan 2001, Hilmar Lapp wrote:

> 
> verbose() is being made use of heavily as far as I saw some code and code
> migrations from Jason. I do think that it is beneficial and desirable to
> have a central mechanism for regulating 'verbosity' (e.g., what happens
> upon a warning being issued). I also don't see yet why having verbose() in
> RootI hampers disentangling RootI from the other objects, or where this
> should interfere. (People who don't want that feature simply override it
> with a stub.)
> 
> Maybe I'm missing something. Ideally I don't have to come up with a
> SeqIO-specific mechanism concerning client-side regulation of the severity
> of warnings.

Yeah. I know. I guess I am thinking with my C-extension hat on again.

Ok. verbose stays.



> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Fri Jan 12 23:43:20 2001 Date: Fri, 12 Jan 2001 18:43:20 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] RootI detachment proposal.
On Fri, 12 Jan 2001, Ewan Birney wrote:

> 
> 
> [Ewan recovers from rereading the Bio::Root:: stuff...]
> 
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
> 
> 
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in 
> removing the code.
> 
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)

well, actually verbose makes me happy because we can choose whether or not
warn will actually print out msgs.  Can it just be a get/set method and
warn can check to see if verbose > 0 before printing?  I like to use it as
a debugging flag as well so we can have object specific debugging flags.
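Jason's suggestion, a plain get/set verbose() that warn() consults before printing, could look roughly like the sketch below. The package name My::Root, the -verbose constructor argument and the hash storage are assumptions for illustration. This is not the actual Bio::Root::RootI code.

```perl
use strict;
use warnings;

package My::Root;   # hypothetical stand-in for a RootI-style base class

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->verbose($args{-verbose} || 0);
    return $self;
}

# Plain get/set accessor; it is per object, so it doubles as a debug flag.
sub verbose {
    my $self = shift;
    $self->{_verbose} = shift if @_;
    return $self->{_verbose};
}

# Only print the message when verbosity is switched on (> 0).
# Returns 1 if something was printed, 0 otherwise.
sub warn {
    my ($self, $msg) = @_;
    return 0 unless $self->verbose > 0;
    CORE::warn("-------------------- WARNING ---------------------\nMSG: $msg\n");
    return 1;
}

package main;

my $quiet = My::Root->new;
my $loud  = My::Root->new(-verbose => 1);
$quiet->warn("suppressed");   # prints nothing
$loud->warn("printed");       # prints to STDERR
```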

> 
> 
> I am planning to do this on my local copy now and see how it pans out...
> 
> 
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
> 
> 
> thoughts anyone?

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From birney@ebi.ac.uk Sat Jan 13 01:18:34 2001 Date: Sat, 13 Jan 2001 01:18:34 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] refactoring RootI
I have finished a very serious refactoring of RootI. This detaches
RootI from the other Root:: objects completely. verbose is, I think, handled
more nicely. I would venture to say that the code is more readable.

I have changed the formatting somewhat of the stack trace in the
throw/warn statements. Your mileage may vary here...

Jason, Hilmar - check it out and tell me what you think.

I am now a little exhausted although the final product I think is vastly
improved...


-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From heikki@ebi.ac.uk Sat Jan 13 16:56:10 2001 Date: Sat, 13 Jan 2001 16:56:10 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] RootI detachment proposal.
Jason Stajich wrote:
> 
> On Fri, 12 Jan 2001, Ewan Birney wrote:
> 
> >
> >
> > [Ewan recovers from rereading the Bio::Root:: stuff...]
> >
> > This is *mainly* for Jason and Hilmar, but in case there are other
> > people who want to chip in:
> >
> >
> > I want to completely detach RootI from the other Root::Objects (in
> > particular Err). This means a heavy refactoring of RootI - mainly in
> > removing the code.
> >
> > I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> > - do you mind this?) (I will have a "deprecation warning" on verbose)
> 
> well, actually verbose makes me happy because we can choose whether or not
> warn will actually print out msgs.  Can it just be a get/set method and
> warn can check to see if verbose > 0 before printing?  I like to use it as
> a debugging flag as well so we can have object specific debugging flags.

I'd like to use the verbose function but the RootI documentation is a bit hard
to read at the moment. I have not followed the discussion about the
RootI object too closely, but once this restructuring is done, it would be
great to have a few clear examples of what RootI can do and what the
options are.

For example, I was pleasantly surprised that I could ignore the
constructor method for a simple class which inherits from
Bio::Root::RootI. I was not sure it would work before trying.

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki@ebi.ac.uk Sat Jan 13 17:38:16 2001 Date: Sat, 13 Jan 2001 17:38:16 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] three letter codes for amino acids?
I just committed the first version(s) of Bio::SeqUtils. Add to it
any method you'd like Bio::PrimarySeqI-compliant objects to have.
I put in two methods: ->seq3 and ->seq3in.

seq3in (now that we do not have to worry about messing with
interfaces) translates three-letter amino acid codes into one-letter
code and stores the result in the current sequence object. It throws an
exception when it sees a code it does not know, although it probably
should only warn and let -verbosity decide what to do.

As an extra feature, both methods know about selenocysteine (Sel, U).
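The three-letter to one-letter direction described for seq3in can be sketched as follows. It dies on an unknown code, which matches the behaviour described above. The sub name `three_to_one` and the case-insensitive matching are assumptions. This is not the Bio::SeqUtils implementation, and it does not touch a sequence object.

```perl
use strict;
use warnings;

my %ONE = (
    Ala => 'A', Arg => 'R', Asn => 'N', Asp => 'D', Cys => 'C',
    Gln => 'Q', Glu => 'E', Gly => 'G', His => 'H', Ile => 'I',
    Leu => 'L', Lys => 'K', Met => 'M', Phe => 'F', Pro => 'P',
    Ser => 'S', Thr => 'T', Trp => 'W', Tyr => 'Y', Val => 'V',
    Sel => 'U',   # selenocysteine, as named in the message above
);

# Convert concatenated three-letter codes to a one-letter string.
# Dies on a length that is not a multiple of 3 or on an unknown code.
sub three_to_one {
    my ($seq3) = @_;
    die "length of '$seq3' is not a multiple of 3\n" if length($seq3) % 3;
    my $seq1 = '';
    for my $code ($seq3 =~ /(.{3})/g) {
        my $key = ucfirst lc $code;   # accept 'MET', 'met', 'Met'
        die "unknown amino acid code '$code'\n" unless exists $ONE{$key};
        $seq1 .= $ONE{$key};
    }
    return $seq1;
}

print three_to_one('MetLysVal'), "\n";   # MKV
```

Swapping the die for a warning gated on verbosity is the change suggested in the message.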

Have fun,

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From schattner@alum.mit.edu Sat Jan 13 21:41:42 2001 Date: Sat, 13 Jan 2001 13:41:42 -0800 From: Peter Schattner schattner@alum.mit.edu Subject: [Bioperl-l] Molecular weight calculations
I've recently been revisiting the DNA & protein molecular weight
calculations in SeqStats.pm and realize I have a few related questions I
would like to pose to the more bio-chemically oriented folks on the list.

In nucleic acid weight calculations:

1.  Should SeqStats use the charged or the neutral molecular weight of
the sugar-phosphate backbone? Given that these groups are charged at
physiological pH it seems reasonable to me - and the one biochemist with
whom I spoke - to use the charged values.  However, at least one
commercial package (VectorNTI) uses neutral weights so I am unsure. (The
difference is ~0.5% - 1% ).

2. For the initial (5') and final (3') sugar phosphate, should SeqStats
add an extra OH and an extra H respectively?  Again adding the weight of
the additional water seems reasonable to me but is not the way the
weight calculation is sometimes performed. (The difference here is 18
which is negligible except when computing molecular weights of very
short oligos.)

In protein weight calculations:

3. Should SeqStats use the charged or the neutral molecular weights of
the acidic and basic amino acid residues (eg aspartate, glutamate,
histidine, arginine, lysine) in its computations? Given that these amino
acids are charged at physiological pH it seems reasonable to use charged
values.  However, again VectorNTI uses neutral weights so I am unsure. 
(The difference is ~0.5% - 1%  times the fraction of amino acids in the
protein which are acidic or basic).
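Whichever residue weights are chosen, questions 2 and 3 come down to the same formula: sum the per-residue (in-chain) weights, then add one water for the free termini. Here is a minimal sketch. The weight table is passed in because the choice between charged and neutral values is exactly the open question. The toy numbers in the usage lines are placeholders, not real residue masses. The value 18.015 is the average molecular weight of H2O, and `polymer_mw` is a made-up name.

```perl
use strict;
use warnings;

use constant WATER => 18.015;   # average molecular weight of H2O

# Molecular weight of a linear polymer from per-residue weights.
# $weights maps each residue letter to its in-chain (dehydrated) weight;
# one water accounts for the H and OH on the two termini.
sub polymer_mw {
    my ($seq, $weights, $add_water) = @_;
    $add_water = 1 unless defined $add_water;
    my $mw = 0;
    for my $res (split //, uc $seq) {
        die "no weight for residue '$res'\n" unless exists $weights->{$res};
        $mw += $weights->{$res};
    }
    return $add_water ? $mw + WATER : $mw;
}

# Placeholder weights only; substitute the charged or neutral table you pick.
my %toy = (A => 100, C => 200);
printf "%.3f\n", polymer_mw('AAC', \%toy);      # 418.015
printf "%.3f\n", polymer_mw('AAC', \%toy, 0);   # 400.000
```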

 Although the difference in calculated weights is small, my
understanding is that with mass spectrometry becoming increasingly
important for protein and nucleic acid analysis, having more precise
molecular weights might be useful (but if that's not really true, I'd
like to know that too.)  

It's easy enough to implement the calculation in any of these ways. I just
want to do it in the way that seems most useful.

Thanks for the help.

Peter

(The only downside of all this is that my revisiting of these
calculations was triggered by Keith James discovering a bug in the
molecular weight calculations in the current (0.6)  version of
SeqStats.pm which causes it to return inaccurate values :--(. 
Everything is fixed for the - hopefully soon - 0.7 release, but in the
meantime the molecular weight routines of SeqStats should be avoided. 
The other methods of SeqStats.pm are fine.)

From birney@ebi.ac.uk Sun Jan 14 12:39:36 2001 Date: Sun, 14 Jan 2001 12:39:36 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Re: pSW problem
On Sat, 13 Jan 2001, Peter Schattner wrote:

> Hi Ewan
> 
> I just noticed that the demo of pSW in bptutorial.pl no longer works on
> my machine.
> Nor does examples/pSW.pl.  In either case I get an error message like
> that shown below. I can't
> tell what's going on. Any ideas what may have changed?
>

I will track this down. I spotted this as well ;)

 
> Peter
> 
> 
> [peter@pschattner examples]$ perl -w psw.pl
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Warning Error
>  Passed in NULL objects into Align_Sequences_ProteinSmithWaterman!
> 
> -------------------- EXCEPTION --------------------
> MSG: Unable to build an alignment
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: psw.pl
> STACK:
> Bio::Tools::pSW::align_and_show(299)
> main::psw.pl(89)
> ---------------------------------------------------
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------



From birney@ebi.ac.uk Sun Jan 14 12:51:17 2001 Date: Sun, 14 Jan 2001 12:51:17 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] updated
A couple of days ago I updated the task list for 0.7

http://bio.perl.org/wiki/html/BioPerl/TaskList.html

which is getting much more "green". Hilmar - I think we should drop some of the
more unlikely things to make it into 0.7 (NetIO class for example?) and
concentrate on the last important features ...




-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 14 12:52:27 2001 Date: Sun, 14 Jan 2001 12:52:27 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] RootI detachment proposal.
On Sat, 13 Jan 2001, Heikki Lehvaslaiho wrote:

> I'd like to use verbose function but RootI documention is a bit hard
> to read at the moment. I have not followed too closely the discussion
> about RootI object but once this restructuring is done, it would be
> great to have a few clear examples what RootI can do and what are the
> options.

have you cvs updated recently? I think the RootI is looking in much better
shape at the moment...

> 
> For example, I was pleasantly surprised that I could ignore the
> constructor method for a simple class which inherits from
> Bio::Root::RootI. I was not sure if it worked before trying.
> 
> 	-Heikki
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/                      http://www.ebi.ac.uk/mutations/
>      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
>     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
>    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
>   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
>      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 16 19:02:47 2001 Date: Tue, 16 Jan 2001 11:02:47 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] refactoring RootI
Ewan Birney wrote:
> 
> I have finished a very serious refactoring of RootI. This detaches
> RootI from the other Root:: objects completely. verbose is, I think, handled
> more nicely. I would venture to say that the code is more readable.
> 
> I have changed the formatting somewhat of the stack trace in the
> throw/warn statements. Your mileage may vary here...
> 
> Jason, Hilmar - check it out and tell me what you think.
> 
> I am now a little exhausted although the final product I think is vastly
> improved...
> 

Well, that was radical surgery :) Even though SteveC won't be
excited about it, it looks like we now have a relatively clear and
straight code base there. It also seems that Err.pm is now
superfluous, so we may want to deprecate it.

We should also build a test for $obj->throw(), that it really
prints a meaningful stack trace. In addition, there should be a
test demonstrating that $obj->verbose(2) really turns warn() into
throw().
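A test along the lines described above might look like the following sketch. My::RootLike is a hypothetical stand-in that only mimics the behaviour discussed in this thread (throw() dies with a message and a stack section, verbose(2) promotes warn() to throw()); the real Bio::Root::RootI output format and verbosity levels may differ. The ok/not ok lines follow the numbering style of the existing test scripts.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the refactored RootI: method names follow the
# thread (throw, warn, verbose), but output format and levels are assumed.
package My::RootLike;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

sub verbose {
    my ($self, $v) = @_;
    $self->{verbose} = $v if defined $v;
    return $self->{verbose};
}

# Build a "Sub::name(line)" trace from the caller frames above this one.
sub _stack {
    my ($i, @frames) = (1);
    while (my @f = caller($i++)) { push @frames, "$f[3]($f[2])" }
    return join("\n", @frames);
}

sub throw {
    my ($self, $msg) = @_;
    die "-------------------- EXCEPTION --------------------\n"
      . "MSG: $msg\nSTACK:\n" . _stack() . "\n";
}

sub warn {
    my ($self, $msg) = @_;
    return if $self->{verbose} < 0;                # silenced
    $self->throw($msg) if $self->{verbose} >= 2;   # promoted to throw()
    CORE::warn("MSG: $msg\n");
}

package main;

my $obj = My::RootLike->new;

# Test 1: throw() dies with the message and a stack section.
eval { $obj->throw('something broke') };
print(($@ =~ /MSG: something broke/ && $@ =~ /STACK:/) ? "ok 1\n" : "not ok 1\n");

# Test 2: after verbose(2), warn() behaves like throw().
$obj->verbose(2);
eval { $obj->warn('now fatal') };
print(($@ =~ /EXCEPTION/ && $@ =~ /MSG: now fatal/) ? "ok 2\n" : "not ok 2\n");
```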

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 16 19:53:58 2001 Date: Tue, 16 Jan 2001 11:53:58 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Status 0.7
Ewan Birney wrote:
> 
> A couple of days ago I updated the task list for 0.7
> 
> http://bio.perl.org/wiki/html/BioPerl/TaskList.html
> 
> which is getting much more "green". Hilmar - I think we should drop some of the
> more unlikely things to make it into 0.7 (NetIO class for example?) and
> concentrate on the last important features ...
> 

I think we should stick to our goal of finalizing the 0.7 release
by the end of January. The situation actually doesn't look bad.
As I see it, the major things remaining to be addressed
comprise the following.

1) Fuzzy locations coverage. This is probably the most significant
hurdle. Jason's already elaborating an interface outline. If
anyone has suggestions/views/experience, feel encouraged to post.
You may also want to check out Ewan's proposal
(http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html).

2) With the preceding being addressed, a review of SeqFeatureI and
BioCorba interoperability may go hand in hand. Jason, Brad, is
BioCorba 0.2 interoperability still within sight?

3) BPlite update. Lorenz seems to have abandoned the list, or is
too busy with other things. It's priority 2, but I think at the
same time as we are phasing out support for Blast.pm we need to
increase support for BPlite. Anyone out there who would volunteer
to assume responsibility?

4) SeqAnalysisParserI needs more elaboration, according to a
discussion we (Jason, Ewan, I) had in December. It'll probably be
the three of us who thrash this out.

5) Bio::SeqFeature::Transcript object. This will be related to
GeneStructure and the concept has been worked out between Ewan and
myself. Still, I'll need to put it into Perl code :)

6) Bugs reported on Incoming. (!) (These tend to be forgotten, but
I'm sure they won't be fixed in a matter of minutes.)

7) The rest I think (I hope :) is smaller fixups, some of which I
need to address myself.

We'll probably have to drop Root::StreamIO (priority 3), and
probably also fixing Blast.pm bugs, unless SteveC finds the time
to fix them. It seems that almost all priority 2 tasks will make
it into 0.7, BioCorba 0.2 being the only one not started yet.

Since more or less all of us can do BioPerl work only on weekends,
I suggest that we freeze the code on a Monday. I'll be off to San
Jose (is anyone else going to attend the Microarray Meeting at
BiOS?) the next weekend, so I propose to schedule the 0.7 code
freeze for Feb. 5th (one week earlier would be Jan 29th). Note
that once this is agreed upon, it will be a firm deadline.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 20:36:22 2001 Date: Tue, 16 Jan 2001 15:36:22 -0500 From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] refactoring RootI
----- Original Message -----
From: "Hilmar Lapp" <hlapp@gmx.net>
To: "Ewan Birney" <birney@ebi.ac.uk>
Cc: <bioperl-l@bioperl.org>
Sent: Tuesday, January 16, 2001 2:02 PM
Subject: Re: [Bioperl-l] refactoring RootI


> Ewan Birney wrote:
> >
> > I have finished a very serious refactoring of RootI. This detaches
> > RootI from the other Root:: objects completely. verbose is, I think,
> > handled more nicely. I would venture to say that the code is more readable.
> >
> > I have changed the formatting somewhat of the stack trace in the
> > throw/warn statements. Your mileage may vary here...
> >
> > Jason, Hilmar - check it out and tell me what you think.
> >
> > I am now a little exhausted although the final product I think is vastly
> > improved...
> >
>
> Well, that was radical surgery :) Even though SteveC won't be
> excited about it, it looks like we now have a relatively clear and
> straight code base there. It also seems that Err.pm is now
> superfluous, so we may want to deprecate it.

I am very impressed as well; it should be a lot simpler.  I did notice that
warn/throw changed to accept only 1 parameter, while I think they accepted 2
before - the 1st parameter was printed as
MSG: $_[0]
and the second as
NOTE: $_[1]

But I don't think it is seriously important.
>
> We should also build a test for $obj->throw(), that it really
> prints a meaningful stack trace. In addition, there should be a
> test demonstrating that $obj->verbose(2) really turns warn() into
> throw().

I did that in t/RootI.t, I think, but it may not be very complete.  I tried
to make it catch all the thrown errors in eval, but I didn't play with the
$SIG{__WARN__} settings enough to try and catch errors on warn when
verbose == 1.
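Catching the warnings emitted at verbose == 1 needs a $SIG{__WARN__} handler rather than eval, since warn() does not die. A minimal sketch, again with a hypothetical stand-in class (My::Warner) whose warn() also accepts the old optional second argument and prints it as a NOTE line; the verbosity semantics here are assumptions, not the actual RootI behaviour.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for RootI's warn(); it accepts the old optional
# second argument and prints it as a NOTE line. Not the real Bioperl API.
package My::Warner;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

sub verbose {
    my ($self, $v) = @_;
    $self->{verbose} = $v if defined $v;
    return $self->{verbose};
}

sub warn {
    my ($self, $msg, $note) = @_;
    my $text = "MSG: $msg\n" . (defined $note ? "NOTE: $note\n" : '');
    die $text if $self->{verbose} >= 2;          # assumed: fatal at level 2
    CORE::warn($text) if $self->{verbose} >= 0;  # assumed: silent below 0
    return;
}

package main;

my $obj = My::Warner->new;
$obj->verbose(1);

# Trap warnings in a lexically scoped handler instead of letting them
# reach STDERR, so the test can inspect them.
my @caught;
{
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    $obj->warn('first');
    $obj->warn('second', 'with a note');
}
print((@caught == 2 && $caught[1] =~ /NOTE: with a note/) ? "ok 1\n" : "not ok 1\n");
```

Because the message ends in a newline, Perl does not append "at ... line N", which keeps the captured text easy to compare.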



From lapp@gnf.org Tue Jan 16 22:34:33 2001 Date: Tue, 16 Jan 2001 14:34:33 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] Refactor mercilessly
I found some thoughts about code refactoring at
http://www.extremeprogramming.org/rules/refactor.html. As we are
experiencing something similar with Bio::Root::*, what do people think
about the points made there with particular regard to Bioperl? I enclose
some quotes from that page.

	Hilmar

<quote>
We computer programmers hold onto our
software designs long after they have become
unwieldy. We continue to use and reuse code that is
no longer maintainable because it still works in some
way and we are afraid to modify it. 
[...]
Refactor mercilessly to keep the design
simple as you go and to avoid needless clutter and
complexity. Keep your code clean and concise so it
is easier to understand, modify, and extend. Make
sure everything is expressed once and only once.
[...]
There is a certain amount of Zen to
refactoring. It is hard at first because you must be
able to let go of that perfect design you have
envisioned and accept the design that was
serendipitously discovered for you by refactoring.
You must realize that the design you envisioned was
a good guide post, but is now obsolete.
</quote>

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 22:54:20 2001 Date: Tue, 16 Jan 2001 17:54:20 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] what to do about Blast.pm, parsing
On the refactor front -

I think BPlite is a good way to go for moving functionality from Blast.pm,
however things like to_html/from_html are very nice and I'd like to see them
migrated along.  Perhaps we could get a poll or priority list of features
from Blast.pm which identify what we use it for to be sure they are
migrated first.  Another alternative is to go for a clean code base and
write a module like what I've started locally called YABP (Yet Another
Blast Parser).  I'd like us to really identify the functions we want
before starting to write it since porting all of Blast.pm to a new module
is sort of silly if we aren't going to see significant benefit in
functionality or speed.  I do see the value in having a lightweight module
to accomplish some tasks and a heavyweight one for doing others.

I also have been playing with Parse::RecDescent some.  While writing a
grammar is not the most fun I've ever had, I've been able to write a
parser for GenBank files and get at least accession,locus, and sequence
lines parsed (I know, big deal).  Feature table will be a bit more fun,
but I think it may be a useful exercise; whether or not we will really
write grammars for the sequence formats, I don't know.  Perhaps a grammar
could be written for blast files, though that might be more trouble than
it's worth...

Just some thought rattling around...

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From jason@chg.mc.duke.edu Tue Jan 16 23:00:14 2001 Date: Tue, 16 Jan 2001 18:00:14 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Refactor mercilessly
On Tue, 16 Jan 2001, Hilmar Lapp wrote:

> I found some thoughts about code refactoring at
> http://www.extremeprogramming.org/rules/refactor.html. As we are
> experiencing something similar with Bio::Root::*, what do people think
> about the points made there with particular regard to Bioperl? I enclose
> some quotes from that page.
> 
> 	Hilmar
> 
I like XP for bioperl, but I ask: who are our users, since users are supposed
to drive the product?  It seems that the users are also the system
developers.  So I think we have to stop occasionally and ask - what do I
want to be able to do with this system/api?  This is where some of the
list subscribers who don't want to develop code can really help out by
identifying areas that bioperl needs to focus on or where needs aren't
being met.

> <quote>
> We computer programmers hold onto our
> software designs long after they have become
> unwieldy. We continue to use and reuse code that is
> no longer maintainable because it still works in some
> way and we are afraid to modify it. 
> [...]
> Refactor mercilessly to keep the design
> simple as you go and to avoid needless clutter and
> complexity. Keep your code clean and concise so it
> is easier to understand, modify, and extend. Make
> sure everything is expressed once and only once.
> [...]
> There is a certain amount of Zen to
> refactoring. It is hard at first because you must be
> able to let go of that perfect design you have
> envisioned and accept the design that was
> serendipitously discovered for you by refactoring.
> You must realize that the design you envisioned was
> a good guide post, but is now obsolete.
> </quote>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From ajm6q@virginia.edu Tue Jan 16 23:25:13 2001 Date: Tue, 16 Jan 2001 18:25:13 -0500 (EST) From: Aaron J Mackey ajm6q@virginia.edu Subject: [Bioperl-l] what to do about Blast.pm, parsing
On Tue, 16 Jan 2001, Jason Stajich wrote:

> I also have been playing with Parse::RecDescent some.  While writing a
> grammar is not the most fun I've ever had, I've been able to write a
> parser for GenBank files and get at least accession,locus, and sequence
> lines parsed (I know, big deal).  Feature table will be a bit more fun,
> but I think it may be a useful exercise whether or not we will really just
> write grammars for seqformats I don't know.  Perhaps a grammar could be
> written for blast files - might be more trouble than it's worth...

I've often thought the same (and then stepped back and wondered if
blast/fasta/hmmer output could be expressed in BNF [ Backus-Naur Form ]).
It seems like an excellent project for an undergrad CS major who wanted to
crossover into bioinformatics.  There's too much grunt work involved for
any of us to want to do it, though ;)  Maybe we should take this off-list,
Jason, but do you have any comments on the utility of Parse::RecDescent
vs. Parse::Yapp?

-Aaron

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o



From jason@chg.mc.duke.edu Wed Jan 17 02:34:11 2001 Date: Tue, 16 Jan 2001 21:34:11 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Status 0.7
On Tue, 16 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> > 
> > A couple of days ago I updated the task list for 0.7
> > 
> > http://bio.perl.org/wiki/html/BioPerl/TaskList.html
> > 
> > which is getting much more "green". Hilmar - I think we should drop some of the
> > more unlikely things to make it into 0.7 (NetIO class for example?) and
> > concentrate on the last important features ...
> > 
> 
> I think we should stick to our goal of finalizing the 0.7 release
> by the end of January. The situation actually doesn't look bad.
> As I see it, the major things remaining to be addressed
> comprise the following.
> 
> 1) Fuzzy locations coverage. This is probably the most significant
> hurdle. Jason's already elaborating an interface outline. If
> anyone has suggestions/views/experience, feel encouraged to post.
> You may also want to check out Ewan's proposal
> (http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html).

Hopefully will have something by the end of the week or early next week.
> 
> 2) With the preceding being addressed, a review of SeqFeatureI and
> BioCorba interoperability may go hand in hand. Jason, Brad, is
> BioCorba 0.2 interoperability still within sight?

I haven't played with this much, I was planning on doing it after the
SeqFeatureI - LocationI stuff was settled.

> 
> 3) BPlite update. Lorenz seems to have abandoned the list, or is
> too busy with other things. It's priority 2, but I think at the
> same time as we are phasing out support for Blast.pm we need to
> increase support for BPlite. Anyone out there who would volunteer
> to assume responsibility?
> 
> 4) SeqAnalysisParserI needs more elaboration, according to a
> discussion we (Jason, Ewan, I) had in December. It'll probably be
> the three of us who thrash this out.
Hmm, we need to determine what the future of SeqFeatureProducerI is
in this context as well.
> 
> 5) Bio::SeqFeature::Transcript object. This will be related to
> GeneStructure and the concept has been worked out between Ewan and
> myself. Still, I'll need to put it into Perl code :)
> 
> 6) Bugs reported on Incoming. (!) (These tend to be forgotten, but
> I'm sure they won't be fixed in a matter of minutes.)
> 
> 7) The rest I think (I hope :) is smaller fixups, some of which I
> need to address myself.
> 
> We'll probably have to drop Root::StreamIO (priority 3), and
> probably also fixing Blast.pm bugs, unless SteveC finds the time
> to fix them. It seems that almost all priority 2 tasks will make
> it into 0.7, BioCorba 0.2 being the only one not started yet.

I wanted to wait until code was stable before working on BioCorba stuff
since it is entirely dependent on the bioperl modules api.

> 
> Since more or less all of us can do BioPerl work only on weekends,
> I suggest that we freeze the code on a Monday. I'll be off to San
> Jose (is anyone else going to attend the Microarray Meeting at
> BiOS?) the next weekend, so I propose to schedule the 0.7 code
> freeze for Feb. 5th (one week earlier would be Jan 29th). Note
> that once this is agreed upon, it will be a firm deadline.

Yes.  Feb 5 is reasonable.  Let's see how close we are the week before and
take stock.  Thanks for being the lead on this!


Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 





From SnyderEE@pbrc.edu Wed Jan 17 02:43:48 2001 Date: Tue, 16 Jan 2001 20:43:48 -0600 From: Eric Snyder SnyderEE@pbrc.edu Subject: [Bioperl-l] Map Manipulation and Genetic Analysis
Hello BioPerl Folks,

I was thumbing through the BioPerl modules list and noticed that there was not any coverage in the area of processing (non-sequence) maps and genetic data.  I am working on some programs for processing physical and genetic maps, as well as genotypic and phenotypic data.   I was wondering, is there any interest in these areas in the BioPerl community or, have I overlooked previous work on these things?  

I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH mapping, etc.) but I have not seen anything in the form of reusable software components for basic map manipulation, comparison, etc.  Nor am I aware of modules for manipulating raw data for genetic analysis.  I am fairly new to working with genetic data.  I would be interested in hearing of leads in this area.  However, if it is not already done, I would be willing to write it in the context of BioPerl. 

Cheers,


Eric E. Snyder
Associate Professor
Pennington Biomedical Research Center
6400 Perkins Road
Baton Rouge, LA 70808-4124
USA
Phone:  (225) 763-3185
Fax: (225) 763-2525
Cell: (225) 235-6271
Email: eesnyder@pbrc.edu 
ICBM: N 30 24'14.0", W 91 07'20.0"


From jason@chg.mc.duke.edu Wed Jan 17 22:07:29 2001 Date: Wed, 17 Jan 2001 17:07:29 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Map Manipulation and Genetic Analysis
Eric - 

Heikki and I had batted around the idea of a MarkerI interface for describing
markers which can be used to build maps.  I have some code that I am using
for some analysis which I am happy to donate when it is finished.  It
doesn't do much to represent maps other than assume that markers with the
same mapid are part of the same map (data is stored in db).  But I think a
good representation of Markers first and then Maps would be very good for
bioperl and those trying to bridge the gap between genetic analysis, maps,
and sequence based investigation.
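The "same mapid means same map" rule can be sketched as a simple grouping step. This is only an illustration: the record fields (name, mapid, position), the build_maps() helper and the marker names and positions are hypothetical toy values, not part of any existing MarkerI code.

```perl
use strict;
use warnings;

# Hypothetical helper: group marker records by mapid (markers sharing a
# mapid are treated as one map), then order each map by position.
sub build_maps {
    my (@markers) = @_;
    my %maps;
    push @{ $maps{ $_->{mapid} } }, $_ for @markers;
    @$_ = sort { $a->{position} <=> $b->{position} } @$_ for values %maps;
    return \%maps;
}

# Toy data; names and positions are placeholders.
my $maps = build_maps(
    { name => 'M3', mapid => 'map1', position => 42.0 },
    { name => 'M1', mapid => 'map1', position => 10.5 },
    { name => 'M2', mapid => 'map2', position => 3.0 },
);

for my $id (sort keys %$maps) {
    print "$id: ", join(', ', map { $_->{name} } @{ $maps->{$id} }), "\n";
}
# map1: M1, M3
# map2: M2
```

A Map object would sit naturally on top of this grouping once the marker representation is settled.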

-Jason
On Tue, 16 Jan 2001, Eric Snyder wrote:

> Hello BioPerl Folks,
> 
> I was thumbing through the BioPerl modules list and noticed that there
> was not any coverage in the area of processing (non-sequence) maps and
> genetic data.  I am working on some programs for processing physical
> and genetic maps, as well as genotypic and phenotypic data.  I was
> wondering, is there any interest in these areas in the BioPerl
> community or, have I overlooked previous work on these things?
> 
> I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH
> mapping, etc.) but I have not seen anything in the form of reusable
> software components for basic map manipulation, comparison, etc.  Nor
> am I aware of modules for manipulating raw data for genetic analysis.  
> I am fairly new to working with genetic data.  I would be interested
> in hearing of leads in this area.  However, if it is not already done,
> I would be willing to write it in the context of BioPerl.
> 
> Cheers,

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From imre.vastrik@helsinki.fi Thu Jan 18 10:29:58 2001 Date: Thu, 18 Jan 2001 12:29:58 +0200 From: Imre Vastrik imre.vastrik@helsinki.fi Subject: [Bioperl-l] BPlite bug
Don't know if this one is for Lorenz or Jason:

BPlite seems to be unaware of ' Frame = ...' lines in NCBI TBLASTN etc
reports. Consequently parsing of the alignment lines does not work
properly. The bug does not show up with the current test, since that uses a
BLASTP report (which lacks Frame lines).
A quick hack would be to introduce the following line between lines 115
and 120:

    elsif ($_ =~ /^\s*Frame/)    {next}

However, the frame info, of course, will be lost.
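Rather than skipping the line, the frame could be kept. A hedged sketch: parse_frame() is a hypothetical helper, not part of BPlite, that pulls the frame out of lines such as " Frame = +2" (TBLASTN/BLASTX) or " Frame = +1 / -3" (TBLASTX). Which sequence a lone frame refers to depends on the program (the query for BLASTX, the subject for TBLASTN), so the sketch stores the raw values.

```perl
use strict;
use warnings;

# Return the frame(s) from a BLAST " Frame = ..." line, or an empty list.
# One value for TBLASTN/BLASTX, two (query / subject) for TBLASTX.
sub parse_frame {
    my ($line) = @_;
    return unless $line =~ /^\s*Frame\s*=\s*([+-]\d)(?:\s*\/\s*([+-]\d))?/;
    return defined $2 ? ($1, $2) : ($1);
}

# Hypothetical HSP-header loop: record the frame instead of discarding it.
my %hsp;
for (' Score = 45.8 bits (107), Expect = 2e-05',
     ' Frame = +1 / -3') {
    if (my @frame = parse_frame($_)) { $hsp{frame} = \@frame; next }
    $hsp{score} = $1 if /Score\s*=\s*([\d.]+)\s*bits/;
}
print "frame: @{ $hsp{frame} }, score: $hsp{score}\n";   # frame: +1 -3, score: 45.8
```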
Bug report filed.


Rgds.,

imre

From jason@chg.mc.duke.edu Thu Jan 18 17:55:53 2001 Date: Thu, 18 Jan 2001 12:55:53 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html

Please look it over, I didn't describe the detail of the fuzzy feature
methods because I'm not sure there will be extra methods, just overriding
things like start,end to be remapped.  The different feature types need to
be differentiated so that Bio::SeqIO::FTHelper can handle them differently
when parsing/writing.

Ewan, let me know what I've left off.  Hilmar, does this sound reasonable,
straightforward enough to you?

Some may have a beef about the name - SplitSeqFeature - you are welcome
to propose a better one.

Send your comments or make corrections to the wiki (send a courtesy
note to let us know to check the webpage).

Thanks for your help.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 



From hlapp@gmx.net Thu Jan 18 19:11:57 2001 Date: Thu, 18 Jan 2001 11:11:57 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] Re: LocationI
Jason Stajich wrote:
> 
> Interfaces:
> 
> Bio::LocationI -> ISA RangeI
>  Purpose: capture location information - such as in an EMBL/GenBank
>           feature
>         /source 1..345
>  Methods: RangeI methods, and ...? [start/end/strand]
> 
>  Questions:  How is a LocationI object going to be different from the
>              vanilla SeqFeatureI or should we migrate some methods from
>              SeqFeature (start/end/strand) to LocationI and make
>              SeqFeatureI more about tags (primary/source/has_tag/each_tag)
>              and gff stuff?

In principle I think yes. SeqFeatureI could still keep
start/end/strand and map these to calls into the location object.
Or, SeqFeatureI loses it (i.e., it's no longer mandatory), but for
simplicity SeqFeature::Generic keeps it.

> 
> Bio::ComplexLocationI -> ISA Bio::LocationI
>  Purpose: capture location information for features that are not linear
>          as in an EMBL/Genbank join
>          CDS             join(544..589,688..1032)
> 
>  Methods:
>         - sub_Locations() -> a list of LocationI objects that indicate
>           start/stop boundaries for this object must override overlap,
>           contains, etc from RangeI with since coordinates are not
>           contiguous
> 
> Objects:
>  Bio::SeqFeature::Generic -> ISA Bio::SeqFeatureI, Bio::LocationI
>         add the location() method to this object, the LocationI object
>         returned will be a reference to $self.
> 
> Bio::SeqFeature::Complex -> ISA Bio::SeqFeatureI, Bio::ComplexLocationI
>  Purpose: implementation to handle those join() statements

This is the outline you pretty much follow in the proposal on
Wiki. The point I'm not so happy with is that purely
location-specific issues change the class (type) of a SeqFeature.

> 
> I'm still not clear on what a fuzzy location is supposed to represent
> ie  - does that mean we know that the feature is located somewhere
> in the range, but we don't know the exact start/stop? 

Exactly. At least to my understanding.

> Why can't you treat
> it like real start/stop since we don't have any more information?  Or
> would union/intersection calculations need to behave differently?
> 

Well, biologically you can't, because annotating a sequence with
such a feature without indicating the uncertainty of start and end
is deceptive. For cDNA entries this is sometimes crucial: <1..100
as CDS location means that the entry doesn't even contain the
start of the CDS, and it's totally unclear where that is.
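To make the distinction concrete, here is a minimal sketch of reading such location strings while keeping the fuzziness instead of dropping it. parse_location(), span() and is_exact() are hypothetical helpers, not FTHelper code, and they cover only a subset of the EMBL/GenBank syntax: a..b with optional < and > markers, plus join() of such ranges. complement(), order(), a.b and a^b are not handled.

```perl
use strict;
use warnings;

# Parse "a..b" (with optional "<" / ">") or join(...) of such ranges into a
# list of sub-locations, each remembering whether its ends are uncertain.
sub parse_location {
    my ($str) = @_;
    my @parts = $str =~ /^join\((.*)\)$/ ? split(/,/, $1) : ($str);
    my @subs;
    for my $p (@parts) {
        $p =~ /^(<?)(\d+)\.\.(>?)(\d+)$/ or die "unsupported location '$p'\n";
        push @subs, {
            start       => $2,
            end         => $4,
            start_fuzzy => $1 eq '<',   # true start lies before $2
            end_fuzzy   => $3 eq '>',   # true end lies after $4
        };
    }
    return \@subs;
}

# Overall extent of all sub-locations (min start, max end).
sub span {
    my ($subs) = @_;
    my ($s, $e) = ($subs->[0]{start}, $subs->[0]{end});
    for (@$subs) {
        $s = $_->{start} if $_->{start} < $s;
        $e = $_->{end}   if $_->{end}   > $e;
    }
    return ($s, $e);
}

# True only if no sub-location has an uncertain end.
sub is_exact {
    my ($subs) = @_;
    return !grep { $_->{start_fuzzy} || $_->{end_fuzzy} } @$subs;
}

my $cds = parse_location('<1..100');
print((is_exact($cds) ? 'exact' : 'fuzzy start or end'), "\n");  # fuzzy start or end

my $join = parse_location('join(544..589,688..1032)');
my ($s, $e) = span($join);
print scalar(@$join), " parts spanning $s..$e\n";               # 2 parts spanning 544..1032
```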

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Thu Jan 18 19:26:57 2001 Date: Thu, 18 Jan 2001 11:26:57 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Jason Stajich wrote:
> 
> http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
> 
> Please look it over, I didn't describe the detail of the fuzzy feature
> methods because I'm not sure there will be extra methods, just overriding
> things like start,end to be remapped.  The different feature types need to
> be differentiated so that Bio::SeqIO::FTHelper can handle them differently
> when parsing/writing.
> 
> Ewan, Let me know what I've left off.  Hilmar does this sound reasonable,
> straightforward enough to you?
> 

You didn't include actual interface definitions, did you? Just
wondering whether I missed the link.

As mentioned before, what bothers me is that in this layout
location-specific issues impact the class (type) of a SeqFeature.
Why should any SeqFeature change its type only because its
location becomes uncertain or compound, and vice-versa?

I'd rather favor uncoupling a feature and its location, with
features having a reference to a location object which will give
further details if the application cares. An application that
doesn't do anything with the coordinates wouldn't notice a change,
but an application that e.g. draws features on sequences will have
to decide what to do if the location object says that the
coordinates are not well determined. Retrieving the sequence part
the feature refers to on its attached seq will also be affected:
doing so for a feature with an uncertain location will result in
an exception being thrown. Separating SeqFeatureI and LocationI
allows also for the following: assume a feature with uncertain
start and end. If you're satisfied with an average start and end,
you can substitute the location object by a Range with certain
start and end, and voila - drawing, sequence excision etc will
just work fine on the very same feature object.
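A minimal Perl sketch of this decoupled design. The package names Sketch::Feature, Sketch::SimpleLocation and Sketch::FuzzyLocation, and their methods (is_certain, subseq, min_start etc.), are hypothetical and chosen for illustration; they are not BioPerl classes. The feature only holds a reference to its location. Excising sequence from a fuzzy location dies, and once the client swaps in a certain range built from the average start and end, the same feature object works again.

```perl
use strict;
use warnings;

{
    package Sketch::SimpleLocation;
    sub new        { my ($class, %a) = @_; return bless { %a }, $class }
    sub start      { $_[0]{start} }
    sub end        { $_[0]{end} }
    sub is_certain { 1 }
}

{
    package Sketch::FuzzyLocation;
    # start and end are each only known as an inclusive [min, max] pair
    sub new        { my ($class, %a) = @_; return bless { %a }, $class }
    sub min_start  { $_[0]{start}[0] }
    sub max_start  { $_[0]{start}[1] }
    sub min_end    { $_[0]{end}[0] }
    sub max_end    { $_[0]{end}[1] }
    sub is_certain { 0 }
}

{
    package Sketch::Feature;
    sub new { my ($class, %a) = @_; return bless { %a }, $class }

    # get/set the location object; the feature's own class never changes
    sub location {
        my ($self, $loc) = @_;
        $self->{location} = $loc if defined $loc;
        return $self->{location};
    }

    # excise this feature's part of the attached sequence (1-based, inclusive)
    sub subseq {
        my ($self) = @_;
        my $loc = $self->location;
        die "location is uncertain\n" unless $loc->is_certain;
        return substr($self->{seq}, $loc->start - 1, $loc->end - $loc->start + 1);
    }
}

my $fuzzy = Sketch::FuzzyLocation->new(start => [3, 5], end => [10, 12]);
my $feat  = Sketch::Feature->new(seq => 'ACGTACGTACGTACGT', location => $fuzzy);

my $ok = eval { $feat->subseq; 1 };    # undef: excision refuses a fuzzy location

# The client settles for the average start and end and swaps in a certain range.
$feat->location(Sketch::SimpleLocation->new(
    start => int(($fuzzy->min_start + $fuzzy->max_start) / 2),    # 4
    end   => int(($fuzzy->min_end   + $fuzzy->max_end)   / 2),    # 11
));
my $piece = $feat->subseq;    # 'TACGTACG'
```

The feature object keeps its class throughout; only the location object attached to it changes.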

Maybe I'm missing something.

	Hilmar

-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Thu Jan 18 19:41:51 2001 Date: Thu, 18 Jan 2001 14:41:51 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> > 
> > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
> > 
> > Please look it over, I didn't describe the detail of the fuzzy feature
> > methods because I'm not sure there will be extra methods, just overriding
> > things like start,end to be remapped.  The different feature types need to
> > be differentiated so that Bio::SeqIO::FTHelper can handle them differently
> > when parsing/writing.
> > 
> > Ewan, Let me know what I've left off.  Hilmar does this sound reasonable,
> > straightforward enough to you?
> > 
> 
> You didn't include actual interface definitions, did you? Just
> wondering whether I missed the link.

No - didn't describe actual interfaces since we are still struggling
through this.  Will do that when we agree enough.

> 
> As mentioned before, what bothers me is that in this layout
> location-specific issues impact the class (type) of a SeqFeature.
> Why should any SeqFeature change its type only because its
> location becomes uncertain or compound, and vice-versa?


Ewan and I had decoupled the LocationI from SeqFeature but saw no
advantage, just an interface mish-mash. Perhaps we were too hasty?

What you suggest above could be done as:

Bio::SeqFeatureI ISA RangeI

method : location 
desc   : Get/Set method
args   : LocationI object
returns: LocationI object

method : start()
desc   : start location of seqfeature

sub start {
	my($self) = @_;
	return $self->location->start()
}

... similar for end ...

Bio::LocationI ISA RangeI

Bio::SplitLocationI ISA Bio::LocationI

method: sub_SeqFeatures()
desc  : method for obtaining list of sub Locations - they could be
        SeqFeature::Exons, SeqFeature::Generic, or LocationI's?
returns: list of LocationI or SeqFeatureI objects?

Bio::FuzzyLocationI ISA Bio::LocationI

method: get_embl_fuzzy_string()
desc  : possible method to return location as an embl string for a fuzzy
       location
returns: string


Does this seem more agreeable? Location is decoupled from SeqFeature, but
we have to support backwards compatibility with SeqFeatureI ISA RangeI,
which means all SeqFeatures have a start/end...
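One way the delegation sketched above might look in plain Perl. All Sketch:: packages are hypothetical stand-ins for the interfaces named in the proposal, and the "interface" methods simply die when an implementation does not override them. Existing code that calls $feature->start keeps working because SeqFeatureI forwards start/end to the location object, while the feature still ISA RangeI.

```perl
use strict;
use warnings;

# Hypothetical interface packages: abstract methods just die.
{
    package Sketch::RangeI;
    sub start { die ref($_[0]) . " must implement start()\n" }
    sub end   { die ref($_[0]) . " must implement end()\n" }
}
{
    package Sketch::LocationI;
    our @ISA = ('Sketch::RangeI');
}
{
    package Sketch::SeqFeatureI;
    our @ISA = ('Sketch::RangeI');
    sub location { die ref($_[0]) . " must implement location()\n" }
    # start/end are delegated, so old code calling $feature->start keeps working
    sub start { my ($self) = @_; return $self->location->start }
    sub end   { my ($self) = @_; return $self->location->end }
}

# Hypothetical concrete classes.
{
    package Sketch::SimpleLocation;
    our @ISA = ('Sketch::LocationI');
    sub new   { my ($class, %a) = @_; return bless { %a }, $class }
    sub start { $_[0]{start} }
    sub end   { $_[0]{end} }
}
{
    package Sketch::Generic;
    our @ISA = ('Sketch::SeqFeatureI');
    sub new { my ($class, %a) = @_; return bless { %a }, $class }
    sub location {
        my ($self, $loc) = @_;
        $self->{location} = $loc if defined $loc;
        return $self->{location};
    }
}

my $feat = Sketch::Generic->new(
    location => Sketch::SimpleLocation->new(start => 100, end => 250));
my ($s, $e)  = ($feat->start, $feat->end);     # (100, 250) via delegation
my $is_range = $feat->isa('Sketch::RangeI');   # true: SeqFeatureI ISA RangeI
```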


> 
> I'd rather favor uncoupling a feature and its location, with
> features having a reference to a location object which will give
> further details if the application worries. An application that
> doesn't do anything with the coordinates wouldn't notice a change,
> but an application that e.g. draws features on sequences will have
> to decide what to do if the location object says that the
> coordinates are not well determined. Retrieving the sequence part
> the feature refers to on its attached seq will also be affected:
> doing so for a feature with an uncertain location will result in
> an exception being thrown. Separating SeqFeatureI and LocationI
> allows also for the following: assume a feature with uncertain
> start and end. If you're satisfied with an average start and end,
> you can substitute the location object by a Range with certain
> start and end, and voila - drawing, sequence excision etc will
> just work fine on the very same feature object.
> 
> Maybe I'm missing something.
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 




From hlapp@gmx.net Thu Jan 18 20:34:24 2001 Date: Thu, 18 Jan 2001 12:34:24 -0800 From: Hilmar Lapp hlapp@gmx.net Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Jason Stajich wrote:
> 
> What you suggest above could be done as:
> 
> Bio::SeqFeatureI ISA RangeI
> 
> method : location
> desc   : Get/Set method
> args   : LocationI object
> returns: LocationI object
> 
> method : start()
> desc   : start location of seqfeature
> 
> sub start {
>         my($self) = @_;
>         return $self->location->start()
> }
> 

Note that, as one of the few noticeable changes in the SeqFeatureI
API, this call should be allowed to throw an exception if
	1) the start location is uncertain, or
	2) the start location does not refer to the attached seq
	(to be disputed)
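A minimal sketch of a start() that is allowed to throw in the two cases listed, together with a client that wraps the call in eval instead of letting the exception propagate. The Sketch:: packages, the convention that an undef start means "uncertain", and the helpers try_start and feature_at are all hypothetical.

```perl
use strict;
use warnings;

{
    package Sketch::Location;
    # start is undef when it is uncertain (e.g. for a '<1..100' location)
    sub new    { my ($class, %a) = @_; return bless { %a }, $class }
    sub start  { $_[0]{start} }
    sub seq_id { $_[0]{seq_id} }
}

{
    package Sketch::Feature;
    sub new      { my ($class, %a) = @_; return bless { %a }, $class }
    sub seqname  { $_[0]{seqname} }
    sub location { $_[0]{location} }

    # start() throws in the two cases listed above
    sub start {
        my ($self) = @_;
        my $loc = $self->location;
        die "start is uncertain\n" unless defined $loc->start;
        die "start is not on the attached sequence\n"
            unless $loc->seq_id eq $self->seqname;
        return $loc->start;
    }
}

# A careful client traps the exception:
# returns (start, '') on success or (undef, error message) on failure.
sub try_start {
    my ($feature) = @_;
    my $start = eval { $feature->start };
    return ($start, $@);
}

# Build a feature attached to the (hypothetical) sequence 'seqA'.
sub feature_at {
    my (%loc) = @_;
    return Sketch::Feature->new(seqname => 'seqA',
                                location => Sketch::Location->new(%loc));
}

my @plain  = try_start(feature_at(start => 5,     seq_id => 'seqA'));   # (5, '')
my @fuzzy  = try_start(feature_at(start => undef, seq_id => 'seqA'));   # uncertain
my @remote = try_start(feature_at(start => 1,     seq_id => 'seqB'));   # other seq
```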

> ... similar for end ...
> 
> Bio::LocationI ISA RangeI
> 
> Bio::SplitLocationI ISA Bio::LocationI
> 
> method: sub_SeqFeatures()
> desc  : method for obtaining list of sub Locations - they could be
>         SeqFeature::Exons, SeqFeature::Generic, or LocationI's?
> returns: list of LocationI or SeqFeatureI objects?
> 

Yeah, that's the really hairy case. We probably should define
first what we would like to be able to do with compound locations.
This is a strong call for feedback: what do people out there using
the package intend to do with compound locations? E.g. if you draw
annotations, would you just draw the part referring to the
attached seq? Ensembl people, any experience/wishlists for this?

An obvious requirement is the ability to recover the original
GenEmbl location string, so all the information necessary should
be present.

A compound location indeed is something of a hybrid between a location
and a feature, because a sublocation clearly only makes sense if
you also know the sequence it refers to. The sequence could be
identified by its name (but then which name? the name in the
location line as given in GenBank?), or by an object reference.
The latter can be very expensive, because the sequence can be
quite long, and if there are many such sublocations, you
quickly eat up your memory. You could also construct the seq
object as a sort of dummy, without really holding the seq string.
Not really convincing. So why not the simple case: a
CompoundLocation has a method sub_Locations(). Each sublocation
has a method seqname() (or seq_id() or whatever you prefer), which
returns the same string as $feature->seqname() for subfeatures
lying on the same seq, and a different name for those referring to
other seqs. $feature->seq() for features with a compound location
throws an exception, unless all sublocations are on the same
(attached) sequence.

Too simple?
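A sketch of the simple scheme proposed above, assuming hypothetical Sketch:: packages: sub_Locations() lists the parts, each part carries a seqname(), and $feature->seq() splices the parts together but dies as soon as a part lies on a sequence other than the attached one. The sequence names 'seqA' and 'seqB' are made up for the example.

```perl
use strict;
use warnings;

{
    package Sketch::SubLocation;
    sub new     { my ($class, %a) = @_; return bless { %a }, $class }
    sub seqname { $_[0]{seqname} }
    sub start   { $_[0]{start} }
    sub end     { $_[0]{end} }
}

{
    package Sketch::CompoundLocation;
    sub new { my ($class, @parts) = @_; return bless { parts => [@parts] }, $class }
    sub sub_Locations { @{ $_[0]{parts} } }
}

{
    package Sketch::Feature;
    sub new     { my ($class, %a) = @_; return bless { %a }, $class }
    sub seqname { $_[0]{seqname} }

    # Concatenate the parts, but only if every sublocation is on the attached seq.
    sub seq {
        my ($self) = @_;
        my $out = '';
        for my $part ($self->{location}->sub_Locations) {
            die "sublocation on other sequence '" . $part->seqname . "'\n"
                unless $part->seqname eq $self->seqname;
            $out .= substr($self->{seqstr}, $part->start - 1,
                           $part->end - $part->start + 1);
        }
        return $out;
    }
}

my $attached = 'ACGTTTGGGCCCAAAT';
my $local = Sketch::Feature->new(seqname => 'seqA', seqstr => $attached,
    location => Sketch::CompoundLocation->new(
        Sketch::SubLocation->new(seqname => 'seqA', start => 1,  end => 4),
        Sketch::SubLocation->new(seqname => 'seqA', start => 11, end => 13)));
my $mixed = Sketch::Feature->new(seqname => 'seqA', seqstr => $attached,
    location => Sketch::CompoundLocation->new(
        Sketch::SubLocation->new(seqname => 'seqB', start => 1,  end => 4),
        Sketch::SubLocation->new(seqname => 'seqA', start => 11, end => 13)));

my $spliced = $local->seq;           # 'ACGT' . 'CCA'
my $ok = eval { $mixed->seq; 1 };    # dies: the first part is on seqB
```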

> Bio::FuzzyLocationI ISA Bio::LocationI
> 
> method: get_embl_fuzzy_string()
> desc  : possible method to return location as an embl string for a fuzzy
>        location
> returns: string
> 

min_start()/max_start() etc. should also be included. start() and
end() in an implementation are overridden and throw exceptions,
depending on which end is uncertain (at least they should be
expected to throw exceptions). Whether an end is certain can be
determined by min_start() == max_start() (or min_end() == max_end(),
respectively).
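A minimal sketch of these accessors, assuming a hypothetical Sketch::FuzzyLocation that stores each end as an inclusive [min, max] pair. The helper names start_is_certain/end_is_certain are illustrative only and not part of any agreed interface. The example mimics an EMBL-style "(5.7)..100" location: the start is a single base somewhere in 5..7, the end is exactly 100.

```perl
use strict;
use warnings;

{
    package Sketch::FuzzyLocation;
    sub new       { my ($class, %a) = @_; return bless { %a }, $class }
    sub min_start { $_[0]{start}[0] }
    sub max_start { $_[0]{start}[1] }
    sub min_end   { $_[0]{end}[0] }
    sub max_end   { $_[0]{end}[1] }

    # an end is certain when its min and max coincide
    sub start_is_certain { $_[0]->min_start == $_[0]->max_start }
    sub end_is_certain   { $_[0]->min_end   == $_[0]->max_end }

    # start()/end() only answer when that end is certain, otherwise they throw
    sub start {
        my ($self) = @_;
        die "start is fuzzy\n" unless $self->start_is_certain;
        return $self->min_start;
    }
    sub end {
        my ($self) = @_;
        die "end is fuzzy\n" unless $self->end_is_certain;
        return $self->min_end;
    }
}

my $loc = Sketch::FuzzyLocation->new(start => [5, 7], end => [100, 100]);
my $end           = $loc->end;                  # 100
my $start_ok      = eval { $loc->start; 1 };    # undef: start is uncertain
my $certain_start = $loc->start_is_certain;     # false
```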

> Does this seem more agreeable - location is decoupled from SeqFeature, but
> we have to support backwards compatibility with SeqFeatureI ISA RangeI
> which means all SeqFeatures have a start/end...
> 

I indeed like the decoupled approach much better.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Thu Jan 18 23:27:53 2001 Date: Thu, 18 Jan 2001 23:27:53 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Jason Stajich wrote:

> On Thu, 18 Jan 2001, Hilmar Lapp wrote:
> 
> > Jason Stajich wrote:
> > > 
> > > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
> > > 
> > > Please look it over, I didn't describe the detail of the fuzzy feature
> > > methods because I'm not sure there will be extra methods, just overriding
> > > things like start,end to be remapped.  The different feature types need to
> > > be differentiated so that Bio::SeqIO::FTHelper can handle them differently
> > > when parsing/writing.
> > > 
> > > Ewan, Let me know what I've left off.  Hilmar does this sound reasonable,
> > > straightforward enough to you?
> > > 
> > 
> > You didn't include actual interface definitions, did you? Just
> > wondering whether I missed the link.
> 
> No - didn't describe actual interfaces since we are still struggling
> through this.  Will do that when we agree enough.
> 
> > 
> > As mentioned before, what bothers me is that in this layout
> > location-specific issues impact the class (type) of a SeqFeature.
> > Why should any SeqFeature change its type only because its
> > location becomes uncertain or compound, and vice-versa?
> 
> 
> Ewan and I had decoupled the LocationI from SeqFeature but there was no
> seen advantage, just interface mish-mash, perhaps we were too hasty?


Just to chime in, my original proposal had locations separate from
SeqFeatures, but at the end of the day we seemed to be making two parallel
interface hierarchies with no real gain in abstraction or understanding,
and the potential for generating a lot of confusion.

So - I guess to flip around the question - what do we gain from hanging
location "off" seqfeature rather than merging the interfaces?


(remember interface definitions can be implemented with any number of
objects or object collections if so desired...)


e.



From birney@ebi.ac.uk Thu Jan 18 23:37:24 2001 Date: Thu, 18 Jan 2001 23:37:24 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
On Thu, 18 Jan 2001, Hilmar Lapp wrote:

> 
> Note that as one of the few noticeable changes in the SeqFeatureI
> API this call should be allowed to throw an exception if
> 	1) the start location is uncertain
> 	2) the start location does not refer to the attached seq
> 	(to be disputed)

My feeling is that seqfeature->start should still be well defined. It is
up to the SeqFeature implementing class to "make a sensible
decision" about start/end points.


If it is fuzzy/complex/strange the client can test. If the client does not
want to test and just wants to "draw it", I think insisting that
start/end/seqname return *something* is valid. Otherwise the client has
no real option to figure out what to do with these things...

If we let the implementation objects get away with not implementing this,
the interface becomes less useful...

</snip>

> annotations, would you just draw the part referring to the
> attached seq? Ensembl people, any experience/wishlists for this?

Experience on our side is that

90% of things are either SeqFeatures or FeaturePairs and fit the simple
seqfeature interface just fine

the remaining 10% are genes and could be handled via some sort of complex
location thing. As genes have transcripts, which in turn have exons, a simple
mapping to complex locations is not on. For other internal reasons, Ensembl is
very likely to stick with specialised adaptor classes which map Ensembl genes
to Bioperl SeqFeatures, so we are flexible here...


> 
> An obvious requirement is the ability to recover the original
> GenEmbl location string, so all the information necessary should
> be present.

Right. 

> 

</snip>

> 
> min_start()/max_start() etc should also be included. start() and
> end() in an implementation are overridden and throw exceptions,
> depending on which end is uncertain (at least they should be
> expected to throw exceptions). A certain end can be determined by
> min_start() == max_start() (or .._end(), resp.).

I would be in favour of min_start/max_start but against letting start
throw an exception. The implementation has to decide how to "become a hard
feature" from being Fuzzy. It is up to the implementation. As long as this
is documented, this is no more arbitrary than letting the client decide.

> 
> > Does this seem more agreeable - location is decoupled from SeqFeature, but
> > we have to support backwards compatibility with SeqFeatureI ISA RangeI
> > which means all SeqFeatures have a start/end...
> > 
> 
> I indeed like the decoupled approach much better.
> 

If we go for a decoupled approach I am keen on it being justified by more
than just "it feels good". We are increasing the complexity here a lot and
we need justification...



-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From lapp@gnf.org Fri Jan 19 01:28:09 2001 Date: Thu, 18 Jan 2001 17:28:09 -0800 From: Hilmar Lapp lapp@gnf.org Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Ewan Birney wrote:
> 
> 
> >
> > min_start()/max_start() etc should also be included. start() and
> > end() in an implementation are overridden and throw exceptions,
> > depending on which end is uncertain (at least they should be
> > expected to throw exceptions). A certain end can be determined by
> > min_start() == max_start() (or .._end(), resp.).
> 
> I would be in favour of min_start/max_start but against letting start
> throw an exception. The implementation has to decide how to "become a hard
> feature" from being Fuzzy. It is up to the implementation. As long as this
> is documented, this is no more arbitrary than letting the client decide.
> 

I think it is more arbitrary, and I'll tell you why. There is more
than one interpretation of fuzzy locations. I name two for which I
think the BioPerl core is not in a position to take the decision from
the client, which is why it shouldn't pretend that it is:
1) Uncertainty about the real location, that is, it is clear that the
described feature sits at a particular position, but for one reason or
another the producer of the feature can only give an estimated range
for start and/or end. Now, we can implement (and document) the rule
that in such cases $feature->start() and $feature->end() will always
return the widest (or smallest, or average, make your choice) possible
range. A client is then free to rely on it, thinking that what the
BioPerl developers decided on is probably the wisest choice you can
make. That's already catch #1. Catch #2 happens if there is a user of
the client program who, because he's a good user, read the
documentation of the client program, but not that of BioPerl. Do we
request users of programs that use BioPerl to read through the BioPerl
documentation as well?
2) The location is undefined. A location saying <1..100 is undefined
for that feature in its biological meaning. You're not supposed to
make up a value for an undefined value. If you had an interface
dividing two integers and returning an integer (to prevent you from
returning NaN or INF), and the denominator is zero, what do you
return?

I strongly believe that every client that does something sensible with
the feature coordinates should know what type of coordinates it is
dealing with, and should be required to check this in order to be safe
from an exception. It is not the task of BioPerl to relieve the
client from thinking, but it is its task to provide all the information
the client needs for making an educated decision.

You can always divide by a number without checking for zero, but by
doing so you accept the risk that some day you might get an exception.
The same holds for clients calling $feature->start() instead of
obtaining the location object and examining it for its capabilities.

Maybe I'm missing an important point in having $feature->start()
guaranteed to be exception-free.

> >
> > I indeed like the decoupled approach much better.
> >
> 
> If we go for a decoupled approach I am keen on it being justified by more
> than just "it feels good". We are increasing the complexity here a lot and
> we need justification...
> 

First for clarification: I thought we agree that we have different
interfaces, that is, SeqFeatureI (ISA RangeI) and LocationI (ISA
RangeI), don't we?

Regarding complexity, the question is whether we had better have
subinterfaces for each of FuzzyLocation, CompoundLocation, etc. (what
is etc.?), or whether we pack it all into one interface. I have a
preference for the first, because it lets you find out the type of
location by checking $loc->isa('Bio::SomeLocationInterface'). I may be
missing another equally elegant way if everything's in one interface.
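A small sketch of the isa()-based check, using hypothetical Sketch:: packages that mirror the proposed hierarchy (SplitLocationI and FuzzyLocationI both ISA LocationI). The helper location_kind is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical hierarchy mirroring the proposal.
{ package Sketch::LocationI;      sub new { my ($c, %a) = @_; bless { %a }, $c } }
{ package Sketch::SplitLocationI; our @ISA = ('Sketch::LocationI') }
{ package Sketch::FuzzyLocationI; our @ISA = ('Sketch::LocationI') }
{ package Sketch::Simple;         our @ISA = ('Sketch::LocationI') }
{ package Sketch::Split;          our @ISA = ('Sketch::SplitLocationI') }
{ package Sketch::Fuzzy;          our @ISA = ('Sketch::FuzzyLocationI') }

# Find out what kind of location we got by asking which interface it implements.
sub location_kind {
    my ($loc) = @_;
    return 'split' if $loc->isa('Sketch::SplitLocationI');
    return 'fuzzy' if $loc->isa('Sketch::FuzzyLocationI');
    return 'simple';
}

my @kinds = map { location_kind($_->new) } qw(Sketch::Simple Sketch::Split Sketch::Fuzzy);
```

Every location still answers isa('Sketch::LocationI'), so code that does not care about the subtype can treat them all alike.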

The increase in complexity is fairly small, I think. All interfaces
can be put into their own subdirectory (Bio::Loc?). The only people
really concerned with it are those who want to deal with the coordinates in
a very reliable way (that is, avoid exceptions and deal with any
possible sort of location type). And these people really should care
what types of location they could encounter, and what they mean. Everyone
else could simply use LocationI, which in essence is probably the same
as RangeI.

Regarding your point that there can be many implementations of an
interface, sure that's true. In principle I have no problem with
$feature->location() returning $self, assuming that the SeqFeature
object implements LocationI itself. But I do think it's bad if a
SeqFeature implements every type of location interface itself, because
if I wanted to change the type of a feature's location I would end up
instantiating a SeqFeature and passing it to a SeqFeature as its location
object, which is weird, isn't it? I say weird because it's not
lightweight. No more of those beast-like classes, please. I don't
think the reduction in hierarchy complexity achieved by beast classes
makes them easier to learn, or to use. 

You may ask why I wish to change the type of a location. Consider a
client program that draws features. When it encounters a feature with
a FuzzyLocation, it may want to ask the user what to do. The user may
even be able to set a preference like 'always take the widest possible
range'. Then the client program simply replaces the FuzzyLocation with
a Range object denoting the widest possible range and passes the
feature on to the drawing module. No code change necessary there. And
the user knows what he's doing, it's not just an arbitrary decision of
a backend library.
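A sketch of the client-side replacement described above. The Sketch:: packages and the resolve_fuzzy helper are hypothetical: a user-chosen policy turns a fuzzy location into a plain Range, and drawing code afterwards only ever sees start/end on a certain range.

```perl
use strict;
use warnings;

{
    package Sketch::Range;
    sub new   { my ($class, %a) = @_; return bless { %a }, $class }
    sub start { $_[0]{start} }
    sub end   { $_[0]{end} }
}

{
    package Sketch::FuzzyLocation;
    sub new       { my ($class, %a) = @_; return bless { %a }, $class }
    sub min_start { $_[0]{min_start} }
    sub max_start { $_[0]{max_start} }
    sub min_end   { $_[0]{min_end} }
    sub max_end   { $_[0]{max_end} }
}

{
    package Sketch::Feature;
    sub new { my ($class, %a) = @_; return bless { %a }, $class }
    sub location {
        my ($self, $loc) = @_;
        $self->{location} = $loc if defined $loc;
        return $self->{location};
    }
}

# User-selectable ways of turning a fuzzy location into a certain one.
my %policy = (
    widest    => sub { ($_[0]->min_start, $_[0]->max_end) },
    narrowest => sub { ($_[0]->max_start, $_[0]->min_end) },
);

# Replace a fuzzy location with a Range; certain locations are left alone.
sub resolve_fuzzy {
    my ($feature, $policy_name) = @_;
    my $loc = $feature->location;
    return 0 unless $loc->isa('Sketch::FuzzyLocation');
    my $rule = $policy{$policy_name} or die "unknown policy '$policy_name'\n";
    my ($start, $end) = $rule->($loc);
    $feature->location(Sketch::Range->new(start => $start, end => $end));
    return 1;
}

my $feat = Sketch::Feature->new(location => Sketch::FuzzyLocation->new(
    min_start => 10, max_start => 20, min_end => 90, max_end => 100));
my $changed = resolve_fuzzy($feat, 'widest');
my ($s, $e) = ($feat->location->start, $feat->location->end);    # (10, 100)
```

The decision lives in the client (here a policy name that could come from a user preference), not in the library.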

So, I still think that having not only individual interfaces, but also
individual implementations for the different location types is
justified, doesn't add too much complexity (in fact, it reduces hidden
complexity), and provides a clear API for programmers.

Long mail, sorry for taking your time to read it, but you asked.

	Hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Fri Jan 19 08:45:58 2001 Date: Fri, 19 Jan 2001 08:45:58 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more...
Ok. Hilmar and I are now probably into the "code aesthetics"  part of this
debate, which definitely is worth having but someone sometime has to make
a decision.


I suggest that we keep bashing this out on the list for a couple more days
(please... other people... if you have a view, do chip in). If Hilmar and
I are still disagreeing with aesthetics I would like to nominate Jason to
tie-break on the way to go (is this ok with you Hilmar and Jason...?)


We have two points of contention:

(a) Explicit Location objects or not.

Hilmar suggests an explicit location object

   SeqFeatureI has-a LocationI

   LocationI is sub classed for Split (join statements) and Fuzzies


Benefits - (a) easy to mix and match implementations of locations to
different feature objects, and (b) if mix and matching locations to
features is common, more realistic. Hilmar argues that it is clearer as
well.

Against - more objects, and in fact the majority of seqfeatures are little
more than the location and two extra strings.


For backwards compatibility, I think SeqFeatureI->start would *have* to be
delegated to SeqFeatureI->location->start - otherwise too much code will
break... (of course, this delegation could just be for a while as we move
code and people over to using "proper" locations)


People might be interested that I originally argued for an explicit
location object about 1 month ago. I don't now... 


I am suggesting that SeqFeatures do not have an explicit location object,
but we subclass SeqFeatures into Split, Simple and Fuzzy, all inheriting
from a common SeqFeature interface


Benefits - (a) fewer objects, (b) only one place where the client gets the
information and (c) more backwardly compatible.


Effectively my main argument is that there will always be a pretty clear
cut relationship that "this type of SeqFeature" is always "this class of
location" so the splitting of the location away from the SeqFeature is
just suggesting a mix-and-match world which doesn't actually exist.
Simpler and stronger to go for the combined interface in my view.




(b) ->start ->end throwing exceptions or not.


Hilmar says that for at least Fuzzies and possibly Splits the client
should figure out by rooting around the object how to map these more
complex locations to a simple start,end. The interface should allow
exceptions to be thrown on ->start/->end indicating that the client should
be treating this seqfeature somehow differently...


Basically we pass the buck to the client.


I say that the implementation objects have to provide a default mapping
of whatever ->start and ->end are. This means that clients can live in
this happy world of "I have well defined start/ends" if they so wish
without writing extra code. Smart clients are encouraged to root around in
the objects for their "real" interpretation of the fuzziness.


There are three reasons why I favour this:


   (a) Clients for dumping/drawing/manipulation have to treat large
numbers of sequence features as a pretty homogeneous mass. If we make
seqfeatures less homogeneous then every client is going to have to figure
out how to "homogenize" the seqfeatures - this will be different client to
client although for the main case they just want a "default way" of
handling them. We are encouraging a diversity of views when our clients
really want us to solve the problems for them.


   (b) as 99% of features are nice, well behaved "hard features", many
pieces of client code written with the bioperl libraries will just assume
->start,->end do not throw exceptions. When this piece of code is used by
another user with a fuzzy feature, there will be a rather deep exception
thrown by bioperl through the client code. I think both the user and the
client with some justification will blame bioperl for this, no matter how
much we say "you should have read the documentation and written 3
different subroutines to replace every time you go

   if( $one->start == $two->start ) 

gets replaced by

   if( &my_exact_function($one,$two) ) {

   }


...

sub my_exact_function {
   my ($one, $two) = @_;

   # one of many if statements...
   if( $one->isa('Bio::FuzzyFeatureI') &&
       $two->isa('Bio::SimpleFeatureI') ) {
      ...
   }

}


   (c) long experience with seqfeatures has made me claim that the
following rules are generally just what people want:


    - simple features - easy

    - join statements - ignore leading and trailing '<' '>' and take the
edge start/end points on the sequence you are looking at


    - fuzzy features - either skip or - if you have to draw/compare them,
take start/end as the min hard location mentioned and the maximum hard
location mentioned, regardless of the internal grammar.




I reckon bioperl would do better to implement the (c) method by default
without preventing smart clients from making their own decisions.
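The (c) rules are mechanical enough to sketch. The helper default_start_end below is hypothetical and works on the location string itself: it drops '<' and '>', ignores parts that point at another entry (ACCESSION:range), and returns the smallest and largest remaining numbers. It is a heuristic for the shapes shown, not a full GenBank/EMBL location parser, and it ignores strand (complement).

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Default start/end for a location string, following rule (c):
# returns (start, end), or an empty list if nothing lies on this sequence.
sub default_start_end {
    my ($loc) = @_;
    (my $s = $loc) =~ s/[<>]//g;                  # ignore leading/trailing '<' '>'
    $s =~ s/[A-Za-z0-9_]+(?:\.\d+)?:[^,()]+//g;   # drop remote parts like AL101010.1:1..201
    my @nums = $s =~ /(\d+)/g;                    # every hard position still mentioned
    return unless @nums;                          # nothing on this sequence: skip it
    return (min(@nums), max(@nums));
}

my @simple = default_start_end('123..245');                          # (123, 245)
my @fuzzy  = default_start_end('<1..>888');                          # (1, 888)
my @join   = default_start_end('join(AL101010.1:1..201,123..245)');  # (123, 245)
my @within = default_start_end('(5.7)..100');                        # (5, 100)
```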






Another long email, but worth I think knowing where we disagree...






-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From gert.thijs@esat.kuleuven.ac.be Fri Jan 19 12:14:53 2001 Date: Fri, 19 Jan 2001 13:14:53 +0100 From: gert thijs gert.thijs@esat.kuleuven.ac.be Subject: [Bioperl-l] split seq feature and fuzzy feature proposal
Hilmar Lapp wrote:
> 
> Yeah, that's the really hairy case. We probably should define
> first what we would like to be able to do with compound locations.
> This is a strong call for feedback: what do people out there using
> the package intend to do with compound locations? E.g. if you draw
> annotations, would you just draw the part referring to the
> attached seq? Ensembl people, any experience/wishlists for this?
> 

I hope you do not mind me giving some comments on this issue.
I am writing some programs to automatically extract genes and intergenic
regions from DNA sequences. So, I am mostly interested in the type of a
feature and also its start and end position in the sequence. The main problem
I am facing is that sometimes a feature is not extracted from the sequence
because it has a fuzzy location.
E.g. if the location of a CDS is described as "join(AL101010.1:1..201,123..245)",
this CDS is not added to the list of features and it is impossible for me to
do anything useful with this sequence.
In my opinion, it is important that a feature is created even if the
location is fuzzy. When there is a problem, it should be possible to access
the description of the location.

Gert



-- 
+ Gert Thijs              
+ 
+ email: gert.thijs@esat.kuleuven.ac.be 
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+ 
+ K.U.Leuven
+ ESAT-SISTA 
+ Kasteelpark Arenberg 10 
+ B-3001 Leuven-Heverlee  
+ Belgium  
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70

From arek@ebi.ac.uk Fri Jan 19 09:58:45 2001 Date: Fri, 19 Jan 2001 09:58:45 +0000 (GMT) From: Arek Kasprzyk arek@ebi.ac.uk Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl]
On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote:


Hi guys,
I have not been following this discussion very closely, but
thought you might find it useful to poke around a set of Ensembl modules
called ensembl-map. I think that some of the ideas you are talking
about have been implemented there.

Arek




 
> -------- Original Message --------
> Subject: Re: marker manipulation in bioperl
> Date: Thu, 18 Jan 2001 13:06:26 -0500 (EST)
> From: Jason Stajich <jason@chg.mc.duke.edu>
> To: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
> CC: Eric Snyder <SnyderEE@pbrc.edu>
> 
> Heikki - yes I think going via Variation::VariantI is a good way - I am
> not as familiar as I'd like to be with the Variation objects, but this
> makes sense and I could imagine actually having ways to handle alleles
> later on which might become useful.
> 
> I'd still like to have an interface describe a Marker so we can do some
> fun inheritance things later with different types of markers.  So I'd make
> a MarkerI that subclasses VariantI and adds the methods pcr_fwd and
> pcr_rev (or more appropriate method names).
> 
> Eric [ might want to read below first ], does the OO stuff make sense
> here? If we make MarkerI with basic methods pcrprimers, chrom, sequence
> location, then a concrete implementation of this can be GenericMarker, and
> various subclasses - RhMarker, STSMarker, MicrosatteliteMarker or
> GeneticMarker, RhMarker, ... depending on how you want to describe them,
> if they have specific attributes or methods that are particular to that
> type of marker.
> 
> Then on the Maps front, something like a LinkageMap could then be
> built using GeneticMarkers or STSMarkers, as they implement a function
> like get_genetic_location... or get_location('cM');
> 
> Am I too far out there in interface land for you?
> 
> -jason
> On Thu, 18 Jan 2001, Heikki Lehvaslaiho wrote:
> > 
> > Jason,
> > 
> > I finally found my notes on upgrading the Ensembl Variation class.  
> > The problem there is that the SNP with an ID can have several
> > locations in a genome. At the moment when several locations are needed
> > I simply return several Variation objects with same ID. Not very
> > pretty, but the interface requires me to return SeqFeature objects not
> > something that contains them.
> > 
> > So, your needs. You said that you need the following methods:
> > 
> > fwd_primer, rev_primer, length, genetic_location, marker_sequence
> > 
> > The following lists where they could go (+) or are already in
> > Variation classes (%):
> > 
> > Bio::Variation::VariantI
> >  subclassed by DNAMutation, RNAChange, AAChange
> > 
> > + fwd_primer, (moltype not protein)
> > + rev_primer, (moltype not protein)
> > % length,
> > % add_DBLink
> > % each_DBLink
> > % status
> > 
> > Bio::Variation::SeqDiff (VariantI holder class)
> > % chromosome
> > + genetic_location, (for strings like 12p13.3 )
> > 
> > Bio::Variation::Allele
> > 	isa Bio::PrimarySeq
> > % marker_sequence
> > 	->seq
> > 	has additional methods repeat_unit and repeat_count
> > 	to describe the sequence: e.g. (CA)5
> > 
> > 
> > Separately, these are the methods that I have in Variation:
> > 
> > Bio::Ensembl::ExternalData::Variation
> > -------------------------------------
> >  same inheritance as in VariantI
> > 
> > in addition:
> > 
> > start_in_clone_coord
> > end_in_clone_coord
> > (status)
> > alleles	    (string as opposed to Allele object in VariantI)
> > (upStreamSeq) (same as in VariantI)
> > (dnStreamSeq) (same as in VariantI)
> > 
> > 
> > So, it seems to me almost everything can be accommodated within
> > VariantI implementing objects.
> > 
> > Do you want to say if marker is defined on DNA or RNA? 
> > moltype method?
> > What additional methods can you think of having?
> > 
> > 
> > It might be enough just to have a 
> > Bio::Variation::Marker class (isa Bio::Variation::VariantI)
> > add 
> >   + fwd_primer, (moltype not protein)
> >   + rev_primer, (moltype not protein)
> > into Bio::Variation::VariantI
> > 
> > and have a method for genetic_location and override the status method to
> > accept any scalar (it is now restricted to values 'suspected'/'proven').
> > It might be a good idea to have a separate chromosome method a la
> > GenBank/EMBL?
> > 
> > + chromosome
> > + genetic_location
> > + status
> > 
> > You could use the Allele class and VariantI methods to manipulate the
> > sequence data, or you could come up with a simpler implementation or
> > interface.
> > 
> > What do you think?
> > 
> > Yours,
> > 
> > 	-Heikki
> > 
> > 
> > 
> > Jason Stajich wrote:
> > > 
> > > I won't be writing anything substantial until holidays are over, I have
> > > just been thinking about this and had some time to play last week as
> > > things were slow for me.  I guessed you would have some ideas and insight.
> > > Let's see if we start coming up with an interface or extensions to
> > > VariationI after Jan 1st.
> > > 
> > > Happy holidays.
> > > -jason
> > > 
> > > On Sat, 23 Dec 2000, Heikki Lehvaslaiho wrote:
> > > 
> > > > Hi Jason,
> > > >
> > > > Sorry I have not answered. I am on holiday and Christmas is in a day
> > > > or two.
> > > >
> > > >
> > > > Jason Stajich wrote:
> > > > >
> > > > > I'm trying to write some code that allows me to manipulate marker
> > > > > information (SNPs, Microsatellites, STS).  Thought it might be a useful
> > > > > bioperl object.  Right now I want to associate the following data with a
> > > > > marker name - fwd_primer, rev_primer, length, genetic_location,
> > > > > marker_sequence.  I am also querying GDB, genbank, and local databases for
> > > > > this and thought it would make sense to create a reusable object.  Does
> > > > > any/all of this fit into any of the Variation modules?  I feel like if
> > > >
> > > > It fits fine. You could also have a look at what I have put into
> > > > ensembl-external as a Variation class. That is a rough and dirty class
> > > > for holding SNP information.
> > > >
> > > > I have plans somewhere to extend it... (I cannot find the text I
> > > > wrote; I have to look when I have more time on my hands.)
> > > >
> > > > > there isn't one already this should somehow fall into the Variation
> > > > > category.  I have already written many throwaway scripts to manipulate
> > > > > the information, but it seems to me that this should be an object.  I can
> > > > > relate the information to physical sequence via blast and the
> > > > > marker_sequence or e-PCR and the primers, but often I might want to
> > > > > process the markers for something else.
> > > > >
> > > > > Bio::Variation::GeneticMarker?  A SNP would be a sequence change, but also
> > > > > a marker ... I imagine this working on multiple levels - sequence, maps,
> > > > > etc.
> > > >
> > > > I think we should see what could be put into an interface file and what
> > > > into an instantiable class.
> > > >
> > > > Bio::Variation::MarkerI
> > > > Bio::Variation::Marker
> > > >
> > > > Alternatively, Bio::Variation::VariationI is already there and can be
> > > > extended.
> > > >
> > > > I have to go...
> > > > Are you going to write this right now or can we think about this
> > > > over the holidays?
> > > >
> > > >       -Heikki
> > > >
> > > > > Jason Stajich
> > > > > jason@chg.mc.duke.edu
> > > > > Center for Human Genetics
> > > > > Duke University Medical Center
> > > > > http://www.chg.duke.edu/
> > > >
> > > > --
> > > > ______ _/      _/_____________________________________________________
> > > >       _/      _/                      http://www.ebi.ac.uk/mutations/
> > > >      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
> > > >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
> > > >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
> > > >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
> > > >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> > > > ___ _/_/_/_/_/________________________________________________________
> > > >
> > > 
> > > Jason Stajich
> > > jason@chg.mc.duke.edu
> > > Center for Human Genetics
> > > Duke University Medical Center
> > > http://www.chg.duke.edu/
> > 
> > -- 
> > ______ _/      _/_____________________________________________________
> >       _/      _/                      http://www.ebi.ac.uk/mutations/
> >      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
> >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
> >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
> >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
> >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> > ___ _/_/_/_/_/________________________________________________________
> > 
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.duke.edu/
> 

-------------------------------------------------------------------------------
Dr Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton, 
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------
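Heikki's list above suggests a marker object that adds fwd_primer, rev_primer, genetic_location and chromosome to what VariantI already offers, and relaxes status so it accepts any scalar. A minimal Perl sketch of such a container follows. The package name My::Marker is hypothetical, and the class is deliberately standalone: it does not inherit from Bio::Variation::VariantI, so it runs without bioperl installed. The repeat_unit/repeat_count pair mirrors the Allele methods Heikki mentions for notations like (CA)5.

```perl
use strict;
use warnings;

package My::Marker;    # hypothetical standalone class, not part of bioperl
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    croak 'a marker needs a -name' unless defined $args{-name};
    my $self = bless {}, $class;
    # store "-key => value" constructor arguments under "key"
    for my $key (keys %args) {
        (my $field = $key) =~ s/^-//;
        $self->{$field} = $args{$key};
    }
    return $self;
}

# Generate get/set accessors. status accepts any scalar here, unlike the
# 'suspected'/'proven' restriction described for VariantI.
for my $field (qw(name fwd_primer rev_primer chromosome genetic_location
                  status repeat_unit repeat_count)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

# Expand a repeat description such as (CA)5 into the literal sequence.
sub repeat_seq {
    my $self = shift;
    my ($unit, $count) = ($self->repeat_unit, $self->repeat_count);
    return unless defined $unit && defined $count;
    return $unit x $count;
}

package main;

# dummy values, for illustration only
my $m = My::Marker->new(-name => 'marker_1', -repeat_unit => 'CA',
                        -repeat_count => 5, -status => 'typed');
print $m->name, ': ', $m->repeat_seq, "\n";    # prints "marker_1: CACACACACA"
```

Whether these accessors belong on a new Marker class or directly on VariantI is exactly the open question in the thread; the sketch only shows that the data fits a plain accessor-based object.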



From heikki@ebi.ac.uk Fri Jan 19 14:05:14 2001 Date: Fri, 19 Jan 2001 14:05:14 +0000 From: Heikki Lehvaslaiho heikki@ebi.ac.uk Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl]
Arek Kasprzyk wrote:
> 
> On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote:
> 
> Hi guys,
> I have not been following this discussion very closely but
> thought you might find it useful to poke around a set of Ensembl modules
> called ensembl-map. I think that some of the ideas you are talking
> about have been implemented there.

The URL is:

http://www.ensembl.org/cgi-bin/cvsweb/cvsweb.cgi/ensembl-map/modules/Bio/EnsEMBL/Map/

	-Heikki

> Arek
> 
> 
> > -------- Original Message --------
> > Subject: Re: marker manipulation in bioperl
> > Date: Thu, 18 Jan 2001 13:06:26 -0500 (EST)
> > From: Jason Stajich <jason@chg.mc.duke.edu>
> > To: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
> > CC: Eric Snyder <SnyderEE@pbrc.edu>
> >
> > Heikki - yes I think going via Variation::VariantI is a good way - I am
> > not as familiar as I'd like to be with the Variation objects, but this
> > makes sense and I could imagine actually having ways to handle alleles
> > later on which might become useful.
> >
> > I'd still like to have an interface describe a Marker so we can do some
> > fun inheritance things later with different types of markers.  So I'd
> > make a MarkerI that subclasses VariantI and adds the methods pcr_fwd,
> > pcr_rev (or more appropriate method names).
> >
> > Eric [ might want to read below first ] does the OO stuff make sense
> > here?  If we make MarkerI with basic methods pcrprimers, chrom, sequence
> > location, then a concrete implementation of this can be GenericMarker,
> > with various subclasses - RhMarker, STSMarker, MicrosatteliteMarker or
> > GeneticMarker, RhMarker, ... depending on how you want to describe them -
> > for markers that have specific attributes or methods particular to that
> > type of marker.
> >
> > Then on the Maps front, something like a LinkageMap could then be built
> > using GeneticMarkers or STSMarkers, as long as they implement a function
> > like get_genetic_location... or get_location('cM');
> >
> > Am I too far out there in interface land for you?
> >
> > -jason
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
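Jason's proposal in this thread (a MarkerI interface, a generic concrete implementation with specialised subclasses, and a linkage map that accepts any marker able to report a location such as get_location('cM')) can be sketched in plain Perl as below. All package and method names (My::MarkerI, My::GenericMarker, My::MicrosatelliteMarker, My::LinkageMap) are hypothetical, loosely following the names floated in the thread. The interface follows the common Perl convention of abstract methods that die unless a subclass overrides them.

```perl
use strict;
use warnings;

package My::MarkerI;    # hypothetical interface: subclasses must override these
use Carp qw(croak);
sub name         { croak 'name() not implemented by ' . (ref($_[0]) || $_[0]) }
sub get_location { croak 'get_location() not implemented by ' . (ref($_[0]) || $_[0]) }

package My::GenericMarker;    # hypothetical concrete implementation
our @ISA = ('My::MarkerI');

sub new {
    my ($class, %args) = @_;
    # keep every "-key => value" argument under "key"
    my %self = map { my $k = $_; $k =~ s/^-//; ($k => $args{$_}) } keys %args;
    $self{locations} ||= {};
    return bless \%self, $class;
}
sub name { $_[0]{name} }

# position in a given unit, e.g. get_location('cM'); undef if unknown
sub get_location {
    my ($self, $unit) = @_;
    return $self->{locations}{$unit};
}

package My::MicrosatelliteMarker;    # hypothetical subclass with its own attribute
our @ISA = ('My::GenericMarker');
sub repeat_unit { $_[0]{repeat_unit} }

package My::LinkageMap;    # hypothetical map that accepts any My::MarkerI
use Scalar::Util qw(blessed);

sub new { return bless { markers => [] }, $_[0] }

sub add_marker {
    my ($self, $marker) = @_;
    die "not a marker\n" unless blessed($marker) && $marker->isa('My::MarkerI');
    push @{ $self->{markers} }, $marker;
    return $self;
}

# markers that have a cM position, in map order
sub ordered_markers {
    my $self = shift;
    return sort { $a->get_location('cM') <=> $b->get_location('cM') }
           grep { defined $_->get_location('cM') } @{ $self->{markers} };
}

package main;
```

The map only relies on the interface (isa My::MarkerI plus get_location), so new marker types can be added without touching My::LinkageMap, which is the "fun inheritance" Jason describes.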

From jason@chg.mc.duke.edu Fri Jan 19 15:00:46 2001 Date: Fri, 19 Jan 2001 10:00:46 -0500 (EST) From: Jason Stajich jason@chg.mc.duke.edu Subject: [Bioperl-l] Bio::Index::Abstract & bug #860
Looking through this bug - I had 'fixed' it by adding
use DB_File; at the top, but now I realize that may not be the best fix,
since it still causes an error when -type is specified as 'SDBM_File'.
I could just add both in the 'use', but what if DB_File is not present...

The code for the method dbm_package assumes that if you specify a package
it will already have been 'included'.  What to do...  Try and require both
in the BEGIN block so they are explicitly loaded no matter what?  Trap
errors if DB_File is not present and the user asks for it?

From Bio::Index::Abstract:

sub dbm_package {
    my( $self, $value ) = @_;
    
    if ($value) {
        $self->{'_dbm_package'} = $value;
    }
    elsif (! $self->{'_dbm_package'}) {
        if ($USE_DBM_TYPE) {
            $self->{'_dbm_package'} = $USE_DBM_TYPE;
        } else {
            my( $type );
            # DB_File isn't available on all systems
            eval {
                require DB_File;
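                # single quotes: import the $DB_HASH symbol by name instead
                # of interpolating the (still undefined) variable
                DB_File->import('$DB_HASH');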
            };
            if ($@) {
                require SDBM_File;
                $type = 'SDBM_File';
            } else {
                $type = 'DB_File';
            }
            $USE_DBM_TYPE = $self->{'_dbm_package'} = $type;
        }
    }
    return $self->{'_dbm_package'};
}



Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


---------- Forwarded message ----------
Date: Fri, 19 Jan 2001 09:55:29 +0000 (GMT)
From: K Howe <klh@sanger.ac.uk>
To: Jason Stajich <jason@chg.mc.duke.edu>
Subject: Re: biperl bug #860


Hi Jason,

We use the following command:

bpindex.pl -fmt EMBL -dir /nfs/disk92/Pfam/index -type DB_File pfamseq.index <embl flat file>

where /nfs/disk92/Pfam/index is the intended location of the index file,
and pfamseq.index is the name of it. The key thing is that we explicitly
give the type as DB_File, and when this happens, it dies (when you don't
specify type, and it has to make a guess as to which dbm type to use, it
works, but this is not scalable for us, since the default dbm file in
bioperl may change from DB_File in the future).

Hope this is enough information.

Best,

Kevin

On Thu, 18 Jan 2001, Jason Stajich wrote:

> Kevin - I'm trying to track down a bug you submitted for
> Bio::Index::Abstract - I may have fixed it, but I want to be sure.  Can
> you give me an example of how to invoke bpfetch/bpindex so as to throw an
> error due to a potentially missing require.
> 
> Thanks.
> -Jason




From birney@ebi.ac.uk Fri Jan 19 16:51:12 2001 Date: Fri, 19 Jan 2001 16:51:12 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [Bioperl-l] Bio::Index::Abstract & bug #860
On Fri, 19 Jan 2001, Jason Stajich wrote:

> Looking through this bug - I had 'fixed' it by adding 
> use DB_File; at the top, but now I realize that may not be the best 
> since it still causes an error when -type is specified as 'SDBM_File'.
> Could just add both in the 'use' but what if DB_File is not present... 
> 
> The code for the method dbm_package assumes that if you specify a package
> it will have been already 'included'.  What to do...  Try and require both
> in the BEGIN block so they are explictly loaded no matter what?  Trap
> errors if DB_file is not present and user asks for it?

Go for a require run-time load....

check out pSW.pm for an example or the SeqIO.pm for another run-time load.
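Ewan's suggestion of a run-time require, applied to the case Kevin reported (an explicit -type that was never loaded), might look like the sketch below. It is not the code that went into Bio::Index::Abstract. The helper names load_dbm_package and choose_dbm_package are hypothetical, and the package-name check before the string eval is an added precaution. SDBM_File ships with Perl, so it serves as the fallback when DB_File is missing.

```perl
use strict;
use warnings;
use Carp qw(croak);

my $USE_DBM_TYPE;    # cached default choice, like the original package variable

# Load a DBM package at run time; works whether or not the caller named it.
sub load_dbm_package {
    my ($pkg) = @_;
    # accept plain package names only, since the name goes into a string eval
    croak 'Invalid DBM package name'
        unless defined $pkg && $pkg =~ /\A\w+(?:::\w+)*\z/;
    eval "require $pkg; 1" or croak "Cannot load DBM package $pkg: $@";
    return $pkg;
}

# An explicit -type is always loaded first; otherwise prefer DB_File and
# fall back to SDBM_File.
sub choose_dbm_package {
    my ($requested) = @_;
    return load_dbm_package($requested) if $requested;
    return $USE_DBM_TYPE if $USE_DBM_TYPE;
    my $type = eval { load_dbm_package('DB_File') } || load_dbm_package('SDBM_File');
    return $USE_DBM_TYPE = $type;
}
```

Once the package is loaded it can be tied as usual, e.g. tie(%index, $type, $file, O_RDWR|O_CREAT, 0644) with the flags from Fcntl; DB_File additionally takes $DB_HASH as a sixth argument.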



> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------


From hlapp@gmx.net Fri Jan 19 19:13:57 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Fri, 19 Jan 2001 11:13:57 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: Message-ID: <3A6891F5.A0C2BEBE@gmx.net> Ewan Birney wrote: > > Ok. Hilmar and I are now probably into the "code aesthetics" part of this > debate, which definitely is worth having but someone sometime has to make > a decision. > > I suggest that we keep bashing this out on the list for a couple more days > (please... other people... if you have a view, do chip in). If Hilmar and > I are still disagreeing with aesthetics I would like to nominate Jason to > tie-break on the way to go (is this ok with you Hilmar and Jason...?) > Jason, you're going to play the Supreme Court judge here (no appeals possible) :-) In fact, I'd like to hear more feedback from actual users of these features. It seems that most people are happy if only those special GenBank features no longer get completely lost. However, there are people who do want to do meaningful stuff with the coordinates. One of these is our group in Vienna (yes, we draw features, and yes, that adds to my concern). The other I know of is David with his GUI, which is why I put him on cc. David, any strong or weak feelings about this issue from your perspective? The BioJava project came up, as far as I can recall, with a Location class model separate from the Feature class. I put Matthew and Thomas on the cc to ask for their experience with this model, and what the feedback from the biojava community was so far. > We have two points of contention: > > (a) Explicit Location objects or not. > > Hilmar suggests an explicit location object > > SeqFeatureI has-a LocationI > > LocationI is sub classed for Split (join statements) and Fuzzies > > Benefits - (a) easy to mix and match implementations of locations to > different feature objects, and (b) if mix and matching locations to > features is common, more realisatic. 
Hilmar argues that is clearer as > well. > > Against - more objects and infact the majority of seqfeatures are little > more than the location, and two extra strings. > > For backwards compatibility, I think SeqFeatureI->start would *have* to be > delegated to SeqFeatureI->location->start - otherwise too much code will > break... (of course, this delegation could just be for a while as we move > code and people over to using "proper" locations) > I agree completely here. I even think $feature->start() can stay there forever. > People might be interested that I originally argued for an explicit > location object about 1 month ago. I don't now... > > I am suggesting that SeqFeatures do not have an explicit location object, > but we subclass SeqFeatures into Split, Simple and Fuzzy, all inherieting > >from a common SeqFeature interface > > Benefits - (a) less objects (b) only one place where the client gets the > information and (c) more backwardly compatible. > I'd like to note here that 'less objects' is not a benefit by itself, unless loading modules imposes a significant run-time performance hit, which I think we agree it doesn't. Having less objects I think does constitute a benefit if it removes redundant definitions, and makes for a steeper learning curve of the API, that is, if they're easier to use. This is the point I doubt here: I think further inflating SeqFeatureI flattens the learning curve. And I think Location (where) and Feature (what) are not redundant. As for the backward compatibility, I think the only problem here is the exception yes/no issue, isn't it? So, backward compatibility does not argue against decoupling Location/Feature, does it? > Effectively my main argument is that there will always be a pretty clear > cut relationship that "this type of SeqFeature" is always "this class of > location" so the splitting of the location away from the SeqFeature is > just suggesting a mix-and-match world which doesn't actually exist. It does exist. 
It may not be the most frequent case, but it is a use case for us. And probably for everyone who draws features. > Simpler and stronger to go for the combined interface in my view. > > (b) ->start ->end throwing exceptions or not. > > Hilmar says that for at least Fuzzies and possibly Splits the client > should figure out by rooting around the object how to map these more > complex locations to a simple start,end. The interface should allow > exceptions to be thrown on ->start/->end indicating that the client should > be treating this seqfeature somehow differently... > > Basically we pass the buck to the client. > Right. And I said that's where it belongs. > I say that the implementation objects have to provide a default mapping > of whatever ->start and ->end are. This means that clients can live in > this happy world of "I have well defined start/ends" if they so wish > without writing extra code. Smart clients are encouraged to root around in > the objects for their "real" interpretation of the fuzziness. > > There are three reasons why I favour this: > > (a) Clients for dumping/drawing/manipulation have to treat large > numbers of sequence features as a pretty homogeneous mass. If we make > seqfeatures less homogeneous then every client is going to have to figure > out how to "homogenize" the seqfeatures - this will be different client to > client although for the main case they just want a "default way" of > handling them. We are encouraging a diversity of views when our clients > really want us to solve the problems for them. > This can be solved easily. For FuzzyLocation we implement a default way of computing valid start/end, which can be activated (globally) by client code. (I hear you saying if we do it this way it should be activated by default :-) > (b) as 99% of features are nice, well behaved "hard features" many > pieces of client code written with the bioperl libaries will just assumme > ->start,->end do not throw exceptions. 
When this piece of code is used by > another user with a fuzzy feature, there will be a rather deep exception > thrown by bioperl through the client code. I think both the user and the > client with some justification will blame bioperl for this, no matter how > much we say "you should have read the documentation and written 3 > different subroutines to replace every time you go > > if( $one->start == $two->start ) > > gets replaced by > > if( &my_exact_function($one,$two) ) { > > } > > ... > > sub my_exact_function { > > # one of many if statements... > > if( $one->isa('Bio::FuzzyFeatureI') && > $two->isa('Bio::SimpleFeatureI') { > ... > > } > > } > This can be accomplished much simpler: if($user_prefs{"fuzzyLocs"} eq "simplifyToWidest") { $loc1 = $feat_one->location(); $range = new Bio::Range(-start => $loc1->min_start(), -end => $loc1->max_end()); $feat_one->location($range); # same for $feat_two follows ... } # carry on as if there were no fuzzy etc features # and you're safe from exceptions > (c) long experience with seqfeatures has made me claim that the > following rules are generally just what people want: > > - simple features - easy > > - join statements - ignore leading and trailing '<' '>' and take the > edge start/end points on the sequence you are looking at > > - fuzzy features - either skip or - if you have to draw/compare them, > take start/end as the min hard location mentioned and the maximum hard > location mentioned, irregardless of the internal grammar. > > I reckon bioperl will be better to implement the (c) method by default > without preventing smart clients from making their own decisions. > Well, I think you can have a full model and still always provide simple implementations satisfying most people's use cases (to be activated by client code, or activated by default, I think that's a matter of taste). Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From td2@sanger.ac.uk Fri Jan 19 19:45:44 2001 From: td2@sanger.ac.uk (Thomas Down) Date: Fri, 19 Jan 2001 19:45:44 +0000 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A6891F5.A0C2BEBE@gmx.net>; from hlapp@gmx.net on Fri, Jan 19, 2001 at 11:13:57AM -0800 References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <20010119194544.F9203@jabba.sanger.ac.uk> On Fri, Jan 19, 2001 at 11:13:57AM -0800, Hilmar Lapp wrote: > > The BioJava project came up, as far as I can recall, with a > Location class model separate from the Feature class. I put > Matthew and Thomas on the cc to ask for their experience with this > model, and what the feedback from the biojava community was so > far. Yes, we have this approach (well, strictly speaking we have a Location interface plus various implementation). It's worked pretty well for us so far -- any type of feature can have any type of location attached to it (point, range, compound), and it's efficient in terms of memory usage. We've also found that the Location objects can be kind-of useful on their own -- I've got all sorts of scripts which use bare Locations for tracking coverage, or even keeping track of available space when working out an optimal GUI layout. I don't know exactly how this experience would translate into your design, though. > > People might be interested that I originally argued for an explicit > > location object about 1 month ago. I don't now... > > > > I am suggesting that SeqFeatures do not have an explicit location object, > > but we subclass SeqFeatures into Split, Simple and Fuzzy, all inherieting > > >from a common SeqFeature interface The only potential consideration is that this then makes any further polymorphism of SeqFeature quite difficult. 
We're experimenting with polymorphic features in BioJava -- look at the org.biojava.bio.seq.genomic package for lots of useful sub-interfaces of Feature. If you are thinking of ever going down this route, beware the possible explosion of combinations of feature type and location type. > > Benefits - (a) less objects (b) only one place where the client gets the > > information and (c) more backwardly compatible. > > I'd like to note here that 'less objects' is not a benefit by > itself, unless loading modules imposes a significant run-time > performance hit, which I think we agree it doesn't. Having less > objects I think does constitute a benefit if it removes redundant > definitions, and makes for a steeper learning curve of the API, > that is, if they're easier to use. This is the point I doubt here: > I think further inflating SeqFeatureI flattens the learning curve. > And I think Location (where) and Feature (what) are not redundant. Actually, my understanding is that the per-object overhead in perl is pretty high, especially for objects implemented as hashes. If you ever want to hold millions of SeqFeatures in memory (a not unreasonable requirement, I'd suggest), a few hundred bytes per location might come back with a vengence. Of course, this can probably be mitigated by implementing the locations as C structs. Is this approach currently being used in BioPerl? So I'm going to be inconclusive. I like the seeparate Locations design, but I'd suggest investigating the memory-usage issues before deciding one way or the other. Thomas. From lapp@gnf.org Fri Jan 19 20:53:00 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Fri, 19 Jan 2001 12:53:00 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... 
References: <3A6891F5.A0C2BEBE@gmx.net> <20010119194544.F9203@jabba.sanger.ac.uk> Message-ID: <3A68A92C.74811DA3@gnf.org>

Thomas Down wrote:
>
> Actually, my understanding is that the per-object overhead in
> perl is pretty high, especially for objects implemented as
> hashes. If you ever want to hold millions of SeqFeatures in
> memory (a not unreasonable requirement, I'd suggest), a few
> hundred bytes per location might come back with a vengeance.

Hmm. I guess I can't make a sensible comment on this. Anyone else out there who has experienced a performance drawback imposed by Perl's object handling (well, I know in fact it's not objects Perl handles ...)?

If this problem is real, any chances this will be mitigated in upcoming Perl releases (5.6? 6.0?)? In general I hate having to adapt an object model to the limitations of a language ... :(

>
> Of course, this can probably be mitigated by implementing the
> locations as C structs. Is this approach currently being
> used in BioPerl?
>

Well, given the users on Win32 and Mac this is probably not an option for any module that is somewhat part of the core.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mwilkinson@gene.pbi.nrc.ca Fri Jan 19 20:49:30 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Fri, 19 Jan 2001 14:49:30 -0600 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <3A68A85A.CA49BC31@gene.pbi.nrc.ca>

Hi all!

> However, there are people who do want to do meaningful stuff with
> the coordinates. One of these is our group in Vienna (yes, we draw
> features, and yes, that adds to my concern). The other I know of
> is David with his GUI, which is why I put him on cc.

Hey!
Don't forget the primary author of the SeqCanvas GUI :-)

If it's okay I have $0.02 to contribute too...

> I agree completely here. I even think $feature->start() can stay
> there forever.
> >snip<
> And I think Location (where) and Feature (what) are not redundant.

This, to me, is the crux of the argument, and I have to side with Hilmar on this. From a biological perspective, location and feature are absolutely *not* redundant. We are arguing about how to represent something computationally that has not been universally agreed upon even by the geneticists/MolBiologists themselves: What is a gene? I personally think that Hilmar's view is more "biologically correct" (tm), that a gene, or more generally a feature, is best described as it was described to me as a first year undergraduate many years ago, "a functional unit of DNA". These "functional units" may be overlapping, even extensively, but if they do not have *exactly* the same function then they should probably be considered entirely different features, rather than a single feature with multiple compositions... (I hope I am not over-interpreting your views, Hilmar...). This single-feature-multiple-function is an absolute nightmare for annotators!!

So, in my world view, $Feature->start should only be ambiguous if that *unique functional unit* has a bona fide ambiguous start. In such a case, I would then side with Ewan in his proposal that there should, nevertheless, be a default $Feature->start value for these fuzzy features (NO EXCEPTION THROWING!!), but that they are somehow "flagged" such that smarter clients will be able to easily query these features for their fuzziness and display this fuzziness if they have the ability (interestingly, we just initiated a research project with several CompSci students to investigate how to best visualize exactly these kinds of "fuzzy" or ambiguous situations!!).
This was not my primary consideration when I was writing SeqCanvas, but I have already noticed that this module, as it stands, is nowhere near sufficient to represent "reality", and will need to be thought out from scratch over the next few months as our group trips over these kinds of problems more and more often. (Stay tuned! I intend to re-focus my energies on this code as soon as other more pressing issues are out of the way!)

So, w.r.t. SeqCanvas & other GUIs which exist already, I would hope that these are not an issue in this debate! My personal opinion is that BioPerl should make the capturing of biological reality its primary concern and, within reason, leave the problem of parsing and displaying this data to the client; "it's an S.E.P.". If it is generally agreed upon by the community that $Feature->start is no longer an adequate representation of "reality", then it should be dumped, regardless of what parsers may already exist. $Feature->start is not the holy grail, the biological data is. (Personally, I can't imagine a scenario where $Feature->start would no longer be useful... but you probably understand what I am getting at...)

> It does exist. It may not be the most frequent case, but it is a
> use case for us. And probably for everyone who draws features.

Indeed, it does exist! And it looks like it will only get worse as we learn more...

Anyway, for what it's worth, that's my two bits :-)

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From birney@ebi.ac.uk Fri Jan 19 21:10:16 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 19 Jan 2001 21:10:16 +0000 (GMT) Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A68A92C.74811DA3@gnf.org> Message-ID:

On Fri, 19 Jan 2001, Hilmar Lapp wrote:

Hilmar. I agree with all your statements basically, but I'm still sticking to my claims.
I think we need to hear more feedback - I'd be really interested in David's call on the defaultness of ->start not throwing an exception. We then might need to call in the supreme court here...

> Thomas Down wrote:
> >
> > Actually, my understanding is that the per-object overhead in
> > perl is pretty high, especially for objects implemented as
> > hashes. If you ever want to hold millions of SeqFeatures in
> > memory (a not unreasonable requirement, I'd suggest), a few
> > hundred bytes per location might come back with a vengeance.
>
> Hmm. I guess I can't make a sensible comment on this. Anyone else out
> there who has experienced a performance drawback imposed by Perl's
> object handling (well, I know in fact it's not objects Perl handles
> ...)?

Oh yes ;) Ensembl can trivially generate > 10,000 features in a modest sized query. To get this to happen in any sensible way we have a packed C struct. I would be against insisting on two objects having to be present. Of course having $seqfeature->location return $self for these cases could really solve it. Therefore this is not a show-stopper for Ensembl, but be aware that if we made this the default for Bioperl we would be doubling our memory for feature-heavy queries, and I suspect suffering for it.

>
> If this problem is real, any chances this will be mitigated in
> upcoming Perl releases (5.6? 6.0?)? In general I hate having to adapt
> an object model to the limitations of a language ... :(
>
> >
> > Of course, this can probably be mitigated by implementing the
> > locations as C structs. Is this approach currently being
> > used in BioPerl?
> >
>
> Well, given the users on Win32 and Mac this is probably not an option
> for any module that is somewhat part of the core.
>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca.
> 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From mdalphin@amgen.com Fri Jan 19 22:27:19 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Fri, 19 Jan 2001 14:27:19 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: Message-ID: <3A68BF46.10887F6F@amgen.com>

Ewan Birney wrote:

> On Fri, 19 Jan 2001, Hilmar Lapp wrote:
>
> Hilmar. I agree with all your statements basically, but I'm still sticking
> to my claims. I think we need to hear more feedback - I'd be really
> interested in David's call on the defaultness of ->start not throwing an
> exception.
>

I would rather NOT have an exception thrown. It is too hard to recover from that. In fact, for much code, either the maximum, minimum or average really doesn't matter (for example, on many displays, the same set of pixels will be lit up) and the client program will need to make that choice in any case.

As a "compromise", I wonder about having ->start() return 'a value' in a SCALAR context, basically continuing what it now does, while in an ARRAY context, it might return something like:

    ($start, $fuzzy_type, $fuzzy_value) = $obj->start();

This way, the client can obtain the data, if desired, or ignore it. I don't know how to define "fuzzy_type", but it would be something like { '<' | '>' | '.' } representing the NCBI types '<', '>' and '.' (ie extends-5-prime, extends-3-prime and between-two-values). Then $start would be either the lowest, highest or average value and $fuzzy_value would represent the distance around it. (BTW, I tried using the average +- $fuzzy_value and found the code messy. It seems to be tidier to use: $start is the most extreme point (5' or 3') and $fuzzy_value brings you inwards).
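To make the context-sensitive proposal above concrete, here is a minimal Perl sketch built on the core `wantarray` builtin. The package name `FuzzyStartSketch` and its `-start`/`-fuzzy_type`/`-fuzzy_value` constructor arguments are hypothetical illustrations, not BioPerl API. As the message suggests, the stored start is taken to be the most extreme point and the fuzzy value is measured inwards from it.

```perl
use strict;
use warnings;

# Hypothetical location whose start may be fuzzy. fuzzy_type follows the
# NCBI-style markers from the message: '<', '>' or '.'; undef means exact.
package FuzzyStartSketch;

sub new {
    my ($class, %args) = @_;
    return bless {
        start       => $args{-start},            # most extreme point (5' or 3')
        fuzzy_type  => $args{-fuzzy_type},
        fuzzy_value => $args{-fuzzy_value} || 0, # distance inwards from start
    }, $class;
}

# Scalar context: just the position, as start() behaves today.
# List context: (start, fuzzy_type, fuzzy_value) for clients that care.
sub start {
    my $self = shift;
    return wantarray
        ? ($self->{start}, $self->{fuzzy_type}, $self->{fuzzy_value})
        : $self->{start};
}

package main;

my $loc = FuzzyStartSketch->new(-start => 100, -fuzzy_type => '.', -fuzzy_value => 5);

my $pos = $loc->start;                      # scalar context: 100
my ($start, $type, $dist) = $loc->start;    # list context: (100, '.', 5)
print "scalar: $pos\n";
print "list:   $start $type $dist\n";
```

One caveat with this design: Perl decides context at the call site, so `print $loc->start` or `some_sub($loc->start)` is evaluated in list context and receives all three values. Callers that want only the number have to write `scalar $loc->start`.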
Finally, the code could be written to make a SCALAR $start return one of the above (min, max, avg) OR throw an exception, as chosen by a parameter. But I don't think it is worth the trouble.

Just my $0.02.

Mark

--
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From mrp@sanger.ac.uk Mon Jan 22 13:07:52 2001 From: mrp@sanger.ac.uk (Matthew Pocock) Date: Mon, 22 Jan 2001 13:07:52 +0000 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <3A6C30A7.842C63AD@sanger.ac.uk>

Hi.

Just thought I'd have a short inane ramble. Please ignore everything that you don't agree with. I'm really looking at this more as a user of the libraries rather than an implementer, so things may look different your side of the fence.

If you intend to end up with multiple feature implementations and multiple types of locations (point, range, fuzzy etc.) then you should definitely consider composition - Location interface, Feature interface hasA Location. Please don't do things like having FuzzyFeature extends Feature, FuzzyLocation - if Feature must extend Location, then it should be the stupidest extension possible - otherwise people will get really confused really quickly.

We make a lot of stuff very easy by defining that every Location has min & max that are the lowest and highest index that are within the location. If Feature must extend Location, then its min & max should delegate off to min & max in its location delegate. These methods should never throw exceptions.

If you go for the composition/delegation approach, then it feels wrong to me that Feature extends Location - but there is no reason why the current implementations of Feature shouldn't implement it, or the Feature interface may choose to define min/max (or do you use start/end?) so that the legacy code runs.
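Here is a minimal Perl sketch of the composition/delegation arrangement described above. It assumes a plain range location. The feature *has* a location, and its legacy `start`/`end` accessors forward to the location's `min`/`max` without ever throwing. Strand is kept on the feature, while the location stays direction-free. All package and method names (`RangeLocationSketch`, `FeatureSketch`, `contains`) are hypothetical.

```perl
use strict;
use warnings;

# A direction-free range location: just min and max ("stupid math object").
package RangeLocationSketch;

sub new {
    my ($class, $min, $max) = @_;
    ($min, $max) = ($max, $min) if $min > $max;   # normalise so min <= max
    return bless { min => $min, max => $max }, $class;
}
sub min { return $_[0]{min} }
sub max { return $_[0]{max} }
sub contains {
    my ($self, $pos) = @_;
    return $pos >= $self->{min} && $pos <= $self->{max};
}

# A feature that has-a location. Strand lives here, because interpreting
# direction needs biological context; the location itself has none.
package FeatureSketch;

sub new {
    my ($class, %args) = @_;
    return bless {
        location    => $args{-location},
        strand      => $args{-strand} || 0,
        primary_tag => $args{-primary_tag},
    }, $class;
}
sub location    { return $_[0]{location} }
sub strand      { return $_[0]{strand} }
sub primary_tag { return $_[0]{primary_tag} }

# Legacy-style accessors delegate to the location and never throw.
sub start { return $_[0]->location->min }
sub end   { return $_[0]->location->max }

package main;

my $exon = FeatureSketch->new(
    -location    => RangeLocationSketch->new(1200, 1050),
    -strand      => -1,
    -primary_tag => 'exon',
);
printf "%s %d..%d strand %d\n",
    $exon->primary_tag, $exon->start, $exon->end, $exon->strand;
```

Because `start`/`end` are thin forwards, code that only calls them keeps working whatever location class sits behind the feature.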
If you go for Location & Feature, the hierarchy of features should represent the semantic knowledge about what you are annotating, and the (potential) location hierarchy hanging off a feature should be shallow - just pertain to that feature only. Locations are stupid math objects. For example, if you have a gene feature, its location should span the entire gene area, whereas the feature may only contain child exon features that span part of that region. Otherwise, you end up with two hierarchies that look nearly exactly the same as each other & life gets confusing.

It works well for us putting strand info in features and leaving locations a-directional. Strand stuff requires semantic knowledge (you need context), and that belongs in features - they represent the biological information.

Horrible EMBL locations that reference other sequences could be handled with complicated sequence/feature/location implementations/interfaces - or - you could just build an assembly of the two entries and project the feature into assembly-space to get out something that you can represent cleanly. I don't know how well bioperl does assemblies...

Anyway, that's it. These are the kind of details that give me the Hammer Horror tingly spine every time I think about them. Eugh. Embl locations suck.

Matthew


From dblock@gene.pbi.nrc.ca Mon Jan 22 18:20:16 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Mon, 22 Jan 2001 12:20:16 -0600 (CST) Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A6C30A7.842C63AD@sanger.ac.uk> Message-ID:

Hello everyone!

Just back from Calgary, doing final bits of paperwork to prepare for my defense. After Feb 20, my mind will be a lot clearer!

Okay, I just read through everybody's arguments, and since you want my opinion, I'll give it to you.

Our pathway to enlightenment here has been that we started with simple cases, then met complex cases and had to tear everything down multiple times to accommodate complexity.
So it looks like BioPerl is doing that now with fuzzy locations (which have been tossed around the list for longer than I've been on it). We should bite the bullet and build for posterity. Extensibility is a major priority in this situation, and for that reason, Hilmar wins my vote :)

Backwards compatibility: I would like it very much if, for simple cases, a simple location object were created by default. A complex location object should only be created when complex location input is given. Then the familiar start, end notation would refer to the default simple location object.

I like the idea of some sort of global environment-type variable that would set the policy for fuzzy instances. A well-documented default would be fine here as well. What Workbench would do would be to use the default behaviour (widest, probably) for fuzzy locations, and then when details were requested, would show that fuzziness at the base-pair level.

So it would be great if start, end returned hard locations according to some policy that could be defined (at object creation?), and details would be returned only when requested. In that case, could location be an optional object, only created when needed? So start, end would return numbers, either hard numbers given to them at creation, or numbers computed by a location object. A different call ($feature->detailedstart or something) would call $feature->start if there was no more info on the location, and would call the location object otherwise. This could then return whatever array or hash we decide on.

That would take care of the memory concerns (we create a lot of objects with Workbench as well), since in most cases, the start/end pair would be all that was stored. The complexities could be handled whenever the client desired complexity.

Would it be necessary to flag objects that have detailed location information? Well, that's a simple check for the presence of a LocationI object attached to the SeqFeature object.
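The "optional location object" idea above could be sketched in Perl roughly as follows. This is an illustration under stated assumptions, not BioPerl code. `LazyLocFeatureSketch`, `FuzzyRangeSketch`, `has_detailed_location` and `detailed_start` are all hypothetical names; `detailed_start` stands in for the suggested `$feature->detailedstart`. The fuzzy location applies a "widest" policy to its plain `start`/`end`.

```perl
use strict;
use warnings;

# Hypothetical fuzzy range: the start lies somewhere in min_start..max_start.
# Its plain start/end follow a "widest" policy.
package FuzzyRangeSketch;

sub new {
    my ($class, %args) = @_;    # min_start, max_start, min_end, max_end
    my $self = { %args };
    return bless $self, $class;
}
sub start          { return $_[0]{min_start} }
sub end            { return $_[0]{max_end} }
sub detailed_start { return { min => $_[0]{min_start}, max => $_[0]{max_start} } }

# Feature that stores plain numbers for simple cases and only keeps a
# location object when complex location input was given.
package LazyLocFeatureSketch;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    if (defined $args{-location}) {
        $self->{location} = $args{-location};
    } else {
        @{$self}{qw(start end)} = @args{qw(-start -end)};
    }
    return $self;
}

# The "flag" is just the presence of a location object.
sub has_detailed_location { return defined $_[0]{location} }

# start/end always return plain numbers.
sub start {
    my $self = shift;
    return $self->has_detailed_location ? $self->{location}->start : $self->{start};
}
sub end {
    my $self = shift;
    return $self->has_detailed_location ? $self->{location}->end : $self->{end};
}

# Richer accessor: asks the location when there is one, otherwise
# falls back to start() (an exact position, so min == max).
sub detailed_start {
    my $self = shift;
    return $self->{location}->detailed_start if $self->has_detailed_location;
    my $s = $self->start;
    return { min => $s, max => $s };
}

package main;

my $simple = LazyLocFeatureSketch->new(-start => 10, -end => 50);
my $fuzzy  = LazyLocFeatureSketch->new(
    -location => FuzzyRangeSketch->new(
        min_start => 95, max_start => 105, min_end => 480, max_end => 500));

for my $f ($simple, $fuzzy) {
    my $d = $f->detailed_start;
    printf "%d..%d, start somewhere in %d..%d\n", $f->start, $f->end, $d->{min}, $d->{max};
}
```

In this arrangement, only features whose input actually carried fuzzy or split information pay the memory cost of a second object.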
Okay, there's my opinion. Let me know what you think.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From birney@ebi.ac.uk Mon Jan 22 20:13:06 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 22 Jan 2001 20:13:06 +0000 (GMT) Subject: [Bioperl-l] conceeding to has-a location Message-ID:

Ok. It looks like I have to concede the has-a location, as long as I am allowed to return $self for C extensions for ensembl ;)

I think I have "won" on the no exception throwing (???)

Jason/Hilmar - what do you think?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Mon Jan 22 21:07:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 22 Jan 2001 16:07:20 -0500 (EST) Subject: [Bioperl-l] conceeding to has-a location In-Reply-To: Message-ID:

I think the discussion generated a number of good points. I was ambivalent about separating the Location initially, but I can definitely see advantages to has-a location now as well. So I agree that this model seems the best. I don't know if Object creation penalties will come back to haunt us, but this model seems the most biologically applicable.

As for exception throwing, in the simple case no exceptions are thrown, i.e. for everything bioperl currently supports. If we want to later on define a structure for delegating start/end calculation (DetermineStartEndFromFuzzyLocationAdaptor) then maybe we can do that and exceptions could be thrown by that model. However, in the current model I'd like to rely on start/end to be callable even if it is delegating to the Location object, and thus no exceptions at this time.

Are we going to end up ripping this out and rewriting again?
I will update the wiki text to reflect these agreements and we can see where we stand. I'm hoping we can have a reasonable agreement by Thurs so the coding can begin.

-Jason

On Mon, 22 Jan 2001, Ewan Birney wrote:

>
> Ok. It looks like I have to concede the has-a location, as long as I am
> allowed to return $self for C extensions for ensembl ;)
>
>
> I think I have "won" on the no exception throwing (???)
>
>
> Jason/Hilmar - what do you think?
>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Tue Jan 23 09:27:47 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 23 Jan 2001 01:27:47 -0800 Subject: [Bioperl-l] Feature/Location References: Message-ID: <3A6D4E93.DC069A8A@gmx.net>

Ewan Birney wrote:
>
> Ok. It looks like I have to concede the has-a location, as long as I am
> allowed to return $self for C extensions for ensembl ;)
>
> I think I have "won" on the no exception throwing (???)
>

I think it's BioPerl that won -- by all the feedback we got. We can have more confidence now that it makes some sense what we code. Thanks to everyone, and sorry for forgetting you, Mark, I'm glad you stepped in without being asked.

I see that exception throwing in ->start()/end() is not the best idea for many applications. In a sense the situation may be similar to SeqIO, where we now have client-controllable severity level of putative format violations (which in fact mostly are BioPerl incapabilities). So, we can design the start/end implementation along a client-controllable policy, with a relaxed default.
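Here is one way a client-controllable policy with a relaxed default might look, sketched in Perl under these assumptions:

- A package-wide default applies, in the spirit of the global policy variable suggested earlier in the thread, and an individual object can override it.
- The default, 'widest', never throws.
- 'narrowest' is an alternative policy.
- An opt-in 'strict' mode dies when a position is fuzzy.

`PolicyFuzzyLocationSketch` and its policy names are hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

package PolicyFuzzyLocationSketch;

# Package-wide default: relaxed, never throws, reports the widest extent.
our $DEFAULT_POLICY = 'widest';
my %KNOWN_POLICY = map { $_ => 1 } qw(widest narrowest strict);

sub new {
    my ($class, %args) = @_;
    return bless {
        min_start => $args{-min_start}, max_start => $args{-max_start},
        min_end   => $args{-min_end},   max_end   => $args{-max_end},
        policy    => $args{-policy},    # undef means "use the package default"
    }, $class;
}

# Get/set the per-object policy; falls back to the package default.
sub policy {
    my $self = shift;
    if (@_) {
        my $p = shift;
        die "unknown policy '$p'\n" unless $KNOWN_POLICY{$p};
        $self->{policy} = $p;
    }
    return defined $self->{policy} ? $self->{policy} : $DEFAULT_POLICY;
}

sub start { my $s = shift; return $s->_resolve('start', $s->{min_start}, $s->{max_start}, 'lo') }
sub end   { my $s = shift; return $s->_resolve('end',   $s->{min_end},   $s->{max_end},   'hi') }

# Turn a (lo, hi) range into a single coordinate according to the policy.
# $wide names which bound counts as "widest": lo for a start, hi for an end.
sub _resolve {
    my ($self, $what, $lo, $hi, $wide) = @_;
    return $lo if $lo == $hi;              # exact position: policy irrelevant
    my $policy = $self->policy;
    die "fuzzy $what ($lo..$hi) under 'strict' policy\n" if $policy eq 'strict';
    my ($widest, $narrowest) = $wide eq 'lo' ? ($lo, $hi) : ($hi, $lo);
    return $policy eq 'narrowest' ? $narrowest : $widest;
}

package main;

my $loc = PolicyFuzzyLocationSketch->new(
    -min_start => 95, -max_start => 105, -min_end => 500, -max_end => 500);
print "default:   ", $loc->start, "\n";    # 95 (widest)
$loc->policy('narrowest');
print "narrowest: ", $loc->start, "\n";    # 105
$loc->policy('strict');
my $ok = eval { $loc->start; 1 };
print "strict:    ", ($ok ? "no error\n" : "error: $@");
print "exact end: ", $loc->end, "\n";      # 500, no policy needed
```

Exact positions bypass the policy entirely, so even under 'strict' only genuinely fuzzy coordinates can die.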
Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Tue Jan 23 15:41:52 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 10:41:52 -0500 (EST) Subject: [Bioperl-l] Feature/Location In-Reply-To: <3A6D4E93.DC069A8A@gmx.net> Message-ID:

On Tue, 23 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > Ok. It looks like I have to concede the has-a location, as long as I am
> > allowed to return $self for C extensions for ensembl ;)
> >
> > I think I have "won" on the no exception throwing (???)
> >
>
> I think it's BioPerl that won -- by all the feedback we got. We
> can have more confidence now that it makes some sense what we
> code. Thanks to everyone, and sorry for forgetting you, Mark, I'm
> glad you stepped in without being asked.
>
> I see that exception throwing in ->start()/end() is not the best
> idea for many applications. In a sense the situation may be
> similar to SeqIO, where we now have client-controllable severity
> level of putative format violations (which in fact mostly are
> BioPerl incapabilities). So, we can design the start/end
> implementation along a client-controllable policy, with a relaxed
> default.

I'll see how that shakes out as we start to look at implementation.

Also - should our locations go into a new directory?

Interfaces -
  Bio::Location::LocationI
  Bio::Location::SplitLocationI
  Bio::Location::FuzzyLocationI

Implementations -
  Bio::Location::SimpleLocation
  Bio::Location::SplitLocation
  Bio::Location::FuzzyLocation

I updated the wiki - please feel free to make corrections, clarifications, or to elaborate the interfaces. SplitLocationI will have a method sub_Locations which returns the list of LocationI objects that represent the sub locations of the, well, location.
In code terms -

# get a $geneobj somehow
my $location = $geneobj->location;
if( $location->isa('Bio::Location::SplitLocationI') ) {
    foreach my $exon ( $location->sub_locations() ){
        print "exon at ", $exon->start, "..", $exon->end, "\n";
    }
}

One problem with this approach - what if I want to actually have the real Exon object.... Must I instead iterate through what is returned by sub_Features? Does the SeqFeature::GeneStructureI instead handle all of this and I should instead call $geneobj->exons() not touching the Location objects (makes most sense to me).

-jason

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From dblock@gene.pbi.nrc.ca Tue Jan 23 15:56:08 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 23 Jan 2001 09:56:08 -0600 (CST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:

>
> I updated the wiki - please feel free to make corrections, clarifications,
> or to elaborate the interfaces. SplitLocationI will have a method
> sub_Locations which returns the list of LocationI objects that represent
> the sub locations of the, well, location. In code terms -
>
> # get a $geneobj somehow
> my $location = $geneobj->location;
> if( $location->isa('Bio::Location::SplitLocationI') ) {
>     foreach my $exon ( $location->sub_locations() ){
>         print "exon at ", $exon->start, "..", $exon->end, "\n";
>     }
> }
>
> One problem with this approach - what if I want to actually have the real
> Exon object.... Must I instead iterate through what is returned
> by sub_Features?
> Does the SeqFeature::GeneStructureI instead handle all
> of this and I should instead call $geneobj->exons() not touching the
> Location objects (makes most sense to me).
>
> -jason
>

That would be good. Then you could call that exon's location method to get the location object of the exon. So you have two routes to the start/end pair. That sounds good to me.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From birney@ebi.ac.uk Tue Jan 23 16:45:34 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 23 Jan 2001 16:45:34 +0000 (GMT) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:

On Tue, 23 Jan 2001, David Block wrote:

> >
> > I updated the wiki - please feel free to make corrections, clarifications,
> > or to elaborate the interfaces. SplitLocationI will have a method
> > sub_Locations which returns the list of LocationI objects that represent
> > the sub locations of the, well, location. In code terms -
> >
> > # get a $geneobj somehow
> > my $location = $geneobj->location;
> > if( $location->isa('Bio::Location::SplitLocationI') ) {
> >     foreach my $exon ( $location->sub_locations() ){
> >         print "exon at ", $exon->start, "..", $exon->end, "\n";
> >     }
> > }
> >
> > One problem with this approach - what if I want to actually have the real
> > Exon object.... Must I instead iterate through what is returned
> > by sub_Features? Does the SeqFeature::GeneStructureI instead handle all
> > of this and I should instead call $geneobj->exons() not touching the
> > Location objects (makes most sense to me).
> >
> > -jason
> >
>
> That would be good. Then you could call that exon's location method to
> get the location object of the exon. So you have two routes to the
> start/end pair. That sounds good to me.
<>

I think we are giving ourselves *a lot of rope* to hang ourselves here and we will end up with different conventions about how to descend these objects...

But... I guess I should roll with the has-a decision. So... my view here would be that in "stupid" implementations location->sub_locations() give separate location objects, but in "smart" implementations (perhaps bioperl's gene/transcript object?) it gives the same location object as the exon, therefore guaranteeing that whichever route you take to an exon's location, you get the same thing...

ie... this is up to the implementation, and the generic implementation has to be "stupid" I guess....

>
> --
> David Block
> dblock@gene.pbi.nrc.ca
> http://bioinfo.pbi.nrc.ca/dblock/wiki
> Plant Biotechnology Institute
> National Research Council of Canada
> Saskatoon, Saskatchewan
>
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Tue Jan 23 18:00:37 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 23 Jan 2001 12:00:37 -0600 (CST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:

On Tue, 23 Jan 2001, Ewan Birney wrote:

> On Tue, 23 Jan 2001, David Block wrote:
>
> > >
> > > I updated the wiki - please feel free to make corrections, clarifications,
> > > or to elaborate the interfaces. SplitLocationI will have a method
> > > sub_Locations which returns the list of LocationI objects that represent
> > > the sub locations of the, well, location.
> > > In code terms -
> > >
> > > # get a $geneobj somehow
> > > my $location = $geneobj->location;
> > > if( $location->isa('Bio::Location::SplitLocationI') ) {
> > >     foreach my $exon ( $location->sub_locations() ){
> > >         print "exon at ", $exon->start, "..", $exon->end, "\n";
> > >     }
> > > }
> > >
> > > One problem with this approach - what if I want to actually have the real
> > > Exon object.... Must I instead iterate through what is returned
> > > by sub_Features? Does the SeqFeature::GeneStructureI instead handle all
> > > of this and I should instead call $geneobj->exons() not touching the
> > > Location objects (makes most sense to me).
> > >
> > > -jason
> > >

Okay, for clarity, this only is relevant when there is a SplitLocationI situation, correct? So the implementation of SplitLocationI was going to be an array of simple LocationI's? If not, then what I'm talking about is irrelevant.

> >
> > That would be good. Then you could call that exon's location method to
> > get the location object of the exon. So you have two routes to the
> > start/end pair. That sounds good to me.
>
> <>
>
> I think we are giving ourselves *a lot of rope* to hang ourselves here and
> we will end up with different conventions about how to descend these
> objects...

Different conventions for different situations? What I was talking about was the two different situations:

1) gene drawing, I want to know all the locations that are 'gene' so I can draw them somehow -> sub_locations gives me a list of simple locations that I can iterate through. I don't care about the nature of the exons I am drawing, just that they belong to a gene.

2) exon interrogation, I want to examine each exon individually. Now I want the gene/transcript's exons method to give me each exon. Each of those also has a location. The exon's location method links to the location object that is linked to by the sub_location call, so there is no duplication of data.
And if any of these exon locations are fuzzy or split, etc., the location object gives us that.

>
> But... I guess I should roll with the has-a decision.

Yes, you should (hee, hee, we win).

> So... my view here
> would be that in "stupid" implementations location->sub_locations() give
> separate location objects, but in "smart" implementations (perhaps
> bioperl's gene/transcript object?) it gives the same location object as
> the exon, therefore guaranteeing that whichever route you take to an
> exon's location, you get the same thing...

I think that's what I was thinking too, isn't it?

>
>
> ie... this is up to the implementation, and the generic implementation has
> to be "stupid" I guess....

No comment.

>
> > --
> > David Block
> > dblock@gene.pbi.nrc.ca
> > http://bioinfo.pbi.nrc.ca/dblock/wiki
> > Plant Biotechnology Institute
> > National Research Council of Canada
> > Saskatoon, Saskatchewan
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From jason@chg.mc.duke.edu Tue Jan 23 18:32:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 13:32:21 -0500 (EST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:

On Tue, 23 Jan 2001, David Block wrote:

> On Tue, 23 Jan 2001, Ewan Birney wrote:
>
> > On Tue, 23 Jan 2001, David Block wrote:
> >
> > > >
> > > > I updated the wiki - please feel free to make corrections, clarifications,
> > > > or to elaborate the interfaces. SplitLocationI will have a method
> > > > sub_Locations which returns the list of LocationI objects that represent
> > > > the sub locations of the, well, location.
> > > > In code terms -
> > > >
> > > > # get a $geneobj somehow
> > > > my $location = $geneobj->location;
> > > > if( $location->isa('Bio::Location::SplitLocationI') ) {
> > > >     foreach my $exon ( $location->sub_locations() ){
> > > >         print "exon at ", $exon->start, "..", $exon->end, "\n";
> > > >     }
> > > > }
> > > >
> > > > One problem with this approach - what if I want to actually have the real
> > > > Exon object.... Must I instead iterate through what is returned
> > > > by sub_Features? Does the SeqFeature::GeneStructureI instead handle all
> > > > of this and I should instead call $geneobj->exons() not touching the
> > > > Location objects (makes most sense to me).
> > > >
> > > > -jason
> > > >
>
> Okay, for clarity, this only is relevant when there is a SplitLocationI
> situation, correct? So the implementation of SplitLocationI was going to
> be an array of simple LocationI's? If not, then what I'm talking about is
> irrelevant.

No you're right, I imagine it will be a list of LocationI objects at some point. sub_Locations will be a SplitLocationI method.

> > >
> > > That would be good. Then you could call that exon's location method to
> > > get the location object of the exon. So you have two routes to the
> > > start/end pair. That sounds good to me.
> >
> > <>
> >
> > I think we are giving ourselves *a lot of rope* to hang ourselves here and
> > we will end up with different conventions about how to descend these
> > objects...

I agree, I think this is why you and I were leaning towards collapsing Location into SeqFeature, but I also agree with many of the arguments for splitting the 2.

>
>
> Different conventions for different situations? What I was talking about
> was the two different situations:
> 1) gene drawing, I want to know all the locations that are 'gene' so I can
> draw them somehow -> sub_locations gives me a list of simple locations
> that I can iterate through.
I don't care about the nature of the exons I > am drawing, just that they belong to a gene. > > 2) exon interrogation, I want to examine each exon individually. Now I > want the gene/transcript's exons method to give me each exon. Each of > those also has a location. The exon's location method links to the > location object that is linked to by the sub_location call, so there is no > duplication of data. So we use the sub_SeqFeature method on a SeqFeatureI to get the list of sub-features for a feature (since exons should be sub-features of a gene). In specialized objects like Gene we could call exons() to get these objects. > And if any of these exon locations are fuzzy or split, etc., the location > object gives us that. > Without getting lost in example land - here is one question about how to instantiate these things: Imagine the case of parsing a GenBank/EMBL file with annotated genes on a genomic sequence via the bioperl SeqIO system. We get to a SplitLocation. How should we represent the object? If primary_tag == 'CDS', do we instantiate a GeneStructure object? Otherwise we will instantiate all features with SeqFeature::Generic; some will have LocationI locations, some will have SplitLocationI locations. Assuming we sufficiently capture all of the information encoded about the Feature, a user could write code to transform collections of CDS, source, exon, etc. primary tags retrieved from a GenBank/EMBL parse into a GeneStructure object. I am going to guess that at some point we'd like to write an object that handles this gene instantiation in the common case, or at least gives good examples of how to do it. Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From jason@chg.mc.duke.edu Tue Jan 23 22:08:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 17:08:20 -0500 (EST) Subject: [Bioperl-l] Location naming semantics Message-ID:
Anyone with a problem with these names?
If so, please shout now.

Interfaces
    Bio::LocationI
    Bio::Location::SplitLocationI
    Bio::Location::FuzzyLocationI

Implementations
    Bio::Location::SimpleLocation
    Bio::Location::SplitLocation
    Bio::Location::FuzzyLocation

Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From hlapp@gmx.net Wed Jan 24 09:55:01 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 24 Jan 2001 01:55:01 -0800 Subject: [Bioperl-l] Re: [Bioperl-guts-l] RestrictionEnzyme.pm References: <5.0.2.1.2.20010124095100.00a81978@mailhost.curie.fr> Message-ID: <3A6EA675.ABBE91FF@gmx.net>
Paul-Christophe Varoutas wrote: > > Hi again, > > Last night I started experimenting with RestrictionEnzyme.pm. > > I liked very much the '-MAKE' =>'custom' switch in the constructor but I > think it would nevertheless be a good idea to write a public method which > updates the enzyme list from the NEBASE site. > > I suggest writing a sub (let's call it update_list or update_RE_list) that: > > - goes to the NEBASE site and gets the latest version of the restriction > enzyme list. We can choose between http/ftp and various types of > lists/formats. My preference would be to go to their ftp site and get what > they call "format 18": DNAStrider format, list of all commercially > available enzymes. The file is ftp://ftp.nebase.com/pub/nebase/striderc.* > (the extension of the file reflects the version). > - saves this list in a text file, in the Bio/Tools/ directory. An > alternative is to update the enzyme list in the RestrictionEnzyme.pm file > itself, at the beginning of the file, within the definition of the %RE > hash, but intuitively I would not tend to recommend it, as I don't know if > writing to a file at the same time it is being read by the perl > interpreter will behave well on all operating systems. Tell me what you > think about it.
You normally can't write to Bio/Tools as a user (under Unix), and a user client shouldn't attempt to do so under any circumstances. Regarding the ability to update the list of known REs, I see the following options. 1) Accept an additional (named!) parameter at initialization that denotes a file (in DNAStrider format?) containing the enzymes to be known in addition to a collection of hard-coded enzymes. 2) Same as before, but the parameter denotes a URL from where to obtain this file. 3) Put all hard-coded enzymes into a file that resides at a known place within the Bio/ directory tree, and read (parse) that upon initialization of RestrictionEnzyme.pm. An update would mean updating that file. I'm not sure option 3) would have compelling advantages to the present layout. Options 1) and 2) are certainly worthwhile to pursue and in essence are almost identical, the only difference being how to open the stream containing the enzyme data. So, one could try to combine both into one parameter, and have the code figure out whether it's a file or a http/ftp URL. Hilmar Do you already have a CVS write account? > - if the enzyme list is saved in a separate file, I will also modify the > initialisation of the %RE hash, with code that reads and parses the enzyme > list file. > > If this sounds OK to you, I will write it this weekend and submit it. Of > course if you had something completely different in mind please say it, I > will try to adapt to it. > > Paul-Christophe > -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From jason@chg.mc.duke.edu Wed Jan 24 16:18:56 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 11:18:56 -0500 (EST) Subject: [Bioperl-l] Location committed Message-ID: cvs update -d in order to get the new directory. 
Location objects have been created and extracted from the clutches of SeqFeatureI. A LocationI object does support strandedness because that is part of RangeI. A SeqFeatureI is still a RangeI for the practical purpose of backwards compatibility and simplicity, but it actually delegates things like start/end/strand to the LocationI object contained by the SeqFeatureI. If you want to debate any parts of this object model, start now because the code is still in the early stages. We currently have a Bio::Location::Simple to handle the current bioperl location behavior. The next step is to write implementations of SplitLocationI and attach them to the SeqIO parsing. I'll be adding more module documentation soon, but wanted to get these interfaces and the simple implementation out there first so that others can help find problems AND possibly help write objects.... -jason Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From birney@ebi.ac.uk Wed Jan 24 16:54:57 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 16:54:57 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 Message-ID:
Just to warn people, Test.pm does not ship with 5.004, patch level 4, which is a relatively common perl version installed. It can be installed, but I suspect moans and complaints about this to some extent. I will fix up the makefile to Barf more intelligently if it can't find Test.pm... (Jason - I can't test your new objects at the moment due to the above problem... I'll try to do it from my laptop and I have asked Sanger systems to install Test.pm...) ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 24 17:22:52 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 12:22:52 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
I was hoping no one would notice... ;) I found this out last week when I tried to install Test.pm on a 5.00404 system: it won't install, as it requires at least 5.00504. This means our test suite doesn't work under 5.00404. That is not good... grrrr... not wanting to backport all the Test.pm dependencies... Do we have to roll our own replacement and have that be included? On Wed, 24 Jan 2001, Ewan Birney wrote: > > Just to warn people, Test.pm does not ship with 5.004, patch level 4, > which is a relatively common perl version installed. It can be installed, > but I suspect moans and complaints about this to some extent. > > I will fix up the makefile to Barf more intelligently if it can't find > Test.pm... > > > > (Jason - I can't test your new objects at the moment due to the above > problem... I'll try to do it from my laptop and I have asked Sanger > systems to install Test.pm...) > > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From birney@ebi.ac.uk Wed Jan 24 17:26:39 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 17:26:39 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
On Wed, 24 Jan 2001, Jason Stajich wrote: > I was hoping no one would notice...
;) > > I found this out last week when I tried to install Test.pm on a 5.00404 > system: it won't install, as it requires at least 5.00504. This means our > test suite doesn't work under 5.00404. That is not good... > > grrrr... not wanting to backport all the Test.pm dependencies... Do we > have to roll our own replacement and have that be included? > this is a potential SHOW STOPPER. time to think about this... We have to either (a) jettison 5.00404 compatibility OR (b) back-port test suite (sorry jason) OR (c) roll own replacement I don't like any of these. Comments? (PS - I am not a big, huge, boy this is really making my life easier fan of Test.pm --- what is wrong with print "ok 2\n";. I don't need a module for this!) ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 24 18:00:46 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 18:00:46 +0000 (GMT) Subject: [Bioperl-l] BOSC 2001 [Bioinformatics Open Source Conference] Message-ID:
[PLEASE do not reply to this mail as it is cross-posted to many lists. PLEASE reply to bosc@bubbles.sonsorol.org. I am assuming that people KNOW HOW TO DRIVE THEIR MAIL CLIENTS. Think before hitting reply! This is an experiment to see how smart the general bioinformatics hacker is] We will be attempting to run another Bioinformatics Open Source Conference just before ISMB 2001. We have received information that this is likely to be able to occur and will possibly have extensive computer support, therefore allowing development to occur as well as talks. At the moment we are gathering our thoughts and generally mapping out the form of the conference. We would like input from the wider open source bioinformatics community for ideas about the conference.
The practical aims of this are to (a) come up with a format for the day(s) and (b) appoint a committee to run the conference. It is likely that Chris Dagdigian and I will be the core of the committee, as we've done this before and we know what is going on. (Frankly, if someone wants to take over my cheer-leading role, you are more than welcome! Endless patience and good email discipline are a must...) I would suggest the following committee membership: each of the major groups nominates one person to the committee. I would suggest: bioperl (possibly me or chris), biojava, biopython, emboss, acedb, ensembl (possibly me) each has one person assigned to be on the committee. Then I would like to see if we can reach out to the smaller projects, including ones I haven't listed here, such as the nascent bioLISPers; I believe there is an open source Bio PathWays group; the Apollo/Gadfly people might want to make sure they are represented. (biocorba and bioxml - you are smaller projects at the moment) Ideally one or two people can come from the smaller projects. The total committee should be 8 or fewer. [PS - if you know of people who "have a project" but are in the primordial soup stage of the project, please forward this mail on to them] Comments should be addressed to bosc@bubbles.sonsorol.org - like I said, I expect the major projects to assign their own representative or say they are not interested. Ewan Birney ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 24 18:40:53 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 24 Jan 2001 10:40:53 -0800 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 References: Message-ID: <3A6F21B5.BAD76917@gmx.net>
Ewan Birney wrote: > > On Wed, 24 Jan 2001, Jason Stajich wrote: > > > I was hoping no one would notice...
;) > > > > I found this out last week when I tried to install Test.pm on a 5.00404 > > system: it won't install, as it requires at least 5.00504. This means our > > test suite doesn't work under 5.00404. That is not good... > > > > grrrr... not wanting to backport all the Test.pm dependencies... Do we > > have to roll our own replacement and have that be included? > > > > > > this is a potential SHOW STOPPER. time to think about this... > > We have to either > > (a) jettison 5.00404 compatibility OR > > (b) back-port test suite (sorry jason) OR > > (c) roll own replacement > > I don't like any of these. Comments? > > (PS - I am not a big, huge, boy this is really making my life easier fan > of Test.pm --- what is wrong with print "ok 2\n";. I don't need a module > for this!) > Well, I did find the test script code more concise after migrating to Test.pm. However, we don't use much of its functionality yet, only rather basic things. Would it be that hard to roll our own Test.pm version that offers just the basic things we're currently using, maybe even by porting the original? Would make the switch to the system module easy, once we drop 5.004 compatibility (we won't keep that eternally, will we?)? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 -----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 24 18:49:45 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 13:49:45 -0500 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 References: Message-ID: <008f01c08636$6bba2090$61eb0398@mc.duke.edu>
----- Original Message ----- From: "Ewan Birney" To: "Jason Stajich" Cc: Sent: Wednesday, January 24, 2001 12:26 PM Subject: Re: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 > On Wed, 24 Jan 2001, Jason Stajich wrote: > > > I was hoping no one would notice...
;) > > > > I found this out last week when I tried to install Test.pm on a 5.00404 > > system: it won't install, as it requires at least 5.00504. This means our > > test suite doesn't work under 5.00404. That is not good... > > > > grrrr... not wanting to backport all the Test.pm dependencies... Do we > > have to roll our own replacement and have that be included? > > > > > > this is a potential SHOW STOPPER. time to think about this... > > We have to either > > (a) jettison 5.00404 compatibility OR > > (b) back-port test suite (sorry jason) OR > > (c) roll own replacement > I can backport if necessary, but I'd rather we roll our own t/Test.pm that duplicates the Test.pm functionality we use. I do rather like not having to keep track of which test number I am at, so adding a test to the middle of the pack doesn't involve renumbering all the ones that follow. > I don't like any of these. Comments? > > > (PS - I am not a big, huge, boy this is really making my life easier fan > of Test.pm --- what is wrong with print "ok 2\n";. I don't need a module > for this!) > > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > >

From agoldman@bnl.gov Wed Jan 24 19:31:39 2001 From: agoldman@bnl.gov (Adrian Goldman) Date: Wed, 24 Jan 2001 14:31:39 -0500 Subject: [Bioperl-l] Re: Restriction Enzyme methods... In-Reply-To: <200101241705.f0OH5Dp16304@pw600a.bioperl.org> References: <200101241705.f0OH5Dp16304@pw600a.bioperl.org> Message-ID:
Hi, As a _user_ of the RestrictionEnzyme module, I think it is worth noting that -- as far as I can divine from the 0.6 code and my (admittedly limited) knowledge of Perl -- neither the external interface (make=custom) nor the internals support offset cutters correctly. (The documentation also implies the same.)
Obviously this would have to be fixed before anything useful could be done with updating from rebase. The problem, as I understand it, is that the module supports essentially only one index into the restriction site; it recognises offset recognition elements correctly (requiring use of the reverse complement) but it can't "cut the DNA" correctly. As it turns out, it is enough for my application that the recognition is correct, as I am looking for the absence of sites -- but I hardly think that that is usual. If I'm wrong about the above, I'd _love_ to be told how to specify an offset cutter correctly through the make=custom switch!... Adrian Goldman >Paul-Christophe Varoutas wrote: > > > > Hi again, > > > > Last night I started experimenting with RestrictionEnzyme.pm. > > > > I liked very much the '-MAKE' =>'custom' switch in the constructor but I > > think it would nevertheless be a good idea to write a public method which > > updates the enzyme list from the NEBASE site. > > > > I suggest writing a sub (let's call it update_list or update_RE_list) that: > > > > - goes to the NEBASE site and gets the latest version of the restriction > > enzyme list. We can choose between http/ftp and various types of > > lists/formats. My preference would be to go to their ftp site and get what > > they call "format 18": DNAStrider format, list of all commercially > > available enzymes. The file is ftp://ftp.nebase.com/pub/nebase/striderc.* > > (the extension of the file reflects the version). > > - saves this list in a text file, in the Bio/Tools/ directory. An > > alternative is to update the enzyme list in the RestrictionEnzyme.pm file > > itself, at the beginning of the file, within the definition of the %RE > > hash, but intuitively I would not tend to recommend it, as I don't know if > > writing to a file at the same time it is being read by the perl > > interpreter will behave well on all operating systems. Tell me what you > > think about it.
> >You normally can't write to Bio/Tools as a user (under Unix), and >a user client shouldn't attempt to do so under any circumstances. >Regarding the ability to update the list of known REs, I see the >following options. >1) Accept an additional (named!) parameter at initialization that >denotes a file (in DNAStrider format?) containing the enzymes to >be known in addition to a collection of hard-coded enzymes. >2) Same as before, but the parameter denotes a URL from where to >obtain this file. >3) Put all hard-coded enzymes into a file that resides at a known >place within the Bio/ directory tree, and read (parse) that upon >initialization of RestrictionEnzyme.pm. An update would mean >updating that file. > >I'm not sure option 3) would have compelling advantages to the >present layout. Options 1) and 2) are certainly worthwhile to >pursue and in essence are almost identical, the only difference >being how to open the stream containing the enzyme data. So, one >could try to combine both into one parameter, and have the code >figure out whether it's a file or a http/ftp URL. > > Hilmar > >Do you already have a CVS write account? > > > - if the enzyme list is saved in a separate file, I will also modify the > > initialisation of the %RE hash, with code that reads and parses the enzyme > > list file. > > > > If this sounds OK to you, I will write it this weekend and submit it. Of > > course if you had something completely different in mind please say it, I > > will try to adapt to it. > > > > Paul-Christophe > > > >-- >----------------------------------------------------------------- >Hilmar Lapp email: hlapp@gmx.net >GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 >----------------------------------------------------------------- Professor Adrian Goldman, | Phone: 358-(0)9-191 58923 Structural Biology Group, | FAX: 358-(0)9-191 58952 Institute of Biotechnology | Sec: 358-(0)9-191 58921 University of Helsinki, | Mobile: 358-(0)50-336 8960 PL 56 | Home: 358-(0)9-728 7103 00014 Helsinki | email: Adrian.Goldman@Helsinki.fi -- on sabbatical at Brookhaven National labs, June 2000-June 2001 Adrian Goldman, Biology Department, Building 463 50 Bell Ave., Brookhaven National Lab., Upton NY 11973. Phone: 631-344-2671 (off) 631-344-3417 (lab), 631-344-3407 (FAX). email: agoldman@bnl.gov From birney@ebi.ac.uk Wed Jan 24 19:39:03 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:39:03 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: <3A6F21B5.BAD76917@gmx.net> Message-ID: On Wed, 24 Jan 2001, Hilmar Lapp wrote: > > Well, I did find the test script code more concise after migrating > to Test.pm. However, we don't use much of its functionality yet, > only rather basic things. Would it be that hard to roll our own > Test.pm version that offers just the basic things we're currently > using, maybe even by porting the original? Would make the switch > to the system module easy, once we drop 5.004 compatibility (we > won't keep that eternally, will we?)? This is an ok route for me as well. I guess it is not too hard. Is this going to drop between the three of us? Jason are you volunteering (I am aware jason you have done the lion's share of towards branch coding so far...) > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . -----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 24 19:50:37 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 14:50:37 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
On Wed, 24 Jan 2001, Ewan Birney wrote: > On Wed, 24 Jan 2001, Hilmar Lapp wrote: > > > > > Well, I did find the test script code more concise after migrating > > to Test.pm. However, we don't use much of its functionality yet, > > only rather basic things. Would it be that hard to roll our own > > Test.pm version that offers just the basic things we're currently > > using, maybe even by porting the original? Would make the switch > > to the system module easy, once we drop 5.004 compatibility (we > > won't keep that eternally, will we?)? > > This is an ok route for me as well. I guess it is not too hard. > > Is this going to drop between the three of us? Jason are you volunteering > (I am aware jason you have done the lion's share of towards branch coding > so far...) It's pretty mindless to make the corrections so I can do it while I'm waiting for some analysis to finish. Can we just be sure that we are doing what seems to be the RIGHT thing. I really don't want to break the build on 5.00404 so let's roll our own Test.pm with ok() and skip() methods or backport (my vote) to the old way of sub test {}. I'd put the Test module in t/Test.pm. > > > > > > Hilmar > > -- > > ----------------------------------------------------------------- > > Hilmar Lapp email: hlapp@gmx.net > > GNF, San Diego, Ca.
92122 phone: +1 858 812 1757 > > ----------------------------------------------------------------- > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From birney@ebi.ac.uk Wed Jan 24 19:54:24 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:54:24 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID: On Wed, 24 Jan 2001, Jason Stajich wrote: > > On Wed, 24 Jan 2001, Ewan Birney wrote: > > > On Wed, 24 Jan 2001, Hilmar Lapp wrote: > > > > > > > > Well, I did find the test script code more concise after migrating > > > to Test.pm. However, we don't use much of its functionality yet, > > > only rather basic things. Would it be that hard to roll our own > > > Test.pm version that offers just the basic things we're currently > > > using, maybe even by porting the original? Would make the switch > > > to the system module easy, once we drop 5.004 compatibility (we > > > won't keep that eternally, will we?)? > > > > This is an ok route for me as well. I guess it is not too hard. > > > > Is this going to drop between the three of us? Jason are you volunteering > > (I am aware jason you have done the lion's share of towards branch coding > > so far...) > > It's pretty mindless to make the corrections so I can do it while I'm > waiting for some analysis to finish. Can we just be sure that we are > doing what seems to be the RIGHT thing. 
I really don't want to break the > build on 5.00404 so let's roll our own Test.pm with ok() and skip() > methods or backport (my vote) to the old way of sub test {}. I'd put the > Test module in t/Test.pm. I don't understand the backport sub test {} method - does this mean each Test uses this routine? I trust your call in here...

From jason@chg.mc.duke.edu Wed Jan 24 19:57:26 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 14:57:26 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
On Wed, 24 Jan 2001, Ewan Birney wrote: > On Wed, 24 Jan 2001, Jason Stajich wrote: > > > > > It's pretty mindless to make the corrections so I can do it while I'm > > waiting for some analysis to finish. Can we just be sure that we are > > doing what seems to be the RIGHT thing. I really don't want to break the > > build on 5.00404 so let's roll our own Test.pm with ok() and skip() > > methods or backport (my vote) to the old way of sub test {}. I'd put the > > Test module in t/Test.pm. > > > I don't understand the backport sub test {} method - does this mean each > Test uses this routine? > > I trust your call in here... Stupid me, I meant to vote for rolling our own Test.pm with ok and skip methods. The other option is to go back to what we had, which was to define a method test() in every .t file. But it is sort of dumb to copy and paste that method into every .t file. We should at a minimum make one file that has the necessary method(s) - why not mimic Test.pm in this case with ok() and skip()...
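For illustration only, here is a minimal sketch of the kind of home-grown t/Test.pm discussed in this thread, offering just plan(), ok() and skip(). It is not the module the project wrote. The package name BioTest, the skip() argument order and the "# skip" output line are assumptions loosely modelled on Test.pm. The code sticks to use vars and Exporter rather than newer syntax such as our, but it has not been tried on a 5.00404 installation.

```perl
#!/usr/bin/perl -w
# Hypothetical minimal stand-in for Test.pm (e.g. kept as t/Test.pm).
# Only plan(), ok() and skip() are provided; the package name BioTest
# is an assumption chosen so it cannot clash with the real Test.pm.
package BioTest;
use strict;
use vars qw(@ISA @EXPORT $ntest $planned $failed);
require Exporter;
@ISA    = qw(Exporter);
@EXPORT = qw(plan ok skip);

$ntest   = 1;    # number of the next test to report
$planned = 0;    # number of tests announced by plan()
$failed  = 0;    # number of failed tests so far

# plan(tests => N): print the "1..N" header that Test::Harness reads.
sub plan {
    my %args = @_;
    $planned = $args{'tests'} || 0;
    print "1..$planned\n";
}

# ok($got) passes if $got is true; ok($got, $expected) passes if both
# are defined and equal as strings. Test numbers are assigned
# automatically, so tests can be added anywhere without renumbering.
sub ok {
    my ($got, @expected) = @_;
    my $pass;
    if (@expected) {
        $pass = defined($got) && defined($expected[0]) && $got eq $expected[0];
    } else {
        $pass = $got;
    }
    my $n = $ntest++;
    if ($pass) {
        print "ok $n\n";
        return 1;
    }
    $failed++;
    print "not ok $n\n";
    if (@expected) {
        my $g = defined($got)         ? $got         : 'undef';
        my $e = defined($expected[0]) ? $expected[0] : 'undef';
        print "# expected '$e', got '$g'\n";
    }
    return 0;
}

# skip($reason, @args): when $reason is true the test is reported as
# skipped (and counted as passing); otherwise it behaves like ok(@args).
sub skip {
    my ($reason, @args) = @_;
    if ($reason) {
        my $n = $ntest++;
        print "ok $n # skip $reason\n";
        return 1;
    }
    return ok(@args);
}

package main;
# Import at run time; the calls below therefore need parentheses.
BioTest->import;

plan(tests => 4);
ok(1);                           # simple truth test
ok(2 + 2, 4);                    # got/expected comparison
skip('no SCF test file', 0);     # reported as skipped
ok(lc('ACGT'), 'acgt');
```

Run as a script, this prints the 1..4 header followed by four "ok" lines, the third one marked as skipped. Because ok() keeps its own counter, a test can be inserted anywhere in a .t file without renumbering the tests after it. Switching to the system Test.pm later would then mostly mean changing the use line, assuming the scripts rely only on these three calls.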
> > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/

From birney@ebi.ac.uk Wed Jan 24 19:59:50 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:59:50 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
On Wed, 24 Jan 2001, Jason Stajich wrote: > On Wed, 24 Jan 2001, Ewan Birney wrote: > > > On Wed, 24 Jan 2001, Jason Stajich wrote: > > > > > > > > > It's pretty mindless to make the corrections so I can do it while I'm > > > waiting for some analysis to finish. Can we just be sure that we are > > > doing what seems to be the RIGHT thing. I really don't want to break the > > > build on 5.00404 so let's roll our own Test.pm with ok() and skip() > > > methods or backport (my vote) to the old way of sub test {}. I'd put the > > > Test module in t/Test.pm. > > > > > > I don't understand the backport sub test {} method - does this mean each > > Test uses this routine? > > > > I trust your call in here... > > Stupid me, I meant to vote for rolling our own Test.pm with ok and skip > methods. > > The other option is to go back to what we had, which was to define a > method test() in every .t file. But it is sort of dumb to copy and > paste that method into every .t file. We should at a minimum make one file > that has the necessary method(s) - why not mimic Test.pm in this case with > ok() and skip()... Sounds good to me. I can help at least with the testing of the test suite, and possibly some of the legwork (not tonight... about to get dinner...) > > > > > > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > > > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
----------------------------------------------------------------- From jason@chg.mc.duke.edu Wed Jan 24 20:00:55 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 15:00:55 -0500 (EST) Subject: [Bioperl-l] SplitFeature parsing Message-ID: Anyone want to help look at SplitLocation adding in SeqIO/FTHelper? We do this sub_SeqFeature addition with the code 'EXPAND' but it never gets reinterpreted when FT writing. Anyways, I suspect we chuck all this and go to a SplitLocation - right? All are invited to help here, but otherwise I'll go ahead and try and do it myself. Will need to first implement a Bio::Location::SplitLocation and then change the FTHelper code. Diving in soon... Hoping we can make the end of the month goal. Hilmar it might be helpful to recap the todo list on email so anyone who wants to join in knows what is left to do... -jason Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From birney@ebi.ac.uk Wed Jan 24 20:06:03 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 20:06:03 +0000 (GMT) Subject: [Bioperl-l] SplitFeature parsing In-Reply-To: Message-ID: On Wed, 24 Jan 2001, Jason Stajich wrote: > Anyone want to help look at SplitLocation adding in SeqIO/FTHelper? We > do this sub_SeqFeature addition with the code 'EXPAND' but it never gets > reinterpreted when FT writing. Anyways, I suspect we chuck all this and > go to a SplitLocation - right? All are invited to help here, but > otherwise I'll go ahead and try and do it myself. Will need to first > implement a Bio::Location::SplitLocation and then change the FTHelper > code. Diving in soon... Go for it. I can certainly review it. I still have RichSeqI stuff to do. . Not enough time... > > Hoping we can make the end of the month goal. Hilmar it might be helpful > to recap the todo list on email so anyone who wants to join in knows what > is left to do... 
>
> -jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From lapp@gnf.org Wed Jan 24 20:11:48 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 12:11:48 -0800 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 References: Message-ID: <3A6F3704.5C90DFB@gnf.org>
Jason Stajich wrote:
>
> I meant to vote for rolling our own Test.pm with ok and skip
> methods.
>
> The other option is to go back to what we had, which was to define a
> method test() in every .t file. But that is sort of dumb to copy and
> paste that method into every t file. We should at a minimum make one file
> that has the necessary method(s) - why not mimic Test.pm in this case with
> ok() and skip()...
>

That's exactly what I meant. Then the whole test code migration was not in
vain, because in the end (whenever that is :) we simply exchange a use
statement at the top of each test script to use the system-supplied Test.pm.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 20:43:57 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 15:43:57 -0500 (EST) Subject: [Bioperl-l] Bio::SeqIO::FTHelper Message-ID:
Looking at FTHelper _parse_loc method.

Do we want our good buddy FTHelper to continue to create tag/value pairs
for sub_Features to represent things like '_part_feature' and
'_zero_width_feature'?
Or are we happy with having Split/Fuzzy Locations handle this
representation?

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Wed Jan 24 21:01:54 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 13:01:54 -0800 Subject: [Bioperl-l] Bio::SeqIO::FTHelper References: Message-ID: <3A6F42C2.3C1D65CC@gnf.org>
Jason Stajich wrote:
>
> Looking at FTHelper _parse_loc method.
>
> Do we want our good buddy FTHelper to continue to create tag/value pairs
> for sub_Features to represent things like '_part_feature' and
> '_zero_width_feature'? Or are we happy with having Split/Fuzzy Locations
> handle this representation?
>

Maybe I'm missing something, but I think migrating semantics from
undocumented tags to explicit types was one of the objectives. If it was
really undocumented, there shouldn't be client code relying on those tags
outside of the Bioperl core itself. In theory :)

People out there, if you have a client that relies on undocumented tags in
a SeqFeature::Generic, please shout. Otherwise you can reckon that these
tags will be gone. Please also shout if you have clients relying on
documented tags pertaining to location and length (I recall having added
some tags to the documentation in summer or fall last year, but hopefully
no-one noticed :o)

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca.
92121                                  phone: +1-858-812-1757
-------------------------------------------------------------


From MEColosimo@alumni.carnegiemellon.edu Wed Jan 24 23:31:27 2001 From: MEColosimo@alumni.carnegiemellon.edu (Marc Colosimo) Date: Wed, 24 Jan 2001 18:31:27 -0500 Subject: [Bioperl-l] Re: [Bioperl-guts-l] RestrictionEnzyme.pm (Hilmar Lapp) References: <200101241703.f0OH3ip16053@pw600a.bioperl.org> Message-ID: <3A6F65CC.3FB627FA@alumni.carnegiemellon.edu>
> Paul-Christophe Varoutas wrote:
> >
> > Hi again,
> >
> > Yesterday night I started experimenting with RestrictionEnzyme.pm.
> >
> > I liked very much the '-MAKE' =>'custom' switch in the constructor but I
> > think it would nevertheless be a good idea to write a public method which
> > updates the enzyme list from the NEBASE site.
> >
> > I suggest to write a sub (let's call it update_list or update_RE_list) that:
> >
> > - goes to the NEBASE site and gets the last version of the restriction
> > enzyme list. We can choose between http/ftp and various types of
> > lists/formats. My preference would be to go to their ftp site and get what
> > they call "format 18": DNAStrider format, list of all commercially
> > available enzymes. The file is ftp://ftp.nebase.com/pub/nebase/striderc.*,
> > the extension of the file reflects the version).
> > - saves this list in a text file, in the Bio/Tools/ directory. An
> > alternative is to update the enzyme list in the RestrictionEnzyme.pm file
> > itself, at the beginning of the file, within the definition of the %RE
> > hash, but intuitively I would not tend to recommend it, as I don't know if
> > writing in a file at the same time it is being read by the perl
> > interpreter will behave well in all operating systems. Tell me what you
> > think about it.
>
> You normally can't write to Bio/Tools as a user (under Unix), and
> a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the
> following options.
> 1) Accept an additional (named!) parameter at initialization that
> denotes a file (in DNAStrider format?) containing the enzymes to
> be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to
> obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known
> place within the Bio/ directory tree, and read (parse) that upon
> initialization of RestrictionEnzyme.pm. An update would mean
> updating that file.

I would like to point out that not all systems are as mean as Unix. Also,
it would be nice to read them in and save them in the local space. That way
the user can just tell it to use the one in his/her space. That way they
can have different ones (for whatever reason).

If you're going through the trouble of doing this, could you also add the
ability to use multiple enzymes and/or list multiple enzymes?

>
>
> I'm not sure option 3) would have compelling advantages to the
> present layout. Options 1) and 2) are certainly worthwhile to
> pursue and in essence are almost identical, the only difference
> being how to open the stream containing the enzyme data. So, one
> could try to combine both into one parameter, and have the code
> figure out whether it's a file or a http/ftp URL.
>
> Hilmar

Marc


From jason@chg.mc.duke.edu Wed Jan 24 23:44:06 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 18:44:06 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split Message-ID:
I'd just like to reiterate - beware bioperl-live is development code.

I added these handlers for Fuzzy and Split features. I decided to create
methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
or not we saw the <, > descriptors.
I probably need some more test cases to make sure we are really getting
everything to work, but the test in t/SeqIO test.genbank in genbank.out
seem to work for most things except the variation feature type which uses
the operator 'replace'. We'll have to define that in the FTHelper model,
I didn't plan for it.

So the checked in code will screw up the variation features, but
everything else seems to work.

I'd like to do a better job detecting the feature location type from
[.., ., ^] and use that to describe the Location object better, but we
have the case of '<' and '>' which are technically fuzzy so I'm not sure
how I really want to store these types of locations.

Anyways, I'm not being very clear, so have a look, I know there are areas
of improvement, you too can help us make this a robust parser....

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Wed Jan 24 23:57:45 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 18:57:45 -0500 (EST) Subject: [Bioperl-l] failing test (singular!) Message-ID:
So I don't owe beer in the Ensembl method of open source...

I am getting errors from t/LiveSeq.t because start/end are not defined
when a new LiveSeq Exon is instantiated. I'll be sure and look at it
Thursday when I get a chance. Apologies for somehow breaking that, it is
not clear where the error is, but it has something to do with either
FTHelper changes or the way SeqFeatures get their start/end/strand
information (my money is on this) with the new Location model.

Everything else seems to pass except for the occasional crapout on NCBI
website connection in the t/DB.t test.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Thu Jan 25 00:22:40 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 16:22:40 -0800 Subject: [Bioperl-l] failing test (singular!)
References: Message-ID: <3A6F71D0.601FF22E@gnf.org>
Jason Stajich wrote:
>
> So I don't owe beer in the Ensembl method of open source...
> I am getting errors from t/LiveSeq.t because start/end are not defined
> when a new LiveSeq Exon is instantiated. I'll be sure and look at it
> Thursday when I get a chance. Apologies for somehow breaking that, it is
> not clear where the error is, but it has something to do with either
> FTHelper changes or the way SeqFeatures get their start/end/strand
> information (my money is on this) with the new Location model.
>
> Everything else seems to pass except for the occasional crapout on NCBI
> website connection in the t/DB.t test.
>

Cool, Jason. I have the feeling 0.7 can become a release we can really be
proud of.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lapp@gnf.org Thu Jan 25 00:36:29 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 16:36:29 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A6F750D.367663F8@gnf.org>
Jason Stajich wrote:
>
> I'd just like to reiterate - beware bioperl-live is development code.
>
> I added these handlers for Fuzzy and Split features. I decided to create
> methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> or not we saw the <, > descriptors. I probably need some more test cases

I may have missed the obvious solution, but how are we going to
distinguish 'unknown start/end' and 'somewhere in between'? That is,
'<150' meaning 'before position 150', making it non-obvious how to
return a minimal start, and '120.130' meaning it's between two known
positions. Will I have to test fuzzy_start() before I'm allowed to
safely call min_start()? (no, I don't want to suggest exceptions ...
:O)

> to make sure we are really getting everything to work, but the test in
> t/SeqIO test.genbank in genbank.out seem to work for most things except
> the variation feature type which uses the operator 'replace'. We'll have
> to define that in the FTHelper model, I didn't plan for it.
>

I'm not sure the 'replace' operator is still standard (i.e., allowed).
I seem to recall that it is no longer among the allowed operators, so
you might wish to double-check on NCBI's feature table grammar
definition.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Thu Jan 25 15:06:17 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 10:06:17 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: <3A6F750D.367663F8@gnf.org> Message-ID:
On Wed, 24 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > I'd just like to reiterate - beware bioperl-live is development code.
> >
> > I added these handlers for Fuzzy and Split features. I decided to create
> > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > or not we saw the <, > descriptors. I probably need some more test cases
>
> I may have missed the obvious solution, but how are we going to
> distinguish 'unknown start/end' and 'somewhere in between'? That is,
> '<150' meaning 'before position 150', making it non-obvious how to
> return a minimal start, and '120.130' meaning it's between two known
> positions. Will I have to test fuzzy_start() before I'm allowed to
> safely call min_start()? (no, I don't want to suggest exceptions ...
> :O)

Hmm, perhaps I was confused. I thought Split Location would deal with
min_start/max_end.
I believe fuzzy can have 3 qualities, a fuzzy start
(<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
better word, suggestions welcome]. All 3 can be present in the same
location so they have to be independent operators. When you call start,
it will return what it thinks is the start but you'll have to test to see
if the range or the start is fuzzy ($loc->range_fuzzy || $loc->start_fuzzy).
Perhaps that is too tedious? I'd rather not throw an exception here, but
can be persuaded.

Feel free to suggest a better set of methods for this. Now I'm cheating
because I just added range_fuzzy this morning since I wanted to think
about that some more. Learning by doing....

Oh and I think I just messed up - I'm not handling the 3'/5' difference
for the fuzziness (< vs >). Will fix that by start_fuzzy/end_fuzzy
returning -1, 0, 1 meaning 5', not fuzzy, 3'. Unless you think it should
return "<100" or "100>" instead?

>
> > to make sure we are really getting everything to work, but the test in
> > t/SeqIO test.genbank in genbank.out seem to work for most things except
> > the variation feature type which uses the operator 'replace'. We'll have
> > to define that in the FTHelper model, I didn't plan for it.
> >
>
> I'm not sure the 'replace' operator is still standard (i.e., allowed).
> I seem to recall that it is no longer among the allowed operators, so
> you might wish to double-check on NCBI's feature table grammar
> definition.

Okay, well, we'll have to think about whether or not we want to just
handle non-standard operators in a bulk way 'NonStandardLocation' which
stores a tag that describes the operator so that we can preserve the tag
name, or if we should build the flexibility some other way.
Clearly what is being output right now

     variation       2913^2913
                     /replace="g"

is relatively different from

     variation       replace(347,"c")

>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Thu Jan 25 15:19:53 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 25 Jan 2001 15:19:53 +0000 (GMT) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:
On Thu, 25 Jan 2001, Jason Stajich wrote:

> On Wed, 24 Jan 2001, Hilmar Lapp wrote:
>
> > Jason Stajich wrote:
> > >
> > > I'd just like to reiterate - beware bioperl-live is development code.
> > >
> > > I added these handlers for Fuzzy and Split features. I decided to create
> > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > or not we saw the <, > descriptors. I probably need some more test cases
> >
> > I may have missed the obvious solution, but how are we going to
> > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > '<150' meaning 'before position 150', making it non-obvious how to
> > return a minimal start, and '120.130' meaning it's between two known
> > positions. Will I have to test fuzzy_start() before I'm allowed to
> > safely call min_start()? (no, I don't want to suggest exceptions ...
> > :O)
>
> Hmm, perhaps I was confused. I thought Split Location would deal with
> min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> better word, suggestions welcome].
> All 3 can be present in the same
> location so they have to be independent operators. When you call
> start, it will return what it thinks is the start but you'll have to
> test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> exception here, but can be persuaded.

In my experience it is crucial to treat
join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
as a class of FuzzyFeature.

The above syntax is the most used "fuzziness" and nearly everyone discards
the leading and trailing '<' '>' as it means "partial gene" with the
coordinates interpreted in a hard way.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From Shailesh L Mistry"
Hi All,

Here is the latest news about testing bioperl on WinNT :-

1) The BPbl2seq.t fails on test 7 because it wants the value to be 2e-053
   but it gets 2e-53 (not sure if this is worth pursuing).

2) The alarm function has not been fixed and so blast.t, html.t and
   SimilarityPair.t fail. A decision needs to be made about whether to avoid
   using it or to just put a switch in to detect for Win32.

3) Index.t still has a file handle bug in it (Bug 865), so it can't be
   checked any further.

4) There is an intermittent problem with gdb.t and liveseq.t, both of which
   are proving difficult to track down.

I hope this helps.

Shelly.

PS. My email is still stuffed so replies from me may be delayed.
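Shelly's first two items can be handled on the test side. The following is a minimal Perl sketch, not code from the Bioperl test suite: the helper names `evalue_equal` and `can_use_alarm` are hypothetical. Comparing e-values numerically treats the three-digit exponent `2e-053` and `2e-53` as the same number, and checking `$^O` against `'MSWin32'` is one possible way to implement the suggested Win32 switch.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two e-value strings as numbers rather than
# as text. Perl numifies "2e-053" and "2e-53" to the same value, so the
# three-digit exponent printed on some Windows builds no longer matters.
sub evalue_equal {
    my ($got, $expected) = @_;
    return $got == $expected;
}

# Hypothetical helper: decide whether a test may rely on alarm().
# $^O is 'MSWin32' for native Windows builds of perl.
sub can_use_alarm {
    return $^O ne 'MSWin32';
}

# Usage in a Test.pm-style script that prints "ok N" / "not ok N".
print evalue_equal('2e-053', '2e-53') ? "ok 1\n" : "not ok 1\n";
if (can_use_alarm()) {
    print "ok 2\n";    # a real test would wrap the slow call in alarm() here
} else {
    print "ok 2 # skip alarm() not usable on $^O\n";
}
```

The skip line follows the `ok N # skip reason` convention that Test::Harness recognizes; whether to skip such tests or to stop using `alarm` altogether is still the decision Shelly raises.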


From jason@chg.mc.duke.edu Thu Jan 25 16:27:39 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 11:27:39 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:
On Thu, 25 Jan 2001, Ewan Birney wrote:

> On Thu, 25 Jan 2001, Jason Stajich wrote:
>
> > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> >
> > > Jason Stajich wrote:
> > > >
> > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > >
> > > > I added these handlers for Fuzzy and Split features. I decided to create
> > > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > or not we saw the <, > descriptors. I probably need some more test cases
> > >
> > > I may have missed the obvious solution, but how are we going to
> > > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > > '<150' meaning 'before position 150', making it non-obvious how to
> > > return a minimal start, and '120.130' meaning it's between two known
> > > positions. Will I have to test fuzzy_start() before I'm allowed to
> > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > :O)
> >
> > Hmm, perhaps I was confused. I thought Split Location would deal with
> > min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> > (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> > better word, suggestions welcome]. All 3 can be present in the same
> > location so they have to be independent operators. When you call
> > start, it will return what it thinks is the start but you'll have to
> > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> > exception here, but can be persuaded.
>
> In my experience it is crucial to treat
> join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
> as a class of FuzzyFeature.
>
> The above syntax is the most used "fuzziness" and nearly everyone discards
> the leading and trailing '<' '>' as it means "partial gene" with the
> coordinates interpreted in a hard way.

Okay I was interpreting this as a
SplitLocation with
3 LocationI objects
2 of which are Fuzzy Locations...

I just wasn't handling all the possible cases of

10..<100
10..100>
<10..100
10>..100

I consider this fuzzy -- since a start or end point is not well defined.
I also consider 5.12 fuzzy since its 'range' is not fuzzy.

>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Thu Jan 25 16:38:21 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 25 Jan 2001 16:38:21 +0000 (GMT) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:
On Thu, 25 Jan 2001, Jason Stajich wrote:

>
> On Thu, 25 Jan 2001, Ewan Birney wrote:
>
> > On Thu, 25 Jan 2001, Jason Stajich wrote:
> >
> > > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> > >
> > > > Jason Stajich wrote:
> > > > >
> > > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > > >
> > > > > I added these handlers for Fuzzy and Split features. I decided to create
> > > > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > > or not we saw the <, > descriptors. I probably need some more test cases
> > > >
> > > > I may have missed the obvious solution, but how are we going to
> > > > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > > > '<150' meaning 'before position 150', making it non-obvious how to
> > > > return a minimal start, and '120.130' meaning it's between two known
> > > > positions.
> > > > Will I have to test fuzzy_start() before I'm allowed to
> > > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > > :O)
> > >
> > > Hmm, perhaps I was confused. I thought Split Location would deal with
> > > min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> > > (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> > > better word, suggestions welcome]. All 3 can be present in the same
> > > location so they have to be independent operators. When you call
> > > start, it will return what it thinks is the start but you'll have to
> > > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > > $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> > > exception here, but can be persuaded.
> >
> > In my experience it is crucial to treat
> > join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
> > as a class of FuzzyFeature.
> >
> > The above syntax is the most used "fuzziness" and nearly everyone discards
> > the leading and trailing '<' '>' as it means "partial gene" with the
> > coordinates interpreted in a hard way.
>
> Okay I was interpreting this as a
> SplitLocation with
> 3 LocationI objects
> 2 of which are Fuzzy Locations...

Ok. This is a good solution here, but the trouble about this recursion is
that of course it allows SplitLocationI has-a SplitLocationI etc, which now
becomes

   (a) a nightmare to do anything with
   (b) impossible to represent in EMBL/GenBank
   (c) generally lots of rope to hang ourselves with

Two options - punt on these cases in the code... or pop in another
inheritance layer in the interfaces:

                     LocationI
                         ^
                         |
             ------------------------
      SingleLocationI            SplitLocationI
             |                   sub_Locations defined to return SingleLocationI array
             |
     -----------------
SimpleLocationI    FuzzyLocationI


(does the above crappy ascii art make sense to you?)
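One way to read Ewan's proposal is as a set of Perl `@ISA` relationships. The sketch below is illustrative only: the five interface names come from the diagram, while the constructor, `add_sub_Location` and the concrete class `My::SplitLocation` are hypothetical and not the released Bioperl API. Because the split location only accepts objects that are a `Bio::SingleLocationI`, the SplitLocationI has-a SplitLocationI recursion is refused when a sub-location is added.

```perl
use strict;
use warnings;

# Interface layers from the proposed hierarchy (illustrative only).
package Bio::LocationI;
sub new { my ($class, %args) = @_; return bless { %args }, $class }

package Bio::SingleLocationI;
our @ISA = ('Bio::LocationI');

package Bio::SplitLocationI;
our @ISA = ('Bio::LocationI');

package Bio::SimpleLocationI;
our @ISA = ('Bio::SingleLocationI');

package Bio::FuzzyLocationI;
our @ISA = ('Bio::SingleLocationI');

# Hypothetical concrete split location: it only accepts single locations,
# so a split location can never contain another split location.
package My::SplitLocation;
our @ISA = ('Bio::SplitLocationI');

sub add_sub_Location {
    my ($self, $loc) = @_;
    die "sub-location must be a Bio::SingleLocationI\n"
        unless $loc->isa('Bio::SingleLocationI');
    push @{ $self->{sub_locations} }, $loc;
    return $self;
}

# Returns the sub-locations (their count in scalar context).
sub sub_Locations {
    my ($self) = @_;
    return @{ $self->{sub_locations} || [] };
}

package main;

my $split = My::SplitLocation->new;
$split->add_sub_Location(Bio::FuzzyLocationI->new(start => 10, end => 100));
$split->add_sub_Location(Bio::SimpleLocationI->new(start => 200, end => 300));
my $nested_ok = eval { $split->add_sub_Location(My::SplitLocation->new); 1 };
print $nested_ok ? "nested split accepted\n" : "nested split rejected\n";
```

With this layering, a nested join(...join()) would have to be flattened or refused by the parser rather than stored as nested objects.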
I guess this says that all FuzzyLocations can be made as combination of a
single SplitLocation with a set of FuzzyLocations.

????

(ewan sighs again about fuzziness. It is just a can of worms that no one
needs and no one should use)

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Thu Jan 25 18:35:24 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 25 Jan 2001 10:35:24 -0800 Subject: [Bioperl-l] multiple blast script Message-ID: <3A7071EC.810AABA1@gmx.net>
To bring it to the right audience. Please do not post such things to
bioperl-guts-l, they will be ignored normally.

Hilmar

-------- Original Message --------
Subject: [Bioperl-guts-l] multiple blast script
Date: Wed, 24 Jan 2001 13:22:50 -0600
From: Willy Valdivia
To: bioperl-guts-l@bioperl.org

Dear Group:

I am looking for a Perl script that may allow me to perform multiple
sequence alignments at once using BLAST.

Thank you,

Willy Valdivia Granda
Plant Sciences Dept
North Dakota State University

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-guts-l


From hlapp@gmx.net Thu Jan 25 18:38:59 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 25 Jan 2001 10:38:59 -0800 Subject: [Bioperl-l] RestrictionEnzyme.pm Message-ID: <3A7072C3.82932227@gmx.net>
To bring it to the right audience. Please note that despite some module
PODs still saying that bioperl-guts-l is for technical discussion, it is
not in fact. The guts-list is for CVS messages and similar stuff most
people never want to hear about.

Hilmar

-------- Original Message --------
Subject: RE: [Bioperl-guts-l] RestrictionEnzyme.pm
Date: Wed, 24 Jan 2001 14:00:30 +0100
From: "Paul-Christophe Varoutas"
To:

You *are* right about not writing to the Bio/Tools directory, I guess I was
rather sleepy when I wrote my previous mail %-P. And using my Win2000 and
Linux as root all the time doesn't help things either ;-).

It is a good idea to incorporate the RE list update in the object
constructor, and combining Hilmar's options (1) and (2) seems great because
it's a flexible solution and should suit most needs. For the URL retrieval,
I guess http will be more suitable; I will contact NEBASE to be sure that
the URL we decide to use will remain stable.

This solution raises a small question: that of multiple occurrences. The
fact that we are using hashes will take care of eliminating multiple
occurrences of enzymes (one from the hard-coded collection, one from the
file / URL). Perhaps a minor issue would be to decide whether we just "let
perl do the work" or if we do verifications while replacements are done,
and/or define how they are done. We can make the assumption that, say,
AatII always has the same recognition site, but if I make an issue out of
this it is because I don't know yet how this module is being used, and
especially if it is only used for what it has initially been designed for.
Do you know if there are users out there using this module in an unorthodox
way, defining enzyme names/recognition sequences that don't exist, but
could risk creating conflicts/unusual behavior?

Another issue is enzymes cutting asymmetrically. For the moment the other
RestrictionEnzyme methods don't know how to deal with them (as far as I
understood), so the code will just ignore them while parsing the RE list
file.

One remaining question is about the RE list file format: is the DNAStrider
format OK for everybody, or is there another suggestion?
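The duplicate-handling question and Hilmar's idea of one parameter that may be either a file or a URL can be illustrated with a short sketch. It assumes the enzyme list has already been parsed into a name => recognition-site hash (DNAStrider parsing is not shown), and the helpers `source_kind` and `merge_enzymes` are hypothetical, not RestrictionEnzyme.pm methods. It picks one possible policy, keeping the existing entry and reporting conflicts, instead of letting a plain hash assignment overwrite silently.

```perl
use strict;
use warnings;

# Classify the enzyme source passed to the constructor: strings starting
# with http://, https:// or ftp:// are treated as URLs, anything else as a
# local file path.
sub source_kind {
    my ($source) = @_;
    return $source =~ m{^(?:https?|ftp)://}i ? 'url' : 'file';
}

# Merge newly read enzymes (name => site) into an existing table.
# New names are added; identical entries are ignored; entries whose site
# differs from the existing one are left unchanged and reported.
sub merge_enzymes {
    my ($table, $new) = @_;
    my @conflicts;
    for my $name (sort keys %$new) {
        my $site = uc $new->{$name};
        if (!exists $table->{$name}) {
            $table->{$name} = $site;
        } elsif ($table->{$name} ne $site) {
            push @conflicts, $name;
        }
    }
    return @conflicts;
}

# Illustrative hard-coded table (sites written without the cut position).
my %RE = (EcoRI => 'GAATTC', BamHI => 'GGATCC');

my @conflicts = merge_enzymes(\%RE, { EcoRI => 'gaattc', AatII => 'GACGTC' });
print "source: ", source_kind('http://example.org/enzymes.txt'), "\n";
print "conflicts: @conflicts\n";
print join(', ', map { "$_=$RE{$_}" } sort keys %RE), "\n";
```

Overwriting instead ("let perl do the work") would be a one-line change in `merge_enzymes`; which policy is right depends on the unorthodox-usage question Paul-Christophe raises.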
An alternative would be to contact NEBASE and ask them to add a new
'bioperl' format to their database, and then define a format that
minimizes parsing and best suits our needs. On their web site they say:
"As REBASE expands, new data formats are provided. Requests for specialized
formats are welcome, as we are prepared to support each major sequence
analysis package". (The URL is:
http://rebase.neb.com/rebase/rebase.serv.html )

So what do you think about this idea?

> Do you already have a CVS write account?

I have already successfully anonymously CVSed from my home PC (under
Win2000 and linux), but I don't have a write account yet. I will contact
Ewan / Chris about that.

Paul-Christophe

> You normally can't write to Bio/Tools as a user (under Unix), and
> a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the
> following options.
> 1) Accept an additional (named!) parameter at initialization that
> denotes a file (in DNAStrider format?) containing the enzymes to
> be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to
> obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known
> place within the Bio/ directory tree, and read (parse) that upon
> initialization of RestrictionEnzyme.pm. An update would mean
> updating that file.
>
> I'm not sure option 3) would have compelling advantages to the
> present layout. Options 1) and 2) are certainly worthwhile to
> pursue and in essence are almost identical, the only difference
> being how to open the stream containing the enzyme data. So, one
> could try to combine both into one parameter, and have the code
> figure out whether it's a file or a http/ftp URL.
>
> Hilmar
>
> Do you already have a CVS write account?
>
> > - if the enzyme list is saved in a separate file, I will also modify the
> > initialisation of the %RE hash, with code that reads and parses the enzyme
> > list file.
> >
> > If this sounds OK to you, I will write it this weekend and submit it. Of
> > course if you had something completely different in mind please say it, I
> > will try to adapt to it.
> >
> > Paul-Christophe
> >
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
>

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-guts-l


From lapp@gnf.org Thu Jan 25 21:10:23 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 13:10:23 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A70963F.34F10F@gnf.org>
First, I think it is better to bring this back to the list, because users
*will* be affected by the final design and implementation (i.e., Mark &
David & others, watch out, don't complain afterwards).

Jason Stajich wrote:
>
> So that I have really clearly solved this -
> please correct me if any of the following statements are false. (N is a
> location point)
>
> - start/end can be fuzzy at both points and it could be
> (on 3') at either start/end point. However, N< and >N are invalid fuzzy
> point descriptions.
> If they are indeed true then my start_fuzzy will
> need to be more than just (-1, 0, 1) -- (5', not fuzzy, 3') but 5
> points (5' before, 5' after, 0, 3' before, 3' after) and I really don't
> even know what that would mean since I would be so wrapped up in strand
> coordinates - would think a 'complement' would simplify it (no, not a
> pat on the back, that's when we get to the release)
>
> - in plain simple genbank/embl terms
> <5..12> and <5.12>
> are valid, but
> >5..12, 5<..12, 5..12<, 5..>12
> are invalid.

The GenBank documentation is somewhat inconsistent here. Let me quote:

From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB

   If the "<" symbol precedes a base span, the sequence is partial on the
   5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the
   sequence is partial on the 3' end (e.g., CDS 435..915>).

From http://www.ncbi.nlm.nih.gov/collab/FT/index.html

     CDS             <1..>336
                     /codon_start=1
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"
     V_region        <1..>336
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"

From the BNF grammar definition of the feature table, to be found at
http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur

   local_location   ::= <base_position> | <between_position> | <base_range>
   base_position    ::= <integer> | <low_base_bound> | <high_base_bound> | <two_base_bound>
   low_base_bound   ::= ><integer>
   high_base_bound  ::= <<integer>
   two_base_bound   ::= <base_position>.<base_position>
   between_position ::= <base_position>^<base_position>
   base_range       ::= <base_position>..<base_position>

The sample record link seems to be pretty new, but I'm not sure. Shall we
simply build upon the BNF? Maybe we should ask someone from NCBI.

>
> Questions:
> 1. Do we need to override the famous Pocock RangeI contains/overlaps
> methods for a Split location to take into account where the pieces
> of the contained LocationI are?
> Or do we take the easy route and just use min_start/max_end? I think
> that right now start/end return 0 for a split location since they are
> not explicitly set, should they default to delegating to
> min_start/max_end?

I think so.
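To make the BNF and the delegation question concrete, here is a small Perl sketch. It is not FTHelper's parser: the subroutine names are hypothetical, it handles only simple ranges and join(...) (no complement, order, remote entries or `^` between-positions), and it accepts both the BNF form `>N` and the sample-record form `N>` for a 3' partial end. Following Ewan's remark, fuzzy bounds are read "in a hard way" for start/end while the qualifier is kept alongside, and the split location's start and end are delegated to the smallest start and largest end of its parts (min_start/max_end).

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Parse one base position. Returns (low, high, qualifier):
#   N     exact position                  qualifier ''
#   <N    before N (high_base_bound)      qualifier '<'
#   >N    after N (low_base_bound)        qualifier '>'
#   N>    after N, sample-record style    qualifier '>'
#   N.M   somewhere between N and M       qualifier '.'
# Fuzzy positions are reduced to their stated coordinate ("hard" reading).
sub parse_position {
    my ($pos) = @_;
    return ($1, $1, '')  if $pos =~ /^(\d+)$/;
    return ($1, $1, '<') if $pos =~ /^<(\d+)$/;
    return ($1, $1, '>') if $pos =~ /^>(\d+)$/;
    return ($1, $1, '>') if $pos =~ /^(\d+)>$/;
    return ($1, $2, '.') if $pos =~ /^(\d+)\.(\d+)$/;
    die "unsupported base position '$pos'\n";
}

# Parse "start..end" (or a single position) into start/end coordinates
# plus the qualifier seen on each side.
sub parse_range {
    my ($str) = @_;
    (my $clean = $str) =~ tr/()//d;    # drop grouping parentheses
    my ($s, $e) = split /\.\./, $clean, 2;
    $e = $s unless defined $e;
    my ($start, undef, $start_q) = parse_position($s);
    my (undef, $end, $end_q)     = parse_position($e);
    return { start => $start, end => $end,
             start_fuzzy => $start_q, end_fuzzy => $end_q };
}

# Parse join(a,b,...); start/end delegate to min_start/max_end of the parts.
sub parse_join {
    my ($str) = @_;
    my ($inner) = $str =~ /^join\((.*)\)$/ or die "not a join: $str\n";
    my @parts = map { parse_range($_) } split /,/, $inner;
    return {
        start         => min(map { $_->{start} } @parts),
        end           => max(map { $_->{end} } @parts),
        sub_locations => \@parts,
    };
}

my $loc = parse_join('join((<10..100),(200..300),(400..500>))');
printf "start=%d end=%d first=%s last=%s\n", $loc->{start}, $loc->{end},
    $loc->{sub_locations}[0]{start_fuzzy}, $loc->{sub_locations}[-1]{end_fuzzy};
```

Callers that care about partial genes can still inspect `start_fuzzy`/`end_fuzzy`; callers that do not get plain integers, which matches the "avoid exceptions where possible" preference voiced in the thread.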
>
> What about in Fuzzy, do we want to throw exceptions or do we just use
> the best information we have and do some logic and coordinate
> gymnastics to try and return a reasonable value or else throw an
> exception?
>

As I understood the comments from users, exceptions should be avoided
here whenever possible. However, since there are different policies one
can think of, a mechanism should be provided to switch between them.

> 2. Deep Split/Fuzziness - [copying famous artwork from Ewan's latest
> email]
>
>                        LocationI
>                            ^
>                            |
>                ------------------------
>        SingleLocationI             SplitLocationI
>                |         sub_Locations defined to return SingleLocationI array
>                |
>        -----------------
>  SimpleLocationI    FuzzyLocationI
>
>
> (does the above crappy ascii art make sense to you?)
>
> I guess this says that all FuzzyLocations can be made as combination of
> a single SplitLocation with a set of FuzzyLocations.
>
> [ end Ewan's included message ]
>
> This is exactly what I have assumed. I see SplitLocation as simply a
> Collection of LocationI objects some of which may be fuzzy. The only
> problem is how to define min_start/max_end for a
> SplitLocation when the beginning and end of the locations are fuzzy?
>
> As for deep SplitLocation (ie SplitLocation containing Location objects
> that are SplitLocations), this will work in a very gross way just like
> perl flattens arrays, except I don't plan to simplify the join(...join())
> code into a single join() unless you guys think it's worth it. It wouldn't
> be hard, just let perl collapse the arrays...
>

Be aware that you don't lose information you need for recovering the
original location entry upon writing. If that seems to inflate the
object tree unnecessarily, we can also store the original location
string as a property. Not beautiful, but KISS is not a bad principle.

> Any other problems you guys can think of?
>
> So close... I wonder if we should include Alan on this so we can see if
> the biocorba IDL will really handle all of this now?
> I guess I could

To my understanding BioCorba and BioPerl pretty much affect each other,
don't they? If so, we should definitely get a comment from him.

        Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lapp@gnf.org Thu Jan 25 21:18:44 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 13:18:44 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> Message-ID: <3A709834.64E33D50@gnf.org>
Bringing this (in part) back to the list, too.

Matthew Pocock wrote:
>
> Jason Stajich wrote:
> > Questions:
> > 1. Do we need to override the famous pocock RangeI contains/overlaps
> >    methods for a Split location to take into account where the pieces
> >    of the contained LocationI are?
> >    Or do we take the easy route and just use min_start/max_end? I think
> >    that right now start/end return 0 for a split location since they are
> >    not explicitly set, should they default to delegating to
> >    min_start/max_start? I think so.
> >
> Originally the BioJava Locations just used min/max for all location
> operators - this turned out to be a *very bad thing* under most
> conditions. You are better off having operators that use split locations
> return split locations - also, the union of two ranges that don't
> overlap is the split location containing both ranges. It is more work to
> set up, but it pays off & if you don't do it you get confusing bugs later.
>
> > What about in Fuzzy, do we want to throw exceptions or do we just use
> > the best information we have and do some logic and coordinate
> > gymnastics to try and return a reasonable value or else throw an
> > exception?
> My gut says to return the inner-most coordinate that is known but
> provide API to get the full fuzzy coordinates out - so
>
> full loc             -> start..end : minStart..maxEnd
> <50..100>            -> 50..100    : -INF..+INF
> (78.90)..(100.107)   -> 90..100    : 78..107
>

I think I am much more in favor of returning the outer-most
coordinates as the default policy. David, Mark? I'm also not sure
whether INF or NaN are good return values in perl (i.e., can you test
for INF or NaN by numeric comparison? I figured that e.g. you can't
obtain NaN by sqrt(-1), as would be the result in C).

        Hilmar

> >
> > As for deep SplitLocation (ie SplitLocation containing Location objects
> > that are SplitLocations), this will work in a very gross way just like
> > perl flattens arrays, except I don't plan to simplify the join(...join())
> > code into a single join() unless you guys think it's worth it. It wouldn't
> > be hard, just let perl collapse the arrays...
> Should work - there is the pathological case where an index is included
> via two paths. CompoundLocation in BioJava does all the collapsing at
> constructor time. All our Location objects are immutable, so once
> constructed, you can't change their contained indexes in any way. The
> hierarchy of Location containment is never exposed to the user - we may
> have to expose it if we provide a full fuzzy-location editor, though.
> Now I come to think of it, I have seen Embl CDS entries with internal
> exons that have < or > operators on them. Pants.

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Thu Jan 25 21:48:47 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Thu, 25 Jan 2001 15:48:47 -0600 (CST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: <3A709834.64E33D50@gnf.org> Message-ID:
This is my Modest Proposal for resolving some things:

If there is a defined location, return the defined location, ie:

  my $start=$feature->start;

$start equals 42.

If there is not a hard location (it is fuzzy, split, whatever), return
the location object, and let the client suss out what it wants to do
with it.

  my $start=$feature->start;
  if (ref($start) && $start->isa('LocationI')) { # a location object, not a plain number
      $start=myLocParser($start);
  }

Then $start could be made to be min_start, max_start,
an_array_of_start_values, or whatever was convenient for the client.

$0.02 Cdn is pretty cheap nowadays.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From mdalphin@amgen.com Thu Jan 25 21:51:03 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Thu, 25 Jan 2001 13:51:03 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A70963F.34F10F@gnf.org> Message-ID: <3A709FC7.84F164F9@amgen.com>
Hilmar Lapp wrote:
> > - in plain simple genbank/embl terms
> >      <5..12> and <5.12>
> >   are valid, but
> >      >5..12, 5<..12, 5..12<, 5..>12
> >   are invalid.
>
> The GenBank documentation is somewhat inconsistent here. Let me quote:
>
> From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
>
>   If the "<" symbol precedes a base span, the sequence is partial on the
>   5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the
>   sequence is partial on the 3' end (e.g., CDS 435..915>).
>
> From http://www.ncbi.nlm.nih.gov/collab/FT/index.html
>
>      CDS             <1..>336
>                      /codon_start=1
>                      /gene="IGHV1"
>                      /product="immunoglobulin heavy chain variable region"
>      V_region        <1..>336
>                      /gene="IGHV1"
>                      /product="immunoglobulin heavy chain variable region"
>
> From the BNF grammar definition of the feature table, to be found at
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur
>
>   local_location ::= | |
>   base_position ::= | | |
>   low_base_bound ::= >
>   high_base_bound ::= <
>   two_base_bound ::= .
>   between_position ::= ^
>   base_range ::= ..
>

I just looked for an example at NCBI and found this:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=234355&dopt=GenBank

As you can see, the symbol '>' does end up BEFORE the position it is
modifying, which is consistent with the BNF.

Hope this helps...

LOCUS       S52564                    10 bp    DNA             PRI       05-APR-1999
DEFINITION  Homo sapiens phenylalanine hydroxylase (PAH) gene, partial cds.
ACCESSION   S52564
VERSION     S52564.1  GI:234355
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /note="missense mutation"
                     /codon_start=2
                     /product="phenylalanine hydroxylase"
                     /protein_id="AAD14912.2"
                     /db_xref="GI:4559419"
                     /translation="HGV"
     variation       5..7
                     /gene="PAH"
                     /note="Gly for Glu221"
BASE COUNT        3 a      2 c      3 g      2 t
ORIGIN
        1 ccatggagta
//

Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From mwilkinson@gene.pbi.nrc.ca Thu Jan 25 21:44:36 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Thu, 25 Jan 2001 15:44:36 -0600 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> <3A709834.64E33D50@gnf.org> Message-ID: <3A709E44.7B69B527@gene.pbi.nrc.ca>
Hilmar Lapp wrote:

> > full loc             -> start..end : minStart..maxEnd
> > <50..100>            -> 50..100    : -INF..+INF
> > (78.90)..(100.107)   -> 90..100    : 78..107
>
> I think I am much more in favor of returning the outer-most
> coordinates as the default policy. David, Mark?

In my gut I would also favour outer-most, only because, even with a
simple scan of the data, you are able to say "there's something there"
or not. However, the phrase "$Feature->start/stop returns the
outer-most start/stop positions unless either is undefined in which
case that one (or both) return the minimum" gives me the shivers!
Still, this is more of a problem for unsophisticated parsers, which
presumably will be asking unsophisticated questions - what will be most
important for them (I think) is to be given the coordinates which span
the maximum "secure" region. So, yes, I agree that outermost is
preferable to innermost.

> whether INF or NaN are good return values in perl

YUCK! Please don't go there... Perhaps returning undef in a call to
maxStart or maxEnd would be better... it functions nicely in testing
statements.
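A minimal Perl sketch of the outer-most policy discussed in this thread, returning undef for a bound the entry does not give (as suggested just above) and comparing it with the inner-most table quoted earlier. `point_bounds` and `range_coords` are hypothetical helpers, not BioPerl API, and only the simple position forms N, <N, >N and N.M (optionally in parentheses) are handled:

```perl
use strict;

# Hypothetical helper, not BioPerl API: the (min, max) bounds of one
# base position. undef marks a bound the entry does not give.
#   "50"      -> (50, 50)
#   "<50"     -> (undef, 50)   somewhere at or before 50
#   ">100"    -> (100, undef)  somewhere at or after 100
#   "(78.90)" -> (78, 90)      somewhere between 78 and 90
sub point_bounds {
    my ($p) = @_;
    $p =~ s/^\((.*)\)$/$1/;    # drop optional parentheses
    return ($1, $1)    if $p =~ /^(\d+)$/;
    return (undef, $1) if $p =~ /^<(\d+)$/;
    return ($1, undef) if $p =~ /^>(\d+)$/;
    return ($1, $2)    if $p =~ /^(\d+)\.(\d+)$/;
    die "unsupported position: $p\n";
}

# Hypothetical helper: start..end under a given policy.
#   'outer' - widest span the entry allows (min start, max end)
#   'inner' - narrowest span that is certain (max start, min end)
# An open bound (undef) falls back to the coordinate that is known.
sub range_coords {
    my ($loc, $policy) = @_;
    my ($s, $e) = split /\.\./, $loc, 2;
    my ($smin, $smax) = point_bounds($s);
    my ($emin, $emax) = point_bounds($e);
    if ($policy eq 'outer') {
        return ((defined $smin ? $smin : $smax),
                (defined $emax ? $emax : $emin));
    }
    return ((defined $smax ? $smax : $smin),
            (defined $emin ? $emin : $emax));
}

for my $loc ('(78.90)..(100.107)', '<50..>100') {
    printf "%-20s outer %d..%d  inner %d..%d\n",
        $loc, range_coords($loc, 'outer'), range_coords($loc, 'inner');
}
```

Keeping undef inside `point_bounds` and only falling back in `range_coords` means a caller that wants to test for an open bound still can, while a caller that just wants two numbers gets them.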

[[ Dave just told me he would prefer to return a Location object in a
call to Feature->start that needed to return a fuzzy value, and let the
parser choke on the resulting errors :-) Although this is nice OO Perl,
I doubt that most existing parsers (or their authors) would be very
happy with that solution! ]]

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From mwilkinson@gene.pbi.nrc.ca Thu Jan 25 21:46:53 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Thu, 25 Jan 2001 15:46:53 -0600 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A709ECD.5B0EA636@gene.pbi.nrc.ca>
David Block wrote:

> $0.02 Cdn is pretty cheap nowadays.

Your head will be worth more than that if we go that route... ;-)

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From mdalphin@amgen.com Thu Jan 25 22:06:48 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Thu, 25 Jan 2001 14:06:48 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> <3A709834.64E33D50@gnf.org> Message-ID: <3A70A377.C99B79B1@amgen.com>
Hilmar Lapp wrote:
> > > What about in Fuzzy, do we want to throw exceptions or do we just use
> > > the best information we have and do some logic and coordinate
> > > gymnastics to try and return a reasonable value or else throw an
> > > exception?
> > My gut says to return the inner-most coordinate that is known but
> > provide API to get the full fuzzy coordinates out - so
> >
> > full loc             -> start..end : minStart..maxEnd
> > <50..100>            -> 50..100    : -INF..+INF
> > (78.90)..(100.107)   -> 90..100    : 78..107
> >
>
> I think I am much more in favor of returning the outer-most
> coordinates as the default policy. David, Mark?
> I'm also not sure whether INF or NaN are good return values in perl
> (i.e., can you test for INF or NaN by numeric comparison? I figured
> that e.g. you can't obtain NaN by sqrt(-1), as would be the result in C).
>
>         Hilmar

My inclination is also to select the outer-most ranges for the defined
regions. I understand the reason for selecting the "certainty" of the
inner ranges, but most of the biologists here (it seems to me...) would
rather have "weak data showing some potential" than "more certain data
which risks missing something". This is a philosophical issue that
involves many end-users. I would end up writing it to take the
outer-most to please my customers, but I am not sure that it doesn't
just give them more noise to wade through.

For the uncertain edges, ie '<' and '>', I am not certain how best to
handle them in Perl. There are really several cases here:

1) The most common in GenBank, I believe, is where you just don't have
   more sequence, so you end up with:

     CDS             <1..>$Seq_Len

   Here we are saying that we don't even have sequence to go with.
   Displaying it is not really a problem, usually.

2) An uglier problem is when a gene-prediction program predicts an
   initial "exon". This "exon" is really only part of an exon, as the
   program only predicts coding sequence and ignores the 5'-UTR. This
   might lead to:

     exon            <105..300
     CDS             join(105..300, 405..1004)

   Here we have the upstream sequence (5'UTR) and know it extends
   directly upstream of position 105, but we don't really know where.

I don't really know what to do with these. I think the best we can do
is indicate it with a flag, similar to '<' or '>', whether we are
drawing a picture or trying to extract "interesting" sequence from a
genomic fragment. I don't think returning NAN or INF is correct; we
have an uncertainty, but we certainly don't have INF or even NAN. We
need to pass on this "uncertainty" to the calling program for it to
express to the user.
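One way to pass that uncertainty on, sketched in Perl under the assumption that one flag per end is enough: keep the best-known coordinates and record which end is open, rather than returning INF or NaN. The hash keys and both subs are hypothetical; the sequence and locations come from the S52564 record quoted above.

```perl
use strict;

# Hypothetical representation: the best-known coordinates plus explicit
# flags saying which end is open, instead of INF or NaN.
sub parse_partial_range {
    my ($loc) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location: $loc\n";
    return {
        start           => $start,
        end             => $end,
        start_uncertain => ($lt ? 1 : 0),   # '<': true start lies further 5'
        end_uncertain   => ($gt ? 1 : 0),   # '>': true end lies further 3'
    };
}

# Extract the known bases and mark open ends with "...", leaving the
# decision about how to present the uncertainty to the caller.
sub describe {
    my ($seq, $loc) = @_;
    my $r   = parse_partial_range($loc);
    my $sub = substr($seq, $r->{start} - 1, $r->{end} - $r->{start} + 1);
    return ($r->{start_uncertain} ? '...' : '') . $sub
         . ($r->{end_uncertain}   ? '...' : '');
}

print describe('ccatggagta', '<1..>10'), "\n";   # ...ccatggagta...
print describe('ccatggagta', '5..7'), "\n";      # gga
```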

Mark

--
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From jason@chg.mc.duke.edu Thu Jan 25 23:17:27 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 18:17:27 -0500 (EST) Subject: [Bioperl-l] test suite upgrade Message-ID:
A smarter test suite and some code corrections to make perl 5.00404 happy
have been checked in.

Please check out the live version and give it a try.

Notes:

- This involved copying Test.pm version 1.15 I believe, removing the line
  that required Test::Harness of a certain version (which won't install on
  5.00404) and doing some fun use lib stuff in the BEGIN block of each
  test in t/. Unfortunately my first hope of just pushing the 't' dir on
  the @INC stack did not work under 5.00404 - it was not being recognized.
  I'm not sure if that was not available in earlier versions of perl or
  what. At any rate it was solved by our good friend eval...

    BEGIN {
        # to handle systems with no installed Test module
        # we include the t dir (where a copy of Test.pm is located)
        # as a fallback
        eval { require Test; };
        if( $@ ) {
            use lib 't';
        }
        use Test;
        plan tests => 35 }

  All new test modules should follow this format or they won't be able to
  use Test.pm on platforms with Test.pm not installed.

- The LiveSeq test is still not working, but that's probably because I
  haven't really dug much to find out why it is failing.

- I get strange errors (the ever cryptic 'dubious' message) in 5.00404
  when exit is called in the BEGIN block, which is necessary for tests
  where all the necessary modules are not installed on the system.

- Things to deal with platform compatibility have not been addressed
  (alarm still called, index.t). I tried to work on the Index.t problem
  but didn't get the general solution to work (because we can't depend on
  File::Spec to be installed, grrr) so I will probably have to rely on the
  suggested fix by Shailesh.
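A note on the BEGIN block above, worth checking before copying it into new tests: `use lib 't'` and `use Test` are both executed while the BEGIN block is being compiled, before the `eval` runs, so 't' always ends up at the front of @INC whatever `$@` holds (and t/Test.pm is picked up even where Test is installed). That may also be why pushing 't' onto @INC at run time seemed not to work: `use Test` had already been executed by then. A sketch of a fallback that really is conditional, assuming the same t/Test.pm layout:

```perl
use strict;

BEGIN {
    # Prefer an installed Test module.
    eval { require Test; };
    if ($@) {
        # Fall back to the copy bundled in t/. Calling lib->import here
        # happens at run time of this BEGIN block, so unlike a bare
        # "use lib 't';" it only takes effect when the require failed.
        require lib;
        lib->import('t');
        require Test;
    }
    Test->import;              # exports plan(), ok(), skip() into main
    Test::plan(tests => 1);    # fully qualified: not imported at compile time
}

ok(1);    # placeholder so the single planned test is run
```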

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Fri Jan 26 00:42:13 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 16:42:13 -0800 Subject: [Bioperl-l] test suite upgrade References: Message-ID: <3A70C7E5.1DD88031@gnf.org>
Jason Stajich wrote:
>             use lib 't';

Isn't it required to append a slash to the directory name? I thought I
read about that, but right now can't verify it in the lib POD (it says
nothing about a trailing slash being either required or forbidden).

Does anyone know for sure? Could it even be that Perl is smart here
and allows both?

        Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From birney@ebi.ac.uk Fri Jan 26 09:46:28 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 26 Jan 2001 09:46:28 +0000 (GMT) Subject: [Bioperl-l] test suite upgrade In-Reply-To: Message-ID:
On Thu, 25 Jan 2001, Jason Stajich wrote:

> A smarter test suite and some code corrections to make perl 5.00404 happy
> have been checked in.
>
> Please check out the live version and give it a try.
>
> Notes:
>
> - This involved copying Test.pm version 1.15 I believe, removing the line
>   that required Test::Harness of a certain version (which won't install on
>   5.00404) and doing some fun use lib stuff in the BEGIN block of each
>   test in t/. Unfortunately my first hope of just pushing the 't' dir on
>   the @INC stack did not work under 5.00404 - it was not being recognized.
>   I'm not sure if that was not available in earlier versions of perl or
>   what. At any rate it was solved by our good friend eval...

Jason is becoming THE MAN for this release. I'll check this out and
report back. Awesome Jason!
>
>     BEGIN {
>         # to handle systems with no installed Test module
>         # we include the t dir (where a copy of Test.pm is located)
>         # as a fallback
>         eval { require Test; };
>         if( $@ ) {
>             use lib 't';
>         }
>         use Test;
>         plan tests => 35 }
>
>   All new test modules should follow this format or they won't be able to
>   use Test.pm on platforms with Test.pm not installed.
>
> - The LiveSeq test is still not working, but that's probably because I
>   haven't really dug much to find out why it is failing.
>
> - I get strange errors (the ever cryptic 'dubious' message) in 5.00404
>   when exit is called in the BEGIN block, which is necessary for tests
>   where all the necessary modules are not installed on the system.
>

I've worked around this one before. I'll see what I can do here...

> - Things to deal with platform compatibility have not been addressed
>   (alarm still called, index.t). I tried to work on the Index.t problem
>   but didn't get the general solution to work (because we can't depend on
>   File::Spec to be installed, grrr) so I will probably have to rely on the
>   suggested fix by Shailesh.
>
> -Jason

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Fri Jan 26 16:09:15 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Fri, 26 Jan 2001 10:09:15 -0600 (CST) Subject: [Bioperl-l] test suite upgrade In-Reply-To: <3A70C7E5.1DD88031@gnf.org> Message-ID:
On Thu, 25 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >             use lib 't';
>
> Isn't it required to append a slash to the directory name?
> I thought I read about that, but right now can't verify it in the lib
> POD (it says nothing about a trailing slash being either required or
> forbidden).
>
> Does anyone know for sure? Could it even be that Perl is smart here
> and allows both?

It does, in my experience.

>
>         Hilmar
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From jason@chg.mc.duke.edu Fri Jan 26 23:12:05 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Fri, 26 Jan 2001 18:12:05 -0500 (EST) Subject: [Bioperl-l] more fuzziness checked in Message-ID:
More robust fuzzy and split feature handling has been checked in.

FTHelper will check whether start==end; if it does and there is no split
location delimiter, the code will return just a single number
representing the location, e.g.

     variation       500
                     /allele="C"
                     /allele="T"

Bio::Location::Split - added method 'splittype' to capture more
directives than just 'join', e.g. 'order'. This is then called in
FTHelper when constituting a feature table for output.

Bio::Location::Fuzzy - renamed methods to fuzzy_start, fuzzy_end,
fuzzy_range, which return strings representing the fuzzy points and
range type (. or ^). These methods validate a string to be sure it is
a valid type of fuzzy location or range delimiter and then store it
literally so it can be returned later. I also added methods called
_fuzzypointencode and _fuzzyrangeencode which return integers intended
to represent the type of fuzzy location, so the parser is not burdened
with working out that <3..12 means starting on the 5' end, etc. All of
these methods were added to the interface Bio::Location::FuzzyLocationI.

So FTHelper just calls fuzzy_start and fuzzy_end to get the end points
(a sane numeric value is returned if the point is not fuzzy).
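The single-position rule described above can be sketched as follows. This is an illustration only: `split_fuzzy_point` and `format_location` are hypothetical names and do not reproduce the FTHelper or Bio::Location::Fuzzy code; only plain `N`, `<N` and `>N` points are handled.

```perl
use strict;

# Hypothetical helper: split a simple point such as "<3", ">12" or "500"
# into its fuzzy marker ('' if none) and its integer position.
sub split_fuzzy_point {
    my ($point) = @_;
    my ($marker, $pos) = $point =~ /^([<>]?)(\d+)$/
        or die "not a simple point: $point\n";
    return ($marker, $pos);
}

# Hypothetical helper: write a feature-table location string, collapsing
# an exact start == end into a single position.
sub format_location {
    my ($start, $end) = @_;    # point strings, e.g. "<3" and "12"
    my (undef, $s) = split_fuzzy_point($start);
    my (undef, $e) = split_fuzzy_point($end);
    return $start if $s == $e && $start eq $end;
    return "$start..$end";
}

print format_location('500', '500'), "\n";   # 500
print format_location('<3', '12'), "\n";     # <3..12
```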
start/end were overridden by Location::Fuzzy to pass their values to
fuzzy_start/end if they were indeed fuzzy, and to parse out the basic
integer to store in start. This means length will return something even
if it is not technically correct, ie for <3..12 length() will return 9.

Enjoy...

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Sun Jan 28 07:26:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sat, 27 Jan 2001 23:26:44 -0800 Subject: [Bioperl-l] Bio::PrimarySeq Message-ID: <3A73C9B4.7147B41A@gmx.net>
Is there any particular reason that length() and subseq() in
Bio::PrimarySeq obtain the sequence string by a direct access of
the hash instead of calling seq()? This is potentially dangerous
if a derived object overrides seq().

If there's no real reason I'll fix it.

        Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 10:04:09 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 10:04:09 +0000 (GMT) Subject: [Bioperl-l] Bio::PrimarySeq In-Reply-To: <3A73C9B4.7147B41A@gmx.net> Message-ID:
On Sat, 27 Jan 2001, Hilmar Lapp wrote:

> Is there any particular reason that length() and subseq() in
> Bio::PrimarySeq obtain the sequence string by a direct access of
> the hash instead of calling seq()? This is potentially dangerous
> if a derived object overrides seq().

Not that I know of...

>
> If there's no real reason I'll fix it.
>
>         Hilmar

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 10:23:17 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 02:23:17 -0800 Subject: [Bioperl-l] Empty Seqs Message-ID: <3A73F315.50AA1F74@gmx.net>
I added the possibility to create empty sequences to Bio::PrimarySeq
(and thereby Bio::Seq), and support for reading and writing empty
sequences to fasta format in Bio::SeqIO. Entries with an empty line
following the description line as well as those without the additional
empty line are supported.

Note that if you initialize an explicitly empty sequence you MUST
provide the -moltype parameter. The reason is that a sequence must have
a moltype, and for an empty sequence it cannot be guessed (which it
otherwise is).

        Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 10:24:49 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 02:24:49 -0800 Subject: [Bioperl-l] Bio::PrimarySeq References: Message-ID: <3A73F371.8706C95E@gmx.net>
Ewan Birney wrote:
>
> > Is there any particular reason that length() and subseq() in
> > Bio::PrimarySeq obtain the sequence string by a direct access of
> > the hash instead of calling seq()? This is potentially dangerous
> > if a derived object overrides seq().
>
> Not that I know of...
>

Okay. I fixed it.
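Why direct hash access is fragile, in a small Perl sketch: `My::PrimarySeq` and `My::LazySeq` are hypothetical stand-ins rather than the real Bio::PrimarySeq. The subclass computes its sequence in seq() and never fills the hash slot, so length() and subseq() only give the right answer because they go through the accessor.

```perl
package My::PrimarySeq;    # hypothetical stand-in for Bio::PrimarySeq
use strict;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

sub seq { my $self = shift; return $self->{seq}; }

# Go through the accessor so subclasses that override seq() are honoured.
sub length { my $self = shift; return CORE::length($self->seq); }

sub subseq {
    my ($self, $start, $end) = @_;    # 1-based, inclusive coordinates
    return substr($self->seq, $start - 1, $end - $start + 1);
}

package My::LazySeq;                  # hypothetical subclass
our @ISA = ('My::PrimarySeq');

# Stores nothing in {seq}; builds the sequence on demand.
sub seq { return 'ACGT' x 3 }

package main;

my $s = My::LazySeq->new();
print $s->length, "\n";          # 12, although $s->{seq} is undef
print $s->subseq(2, 5), "\n";    # CGTA
```

Had length() read `$self->{seq}` directly, the subclass above would report a length of 0.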

        Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 14:29:52 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 14:29:52 +0000 (GMT) Subject: [Bioperl-l] all tests but LiveSeq.t pass Message-ID:
All tests but LiveSeq.t pass. Jason - I am going to start looking at your
sensational Location stuff to give it another pair of eyes over the code.

I still need to look at getting RichSeq or something similar in...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 20:34:36 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 12:34:36 -0800 Subject: [Bioperl-l] flexible warning/exception in SeqIO Message-ID: <3A74825C.E024BA97@gmx.net>
This is on our tasklist. To reiterate briefly the background, we
had a discussion a while ago that there are many applications
which would rather lose an entry of a databank file or a feature
of an entry than choke due to an exception being thrown. The
reasons for such exceptions are entries which are either
misformatted or contain syntax not yet understood by BioPerl
(there will be significantly fewer of those, though, due to the new
location model).

The conclusion was that we want to have some flexibility on the
client side, who can turn such incidents into exceptions if he/she
wants to, but the default would be to only warn.

I'm not sure but as I understood the changes to RootI every object
has the ability to turn warn() into throw() by saying
$obj->verbose(2). Is that right, and if so, do people agree that
this fulfills the requirements for SeqIO warn/throw flexibility
(which implies that the SeqIO code only warn()s)?
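A minimal Perl sketch of that policy, assuming a verbosity threshold of 2 as in the `$obj->verbose(2)` call above: recoverable problems go through warn(), which only escalates to an exception on request, while throw() always dies. `My::Root` is hypothetical and does not reproduce Bio::Root::RootI.

```perl
package My::Root;    # hypothetical, not Bio::Root::RootI
use strict;

sub new { my $class = shift; return bless { verbose => 0 }, $class; }

sub verbose {
    my $self = shift;
    $self->{verbose} = shift if @_;
    return $self->{verbose};
}

# Recoverable problem: warn by default, become an exception at verbose >= 2.
sub warn {
    my ($self, $msg) = @_;
    die "EXCEPTION: $msg\n" if $self->verbose >= 2;
    CORE::warn("WARNING: $msg\n");
    return;
}

# Unrecoverable problem: always an exception.
sub throw {
    my ($self, $msg) = @_;
    die "EXCEPTION: $msg\n";
}

package main;

my $io = My::Root->new;
$io->warn('skipping malformed feature');    # warning on STDERR, parsing goes on

$io->verbose(2);                            # client asks for strict behaviour
my $survived = eval { $io->warn('skipping malformed feature'); 1 };
print $survived ? "continued\n" : "stopped: $@";
```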

If people agree, this point becomes light green.

        Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 21:53:09 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 21:53:09 +0000 (GMT) Subject: [Bioperl-l] flexible warning/exception in SeqIO In-Reply-To: <3A74825C.E024BA97@gmx.net> Message-ID:
On Sun, 28 Jan 2001, Hilmar Lapp wrote:

> This is on our tasklist. To reiterate briefly the background, we
> had a discussion a while ago that there are many applications
> which would rather lose an entry of a databank file or a feature
> of an entry than choke due to an exception being thrown. The
> reasons for such exceptions are entries which are either
> misformatted or contain syntax not yet understood by BioPerl
> (there will be significantly fewer of those, though, due to the new
> location model).
>
> The conclusion was that we want to have some flexibility on the
> client side, who can turn such incidents into exceptions if he/she
> wants to, but the default would be to only warn.
>
> I'm not sure but as I understood the changes to RootI every object
> has the ability to turn warn() into throw() by saying
> $obj->verbose(2). Is that right, and if so, do people agree that
> this fulfills the requirements for SeqIO warn/throw flexibility
> (which implies that the SeqIO code only warn()s)?

I agree. So the SeqIO code should ->warn in recoverable positions and
->throw on utterly non-recoverable positions.

>
> If people agree, this point becomes light green.
>
>         Hilmar

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From paul-christophe.varoutas@curie.fr Mon Jan 29 00:12:49 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Mon, 29 Jan 2001 01:12:49 +0100 Subject: [Bioperl-l] Vector.pm commit Message-ID: <5.0.2.1.2.20010128232227.00b38e90@pop.wanadoo.fr>
cvs checked out under cygwin (win2000).

I noticed that when I run make test under perl 5.6.1 / cygwin, these 2
lines appear a *lot* of times:

Ambiguous call resolved as CORE::shift(), qualify as such or use & at blib/lib/Bio/Root/Vector.pm line 948.
Ambiguous call resolved as CORE::shift(), qualify as such or use & at blib/lib/Bio/Root/Vector.pm line 973.

I guess this is because there is a shift() sub in Vector.pm, line 722:

  #---------
  sub shift {
  #---------
      my($self,%param) = @_;
      $self = $self->first();
      $self = $self->remove(%param);
  }

Lines 948 and 973 are of this type:

  #-------------
  sub valid_any {
  #-------------
      my $self = shift;
      ...

I didn't have these warnings when I ran make test with perl 5.004_04 /
SunOS 5.6.

I replaced

  $self = shift;

by

  $self = &shift(@_);

(thanks Ewan) in both lines. Seems to be OK with perl 5.6.1 / cygwin and
perl 5.004_04 / SunOS 5.6.

paulc


From krbou@pgsgent.be Mon Jan 29 07:55:13 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Mon, 29 Jan 2001 08:55:13 +0100 Subject: [Bioperl-l] all tests but LiveSeq.t pass In-Reply-To: ; from birney@ebi.ac.uk on Sun, Jan 28, 2001 at 02:29:52PM +0000 References: Message-ID: <20010129085513.C6855@gryzo.pgsgent.be>
Quoting Ewan Birney (birney@ebi.ac.uk):
>
> All tests but LiveSeq.t pass.
> Jason - I am going to start looking at your
> sensational Location stuff to give it another pair of eyes over the code.
>
>
> I still need to look at getting RichSeq or something similar in...
>

Will this make it into 0.7, or should I have a go at cleaning up some
of the Swiss-Prot issues I found?

Kris,


From paul-christophe.varoutas@curie.fr Mon Jan 29 10:28:02 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Mon, 29 Jan 2001 11:28:02 +0100 Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal Message-ID: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr>
Yesterday I studied RestrictionEnzyme.pm in more depth. I haven't yet
added the methods I wanted to, because in my opinion it is far more
urgent for this module to get some redesigning. The module suffers
somewhat from poor design, and just adding methods to it will only
worsen the situation.

RestrictionEnzyme has methods which are specific to the restriction enzymes:
 - seq() is the accessor method to the enzyme's recognition sequence.
 - cut_seq() "cuts" a Bio::Seq-derived object and generates an array of
   restriction site fragments.
 - cuts_seq_at() does the same but this time generates an array of
   restriction site coordinates.
and methods which are specific to the list of enzymes:
 - is_available() says if a particular enzyme is in the list.
 - available_list() gives the list of all enzymes or the list of n-base
   cutters.

Steve Chervitz already suggested in the module's documentation that
is_available() "may be more appropriate for a REData.pm class", and I
share his opinion. From a conceptual point of view, the existing
RestrictionEnzyme.pm module corresponds to two object classes, not one.

Here is an outline of my proposal:

Separate RestrictionEnzyme into two classes:

RestrictionEnzymeDBase (or whatever name is more appropriate):
 - members: the list of restriction enzymes.
 - methods:
   - constructor using hardwired list of enzymes OR user file OR URL.
  - add/remove an enzyme to/from the list (adding will be the equivalent of _make_custom()).
  - member accessor methods: the already existing is_available() and available_list().

RestrictionEnzyme:
- members: the same as now (_name, _seq, _site, _cuts_after).
- methods:
  - constructor (equivalent to the constructor calling the _make_standard() sub).
  - the already existing accessor methods.
  - the already existing methods: cut_seq, cuts_seq_at, etc.

This design, apart from being more "correct", will facilitate any future extensions of the two modules. The drawback of splitting RestrictionEnzyme into two classes is that all code using RestrictionEnzyme.pm will have to be modified. Perhaps we should take advantage of the imminent 0.7 release and decide to proceed with the redesign.

If we change the design, this will also be the opportunity to slightly change/extend its public interface and add small new features, such as being able to add and use asymmetric cutters and enzymes that cut outside the recognition site (perhaps incorporating only small changes now, in order to be in time for the 0.7 release, and leaving extensions for afterwards, especially if I do this alone based on what we decide).

Tell me what you think about it:
- First of all, is redesigning possible, or are we obliged to maintain compatibility? In the latter case I will just add functionality, keeping the poor design of the module.
- If redesigning is possible, please make comments/suggestions.

Paul-Christophe


From jason@chg.mc.duke.edu Mon Jan 29 14:02:33 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 09:02:33 -0500 (EST) Subject: [Bioperl-l] flexible warning/exception in SeqIO In-Reply-To: Message-ID:

On Sun, 28 Jan 2001, Ewan Birney wrote:

> On Sun, 28 Jan 2001, Hilmar Lapp wrote:
>
> > This is on our tasklist.
> > To reiterate briefly the background, we
> > had a discussion a while ago that there are many applications
> > which would rather lose an entry of a databank file or a feature
> > of an entry than choking due to an exception being thrown. The
> > reason for such exceptions are entries which are either
> > misformatted or contain syntax not yet understood by BioPerl
> > (there will be significantly less though due to the new location
> > model).
> >
> > The conclusion was that we want to have some flexibility on the
> > client side, who can turn such incidents into exceptions if he/she
> > wants to, but the default would be to only warn.
> >
> > I'm not sure but as I understood the changes to RootI every object
> > has the ability to turn warn() into throw() by saying
> > $obj->verbose(2). Is that right, and if so, do people agree that
> > this fulfills the requirements in SeqIO warn/throw flexibility
> > (which implies that the SeqIO code only warn()s).
>
> I agree. So the SeqIO code should ->warn in recoverable positions and
> ->throw on utterly non-recoverable positions

This is exactly what I think as well. It gives the most flexibility. I think with RichSeq we can handle things like parsing optional qualifiers (bug #160 -- PID) from GenBank format and any other lost features.

>
> >
> > If people agree, this point becomes light green. Green is good.
> >
> > Hilmar
> > --
> > -----------------------------------------------------------------
> > Hilmar Lapp email: hlapp@gmx.net
> > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> > -----------------------------------------------------------------
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Mon Jan 29 17:24:03 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 09:24:03 -0800 Subject: [Bioperl-l] Genscan exon frame computation Message-ID: <3A75A733.AA342904@gmx.net>

A revisit of this is on the task list. A while ago I had a discussion with Mark Dalphin, because he claimed that he had managed to figure out the exon frame from the start coordinate and the frame value.

I still don't fully understand his code sample, as he was also using his own definition of frame. Still, the discussion showed me how one can figure out the frame. I've enclosed the relevant code section of my implementation below. Whoever feels in a position to do so, please review and double-check.

This will add a frame attribute to each individual exon, which makes it possible to deliberately shuffle exons from one prediction (for those who aren't aware: Genscan with default parameters outputs only exons in the 'optimal path'; there may be other exons which also achieve very good scores, and their output can be triggered by -subopt).

Things still to do in this respect comprise a rigorous test (take all exons of each prediction, translate them individually in the frame they've been assigned, and check that there are no intervening stops) and an adaptation of cds() in GeneStructure.pm (when concatenating exons, make sure that the frame of one exon and the frame/phase of the previous exon match, and if not, fill with Ns).

If anyone volunteers to add the test to Genpred.t I'd be really glad. This does not involve module design, just plain application coding, and anyone literate in Perl/Bioperl should be able to jump in here.

Comments welcome, esp.
regarding the cds() comment I made above.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

# Figure out the frame of this exon. This is NOT the frame
# given by Genscan, which is the absolute frame of the base
# starting the first predicted complete codon. By comparing
# to the absolute frame of the first base we can compute the
# offset of the first complete codon to the first base of the
# exon, which determines the frame of the exon.
my $cod_offset;
if($predobj->strand() == 1) {
    $cod_offset = $flds[6] - (($predobj->start()-1) % 3);
    # Possible values are -2, -1, 0, 1, 2. -1 and -2 correspond
    # to offsets 2 and 1, resp. Offset 3 is the same as 0.
    $cod_offset += 3 if($cod_offset < 1);
} else {
    # On the reverse strand the Genscan frame also refers to
    # the first base of the first complete codon, but viewed
    # from forward, which is the third base viewed from
    # reverse.
    # Note that end() is in fact start() here because we always
    # annotate in forward direction (otherwise we wouldn't need
    # strand()).
    $cod_offset = $flds[6] - (($predobj->end()-3) % 3);
    # Possible values are -2, -1, 0, 1, 2. Due to the reverse
    # situation, {2,-1} and {1,-2} correspond to offsets
    # 1 and 2, resp. Offset 3 is the same as 0.
    $cod_offset -= 3 if($cod_offset >= 0);
    $cod_offset = -$cod_offset;
}
# Offsets 2 and 1 correspond to frame 1 and 2 (frame of exon
# is the frame of the first base relative to the exon, or the
# number of bases the first codon is missing).
$predobj->frame(3 - $cod_offset);


From insana@ebi.ac.uk Mon Jan 29 17:48:19 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Mon, 29 Jan 2001 17:48:19 +0000 (GMT) Subject: [Bioperl-l] LiveSeq back working In-Reply-To: Message-ID:

LiveSeq is back working now.
The BioPerl loader was not working anymore because of the SplitLocation
change.
It was using the subfeature method.

Joseph Insana


From mwilkinson@gene.pbi.nrc.ca Mon Jan 29 17:38:24 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Mon, 29 Jan 2001 11:38:24 -0600 Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm Message-ID: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca>

Dear Group,

I just cvs-updated and noticed that SeqFeature::Generic does not appear to be functional anymore. It is calling on Bio/Location/Simple.pm (line 122), which apparently does not exist. Is it just my installation which is wonky, or is this a genuine bug?

Any advice appreciated.

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From jason@chg.mc.duke.edu Mon Jan 29 18:05:34 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 13:05:34 -0500 (EST) Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm In-Reply-To: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca> Message-ID:

You need to do

    % cvs update -d

to get newly created directories.

On Mon, 29 Jan 2001, Mark Wilkinson wrote:

> Dear Group,
>
> I just cvs-updated and noticed that SeqFeature::Generic does not appear
> to be functional anymore. It is calling on Bio/Location/Simple.pm
> (line 122), which apparently does not exist. Is it just my installation
> which is wonky, or is this a genuine bug?
>
> any advice appreciated.
>
> cheers all!
>
> M
>
>
> --
> ---
> Dr.
> Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Mon Jan 29 18:05:59 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 13:05:59 -0500 (EST) Subject: [Bioperl-l] LiveSeq back working In-Reply-To: Message-ID:

Thanks for fixing this; I wasn't sure where to look.

On Mon, 29 Jan 2001, Joseph Insana wrote:

> LiveSeq is back working now.
> The BioPerl loader was not working anymore because of the SplitLocation
> change. It was using the subfeature method.
>
> Joseph Insana
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Mon Jan 29 18:14:14 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 29 Jan 2001 18:14:14 +0000 (GMT) Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm In-Reply-To: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca> Message-ID:

On Mon, 29 Jan 2001, Mark Wilkinson wrote:

> Dear Group,
>
> I just cvs-updated and noticed that SeqFeature::Generic does not appear
> to be functional anymore. It is calling on Bio/Location/Simple.pm
> (line 122), which apparently does not exist. Is it just my installation
> which is wonky, or is this a genuine bug?

cvs update -d

>
> any advice appreciated.
>
> cheers all!
>
> M
>
>
> --
> ---
> Dr.
> Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From darrochi@dcs.gla.ac.uk Mon Jan 29 18:40:07 2001 From: darrochi@dcs.gla.ac.uk (Iain Darroch) Date: Mon, 29 Jan 2001 18:40:07 +0000 (GMT) Subject: [Bioperl-l] Bio Framework and XML Message-ID:

Hi All,

I am currently looking at ways of integrating biological systems. I saw it mentioned in some of the documentation that a Bio-Object Framework was proposed, and also that XML could be used as metadata for describing bioinformatics objects.

I was wondering what the current status of both of these is.

Has anyone implemented parsers yet?

Thanks in advance

Iain


From jason@chg.mc.duke.edu Mon Jan 29 19:33:59 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 14:33:59 -0500 (EST) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature Message-ID:

What is the feeling here? We have this old way of doing things, which includes using the value 'EXPAND' to decide whether we should expand the start/end range of a feature when adding a sub_SeqFeature to it.

I think this would be better modeled through a SplitLocationI, which is just a container of Location objects. So I propose to remove all references to 'EXPAND', which means removing the method _expand_region and updating add_sub_SeqFeature to deal with adding the locations. Similarly, flush_sub_SeqFeature should flush the locations, but I'm not sure what the start/end should be reset to...
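The 'EXPAND' behaviour under discussion grows a parent feature's start/end so that it encloses a newly added sub-feature, with the widening done in its own helper such as _expand_region. It can be sketched in a few lines of Perl. This is a hypothetical, much-simplified class for illustration only, not the Bio::SeqFeature::Generic code. The package name My::Feature and the method add_sub_feature are made up.

```perl
use strict;
use warnings;

# Hypothetical, much-simplified feature class for illustration only;
# this is NOT Bio::SeqFeature::Generic.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, end => $args{end}, subs => [] }, $class;
}

sub start        { return $_[0]{start} }
sub end          { return $_[0]{end} }
sub sub_features { return @{ $_[0]{subs} } }

# Add a sub-feature. With 'EXPAND' the parent grows to enclose it;
# without it, a sub-feature outside the parent's range is rejected.
sub add_sub_feature {
    my ($self, $sub, $expand) = @_;
    if (defined $expand && $expand eq 'EXPAND') {
        $self->_expand_region($sub);
    }
    elsif ($sub->start < $self->start || $sub->end > $self->end) {
        die "sub-feature lies outside its parent; pass 'EXPAND' to grow the parent\n";
    }
    push @{ $self->{subs} }, $sub;
    return $self;
}

# Kept as a separate method so it can also be called on its own:
# widen start/end so that the region covers $other.
sub _expand_region {
    my ($self, $other) = @_;
    $self->{start} = $other->start
        if !defined $self->{start} || $other->start < $self->{start};
    $self->{end} = $other->end
        if !defined $self->{end} || $other->end > $self->{end};
    return $self;
}

package main;

my $gene = My::Feature->new(start => 100, end => 200);
$gene->add_sub_feature(My::Feature->new(start => 50, end => 120), 'EXPAND');
print $gene->start, '-', $gene->end, "\n";    # prints 50-200
```

Keeping the widening in a separate helper means other code can reuse it without going through add_sub_feature. A split-location design would instead record each part as its own location.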
I also had to update FeaturePair to add the method location(), which delegates to feature1()->location(); otherwise things won't work correctly. start/end are defined by the feature1 object, so location should also reside in feature1.

Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Mon Jan 29 21:09:40 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Mon, 29 Jan 2001 13:09:40 -0800 Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature References: Message-ID: <3A75DC14.529E0072@gnf.org>

Jason Stajich wrote:
>
> What is the feeling here, we have this old way of doing things which
> included using the value 'EXPAND' to determine if we should expand the
> start/end space for a feature when adding a sub_SeqFeature to a feature?
>
> I think this should likely be better modeled through a SplitLocationI
> which is just a container of LocationObjects. So I propose to remove all
> references to 'EXPAND' which means removing the method _expand_region and
> updating add_sub_Feature to deal with adding the locations. Similarly

Can't we keep a separate method for coping with region extension due to a new subfeature, in whatever way the extension is done? As far as I remember, I had a good reason to put it into its own method: I needed it separately from add_sub_SeqFeature().

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Mon Jan 29 21:26:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 16:26:21 -0500 (EST) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature In-Reply-To: <3A75DC14.529E0072@gnf.org> Message-ID:

On Mon, 29 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > What is the feeling here, we have this old way of doing things which
> > included using the value 'EXPAND' to determine if we should expand the
> > start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> > I think this should likely be better modeled through a SplitLocationI
> > which is just a container of LocationObjects. So I propose to remove all
> > references to 'EXPAND' which means removing the method _expand_region and
> > updating add_sub_Feature to deal with adding the locations. Similarly
>
> Can't we keep a separate method for coping with region extension due
> to a new subfeature, in whatever way the extension is done? As far as
> I can remember I had a good reason to put it into its own method, I
> needed it separately from add_sub_SeqFeature().

I guess it is saner to let SeqFeature::Generic handle the common case; the split location case will need to be handled elsewhere. In the special case of a feature with multiple locations, that feature (or the object creating it) will take care of updating the location object to point to a SplitLocation object. For example, if we choose to represent a CDS as a SplitLocation with the exons being the parts of the join(...) statement, this will have to be negotiated by the object creating the Gene/CDS object.

Okay, so no changes to check in for Generic.

>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp@gnf.org
> GNF, San Diego, Ca.
> 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Mon Jan 29 21:57:13 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Mon, 29 Jan 2001 13:57:13 -0800 Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal References: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr> Message-ID: <3A75E739.3E0EC94E@gnf.org>

Paul-Christophe Varoutas wrote:
>
> Tell me what you think about it:
> - First of all, is redesigning possible or are we obliged to maintain
> compatibility ? In the latter case I will just add functionality,
> maintaining the poor design of the module.
> - If redesigning is possible, please make comments/suggestions.
>

First of all, keeping compatibility is a very good thing. Every user of your software will appreciate knowing that it is taken seriously. In general, my opinion is that if there's no strong reason to break compatibility, then don't break it. On the other hand, if there is a good reason, then don't hesitate.

This means, yes, redesigning is possible, but a nicer design by itself is not a good reason to break compatibility. If the existing design is somehow prohibitive for adding certain new functionality, that might justify breaking compatibility. An example is the new location model, but in fact Jason managed to keep compatibility there. I suggest that you carefully examine whether you really can't redesign and keep compatibility at the same time. Based on your proposal I don't see the prohibitive point yet.

As for the release, this issue is not on the task list, which means that you are on your own. There's a deadline next week, and we don't want to lose focus.
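The compatibility question can be illustrated with a rough Perl sketch of the proposed split. In this sketch the old class-level list methods survive as thin wrappers around the new list class, so existing callers need not change. All names here are hypothetical: My::RestrictionEnzyme, My::RestrictionEnzymeDB, the default_db() helper and the two-enzyme default list. The real module's constructor and data are not reproduced, and cutting is done on plain strings rather than Bio::Seq objects.

```perl
use strict;
use warnings;

# Hypothetical sketch; package and method names are illustrative only.
package My::RestrictionEnzyme;    # one enzyme: name, site, cut position

sub new {
    my ($class, %args) = @_;
    my $cut = index($args{site}, '^');    # site written like 'G^AATTC'
    die "site must contain a '^' cut mark\n" if $cut < 0;
    (my $seq = $args{site}) =~ s/\^//;
    return bless { name => $args{name}, seq => $seq, cuts_after => $cut }, $class;
}

sub name { return $_[0]{name} }
sub seq  { return $_[0]{seq} }

# 1-based positions of the last base before each cut (top strand only).
sub cuts_seq_at {
    my ($self, $dna) = @_;
    my @cuts;
    for (my $i = index($dna, $self->{seq}); $i >= 0; $i = index($dna, $self->{seq}, $i + 1)) {
        push @cuts, $i + $self->{cuts_after};
    }
    return @cuts;
}

# Compatibility shims: the old list methods delegate to the list class.
sub is_available   { my (undef, @args) = @_; return My::RestrictionEnzymeDB->default_db->is_available(@args) }
sub available_list { my (undef, @args) = @_; return My::RestrictionEnzymeDB->default_db->available_list(@args) }

package My::RestrictionEnzymeDB;  # the list of enzymes

my $default_db;

sub new {
    my ($class, %sites) = @_;     # name => site
    my $self = bless { enzymes => {} }, $class;
    $self->add_enzyme(My::RestrictionEnzyme->new(name => $_, site => $sites{$_})) for keys %sites;
    return $self;
}

# Hypothetical hard-wired default list (two enzymes only).
sub default_db {
    my $class = shift;
    $default_db ||= $class->new(EcoRI => 'G^AATTC', HindIII => 'A^AGCTT');
    return $default_db;
}

sub add_enzyme    { my ($self, $enz) = @_; $self->{enzymes}{ $enz->name } = $enz; return $enz }
sub remove_enzyme { my ($self, $name) = @_; return delete $self->{enzymes}{$name} }
sub get_enzyme    { my ($self, $name) = @_; return $self->{enzymes}{$name} }
sub is_available  { my ($self, $name) = @_; return exists $self->{enzymes}{$name} }

# All enzyme names, or only those whose recognition site has $n bases.
sub available_list {
    my ($self, $n) = @_;
    my @names = grep { !defined $n || length($self->{enzymes}{$_}->seq) == $n }
        keys %{ $self->{enzymes} };
    return sort @names;
}

package main;

my $db = My::RestrictionEnzymeDB->new(EcoRI => 'G^AATTC', BamHI => 'G^GATCC');
print join(',', $db->get_enzyme('EcoRI')->cuts_seq_at('AAGAATTCTTGAATTC')), "\n";    # prints 3,11
my $old_style = My::RestrictionEnzyme->is_available('HindIII');    # old-style class call
print $old_style ? "HindIII available\n" : "HindIII missing\n";    # prints HindIII available
```

The shims keep old call sites working at the cost of one extra indirection. Whether that is worth it is the judgement call discussed in this thread.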
If you finish the code and submit an accompanying rigorous test in t/* on time, it can still make it into the release, provided that there are no objections should you introduce incompatibilities.

As a last remark, a design that isn't well prepared for an extension one has in mind is not necessarily poor. It may just have been perfect for its original scope. And: I really think that there is no such thing as a "correct" design. A design may be bad or good, generic or tailored, or whatever; it just depends on your viewpoint, that is, on the particular problem you want to solve.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:02:45 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:02:45 -0800 Subject: [Bioperl-l] missing use statements References: <5.0.2.1.2.20010129204942.00b32638@mailhost.curie.fr> Message-ID: <3A766715.45FA0EEC@gmx.net>

Paul-Christophe Varoutas wrote:
>
> so I just added one line at the beginning of the module to load Bio::Seq:
>
> use Bio::Seq;
>

Thanks for pointing this out. This probably became necessary all of a sudden because I removed the respective lines from SeqIO.pm, as there was no obvious reason to keep them. Since I still think the 'use' statements belong in the files where the modules are actually used, I left it that way and added the necessary use statements to all the other SeqIO modules (which probably would all have complained sooner or later).

> and edited the @ISA array initialization line:
>
> @ISA = qw(Bio::SeqIO Bio::Seq);
>

We don't want SeqIO modules to inherit from Bio::Seq.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:04:21 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:04:21 -0800 Subject: [Bioperl-l] Root::Object in bioxml.pm Message-ID: <3A766775.F21CDB21@gmx.net>

SeqIO::bioxml.pm still inherits from Root::Object. Is there a particular reason that this one's an exception?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:10:26 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:10:26 -0800 Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/888 References: <200101291951.f0TJptp29320@pw600a.bioperl.org> Message-ID: <3A7668E2.5E40F6A5@gmx.net>

bioperl-bugs@bioperl.org wrote:
>
> Generic Features created from a GFF string do not
> record Frame information, and when dumping the feature
> out as GFF it is invariably reported as frame = 0.
>
> The problem is multi-fold:
>
> (1) the _from_gff_string and _from_gff2_string
> subroutines in Generic.pm do not contain any code to handle the
> recording of Frame information in the feature object
>
> (2) GFF allows a "." as the frame (meaning info not available),
> while $Feature only allows 0,1, or 2. Thus it isn't clear how a
> GFF frame of "." should be recorded. My first thought was that a
> value of undef might return "." in a call to SeqFeatureI::gff_string,
> however...
>
> (3) ...it appears that even if there is no frame information
> available in a Feature object, it nevertheless passes the
> $Feature->can('frame') test in SeqFeatureI::gff_string
> and returns a (default??) value of 0 for the $Feature->frame call
> (though there *is* code there to assign the frame to
> "." if it fails the ->can test...)
>
> I am willing to fix this problem myself, but I would appreciate having
> a consensus from the group about which level of the problem needs to be
> fixed to keep everyone else's code happy.
>

I think that frame information should be consistent between the GFF representation and the object representation. '.' is equivalent to undef; otherwise the frame should be 0, 1, or 2, regardless of whether it is in an object or a GFF string.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 30 09:14:42 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 09:14:42 +0000 (GMT) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature In-Reply-To: Message-ID:

On Mon, 29 Jan 2001, Jason Stajich wrote:

> What is the feeling here, we have this old way of doing things which
> included using the value 'EXPAND' to determine if we should expand the
> start/end space for a feature when adding a sub_SeqFeature to a feature?
>
> I think this should likely be better modeled through a SplitLocationI
> which is just a container of LocationObjects. So I propose to remove all
> references to 'EXPAND' which means removing the method _expand_region and
> updating add_sub_Feature to deal with adding the locations. Similarly the
> flush_sub_SeqFeature should flush the locations, but I'm not sure about
> what the start/end should be reset to...

I guess I agree. (I am wincing at every one of these decisions, you know. It just pains me to see us have to handle this object complexity in essentially simple objects. Bugger-it! I know there is no way out here, but... it goes against the grain.)

>
> I also had to update FeaturePair to add the method location() which
> delegates to feature1()->location() otherwise things won't work correctly.
> start/end are defined by feature1 object so location should also reside
> in feature1.
>

That is the consistent route here...

> Jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 30 09:41:50 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 09:41:50 +0000 (GMT) Subject: [Bioperl-l] Root::Object in bioxml.pm In-Reply-To: <3A766775.F21CDB21@gmx.net> Message-ID:

On Mon, 29 Jan 2001, Hilmar Lapp wrote:

> SeqIO::bioxml.pm still inherits from Root::Object. Is there a
> particular reason that this one's an exception?
>

I think this is a dead object? Brad.....???

> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Tue Jan 30 13:48:16 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 30 Jan 2001 08:48:16 -0500 (EST) Subject: [Bioperl-l] Re: Root::Object in bioxml.pm In-Reply-To: <3A766775.F21CDB21@gmx.net> Message-ID:

I skipped it because I thought it was to be removed for the release; Brad Marshall would know.
On Mon, 29 Jan 2001, Hilmar Lapp wrote: > SeqIO::bioxml.pm still inherits from Root::Object. Is there a > particular reason that this one's an exception? > > Hilmar > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From ydzhang@iastate.edu Tue Jan 30 14:44:25 2001 From: ydzhang@iastate.edu (Yuandan Zhang) Date: Tue, 30 Jan 2001 08:44:25 -0600 Subject: [Bioperl-l] Re: Bioperl-l digest, Vol 1 #200 - 15 msgs In-Reply-To: <200101300917.f0U9HLp20264@pw600a.bioperl.org> Message-ID: <4.2.0.58.20010130084139.00ad6560@ydzhang.mail.iastate.edu> Hi, I am new to bioperl, very patinate in it. Is there any tutorial materials available or any collection of example scripts for beginners to make a start? Yuandan At 04:17 AM 1/30/01 -0500, you wrote: >Send Bioperl-l mailing list submissions to > bioperl-l@bioperl.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://bioperl.org/mailman/listinfo/bioperl-l >or, via email, send a message with subject or body 'help' to > bioperl-l-request@bioperl.org > >You can reach the person managing the list at > bioperl-l-admin@bioperl.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Bioperl-l digest..." > > >Today's Topics: > > 1. Genscan exon frame computation (Hilmar Lapp) > 2. LiveSeq back working (Joseph Insana) > 3. SeqFeature::Generic broken? no Location::Simple.pm (Mark Wilkinson) > 4. Re: SeqFeature::Generic broken? no Location::Simple.pm (Jason Stajich) > 5. Re: LiveSeq back working (Jason Stajich) > 6. Re: SeqFeature::Generic broken? no Location::Simple.pm (Ewan Birney) > 7. Bio Framework and XML (Iain Darroch) > 8. 
Bio::SeqFeature::Generic add_sub_SeqFeature (Jason Stajich) > 9. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Hilmar Lapp) > 10. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Jason Stajich) > 11. Re: RetrictionEnzyme.pm: a proposal (Hilmar Lapp) > 12. Re: missing use statements (Hilmar Lapp) > 13. Root::Object in bioxml.pm (Hilmar Lapp) > 14. Re: [Bioperl-guts-l] Notification: incoming/888 (Hilmar Lapp) > 15. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Ewan Birney) > >--__--__-- > >Message: 1 >Date: Mon, 29 Jan 2001 09:24:03 -0800 >From: Hilmar Lapp >Organization: Nereis 4 >To: Bioperl >Subject: [Bioperl-l] Genscan exon frame computation > >A revisit of this is on the task list. I had a discussion a while >ago with Mark Dalphin, because he claimed that he managed to >figured out the exon frame based on start coordinate and frame >value. > >I still don't fully understand his code sample, as he was also >using his own definition of frame. Still, the discussion let me >see how one can figure out the frame. I've enclosed the relevant >code section of my implementation below. Whoever feels in the >position please review and double-check. > >This will add a frame attribute to each individual exon, which >makes it possible to deliberately shuffle exons from one >prediction (for those who aren't aware: Genscan with default >parameters outputs only exons in the 'optimal path'; there may be >other exons which also achieve very good scores and the output of >which can be triggered by -subopt). > >Things still to do in this respect comprise of a rigorous test >(take all exons of each prediction, translate them individually in >the frame they've been assigned, and check that there are no >intervening stops) and an adaptation of cds() in GeneStructure.pm >(when concatenating exons, make sure that the frame of one and >frame/phase of the previous match, and if not, fill with Ns). > >If anyone volunteers to add the test to Genpred.t I'd be really >glad. 
This does not involve module design, just plain application >coding, and anyone literate in Perl/Bioperl should be able to jump >in here. > >Comments welcome, esp. regarding the cds() comment I made above. > > Hilmar >-- >----------------------------------------------------------------- >Hilmar Lapp email: hlapp@gmx.net >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 >----------------------------------------------------------------- > ># Figure out the frame of this exon. This is NOT the frame ># given by Genscan, which is the absolute frame of the base ># starting the first predicted complete codon. By comparing ># to the absolute frame of the first base we can compute the ># offset of the first complete codon to the first base of the ># exon, which determines the frame of the exon. >my $cod_offset; >if($predobj->strand() == 1) { > $cod_offset = $flds[6] - (($predobj->start()-1) % 3); > # Possible values are -2, -1, 0, 1, 2. -1 and -2 correspond > # to offsets 2 and 1, resp. Offset 3 is the same as 0. > $cod_offset += 3 if($cod_offset < 1); >} else { > # On the reverse strand the Genscan frame also refers to > # the first base of the first complete codon, but viewed > # from forward, which is the third base viewed from > # reverse. > # Note that end() is in fact start() here because we always > # annotate in forward direction (otherwise we wouldn't need > # strand()). > $cod_offset = $flds[6] - (($predobj->end()-3) % 3); > # Possible values are -2, -1, 0, 1, 2. Due to the reverse > # situation, {2,-1} and {1,-2} correspond to offsets > # 1 and 2, resp. Offset 3 is the same as 0. > $cod_offset -= 3 if($cod_offset >= 0); > $cod_offset = -$cod_offset; >} ># Offsets 2 and 1 correspond to frame 1 and 2 (frame of exon ># is the frame of the first base relative to the exon, or the ># number of bases the first codon is missing). 
>$predobj->frame(3 - $cod_offset);
>
>--__--__--
>
>Message: 2
>Date: Mon, 29 Jan 2001 17:48:19 +0000 (GMT)
>From: Joseph Insana
>Reply-To: insana@ebi.ac.uk
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] LiveSeq back working
>
>LiveSeq is back working now.
>The BioPerl loader was not working anymore because of the SplitLocation
>change. It was using the subfeature method.
>
>Joseph Insana
>
>
>--__--__--
>
>Message: 3
>Date: Mon, 29 Jan 2001 11:38:24 -0600
>From: Mark Wilkinson
>Organization: PBI-NRC
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
>
>Dear Group,
>
>I just cvs-updated and noticed that SeqFeature::Generic does not appear
>to be functional anymore. It is calling on Bio/Location/Simple.pm
>(line 122), which apparently does not exist. Is it just my installation
>which is wonky, or is this a genuine bug?
>
>any advice appreciated.
>
>cheers all!
>
>M
>
>
>--
>---
>Dr. Mark Wilkinson
>Bioinformatics Group
>National Research Council of Canada
>Plant Biotechnology Institute
>110 Gymnasium Place
>Saskatoon, SK
>Canada
>
>
>
>
>--__--__--
>
>Message: 4
>Date: Mon, 29 Jan 2001 13:05:34 -0500 (EST)
>From: Jason Stajich
>To: Mark Wilkinson
>cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
>
>you need to do
>% cvs update -d
>to get newly created directories.
>
>On Mon, 29 Jan 2001, Mark Wilkinson wrote:
>
> > Dear Group,
> >
> > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > (line 122), which apparently does not exist. Is it just my installation
> > which is wonky, or is this a genuine bug?
> >
> > any advice appreciated.
> >
> > cheers all!
> >
> > M
> >
> >
> > --
> > ---
> > Dr. Mark Wilkinson
> > Bioinformatics Group
> > National Research Council of Canada
> > Plant Biotechnology Institute
> > 110 Gymnasium Place
> > Saskatoon, SK
> > Canada
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
>Jason Stajich
>jason@chg.mc.duke.edu
>Center for Human Genetics
>Duke University Medical Center
>http://www.chg.duke.edu/
>
>
>
>--__--__--
>
>Message: 5
>Date: Mon, 29 Jan 2001 13:05:59 -0500 (EST)
>From: Jason Stajich
>To: Joseph Insana
>cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] LiveSeq back working
>
>Thanks for fixing this, I wasn't sure where to go to look.
>
>On Mon, 29 Jan 2001, Joseph Insana wrote:
>
> > LiveSeq is back working now.
> > The BioPerl loader was not working anymore because of the SplitLocation
> > change. It was using the subfeature method.
> >
> > Joseph Insana
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
>Jason Stajich
>jason@chg.mc.duke.edu
>Center for Human Genetics
>Duke University Medical Center
>http://www.chg.duke.edu/
>
>
>
>--__--__--
>
>Message: 6
>Date: Mon, 29 Jan 2001 18:14:14 +0000 (GMT)
>From: Ewan Birney
>To: Mark Wilkinson
>cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
>
>On Mon, 29 Jan 2001, Mark Wilkinson wrote:
>
> > Dear Group,
> >
> > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > (line 122), which apparently does not exist. Is it just my installation
> > which is wonky, or is this a genuine bug?
> >
>cvs update -d
>
> >
> > any advice appreciated.
> >
> > cheers all!
> >
> > M
> >
> >
> > --
> > ---
> > Dr. Mark Wilkinson
> > Bioinformatics Group
> > National Research Council of Canada
> > Plant Biotechnology Institute
> > 110 Gymnasium Place
> > Saskatoon, SK
> > Canada
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
>-----------------------------------------------------------------
>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
>.
>-----------------------------------------------------------------
>
>
>--__--__--
>
>Message: 7
>Date: Mon, 29 Jan 2001 18:40:07 +0000 (GMT)
>From: Iain Darroch
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] Bio Framework and XML
>
>Hi All,
>
>I am currently looking at ways of integrating biological systems. I saw
>mentioned in some of the documentation that a Bio-Object Framework was
>proposed. Also that XML could be used in meta data for describing
>bioinformatics objects.
>
>I was wondering what the current situation of both these were.
>
>Has anyone implemented parsers yet?
>
>Thanks in advance
>
>Iain
>
>
>
>
>--__--__--
>
>Message: 8
>Date: Mon, 29 Jan 2001 14:33:59 -0500 (EST)
>From: Jason Stajich
>To: Bioperl
>Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
>
>What is the feeling here, we have this old way of doing things which
>included using the value 'EXPAND' to determine if we should expand the
>start/end space for a feature when adding a sub_SeqFeature to a feature?
>
>I think this should likely be better modeled through a SplitLocationI
>which is just a container of LocationObjects. So I propose to remove all
>references to 'EXPAND' which means removing the method _expand_region and
>updating add_sub_Feature to deal with adding the locations. Similarly the
>flush_sub_SeqFeature should flush the locations, but I'm not sure about
>what the start/end should be reset to...
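Jason describes the old 'EXPAND' behaviour: a parent feature's start/end grow so that the parent spans each newly added sub-feature. The minimal Perl sketch below illustrates that behaviour. It assumes plain hash-based features and uses hypothetical helper names (`new_feature`, `expand_region`, `add_sub_feature`). It is not the Bio::SeqFeature::Generic implementation.

```perl
use strict;
use warnings;

# Hypothetical hash-based feature; NOT Bio::SeqFeature::Generic.
sub new_feature {
    my ($start, $end) = @_;
    return { start => $start, end => $end, subs => [] };
}

# Grow $parent so that its start/end span $sub (the 'EXPAND' semantics).
# Kept as a separate function so it can also be called on its own.
sub expand_region {
    my ($parent, $sub) = @_;
    $parent->{start} = $sub->{start}
        if !defined $parent->{start} || $sub->{start} < $parent->{start};
    $parent->{end} = $sub->{end}
        if !defined $parent->{end} || $sub->{end} > $parent->{end};
    return $parent;
}

# Attach a sub-feature; expand the parent only when 'EXPAND' is passed.
sub add_sub_feature {
    my ($parent, $sub, $mode) = @_;
    expand_region($parent, $sub) if defined $mode && $mode eq 'EXPAND';
    push @{ $parent->{subs} }, $sub;
    return scalar @{ $parent->{subs} };
}

my $gene = new_feature(100, 200);
add_sub_feature($gene, new_feature(50, 120), 'EXPAND');
add_sub_feature($gene, new_feature(180, 260));    # no 'EXPAND': span unchanged
print "$gene->{start}..$gene->{end}\n";            # prints 50..200
```

Keeping `expand_region` separate from `add_sub_feature` matches Hilmar's reply, where he says he needed region extension independently of adding a sub-feature.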
>
>I also had to update FeaturePair to add the method location() which
>delegates to feature1()->location() otherwise things won't work correctly.
>start/end are defined by feature1 object so location should also reside
>in feature1.
>
>Jason
>
>Jason Stajich
>jason@chg.mc.duke.edu
>Center for Human Genetics
>Duke University Medical Center
>http://www.chg.duke.edu/
>
>
>
>
>--__--__--
>
>Message: 9
>Date: Mon, 29 Jan 2001 13:09:40 -0800
>From: Hilmar Lapp
>Organization: GNF
>To: Jason Stajich
>Cc: Bioperl
>Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
>
>Jason Stajich wrote:
> >
> > What is the feeling here, we have this old way of doing things which
> > included using the value 'EXPAND' to determine if we should expand the
> > start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> > I think this should likely be better modeled through a SplitLocationI
> > which is just a container of LocationObjects. So I propose to remove all
> > references to 'EXPAND' which means removing the method _expand_region and
> > updating add_sub_Feature to deal with adding the locations. Similarly
>
>Can't we keep a separate method for coping with region extension due
>to a new subfeature, in whatever way the extension is done? As far as
>I can remember I had a good reason to put it into its own method, I
>needed it separately from add_sub_SeqFeature().
>
>   Hilmar
>--
>-------------------------------------------------------------
>Hilmar Lapp email: lapp@gnf.org
>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>-------------------------------------------------------------
>
>--__--__--
>
>Message: 10
>Date: Mon, 29 Jan 2001 16:26:21 -0500 (EST)
>From: Jason Stajich
>To: Hilmar Lapp
>cc: Bioperl
>Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
>
>On Mon, 29 Jan 2001, Hilmar Lapp wrote:
>
> > Jason Stajich wrote:
> > >
> > > What is the feeling here, we have this old way of doing things which
> > > included using the value 'EXPAND' to determine if we should expand the
> > > start/end space for a feature when adding a sub_SeqFeature to a feature?
> > >
> > > I think this should likely be better modeled through a SplitLocationI
> > > which is just a container of LocationObjects. So I propose to remove all
> > > references to 'EXPAND' which means removing the method _expand_region and
> > > updating add_sub_Feature to deal with adding the locations. Similarly
> >
> > Can't we keep a separate method for coping with region extension due
> > to a new subfeature, in whatever way the extension is done? As far as
> > I can remember I had a good reason to put it into its own method, I
> > needed it separately from add_sub_SeqFeature().
>
>I guess it is more sane to let SeqFeature::Generic handle the common case
>and the split location case will need to be handled elsewhere.
>
>In the special case of a feature with multiple locations that feature (or
>object creating it) will take care of updating the location object to
>point to a splitlocation object. For example, if we choose to have CDS be
>represented as a SplitLocation with the exons being the parts in the
>join(...) statement. This will have to be negotiated by the object
>creating the Gene/CDS object.
>
>Okay so no changes to check in for Generic.
>
> >
> >   Hilmar
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp email: lapp@gnf.org
> > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > -------------------------------------------------------------
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
>Jason Stajich
>jason@chg.mc.duke.edu
>Center for Human Genetics
>Duke University Medical Center
>http://www.chg.duke.edu/
>
>
>
>--__--__--
>
>Message: 11
>Date: Mon, 29 Jan 2001 13:57:13 -0800
>From: Hilmar Lapp
>Organization: GNF
>To: Paul-Christophe Varoutas
>Cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] RetrictionEnzyme.pm: a proposal
>
>Paul-Christophe Varoutas wrote:
> >
> > Tell me what you think about it:
> > - First of all, is redesigning possible or are we obliged to maintain
> > compatibility ? In the latter case I will just add functionality,
> > maintaining the poor design of the module.
> > - If redesigning is possible, please make comments/suggestions.
> >
>
>First of all, keeping compatibility is a very good thing. Every user
>of your software will appreciate it if he/she knows that this is taken
>seriously.
>
>In general, my opinion is if there's no strong reason to break
>compatibility, then don't break it. On the other hand, if there is a
>good reason, then don't hesitate.
>
>This means, yes, redesigning is possible, but a nicer design by itself
>is not a good reason to break compatibility. If the existing design is
>sort of prohibitive for adding certain new functionality, this might
>justify breaking compatibility. An example is the new location model,
>but in fact Jason could manage to keep compatibility. I suggest that
>you carefully examine whether you indeed can't redesign and at the
>same time keep compatibility. Based on your proposal I don't see the
>prohibitive point yet.
>
>As for the release, this issue is not on the task list, which means
>that you are on your own. There's a deadline next week, and we don't
>want to lose focus. If you finish the code and submit an accompanying
>rigorous test in t/* on time, it can make it into the release though,
>provided that there are no objections should you introduce
>incompatibilities.
>
>As a last remark, a design that isn't prepared very well for an
>extension one has in mind is not necessarily poor. It may just have
>been perfect for its original scope. And: I really think that there is
>no such thing as a "correct" design. Design may be bad or may be good,
>generic or tailored, or whatever, it just depends on your viewpoint,
>that is, on the particular problem you want to solve.
>
>   Hilmar
>--
>-------------------------------------------------------------
>Hilmar Lapp email: lapp@gnf.org
>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>-------------------------------------------------------------
>
>--__--__--
>
>Message: 12
>Date: Mon, 29 Jan 2001 23:02:45 -0800
>From: Hilmar Lapp
>Organization: Nereis 4
>To: Bioperl
>Subject: Re: [Bioperl-l] missing use statements
>
>Paul-Christophe Varoutas wrote:
> >
> > so I just added one line at the beginning of the module to load Bio::Seq:
> >
> > use Bio::Seq;
> >
>
>Thanks for pointing this out. The reason this became necessary all
>of a sudden was probably that I removed the respective lines from
>SeqIO.pm, because there was no obvious reason to keep them. Since
>I still think that the 'use' statements are better in those files
>where the modules are really used, I left it that way and added
>the necessary use statements to all other SeqIO modules (which
>probably would all have complained sooner or later).
>
> > and edited the @ISA array initialization line:
> >
> > @ISA = qw(Bio::SeqIO Bio::Seq);
> >
>
>We don't want SeqIO modules to inherit from Bio::Seq.
>
>   Hilmar
>--
>-----------------------------------------------------------------
>Hilmar Lapp email: hlapp@gmx.net
>GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
>-----------------------------------------------------------------
>
>--__--__--
>
>Message: 13
>Date: Mon, 29 Jan 2001 23:04:21 -0800
>From: Hilmar Lapp
>Organization: Nereis 4
>To: Bioperl
>CC: Jason Stajich
>Subject: [Bioperl-l] Root::Object in bioxml.pm
>
>SeqIO::bioxml.pm still inherits from Root::Object. Is there a
>particular reason that this one's an exception?
>
>   Hilmar
>--
>-----------------------------------------------------------------
>Hilmar Lapp email: hlapp@gmx.net
>GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
>-----------------------------------------------------------------
>
>--__--__--
>
>Message: 14
>Date: Mon, 29 Jan 2001 23:10:26 -0800
>From: Hilmar Lapp
>Organization: Nereis 4
>To: Mark Wilkinson
>CC: bioperl-l@bioperl.org
>Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/888
>
>bioperl-bugs@bioperl.org wrote:
> >
> > Generic Features created from a GFF string do not
> > record Frame information, and when dumping the feature
> > out as GFF it is invariably reported as frame = 0.
> >
> > The problem is multi-fold:
> >
> > (1) the _from_gff_string and _from_gff2_string
> > subroutines in Generic.pm do not contain any code to handle the
> > recording of Frame information in the feature object
> >
> > (2) GFF allows a "." as the frame (meaning info not available),
> > while $Feature only allows 0,1, or 2. Thus it isn't clear how a
> > GFF frame of "." should be recorded. My first thought was that a
> > value of undef might return "." in a call to SeqFeatureI::gff_string,
> > however...
> >
> > (3) ...it appears that even if there is no frame information
> > available in a Feature object, it nevertheless passes the
> > $Feature->can('frame') test in SeqFeatureI::gff_string
> > and returns a (default??) value of 0 for the $Feature->frame call
> > (though there *is* code there to assign the frame to
> > "." if it fails the ->can test...)
> >
> > I am willing to fix this problem myself, but I would appreciate having
> > a consensus from the group about which level of the problem needs to be
> > fixed to keep everyone else's code happy.
> >
>
>I think that frame information should be consistent between GFF
>representation and object representation. '.' is equivalent to
>undef, and otherwise the frame should be 0, 1, or 2, regardless of
>object or GFF string.
>
>   Hilmar
>
>--
>-----------------------------------------------------------------
>Hilmar Lapp email: hlapp@gmx.net
>GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
>-----------------------------------------------------------------
>
>--__--__--
>
>Message: 15
>Date: Tue, 30 Jan 2001 09:14:42 +0000 (GMT)
>From: Ewan Birney
>To: Jason Stajich
>cc: Bioperl
>Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
>
>On Mon, 29 Jan 2001, Jason Stajich wrote:
>
> > What is the feeling here, we have this old way of doing things which
> > included using the value 'EXPAND' to determine if we should expand the
> > start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> > I think this should likely be better modeled through a SplitLocationI
> > which is just a container of LocationObjects. So I propose to remove all
> > references to 'EXPAND' which means removing the method _expand_region and
> > updating add_sub_Feature to deal with adding the locations. Similarly the
> > flush_sub_SeqFeature should flush the locations, but I'm not sure about
> > what the start/end should be reset to...
>
>I guess agree (I am wincing at every one of these decisions you know. It
>just pains me to see us have to handle this object complexity in
>essentially simple objects. Bugger-it! I know there is no way out here,
>but .... it goes against the grain).
>
> >
> > I also had to update FeaturePair to add the method location() which
> > delegates to feature1()->location() otherwise things won't work correctly.
> > start/end are defined by feature1 object so location should also reside
> > in feature1.
> >
>
>That is the consistent route here...
>
> >
> > Jason
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
>-----------------------------------------------------------------
>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
>.
>-----------------------------------------------------------------
>
>
>
>--__--__--
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@bioperl.org
>http://bioperl.org/mailman/listinfo/bioperl-l
>
>
>End of Bioperl-l Digest

--
Yuandan Zhang


From birney@ebi.ac.uk Tue Jan 30 15:08:35 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 15:08:35 +0000 (GMT) Subject: [Bioperl-l] Re: Bioperl-l digest, Vol 1 #200 - 15 msgs In-Reply-To: <4.2.0.58.20010130084139.00ad6560@ydzhang.mail.iastate.edu>
On Tue, 30 Jan 2001, Yuandan Zhang wrote:

> Hi,
> I am new to bioperl, very patinate in it. Is there any tutorial materials
> available or any collection of example scripts for beginners to make a start?

A tutorial will be available in the 0.7 release (due to be branched soon).

>
> Yuandan
>
> At 04:17 AM 1/30/01 -0500, you wrote:
> >Send Bioperl-l mailing list submissions to
> >        bioperl-l@bioperl.org
> >
> >To subscribe or unsubscribe via the World Wide Web, visit
> >        http://bioperl.org/mailman/listinfo/bioperl-l
> >or, via email, send a message with subject or body 'help' to
> >        bioperl-l-request@bioperl.org
> >
> >You can reach the person managing the list at
> >        bioperl-l-admin@bioperl.org
> >
> >When replying, please edit your Subject line so it is more specific
> >than "Re: Contents of Bioperl-l digest..."
> >
> >
> >Today's Topics:
> >
> >   1. Genscan exon frame computation (Hilmar Lapp)
> >   2. LiveSeq back working (Joseph Insana)
> >   3. SeqFeature::Generic broken? no Location::Simple.pm (Mark Wilkinson)
> >   4. Re: SeqFeature::Generic broken? no Location::Simple.pm (Jason Stajich)
> >   5. Re: LiveSeq back working (Jason Stajich)
> >   6. Re: SeqFeature::Generic broken? no Location::Simple.pm (Ewan Birney)
> >   7. Bio Framework and XML (Iain Darroch)
> >   8. Bio::SeqFeature::Generic add_sub_SeqFeature (Jason Stajich)
> >   9. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Hilmar Lapp)
> >  10. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Jason Stajich)
> >  11. Re: RetrictionEnzyme.pm: a proposal (Hilmar Lapp)
> >  12. Re: missing use statements (Hilmar Lapp)
> >  13. Root::Object in bioxml.pm (Hilmar Lapp)
> >  14. Re: [Bioperl-guts-l] Notification: incoming/888 (Hilmar Lapp)
> >  15. Re: Bio::SeqFeature::Generic add_sub_SeqFeature (Ewan Birney)
> >
> >--__--__--
> >
> >Message: 1
> >Date: Mon, 29 Jan 2001 09:24:03 -0800
> >From: Hilmar Lapp
> >Organization: Nereis 4
> >To: Bioperl
> >Subject: [Bioperl-l] Genscan exon frame computation
> >
> >A revisit of this is on the task list. I had a discussion a while
> >ago with Mark Dalphin, because he claimed that he managed to
> >figured out the exon frame based on start coordinate and frame
> >value.
> >
> >I still don't fully understand his code sample, as he was also
> >using his own definition of frame. Still, the discussion let me
> >see how one can figure out the frame. I've enclosed the relevant
> >code section of my implementation below. Whoever feels in the
> >position please review and double-check.
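To review the frame computation, and to support the "no intervening stops" test proposed in this message, the logic can be pulled out into plain functions. The sketch below makes three assumptions:

- `$gs_frame` is the Genscan frame column (`$flds[6]` in the enclosed code).
- The sequence passed to `count_internal_stops` is already in coding orientation, i.e. reverse-complemented for minus-strand exons.
- As the comment in the code says, the exon frame is the number of bases the first codon is missing, so the first complete codon starts at `(3 - frame) % 3`.

Both function names are hypothetical. This is not the Genpred.t test itself.

```perl
use strict;
use warnings;

# Restatement of the frame computation from the enclosed code.
# $strand is 1 or -1; $start/$end are forward-strand coordinates.
sub exon_frame {
    my ($strand, $start, $end, $gs_frame) = @_;
    my $cod_offset;
    if ($strand == 1) {
        $cod_offset = $gs_frame - (($start - 1) % 3);
        $cod_offset += 3 if $cod_offset < 1;     # -2,-1,0 -> 1,2,3
    } else {
        $cod_offset = $gs_frame - (($end - 3) % 3);
        $cod_offset -= 3 if $cod_offset >= 0;    # 0,1,2 -> -3,-2,-1
        $cod_offset = -$cod_offset;
    }
    return 3 - $cod_offset;                      # always 0, 1 or 2
}

# Count stop codons in an exon read in its assigned frame. A stop in the
# last complete codon is ignored, since a terminal exon legitimately ends
# with one.
sub count_internal_stops {
    my ($exon_seq, $frame) = @_;
    my $first = (3 - $frame) % 3;    # offset of the first complete codon
    my @codons;
    for (my $i = $first; $i + 3 <= length($exon_seq); $i += 3) {
        push @codons, uc substr($exon_seq, $i, 3);
    }
    pop @codons if @codons && $codons[-1] =~ /\A(?:TAA|TAG|TGA)\z/;
    return scalar grep { /\A(?:TAA|TAG|TGA)\z/ } @codons;
}

# Forward-strand exon starting at base 2 with Genscan frame 0, and a toy
# sequence whose first codon is missing one base.
printf "frame=%d internal_stops=%d\n",
    exon_frame(1, 2, 100, 0), count_internal_stops('CCATGTGA', 1);
# prints: frame=1 internal_stops=0
```

A real test would loop over every exon of each prediction, fetch its sequence, and assert that `count_internal_stops` returns 0.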
> >
> >This will add a frame attribute to each individual exon, which
> >makes it possible to deliberately shuffle exons from one
> >prediction (for those who aren't aware: Genscan with default
> >parameters outputs only exons in the 'optimal path'; there may be
> >other exons which also achieve very good scores and the output of
> >which can be triggered by -subopt).
> >
> >Things still to do in this respect comprise of a rigorous test
> >(take all exons of each prediction, translate them individually in
> >the frame they've been assigned, and check that there are no
> >intervening stops) and an adaptation of cds() in GeneStructure.pm
> >(when concatenating exons, make sure that the frame of one and
> >frame/phase of the previous match, and if not, fill with Ns).
> >
> >If anyone volunteers to add the test to Genpred.t I'd be really
> >glad. This does not involve module design, just plain application
> >coding, and anyone literate in Perl/Bioperl should be able to jump
> >in here.
> >
> >Comments welcome, esp. regarding the cds() comment I made above.
> >
> >   Hilmar
> >--
> >-----------------------------------------------------------------
> >Hilmar Lapp email: hlapp@gmx.net
> >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> >-----------------------------------------------------------------
> >
> ># Figure out the frame of this exon. This is NOT the frame
> ># given by Genscan, which is the absolute frame of the base
> ># starting the first predicted complete codon. By comparing
> ># to the absolute frame of the first base we can compute the
> ># offset of the first complete codon to the first base of the
> ># exon, which determines the frame of the exon.
> >my $cod_offset;
> >if($predobj->strand() == 1) {
> >    $cod_offset = $flds[6] - (($predobj->start()-1) % 3);
> >    # Possible values are -2, -1, 0, 1, 2. -1 and -2 correspond
> >    # to offsets 2 and 1, resp. Offset 3 is the same as 0.
> >    $cod_offset += 3 if($cod_offset < 1);
> >} else {
> >    # On the reverse strand the Genscan frame also refers to
> >    # the first base of the first complete codon, but viewed
> >    # from forward, which is the third base viewed from
> >    # reverse.
> >    # Note that end() is in fact start() here because we always
> >    # annotate in forward direction (otherwise we wouldn't need
> >    # strand()).
> >    $cod_offset = $flds[6] - (($predobj->end()-3) % 3);
> >    # Possible values are -2, -1, 0, 1, 2. Due to the reverse
> >    # situation, {2,-1} and {1,-2} correspond to offsets
> >    # 1 and 2, resp. Offset 3 is the same as 0.
> >    $cod_offset -= 3 if($cod_offset >= 0);
> >    $cod_offset = -$cod_offset;
> >}
> ># Offsets 2 and 1 correspond to frame 1 and 2 (frame of exon
> ># is the frame of the first base relative to the exon, or the
> ># number of bases the first codon is missing).
> >$predobj->frame(3 - $cod_offset);
> >
> >--__--__--
> >
> >Message: 2
> >Date: Mon, 29 Jan 2001 17:48:19 +0000 (GMT)
> >From: Joseph Insana
> >Reply-To: insana@ebi.ac.uk
> >To: bioperl-l@bioperl.org
> >Subject: [Bioperl-l] LiveSeq back working
> >
> >LiveSeq is back working now.
> >The BioPerl loader was not working anymore because of the SplitLocation
> >change. It was using the subfeature method.
> >
> >Joseph Insana
> >
> >
> >--__--__--
> >
> >Message: 3
> >Date: Mon, 29 Jan 2001 11:38:24 -0600
> >From: Mark Wilkinson
> >Organization: PBI-NRC
> >To: bioperl-l@bioperl.org
> >Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
> >
> >Dear Group,
> >
> >I just cvs-updated and noticed that SeqFeature::Generic does not appear
> >to be functional anymore. It is calling on Bio/Location/Simple.pm
> >(line 122), which apparently does not exist. Is it just my installation
> >which is wonky, or is this a genuine bug?
> >
> >any advice appreciated.
> >
> >cheers all!
> >
> >M
> >
> >
> >--
> >---
> >Dr. Mark Wilkinson
> >Bioinformatics Group
> >National Research Council of Canada
> >Plant Biotechnology Institute
> >110 Gymnasium Place
> >Saskatoon, SK
> >Canada
> >
> >
> >
> >
> >--__--__--
> >
> >Message: 4
> >Date: Mon, 29 Jan 2001 13:05:34 -0500 (EST)
> >From: Jason Stajich
> >To: Mark Wilkinson
> >cc: bioperl-l@bioperl.org
> >Subject: Re: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
> >
> >you need to do
> >% cvs update -d
> >to get newly created directories.
> >
> >On Mon, 29 Jan 2001, Mark Wilkinson wrote:
> >
> > > Dear Group,
> > >
> > > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > > (line 122), which apparently does not exist. Is it just my installation
> > > which is wonky, or is this a genuine bug?
> > >
> > > any advice appreciated.
> > >
> > > cheers all!
> > >
> > > M
> > >
> > >
> > > --
> > > ---
> > > Dr. Mark Wilkinson
> > > Bioinformatics Group
> > > National Research Council of Canada
> > > Plant Biotechnology Institute
> > > 110 Gymnasium Place
> > > Saskatoon, SK
> > > Canada
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> >Jason Stajich
> >jason@chg.mc.duke.edu
> >Center for Human Genetics
> >Duke University Medical Center
> >http://www.chg.duke.edu/
> >
> >
> >
> >--__--__--
> >
> >Message: 5
> >Date: Mon, 29 Jan 2001 13:05:59 -0500 (EST)
> >From: Jason Stajich
> >To: Joseph Insana
> >cc: bioperl-l@bioperl.org
> >Subject: Re: [Bioperl-l] LiveSeq back working
> >
> >Thanks for fixing this, I wasn't sure where to go to look.
> >
> >On Mon, 29 Jan 2001, Joseph Insana wrote:
> >
> > > LiveSeq is back working now.
> > > The BioPerl loader was not working anymore because of the SplitLocation
> > > change. It was using the subfeature method.
> > >
> > > Joseph Insana
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> >Jason Stajich
> >jason@chg.mc.duke.edu
> >Center for Human Genetics
> >Duke University Medical Center
> >http://www.chg.duke.edu/
> >
> >
> >
> >--__--__--
> >
> >Message: 6
> >Date: Mon, 29 Jan 2001 18:14:14 +0000 (GMT)
> >From: Ewan Birney
> >To: Mark Wilkinson
> >cc: bioperl-l@bioperl.org
> >Subject: Re: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
> >
> >On Mon, 29 Jan 2001, Mark Wilkinson wrote:
> >
> > > Dear Group,
> > >
> > > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > > (line 122), which apparently does not exist. Is it just my installation
> > > which is wonky, or is this a genuine bug?
> > >
> >cvs update -d
> >
> > >
> > > any advice appreciated.
> > >
> > > cheers all!
> > >
> > > M
> > >
> > >
> > > --
> > > ---
> > > Dr. Mark Wilkinson
> > > Bioinformatics Group
> > > National Research Council of Canada
> > > Plant Biotechnology Institute
> > > 110 Gymnasium Place
> > > Saskatoon, SK
> > > Canada
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> >-----------------------------------------------------------------
> >Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> >.
> >-----------------------------------------------------------------
> >
> >
> >--__--__--
> >
> >Message: 7
> >Date: Mon, 29 Jan 2001 18:40:07 +0000 (GMT)
> >From: Iain Darroch
> >To: bioperl-l@bioperl.org
> >Subject: [Bioperl-l] Bio Framework and XML
> >
> >Hi All,
> >
> >I am currently looking at ways of integrating biological systems. I saw
> >mentioned in some of the documentation that a Bio-Object Framework was
> >proposed. Also that XML could be used in meta data for describing
> >bioinformatics objects.
> >
> >I was wondering what the current situation of both these were.
> >
> >Has anyone implemented parsers yet?
> >
> >Thanks in advance
> >
> >Iain
> >
> >
> >
> >
> >--__--__--
> >
> >Message: 8
> >Date: Mon, 29 Jan 2001 14:33:59 -0500 (EST)
> >From: Jason Stajich
> >To: Bioperl
> >Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
> >
> >What is the feeling here, we have this old way of doing things which
> >included using the value 'EXPAND' to determine if we should expand the
> >start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> >I think this should likely be better modeled through a SplitLocationI
> >which is just a container of LocationObjects. So I propose to remove all
> >references to 'EXPAND' which means removing the method _expand_region and
> >updating add_sub_Feature to deal with adding the locations. Similarly the
> >flush_sub_SeqFeature should flush the locations, but I'm not sure about
> >what the start/end should be reset to...
> >
> >I also had to update FeaturePair to add the method location() which
> >delegates to feature1()->location() otherwise things won't work correctly.
> >start/end are defined by feature1 object so location should also reside
> >in feature1.
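Jason's point about FeaturePair is a plain case of method delegation: location() should delegate to feature1 because start/end already come from feature1. A minimal sketch with hypothetical `Demo::` classes, not the real Bio::SeqFeature::FeaturePair:

```perl
use strict;
use warnings;

# Hypothetical minimal classes showing delegation; NOT the Bioperl code.
package Demo::Location;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package Demo::Feature;
sub new      { my ($class, %args) = @_; return bless { %args }, $class }
sub location { $_[0]{location} }
sub start    { $_[0]->location->start }
sub end      { $_[0]->location->end }

package Demo::FeaturePair;
sub new {
    my ($class, $f1, $f2) = @_;
    return bless { feature1 => $f1, feature2 => $f2 }, $class;
}
sub feature1 { $_[0]{feature1} }
sub feature2 { $_[0]{feature2} }

# start/end are taken from feature1, so location() delegates there as well,
# keeping all three answers consistent.
sub start    { $_[0]->feature1->start }
sub end      { $_[0]->feature1->end }
sub location { $_[0]->feature1->location }

package main;

my $pair = Demo::FeaturePair->new(
    Demo::Feature->new(location => Demo::Location->new(start => 10,  end => 50)),
    Demo::Feature->new(location => Demo::Location->new(start => 200, end => 240)),
);
printf "%d..%d\n", $pair->location->start, $pair->location->end;   # 10..50
```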
> >
> >Jason
> >
> >Jason Stajich
> >jason@chg.mc.duke.edu
> >Center for Human Genetics
> >Duke University Medical Center
> >http://www.chg.duke.edu/
> >
> >
> >
> >
> >--__--__--
> >
> >Message: 9
> >Date: Mon, 29 Jan 2001 13:09:40 -0800
> >From: Hilmar Lapp
> >Organization: GNF
> >To: Jason Stajich
> >Cc: Bioperl
> >Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
> >
> >Jason Stajich wrote:
> > >
> > > What is the feeling here, we have this old way of doing things which
> > > included using the value 'EXPAND' to determine if we should expand the
> > > start/end space for a feature when adding a sub_SeqFeature to a feature?
> > >
> > > I think this should likely be better modeled through a SplitLocationI
> > > which is just a container of LocationObjects. So I propose to remove all
> > > references to 'EXPAND' which means removing the method _expand_region and
> > > updating add_sub_Feature to deal with adding the locations. Similarly
> >
> >Can't we keep a separate method for coping with region extension due
> >to a new subfeature, in whatever way the extension is done? As far as
> >I can remember I had a good reason to put it into its own method, I
> >needed it separately from add_sub_SeqFeature().
> >
> >   Hilmar
> >--
> >-------------------------------------------------------------
> >Hilmar Lapp email: lapp@gnf.org
> >GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> >-------------------------------------------------------------
> >
> >--__--__--
> >
> >Message: 10
> >Date: Mon, 29 Jan 2001 16:26:21 -0500 (EST)
> >From: Jason Stajich
> >To: Hilmar Lapp
> >cc: Bioperl
> >Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
> >
> >On Mon, 29 Jan 2001, Hilmar Lapp wrote:
> >
> > > Jason Stajich wrote:
> > > >
> > > > What is the feeling here, we have this old way of doing things which
> > > > included using the value 'EXPAND' to determine if we should expand the
> > > > start/end space for a feature when adding a sub_SeqFeature to a feature?
> > > >
> > > > I think this should likely be better modeled through a SplitLocationI
> > > > which is just a container of LocationObjects. So I propose to remove all
> > > > references to 'EXPAND' which means removing the method _expand_region and
> > > > updating add_sub_Feature to deal with adding the locations. Similarly
> > >
> > > Can't we keep a separate method for coping with region extension due
> > > to a new subfeature, in whatever way the extension is done? As far as
> > > I can remember I had a good reason to put it into its own method, I
> > > needed it separately from add_sub_SeqFeature().
> >
> >I guess it is more sane to let SeqFeature::Generic handle the common case
> >and the split location case will need to be handled elsewhere.
> >
> >In the special case of a feature with multiple locations that feature (or
> >object creating it) will take care of updating the location object to
> >point to a splitlocation object. For example, if we choose to have CDS be
> >represented as a SplitLocation with the exons being the parts in the
> >join(...) statement. This will have to be negotiated by the object
> >creating the Gene/CDS object.
> >
> >Okay so no changes to check in for Generic.
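Jason gives the example of a CDS represented as a split location, with the exons as the parts of a `join(...)`. That amounts to a container of simple locations whose overall span runs from the smallest start to the largest end. The sketch below is hypothetical: `Demo::SplitLocation`, `add_part` and `as_join_string` are illustrative names, not the Bioperl SplitLocationI interface.

```perl
use strict;
use warnings;

# Hypothetical container of simple start/end parts; NOT a Bioperl class.
package Demo::SplitLocation;

sub new { my ($class) = @_; return bless { parts => [] }, $class }

# Append one part (e.g. an exon); returns $self so calls can be chained.
sub add_part {
    my ($self, $start, $end) = @_;
    push @{ $self->{parts} }, [ $start, $end ];
    return $self;
}

# Overall span: smallest start and largest end over all parts.
sub start {
    my ($self) = @_;
    my ($min) = sort { $a <=> $b } map { $_->[0] } @{ $self->{parts} };
    return $min;
}

sub end {
    my ($self) = @_;
    my ($max) = sort { $b <=> $a } map { $_->[1] } @{ $self->{parts} };
    return $max;
}

# GenBank-style join(...) string, parts in the order they were added.
sub as_join_string {
    my ($self) = @_;
    return 'join(' . join(',', map { "$_->[0]..$_->[1]" } @{ $self->{parts} }) . ')';
}

package main;

my $cds = Demo::SplitLocation->new;
$cds->add_part(100, 180)->add_part(250, 330)->add_part(400, 460);
printf "%s\n", $cds->as_join_string;          # join(100..180,250..330,400..460)
printf "%d..%d\n", $cds->start, $cds->end;    # 100..460
```

In this sketch the object that builds the gene or CDS is responsible for adding the parts, which mirrors Jason's conclusion that SeqFeature::Generic itself stays unchanged.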
> >
> > >
> > >   Hilmar
> > > --
> > > -------------------------------------------------------------
> > > Hilmar Lapp email: lapp@gnf.org
> > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > > -------------------------------------------------------------
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> >Jason Stajich
> >jason@chg.mc.duke.edu
> >Center for Human Genetics
> >Duke University Medical Center
> >http://www.chg.duke.edu/
> >
> >
> >
> >--__--__--
> >
> >Message: 11
> >Date: Mon, 29 Jan 2001 13:57:13 -0800
> >From: Hilmar Lapp
> >Organization: GNF
> >To: Paul-Christophe Varoutas
> >Cc: bioperl-l@bioperl.org
> >Subject: Re: [Bioperl-l] RetrictionEnzyme.pm: a proposal
> >
> >Paul-Christophe Varoutas wrote:
> > >
> > > Tell me what you think about it:
> > > - First of all, is redesigning possible or are we obliged to maintain
> > > compatibility ? In the latter case I will just add functionality,
> > > maintaining the poor design of the module.
> > > - If redesigning is possible, please make comments/suggestions.
> > >
> >
> >First of all, keeping compatibility is a very good thing. Every user
> >of your software will appreciate it if he/she knows that this is taken
> >seriously.
> >
> >In general, my opinion is if there's no strong reason to break
> >compatibility, then don't break it. On the other hand, if there is a
> >good reason, then don't hesitate.
> >
> >This means, yes, redesigning is possible, but a nicer design by itself
> >is not a good reason to break compatibility. If the existing design is
> >sort of prohibitive for adding certain new functionality, this might
> >justify breaking compatibility. An example is the new location model,
> >but in fact Jason could manage to keep compatibility. I suggest that
> >you carefully examine whether you indeed can't redesign and at the
> >same time keep compatibility. Based on your proposal I don't see the
> >prohibitive point yet.
> >
> >As for the release, this issue is not on the task list, which means
> >that you are on your own. There's a deadline next week, and we don't
> >want to lose focus. If you finish the code and submit an accompanying
> >rigorous test in t/* on time, it can make it into the release though,
> >provided that there are no objections should you introduce
> >incompatibilities.
> >
> >As a last remark, a design that isn't prepared very well for an
> >extension one has in mind is not necessarily poor. It may just have
> >been perfect for its original scope. And: I really think that there is
> >no such thing as a "correct" design. Design may be bad or may be good,
> >generic or tailored, or whatever, it just depends on your viewpoint,
> >that is, on the particular problem you want to solve.
> >
> >   Hilmar
> >--
> >-------------------------------------------------------------
> >Hilmar Lapp email: lapp@gnf.org
> >GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> >-------------------------------------------------------------
> >
> >--__--__--
> >
> >Message: 12
> >Date: Mon, 29 Jan 2001 23:02:45 -0800
> >From: Hilmar Lapp
> >Organization: Nereis 4
> >To: Bioperl
> >Subject: Re: [Bioperl-l] missing use statements
> >
> >Paul-Christophe Varoutas wrote:
> > >
> > > so I just added one line at the beginning of the module to load Bio::Seq:
> > >
> > > use Bio::Seq;
> > >
> >
> >Thanks for pointing this out. The reason this became necessary all
> >of a sudden was probably that I removed the respective lines from
> >SeqIO.pm, because there was no obvious reason to keep them. Since
> >I still think that the 'use' statements are better in those files
> >where the modules are really used, I left it that way and added
> >the necessary use statements to all other SeqIO modules (which
> >probably would all have complained sooner or later).
> > > > > and edited the @ISA array initialization line: > > > > > > @ISA = qw(Bio::SeqIO Bio::Seq); > > > > > > >We don't want SeqIO modules to inherit from Bio::Seq. > > > > Hilmar > >-- > >----------------------------------------------------------------- > >Hilmar Lapp email: hlapp@gmx.net > >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > >----------------------------------------------------------------- > > > >--__--__-- > > > >Message: 13 > >Date: Mon, 29 Jan 2001 23:04:21 -0800 > >From: Hilmar Lapp > >Organization: Nereis 4 > >To: Bioperl > >CC: Jason Stajich > >Subject: [Bioperl-l] Root::Object in bioxml.pm > > > >SeqIO::bioxml.pm still inherits from Root::Object. Is there a > >particular reason that this one's an exception? > > > > Hilmar > >-- > >----------------------------------------------------------------- > >Hilmar Lapp email: hlapp@gmx.net > >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > >----------------------------------------------------------------- > > > >--__--__-- > > > >Message: 14 > >Date: Mon, 29 Jan 2001 23:10:26 -0800 > >From: Hilmar Lapp > >Organization: Nereis 4 > >To: Mark Wilkinson > >CC: bioperl-l@bioperl.org > >Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/888 > > > >bioperl-bugs@bioperl.org wrote: > > > > > > Generic Features created from a GFF string do not > > > record Frame information, and when dumping the feature > > > out as GFF it is invariably reported as frame = 0. > > > > > > The problem is multi-fold: > > > > > > (1) the _from_gff_string and _from_gff2_string > > > subroutines in Generic.pm do not contain any code to handle the > > > recording of Frame information in the feature object > > > > > > (2) GFF allows a "." as the frame (meaning info not available), > > > while $Feature only allows 0,1, or 2. Thus it isn't clear how a > > > GFF frame of "." should be recorded. My first thought was that a > > > value of undef might return "." in a call to SeqFeatureI::gff_string, > > > however... 
> > > > > > (3) ...it appears that even if there is no frame information > > > available in a Feature object, it nevertheless passes the > > > $Feature->can('frame') test in SeqFeatureI::gff_string > > > and returns a (default??) value of 0 for the $Feature->frame call > > > (though there *is* code there to assign the frame to > > > "." if it fails the ->can test...) > > > > > > I am willing to fix this problem myself, but I would appreciate having > > > a consensus from the group about which level of the problem needs to be > > > fixed to keep everyone else's code happy. > > > > > > >I think that frame information should be consistent between GFF > >representation and object representation. '.' is equivalent to > >undef, and otherwise the frame should be 0, 1, or 2, regardless of > >object or GFF string. > > > > Hilmar > > > >-- > >----------------------------------------------------------------- > >Hilmar Lapp email: hlapp@gmx.net > >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > >----------------------------------------------------------------- > > > >--__--__-- > > > >Message: 15 > >Date: Tue, 30 Jan 2001 09:14:42 +0000 (GMT) > >From: Ewan Birney > >To: Jason Stajich > >cc: Bioperl > >Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature > > > >On Mon, 29 Jan 2001, Jason Stajich wrote: > > > > > What is the feeling here, we have this old way of doing things which > > > included using the value 'EXPAND' to determine if we should expand the > > > start/end space for a feature when adding a sub_SeqFeature to a feature? > > > > > > I think this should likely be better modeled through a SplitLocationI > > > which is just a container of LocationObjects. So I propose to remove all > > > references to 'EXPAND' which means removing the method _expand_region and > > > updating add_sub_Feature to deal with adding the locations. 
Similarly the > > > flush_sub_SeqFeature should flush the locations, but I'm not sure about > > > what the start/end should be reset to... > > > >I guess agree (I am wincing at every one of these decisions you know. It > >just pains me to see us have to handle this object complexity in > >essentially simple objects. Bugger-it! I know there is no way out here, > >but .... it goes against the grain). > > > > > > > > I also had to update FeaturePair to add the method location() which > > > delegates to feature1()->location() otherwise things won't work correctly. > > > start/end are defined by feature1 object so location should also reside > > > in feature1. > > > > > > >That is the consistent route here... > > > > > > > Jason > > > > > > Jason Stajich > > > jason@chg.mc.duke.edu > > > Center for Human Genetics > > > Duke University Medical Center > > > http://www.chg.duke.edu/ > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > >----------------------------------------------------------------- > >Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > >. > >----------------------------------------------------------------- > > > > > > > >--__--__-- > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@bioperl.org > >http://bioperl.org/mailman/listinfo/bioperl-l > > > > > >End of Bioperl-l Digest > > > > > -- > Yuandan Zhang > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
-----------------------------------------------------------------
From hlapp@gmx.net Tue Jan 30 18:28:59 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 10:28:59 -0800 Subject: [Bioperl-l] Status 0.7 Message-ID: <3A7707EB.C318F6EF@gmx.net>

We're rapidly approaching next Monday's deadline for the 0.7 code freeze. I've indicated on the tasklist where in the sequence of tasks I define the freeze to be.

In essence, to be realistic the freeze will in fact be a functionality freeze, that is, on Monday next week all tasks on the list before the freeze should be completed up to the stage of remaining bug-fixes (i.e. dark green). Tasks not completed until then are likely to be dropped.

As I said, bug fixes and documentation additions/fixes (I consider every piece of added documentation essentially a bug-fix, because missing documentation constitutes a bug) are exempt from the freeze. I suggest, however, that these fixes begin immediately after the freeze, and do not take longer than 1 week. In parallel to fixing known bugs (known from the bug-tracker) the package shall be tested on various systems and against the projects we want to be compatible with (Mac, Win32, Perl 5.004, Ensembl, bioperl-gui, bioperl-corba), which will certainly reveal additional bugs/problems. The goal is to have the code branch-ready within one week after the freeze, the quicker the better.

Just to note the obvious: to keep the release phase as free as possible from unnecessary interference, I will not accept module or code submissions from the point of freeze until actually branching off the release.

The situation actually doesn't look bad, the patchwork carpet is more and more greenish.
The remaining sore points are

o RichSeq interface, implementation, and adoption by parsers (Ewan)
o SeqAnalysisParser/SeqFeatureProducer revisit (Hilmar & Jason, Ewan)
o Transcript/GeneStructure & frame-aware cds() (Hilmar)
o BioCorba 0.2 interoperability (Jason)

Others than the three of us mentioned probably can't sensibly jump in any of these. However, you can provide support by testing code, looking through documentation, pointing out errors and undocumented methods etc, and, most importantly, development-wise by implementing tests which is BTW a good way of learning the package.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 30 18:34:43 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 10:34:43 -0800 Subject: [Bioperl-l] SeqFeatureI review Message-ID: <3A770943.8A3A89EA@gmx.net>

I think I posted this already but can't dig it up any more. This is on our tasklist. Are there any other issues in SeqFeatureI we wanted to revisit apart from location-related stuff?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From dblock@gene.pbi.nrc.ca Tue Jan 30 19:08:36 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 13:08:36 -0600 (CST) Subject: [Bioperl-l] SeqFeatureI review In-Reply-To: <3A770943.8A3A89EA@gmx.net> Message-ID:

Has the Err problem been fixed? Bugs were posted numerous times. I think the error came from Root::Object - so it is irrelevant, correct?

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> I think I posted this already but can't dig it up any more. This
> is on our tasklist. Are there any other issues in SeqFeatureI we
> wanted to revisit apart from location-related stuff?
>
> Hilmar
> --
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From hlapp@gmx.net Tue Jan 30 19:20:51 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:20:51 -0800 Subject: [Bioperl-l] SeqFeatureI review References: Message-ID: <3A771413.44154823@gmx.net>

David Block wrote:
>
> Has the Err problem been fixed? Bugs were posted numerous times. I think
> the error came from Root::Object - so it is irrelevant, correct?
>

I'm not sure. Can you dig up such a report, point to the respective number in the bug tracker?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From dblock@gene.pbi.nrc.ca Tue Jan 30 19:26:35 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 13:26:35 -0600 (CST) Subject: [Bioperl-l] SeqFeatureI review In-Reply-To: <3A771413.44154823@gmx.net> Message-ID:

One of them was # 855. It still bites us. We also got mail off-list from others who were unable to use 6.2 because of it.

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> David Block wrote:
> >
> > Has the Err problem been fixed? Bugs were posted numerous times. I think
> > the error came from Root::Object - so it is irrelevant, correct?
> >
>
> I'm not sure. Can you dig up such a report, point to the
> respective number in the bug tracker?
>
> Hilmar
> --
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From hlapp@gmx.net Tue Jan 30 19:35:46 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:35:46 -0800 Subject: [Bioperl-l] Bio::Root::Object cleanup Message-ID: <3A771792.DB06ACA6@gmx.net>

In an attempt to tidy up our transition to Bio::Root::RootI I added a note about deprecation to Root/Object.pm and a warning to _initialize(). The warning will be suppressed for modules of which we know that they're in love with Root::Object.

These revealed some to me unexpected modules. In total, the following modules contain a 'use Bio::Root::Object' statement:

Bio/Root/Global.pm (*)
Bio/Root/Err.pm (*)
Bio/Root/IOManager.pm (*)
Bio/Root/Utilities.pm (*)
Bio/Root/Vector.pm (*)
Bio/Root/Xref.pm (*)
Bio/Search/Hit/HitI.pm (?)
Bio/SeqIO/bioxml.pm (?)
Bio/Tools/Blast/Sbjct.pm (*)
Bio/Tools/Blast/HSP.pm (*)
Bio/Tools/Blast/Run/LocalBlast.pm (*)
Bio/Tools/AlignFactory.pm
Bio/Tools/IUPAC.pm
Bio/Tools/SeqAnal.pm (*)
Bio/Tools/SeqPattern.pm
Bio/Tools/WWW.pm
Bio/Tools/PPSEARCH/Parse.pm
Bio/Tools/PRFSCAN/Parse.pm
Bio/Tools/PRINTS/Parse.pm

Those marked with (*) are obvious. Those marked with (?) are likely to be absent from the release. I presently have no overview of the others. Jason, did you leave them out on purpose?

In addition, the Variation code contains the line

Bio/Variation/IO.pm: return Bio::Root::Object::new($class, %param);

Heikki, I don't know about the context, just wanted to make sure this is indispensable.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 30 19:41:32 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:41:32 -0800 Subject: [Bioperl-l] LiveSeq tests warn Message-ID: <3A7718EC.736A4F85@gmx.net>

Just to let you know, I'm getting warnings on my machine from LiveSeq.t and Mutator.t. Could you check whether this might indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)

Hilmar

t/LiveSeq...........Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 380.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 385.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 393.
ok
t/Mutator...........Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 380.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 385.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 393.
ok
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 30 19:49:22 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:49:22 -0800 Subject: [Bioperl-l] SeqFeatureI review References: Message-ID: <3A771AC2.F61D9FBB@gmx.net>

David Block wrote:
>
> One of them was # 855. It still bites us. We also got mail off-list from
> others who were unable to use 6.2 because of it.
>

This one should be gone, first because it was fixed, second, because Err.pm shouldn't be used in many modules (basically only the Blast modules are left). Does it really still exist in a main-trunk checkout?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 30 20:40:38 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 30 Jan 2001 15:40:38 -0500 (EST) Subject: [Bioperl-l] Bio::Root::Object cleanup In-Reply-To: <3A771792.DB06ACA6@gmx.net> Message-ID:

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> In an attempt to tidy up our transition to Bio::Root::RootI I
> added a note about deprecation to Root/Object.pm and a warning to
> _initialize(). The warning will be suppressed for modules of which
> we know that they're in love with Root::Object.
>
> These revealed some to me unexpected modules. In total, the
> following modules contain a 'use Bio::Root::Object' statement:

I skipped most of SteveC's modules initially because he likes to utilize the functionality he built into Bio::Root::Object and Bio::Root::Global (understandably). below is log of what I just checked in.

>
> Bio/Root/Global.pm (*)
> Bio/Root/Err.pm (*)
> Bio/Root/IOManager.pm (*)
> Bio/Root/Utilities.pm (*)
> Bio/Root/Vector.pm (*)
> Bio/Root/Xref.pm (*)
> Bio/Search/Hit/HitI.pm (?)
I believe the entire Search dir is to be removed per Aaron Mackey saying that it is not in a usable state and probably never will....
> Bio/SeqIO/bioxml.pm (?)
I can fix it if we are definitely keeping it, I am under the impression it is to be trashed... not sure though.
> Bio/Tools/Blast/Sbjct.pm (*)
> Bio/Tools/Blast/HSP.pm (*)
> Bio/Tools/Blast/Run/LocalBlast.pm (*)
> Bio/Tools/AlignFactory.pm
fixed
> Bio/Tools/IUPAC.pm
fixed
> Bio/Tools/SeqAnal.pm (*)
> Bio/Tools/SeqPattern.pm
fixed
> Bio/Tools/WWW.pm
dependence on Bio::Root::Global and the $AUTHORITY variable which steve had this coded to his old stanford email address, I removed this dependence and hardcoded the $AUTHORITY var to be local to WWW.pm and have the value 'nobody@localhost' if anyone is using it they probably should speak up....
> Bio/Tools/PPSEARCH/Parse.pm
depends on old Bio::SeqFeatureSet which no longer exists, I'm not sure what to do here
> Bio/Tools/PRFSCAN/Parse.pm
ditto
> Bio/Tools/PRINTS/Parse.pm
ditto
>
> Those marked with (*) are obvious. Those marked with (?) are
> likely to be absent from the release. I presently have no overview
> of the others. Jason, did you leave them out on purpose?
>
> In addition, the Variation code contains the line
> Bio/Variation/IO.pm: return Bio::Root::Object::new($class,
> %param);
> Heikki, I don't know about the context, just wanted to make sure
> this is indispensable.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From mwilkinson@gene.pbi.nrc.ca Tue Jan 30 21:35:03 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Tue, 30 Jan 2001 15:35:03 -0600 Subject: [Bioperl-l] GO ontology browser module available Message-ID: <3A773386.CF87115A@gene.pbi.nrc.ca>

Dear Group,

Most of you will be familiar with the GO consortium project of putting together a common nomenclature for genome annotation. As part of the development of Workbench, I have thrown together a fairly simplistic Gene Ontology ("GO") parser/browser widget. It is able to parse the XML files available on the GO website, clean up the XML to make it compatible with the XML::Parser module (available from CPAN), and then dump the resulting hash using Data::Dumper. The dumped file can then be read into the GO_browser (which is an extension of a Tk::Text widget) and browsed as if it were a directory window, with double-clicks to navigate up and down the tree, and color coding of what are 'branches' and what are 'leaves'. Middle-clicks can be trapped in the external Tk::MainWindow to extract the selected ontology term and definition.

It is more or less a "plug in" module, similar in design to SeqCanvas - you create a Text widget, pass the Text widget to GO_Browser->new and it gives you back a browsable GO ontology.

Parsing the GO ontology files themselves takes about 4-5 minutes each, but this only has to be done once per GO release; the resulting hash-dump can be slurped into the GO_browser widget in a couple of seconds. I parse the GO ontology tree only to the point where GO-terms end and hard gene-names, examples, and bibliographic data begin.
This could easily be modified, however, as you wish.

Because this module doesn't really "fit" anywhere in the current BioPerl structure, and because the .xml files that it is based on are still quite fluid (and thus the module will likely have to be tweaked quite extensively until things settle down), I don't feel that it is worth adding into the BioPerl repository at this time. However, I would be glad to share it with anyone who might find it useful, with all the usual disclaimers :-)

Let me know,

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada

From birney@ebi.ac.uk Tue Jan 30 21:57:48 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 21:57:48 +0000 (GMT) Subject: [Bioperl-l] RichSeqI Message-ID:

To prove to hilmar that I am doing the RichSeqI stuff, I have committed the interface. Basically this is a trivial recasting of the "additional support" currently in Seq.pm which I will move out into Bio::Seq::RichSeq.pm

currently the interface looks like...

=head1 NAME

Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated sequences

=head1 SYNOPSIS

@secondary = $richseq->get_secondary_accessions;
$division = $richseq->division;
$mol = $richseq->molecule;
@dates = $richseq->get_dates;
$seq_version = $richseq->seq_version;


=head1 DESCRIPTION

This interface extends the Bio::SeqI interface to give additional functionality to sequences with richer data sources, in particular from database sequences (EMBL, GenBank and Swissprot).

Kris, Jason, Hilmar --- comments?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From birney@ebi.ac.uk Tue Jan 30 22:38:16 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 22:38:16 +0000 (GMT) Subject: [Bioperl-l] RichSeq Message-ID:

I have committed Bio::Seq::RichaSeq which implement the interface. I have adapted embl, genbank and swiss IO to work with it....

all very painless...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From lapp@gnf.org Tue Jan 30 23:05:52 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:05:52 -0800 Subject: [Bioperl-l] RichSeqI References: Message-ID: <3A7748D0.CD638437@gnf.org>

Ewan Birney wrote:
>
> To prove to hilmar that I am doing the RichSeqI stuff, I have committed
> the interface. Basically this is a trivial recasting of the "additional
> support" currently in Seq.pm which I will move out into
> Bio::Seq::RichSeq.pm
>
> currently the interface looks like...
>
> =head1 NAME
>
> Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated
> sequences
>
> =head1 SYNOPSIS
>
> @secondary = $richseq->get_secondary_accessions;
> $division = $richseq->division;
> $mol = $richseq->molecule;
> @dates = $richseq->get_dates;
> $seq_version = $richseq->seq_version;
>
>
> =head1 DESCRIPTION
>
> This interface extends the Bio::SeqI interface to give additional
> functionality to sequences with richer data sources, in particular from
> database sequences (EMBL, GenBank and Swissprot).
>
> Kris, Jason, Hilmar --- comments?
>

Sounds good. This is really the right direction.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From lapp@gnf.org Tue Jan 30 23:09:44 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:09:44 -0800 Subject: [Bioperl-l] RichSeq References: Message-ID: <3A7749B8.39CE1934@gnf.org>

Ewan Birney wrote:
>
> I have committed Bio::Seq::RichaSeq which implement the interface. I
                             ----^--- Typo?
> have
> adapted embl, genbank and swiss IO to work with it....
>
> all very painless...
>

Cool. I'm really glad that this makes it into the release. The reddish colors on the patchwork carpet retreat.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From lapp@gnf.org Tue Jan 30 23:10:38 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:10:38 -0800 Subject: [Bioperl-l] GO ontology browser module available References: <3A773386.CF87115A@gene.pbi.nrc.ca> Message-ID: <3A7749EE.576E0D9D@gnf.org>

Mark Wilkinson wrote:
>
> Because this module doesn't really "fit" anywhere in the current BioPerl
> structure, and because the .xml files that it is based on are still
> quite fluid (and thus the module will likely have to be tweaked quite
> extensively until things settle down), I don't feel that it is worth
> adding into the BioPerl repository at this time. However, I would be
> glad to share it with anyone who might find it useful, with all the
> usual disclaimers :-)
>
> Let me know,
>

Wouldn't it make sense to add it to bioperl-gui?

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From dblock@gene.pbi.nrc.ca Tue Jan 30 23:22:16 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 17:22:16 -0600 (CST) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To: <3A7749EE.576E0D9D@gnf.org> Message-ID:

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> Mark Wilkinson wrote:
> >
> > Because this module doesn't really "fit" anywhere in the current BioPerl
> > structure, and because the .xml files that it is based on are still
> > quite fluid (and thus the module will likely have to be tweaked quite
> > extensively until things settle down), I don't feel that it is worth
> > adding into the BioPerl repository at this time. However, I would be
> > glad to share it with anyone who might find it useful, with all the
> > usual disclaimers :-)
> >
> > Let me know,
> >
>
> Wouldn't it make sense to add it to bioperl-gui?
>
> Hilmar
>

Inasmuch as it is completely separate from SeqCanvas, and we are still thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater than SeqCanvas, maybe. Mark? I think it would be okay.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

From insana@ebi.ac.uk Wed Jan 31 00:08:03 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Wed, 31 Jan 2001 00:08:03 +0000 (GMT) Subject: [Bioperl-l] Re: LiveSeq tests warn In-Reply-To: <3A7718EC.736A4F85@gmx.net> Message-ID:

> Just to let you know, I'm getting warnings on my machine from
> LiveSeq.t and Mutator.t. Could you check whether this might
> indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)

Strange, I have nothing like that. Hmmmm.

It seems it's complaining because I used "ne" instead than "!=" to test for something to be -1 or not -1. My perl is not complaining. I am running perl v5.6.0 on linux 2.4.0.
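Joseph's diagnosis turns on the difference between Perl's two inequality operators: "!=" converts both operands to numbers and, under "use warnings", warns when one of them is a string such as "LiveSeq", while "ne" compares strings and never performs that conversion. The standalone script below is not the LiveSeq code (lines 1202-1215 of SeqI.pm are not quoted in this thread); it simply runs the same kind of non-numeric value through both operators and counts the warnings. Which operator those LiveSeq lines actually use is not visible here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $value = 'LiveSeq';    # a non-numeric string, like those in the report

# Numeric inequality: 'LiveSeq' is converted to 0, which triggers an
# "isn't numeric" warning; 0 != -1 is then true.
my $numeric = ($value != -1);

# String inequality: compares 'LiveSeq' with the string '-1'; no
# numeric conversion takes place, so no warning is raised.
my $string = ($value ne -1);

print 'numeric: ', ($numeric ? 'true' : 'false'),
      ', string: ', ($string ? 'true' : 'false'),
      ', warnings: ', scalar(@warnings), "\n";
# prints: numeric: true, string: true, warnings: 1
```

Both comparisons give the same answer here; only the numeric one complains, so the choice of operator should follow what values can actually reach the test.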
Try please putting "!=" instead than "ne" and see if it gets fixed.

Joseph

From icarus@caffeine.doit.wisc.edu Wed Jan 31 00:17:39 2001 From: icarus@caffeine.doit.wisc.edu (Christopher Solomon) Date: Tue, 30 Jan 2001 18:17:39 -0600 (CST) Subject: [Bioperl-l] introduction Message-ID:

Heyas.

I've been looking over the bioperl site for a few weeks and thought it was about time I signed up for the mailing list. I'm eager to start learning about bioperl and how to use computation in general to solve biological problems.

As a person who is interested in this subject, I'd like to hear a little about what your backgrounds are. To be fair, I'll tell you a little of mine. I got my B.S. in Biology from U of Illinois, then went to grad school in Molecular Biology at U of Wisconsin. After two years in the grad program, I got sick of it and left. I then got a job on campus for the university help desk. The next year introduced me to linux and system administration. Which brought me out here to California. Now I'm doing perl development for an internet company.

I would eventually like to get back into science, but doing computational biology or bioinformatics. I figured getting involved with the bioperl project was as good a way to start as any, so here I am. I'd love to help out with anything I can, so if there are any lingering jobs nobody seems to want, well, I might take a crack, or any modules or such that just needs some cleaning up, I'm willing to help out where I can.

So please tell me a little about yourselves and what (if anything) bio and/or perl has to do with your current employment situation.

Christopher Solomon Jr.
Application Developer
ValueClick, Inc.
icarus@caffeine.doit.wisc.edu

From petertait@sympatico.ca Wed Jan 31 05:05:02 2001 From: petertait@sympatico.ca (Peter Tait) Date: Tue, 30 Jan 2001 21:05:02 -0800 Subject: [Bioperl-l] quit Message-ID: <3A779CFE.2D861F2D@sympatico.ca>

I would like to quit bioperl.
Thanks From hlapp@gmx.net Wed Jan 31 08:18:29 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 00:18:29 -0800 Subject: [Bioperl-l] RichSeqI References: <3A7748D0.CD638437@gnf.org> Message-ID: <3A77CA55.87E06E1@gmx.net> The interface looks slim, in fact very slim. Intentional or did you forget to commit? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From hlapp@gmx.net Wed Jan 31 08:51:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 00:51:44 -0800 Subject: [Bioperl-l] Bio::Root::Object cleanup References: Message-ID: <3A77D220.1D20E5E1@gmx.net> Jason Stajich wrote: > > > Bio/Tools/PPSEARCH/Parse.pm > depends on old Bio::SeqFeatureSet which no longer exists, I'm not sure > what to do here > > Bio/Tools/PRFSCAN/Parse.pm > ditto > > Bio/Tools/PRINTS/Parse.pm > ditto These modules are obviously not being maintained, nor are they functional, let alone test scripts. Does anyone know what the intended destiny for these modules is? The author seems to be Evgueni; is he still with EBI? There was another module Tools::SeqWords which escaped my grep; I fixed it to inherit from RootI and fixed also a couple of other bugs there. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 31 09:08:46 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 31 Jan 2001 01:08:46 -0800
Subject: [Bioperl-l] Bio::Factory
Message-ID: <3A77D61E.CF9D1413@gmx.net>

In an attempt to address revisit/finalization of the
SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept the
design change Ewan proposed a couple of weeks ago:

------
Why not have

Bio::SeqAnalysisParserFactoryI

$parser = $factory->create_parser(-fh => \*FILE);

Bio::SeqAnalyisParserI

while( $next_feature = $parser->next_feature ) {

}

same number of functions defined. Twice the number of interfaces, but
these are the interfaces I would argue we want.

An implementation could implement ParserFactoryI and ParserI in the
same module if so wished.
------

For the factory interface I propose to open a new directory
Bio::Factory, first to avoid cluttering of other directories, and
second because there are many places in BioPerl that can eventually
take advantage of a factory design (basically, wherever hard-coded
object creation occurs, e.g. in SeqIO::* etc), so that directory
hopefully won't stay empty for long.

Any objections? If not, I'll give it a go soon.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 31 09:26:51 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 31 Jan 2001 09:26:51 +0000 (GMT)
Subject: [Bioperl-l] GO ontology browser module available

On Tue, 30 Jan 2001, David Block wrote:

> On Tue, 30 Jan 2001, Hilmar Lapp wrote:
>
> > Mark Wilkinson wrote:
> > >
> > > Because this module doesn't really "fit" anywhere in the current BioPerl
> > > structure, and because the .xml files that it is based on are still
> > > quite fluid (and thus the module will likely have to be tweaked quite
> > > extensively until things settle down), I don't feel that it is worth
> > > adding into the BioPerl repository at this time. However, I would be
> > > glad to share it with anyone who might find it useful, with all the
> > > usual disclaimers :-)
> > >
> > > Let me know,
> > >
> >
> > Wouldn't it make sense to add it to bioperl-gui?
> >
> > Hilmar
> >
> Inasmuch as it is completely separate from SeqCanvas, and we are still
> thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
> than SeqCanvas, maybe. Mark? I think it would be okay.

Sounds like the right place to me....

>
> --
> David Block
> dblock@gene.pbi.nrc.ca
> http://bioinfo.pbi.nrc.ca/dblock/wiki
> Plant Biotechnology Institute
> National Research Council of Canada
> Saskatoon, Saskatchewan
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 31 09:32:43 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 31 Jan 2001 09:32:43 +0000 (GMT)
Subject: [Bioperl-l] Bio::Factory
In-Reply-To: <3A77D61E.CF9D1413@gmx.net>

On Wed, 31 Jan 2001, Hilmar Lapp wrote:

> In an attempt to address revisit/finalization of the
> SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept
> the design change Ewan proposed couple of weeks ago:
>
> ------
> Why not have
>
> Bio::SeqAnalysisParserFactoryI
>
> $parser = $factory->create_parser(-fh => \*FILE);
>
> Bio::SeqAnalyisParserI
>
> while( $next_feature = $parser->next_feature ) {
>
> }
>
> same number of functions defined. Twice the number of interfaces, but
> these are the interfaces I would argue we want.
>
> An implementation could implement ParserFactoryI and ParserI in the
> same module if so wished.
> ------
>
> For the factory interface I propose to open a new directory
> Bio::Factory, first to avoid cluttering of other directories, and
> second because there are many places in BioPerl that can
> eventually take advantage of a factory design (basically, wherever
> hard-coded object creation occurs, e.g. in SeqIO::* etc), so that
> directory hopefully won't stay empty for long.
>
> Any objections? If not, I'll give it a go soon.

This sounds really good.... Definitely needed/wanted...

>
> Hilmar
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 31 13:56:21 2001
From: jason@chg.mc.duke.edu (Jason Stajich)
Date: Wed, 31 Jan 2001 08:56:21 -0500 (EST)
Subject: [Bioperl-l] Bio::Factory
In-Reply-To: <3A77D61E.CF9D1413@gmx.net>

On Wed, 31 Jan 2001, Hilmar Lapp wrote:

> In an attempt to address revisit/finalization of the
> SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept
> the design change Ewan proposed couple of weeks ago:
>
> ------
> Why not have
>
> Bio::SeqAnalysisParserFactoryI
>
> $parser = $factory->create_parser(-fh => \*FILE);
>
> Bio::SeqAnalyisParserI
>
> while( $next_feature = $parser->next_feature ) {
>
> }
>
> same number of functions defined. Twice the number of interfaces, but
> these are the interfaces I would argue we want.
>
> An implementation could implement ParserFactoryI and ParserI in the
> same module if so wished.
> ------
>
> For the factory interface I propose to open a new directory
> Bio::Factory, first to avoid cluttering of other directories, and
> second because there are many places in BioPerl that can
> eventually take advantage of a factory design (basically, wherever
> hard-coded object creation occurs, e.g. in SeqIO::* etc), so that
> directory hopefully won't stay empty for long.
>
> Any objections? If not, I'll give it a go soon.

Great idea and it is a good place to put these things and can help
cleanup some of the clutter for sure.

>
> Hilmar
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca.
> 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Wed Jan 31 14:01:23 2001
From: jason@chg.mc.duke.edu (Jason Stajich)
Date: Wed, 31 Jan 2001 09:01:23 -0500 (EST)
Subject: [Bioperl-l] RichSeqI

On Tue, 30 Jan 2001, Ewan Birney wrote:

>
> To prove to hilmar that I am doing the RichSeqI stuff, I have committed
> the interface. Basically this is a trivial recasting of the "additional
> support" currently in Seq.pm which I will move out into
> Bio::Seq::RichSeq.pm
>
>
> currently the interface looks like...
>
>
> =head1 NAME
>
> Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated
> sequences
>
> =head1 SYNOPSIS
>
>     @secondary = $richseq->get_secondary_accessions;
>     $division = $richseq->division;
>     $mol = $richseq->molecule;
>     @dates = $richseq->get_dates;
>     $seq_version = $richseq->seq_version;
>
>
> =head1 DESCRIPTION
>
> This interface extends the Bio::SeqI interface to give additional
> functionality to sequences with richer data sources, in particular from
> database sequences (EMBL, GenBank and Swissprot).
>
>
> Kris, Jason, Hilmar --- comments?

We have a static set of methods for handling the fields you describe
above as well as a set of dynamic methods (via AUTOLOAD) to deal with
things like PID (bug #160), genbankid. Or does most of that get wrapped
into secondary_accessions? I guess the question is: are there any other
fields we are missing?

>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From heikki@ebi.ac.uk Wed Jan 31 15:33:57 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 31 Jan 2001 15:33:57 +0000
Subject: [Bioperl-l] more fuzziness checked in
Message-ID: <3A783065.9A629C67@ebi.ac.uk>

Jason Stajich wrote:
>
> more robust fuzzy and split feature handling checked in.
>
> FTHelper will try and see if start==end, if it does and there is no
> splitlocation delimiter then the code will return just a single number
> representing the location ie
>
>      variation       500
>                      /allele="C"
>                      /allele="T"
>

I am just back from a one-week holiday. I'll catch up with the list in
a day or two.

Jason,

In case you really are going to use the above format, it is not valid
according to The DDBJ/EMBL/GenBank Feature Table Definition. The allele
qualifier gives a common name of the allele in free text, e.g.:

     /allele="adh1-1"

In general there is the rule that there should not be identical feature
keys on the same location, but 'variation' is an exception. When we are
dealing with SNPs we do not generally know which of the alleles are
present in that particular sequence the SNP is mapped to (unless you
want to check the sequence).

The correct way to represent diallelic variation in the DDBJ/EMBL/GenBank
feature table is to repeat the feature key for each allele and use the
/replace qualifier:

     variation       500
                     /replace="C"
     variation       500
                     /replace="T"

It is ugly but that's what they (EMBL database people) told me to do a
few weeks ago when I was writing the to_FTHelper method for SNPs in
EnsEMBL.
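Heikki's rule (one `variation` feature per allele, each carrying a `/replace` qualifier, instead of several `/allele` qualifiers on a single feature) can be sketched in Perl as below. This is a hypothetical helper, not FTHelper's actual code: the name `variation_features` is made up, the `FT` line prefix is omitted, and the layout assumes the usual feature-table convention of feature keys starting in column 6 and locations and qualifiers starting in column 22.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl: render a variable site as
# repeated "variation" features, one per allele, each with a /replace
# qualifier, as recommended for the DDBJ/EMBL/GenBank feature table.
# The "FT" line prefix used in EMBL files is omitted here.
sub variation_features {
    my ($location, @alleles) = @_;
    my @lines;
    for my $allele (@alleles) {
        # feature key starts in column 6, location in column 22
        push @lines, sprintf('     %-16s%s', 'variation', $location);
        # qualifiers also start in column 22
        push @lines, (' ' x 21) . qq{/replace="$allele"};
    }
    return join('', map { "$_\n" } @lines);
}

print variation_features(500, 'C', 'T');
```

For a site at position 500 with alleles C and T, this prints the same four lines as Heikki's example. The design choice is simply to emit a separate feature for each allele rather than to collapse the alleles into one feature.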
-Heikki
--
______ _/ _/_____________________________________________________
     _/ _/      http://www.ebi.ac.uk/mutations/
    _/ _/  _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/ _/ _/      Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/       Cambs. CB10 1SD, United Kingdom
    _/          Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From paul-christophe.varoutas@curie.fr Wed Jan 31 15:19:06 2001
From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas)
Date: Wed, 31 Jan 2001 16:19:06 +0100
Subject: [Bioperl-l] Re: LiveSeq tests warn
References: <3A7718EC.736A4F85@gmx.net>
Message-ID: <5.0.2.1.2.20010131160802.00a62a98@mailhost.curie.fr>

I guess you are talking about the small bug I fixed yesterday in
/Bio/LiveSeq/SeqI.pm and Bio/LiveSeq/Gene.pm:

http://bioperl.org/pipermail/bioperl-guts-l/2001-January/002957.html

(I committed after Hilmar's mail and before Joseph's answer).

Paul-Christophe

At 00:08 31/01/2001 +0000, Joseph Insana wrote:
> > Just to let you know, I'm getting warnings on my machine from
> > LiveSeq.t and Mutator.t. Could you check whether this might
> > indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)
>
>Strange, I have nothing like that.
>Hmmmm. It seems it's complaining because I used "ne" instead of "!="
>to test for something to be -1 or not -1.
>My perl is not complaining.
>I am running perl v5.6.0 on linux 2.4.0.
>
>Please try putting "!=" instead of "ne" and see if it gets fixed.
>
>Joseph

At 11:41 30/01/2001 -0800, Hilmar Lapp wrote:
>Just to let you know, I'm getting warnings on my machine from
>LiveSeq.t and Mutator.t. Could you check whether this might
>indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)
>
>Hilmar
>
>t/LiveSeq...........Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
>Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 380.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 385.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 393.
>ok
>
>t/Mutator...........Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
>Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 380.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 385.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 393.
>ok
>
>
>--
>-----------------------------------------------------------------
>Hilmar Lapp email: hlapp@gmx.net
>GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
>-----------------------------------------------------------------


From mwilkinson@gene.pbi.nrc.ca Wed Jan 31 15:25:51 2001
From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson)
Date: Wed, 31 Jan 2001 09:25:51 -0600
Subject: [Bioperl-l] GO ontology browser module available
Message-ID: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca>

Ewan Birney wrote:

> > > Wouldn't it make sense to add it to bioperl-gui?
> > >
> > > Hilmar
> > >
> > Inasmuch as it is completely separate from SeqCanvas, and we are still
> > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
> > than SeqCanvas, maybe. Mark? I think it would be okay.
>
> Sounds like the right place to me....

indeed - that was where I intended to put it when it was a little more
"polished"...
I am just hesitant to use the BioPerl CVS repository to store my
half-baked code.

There are several things which "don't work right" (tm). I think a lot
of this has to do with the fact that I can not get my hands on the
GO.dtd - it isn't available on the GO website, though all of the other
XML files are (yet they reference the DTD in these same XML files).
Neither do I receive a response to inquiries sent to the consortium
e-mail address.

The consequence is that XML::Parser doesn't know what to do with the
HTML-like formatting tags that they are using in some of their "free
text", and in some cases tries to treat them as sub-level tags (for
example, what should be a subscript or superscript will become a
sub-element of the preceding word, so Carbon14 parses as
$GO->{Carbon}->{14}... which is ridiculous of course....). In addition
they use HTML designations for the greek alpha, beta, gamma, and so on,
preceded with an ampersand and ending with a semicolon. These can not
be parsed by XML::Parser *at all* unless it is specifically told that
these are going to be #CDATA elements... which requires a DTD.... which
I don't have.

So, GO_Browser (for the time being) hacks away at the XML in its first
parsing pass, replacing these tags with things that will not break
XML::Parser, and then reads from this hacked data. As a result, what
you get is not "strict" GO ontology, but a slightly modified version of
the same.... which effectively defeats the purpose of GO which is that
everyone should use a consensus nomenclature. :-(

In any case, after all that griping, I am perfectly willing to cvs add
this module to bioperl-gui, so long as I am not judged too harshly by
it - I know it's a hack!! :-)

I'll get on to that later this afternoon.

b.t.w. If anyone can assist me in getting ahold of a GO.dtd please
speak up! It would make my miserable life a bit brighter!!

--
---
Dr.
Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From heikki@ebi.ac.uk Wed Jan 31 15:43:53 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 31 Jan 2001 15:43:53 +0000
Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal
References: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr>
Message-ID: <3A7832B9.ED647AAC@ebi.ac.uk>

Paul-Christophe,

Please have a look at Bio::Variation::VariantI::restriction_changes,
too. I would have preferred to use Bio::Tools::RestrictionEnzyme but
decided not to depend on it as I found it too complicated. It would be
great not to have to duplicate restriction enzyme lists and
functionality. If you come up with a solution I'd be happy to remove or
modify the restriction_changes method.

-Heikki

Paul-Christophe Varoutas wrote:
>
> Yesterday I studied RestrictionEnzyme.pm more in depth. I haven't yet added
> the methods I wanted to, because in my opinion it is far more urgent for
> this module to get some redesigning.
>
> The module somewhat suffers of poor design, and just adding methods to it
> will just worsen the situation.
>
> RestrictionEnzyme has methods which are proper to the restriction enzymes:
> - seq() is the accessor method to the enzyme's recognition sequence.
> - cut_seq() "cuts" a Bio::Seq-derived object and generates an array of
>   restriction site fragments.
> - cuts_seq_at() does the same but this time generates an array of
>   restriction site coordinates.
>
> and methods which are proper to the list of enzymes:
> - is_available() says if a particular enzyme is in the list.
> - available_list() gives the list of all enzymes or list of n-base cutters.
>
> Steve Chervitz already suggested in the module's documentation that
> is_available() "may be more appropriate for a REData.pm class", and I share
> his opinion.
> From a conceptual point of view, the existing
> RestrictionEnzyme.pm module corresponds to two object classes, not one.
>
> Here is an outline of my proposal:
>
> Separate RestrictionEnzyme in two classes:
>
> RestrictionEnzymeDBase (or whatever more appropriate):
> - members: the list of restriction enzymes.
> - methods:
>     - constructor using hardwired list of enzymes OR user file OR URL.
>     - add/remove enzyme to/from list (adding will be the equivalent of
>       _make_custom() ).
>     - member accessor methods: already existing methods: is_available(),
>       available_list().
>
> RestrictionEnzyme:
> - members: the same as now (_name, _seq, _site, _cuts_after).
> - methods:
>     - constructor (equivalent to the constructor calling the
>       _make_standard() sub).
>     - already existing accessor methods.
>     - already existing methods: cut_seq, cuts_seq_at, etc.
>
> This design, apart from being more "correct", will facilitate any future
> extensions of the two modules. The drawback in separating RestrictionEnzyme
> in two classes is that all code using RestrictionEnzyme.pm will have to be
> modified.
>
> Perhaps we should take advantage of the imminent release of the 0.7 version
> and decide to proceed in the redesigning. If we change the design this will
> also be the opportunity to slightly change/extend its public interface to
> add small new functionalities such as being able to add and use asymmetric
> cutters and enzymes which cut outside the recognition site (perhaps just
> incorporating small changes now in order to be in time for the 0.7 release
> and leaving extensions for afterwards, especially if I do this alone based
> on what we decide).
>
> Tell me what you think about it:
> - First of all, is redesigning possible or are we obliged to maintain
>   compatibility ? In the latter case I will just add functionality,
>   maintaining the poor design of the module.
> - If redesigning is possible, please make comments/suggestions.
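The two-class split outlined in the proposal can be sketched in plain Perl to show where each responsibility would live. This is a minimal illustration only: the package names `RestrictionEnzymeSketch` and `RestrictionEnzymeDBSketch`, the constructor arguments and the `get_enzyme` accessor are hypothetical, not the Bioperl API. The sketch also works on plain sequence strings rather than Bio::Seq objects, and it only handles enzymes that cut inside a palindromic recognition site.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed split; all names are illustrative.

# One enzyme: name, recognition site, and the cut offset within the site.
package RestrictionEnzymeSketch;

sub new {
    my ($class, %args) = @_;
    return bless {
        _name       => $args{-name},
        _site       => uc $args{-site},
        _cuts_after => $args{-cuts_after},   # cut falls after this base of the site
    }, $class;
}

sub name { return $_[0]{_name} }
sub site { return $_[0]{_site} }

# 1-based positions in $seq (a plain string) after which the enzyme cuts.
sub cuts_seq_at {
    my ($self, $seq) = @_;
    my $target = uc $seq;
    my @cuts;
    my $pos = index($target, $self->{_site});
    while ($pos >= 0) {
        push @cuts, $pos + $self->{_cuts_after};
        $pos = index($target, $self->{_site}, $pos + 1);
    }
    return @cuts;
}

# The enzyme list: membership queries live here, not on the enzyme.
package RestrictionEnzymeDBSketch;

sub new { return bless { _enzymes => {} }, shift }

sub add_enzyme    { my ($self, $e) = @_; $self->{_enzymes}{ $e->name } = $e; return $e }
sub remove_enzyme { my ($self, $name) = @_; return delete $self->{_enzymes}{$name} }
sub is_available  { my ($self, $name) = @_; return exists $self->{_enzymes}{$name} }
sub get_enzyme    { my ($self, $name) = @_; return $self->{_enzymes}{$name} }

# All enzyme names, or only those with an $n-base recognition site.
sub available_list {
    my ($self, $n) = @_;
    my @names = sort keys %{ $self->{_enzymes} };
    return @names unless $n;
    return grep { length($self->{_enzymes}{$_}->site) == $n } @names;
}

package main;

my $db = RestrictionEnzymeDBSketch->new;
$db->add_enzyme(RestrictionEnzymeSketch->new(
    -name => 'EcoRI', -site => 'GAATTC', -cuts_after => 1));
$db->add_enzyme(RestrictionEnzymeSketch->new(
    -name => 'HindIII', -site => 'AAGCTT', -cuts_after => 1));

my @cuts = $db->get_enzyme('EcoRI')->cuts_seq_at('ttGAATTCaaGAATTC');
print "EcoRI cuts after: @cuts\n";    # positions 3 and 11
```

In this layout `is_available()` and `available_list()` sit on the list object, as the proposal suggests. A single enzyme then only has to know its own site and cut position.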
>
> Paul-Christophe

--
______ _/ _/_____________________________________________________
     _/ _/      http://www.ebi.ac.uk/mutations/
    _/ _/  _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/ _/ _/      Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/       Cambs. CB10 1SD, United Kingdom
    _/          Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From heikki@ebi.ac.uk Wed Jan 31 15:51:18 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 31 Jan 2001 15:51:18 +0000
Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm
Message-ID: <3A783476.5DE01D5D@ebi.ac.uk>

I recently read "A Really Good Book" about CVS and found out that you
can put a .cvsrc file in your home directory with, for example, the
following lines:

update -d
cvs -q -z9

After that 'cvs update' is automatically expanded to
'cvs -q -z9 update -d'!

-Heikki

Ewan Birney wrote:
>
> On Mon, 29 Jan 2001, Mark Wilkinson wrote:
>
> > Dear Group,
> >
> > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > (line 122), which apparently does not exist. Is it just my installation
> > which is wonky, or is this a genuine bug?
>
> cvs update -d
>
> >
> > any advice appreciated.
> >
> > cheers all!
> >
> > M
> >
> >
> > --
> > ---
> > Dr. Mark Wilkinson
> > Bioinformatics Group
> > National Research Council of Canada
> > Plant Biotechnology Institute
> > 110 Gymnasium Place
> > Saskatoon, SK
> > Canada
> >
>
> -----------------------------------------------------------------
> Ewan Birney.
> Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------

--
______ _/ _/_____________________________________________________
     _/ _/      http://www.ebi.ac.uk/mutations/
    _/ _/  _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/ _/ _/      Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/       Cambs. CB10 1SD, United Kingdom
    _/          Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From heikki@ebi.ac.uk Wed Jan 31 16:52:20 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 31 Jan 2001 16:52:20 +0000
Subject: [Bioperl-l] Incompatibility with Perl v5.6.0 [Fwd: XML::Parse test fails]
Message-ID: <3A7842C4.6F05ED9D@ebi.ac.uk>

It might be worth adding this to the release notes of the upcoming 0.7
release. As a result, Bio::Variation XML input and output does not work
under Perl v5.6.0. We have to pray that 5.6.1 will be out soon.

-Heikki

David Megginson wrote:
>
> Heikki Lehvaslaiho writes:
>
> > I recently upgraded to Perl v5.6.0. As result the XML::Parse test
> > script fails and CPAN does not install it:
>
> There is a known bug in Perl 5.6 when passing array references.
>
> All the best,
>
> David
>
> --
> David Megginson david@megginson.com
> http://www.megginson.com/

--
______ _/ _/_____________________________________________________
     _/ _/      http://www.ebi.ac.uk/mutations/
    _/ _/  _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/ _/ _/      Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/       Cambs.
CB10 1SD, United Kingdom
    _/          Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From heikki@ebi.ac.uk Wed Jan 31 17:11:26 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 31 Jan 2001 17:11:26 +0000
Subject: [Bioperl-l] Re: Bio::Root::Object cleanup
References: <3A771792.DB06ACA6@gmx.net>
Message-ID: <3A78473E.554A78C1@ebi.ac.uk>

Hilmar Lapp wrote:
...
> In addition, the Variation code contains the line
> Bio/Variation/IO.pm: return Bio::Root::Object::new($class, %param);
> Heikki, I don't know about the context, just wanted to make sure
> this is indispensable.

It is not. I copied it over from Bio::SeqIO at some point. Removed.

-Heikki

> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

--
______ _/ _/_____________________________________________________
     _/ _/      http://www.ebi.ac.uk/mutations/
    _/ _/  _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/ _/ _/      Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/       Cambs. CB10 1SD, United Kingdom
    _/          Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From hlapp@gmx.net Wed Jan 31 17:58:44 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 31 Jan 2001 09:58:44 -0800
Subject: [Bioperl-l] RichSeqI
Message-ID: <3A785254.7E3A11BD@gmx.net>

Ewan Birney wrote:
>
> =head1 SYNOPSIS
>
>     @secondary = $richseq->get_secondary_accessions;
>     $division = $richseq->division;
>     $mol = $richseq->molecule;
>     @dates = $richseq->get_dates;
>     $seq_version = $richseq->seq_version;
>

What about species()? Just popped into my head.
Right now a class implementing both SeqI and RichSeqI doesn't have to
have that, even though it's present in probably most 'rich' databanks.
What do you think about moving it, too? (It's now in Seq.pm.)

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From cjm@fruitfly.bdgp.berkeley.edu Wed Jan 31 18:02:05 2001
From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall)
Date: Wed, 31 Jan 2001 10:02:05 -0800 (PST)
Subject: [Bioperl-l] GO ontology browser module available
In-Reply-To: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca>

Hi Mark

Sorry you haven't heard back from us GO people; all the GO developers
are working full time on another project at the moment. Just keep at us
and we'll respond eventually.

We should fix the problem of the SGML embedded within XML - Brad, can
you see to this?

In the meantime, have you tried using either the flat files or the
mysql database? There are perl modules for using either of these in the
GO repository.

As to where you deposit your code, I'd love to keep all the GO code
together in one cvs repository. Unfortunately, the stanford cvs server
is highly restricted. I was considering moving the perl software
portion of GO away from the stanford cvs server into the Berkeley one,
for this reason. Another option would be to use bioperl cvs for all of
GO-perl, if people are willing.

If anyone's interested, the GO module docs are here:
http://www.fruitfly.org/annot/go/database/modules/GO::AppHandle.html

On Wed, 31 Jan 2001, Mark Wilkinson wrote:

> Ewan Birney wrote:
>
> > > > Wouldn't it make sense to add it to bioperl-gui?
> > > >
> > > > Hilmar
> > > >
> > >
> > > Inasmuch as it is completely separate from SeqCanvas, and we are still
> > > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
> > > than SeqCanvas, maybe. Mark? I think it would be okay.
> >
> > Sounds like the right place to me....
>
> indeed - that was where I intended to put it when it was a little more
> "polished"... I am just hesitant to use the BioPerl CVS repository to store my
> half-baked code.
>
> There are several things which "don't work right" (tm). I think a lot of this
> has to do with the fact that I can not get my hands on the GO.dtd - it isn't
> available on the GO website, though all of the other XML files are (yet they
> reference the DTD in these same XML files). Neither do I receive a response to
> inquiries sent to the consortium e-mail address.
>
> The consequence is that XML::Parser doesn't know what to do with the HTML-like
> formatting tags that they are using in some of their "free text", and in some
> cases tries to treat them as sub-level tags (for example, what should be a
> subscript or superscript will become a sub-element of the preceding word, so
> Carbon14 parses as $GO->{Carbon}->{14}... which is ridiculous of
> course....). In addition they use HTML designations for the greek alpha, beta,
> gamma, and so on, preceded with an ampersand and ending with a semicolon. These
> can not be parsed by XML::Parser *at all* unless it is specifically told that
> these are going to be #CDATA elements... which requires a DTD.... which I don't
> have.
>
> So, GO_Browser (for the time being) hacks away at the XML in its first parsing
> pass, replacing these tags with things that will not break XML::Parser, and then
> reads from this hacked data. As a result, what you get is not "strict" GO
> ontology, but a slightly modified version of the same.... which effectively
> defeats the purpose of GO which is that everyone should use a consensus
> nomenclature. :-(
>
> In any case, after all that griping, I am perfectly willing to cvs add this
> module to bioperl-gui, so long as I am not judged too harshly by it - I know it's
> a hack!! :-)
>
> I'll get on to that later this afternoon.
>
> b.t.w.
> If anyone can assist me in getting ahold of a GO.dtd please speak up! It
> would make my miserable life a bit brighter!!
>
>
> --
> ---
> Dr. Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada
>


From birney@ebi.ac.uk Wed Jan 31 18:10:21 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 31 Jan 2001 18:10:21 +0000 (GMT)
Subject: [Bioperl-l] RichSeqI
In-Reply-To: <3A785254.7E3A11BD@gmx.net>

On Wed, 31 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > =head1 SYNOPSIS
> >
> >     @secondary = $richseq->get_secondary_accessions;
> >     $division = $richseq->division;
> >     $mol = $richseq->molecule;
> >     @dates = $richseq->get_dates;
> >     $seq_version = $richseq->seq_version;
> >
>
> What about species()? Just popped into my head. Right now a class
> implementing both SeqI and RichSeqI doesn't have to have that,
> even though it's present in probably most 'rich' databanks. What
> do you think about moving it, too? (It's now in Seq.pm.)

Hmmmm. I would guess it would go to SeqI. It should be somewhere.
I'm agnostic. If we move it out to RichSeq, genbank/embl IO will have to
be able to generate dummy Species lines...

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 31 19:10:30 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 31 Jan 2001 11:10:30 -0800
Subject: [Bioperl-l] Bio::Factory::SeqAnalysisParserFactoryI
Message-ID: <3A786326.D0DCBFFE@gmx.net>

Interface committed. Check out the documentation. If you approve it,
I'll add the implementation.

The obvious question with regard to SeqFeatureProducer is what will
happen to the add_features() method. In principle the implementation is
simple enough to just dismiss it; as we already felt a couple of times
it doesn't really add that much value. So, let me know what you think.

Hilmar

-------- Original Message --------
Subject: Bio::Factory
Date: Wed, 31 Jan 2001 01:08:46 -0800
From: Hilmar Lapp
Organization: Nereis 4
To: Bioperl

In an attempt to address revisit/finalization of the
SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept the
design change Ewan proposed couple of weeks ago:

------
Why not have

Bio::SeqAnalysisParserFactoryI

$parser = $factory->create_parser(-fh => \*FILE);

Bio::SeqAnalyisParserI

while( $next_feature = $parser->next_feature ) {

}

same number of functions defined. Twice the number of interfaces, but
these are the interfaces I would argue we want.

An implementation could implement ParserFactoryI and ParserI in the
same module if so wished.
------

For the factory interface I propose to open a new directory
Bio::Factory, first to avoid cluttering of other directories, and
second because there are many places in BioPerl that can eventually
take advantage of a factory design (basically, wherever hard-coded
object creation occurs, e.g. in SeqIO::* etc), so that directory
hopefully won't stay empty for long.

Any objections? If not, I'll give it a go soon.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca.
92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From birney@ebi.ac.uk Wed Jan 31 19:15:37 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 19:15:37 +0000 (GMT) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: <3A786326.D0DCBFFE@gmx.net> Message-ID: On Wed, 31 Jan 2001, Hilmar Lapp wrote: > Interface committed. Check out the documentation. If you approve > it, I'll add the implementation. > > The obvious question with regard to SeqFeatureProducer is what > will happen to the add_features() method. In principle the > implementation is simple enough to just dismiss it; as we already > felt a couple of times it doesn't really add that much value. So, > let me know what you think. > I don't like the add_features method much myself... Jason? ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From jason@chg.mc.duke.edu Wed Jan 31 20:17:10 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 15:17:10 -0500 (EST) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: Message-ID: kill it, that's fine. We should instead be providing better example scripts rather than wrapping something that simple into an object since all the work is done by the Seq object. On Wed, 31 Jan 2001, Ewan Birney wrote: > On Wed, 31 Jan 2001, Hilmar Lapp wrote: > > > Interface committed. Check out the documentation. If you approve > > it, I'll add the implementation. > > > > The obvious question with regard to SeqFeatureProducer is what > > will happen to the add_features() method. In principle the > > implementation is simple enough to just dismiss it; as we already > > felt a couple of times it doesn't really add that much value. So, > > let me know what you think. 
> >
>
> I don't like the add_features method much myself... Jason?
>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From krbou@pgsgent.be Wed Jan 31 21:43:09 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 22:43:09 +0100 Subject: [Bioperl-l] Cruft in module documentation ? Message-ID: <20010131224309.B24431@gryzo.pgsgent.be>

In testing the documentation (SYNOPSIS) part I already fixed some errors
(more to come during the coming days), but I don't know what to do with
this one (I guess it can be removed).
The SYNOPSIS for Bio::Annotation contains

[ ...]
    #
    # Making an annotation object from scratch
    #

    $ann = Bio::Pfam::Annotation->new();

    $ann->description("Description text");
    print "Annotation description is ", $ann->description, "\n";


I can't find any reference to Bio::Pfam::Annotation, is this a remainder
of history ?

Kris,


From birney@ebi.ac.uk Wed Jan 31 22:03:29 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 22:03:29 +0000 (GMT) Subject: [Bioperl-l] Cruft in module documentation ? In-Reply-To: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID:

On Wed, 31 Jan 2001, Kris Boulez wrote:

> In testing the documentation (SYNOPSIS) part I already fixed some errors
> (more to come during the coming days), but I don't know what to do with
> this one (I guess it can be removed).
> The SYNOPSIS for Bio::Annotation contains
>
> [ ...]
>     #
>     # Making an annotation object from scratch
>     #
>
>     $ann = Bio::Pfam::Annotation->new();
>
>     $ann->description("Description text");
>     print "Annotation description is ", $ann->description, "\n";
>
>
> I can't find any reference to Bio::Pfam::Annotation, is this a remainder
> of history ?

This is historical cruft. s/Pfam:://g;

>
> Kris,
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------


From krbou@pgsgent.be Wed Jan 31 22:32:21 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 23:32:21 +0100 Subject: [Bioperl-l] Cruft in module documentation ? In-Reply-To: ; from birney@ebi.ac.uk on Wed, Jan 31, 2001 at 10:03:29PM +0000 References: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID: <20010131233221.A24783@gryzo.pgsgent.be>

Quoting Ewan Birney (birney@ebi.ac.uk):
> On Wed, 31 Jan 2001, Kris Boulez wrote:
>
> >
> >
> > I can't find any reference to Bio::Pfam::Annotation, is this a remainder
> > of history ?
>
> This is historical cruft. s/Pfam:://g;
>

Done.

Kris,


From Cox, Greg"

I know that there are some people on the BioPerl list who went into the
same trouble and managed to have some success. Please reply directly to
Greg, as it wasn't me who had the question.

Hilmar

-------- Original Message --------
Subject: [Biojava-l] WinCVS and SSH
Date: Wed, 31 Jan 2001 14:08:06 -0500
From: "Cox, Greg"
To: biojava-l@biojava.org

I'm having problems convincing WinCVS and SSH to play nicely together. I
followed the instructions on WinCVS' page, but I can't log in. I can login
with ssh (I'm using ssh-1.2.14-win32bin) without typing a password, but when
I try to login to cvs, I get, "Set the password authentication first in the
preferences !" Did anyone else run across this, and how did you fix it?

Greg
_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From M.W.E.J.Fiers@plant.wag-ur.nl Tue Jan 2 12:52:57 2001 From: M.W.E.J.Fiers@plant.wag-ur.nl (Fiers, M.W.E.J.)
Date: Tue, 02 Jan 2001 13:52:57 +0100 Subject: [Bioperl-l] Computation object Message-ID:

Hi

Concerning the computation.pm object; I've seem to have made a rather stupid
mistake, I seem to have failed to do an actual commit last time. So I've
given it another try. If somebody feels like it, please take a look.
I didn't implement the structure Ewan proposed. If people like my
implementation of this object, I will do it.

Mark Fiers
Plant Research International


From jason@chg.mc.duke.edu Tue Jan 2 15:58:18 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 2 Jan 2001 10:58:18 -0500 (EST) Subject: [Bioperl-l] call for more tests Message-ID:

In the continued effort to check every module in our distribution before
0.7 is released. I wondered if anyone does use Bio::SeqIO::scf? I need
some test files for it.
Thanks.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Tue Jan 2 17:19:38 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 2 Jan 2001 12:19:38 -0500 (EST) Subject: [Bioperl-l] test framework Message-ID:

while I'm messing with it, does anyone have objections to using the built
in perl Test module available since perl 5.004 rather than our

I agree it is wasted time to constantly move things from one test suite to
another ( I already tried to standardize our existing ones as best as
possible). But a nice standard makes it easier for new people to write
tests and make them fit. Any comments?

    sub test ($$;$) {
        my($num, $true,$msg) = @_;
        print($true ? "ok $num\n" : "not ok $num $msg\n");
    }

[ from perldoc Test ]

      use strict;
      use Test;

      # use a BEGIN block so we print our plan before MyModule is loaded
      BEGIN { plan tests => 14, todo => [3,4] }

      # load your module...
      use MyModule;

      ok(0); # failure
      ok(1); # success

      ok(0); # ok, expected failure (see todo list, above)
      ok(1); # surprise success!

      ok(0,1); # failure: '0' ne '1'
      ok('broke','fixed'); # failure: 'broke' ne 'fixed'
      ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
      ok('fixed',qr/x/); # success: 'fixed' =~ qr/x/

      ok(sub { 1+1 }, 2); # success: '2' eq '2'
      ok(sub { 1+1 }, 3); # failure: '2' ne '3'
      ok(0, int(rand(2))); # (just kidding :-)

      my @list = (0,0);
      ok @list, 3, "\@list=".join(',',@list); #extra diagnostics
      ok 'segmentation fault', '/(?i)success/'; #regex match

      skip($feature_is_missing, ...); #do platform specific test

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From krbou@pgsgent.be Tue Jan 2 21:21:36 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Tue, 2 Jan 2001 22:21:36 +0100 Subject: [Bioperl-l] SWISS-PROT writing Message-ID: <20010102222136.A19390@gryzo.pgsgent.be>

[ I know there are some specialists on SWISS-PROT on this list, so I might
make a fool of me, but here goes ]

When chasing down the reason why swiss.pm was not able to read a
SWISS-PROT formatted file it wrote itself I found the following things
which look suspicious in write_seq()

- at line 356 there is
    $mol = $seq->molecule;
I think this should be $seq->moltype; as ->molecule only looks for
{'molecule'} which is not set by ->new. Bio::Seq->new only sets
{'moltype'}.
We should change the 'protein' of ->moltype to 'PRT' to conform to the
standard.

B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA
sequences ?

- around line 369 the whole else block should be changed. We should make
sure we have a division ($div) in the ID part. The previous version of
the code which is now commented out did a better try at this. Looking at
next_seq() we why we're not able to read this (entry name must contain
an underscore section 3.1.1 of the SWISS-PROT manual).

    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
        || $self->throw("swissprot stream with no ID. Not swissprot in my book");
    $name = $1."_".$2;
    $seq->primary_id($1);
    $seq->division($2);

How standard compliant do we want to be with this. If we want to be very
strict we should e.g. make sure the 'entry name' (first item on the ID
line) is not more then 10 characters.

P.S. (very) minor issue: the division we choose 'UNK' for sequences
which don't have a division set is not in the standard (speclist.txt),
it only contains UNKP

Should I try to adopt swiss.pm to the thoughts I (tried to) put out or
are there major objections ?

Kris,


From lapp@gnf.org Tue Jan 2 23:45:28 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 02 Jan 2001 15:45:28 -0800 Subject: [Bioperl-l] SWISS-PROT writing References: <20010102222136.A19390@gryzo.pgsgent.be> Message-ID: <3A526818.95C495BB@gnf.org>

Kris Boulez wrote:
>
>
> - at line 356 there is
>     $mol = $seq->molecule;
> I think this should be $seq->moltype; as ->molecule only looks for
> {'molecule'} which is not set by ->new. Bio::Seq->new only sets
> {'moltype'}.
> We should change the 'protein' of ->moltype to 'PRT' to conform to the
> standard.

moltype() is internal to BioPerl. Whenever there is an attribute synonymous
to moltype() but defined by a databank, molecule() should be used for that.
So the code is correct I think.

Bio::Seq->new() indeed only sets moltype(), because at this point there is
no databank specificity. molecule() should be set by the parser. If you
want to instantiate a swissprot seq from memory and have it written in
swissprot format, the way we want to go is have dedicated classes under
Bio::Seq::*. If there is need for a swissprot-dedicated class, that one
probably would also set molecule() at instantiation time.

>
> B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA
> sequences ?

In my opinion there's no need for that, but others may think differently.

>
> - around line 369 the whole else block should be changed. We should make
> sure we have a division ($div) in the ID part. The previous version of
> the code which is now commented out did a better try at this. Looking at
> next_seq() we why we're not able to read this (entry name must contain
> an underscore section 3.1.1 of the SWISS-PROT manual).
>
>     $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
>         || $self->throw("swissprot stream with no ID. Not swissprot in my book");
>     $name = $1."_".$2;
>     $seq->primary_id($1);
>     $seq->division($2);
>

If this is the code you're referring to (sorry, don't have at hand right
now), it does ensure that there is a division part. I'm probably missing
something.

> How standard compliant do we want to be with this. If we want to be very
> strict we should e.g. make sure the 'entry name' (first item on the ID
> line) is not more then 10 characters.
>
> P.S. (very) minor issue: the division we choose 'UNK' for sequences
> which don't have a division set is not in the standard (speclist.txt),
> it only contains UNKP
>

Sure, can (should) be changed.

> Should I try to adopt swiss.pm to the thoughts I (tried to) put out or
> are there major objections ?
>

See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
of that class.

Apart from this, Lorenz may wish to comment. He's been our Swissprot
cruncher for a while, but haven't heard from him for some time. Lorenz,
still out there?

Happy new year to all.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From schattner@alum.mit.edu Wed Jan 3 02:26:20 2001 From: schattner@alum.mit.edu (Peter Schattner) Date: Tue, 02 Jan 2001 18:26:20 -0800 Subject: [Bioperl-l] call for more tests References: Message-ID: <3A528DCC.BEE4C293@alum.mit.edu>

Jason Stajich wrote:
>
> In the continued effort to check every module in our distribution before
> 0.7 is released. I wondered if anyone does use Bio::SeqIO::scf? I need
> some test files for it.
> Thanks.

I can't help you with Bio::SeqIO::scf, but I can add a couple of other
missing tests to your list:

Bio::Tools::SeqPattern does not have a "t" file. (By the way,
seq_pattern.pl in the examples directory crashes - I just submitted a bug
report).

Bio:Tools:SeqStats currently only has one very simple test (located in
Tools.t) Previously there were several more tests that seem to have
disappeared. I can upload the additional tests again if you like.

Peter


From schattner@alum.mit.edu Wed Jan 3 02:31:14 2001 From: schattner@alum.mit.edu (Peter Schattner) Date: Tue, 02 Jan 2001 18:31:14 -0800 Subject: [Bioperl-l] A couple of CVS questions. Message-ID: <3A528EF1.5361CEE7@alum.mit.edu>

A couple of CVS questions.

1. How can one access earlier releases of bioperl? I haven't been able
to find them on CVS or elsewhere. Where should I be looking?

2. Some modules were moved to different directories within the CVS
structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to
be able to find the versions of the modules made prior to the date that
the modules were moved. Can someone tell me if these older versions are
accessible and if so how to find them.

Thanks

Peter Schattner


From lapp@gnf.org Wed Jan 3 04:16:02 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 02 Jan 2001 20:16:02 -0800 Subject: [Bioperl-l] A couple of CVS questions.
References: <3A528EF1.5361CEE7@alum.mit.edu> Message-ID: <3A52A782.392956A4@gnf.org>

Peter Schattner wrote:
>
> A couple of CVS questions.
>
> 1. How can one access earlier releases of bioperl? I haven't been able
> to find them on CVS or elsewhere. Where should I be looking?
>

You can checkout based on one of version, tag, or date. You very likely
don't want to checkout a release by version, as each file has a different
version. There is a tag for the 0.6.x release branch, and also for other
releases. If you want to checkout the whole development trunk in an earlier
version, the most sensible way is probably to go by date (option -D). For
individual modules you can go either way.

Do you have the manpages of cvs? They're actually poor compared to the
info-files cvs comes with. On a Unix box with info installed you should be
able to type 'info cvs'.

> 2. Some modules were moved to different directories within the CVS
> structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to
> be able to find the versions of the modules made prior to the date that
> the modules were moved. Can someone tell me if these older versions are
> accessible and if so how to find them.

The files were moved without retaining the revision history (cvs is bad at
file moving and renaming; you have to mess with the repository in order to
have cvs history preserved in this case). The version at the former
location was deleted, so you can restore it at the former place only. The
file at the new location has lost all its revision information before the
move.

Hope this helps.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From dagdigian@ComputeFarm.com Wed Jan 3 06:12:04 2001 From: dagdigian@ComputeFarm.com (Chris Dagdigian) Date: Wed, 03 Jan 2001 01:12:04 -0500 Subject: [Bioperl-l] A couple of CVS questions. In-Reply-To: <3A528EF1.5361CEE7@alum.mit.edu> Message-ID: <5.0.2.1.0.20010103010952.00aaa260@fedayi.sonsorol.org>

ftp://bioperl.org/pub/DIST/

All of our old 'official' bioperl release tarballs can be found there.

Regards,
Chris

At 06:31 PM 1/2/01 -0800, Peter Schattner wrote:
>A couple of CVS questions.
>
>1. How can one access earlier releases of bioperl? I haven't been able
>to find them on CVS or elsewhere. Where should I be looking?


From krbou@pgsgent.be Wed Jan 3 07:29:43 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 3 Jan 2001 08:29:43 +0100 Subject: [Bioperl-l] SWISS-PROT writing In-Reply-To: <3A526818.95C495BB@gnf.org>; from lapp@gnf.org on Tue, Jan 02, 2001 at 03:45:28PM -0800 References: <20010102222136.A19390@gryzo.pgsgent.be> <3A526818.95C495BB@gnf.org> Message-ID: <20010103082943.A21648@gryzo.pgsgent.be>

Quoting Hilmar Lapp (lapp@gnf.org):
> Kris Boulez wrote:
> >
> >
> > - at line 356 there is
> >     $mol = $seq->molecule;
> > I think this should be $seq->moltype; as ->molecule only looks for
> > {'molecule'} which is not set by ->new. Bio::Seq->new only sets
> > {'moltype'}.
> > We should change the 'protein' of ->moltype to 'PRT' to conform to the
> > standard.
>
> moltype() is internal to BioPerl. Whenever there is an attribute synonymous
> to moltype() but defined by a databank, molecule() should be used for that.
> So the code is correct I think.
>

Then documentation for Bio::Seq->molecule() should be extended a bit. It
now reads

  molecule

   Title   : molecule
   Usage   : $obj->molecule($newval)
   Function:
   Returns : type of molecule (DNA, mRNA)
   Args    : newvalue (optional)

> Bio::Seq->new() indeed only sets moltype(), because at this point there is
> no databank specificity. molecule() should be set by the parser. If you
> want to instantiate a swissprot seq from memory and have it written in
> swissprot format, the way we want to go is have dedicated classes under
> Bio::Seq::*. If there is need for a swissprot-dedicated class, that one
> probably would also set molecule() at instantiation time.
>
> >
> > B.T.W. do we want to allow SWISS-PROT to try to write out DNA/RNA
> > sequences ?
>
> In my opinion there's no need for that, but others may think differently.
>
> >
> > - around line 369 the whole else block should be changed. We should make
> > sure we have a division ($div) in the ID part. The previous version of
> > the code which is now commented out did a better try at this. Looking at
> > next_seq() we why we're not able to read this (entry name must contain
> > an underscore section 3.1.1 of the SWISS-PROT manual).
> >
> >     $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
> >         || $self->throw("swissprot stream with no ID. Not swissprot in my book");
> >     $name = $1."_".$2;
> >     $seq->primary_id($1);
> >     $seq->division($2);
> >
>
> If this is the code you're referring to (sorry, don't have at hand right
> now), it does ensure that there is a division part. I'm probably missing
> something.
>

Sorry I wasn't clear on this one obviously. The code I pasted is from
next_seq(). What I was referring to is the code in write_seq(). In there
we do not enforce that there is a division part (I think we should at
least check if $seq->display_id() returns an underscore in a reasonable
position). The code reads

    } else {
        #$temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
        #    $seq->primary_id()."_".$div,$mol,$len);
        # Reconstructing the ID relies heavily upon the input source having
        # been in a format that is parsed as this routine expects it -- that is,
        # by this module itself. This is bad, I think, and immediately breaks
        # if e.g. the Bio::DB::GenPept module is used as input.
        # Hence, switch to display_id(); _every_ sequence is supposed to have
        # this. HL 2000/09/03
        $temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
            $seq->display_id(), $mol, $len);
    }

> > How standard compliant do we want to be with this. If we want to be very
> > strict we should e.g. make sure the 'entry name' (first item on the ID
> > line) is not more then 10 characters.
> >
> > P.S. (very) minor issue: the division we choose 'UNK' for sequences
> > which don't have a division set is not in the standard (speclist.txt),
> > it only contains UNKP
> >
>
> Sure, can (should) be changed.
>
> > Should I try to adopt swiss.pm to the thoughts I (tried to) put out or
> > are there major objections ?
> >
>
> See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> give Bio::Seq::Swiss.pm a start and adopt the parser to instantiate objects
> of that class.
>

The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
you plan on having a Bio::Seq::* class for every (complex) sequence type ?

Kris,


From jason@chg.mc.duke.edu Wed Jan 3 14:17:01 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 3 Jan 2001 09:17:01 -0500 (EST) Subject: [Bioperl-l] A couple of CVS questions. In-Reply-To: <3A52A782.392956A4@gnf.org> Message-ID:

On Tue, 2 Jan 2001, Hilmar Lapp wrote:

> Peter Schattner wrote:
> >
> > A couple of CVS questions.
> >
> > 1. How can one access earlier releases of bioperl? I haven't been able
> > to find them on CVS or elsewhere. Where should I be looking?
> >
>
> You can checkout based on one of version, tag, or date. You very likely
> don't want to checkout a release by version, as each file has a different
> version. There is a tag for the 0.6.x release branch, and also for other
> releases. If you want to checkout the whole development trunk in an earlier
> version, the most sensible way is probably to go by date (option -D). For
> individual modules you can go either way.
>
> Do you have the manpages of cvs? They're actually poor compared to the
> info-files cvs comes with. On a Unix box with info installed you should be
> able to type 'info cvs'.
>
> > 2. Some modules were moved to different directories within the CVS
> > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> > Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to
> > be able to find the versions of the modules made prior to the date that
> > the modules were moved. Can someone tell me if these older versions are
> > accessible and if so how to find them.
>
> The files were moved without retaining the revision history (cvs is bad at
> file moving and renaming; you have to mess with the repository in order to
> have cvs history preserved in this case). The version at the former
> location was deleted, so you can restore it at the former place only. The
> file at the new location has lost all its revision information before the
> move.

Many apologies, this was my stupidness for not moving the files the
correct way. I wish I had waited for Hilmar's email.... Learned my
lesson though.... I didn't realize we could move the RCS files (itchy
trigger finger) before I moved the src files. If you look at the
first date in Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast
you can see when the move occurred and then checkout with -D as some day
or time before then.

>
> Hope this helps.
>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp@gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Wed Jan 3 14:50:53 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 14:50:53 +0000 (GMT) Subject: [Bioperl-l] A couple of CVS questions. In-Reply-To: Message-ID:

On Wed, 3 Jan 2001, Jason Stajich wrote:

> On Tue, 2 Jan 2001, Hilmar Lapp wrote:
>
> > Peter Schattner wrote:
> > >
> > > A couple of CVS questions.
> > >
> > > 1. How can one access earlier releases of bioperl? I haven't been able
> > > to find them on CVS or elsewhere. Where should I be looking?
> > >
> >
> > You can checkout based on one of version, tag, or date. You very likely
> > don't want to checkout a release by version, as each file has a different
> > version. There is a tag for the 0.6.x release branch, and also for other
> > releases. If you want to checkout the whole development trunk in an earlier
> > version, the most sensible way is probably to go by date (option -D). For
> > individual modules you can go either way.
> >
> > Do you have the manpages of cvs? They're actually poor compared to the
> > info-files cvs comes with. On a Unix box with info installed you should be
> > able to type 'info cvs'.
> >
> > > 2. Some modules were moved to different directories within the CVS
> > > structure recently (eg Bio::Tools::Alignment::Clustalw.pm was moved to
> > > Bio::Tools::Run::Alignment::Clustalw.pm ). Since then, I don't seem to
> > > be able to find the versions of the modules made prior to the date that
> > > the modules were moved. Can someone tell me if these older versions are
> > > accessible and if so how to find them.
> >
> > The files were moved without retaining the revision history (cvs is bad at
> > file moving and renaming; you have to mess with the repository in order to
> > have cvs history preserved in this case). The version at the former
> > location was deleted, so you can restore it at the former place only. The
> > file at the new location has lost all its revision information before the
> > move.
>
> Many apologies, this was my stupidness for not moving the files the
> correct way. I wish I had waited for Hilmar's email.... Learned my
> lesson though.... I didn't realize we could move the RCS files (itchy
> trigger finger) before I moved the src files. If you look at the
> first date in Bio::Tools::Run::Alignment or Bio::Tools::StandAloneBlast
> you can see when the move occurred and then checkout with -D as some day
> or time before then.

It is, in my book, bad form to move the actual files. If you move files
then CVS checkouts on old versions screw up with sometimes disasterous
effects.

The removal and cvs add is "The Right Way" tm in my book.

>
> >
> > Hope this helps.
> >
> > Hilmar
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp email: lapp@gnf.org
> > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> > -------------------------------------------------------------
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 3 17:20:17 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 3 Jan 2001 12:20:17 -0500 (EST) Subject: [Bioperl-l] named parameters Message-ID:

This is a bit on inconsistency when we specify parameters to new in some
of the bioperl modules. Whenever we don't use named parameters (ie
-file=> 'filename'), we are inconsistent with the fact that all modules
inherit from Bio::Root::RootI. This is because Bio::Root::RootI will
parse a couple of special parameters - specifically
-verbose, -strict, -name, -obj, -record_err

now we really don't use these that much, however, in the case of
Bio::Species

one would call
 my @classification = qw( sapiens Homo Hominidae
                          Catarrhini Primates Eutheria
                          Mammalia Vertebrata Chordata
                          Metazoa Eukaryota );

 my $sp = new Bio::Species(@classification);

but if one also wanted debugging turned on, one might call this
 my $sp = new Bio::Species(-verbose=>1, @classification);

This won't bother RootI, but Bio::Species expects all the parameters to be
part of the classification array.

A solution is to change Bio::Species to expect named parameters so an
array ref is

 $sp = new Bio::Species(-verbose=>1, -classification => \@classification );

What are people's reactions to this? If we can agree that this is
expected then we can add this to our programming conventions wiki page.

-Jason


From birney@ebi.ac.uk Wed Jan 3 17:31:25 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 17:31:25 +0000 (GMT) Subject: [Bioperl-l] test failures on main trunk Message-ID:

perl 5.004_04 is failing again. Some I can fix, others Peter/Jason might
want to take a peek at. They are

Failed Test     Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/Clustalw.t                     9    1  11.11%  4
t/DB.t               0    11    ??   ??       %  ??
t/Index.t            2   512     8    3  37.50%  6-8
t/SeqFeature.t                  21   ??       %  ??
t/TCoffee.t                      9    1  11.11%  4
Failed 5/48 test scripts, 89.58% okay. -1/594 subtests failed, 100.17% okay.
make: *** [test_dynamic] Error 29
riker:~/src/bioperl-live> perl t/DB/
riker:~/src/bioperl-live> perl t/DB.t
IO::String not installed. This means the Bio::DB::* modules are not usable. Skipping tests.
1..1
ok 1
Segmentation fault
riker:~/src/bioperl-live> perl t/Clustalw.t
1..9
Clustalw program not found as /clustalw or not executable.
Clustalw can be obtained from eg- http://corba.ebi.ac.uk/Biocatalog/Alignment_Search_software.html/
ok 1

-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/Clustalw.t
STACK:
Bio::Tools::Run::Alignment::Clustalw::AUTOLOAD(308)
main::t/Clustalw.t(52)
---------------------------------------------------
riker:~/src/bioperl-live> perl t/SeqFeature.t
1..21
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15
ok 16
ok 17
not ok 18
ok 19
not ok 20
ok 21
ok 22
ok 23
ok 24
ok 25
ok 26
ok 27
riker:~/src/bioperl-live> perl t/TCoffee.t
1..9
TCoffee program not found as /t_coffee or not executable.
TCoffee can be obtained from eg- http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html
ok 1

-------------------- EXCEPTION --------------------
MSG: Unallowed parameter: NEW !
CONTEXT: Error in uNKNOWN CONTEXT
SCRIPT: t/TCoffee.t
STACK:
Bio::Tools::Run::Alignment::TCoffee::AUTOLOAD(561)
main::t/TCoffee.t(55)
---------------------------------------------------

I'll start to work on TCoffee/Clustalw...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 17:39:19 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 17:39:19 +0000 (GMT) Subject: [Bioperl-l] named parameters

On Wed, 3 Jan 2001, Jason Stajich wrote:

> This is a bit of an inconsistency when we specify parameters to new in some
> of the bioperl modules. Whenever we don't use named parameters (ie
> -file=> 'filename'), we are inconsistent with the fact that all modules
> inherit from Bio::Root::RootI. This is because Bio::Root::RootI will
> parse a couple of special parameters - specifically
> -verbose, -strict, -name, -obj, -record_err
>
> Now, we really don't use these that much; however, in the case of
> Bio::Species
>
> one would call
>     my @classification = qw( sapiens Homo Hominidae
>                              Catarrhini Primates Eutheria
>                              Mammalia Vertebrata Chordata
>                              Metazoa Eukaryota );
>
>     my $sp = new Bio::Species(@classification);
>
> but if one also wanted debugging turned on, one might call this
>     my $sp = new Bio::Species(-verbose=>1, @classification);
>
> This won't bother RootI, but Bio::Species expects all the parameters to be
> part of the classification array.
>
> A solution is to change Bio::Species to expect named parameters, so an
> array ref is passed:
>
>     $sp = new Bio::Species(-verbose=>1, -classification => \@classification );
>
> What are people's reactions to this? If we can agree that this is
> expected then we can add this to our programming conventions wiki page.

I think we should stick to named parameters throughout and have it as a
programming convention...

>
> -Jason
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 17:52:50 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 17:52:50 +0000 (GMT) Subject: [Bioperl-l] test failures on main trunk

Ok. My mistake - we are failing tests but not in the way that I
described...

TCoffee/ClustalW is waiting on RootI reorganisation, currently being led
by Jason.

SeqFeature was a trivial addition of 21 --> 27 tests to run for the new
computation object.

Index has a weird dependency on IO::String - why is this? Who needs
IO::String in Index?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 3 17:53:53 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 03 Jan 2001 09:53:53 -0800 Subject: [Bioperl-l] A couple of CVS questions. Message-ID: <3A536731.3A7DAA56@gmx.net>

Ewan Birney wrote:
>
> It is, in my book, bad form to move the actual files. If you move files
> then CVS checkouts on old versions screw up with sometimes disastrous
> effects.
>
> The removal and cvs add is "The Right Way" tm in my book.
>

Well, I'm certainly not a CVS expert but when I wrote that you can
move the repository files I only quoted the recommendation given
in the CVS documentation (the info files that come with it). If
you think applying this recommendation can have disastrous effects
you should probably write to the CVS people to take this out of
their documentation, or better yet, to put in a warning.

I'm still not sure what could cause the disastrous effect, as the
revision file does not keep any directory information (I may be
wrong here though, but I haven't seen any dir info in such files
yet), and there is no 'central database' that keeps track of which
file is where.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 17:57:17 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 17:57:17 +0000 (GMT) Subject: [Bioperl-l] A couple of CVS questions. In-Reply-To: <3A536731.3A7DAA56@gmx.net>

On Wed, 3 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > It is, in my book, bad form to move the actual files. If you move files
> > then CVS checkouts on old versions screw up with sometimes disastrous
> > effects.
> >
> > The removal and cvs add is "The Right Way" tm in my book.
> >
>
> Well, I'm certainly not a CVS expert but when I wrote that you can
> move the repository files I only quoted the recommendation given
> in the CVS documentation (the info files that come with it). If
> you think applying this recommendation can have disastrous effects
> you should probably write to the CVS people to take this out of
> their documentation, or better yet, to put in a warning.
>
> I'm still not sure what could cause the disastrous effect, as the
> revision file does not keep any directory information (I may be
> wrong here though, but I haven't seen any dir info in such files
> yet), and there is no 'central database' that keeps track of which
> file is where.

Yeah, but then what happens is that in

OldRelease (real)

    StableFile XX::YY says use AA::BB
    File AA::BB is there

We now move AA::BB to CC::BB *in the repository*

if we checkout the old release we get

    StableFile XX::YY says use AA::BB
    File AA::BB ** IS NOT THERE **
    File CC::BB is there, but is named wrong!

So it is ok from a cvs perspective, but it sucks from a code management
perspective!

If you cvs remove, cvs add this does not happen. Traditionally you put in
your log on the cvs add that it has just come from XXXX, allowing people
to track the history ...
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 3 18:19:48 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 03 Jan 2001 10:19:48 -0800 Subject: [Bioperl-l] A couple of CVS questions. Message-ID: <3A536D44.CA7CABE2@gmx.net>

Ewan Birney wrote:
>
> Yeah, but then what happens is that in
>
> OldRelease (real)
>
>     StableFile XX::YY says use AA::BB
>     File AA::BB is there
>
> We now move AA::BB to CC::BB *in the repository*
>
> if we checkout the old release we get
>
>     StableFile XX::YY says use AA::BB
>     File AA::BB ** IS NOT THERE **
>     File CC::BB is there, but is named wrong!
>
> So it is ok from a cvs perspective, but it sucks from a code management
> perspective!
>
> If you cvs remove, cvs add this does not happen. Traditionally you put in
> your log on the cvs add that it has just come from XXXX, allowing people
> to track the history ...
>

I see. You could still copy the repository file to the new location,
and then cvs remove it from the old. But then, you probably don't want
people to be able to restore a previous version at a place where that
version didn't sit.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 3 18:21:40 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 03 Jan 2001 10:21:40 -0800 Subject: [Bioperl-l] named parameters Message-ID: <3A536DB4.DE3DC640@gmx.net>

Jason Stajich wrote:
>
>
> A solution is to change Bio::Species to expect named parameters, so an
> array ref is passed:
>
>     $sp = new Bio::Species(-verbose=>1, -classification => \@classification );
>
> What are people's reactions to this? If we can agree that this is
> expected then we can add this to our programming conventions wiki page.
>

Yes, certainly.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Wed Jan 3 19:12:07 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 3 Jan 2001 14:12:07 -0500 (EST) Subject: [Bioperl-l] RootI migration and other changes

Hilmar, Ewan, and I came up with the scheme for handling
Bio::Root::RootI and all these obnoxious initializations. My apologies
for not keeping the list more in the loop, but this was actually really
boring. So I have checked in changes that should meet this new spec.
There are some parts that were a little tricky, but all the tests pass
so the behaviour appears to be consistent.

In addition to making the changes necessary for the move to a chained
new rather than chained _initialize, I revamped some modules that needed
updating. Here is a summary to the best of my recollection.

t/ - I updated some of the tests on an ad hoc basis to use the perl
Test module (more info on it: perldoc Test). I hope this will make test
writing even easier so that those interested can jump in and write a
test (This might be a good way to get acquainted with a module if you
are wanting to contribute to the project).

Bio::Tools::Run - this new directory is for modules that serve as
wrappers to call outside programs. We should try and have all modules
that execute external programs residing in this dir or its subdirs.
I added some code using File::Spec to standardize how pathnames to
executables are located. I am not sure if we can expect File::Spec to
always be installed in a perl distribution (IT SHOULD BE!), so I revert
back to the original way of constructing paths (assuming unix style
directory separators '/'). Some cleaning up and standardization.
Actually we need to write a module Bio::Tools::Run.pm that will serve
as a framework for all modules that execute external programs. There
is much code redundancy in these modules right now.

Bio::Species - now uses a named parameter for the classification; this
required updates to a test and some of the SeqIO modules.

Bio::SeqFeature::* - I worked on Mark's Computation object a little to
take advantage of inheritance; there are still some noises being made in
t/SeqFeature with the new tests Ewan added, so I'll try and track those
down. I also did some work so that feature1 and feature2 of
SimilarityPair always return something valid even if you have not
initialized it. This was necessary because of the order parameters are
set when a subclass is instantiated (ie look at the
Bio::Tools::Sim4::Exon hierarchy and trace the calls to new() and you'll
start to see what was happening). This was due to our move to chained
new(), but it works now so no worries.

Bio::AlignIO::clustalw - now supports reading and writing of clustalw
alignments - only supported writing before. This should work for both
clustal 1.4 and 1.8.

Bio::SearchDist - I added a test for this - I have not actually had
luck loading it on my machine lately, so I have written a very simple
test that will skip if it cannot load the Bio::Ext::Align module.

Bio::SeqIO - genbank/embl/swiss: I added the verbose parameter to
new Bio::FTHelper(-verbose => $self->verbose) and when instantiating the
new Seq, so that it will not print the warnings when verbose is set to
-1 for the SeqIO object.

Bio::DB::GDB - a new module that will query the website www.gdb.org and
return simple things (what I needed, which was, for a markername, the
pcrprimers and length of product). This will get much improved later on
as we develop objects for storing Markers and other information. This
will fail if you overload the GDB server (Trust me I know...). I'm still
tinkering with it so the tests may not pass 100% of the time. We can
decide if it is good enough to include in the release (I'm not sure
yet). It's hairy HTML parsing in there.

There are some modules I did not touch - UnivAln, Bio::Tools::Blast,
which depend on Bio::Root::Object. We're going to have to decide what
we want to do here in the future, but that may not be a job we try to
complete for the 0.7 release.

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From lapp@gnf.org Wed Jan 3 20:33:32 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 03 Jan 2001 12:33:32 -0800 Subject: [Bioperl-l] SWISS-PROT writing References: <20010102222136.A19390@gryzo.pgsgent.be> <3A526818.95C495BB@gnf.org> <20010103082943.A21648@gryzo.pgsgent.be> Message-ID: <3A538C9C.644E1D6E@gnf.org>

Kris Boulez wrote:
>
> >
> > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
> > of that class.
> >
> The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> you plan on having a Bio::Seq::* class for every (complex) sequence type ?
>

Yes, we plan to have a specialized class for every databank, for which the
attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
or an already existing class under Bio::Seq::*. This enables us to free the
basic Seq object from definitions that only pertain to databanks and don't
make up the essentials of a biological sequence.

So, molecule(), division() etc will be eventually moved away from
Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
2, meaning that we want it, but we may decide to skip it this time in order
to get the release out of the door.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                                email: lapp@gnf.org
GNF, San Diego, Ca. 92121                  phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 3 21:12:54 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 3 Jan 2001 21:12:54 +0000 (GMT) Subject: [Bioperl-l] SWISS-PROT writing In-Reply-To: <3A538C9C.644E1D6E@gnf.org>

On Wed, 3 Jan 2001, Hilmar Lapp wrote:

> Kris Boulez wrote:
> >
> > >
> > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
> > > of that class.
> > >
> > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> >
>
> Yes, we plan to have a specialized class for every databank, for which the
> attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> or an already existing class under Bio::Seq::*. This enables us to free the
> basic Seq object from definitions that only pertain to databanks and don't
> make up the essentials of a biological sequence.
>
> So, molecule(), division() etc will be eventually moved away from
> Bio::Seq.pm.
> This is even in the task list for 0.7, but with a priority of
> 2, meaning that we want it, but we may decide to skip it this time in order
> to get the release out of the door.

For GenBank/EMBL I have prototype code to check in over here. Looks fine
to me. Swissprot probably needs its own class.

There is a valid debate about whether swissprot and genbank/embl should
inherit off a common base class of "rich database sequence objects" (eg,
division is the same) or we should just say that they are different enough
not to stretch this. I have not done anything on swissprot.

>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                                email: lapp@gnf.org
> GNF, San Diego, Ca. 92121                  phone: +1-858-812-1757
> -------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From wfish82@hotmail.com Thu Jan 4 00:59:55 2001 From: wfish82@hotmail.com (Fish Fish) Date: Thu, 04 Jan 2001 00:59:55 -0000 Subject: [Bioperl-l] Bio::Tools::Blast;

Hi,

I am trying to pick out those blast results saying "***** No hits found
*****", among many other things. But I can't get it to work with
Bio::Tools::Blast. Can somebody point out what is wrong in the following
code? Also, it seems if the first of a multi blast record is a "No hits
found", then the 2nd record will be skipped.

Thanks in advance!

wfish82

**********************************
#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Tools::Blast qw(:obj);

my $blastn=$ARGV[0];
my %blastParam=(
    -file           => $blastn,
    -parse          => 1,
    -filt_func      => \&filter,
    -min_len        => 50,
    -check_all_hits => 0,
    -strict         => 0,
    -stats          => 0,
    -best           => 0,
    -share          => 0,
    -exec_func      => \&process_blast,
);
$Blast->parse(%blastParam);

sub filter{
    my $hit=shift;
    if(! defined $hit){
        print "blahblah...\n";
    }else{
        return 1;
    }
}

sub process_blast{
    my $blastObj=shift;
    if(! defined $blastObj->hit){
        printf "BLAHBLAH...\n";
    }
    $blastObj->destroy;
}
#######################################
# end
#############################

From krbou@pgsgent.be Thu Jan 4 08:04:08 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Thu, 4 Jan 2001 09:04:08 +0100 Subject: [Bioperl-l] SWISS-PROT writing In-Reply-To: ; from birney@ebi.ac.uk on Wed, Jan 03, 2001 at 09:12:54PM +0000 References: <3A538C9C.644E1D6E@gnf.org> Message-ID: <20010104090408.B27520@gryzo.pgsgent.be>

Quoting Ewan Birney (birney@ebi.ac.uk):
> On Wed, 3 Jan 2001, Hilmar Lapp wrote:
>
> > Kris Boulez wrote:
> > >
> > > >
> > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
> > > > of that class.
> > > >
> > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq. Do
> > > you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > >
> >
> > Yes, we plan to have a specialized class for every databank, for which the
> > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > or an already existing class under Bio::Seq::*.
> > This enables us to free the
> > basic Seq object from definitions that only pertain to databanks and don't
> > make up the essentials of a biological sequence.
> >
> > So, molecule(), division() etc will be eventually moved away from
> > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > 2, meaning that we want it, but we may decide to skip it this time in order
> > to get the release out of the door.
>
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class.
>
>
> There is a valid debate about whether swissprot and genbank/embl should
> inherit off a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
>
>

Last night I thought a bit more about this and have some questions.

- will these objects also inherit from Bio::Seq ?

- if yes, will these objects be created like
    my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');

  or

    my $swiss_seq = Bio::Seq::swiss->new( .. );

- will it be possible to 'promote' a Bio::Seq object to one of these new
  objects ?

Kris,

From birney@ebi.ac.uk Thu Jan 4 09:26:59 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 4 Jan 2001 09:26:59 +0000 (GMT) Subject: [Bioperl-l] SWISS-PROT writing In-Reply-To: <20010104090408.B27520@gryzo.pgsgent.be>

On Thu, 4 Jan 2001, Kris Boulez wrote:

> Quoting Ewan Birney (birney@ebi.ac.uk):
> > On Wed, 3 Jan 2001, Hilmar Lapp wrote:
> >
> > > Kris Boulez wrote:
> > > >
> > > > >
> > > > > See above. I'm not sure what we already have in the Bio::Seq::* hierarchy.
> > > > > If there's no Swiss.pm yet and GenBank/GenPept doesn't fit well, you could
> > > > > give Bio::Seq::Swiss.pm a start and adapt the parser to instantiate objects
> > > > > of that class.
> > > > >
> > > > The only thing we have now is Bio::Seq::LargeSeq and LargePrimarySeq.
> > > > Do you plan on having a Bio::Seq::* class for every (complex) sequence type ?
> > > >
> > >
> > > Yes, we plan to have a specialized class for every databank, for which the
> > > attributes its entries carry are not sufficiently reflected in Bio::Seq.pm
> > > or an already existing class under Bio::Seq::*. This enables us to free the
> > > basic Seq object from definitions that only pertain to databanks and don't
> > > make up the essentials of a biological sequence.
> > >
> > > So, molecule(), division() etc will be eventually moved away from
> > > Bio::Seq.pm. This is even in the task list for 0.7, but with a priority of
> > > 2, meaning that we want it, but we may decide to skip it this time in order
> > > to get the release out of the door.
> >
> > For GenBank/EMBL I have prototype code to check in over here. Looks fine
> > to me. Swissprot probably needs its own class.
> >
> >
> > There is a valid debate about whether swissprot and genbank/embl should
> > inherit off a common base class of "rich database sequence objects" (eg,
> > division is the same) or we should just say that they are different enough
> > not to stretch this. I have not done anything on swissprot.
> >
> >
>
> Last night I thought a bit more about this and have some questions.
>
> - will these objects also inherit from Bio::Seq ?

yes.

>
> - if yes, will these objects be created like
>     my $swiss_seq = Bio::Seq->new( ..., -format => 'swiss');
>

No. They will be created through

    my $swiss_seq_io = Bio::SeqIO->new( -format => 'swiss' ) ;
    $swiss_seq = $swiss_seq_io->next_seq;

>   or
>
>     my $swiss_seq = Bio::Seq::swiss->new( .. );
>

This will be achievable.

> - will it be possible to 'promote' a Bio::Seq object to one of these new
>   objects ?
>

yes....

>
>
> Kris,
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From heikki@ebi.ac.uk Thu Jan 4 10:38:43 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Thu, 04 Jan 2001 10:38:43 +0000 Subject: [Bioperl-l] test framework Message-ID: <3A5452B3.4B2CEF66@ebi.ac.uk>

Since it is already in perl 5.004, there should be no reason not to use
it. I tried it yesterday; it really cleans up test code nicely. I am
going to use it in the future.

-Heikki

Jason Stajich wrote:
>
> while I'm messing with it, does anyone have objections to using the built
> in perl Test module available since perl 5.004 rather than our
>
> I agree it is wasted time to constantly move things from one test suite to
> another ( I already tried to standardize our existing ones as best as
> possible).  But a nice standard makes it easier for new people to write
> tests and make them fit.  Any comments?
>
> sub test ($$;$) {
>     my($num, $true,$msg) = @_;
>     print($true ? "ok $num\n" : "not ok $num $msg\n");
> }
>
> [ from perldoc Test ]
>
>        use strict;
>        use Test;
>
>        # use a BEGIN block so we print our plan before MyModule is loaded
>        BEGIN { plan tests => 14, todo => [3,4] }
>
>        # load your module...
>        use MyModule;
>
>        ok(0); # failure
>        ok(1); # success
>
>        ok(0); # ok, expected failure (see todo list, above)
>        ok(1); # surprise success!
>
>        ok(0,1);             # failure: '0' ne '1'
>        ok('broke','fixed'); # failure: 'broke' ne 'fixed'
>        ok('fixed','fixed'); # success: 'fixed' eq 'fixed'
>        ok('fixed',qr/x/);   # success: 'fixed' =~ qr/x/
>
>        ok(sub { 1+1 }, 2);  # success: '2' eq '2'
>        ok(sub { 1+1 }, 3);  # failure: '2' ne '3'
>        ok(0, int(rand(2));  # (just kidding :-)
>
>        my @list = (0,0);
>        ok @list, 3, "\@list=".join(',',@list);      #extra diagnostics
>        ok 'segmentation fault', '/(?i)success/';    #regex match
>
>        skip($feature_is_missing, ...);    #do platform specific test
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From hlapp@gmx.net Thu Jan 4 17:33:52 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 04 Jan 2001 09:33:52 -0800 Subject: [Bioperl-l] SWISS-PROT writing Message-ID: <3A54B400.DE42919E@gmx.net>

Ewan Birney wrote:
>
> For GenBank/EMBL I have prototype code to check in over here. Looks fine
> to me. Swissprot probably needs its own class.
>
> There is a valid debate about whether swissprot and genbank/embl should
> inherit off a common base class of "rich database sequence objects" (eg,
> division is the same) or we should just say that they are different enough
> not to stretch this. I have not done anything on swissprot.
>

There are probably enough attributes shared (division, molecule, date,
secondary accessions, maybe revision of the sequence, ...) to justify
creating a rich sequence base class. This would also help others wishing
to add another rich seq class get started quickly.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Mon Jan 8 09:42:20 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 08 Jan 2001 01:42:20 -0800 Subject: [Bioperl-l] make test Message-ID: <3A598B7C.D911DBF8@gmx.net>

make test presently reveals the following problems (I'm running
Perl 5.005003 on Linux 2.2.10).

t/Chain.............Warning chain2string: argument LAST:6
overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.

Does this have any significance?

There were a couple of others which I (and Ewan and Jason) could
fix.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

From heikki@ebi.ac.uk Mon Jan 8 10:18:01 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon, 08 Jan 2001 10:18:01 +0000 Subject: [Bioperl-l] Re: make test References: <3A598B7C.D911DBF8@gmx.net> Message-ID: <3A5993D9.C863A267@ebi.ac.uk>

Hilmar,

The warning is intentional, but I agree it looks alarming to anyone
installing bioperl. Test code uses a value outside existing positions.
Can you think of a way of rewriting the test so that it does not print it
out?

-Heikki

Hilmar Lapp wrote:
>
> make test presently reveals the following problems (I'm running
> Perl 5.005003 on Linux 2.2.10).
>
> t/Chain.............Warning chain2string: argument LAST:6
> overriding LEN:4! at blib/lib/Bio/LiveSeq/Chain.pm line 184.
>
> Does this have any significance?
>
> There were a couple of others which I (and Ewan and Jason) could
> fix.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------

--
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From insana@ebi.ac.uk Mon Jan 8 13:33:13 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Mon, 8 Jan 2001 13:33:13 +0000 (GMT) Subject: [Bioperl-l] Re: make test In-Reply-To: <3A5993D9.C863A267@ebi.ac.uk>

> The warning is intentional, but I agree it looks alarming to anyone
> installing bioperl. Test code uses a value outside existing positions.
> Can you think of a way of rewriting the test so that it does not print it
> out?

Ok, I will change that test not to create the warning.
But the whole point of that test was to get that warning and see it was
working as expected.

Jos

From jason@chg.mc.duke.edu Mon Jan 8 13:57:33 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 8 Jan 2001 08:57:33 -0500 (EST) Subject: [Bioperl-l] Re: make test

If you made your warnings come from bioperl objects (ie $obj->warn() )
we can turn them off by setting the verbose level on the object
(ie $obj->verbose(-1) turns off all warnings). This means your objects
have to inherit from Bio::Root::RootI.

I didn't change the LiveSeq or Variation objects when I updated all for
Bio::Root::RootI chained new for the other modules in the repository
because I didn't know what your feelings were on this.

Do you want to check to see that the error is thrown or just that the
routine returns the correct value?

-Jason

On Mon, 8 Jan 2001, Joseph Insana wrote:

> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
>
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
>
> Jos
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From insana@ebi.ac.uk Mon Jan 8 14:09:35 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Mon, 8 Jan 2001 14:09:35 +0000 (GMT) Subject: [Bioperl-l] Re: make test

> (ie $obj->verbose(-1) turns off all warnings). This means your objects
> have to inherit from Bio::Root::RootI.

I don't want my objects to inherit from RootI.
They are independent and I'd like to have them stay independent.

> Do you want to check to see that the error is thrown or just that the
> routine returns the correct value?

I wanted to check that the third argument ("last") would always override
the second argument ("length") since that is the way the method is supposed
to work.
I am now going to commit a version that won't produce the warning
and will check something else.

Joseph

From birney@ebi.ac.uk Mon Jan 8 15:08:57 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 8 Jan 2001 15:08:57 +0000 (GMT) Subject: [Bioperl-l] Re: make test

On Mon, 8 Jan 2001, Joseph Insana wrote:

> > (ie $obj->verbose(-1) turns off all warnings). This means your objects
> > have to inherit from Bio::Root::RootI.
>
> I don't want my objects to inherit from RootI.
> They are independent and I'd like to have them stay independent.

This is cool (I completely understand). I think we should consider moving
the variation into its own cvs module, which means that Joseph and Heikki
are not tied to the bioperl release schedule etc.

This is for post 0.7 branching in my view (Hilmar to make the call).

>
> > Do you want to check to see that the error is thrown or just that the
> > routine returns the correct value?
>
> I wanted to check that the third argument ("last") would always override
> the second argument ("length") since that is the way the method is supposed
> to work.
> I am now going to commit a version that won't produce the warning
> and will check something else.
>
> Joseph
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From hlapp@gmx.net Mon Jan 8 18:51:16 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 08 Jan 2001 10:51:16 -0800 Subject: [Bioperl-l] Re: make test Message-ID: <3A5A0C24.6057A0DD@gmx.net>

Joseph Insana wrote:
>
> > The warning is intentional, but I agree it looks alarming to anyone
> > installing bioperl. Test code uses a value outside existing positions.
> > Can you think of a way of rewriting the test so that it does not print it
> > out?
>
> Ok, I will change that test not to create the warning.
> But the whole point of that test was to get that warning and see it was
> working as expected.
>

As I understand from your and Heikki's replies, in your test you wanted
the overriding thing to happen, be accepted (even though a warning was
triggered), and the code be able to handle it. I'm not sure what you did
by your change of the test, but it looks like you simply don't test that
feature anymore.

If you do want to keep the warning in the code (and not turn it into an
exception, which means to me that the call itself may indicate an error on
the client side, but in some cases may be totally sensible), what if you
print a message before the test that a warning should be expected? If you
feel confident with removing the warning message, what if you test
afterwards that your code dealt with the overriding thing as you expected
it to do?

Just my two pennies. I didn't want to suggest that anyone turns off a test
of his code. I just think that a warning message being printed is not
really a measurable test result (i.e., it should be either 'passed' or
'failed').

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Mon Jan 8 19:04:04 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 08 Jan 2001 11:04:04 -0800 Subject: [Bioperl-l] Re: make test References: Message-ID: <3A5A0F24.64D86538@gmx.net>
Ewan Birney wrote:
> 
> On Mon, 8 Jan 2001, Joseph Insana wrote:
> 
> > > (ie $obj->verbose(-1) turns off all warnings). This means your objects
> > > have to inherit from Bio::Root::RootI.
> > 
> > I don't want my objects to inherit from RootI.
> > They are independent and I'd like to have them stay independent.
> 
> This is cool (I completely understand). I think we should consider moving
> the variation into its own cvs module, which means that Joseph and Heikki
> are not tied to the bioperl release schedule etc.
> 
> This is for post 0.7 branching in my view (Hilmar to make the call).
> 

I'm not sure what you mean by post-0.7 branching. I agree that
under these premises the Variation code should probably better go
into its own module, even though it's a pity.

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From schattner@alum.mit.edu Mon Jan 8 19:44:08 2001 From: schattner@alum.mit.edu (Peter Schattner) Date: Mon, 08 Jan 2001 11:44:08 -0800 Subject: [Bioperl-l] Initial draft of bioperl tutorial committed References: Message-ID: <3A5A1887.B6570706@alum.mit.edu>
Hello all

I have committed an initial draft of an introductory bioperl tutorial
(called "bptutorial.pl") to the bioperl-live (main) repository. The
draft tutorial pretty much follows the outline from my proposal:
http://bioperl.org/pipermail/bioperl-l/2000-December/001972.html
One addition to the original proposal is that I have included an
"appendix" which is a working script that demonstrates most of the
bioperl features described in the tutorial. (The script is largely
cut-and-pasted from various test and example files with print statements
added to make it clearer as to what is going on).

I believe that having a clear and accurate tutorial could make bioperl
more accessible and widely used. On the other hand, if the tutorial is
confusing or contains mistakes, it will turn people away from trying
bioperl (and probably be worse than not having one at all). So I have
a request. I would appreciate it if some of you would read the tutorial
and give me feedback in terms of clarity and accuracy. I am interested
in both general comments (e.g. "this section is too long - cut out
such-and-such" or "this module description fits better in this section"
or "this module will not be included in the 0.7 release so don't include
it") and specific places where there are errors or misleading or
confusing statements. (If you think that the tutorial is clear and/or
that specific parts are particularly helpful I'd of course be happy to
get that feedback too :--). Suggestions on improving the formatting
would also be appreciated.

I would definitely like feedback from people who have written modules
which are in the 0.7 release to make sure that I have captured your
intent and the proper usage of your module(s). I would also like
comments from folks who are simply bioperl users and, ideally, from a
few people who haven't used bioperl much before to see in what ways the
tutorial makes it easier to use or get started using bioperl (or
doesn't). Feel free to write to me directly at schattner@alum.mit.edu
or via this list. Thanks.

If you just want to look at the tutorial, you can view it through the
web browsable CVS at:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.1&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl

(Note: you may need to view the tutorial through a word processor to get
the lines to wrap properly and to get rid of extra '^M's. If someone
can tell me how I need to reformat the file so this is not necessary I'd
be grateful.)

If you want to also run the tutorial script, you will need to have a
copy of CVS "bioperl-live". The tutorial script will *not* work with
release 0.6. (Note that the contents of bioperl-live are being updated
often so some of the demo scripts may fail - they're working for me now
and if they start failing I'd appreciate finding out).

Cheers

Peter


From jason@chg.mc.duke.edu Mon Jan 8 21:10:26 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 8 Jan 2001 16:10:26 -0500 (EST) Subject: [Bioperl-l] ORF identification/prediction Message-ID:
To the best of my knowledge, we don't currently have bioperl modules that
predict/identify (depending on your confidence in the software =) Open
Reading Frames. Eric and I were thinking of working on a bioperl module
for this. Any suggestions, known pitfalls, etc are welcomed.
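
As a point of reference, a naive scan for one common ORF definition (an ATG start codon followed by in-frame codons up to and including the first stop codon) fits in a few lines of Perl. This is a minimal sketch assuming the standard genetic code's stop codons (TAA, TAG, TGA); `find_orfs` is a hypothetical helper, not a Bioperl module, and it ignores the reverse strand, alternative start codons and circular sequences.

```perl
#!/usr/bin/perl -w
use strict;

# Stop codons of the standard genetic code.
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Naive forward-strand ORF scan. In each of the three reading frames an
# ORF opens at the first ATG and closes at the next in-frame stop codon
# (the stop codon is included). ATGs inside an already open ORF are not
# reported separately. Returns [start, end, frame] triples: start/end are
# 1-based and inclusive, frame is the 0-based offset (0, 1 or 2).
# ORFs shorter than $min_len nucleotides are skipped.
sub find_orfs {
    my ($seq, $min_len) = @_;
    $seq     = uc $seq;
    $min_len = 0 unless defined $min_len;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # 0-based offset of the open ATG, if any
        for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            }
            elsif ($STOP{$codon}) {
                my $len = $i + 3 - $start;
                push @orfs, [ $start + 1, $i + 3, $frame ] if $len >= $min_len;
                undef $start;
            }
        }
    }
    return @orfs;
}

my $dna = 'CCATGAAATTTGGGTAACCATGCCCTGA';
for my $orf (find_orfs($dna, 6)) {
    my ($from, $to, $frame) = @$orf;
    print "frame $frame: $from..$to ",
          substr($dna, $from - 1, $to - $from + 1), "\n";
}
```

For the example sequence this reports two ORFs: 20..28 (ATGCCCTGA) in frame 1 and 3..17 (ATGAAATTTGGGTAA) in frame 2. A real predictor would also need the reverse complement and some model of which ORFs are likely to be coding.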

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Mon Jan 8 22:55:10 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Mon, 08 Jan 2001 14:55:10 -0800 Subject: [Bioperl-l] ORF identification/prediction References: Message-ID: <3A5A454E.AC61F160@gnf.org>
Jason Stajich wrote:
> 
> To the best of my knowledge, we don't currently have bioperl modules that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl module
> for this. Any suggestions, known pitfalls, etc are welcomed.
> 

There is the Bio::Tools::ESTScan module, which obviously relies on ESTScan
as the ORF predicting external tool. If you plan to implement a
full-fledged ORF prediction algorithm in Perl, that module is not what you
want. (BTW, ESTScan consists of a driver layer in Perl; the core of the
algorithm is written in C. One could try to integrate/rewrite the driver
layer into/in Bioperl.)

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From fernan@iib.unsam.edu.ar Mon Jan 8 22:58:17 2001 From: fernan@iib.unsam.edu.ar (Fernan Aguero) Date: Mon, 8 Jan 2001 19:58:17 -0300 Subject: [Bioperl-l] ORF identification/prediction In-Reply-To: ; from jason@chg.mc.duke.edu on Mon, Jan 08, 2001 at 18:10:26 -0300 References: Message-ID: <20010108195817.A1029@iib4.iib.unsam.edu.ar>
Currently I am calling getorf (from the EMBOSS package) in my scripts to do
this for me.

[fernan@iib4 fernan]$ getorf -h
   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outseq]            seqoutall  Output sequence(s) USA

   Optional qualifiers:
   -table              list       Code to use
   -minsize            integer    Minimum nucleotide size of ORF to report
   -find               list       This is a small menu of possible output
                                  options. The first four options are to
                                  select either the protein translation or
                                  the original nucleic acid sequence of the
                                  open reading frame. There are two possible
                                  definitions of an open reading frame: it
                                  can either be a region that is free of
                                  STOP codons or a region that begins with
                                  a START codon and ends with a STOP codon.
                                  The last three options are probably only
                                  of interest to people who wish to
                                  investigate the statistical properties of
                                  the regions around potential START or
                                  STOP codons. The last option assumes that
                                  ORF lengths are calculated between two
                                  STOP codons.

   Advanced qualifiers:
   -[no]methionine     bool       START codons at the beginning of protein
                                  products will usually code for
                                  Methionine, despite what the codon will
                                  code for when it is internal to a
                                  protein. This qualifier sets all such
                                  START codons to code for Methionine by
                                  default.
   -circular           bool       Is the sequence circular
   -[no]reverse        bool       Set this to be false if you do not wish
                                  to find ORFs in the reverse complement of
                                  the sequence.
   -flanking           integer    If you have chosen one of the options of
                                  the type of sequence to find that gives
                                  the flanking sequence around a STOP or
                                  START codon, this allows you to set the
                                  number of nucleotides either side of that
                                  codon to output. If the region of
                                  flanking nucleotides crosses the start or
                                  end of the sequence, no output is given
                                  for this codon.

What I find annoying about EMBOSS apps is that the -h (-help) option
prints limited information (unless the options are 'boolean' or
'integer' you don't know what to put there). You have to go to EMBOSS
web site to look for extended help!

Hope this helps,

Fernan

On Mon, 08 Jan 2001 18:10:26 Jason Stajich wrote:
> To the best of my knowledge, we don't currently have bioperl modules that
> predict/identify (depending on your confidence in the software =) Open
> Reading Frames. Eric and I were thinking of working on a bioperl module
> for this. Any suggestions, known pitfalls, etc are welcomed.
> 
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/

-- 
# --------------------------------------------------------- #
#                                 _                         #
# Fernan Aguero             |    / \                        #
# Bioinformatics            | ASCII \ / against             #
# IIB-UNSAM                 | ribbon / HTML                 #
# fernan@iib.unsam.edu.ar   | campaign / \ email            #
# ICQ 100325972             |    /   \                      #
#                                                           #
# --------------------------------------------------------- #


From nirav@public.arl.Arizona.EDU Mon Jan 8 23:27:11 2001 From: nirav@public.arl.Arizona.EDU (nirav@public.arl.Arizona.EDU) Date: Mon, 08 Jan 2001 16:27:11 -0700 (MST) Subject: [Bioperl-l] EMBOSS -h Was : ORF identification/prediction In-Reply-To: <20010108195817.A1029@iib4.iib.unsam.edu.ar> References: <20010108195817.A1029@iib4.iib.unsam.edu.ar> Message-ID: <978996431.3a5a4ccfbe704@email.arl.arizona.edu>
Quoting Fernan Aguero:

> 
> What I find annoying about EMBOSS apps is that the -h (-help) option
> prints limited information (unless the options are 'boolean' or
> 'integer' you don't know what to put there). You have to go to EMBOSS
> web site to look for extended help!
> 

use tfm for detailed help in EMBOSS

regards,
Nirav


From dblock@gene.pbi.nrc.ca Tue Jan 9 07:50:42 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 9 Jan 2001 01:50:42 -0600 (CST) Subject: [Bioperl-l] [Poop-group] RELEASE: Alzabo 0.20 (fwd) Message-ID:
Just something to think about. Anybody play with this? Would it work with
BioPerl Objects? Have we been re-inventing a wheel here?

Up late, thinking out loud.

-- 
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan

---------- Forwarded message ----------
Date: Tue, 9 Jan 2001 00:18:32 -0600 (CST)
From: Dave Rolsky
To: poop-group@lists.sourceforge.net, poop-scoop@lists.sourceforge.net
Subject: [Poop-group] RELEASE: Alzabo 0.20 (fwd)

Alzabo is a data modelling tool and OO-RDBMS mapper written in Perl.

This release includes a lot of changes, both internal and external. Users
who have older schemas saved to disk will need the eg/convert.pl utility
included with this release. Existing users should also make sure to note
the deprecations and incompatibilities detailed at the bottom of the change
list. Among the most visible changes/updates are a fairly large amount of
documentation revamping and support for Postgres.

Alzabo is available from either CPAN or
http://www.sourceforge.net/projects/alzabo/

The Alzabo homepage is at http://alzabo.sourceforge.net/.

The documentation can be read online at
http://alzabo.sourceforge.net/docs/. This is a good place to start for
those curious about what Alzabo does.

Changes
--------------

0.20

- Preliminary Postgres support. There is no support yet for constraints
  or foreign keys when reverse engineering or making SQL. There is also
  no support for large objects (I'm hoping that 7.1 will be released soon
  so I won't have to think about this). Otherwise, the support is about
  at the same level as MySQL support, though less mature.

- Added Alzabo::MethodMaker module. This can be used to auto-generate
  useful methods for your schema/table/row objects based on the
  properties of your objects themselves.

- Reworking/expanding/clarifying/editing of the docs.

- Add sort_by and limit options whenever creating a cursor.

- Method documentation POD from the Alzabo::* modules is merged into the
  relevant Alzabo::Create::* and Alzabo::Runtime::* modules during
  install. This should make it easier to find what you need since the
  average user will only need to look at a few modules in
  Alzabo::Runtime::*.

- Reworked exceptions so they are all now Alzabo::Exception::Something.

- Added default as a column attribute (thus there are now
  Alzabo::Column->default and Alzabo::Create::Column->set_default
  methods).

- Added length & precision attributes for columns. Both are set through
  the Alzabo::Create::Column->set_length method.

- This release includes a script in eg/ called convert.pl to convert
  older schemas.

- Alzabo::Schema->tables & Alzabo::Table->columns now take an optional
  list of tables/columns as an argument and return a list of matching
  objects.

- Added Alzabo::Column->has_attribute method.

- The data browser has actually lost some functionality (the filtering).
  Making this more powerful is a fairly low priority at the moment.

- Fix bugs where extra params passed to Alzabo::Runtime::Table->insert
  were not making it to the Alzabo::Runtime::Row->new method.

- Fix for Alzabo::Runtime::Table->set_prefetch method.

- Fixed bug in handling of deleted objects in Alzabo::ObjectCacheIPC
  (they were never reported as deleted).

- Fix bug that caused schema to get bigger every time it was saved.

- Finally switched to regular hashes for objects.

- Added Alzabo::SQLMaker classes to handle generating SQL in a
  cross-platform compatible way.

DEPRECATIONS:

- Parameters for Alzabo::Create::Column->new: 'null' parameter is now
  'nullable'. The use of the parameter 'null' is deprecated.

- Alzabo::Column->null & Alzabo::Column->set_null methods are now
  Alzabo::Column->nullable & Alzabo::Column->set_nullable. The old
  methods are deprecated.

- Alzabo::Create::ForeignKey->new no longer requires table_from &
  table_to params (it took me this long to realize I can get that from
  the column passed in. doh!)

INCOMPATIBILITIES:

- Alzabo::Runtime::Table->rows_where parameters have changed. The from
  parameter has been removed (use the Alzabo::Runtime::Schema->join
  method instead). The where parameter expects something different now.

- Alzabo::Runtime::Table->rows_by_where_clause method has been removed.

- Alzabo::Runtime::Schema->join method's where parameter expects
  something different.

/*==================
www.urth.org
We await the New Sun
==================*/

_______________________________________________
Poop-group mailing list
Poop-group@lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/poop-group


From heikki@ebi.ac.uk Tue Jan 9 09:29:23 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 09 Jan 2001 09:29:23 +0000 Subject: [Bioperl-l] Re: make test References: <3A5A0F24.64D86538@gmx.net> Message-ID: <3A5AD9F3.652A421E@ebi.ac.uk>
Ewan probably means that Variation code should be part of the main
bioperl cvs but should form a separate module after 0.7 is out. I do
not think this is a good idea. I'd like to keep Variation and LiveSeq
namespaces within Bioperl main distribution.

There is an issue of Ensembl needing a copy of Variation code which
would favour moving things over to a separate module but it can be
handled by other means: e.g. by copying the objects over temporarily.

	-Heikki

Hilmar Lapp wrote:
> 
> Ewan Birney wrote:
> > 
> > On Mon, 8 Jan 2001, Joseph Insana wrote:
> > 
> > > > (ie $obj->verbose(-1) turns off all warnings). This means your objects
> > > > have to inherit from Bio::Root::RootI.
> > > 
> > > I don't want my objects to inherit from RootI.
> > > They are independent and I'd like to have them stay independent.
> > 
> > This is cool (I completely understand). I think we should consider moving
> > the variation into its own cvs module, which means that Joseph and Heikki
> > are not tied to the bioperl release schedule etc.
> > 
> > This is for post 0.7 branching in my view (Hilmar to make the call).
> > 
> 
> I'm not sure what you mean by post-0.7 branching. I agree that
> under these premises the Variation code should probably better go
> into its own module, even though it's a pity.
> 
> Hilmar
> 
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------

-- 
______ _/ _/_____________________________________________________
     _/   _/   http://www.ebi.ac.uk/mutations/
    _/   _/   _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/   _/   _/   Wellcome Trust Genome Campus, Hinxton
 _/   _/   _/   Cambs. CB10 1SD, United Kingdom
    _/   Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From birney@ebi.ac.uk Tue Jan 9 09:30:28 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 9 Jan 2001 09:30:28 +0000 (GMT) Subject: [Bioperl-l] Re: make test In-Reply-To: <3A5AD9F3.652A421E@ebi.ac.uk> Message-ID:
On Tue, 9 Jan 2001, Heikki Lehvaslaiho wrote:

> Ewan probably means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this is a good idea. I'd like to keep Variation and LiveSeq
> namespaces within Bioperl main distribution.

I am cool with this as well.


From heikki@ebi.ac.uk Tue Jan 9 10:28:29 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 09 Jan 2001 10:28:29 +0000 Subject: [Bioperl-l] Initial draft of bioperl tutorial committed References: <3A5A1887.B6570706@alum.mit.edu> Message-ID: <3A5AE7CD.D6671E36@ebi.ac.uk>
Dear Peter,

Wonderful! Thank you very much for writing the tutorial. Before any of
us goes into details I thought it best to wrap the words and remove ^Ms
for easier viewing. CVS is happier with short lines, too. This was
easy enough to do in emacs.

Thanks again,

	-Heikki

-- 
______ _/ _/_____________________________________________________
     _/   _/   http://www.ebi.ac.uk/mutations/
    _/   _/   _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/   _/   _/   Wellcome Trust Genome Campus, Hinxton
 _/   _/   _/   Cambs. CB10 1SD, United Kingdom
    _/   Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From heikki@ebi.ac.uk Tue Jan 9 10:45:56 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 09 Jan 2001 10:45:56 +0000 Subject: [Bioperl-l] Initial draft of bioperl tutorial committed References: <3A5A1887.B6570706@alum.mit.edu> Message-ID: <3A5AEBE4.36B46EE@ebi.ac.uk>
Peter,

Running any part of the script depends on the bioperl-ext package.
Since I do not have it, I cannot run any demos. A workaround is needed.
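
One possible workaround is to load the compiled module with a run-time `require` inside `eval`, so a missing bioperl-ext only disables the demos that need it. A minimal sketch under that assumption; the `have_module` helper and the section table are hypothetical and are not taken from bptutorial.pl.

```perl
#!/usr/bin/perl -w
use strict;

# Return 1 if the named module can be loaded at run time, 0 otherwise.
# require inside eval turns a missing (or broken) module into a false
# value instead of a fatal error. Results are cached per module name.
my %loaded;
sub have_module {
    my ($module) = @_;
    return $loaded{$module} if exists $loaded{$module};
    (my $file = $module) =~ s{::}{/}g;
    $loaded{$module} = eval { require "$file.pm"; 1 } ? 1 : 0;
    return $loaded{$module};
}

# Hypothetical list of tutorial sections; each names the optional
# module it needs (undef means it runs with core bioperl alone).
my @sections = (
    { name => 'sequence manipulation',    needs => undef },
    { name => 'Smith-Waterman alignment', needs => 'Bio::Ext::Align' },
);

for my $section (@sections) {
    my $needs = $section->{needs};
    if (defined $needs && !have_module($needs)) {
        print "Skipping '$section->{name}': $needs is not installed ",
              "(see the bioperl-ext package)\n";
        next;
    }
    print "Running '$section->{name}'\n";
}
```

Because the `require` happens at run time rather than through `use`, the script compiles and runs the remaining sections whether or not the C extension is present.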

	-Heikki

odo ~/src/bioperl-live> perl -w bptutorial.pl 0
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please install the bioperl-ext package

odo ~/src/bioperl-live> perl -w bptutorial.pl 4
The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. Please install the bioperl-ext package

odo ~/src/bioperl-live>

-- 
______ _/ _/_____________________________________________________
     _/   _/   http://www.ebi.ac.uk/mutations/
    _/   _/   _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/   _/   _/   Wellcome Trust Genome Campus, Hinxton
 _/   _/   _/   Cambs. CB10 1SD, United Kingdom
    _/   Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From heikki@ebi.ac.uk Tue Jan 9 10:50:49 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 09 Jan 2001 10:50:49 +0000 Subject: [Bioperl-l] Initial draft of bioperl tutorial committed References: <3A5A1887.B6570706@alum.mit.edu> <3A5AE7CD.D6671E36@ebi.ac.uk> Message-ID: <3A5AED09.1BBCF959@ebi.ac.uk>
P.S. The URL for the wrapped version of the text is:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?rev=1.2&content-type=text/vnd.viewcvs-markup&cvsroot=bioperl

With new versions coming in shortly it is best to use:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/bptutorial.pl?cvsroot=bioperl

and select the latest revision from there.

	-Heikki

-- 
______ _/ _/_____________________________________________________
     _/   _/   http://www.ebi.ac.uk/mutations/
    _/   _/   _/   Heikki Lehvaslaiho    heikki@ebi.ac.uk
   _/_/_/_/_/   EMBL Outstation, European Bioinformatics Institute
  _/   _/   _/   Wellcome Trust Genome Campus, Hinxton
 _/   _/   _/   Cambs. CB10 1SD, United Kingdom
    _/   Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From birney@ebi.ac.uk Tue Jan 9 11:45:33 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 9 Jan 2001 11:45:33 +0000 (GMT) Subject: [Bioperl-l] bptutorial Message-ID:
Many thanks to Peter for an excellent tutorial. It is well worth a
read: I have spotted no obvious errors, but I will reread more carefully.

The dependency problem can be solved with a conditional require and
then run time skipping of sections.
I agree with Heikki that this will be a good thing. I will see what I can
do here.

People may have noticed as well that Jason, Hilmar and I have been
struggling through the refactoring of the main trunk towards 0.7. Much
praise goes to Jason for doing the lion's share of the work. I have only
one module failing for unexplained reasons. I am planning to write the
RichSeq style interfaces on my transatlantic flight today.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From birney@ebi.ac.uk Tue Jan 9 11:46:08 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Tue, 9 Jan 2001 11:46:08 +0000 (GMT)
Subject: [Bioperl-l] spoke too soon...
Message-ID:

Just cvs update'd and run tests... SeqStats has disappeared. Is this
deliberate?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 9 15:38:59 2001
From: jason@chg.mc.duke.edu (Jason Stajich)
Date: Tue, 9 Jan 2001 10:38:59 -0500 (EST)
Subject: [Bioperl-l] spoke too soon...
In-Reply-To:
Message-ID:

It appears you might have... It is in Bio::Tools::SeqStats. I have updated
the test module to reflect this and split the tests into separate ok
statements so we can know which ones are failing. It appears some are,
and I am not sure if it is an error in the tests or the module.
31 helix ../bio/bioperl/bioperl-live> cvs log Bio/SeqStats.pm

RCS file: /home/repository/bioperl/bioperl-live/Bio/Attic/SeqStats.pm,v
Working file: Bio/SeqStats.pm
head: 1.3
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 3; selected revisions: 3
description:
----------------------------
revision 1.3
date: 2000/03/21 11:47:55; author: birney; state: dead; lines: +0 -0
moved SeqStats, added SeqWords
----------------------------
revision 1.2
date: 2000/03/01 15:36:42; author: birney; state: Exp; lines: +148 -156
Refactored RootI to get exception throwing cleanly out
Fixed minor issues in multifile.pm
Minor fix to IUPAC
added exception test
tidied up SeqStats.pm
----------------------------
revision 1.1
date: 2000/02/27 11:36:14; author: birney; state: Exp;
added multi_1 test and SeqStats
==========================================================================

On Tue, 9 Jan 2001, Ewan Birney wrote:

>
> Just cvs update'd and run tests... SeqStats has disappeared. Is this
> deliberate?
>
>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From insana@ebi.ac.uk Tue Jan 9 19:17:46 2001
From: insana@ebi.ac.uk (Joseph Insana)
Date: Tue, 9 Jan 2001 19:17:46 +0000 (GMT)
Subject: [Bioperl-l] make tests
Message-ID:

> As I understand from your and Heikki's replies in your test you
> wanted the overriding thing to happen, be accepted (even though a
> warning was triggered), and the code be able to handle it.

Exactly.

> if you print a message before the test that a warning should be
> expected?
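The question quoted here is how to test code that is expected to warn without confusing the people who run the tests. Aaron Mackey's reply in this thread suggests another approach: install a local $SIG{__WARN__} handler, collect the warnings, and assert on them. Below is a minimal sketch. `set_params` is a hypothetical stand-in for the code under test (its 'start'/'pos' parameters are invented for illustration), and `capture_warnings` is not a bioperl function.

```perl
use strict;
use warnings;

# Hypothetical code under test: 'pos' overrides 'start' and warns about it.
sub set_params {
    my (%args) = @_;
    if (defined $args{start} && defined $args{pos}) {
        warn "'pos' overrides 'start'\n";
        return $args{pos};
    }
    return defined $args{pos} ? $args{pos} : $args{start};
}

# Run a code ref while collecting (not printing) any warnings it emits.
# Returns the code's scalar result and an array ref of warning messages.
sub capture_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = $code->();
    return ($result, \@warnings);
}
```

In a test script, the captured list can feed ordinary checks, for example `ok(@$warnings == 1)` with the Test module. The override is then checked, and the expected warning never reaches the terminal.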
This is a nice proposal.

But that one is not such an important feature that it needs to be
absolutely tested, to the point of forcing people to read the pre-warning
message and the warning message so as not to get confused by them....

So I just changed the code to test something closely related, i.e. checking
that the code works, skipping only the check that the "override" of the two
parameters takes effect (it should anyway).

Thank you.
Joseph Insana

From hlapp@gmx.net Tue Jan 9 19:28:46 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Tue, 09 Jan 2001 11:28:46 -0800
Subject: [Bioperl-l] Re: make test
References: <3A5A0F24.64D86538@gmx.net> <3A5AD9F3.652A421E@ebi.ac.uk>
Message-ID: <3A5B666E.E2992E71@gmx.net>

Heikki Lehvaslaiho wrote:
>
> Ewan propbly means that Variation code should be part of the main
> bioperl cvs but should form a separate module after 0.7 is out. I do
> not think this a good idea. I'd like to keep Variation and LiveSeq
> namespaces within Bioperl main distribution.
>

Even better. I see I haven't understood the issue, so you guys thrash
this out.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From ajm6q@virginia.edu Wed Jan 10 00:31:01 2001
From: ajm6q@virginia.edu (Aaron J Mackey)
Date: Tue, 9 Jan 2001 19:31:01 -0500 (EST)
Subject: [Bioperl-l] make tests
In-Reply-To:
Message-ID:

Why don't you trap the warning in an eval/$SIG{__WARN__}? I don't see why
you can't test for proper warnings, if that's what you were trying to do.

-Aaron

On Tue, 9 Jan 2001, Joseph Insana wrote:

> > As I understand from your and Heikki's replies in your test you
> > wanted the overriding thing to happen, be accepted (even though a
> > warning was triggered), and the code be able to handle it.
>
> Exactly.
>
> > if you print a message before the test that a warning should be
> > expected?
>
> This is a nice proposal.
>
> But that one is not such an important feature that needs to be absolutely
> tested, to the point of forcing people to read the pre-warning message
> and the warning message not to get confused by them....
>
> So I just changed the code to test something closely related, i.e. checking
> that the code works, avoiding only to check that the "override" of the two
> parameters is acted (it should anyway).
>
> Thank you.
> Joseph Insana
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

From hlapp@gmx.net Wed Jan 10 09:55:32 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 10 Jan 2001 01:55:32 -0800
Subject: [Bioperl-l] Bio::SearchDist, Bio::Ext::Align
Message-ID: <3A5C3194.5E71E318@gmx.net>

For completeness I thought I would install the Bioperl XS modules in
Bio::Ext::*, and downloaded bioperl-ext-0.6.tar.gz, which is advertised as
the latest version.

Installation went fine, but now the t/SearchDist.t tests get executed.
This revealed a couple of bugs in Bio::SearchDist, some of which are
related to the RootI transition. Others consist of calling functions which
are simply not present by that name in the extension module. I tried to
fix them all, but now there is a complaint about a missing parameter in
fit_EVD (expects two, but gets only 1 hardcoded parameter), which I don't
know how to fix.

Does anyone use this module currently (and if so, why does it work for
you?)? Did I grab the wrong version?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From heikki@ebi.ac.uk Wed Jan 10 12:26:53 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed, 10 Jan 2001 12:26:53 +0000
Subject: [Bioperl-l] three letter codes for amino acids?
Message-ID: <3A5C550D.54F02E78@ebi.ac.uk>

I noticed that it is not possible to use three letter codes for amino
acids in any bioperl sequence objects. I think it should be possible at
least to output in three letter code. Mapping three letter code back
to one letter code is not too hard, either, but is it a good idea to
have?

I propose to put method 'seq3' into PrimarySeq.pm which is called from
Seq.pm, too.

=head2 seq3

 Title   : seq3
 Usage   : $string = $obj->seq3()
 Function: Read only method that returns the amino acid sequence
           as a string of three letter codes. moltype has to be
           'protein'. Output follows the IUPAC standard plus
           'Ter' for terminator.
 Returns : A scalar
 Args    : character used for stop, optional, defaults to '*'
           character used for unknown, optional, defaults to 'X'

=cut

Any opinions?

-Heikki

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From gert.thijs@esat.kuleuven.ac.be Wed Jan 10 15:35:48 2001
From: gert.thijs@esat.kuleuven.ac.be (gert thijs)
Date: Wed, 10 Jan 2001 16:35:48 +0100
Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
Message-ID: <3A5C8154.5E2EBD56@esat.kuleuven.ac.be>

I have been using Bio::SeqIO::genbank.pm quite frequently lately and I
stumbled upon a small parsing problem. Sometimes there is no TITLE field
defined in the REFERENCE, and this makes the parsing of the record fail so
that no features are detected.
To solve this problem I have added one extra check in Bio::SeqIO::genbank.pm
at line 602:

    if (/^  AUTHORS\s+(.*)/) {
        $au .= $1;
        while ( defined($_ = $self->_readline) ) {
            /^  TITLE/   && last;
            /^  JOURNAL/ && last;   ### when no title is given ###
            /^\s+(.*)/   && do {
                $au .= $1;
                $au =~ s/\,(\S)/ $1/g;
                $au .= " ";
                next;
            };
        }
    }

--
+ Gert Thijs
+
+ email: gert.thijs@esat.kuleuven.ac.be
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+
+ K.U.Leuven
+ ESAT-SISTA
+ Kasteelpark Arenberg 10
+ B-3001 Leuven-Heverlee
+ Belgium
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70

From birney@ebi.ac.uk Wed Jan 10 13:35:46 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 10 Jan 2001 13:35:46 +0000 (GMT)
Subject: [Bioperl-l] Re: Bio::SearchDist, Bio::Ext::Align
In-Reply-To: <3A5C3194.5E71E318@gmx.net>
Message-ID:

On Wed, 10 Jan 2001, Hilmar Lapp wrote:

> I thought for completeness I install the Bioperl XS modules in
> Bio::Ext::*, and downloaded bioperl-ext-0.6.tar.gz, which is
> advertised as the latest version.
>
> Installation went fine, but now the t/SearchDist.t tests get
> executed. This revealed a couple of bugs in Bio::SearchDist, some
> of which are related to the RootI transition. Others consist of
> calling functions which are simply not present by that name in the
> extension module. I tried to fix them all, but now there is a
> complaint about a missing parameter in fit_EVD (expects two, but
> gets only 1 hardcoded parameter), which I don't know how to fix.

This is my bug to fix. I will look at it. I don't think anyone has used
SearchDist before, including me. Doh!

>
> Does anyone use this module currently (and if so, why does it work
> for you?)? Did I grab the wrong version?
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From birney@ebi.ac.uk Wed Jan 10 13:46:01 2001
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 10 Jan 2001 13:46:01 +0000 (GMT)
Subject: [Bioperl-l] three letter codes for amino acids?
In-Reply-To: <3A5C550D.54F02E78@ebi.ac.uk>
Message-ID:

On Wed, 10 Jan 2001, Heikki Lehvaslaiho wrote:

>
>
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
>
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
>
> =head2 seq3
>
> Title : seq3
> Usage : $string = $obj->seq3()
> Function: Read only method that returns the amino acid sequence
> as a string of three letter codes. moltype has to be
> 'protein'. Output follows the IUPAC standard plus
> 'Ter' for terminator.
> Returns : A scalar
> Args : character used for stop, optional, defaults to '*'
> character used for unknown, optional, defaults to 'X'
>
> =cut
>
> Any opinions?

Do you really want this? I guess so.

There could be an argument to make a SeqUtils class and move this sort of
function in there, allowing us to mess less with objects/interfaces. It
would be

   $seq3 = Bio::SeqUtils->seq3($seq);

>
> -Heikki
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/ http://www.ebi.ac.uk/mutations/
> _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> _/ _/ _/ Cambs. CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From hlapp@gmx.net Wed Jan 10 19:10:51 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 10 Jan 2001 11:10:51 -0800
Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
References: <3A5C8154.5E2EBD56@esat.kuleuven.ac.be>
Message-ID: <3A5CB3BB.3B191AF8@gmx.net>

Submitted to the Bioperl bug-tracker.

(BTW whenever you feel quite sure that your complaint addresses a bug, you
can submit it directly to bioperl-bugs@bio.perl.org. If you don't feel
sure, you can still do so. The bug-tracking system is the best way of
keeping track of such things.)

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From lorrie@oreilly.com Wed Jan 10 19:31:27 2001
From: lorrie@oreilly.com (Lorrie LeJeune)
Date: Wed, 10 Jan 2001 14:31:27 -0500
Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
In-Reply-To: <200101091033.FAA28668@pw600a.bioperl.org>
Message-ID: <4.2.0.58.20010110142201.022461f0@pop3.east.ora.com>

At 05:33 AM 1/9/2001 -0500, Peter Schattner wrote:
>I have committed an initial draft of an introductory bioperl tutorial
>(called "bptutorial.pl") to the bioperl-live (main) repository.

Peter (and fellow BioPerlers):

I think the tutorial is a great idea. BioPerl needs good documentation in
a big way, and I promised Ewan at BOSC that I'd be willing to volunteer
some time to the cause.
So I'd be happy to sign on as your editor and help you get it ship-shape.
I'm also a beginning Perl programmer, so I'm sure it'll help me learn more
about both the language and BioPerl.

I'm in the process of finishing up O'Reilly's first bioinformatics book:
Developing Bioinformatics Computer Skills. I'd like to put a pointer to the
tutorial in the book, but the URL is way too long. D'ya think we might
convince the webmaster to give it a shorter link that's suitable for
publication?

Cheers,

--Lorrie

------------------------------------------------------
Lorrie LeJeune
Editor, Web Technologies and Bioinformatics
O'Reilly & Associates
90 Sherman Street, Cambridge, MA 02140
Tel: 617-499-7472; FAX: 617-661-1116
www.oreilly.com
------------------------------------------------------

From hlapp@gmx.net Wed Jan 10 19:35:09 2001
From: hlapp@gmx.net (Hilmar Lapp)
Date: Wed, 10 Jan 2001 11:35:09 -0800
Subject: [Bioperl-l] three letter codes for amino acids?
References: <3A5C550D.54F02E78@ebi.ac.uk>
Message-ID: <3A5CB96D.BE224F4@gmx.net>

Heikki Lehvaslaiho wrote:
>
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
>
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
>
> =head2 seq3
>
> Title : seq3
> Usage : $string = $obj->seq3()
> Function: Read only method that returns the amino acid sequence
> as a string of three letter codes. moltype has to be
> 'protein'. Output follows the IUPAC standard plus
> 'Ter' for terminator.
> Returns : A scalar
> Args : character used for stop, optional, defaults to '*'
> character used for unknown, optional, defaults to 'X'
>
> =cut
>
> Any opinions?
>

Considering sequence atoms as symbols seems the most natural concept to me.
Having single letters representing each symbol makes symbol arrays and
strings more or less equivalent in Perl. This might not hold for
multi-letter representations, so in the first place I'd expect an array to
be returned. However, this is inconsistent with $seq->seq(), and reportedly
inefficient due to Perl's array implementation.

I know you could still split at every 3rd letter as a simple way to get an
array. I'd nevertheless accept a third optional parameter denoting the
'join' character, with a default of ''.

Just my few thoughts.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From paul-christophe.varoutas@curie.fr Thu Jan 11 00:48:13 2001
From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas)
Date: Thu, 11 Jan 2001 01:48:13 +0100
Subject: [Bioperl-l] Emerging from obscurity
Message-ID: <5.0.2.1.2.20010110212839.00aaf3f0@pop.wanadoo.fr>

Hi everybody,

I am writing because I would like to start contributing to the bioperl
project. But first of all let me introduce myself: I am finishing my PhD at
the Curie Institute in Paris, France. My subject is molecular genetics in
yeast, and more particularly the study of the initiation of meiotic
recombination.

Apart from molecular genetics, I have a rather strong background in
algorithmics and programming, which I developed by studying alone and
interacting a lot with people studying computer science. One of my
favorite fields is OOP: I have read a lot of books on OOP design and have
experience in designing projects using UML and implementing them in C++.
I started using C++ in 1992 and ever since I have implemented lots of
sexually attractive object classes, such as various types of neural
networks (backpropagation nets, BAMs (bidirectional associative memories)
and FAMs (like BAMs but integrating fuzzy logic)), various cryptographic
algorithms, and a basic collection of bioinformatics objects (that was
before I discovered bioperl ;-) ). I then used these to develop some small
applications, the coolest one probably being a program doing ORF
prediction using Fourier transforms.

As for Perl, I started learning it in 1995 (that was the year of the 5.001
release). Slowly but steadily it has become my favorite language. I use it
extensively to do virtually *everything*, including solving my small
everyday problems, such as doing file management, automating various
internet activities (from more low-level operations using the IO::Socket::
modules to FTP/telnet sessions and web stuff), and automating the very few
biocomputing needs I've had for my PhD project. I also use perl for CGI
scripting. I am fascinated by the power of regular expressions (I am
reading "Mastering Regular Expressions" for the second time, and I am even
more fascinated than the first time I read it; I'm still discovering
astonishing details and realizing there is so much to learn about them),
and try to use them whenever/wherever I can B-) .

I discovered the bioperl project two years ago and I have been following
the discussion groups with great interest for almost a year. Many times I
wanted to just jump into the discussions, but I didn't because I knew I
would have no time to deal with it on top of my other activities.

So, after this rather long introduction, here is the subject of my mail:
like all of you, I want to participate in making bioperl better. As I
mentioned above I am finishing my PhD, so I don't have much time for the
moment. But I will have finished the experimental part of my PhD by the
end of January, so I will have some time to spare.
I will probably spend large amounts of time in front of a macintosh writing
my article and PhD. I *hate* macs (my favorite mac software is telnet for
logging in to our unix servers or to my home PC), and participating in the
bioperl project will keep me from going insane :-).

I was thinking about participating in the discussions about the OOP design
of bioperl, or in the biocorba interoperability project, but for the moment
I would prefer starting with something more "smooth"; after all I am not
(yet) familiar with all bioperl modules. So doing something that can get me
more familiar with the whole set of the bioperl modules should be a good
start. I figured out that I can help with some aspects of bioperl that can
contribute to the enlargement of the bioperl community. So, here is what I
propose to do:

- Help figure out bioperl 0.7 cross-platform compatibility with the MacOS
platform. Almost all French labs use macintosh computers, and our lab has a
lot of mac boxes with various types of processors and various versions of
MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already
pointed out that there are a lot of compatibility problems; their posts are
going to be my starting point. I would like to make a list of all problems,
figure out which ones can be solved reasonably easily, and make at least a
subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is
what most Mac people use, and most probably will continue using.

- Contribute to Shelly's effort for compatibility with the Windows NT/2000
platform.

- Participate in the documentation project of bioperl. I know that there
are already people working on various aspects of the documentation, so I
would like Ewan / Hilmar to tell me which they prefer: participate in one
of the ongoing projects or initiate another project to do something that
is missing.

I am very glad to contribute to the bioperl group; you are doing some
exceptionally good work out there.
(For those who are reading this line, thank you for reading this far :-) ).

Paul-Christophe

--------------------------------------
Paul-Christophe Varoutas
Institut Curie - Section de Recherche - UMR144
Laboratoire de Genetique Moleculaire de la Recombinaison
26, rue d'Ulm
75248 Paris cedex 05
FRANCE
Tel: 01.42.34.66.36
Fax: 01.42.34.66.44
----------------------------------------

From jason@chg.mc.duke.edu Thu Jan 11 01:33:12 2001
From: jason@chg.mc.duke.edu (Jason Stajich)
Date: Wed, 10 Jan 2001 20:33:12 -0500 (EST)
Subject: [Bioperl-l] Emerging from obscurity
In-Reply-To: <5.0.2.1.2.20010110212839.00aaf3f0@pop.wanadoo.fr>
Message-ID:

Paul -

We are very happy to have you aboard the project. We are glad to add your
skills to the team and I look forward to you getting acquainted with the
modules and helping us in the design (and redesign) of many of the objects.

The tasks you outline below sound like a very good starting point and are
sorely needed, as many of the developers are only plugged into UNIX on a
regular basis.

The documentation will be a good starting point too, but I strongly
suggest you try and use the bioperl modules to solve a task you have in
your lab. (I guess tackling Mac portability will give you this experience,
but really try and use it to manipulate some of your yeast data.) This
will give you the chance both to get used to the modules and to help write
documentation for people who are new to bioperl. The developers who are
familiar with the code too often skip over the important details when
writing docs. If you are unsure of how to use a module, feel free to use
the list for questions; I know there are many more people who are looking
for ways to get comfortable with the modules.

I'd also like to see us consider moving some of the documentation/tutorials
to the wiki web site to make it easier for more people to contribute.
Perhaps some 'scenario writing' which describes a problem and how bioperl
was used to solve it.
Again, welcome aboard and we look forward to your contributions.

Jason

On Thu, 11 Jan 2001, Paul-Christophe Varoutas wrote:

[snip]
>
>
> - Help figuring out bioperl 0.7 cross-platform compatibility with the MacOS
> platform. Almost all french labs use macintosh computers, and our lab has a
> lot of mac boxes with various types of processors and various versions of
> MacOS (from 7.5.3 to 9.0). Todd Richmond and Mark Colosimo have already
> pointed out that there are a lot of compatibility problems, their posts are
> going to be my starting point. I would like to make a list of all problems,
> figure out which ones can be solved reasonably easily, and make at least a
> subset of bioperl work on MacOS "Classic" (non-MacOS X) platforms, which is
> what most Mac people use, and most probably will continue using.
>
> - Contribute to Shelly's effort for compatibility with the Windows NT/2000
> platform.
>
> - Participate in the documentation project of bioperl. I know that there
> are already people working on various aspects of the documentation, so I
> would like Ewan / Hilmar to tell me what you prefer: participate in one of
> the ongoing projects or initiate another project to do something that is
> missing.
>
> I am very glad to contribute to the bioperl group, you are doing some
> exceptionally good work out there.
>
> (For those who are reading this line, thank you for reaching so far :-) ).
>
>
> Paul-Christophe
>
>
> --------------------------------------
> Paul-Christophe Varoutas
> Institut Curie - Section de Recherche - UMR144
> Laboratoire de Genetique Moleculaire de la Recombinaison
> 26, rue d'Ulm
> 75248 Paris cedex 05
> FRANCE
> Tel: 01.42.34.66.36
> Fax: 01.42.34.66.44
> ----------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From schattner@alum.mit.edu Thu Jan 11 09:04:19 2001
From: schattner@alum.mit.edu (Peter Schattner)
Date: Thu, 11 Jan 2001 01:04:19 -0800
Subject: [Bioperl-l] Re: Initial draft of bioperl tutorial committed
References: <4.2.0.58.20010110142201.022461f0@pop3.east.ora.com>
Message-ID: <3A5D7712.1A844FA9@alum.mit.edu>

Lorrie LeJeune wrote:
>
> Peter Schattner wrote:
>
> >I have committed an initial draft of an introductory bioperl tutorial
> >(called "bptutorial.pl") to the bioperl-live (main) repository.
> So I'd be happy to sign on as your editor and help you
> get it ship-shape.

Thanks for your offer. I look forward to getting your feedback and
recommendations regarding the tutorial.

Peter

From heikki@ebi.ac.uk Thu Jan 11 10:09:48 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu, 11 Jan 2001 10:09:48 +0000
Subject: [Bioperl-l] three letter codes for amino acids?
References:
Message-ID: <3A5D866C.C41B98E2@ebi.ac.uk>

Dear Adrian,

I guess I was not too clear here. I'll post the reply to the list as others
might have misunderstood, too.

The translate method in PrimarySeqI defaults to '*' and 'X' for stop and
any (unknown) residue in its output, but there are arguments to the method
that allow you to change them. As a result, a protein sequence object can
contain other characters for stop and unknown in the one letter code
stored in the object.
The same arguments are needed in the seq3 method so that the corresponding
three letter codes are always 'Ter' and 'Xaa' (IUPAC standard).

-Heikki

Adrian Goldman wrote:
>
> Heikki,
>
> I am not very good at listserv etiquette. Anyway, here is my 2c.. if you want to post it further on to the list server, it's OK by me. Or else you can just ignore what follows as my own personal opinion.
>
> I don't think it makes much sense to use * as the default character for stop in 3-letter codes, nor X as the default for unknown, for the optional arguments you mention below. Ter (as you propose) for the termination codon and ?XXX for unknown make more sense to me.
>
> Adrian
>
> At 12:03 pm -0500 10/1/2001, bioperl-l-request@bioperl.org wrote:
>
> Message: 5
> Date: Wed, 10 Jan 2001 12:26:53 +0000
> From: Heikki Lehvaslaiho
> Organization: EMBL - EBI
> To: bioperl-l
> Subject: [Bioperl-l] three letter codes for amino acids?
>
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
>
> I propose to put method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
>
> =head2 seq3
>
> Title : seq3
> Usage : $string = $obj->seq3()
> Function: Read only method that returns the amino acid sequence
> as a string of three letter codes. moltype has to be
> 'protein'. Output follows the IUPAC standard plus
> 'Ter' for terminator.
> Returns : A scalar
> Args : character used for stop, optional, defaults to '*'
> character used for unknown, optional, defaults to 'X'
>
> =cut
>
> Any opinions?
>
> -Heikki
>
> --
>
> Professor Adrian Goldman, | Phone: 358-(0)9-191 58923
> Structural Biology Group, | FAX: 358-(0)9-191 58952
> Institute of Biotechnology | Sec: 358-(0)9-191 58921
> University of Helsinki, | Mobile: 358-(0)50-336 8960
> PL 56 | Home: 358-(0)9-728 7103
> 00014 Helsinki | email: Adrian.Goldman@Helsinki.fi
>
> -- on sabbatical at Brookhaven National labs, June 2000-June 2001
> Adrian Goldman, Biology Department, Building 463
> 50 Bell Ave., Brookhaven National Lab., Upton NY 11973.
> Phone: 631-344-2671 (off) 631-344-3417 (lab), 631-344-3407 (FAX).
> email: agoldman@bnl.gov

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From heikki@ebi.ac.uk Thu Jan 11 12:00:26 2001
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu, 11 Jan 2001 12:00:26 +0000
Subject: [Bioperl-l] three letter codes for amino acids?
References: <3A5C550D.54F02E78@ebi.ac.uk> <3A5CB96D.BE224F4@gmx.net>
Message-ID: <3A5DA05A.64D8D872@ebi.ac.uk>

> Considering sequence atoms as symbols seems the most natural
> concept to me. Having single letters representing each symbol
> makes symbol arrays and strings more or less equivalent in Perl.
> This might not hold for multi-letter representations, so in the
> first place I'd expect an array to be returned. However, this is
> inconsistent with $seq->seq(), and reportedly inefficient due to
> Perl's array implementation.
>
> I know you could still split at every 3rd letter as a simple way
> to get an array. I'd nevertheless accept a third optional
> parameter denoting the 'join' character, with a default of ''.

Can be done.
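Here is a rough illustration of the seq3 proposal as it stands at this point in the thread. It takes optional characters for stop and unknown in the one-letter input, maps them to 'Ter' and 'Xaa', and accepts Hilmar's optional join string (default ''). It is a sketch only. `seq3_from_string` and `split_seq3` are hypothetical names, and the function works on a plain string rather than a bioperl sequence object. It is not the method that was eventually committed.

```perl
use strict;
use warnings;
use Carp qw(croak);

# One-letter to three-letter amino acid codes (IUPAC), including the
# ambiguity codes B (Asx) and Z (Glx).
my %THREE = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
    B => 'Asx', Z => 'Glx',
);

# Hypothetical standalone version of the proposed seq3 method.
#   $seq     : protein sequence in one-letter code
#   $stop    : character used for stop in $seq (default '*'), output 'Ter'
#   $unknown : character used for unknown in $seq (default 'X'), output 'Xaa'
#   $join    : string placed between the codes (default '')
sub seq3_from_string {
    my ($seq, $stop, $unknown, $join) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    $join    = ''  unless defined $join;

    my @codes;
    for my $aa (split //, $seq) {
        if    ($aa eq $stop)           { push @codes, 'Ter' }
        elsif ($aa eq $unknown)        { push @codes, 'Xaa' }
        elsif (exists $THREE{uc $aa})  { push @codes, $THREE{uc $aa} }
        else  { croak "Unrecognized amino acid character '$aa'" }
    }
    return join $join, @codes;
}

# Hilmar's point: an unjoined result can be cut back into a list of codes
# every third character (call in list context).
sub split_seq3 {
    my ($seq3) = @_;
    return $seq3 =~ /(...)/g;
}
```

Keeping the default join string empty makes the output parallel to the one-letter seq() string. Passing ' ' or '-' gives output that is easier to line up under codons, which is the display use Heikki describes.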
In my mind the main use of this function is in displaying translations on
top of nucleotide sequences. Gaps inside codons are clearer with the three
letter coding.

-Heikki

--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From dalke@acm.org Thu Jan 11 12:32:21 2001
From: dalke@acm.org (Andrew Dalke)
Date: Thu, 11 Jan 2001 05:32:21 -0700
Subject: [Bioperl-l] looking for datafile parsers
Message-ID: <00f501c07bca$8cfd61c0$f3ab323f@josiah>

Hello,

I'm working on a parser generator as part of the Biopython development.
It's getting towards completion, which means it's time to start writing
papers about it. :) Indeed, my paper was accepted for a talk at the
upcoming Python conference. One of the reviewers wanted more information
comparing my work to others in the field, so I've been digging up related
projects. I figure on writing another paper for Bioinformatics which will
include some more of this information.

The most similar program is SRS, which is also a parser generator,
although they are context free while my parser is (mostly) regular. I
tried to get a copy of the reference paper (from Meth.Enzy.) from the
library but it was checked out. I would love it if someone would offer to
answer a few questions for me about it, and to run some benchmarks to see
how fast it parses swissprot38, say, as compared to how long it takes the
bioperl code to parse the same file. Any takers?

There are a few projects which allow users to specify a format using a
configuration description which can roughly be classified as a regular
expression pattern matcher sitting on top of a line type recognizer.
This includes Biopy and BioDB-Loader as well as the current Biopython
parser. Another class of projects, such as bioperl and SeqIO, uses a
common data structure and then implements readers/writers for the
different formats, at the expense of throwing away some data.
Swissknife is an example of a library which reads/writes a single
format into a data structure tailored specifically to that format. A
few are special-case programs (grep, NiceProt, sp2fasta) which do one
and only one thing, although in the case of sw2xml that one thing
converts the format (SWISS-PROT) to another format (XML) for which many
tools are readily available.

  Most of the packages throw away formatting information and only store
the physical data, although get-sprot-entry is a nice example of why
keeping presentation information is useful. The program creates an
HTML page which looks the same as the original format except that
various fields are marked up with hyperlinks.

  Finally, the project I've been working on, Martel, lets you develop
parsers which handle most, if not all, of these cases.

  I want to make sure I've covered everything, so I've been searching
for SWISS-PROT parsers as my prototypical example. A description of
what I found is below. If something major is missing, please tell me.
If you can provide assistance with the SRS, GCG, Java or Lisp parts,
also please tell me.

Here's a key to some of the notation I use in the listings below:

  count    == count the number of records in a database
  offset   == generate offsets into the file for fast indexing
              (listed as "index" below)
  fasta    == extract data for FASTA (ID, AC and SQ fields)
  generic  == extract generic sequence data, usually as a data structure
              containing fields common to multiple formats but ignoring
              some SWISS-PROT specific fields
  all      == extract all fields
  validate == validate that a record is in the correct format
  markup   == identifies fields and saves the layout data so as to allow
              HTML markup without otherwise changing the format

(timings not given for markup since it will depend on the specific
markup requested, and because only Martel and get-sprot-entry preserve
markup)

Performance is measured against the 80,000 records of swissprot38.

grep - http://www.gnu.org/gnulist/production/grep.html
  written in C
  count (when used as "grep ^ID | wc")
    takes 0m:57s to parse sprot38
  offset (when used as "grep -b ^ID")
  cannot be used for fasta, generic, all, validate, markup

one really large regular expression (here as a bit of humor)
  written in C
  cannot be used for count, offset, fasta, generic, all, markup
  can be used for validate in theory, but I haven't tested it

bioperl - http://www.bioperl.org/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    takes 30m:13s to parse sprot38
  cannot be used for index (?), all, validate, markup

biopython - http://www.biopython.org/
  written in Python
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    takes 28m:55s to parse sprot38
  validate
  cannot be used for index(?), markup

biojava - http://www.biojava.org/
  written in Java
  unknown (have source but need to figure it out)
  performance unknown (don't know how to code in Java)

Martel - http://www.biopython.org/~dalke/Martel/
  written in Python with a C extension
  count
    RecordReader.StartsWith "ID"
    takes 1m28s to parse sprot38
  index
  fasta
    (standard format def. but only using the ID and SQ tags)
    takes 9m:23s to parse sprot38
  generic (as a special case of all)
  all
    takes 23m:29s to parse sprot38
  validate
    with no callbacks takes 6m:41s
  markup

SRS - http://www.lionbio.co.uk/
  written in C (?)
  have never used it, but it can definitely do count, fasta, generic
    and all.
  The standard swissprot format definition
    http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-id+01FXMii+-lib+SWISSPROT
    cannot be used to validate, although SRS itself can.
  I think SRS can be used to generate HTML markup but I can't begin to
    guess how that might be done.
  *** I really want to ask someone questions about SRS ***
  ***                  Any takers?                      ***
  I don't think it can be used to create your own indices - you must
    use its offset tables.

swissknife - ftp://ftp.ebi.ac.uk/pub/software/swissprot/
  written in Perl
  count
    lazy reader takes 1m:48s to parse sprot38
  fasta (getting the ->ID and ->SQ attributes)
    takes 8m:47s to parse sprot38
  generic (as a special case of all)
  all
    takes 38m:21s to parse sprot38
  cannot be used to validate, markup

Biopy - http://shag.embl-heidelberg.de:8000/Biopy/
  written in Python
  count (as a special case of all)
  index (by "position += length($_)")
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields (it only
    identifies lines) so I actually wouldn't count this as a full parser.
    * takes roughly 25m to parse
  cannot be used to validate, markup

Darwin - http://cbrg.inf.ethz.ch/Darwinshome.html
  is its own language and set of libraries
  contains a converter from SWISS-PROT to its own format. I don't have
    access to the source code, so the following is based on the example
    parser at http://www.inf.ethz.ch/personal/hallett/drive/node92.html
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all - requires additional programming to parse the subfields,
    although the real implementation may contain all of that.
  The given example cannot be used to index, validate, markup
  (Why does http://www.inf.ethz.ch/personal/hallett/drive/drive.html say
  that SWISS-PROT 38 has only 77,977 records when my copy has exactly
  80,000?)

SeqIO - http://www.cs.ucdavis.edu/~gusfield/seqio.tar.gz
  written in C
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used to index, all, validate, markup

readseq (C) - http://iubio.bio.indiana.edu/soft/molbio/readseq/version1/readseq.shar
  written in C
  doesn't have swissprot, and I need to test if embl works instead
  to be tested

readseq (Java) - http://iubio.bio.indiana.edu/soft/molbio/readseq/java/readseq-source.zip
  written in Java
  have not yet explored (see above where I need help on how to write a
  good test program in Java.)

Boulder - http://stein.cshl.org/software/boulder/
  written in Perl
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic
    have not yet benchmarked
  cannot be used for index, all, validate, markup

molbio++ - ftp://ftp.ebi.ac.uk/pub/software/unix/molbio.tar.Z
  written in (now obsolete) C++ which doesn't compile
  I think it can be classified as
  count (as a special case of generic)
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get at subfields
    of a data line
    * will not be benchmarking since I don't want to spend the effort to
      get it to compile.
  cannot be used for index, all, validate, markup

BioDB-Loader - http://www.franz.com/services/conferences_seminars/ismb2000/biodb1.tar.Z
  written in Common Lisp (Help! I know even less lisp than Java!)
  I'm guessing it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic, although it calls for some extra parsing to get at the
    subfields of a data line
    * have not benchmarked, although I have downloaded the Allegro
      Common Lisp demo version.
  cannot be used for all, validate, markup

GCG - http://www.gcg.com/products/wis-package.html
  written in C (?)
  never used it. Betting it can be classified as
  count (as a special case of generic)
  index
  fasta (as a special case of generic)
  generic
    have not benchmarked since I'm not spending that much money just
    to test the performance.
  cannot be used for all, validate, markup

sp2fasta - part of ftp://ftp.ncbi.nlm.nih.gov/toolkit/ ?
  Can't seem to find it in the current distribution. Various web pages
  imply it is a C program to convert SWISS-PROT/EMBL to FASTA.
  count (if used together with grep and wc)
  fasta
    have not benchmarked since I cannot find code
  cannot be used for index, generic, all, validate, markup

sw2xml - http://www.vsms.nottingham.ac.uk/biodom/software/protsuite-user-dist/sw2xml-protbot.pl
  written in Perl. It is a translation program from SWISS-PROT to XML,
  so some additional, though minor, XML coding is needed to do the
  following.
  count (as a special case of all)
  fasta (as a special case of all)
  generic (as a special case of all)
  all
    have not yet benchmarked
  cannot be used to index, validate, markup (because of the 'tidy')

NiceProt - used at ExPASy
  implementation information not available
  only used to parse a single record
  parses the data file but doesn't build a data structure (?), so
  creation of fasta, generic and all require some modifications.
  cannot be used to count, index, validate(?), markup

get-sprot-entry - used at ExPASy
  implementation not available
  can be used to markup a record
    (eg, see http://expasy.cbr.nrc.ca/cgi-bin/get-sprot-entry?P52930 )
  doesn't build data structures or convert to another format, so it
  cannot be used for anything else (true?)

Whew! I'd be surprised if I really did miss some other major style of
parsing. Actually, I did - there are no lex/yacc grammars for
SWISS-PROT, but I'm not surprised, because the lexing is strongly
position dependent, which calls for tight, explicit, tricky
communications with the parser.
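The offset/index task in the listing above (grep -b ^ID, or Biopy's position += length($_)) can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not code from any of the packages listed: index_offsets and fetch_entry are hypothetical names, and each entry is assumed to start with an "ID   NAME" line and end with a "//" line.

```perl
use strict;
use warnings;

# One pass over a SWISS-PROT flat file, recording the offset at which
# each entry starts, keyed by entry name. Assumes entries begin with
# "ID   NAME ..." lines. length() counts characters, so the file should
# be opened in raw mode for these to be byte offsets.
sub index_offsets {
    my ($fh) = @_;
    my %offset;
    my $pos = 0;
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^ID   (\S+)/;
        $pos += length $line;    # same idea as position += length($_)
    }
    return \%offset;
}

# Random access: seek to a stored offset and read up to the "//" line.
sub fetch_entry {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $entry = '';
    while (my $line = <$fh>) {
        $entry .= $line;
        last if $line =~ m{^//};
    }
    return $entry;
}
```

Opening the data file with open(my $fh, '<:raw', $path) keeps length() counting bytes, so the stored offsets are valid arguments to seek(). The index itself could then be kept in a DBM file so that it is built only once.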

Any other suggestions?

                    Sincerely,

                    Andrew Dalke
                    dalke@acm.org


From simon.brocklehurst@CambridgeAntibody.com Thu Jan 11 13:59:22 2001 From: simon.brocklehurst@CambridgeAntibody.com (Simon Brocklehurst) Date: Thu, 11 Jan 2001 13:59:22 +0000 Subject: [Bioperl-l] Re: [Biojava-l] looking for datafile parsers References: <00f501c07bca$8cfd61c0$f3ab323f@josiah> Message-ID: <3A5DBC3A.7EBC10AA@CambridgeAntibody.com>

Hi Andrew,

You might be interested to know that CAT has contributed to biojava a
SAX2-compliant, event-based parsing framework for dealing with
bioinformatics data files.

Essentially, by using a SAX2 model, the framework allows users to build
arbitrary XML Content Handlers for dealing with data from
bioinformatics files in arbitrary ways.

The framework generates SAX2 events from bioinformatics format files,
i.e. the input data isn't XML, nor is it converted into XML internally.

It's a reasonable implementation of SAX2, e.g. users can:

o Set properties on SAX parsers, e.g. configuration of various features,
  namespace reporting etc.

o Handle arbitrarily large files, because it works like a SAXParser
  should, i.e. it doesn't keep the whole file in memory etc.

o Deal with InputSources, i.e. essentially various flavours of streams.

A couple of neat benefits of an implementation of SAX2:

o It's trivial to create XML versions of files, with which you can then
  do whatever you want, e.g. using XSLT.

o By stringing together biojava SAXParsers, which are non-validating,
  with validating SAXParsers from e.g. IBM, you can create parsers that
  validate against DTDs and/or XML Schemas that we produce for the data
  formats supported by the framework. Because the bioinformatics data
  is modelled in a strongly typed way by the framework, you can get
  genuinely useful benefits from validation.

We haven't put SwissProt support into this framework as of yet -
biojava already had ways of handling SwissProt data before we put the
SAX2 framework in.
Currently we have OK support in there for NCBI Blast and WU-Blast, and
improving support for HMMER and PDB data.

Hope this info is useful...

Simon
-- 
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com


From ajm6q@virginia.edu Thu Jan 11 13:52:32 2001 From: ajm6q@virginia.edu (Aaron J Mackey) Date: Thu, 11 Jan 2001 08:52:32 -0500 (EST) Subject: [Bioperl-l] looking for datafile parsers In-Reply-To: <00f501c07bca$8cfd61c0$f3ab323f@josiah> Message-ID:

On Thu, 11 Jan 2001, Andrew Dalke wrote:

> Finally, the project I've been working on, Martel,
> lets you develop parsers which handle most, if not all, of
> these cases.

Excellent, I look forward to seeing your work. Parsing is the meat and
potatoes of bioinformatics, and it's beginning to taste very stale (I
dunno, maybe it's been stale for a while now).

My own secret wish list is focused more on result file parsing; I once
spent a fair amount of time building a "robust" FASTA result file
parser, but found myself constantly needing to tweak it to keep up with
fasta development changes. You don't have that problem with SwissProt
or other static file formats.

> grep - http://www.gnu.org/gnulist/production/grep.html
>   written in C
>   count (when used as "grep ^ID | wc")
>     takes 0m:57s to parse sprot38
>   offset (when used as "grep -b ^ID")
>   cannot be used for fasta, generic, all, validate, markup

I've actually found that I now use grep and a small mix of perl more
than any other parsing routine (mainly because of the predicament I
mention above: when a format changes, I have to fix the entire parser,
even if I just want to pull out a few relevant fields at the moment).
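The grep-plus-Perl style described here can be sketched as follows. Perl's range (flip-flop) operator already gives the start-regexp-to-stop-regexp context that the next paragraph wishes grep had: the one-liner perl -ne 'print if /START/ .. /STOP/' file prints from each START match through the following STOP match. The helper below, between, is a hypothetical name; it tracks the state explicitly, because a flip-flop inside a reusable sub keeps its state from one call to the next.

```perl
use strict;
use warnings;

# Return the lines from each match of $start through the next match of
# $stop, inclusive: like grep -A, but ending at a regexp instead of
# after N lines. Same semantics as the two-dot flip-flop, including a
# one-line range when a single line matches both patterns.
sub between {
    my ($start, $stop, @lines) = @_;
    my $inside = 0;
    my @out;
    for my $line (@lines) {
        $inside = 1 if !$inside && $line =~ $start;
        next unless $inside;
        push @out, $line;
        $inside = 0 if $line =~ $stop;
    }
    return @out;
}
```

Something like between(qr/^>>/, qr/^$/, <$fh>) would then pull out each block that starts with a ">>" line and runs up to the next blank line; the patterns are placeholders to be adapted to the result format at hand.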

My result file "parsers" often take a few 'grep swipes' at the file
(since the second grep on the same file is commonly much faster than
the first), and as you show, it's very fast to begin with. The one
extension to grep that I'd dearly like to see (perhaps I'll submit a
patch) would be to extend the -A and -B (after-context and
before-context) flags to take regexps themselves (i.e. instead of
printing N lines after the first match, continue printing until a
second regexp is matched, or other possibilities depending on specified
flags). Then you could start using (multiple) greps to satisfy
'fasta', 'generic' and 'all'.

Use the shell, Luke.

-Aaron

-- 
Aaron J Mackey
Dr. Pearson Laboratory
University of Virginia
(804) 924-2821
amackey@virginia.edu


From insana@ebi.ac.uk Thu Jan 11 18:15:04 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Thu, 11 Jan 2001 18:15:04 +0000 (GMT) Subject: [Bioperl-l] make tests In-Reply-To: Message-ID:

> Why don't you trap the warning in an eval/$SIG{__WARN__} - I don't see why
> you can't test for proper warnings, if that's what you were trying to do.

I didn't know that. Now that I understand what you meant and have read
in the manual how to apply it, I see it's the perfect solution.

Thank you very much

Joseph


From birney@ebi.ac.uk Fri Jan 12 22:10:36 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 12 Jan 2001 22:10:36 +0000 (GMT) Subject: [Bioperl-l] RootI detachment proposal. Message-ID:

[Ewan recovers from rereading the Bio::Root:: stuff...]

This is *mainly* for Jason and Hilmar, but in case there are other
people who want to chip in:

I want to completely detach RootI from the other Root::Objects (in
particular Err). This means a heavy refactoring of RootI - mainly in
removing the code.

I will keep ->throw and ->warn but not ->verbose as a real method. (jason
- do you mind this?)
(I will have a "deprecation warning" on verbose)

I am planning to do this on my local copy now and see how it pans out...

Bio::Root::Object in its full glory will still be there for modules we
have not migrated to Bio::Root::RootI

thoughts anyone?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From lapp@gnf.org Fri Jan 12 22:32:11 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Fri, 12 Jan 2001 14:32:11 -0800 Subject: [Bioperl-l] RootI detachment proposal. References: Message-ID: <3A5F85EB.8BF76127@gnf.org>

Ewan Birney wrote:
> 
> [Ewan recovers from rereading the Bio::Root:: stuff...]
> 
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
> 
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in
> removing the code.
> 
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)
> 
> I am planning to do this on my local copy now and see how it pans out...
> 
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
> 
> thoughts anyone?
> 

As far as I have seen in some code and code migrations from Jason,
verbose() is being used heavily. I do think that it is beneficial and
desirable to have a central mechanism for regulating 'verbosity' (e.g.,
what happens upon a warning being issued). I also don't see yet why
having verbose() in RootI hampers disentangling RootI from the other
objects, or where it would interfere. (People who don't want that
feature can simply override it with a stub.)

Maybe I'm missing something. Ideally, I won't have to come up with a
SeqIO-specific mechanism for client-side regulation of the severity of
warnings.
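A minimal sketch of such a central verbosity switch, assuming the semantics floated in this thread: verbose() as a plain get/set method, warn() printing only when verbose > 0 (Jason's suggestion), and verbose(2) turning warnings into exceptions (the behaviour Hilmar later asks to have a test for). My::Root and capture_warnings are hypothetical names; this is not the Bio::Root::RootI code. capture_warnings uses the local $SIG{__WARN__} approach from the "make tests" message above, which is also how such behaviour can be tested.

```perl
use strict;
use warnings;

package My::Root;    # hypothetical stand-in, not Bio::Root::RootI

sub new {
    my ($class, %args) = @_;
    return bless { verbose => 0, %args }, $class;
}

# Plain get/set accessor for the verbosity level.
sub verbose {
    my $self = shift;
    $self->{verbose} = shift if @_;
    return $self->{verbose};
}

sub throw {
    my ($self, $msg) = @_;
    die "EXCEPTION: $msg\n";
}

# verbose >= 2: promote the warning to an exception;
# verbose  > 0: print it; otherwise stay silent.
sub warn {
    my ($self, $msg) = @_;
    my $level = $self->verbose;
    $self->throw($msg) if $level >= 2;
    CORE::warn("WARNING: $msg\n") if $level > 0;
    return;
}

package main;

# Collect warnings raised while running $code instead of printing them;
# the local() restores the previous handler when the sub returns.
sub capture_warnings {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    $code->();
    return @caught;
}
```

With this, capture_warnings(sub { $obj->warn('x') }) returns no warnings at verbose 0 and one at verbose 1, while at verbose 2 the call dies and eval { ... } leaves the message in $@. Individual objects can carry their own level, which is what makes per-object debugging flags possible.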

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                        email: lapp@gnf.org
GNF, San Diego, Ca. 92121          phone: +1-858-812-1757
-------------------------------------------------------------


From birney@ebi.ac.uk Fri Jan 12 22:51:21 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 12 Jan 2001 22:51:21 +0000 (GMT) Subject: [Bioperl-l] RootI detachment proposal. In-Reply-To: <3A5F85EB.8BF76127@gnf.org> Message-ID:

On Fri, 12 Jan 2001, Hilmar Lapp wrote:

> 
> As far as I have seen in some code and code migrations from Jason,
> verbose() is being used heavily. I do think that it is beneficial and
> desirable to have a central mechanism for regulating 'verbosity' (e.g.,
> what happens upon a warning being issued). I also don't see yet why
> having verbose() in RootI hampers disentangling RootI from the other
> objects, or where it would interfere. (People who don't want that
> feature can simply override it with a stub.)
> 
> Maybe I'm missing something. Ideally, I won't have to come up with a
> SeqIO-specific mechanism for client-side regulation of the severity of
> warnings.

Yeah. I know. I guess I am thinking with my C-extension hat on again.

Ok. verbose stays.

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Fri Jan 12 23:43:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Fri, 12 Jan 2001 18:43:20 -0500 (EST) Subject: [Bioperl-l] RootI detachment proposal. In-Reply-To: Message-ID:

On Fri, 12 Jan 2001, Ewan Birney wrote:

> [Ewan recovers from rereading the Bio::Root:: stuff...]
> 
> This is *mainly* for Jason and Hilmar, but in case there are other
> people who want to chip in:
> 
> I want to completely detach RootI from the other Root::Objects (in
> particular Err). This means a heavy refactoring of RootI - mainly in
> removing the code.
> 
> I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> - do you mind this?) (I will have a "deprecation warning" on verbose)

well, actually verbose makes me happy because we can choose whether or not
warn will actually print out msgs. Can it just be a get/set method, and
warn can check to see if verbose > 0 before printing? I like to use it as
a debugging flag as well, so we can have object-specific debugging flags.

> 
> I am planning to do this on my local copy now and see how it pans out...
> 
> Bio::Root::Object in its full glory will still be there for modules we
> have not migrated to Bio::Root::RootI
> 
> thoughts anyone?

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Sat Jan 13 01:18:34 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sat, 13 Jan 2001 01:18:34 +0000 (GMT) Subject: [Bioperl-l] refactoring RootI Message-ID:

I have finished a very serious refactoring of RootI. This detaches
RootI from the other Root:: objects completely. verbose, I think, is
handled more nicely. I would venture to say that the code is more
readable.

I have changed the formatting of the stack trace in the throw/warn
statements somewhat. Your mileage may vary here...

Jason, Hilmar - check it out and tell me what you think.

I am now a little exhausted, although I think the final product is
vastly improved...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From dsokol@osnut.com Sat Jan 13 07:47:09 2001 From: dsokol@osnut.com (dsokol@osnut.com) Date: Sat, 13 Jan 2001 02:47:09 -0500 Subject: [Bioperl-l] Exciting New Nutraceutical Company- Promote your own ideas! Message-ID: <200101130747.CAA14560@pw600a.bioperl.org>

Design Your Own Herbal and Nutritional Supplements and Reap the Financial Benefits

bioperl-l@bioperl.org,

    It was a pleasure learning about your interests in biology from your website.  Based on your credentials, I am offering you the following opportunity, which I hope you may find worthwhile.

Thank you,

Daniel

 Have your nutraceutical ideas become reality and marketed to the general public-and perhaps even globally.

Design Your Own Herbal and Nutritional Supplements and Reap the Financial Benefits from the Quality of your own ideas!

Kava Kava, Ginseng, Echinacea, St. John's Wort...

For FREE information on these nutraceuticals, including their methods of synthesis,  you can go to http://www.osnut.com/freeinfo.htm by clicking HERE.

The explosion in the nutraceutical industry has left open the possibility for considerable profits.  New nutraceuticals and herbal formulas are being discovered, designed, and marketed every day!  If you have a background in herbs/ biology/ chemistry /nutrition and/or medicine, then OSnutraceuticals is the company for you.

Open Source Nutraceuticals, Inc. is a company committed to excellence in the nutraceutical industry by providing an open source for the creation and standardization of nutraceuticals for naturally treating all kinds of conditions. By implementing a linux-like platform for discussion and protection of your ideas, OSnutraceuticals can be the best way to have your innovations marketed to the general public and for you to reap the financial benefits from the sales.

Sign up NOW and get 2 months FREE!

For more information, visit www.osnut.com

by clicking HERE!

(Note: www.osnut.com is best viewed using Microsoft's Internet Explorer but can also be viewed with Netscape as well)

 If you feel you received this ad by mistake, please contact dsokol@osnut.com and put the word "remove" in the subject line.  You will automatically be taken off our mailing list!



From heikki@ebi.ac.uk Sat Jan 13 16:56:10 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Sat, 13 Jan 2001 16:56:10 +0000 Subject: [Bioperl-l] RootI detachment proposal. References: Message-ID: <3A6088AA.2B470FE8@ebi.ac.uk>

Jason Stajich wrote:
> 
> On Fri, 12 Jan 2001, Ewan Birney wrote:
> 
> > [Ewan recovers from rereading the Bio::Root:: stuff...]
> > 
> > This is *mainly* for Jason and Hilmar, but in case there are other
> > people who want to chip in:
> > 
> > I want to completely detach RootI from the other Root::Objects (in
> > particular Err). This means a heavy refactoring of RootI - mainly in
> > removing the code.
> > 
> > I will keep ->throw and ->warn but not ->verbose as a real method. (jason
> > - do you mind this?) (I will have a "deprecation warning" on verbose)
> 
> well, actually verbose makes me happy because we can choose whether or not
> warn will actually print out msgs. Can it just be a get/set method, and
> warn can check to see if verbose > 0 before printing? I like to use it as
> a debugging flag as well, so we can have object-specific debugging flags.

I'd like to use the verbose function, but the RootI documentation is a
bit hard to read at the moment. I have not followed the discussion
about the RootI object too closely, but once this restructuring is
done, it would be great to have a few clear examples of what RootI can
do and what the options are.

For example, I was pleasantly surprised that I could ignore the
constructor method for a simple class which inherits from
Bio::Root::RootI. I was not sure if it worked before trying.

-Heikki
-- 
http://www.ebi.ac.uk/mutations/
Heikki Lehvaslaiho                    heikki@ebi.ac.uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468


From heikki@ebi.ac.uk Sat Jan 13 17:38:16 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Sat, 13 Jan 2001 17:38:16 +0000 Subject: [Bioperl-l] three letter codes for amino acids? References: <3A5D866C.C41B98E2@ebi.ac.uk> Message-ID: <3A609288.890F9FC7@ebi.ac.uk>

I just committed the first version(s) of Bio::SeqUtils. Add to it any
method you'd like Bio::PrimarySeqI compliant objects to have.

I put two methods in it: ->seq3 and ->seq3in. seq3in, since now we do
not have to worry about messing with interfaces, translates three
letter amino acid codes into one letter codes and stores the result in
the current sequence object. It throws an exception when it sees a code
it does not know, although it probably should only warn and let
-verbosity decide what to do.

As an extra feature, both methods know about selenocysteine (Sel, U).

Have fun,

-Heikki
-- 
http://www.ebi.ac.uk/mutations/
Heikki Lehvaslaiho                    heikki@ebi.ac.uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468


From schattner@alum.mit.edu Sat Jan 13 21:41:42 2001 From: schattner@alum.mit.edu (Peter Schattner) Date: Sat, 13 Jan 2001 13:41:42 -0800 Subject: [Bioperl-l] Molecular weight calculations Message-ID: <3A60CB95.54BD8D2E@alum.mit.edu>

I've recently been revisiting the DNA & protein molecular weight
calculations in SeqStats.pm and realize I have a few related questions
I would like to pose to the more biochemically oriented folks on the
list.

In nucleic acid weight calculations:

1. Should SeqStats use the charged or the neutral molecular weight of
   the sugar-phosphate backbone? Given that these groups are charged at
   physiological pH, it seems reasonable to me - and to the one
   biochemist with whom I spoke - to use the charged values. However,
   at least one commercial package (VectorNTI) uses neutral weights, so
   I am unsure. (The difference is ~0.5% - 1%.)

2. For the initial (5') and final (3') sugar phosphate, should SeqStats
   add an extra OH and an extra H respectively? Again, adding the
   weight of the additional water seems reasonable to me, but it is not
   the way the weight calculation is sometimes performed. (The
   difference here is 18, which is negligible except when computing
   molecular weights of very short oligos.)

In protein weight calculations:

3. Should SeqStats use the charged or the neutral molecular weights of
   the acidic and basic amino acid residues (e.g. aspartate, glutamate,
   histidine, arginine, lysine) in its computations? Given that these
   amino acids are charged at physiological pH, it seems reasonable to
   use charged values. However, again VectorNTI uses neutral weights,
   so I am unsure. (The difference is ~0.5% - 1% times the fraction of
   amino acids in the protein which are acidic or basic.)

Although the difference in calculated weights is small, my
understanding is that with mass spectrometry becoming increasingly
important for protein and nucleic acid analysis, having more precise
molecular weights might be useful (but if that's not really true, I'd
like to know that too.)

It's easy enough to implement the calculation in any of these ways. I
just want to do it in the way that seems most useful.

Thanks for the help.

Peter

(The only downside of all this is that my revisiting of these
calculations was triggered by Keith James discovering a bug in the
molecular weight calculations in the current (0.6) version of
SeqStats.pm which causes it to return inaccurate values :--(.
Everything is fixed for the - hopefully soon - 0.7 release, but in the
meantime the molecular weight routines of SeqStats should be avoided.
The other methods of SeqStats.pm are fine.)


From birney@ebi.ac.uk Sun Jan 14 12:39:36 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 14 Jan 2001 12:39:36 +0000 (GMT) Subject: [Bioperl-l] Re: pSW problem In-Reply-To: <3A60B942.EE096B3C@alum.mit.edu> Message-ID:

On Sat, 13 Jan 2001, Peter Schattner wrote:

> Hi Ewan
> 
> I just noticed that the demo of pSW in bptutorial.pl no longer works on
> my machine.
> Nor does examples/pSW.pl. In either case I get an error message like
> that shown below. I can't
> tell what's going on. Any ideas what may have changed?
> 

I will track this down. I spotted this as well ;)

> Peter
> 
> 
> [peter@pschattner examples]$ perl -w psw.pl
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Use of uninitialized value at
> /usr/lib/perl5/site_perl/5.005/Bio/Tools/pSW.pm line 298.
> Warning Error
>         Passed in NULL objects into Align_Sequences_ProteinSmithWaterman!
> 
> -------------------- EXCEPTION --------------------
> MSG: Unable to build an alignment
> CONTEXT: Error in uNKNOWN CONTEXT
> SCRIPT: psw.pl
> STACK:
> Bio::Tools::pSW::align_and_show(299)
> main::psw.pl(89)
> ---------------------------------------------------
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 14 12:51:17 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 14 Jan 2001 12:51:17 +0000 (GMT) Subject: [Bioperl-l] updated Message-ID:

A couple of days ago I updated the task list for 0.7

http://bio.perl.org/wiki/html/BioPerl/TaskList.html

which is getting much more "green". Hilmar - I think we drop some of the
more unlikely things to make it into 0.7 (NetIO class for example?)
and concentrate on the last important features ...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 14 12:52:27 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 14 Jan 2001 12:52:27 +0000 (GMT) Subject: [Bioperl-l] RootI detachment proposal. In-Reply-To: <3A6088AA.2B470FE8@ebi.ac.uk> Message-ID:

On Sat, 13 Jan 2001, Heikki Lehvaslaiho wrote:

> I'd like to use the verbose function, but the RootI documentation is a
> bit hard to read at the moment. I have not followed the discussion
> about the RootI object too closely, but once this restructuring is
> done, it would be great to have a few clear examples of what RootI can
> do and what the options are.

Have you cvs updated recently? I think RootI is in much better shape at
the moment...

> 
> For example, I was pleasantly surprised that I could ignore the
> constructor method for a simple class which inherits from
> Bio::Root::RootI. I was not sure if it worked before trying.
> 
> -Heikki

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 16 19:02:47 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 16 Jan 2001 11:02:47 -0800 Subject: [Bioperl-l] refactoring RootI

Ewan Birney wrote:
>
> I have finished a very serious refactoring of RootI. This detaches
> RootI from the other Root:: objects completely. verbose I think it handled
> nicer. I would venture to say that the code is more readable.
>
> I have changed the formatting somewhat of the stack trace in the
> throw/warn statements. Your milage may vary here...
>
> Jason, Hilmar - check it out and tell me what you think.
>
> I am now a little exhausted although the final product I think is vastly
> improved...
>

Well, that was radical surgery :) Even though SteveC won't be excited about it, it looks like we now have a relatively clear and straightforward code base there. It also seems that Err.pm is now superfluous, so we may want to deprecate it.

We should also build a test for $obj->throw() checking that it really prints a meaningful stack trace. In addition, there should be a test demonstrating that $obj->verbose(2) really turns warn() into throw().

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Tue Jan 16 19:53:58 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 16 Jan 2001 11:53:58 -0800 Subject: [Bioperl-l] Status 0.7

Ewan Birney wrote:
>
> A couple of days I updated the task list for 0.7
>
> http://bio.perl.org/wiki/html/BioPerl/TaskList.html
>
> which is getting much more "green". Hilmar - I think we drop some of the
> more unlikely things to make it into 0.7 (NetIO class for example?) and
> concentrate on the last important features ...
>

I think we should stick to our goal of finalizing the 0.7 release by the end of January. The situation actually doesn't look bad. The major things remaining to be addressed, as I see it, are the following.

1) Fuzzy locations coverage. This is probably the most significant hurdle. Jason's already elaborating an interface outline. If anyone has suggestions/views/experience, feel encouraged to post. You may also want to check out Ewan's proposal (http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html).

2) With the preceding being addressed, a review of SeqFeatureI and BioCorba interoperability may go hand in hand. Jason, Brad, is BioCorba 0.2 interoperability still within sight?

3) BPlite update. Lorenz seems to have abandoned the list, or is too busy with other things. It's priority 2, but I think that at the same time as we are phasing out support for Blast.pm we need to increase support for BPlite. Anyone out there who would volunteer to assume responsibility?

4) SeqAnalysisParserI needs more elaboration, according to a discussion we (Jason, Ewan, I) had in December. It'll probably be the three of us who thrash this out.

5) Bio::SeqFeature::Transcript object. This will be related to GeneStructure, and the concept has been worked out between Ewan and myself. Still, I'll need to put it into Perl code :)

6) Bugs reported on Incoming. (!) (These tend to be forgotten, but I'm sure they won't be fixed in a matter of minutes.)

7) The rest I think (I hope :) is smaller fixups, some of which I need to address myself.

We'll probably have to drop Root::StreamIO (priority 3), and probably also fixing Blast.pm bugs, unless SteveC finds the time to fix them. It seems that almost all priority 2 tasks will make it into 0.7, BioCorba 0.2 being the only one not started yet.

Since more or less all of us can do BioPerl work only on weekends, I suggest that we freeze the code on a Monday.
I'll be off to San Jose (is anyone else going to attend the Microarray Meeting at BiOS?) the next weekend, so I propose to schedule the 0.7 code freeze for Feb. 5th (one week earlier would be Jan 29th). Note that once this is agreed upon, it will be a firm deadline.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 20:36:22 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 16 Jan 2001 15:36:22 -0500 Subject: [Bioperl-l] refactoring RootI

----- Original Message -----
From: "Hilmar Lapp"
To: "Ewan Birney"
Cc:
Sent: Tuesday, January 16, 2001 2:02 PM
Subject: Re: [Bioperl-l] refactoring RootI

> Ewan Birney wrote:
> >
> > I have finished a very serious refactoring of RootI. This detaches
> > RootI from the other Root:: objects completely. verbose I think it handled
> > nicer. I would venture to say that the code is more readable.
> >
> > I have changed the formatting somewhat of the stack trace in the
> > throw/warn statements. Your milage may vary here...
> >
> > Jason, Hilmar - check it out and tell me what you think.
> >
> > I am now a little exhausted although the final product I think is vastly
> > improved...
>
> Well, that was a radical surgery :) Even though SteveC won't be
> excited about it, it looks we now have a relatively clear and
> straight code base there. It also seems that Err.pm is now
> superfluous, so we may want to deprecate it.

I am very impressed as well; it should be a lot simpler. I did notice that warn/throw changed to accept only 1 parameter, while I think they accepted 2 before: the 1st parameter was printed as MSG: $_[0] and the second as NOTE: $_[1]. But I don't think it is seriously important.
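Both behaviours raised in this thread, a throw() that prints a MSG: line plus an optional NOTE: line, and verbose(2) turning warn() into throw(), can be exercised in a test with eval and a localized $SIG{__WARN__} handler. This is a minimal sketch under stated assumptions: MyRoot is a hypothetical stand-in, not Bio::Root::RootI, and its message layout and the meaning of each verbose level are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a RootI-like base class (NOT Bio::Root::RootI).
package MyRoot;

sub new {
    my ($class) = @_;
    return bless { verbose => 0 }, $class;
}

# Get/set verbosity; level 2 is ASSUMED to mean "promote warnings to exceptions".
sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

# Build the report block: first argument as MSG, optional second as NOTE.
sub _format {
    my ($self, $kind, $msg, $note) = @_;
    my $text = "-------------------- $kind --------------------\n"
             . "MSG: $msg\n";
    $text .= "NOTE: $note\n" if defined $note;
    return $text;
}

sub throw {
    my ($self, $msg, $note) = @_;
    die $self->_format('EXCEPTION', $msg, $note);
}

sub warn {
    my ($self, $msg, $note) = @_;
    return $self->throw($msg, $note) if $self->verbose >= 2;
    CORE::warn($self->_format('WARNING', $msg, $note));
}

package main;

my $obj = MyRoot->new;

# A throw() is caught with eval and inspected through $@.
eval { $obj->throw('Unable to build an alignment', 'check the input sequences') };
print "caught: $@" if $@;

# At the default verbosity warn() only warns; trap it via a local $SIG{__WARN__}.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $obj->warn('just a warning');
}
print 'trapped ', scalar(@warnings), " warning(s)\n";

# With verbose(2) the same warn() call dies instead.
$obj->verbose(2);
eval { $obj->warn('now fatal') };
print "warn became throw\n" if $@ =~ /EXCEPTION/;
```

Localizing $SIG{__WARN__} keeps the trap confined to one block, so warnings elsewhere in a test script still reach STDERR.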
>
> We should also build a test for $obj->throw(), that it really
> prints a meaningful stack trace. In addition, there should be a
> test demonstrating that $obj->verbose(2) really turns warn() into
> throw().

I did that in t/RootI.t, I think, but it may not be very complete. I tried to make it catch all the thrown errors in eval; I didn't play with the $SIG{__WARN__} settings enough to try to catch errors on warn when verbose == 1.

From lapp@gnf.org Tue Jan 16 22:34:33 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 16 Jan 2001 14:34:33 -0800 Subject: [Bioperl-l] Refactor mercilessly

I found some thoughts about code refactoring at http://www.extremeprogramming.org/rules/refactor.html. As we are experiencing something similar with Bio::Root::*, what do people think about the points made there with particular regard to Bioperl? I enclose some quotes from that page.

Hilmar

We computer programmers hold onto our software designs long after they have become unwieldy. We continue to use and reuse code that is no longer maintainable because it still works in some way and we are afraid to modify it.
[...]
Refactor mercilessly to keep the design simple as you go and to avoid needless clutter and complexity. Keep your code clean and concise so it is easier to understand, modify, and extend. Make sure everything is expressed once and only once.
[...]
There is a certain amount of Zen to refactoring.
It is hard at first because you must be able to let go of that perfect design you have envisioned and accept the design that was serendipitously discovered for you by refactoring. You must realize that the design you envisioned was a good guide post, but is now obsolete.

--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From jason@chg.mc.duke.edu Tue Jan 16 22:54:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 16 Jan 2001 17:54:20 -0500 (EST) Subject: [Bioperl-l] what to do about Blast.pm, parsing

On the refactor front - I think BPlite is a good way to go for moving functionality from Blast.pm; however, things like to_html/from_html are very nice and I'd like to see them migrated along. Perhaps we could get a poll or priority list of the Blast.pm features people actually use, to be sure those are migrated first.

Another alternative is to go for a clean code base and write a module like the one I've started locally, called YABP (Yet Another Blast Parser). I'd like us to really identify the functions we want before starting to write it, since porting all of Blast.pm to a new module is sort of silly if we aren't going to see a significant benefit in functionality or speed. I do see the value in having a lightweight module to accomplish some tasks and a heavyweight one for doing others.

I have also been playing with Parse::RecDescent some. While writing a grammar is not the most fun I've ever had, I've been able to write a parser for GenBank files and get at least the accession, locus, and sequence lines parsed (I know, big deal). The feature table will be a bit more fun, but I think it may be a useful exercise; whether or not we will really just write grammars for seqformats, I don't know. Perhaps a grammar could be written for blast files - it might be more trouble than it's worth...
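For comparison with the grammar-based approach, the three pieces mentioned here (LOCUS name, accession, and sequence lines) can also be pulled out of a single GenBank record with plain line-by-line regexes in core Perl. This is a swapped-in technique, not a Parse::RecDescent grammar: the helper parse_genbank_record and the sample record are hypothetical, only the first accession is kept, and the feature table is deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): extract the LOCUS name, the
# first accession and the sequence from one GenBank record string.
sub parse_genbank_record {
    my ($text) = @_;
    my %rec    = (locus => undef, accession => undef, seq => '');
    my $in_seq = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ m{^//}) {                    # end-of-record marker
            last;
        } elsif ($in_seq) {
            (my $chunk = $line) =~ s/[\s\d]+//g;  # drop position numbers and blanks
            $rec{seq} .= $chunk;
        } elsif ($line =~ /^LOCUS\s+(\S+)/) {
            $rec{locus} = $1;
        } elsif ($line =~ /^ACCESSION\s+(\S+)/) {
            $rec{accession} = $1;                 # first accession only
        } elsif ($line =~ /^ORIGIN/) {
            $in_seq = 1;                          # sequence lines follow
        }
    }
    return \%rec;
}

# Made-up record; the feature table lines fall through and are skipped.
my $record = <<'END';
LOCUS       TEST0001                  20 bp    DNA     linear   UNK 01-JAN-2001
ACCESSION   TEST0001
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 acgtacgtac gtacgtacgt
//
END

my $rec = parse_genbank_record($record);
print "$rec->{locus} $rec->{accession} ", length($rec->{seq}), " bp\n";
```

A line-oriented scan like this stays simple for the header and sequence, which is roughly where a grammar starts to pay off: the nested location syntax of the feature table.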
Just some thoughts rattling around...

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From jason@chg.mc.duke.edu Tue Jan 16 23:00:14 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 16 Jan 2001 18:00:14 -0500 (EST) Subject: [Bioperl-l] Refactor mercilessly

On Tue, 16 Jan 2001, Hilmar Lapp wrote:

> I found some thoughts about code refactoring at
> http://www.extremeprogramming.org/rules/refactor.html. As we are
> experiencing something similar with Bio::Root::*, what do people think
> about the points made there with particular regard to Bioperl? I enclose
> some quotes from that page.
>
> Hilmar
>

I like XP for bioperl, but I ask: who are our users, since users are supposed to drive the product? It seems the users are also the system developers. So I think we have to stop occasionally and ask - what do I want to be able to do with this system/API? This is where some of the list subscribers who don't want to develop code can really help out, by identifying areas that bioperl needs to focus on or where needs aren't being met.

>
> We computer programmers hold onto our
> software designs long after they have become
> unwieldy. We continue to use and reuse code that is
> no longer maintainable because it still works in some
> way and we are afraid to modify it.
> [...]
> Refactor mercilessly to keep the design
> simple as you go and to avoid needless clutter and
> complexity. Keep your code clean and concise so it
> is easier to understand, modify, and extend. Make
> sure everything is expressed once and only once.
> [...]
> There is a certain amount of Zen to
> refactoring. It is hard at first because you must be
> able to let go of that perfect design you have
> envisioned and accept the design that was
> serendipitously discovered for you by refactoring.
> You must realize that the design you envisioned was
> a good guide post, but is now obsolete.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From ajm6q@virginia.edu Tue Jan 16 23:25:13 2001 From: ajm6q@virginia.edu (Aaron J Mackey) Date: Tue, 16 Jan 2001 18:25:13 -0500 (EST) Subject: [Bioperl-l] what to do about Blast.pm, parsing

On Tue, 16 Jan 2001, Jason Stajich wrote:

> I also have been playing with Parse::RecDescent some. While writing a
> grammar is not the most fun I've ever had, I've been able to write a
> parser for GenBank files and get at least accession,locus, and sequence
> lines parsed (I know, big deal). Feature table will be a bit more fun,
> but I think it may be a useful exercise whether or not we will really just
> write grammars for seqformats I don't know. Perhaps a grammar could be
> written for blast files - might be more trouble than it's worth...

I've often thought the same (and then stepped back and wondered if blast/fasta/hmmer output could be expressed in BNF [Backus-Naur Form]). It seems like an excellent project for an undergrad CS major who wanted to cross over into bioinformatics. There's too much grunt work involved for any of us to want to do it, though ;)

Maybe we should take this off-list, Jason, but do you have any comments on the utility of Parse::RecDescent vs. Parse::Yapp?

-Aaron

--
 o ~ ~ ~ ~ ~ ~ o
/ Aaron J Mackey \
\ Dr.
Pearson Laboratory /
\ University of Virginia \
/ (804) 924-2821 \
\ amackey@virginia.edu /
 o ~ ~ ~ ~ ~ ~ o

From jason@chg.mc.duke.edu Wed Jan 17 02:34:11 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 16 Jan 2001 21:34:11 -0500 (EST) Subject: [Bioperl-l] Status 0.7

On Tue, 16 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > A couple of days I updated the task list for 0.7
> >
> > http://bio.perl.org/wiki/html/BioPerl/TaskList.html
> >
> > which is getting much more "green". Hilmar - I think we drop some of the
> > more unlikely things to make it into 0.7 (NetIO class for example?) and
> > concentrate on the last important features ...
> >
>
> I think we should stick to our goal of finalizing the 0.7 release
> by the end of January. The situation actually doesn't look bad.
> Major things remaining to be addressed as I see it basically
> comprise of the following.
>
> 1) Fuzzy locations coverage. This is probably the most significant
> hurdle. Jason's already elaborating an interface outline. If
> anyone has suggestions/views/experience, feel encouraged to post.
> You may also want to check out Ewan's proposal
> (http://bioperl.org/pipermail/bioperl-l/2000-November/001724.html).

Hopefully I will have something by the end of the week or early next week.

>
> 2) With the preceding being addressed, a review of SeqFeatureI and
> BioCorba interoperability may go hand in hand. Jason, Brad, is
> BioCorba 0.2 interoperability still within sight?

I haven't played with this much; I was planning on doing it after the SeqFeatureI - LocationI stuff was settled.

>
> 3) BPlite update. Lorenz seems to have abandoned the list, or is
> too busy with other things. It's priority 2, but I think at the
> same time as we are phasing out support for Blast.pm we need to
> increase support for BPlite. Anyone out there who would volunteer
> to assume responsibility?
>
> 4) SeqAnalysisParserI needs more elaboration, according to a
> discussion we (Jason, Ewan, I) had in December. It'll probably be
> the three of us who thrash this out.

Hmm, we also need to determine the future of SeqFeatureProducerI in this context.

>
> 5) Bio::SeqFeature::Transcript object. This will be related to
> GeneStructure and the concept has been worked out between Ewan and
> myself. Still, I'll need to put it into Perl code :)
>
> 6) Bugs reported on Incoming. (!) (These tend to be forgotten, but
> I'm sure they won't be fixed in a matter of minutes.)
>
> 7) The rest I think (I hope :) is smaller fixups, some of which I
> need to address myself.
>
> We'll probably have to drop Root::StreamIO (priority 3), and
> probably also fixing Blast.pm bugs, unless SteveC finds the time
> to fix them. It seems that almost all priority 2 tasks will make
> it into 0.7, BioCorba 0.2 being the only one not started yet.

I wanted to wait until the code was stable before working on the BioCorba stuff, since it is entirely dependent on the bioperl modules API.

>
> Since more or less all of us can do BioPerl work only on weekends,
> I suggest that we freeze the code on a Monday. I'll be off to San
> Jose (is anyone else going to attend the Microarray Meeting at
> BiOS?) the next weekend, so I propose to schedule the 0.7 code
> freeze for Feb. 5th (one week earlier would be Jan 29th). Note
> that once this is agreed upon, it will be a firm deadline.

Yes. Feb 5 is reasonable. Let's see how close we are the week before and take stock. Thanks for being the lead on this!

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From SnyderEE@pbrc.edu Wed Jan 17 02:43:48 2001 From: SnyderEE@pbrc.edu (Eric Snyder) Date: Tue, 16 Jan 2001 20:43:48 -0600 Subject: [Bioperl-l] Map Manipulation and Genetic Analysis

Hello BioPerl Folks,

I was thumbing through the BioPerl modules list and noticed that there was not any coverage in the area of processing (non-sequence) maps and genetic data. I am working on some programs for processing physical and genetic maps, as well as genotypic and phenotypic data. I was wondering, is there any interest in these areas in the BioPerl community or, have I overlooked previous work on these things?

I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH mapping, etc.) but I have not seen anything in the form of reusable software components for basic map manipulation, comparison, etc. Nor am I aware of modules for manipulating raw data for genetic analysis. I am fairly new to working with genetic data. I would be interested in hearing of leads in this area. However, if it is not already done, I would be willing to write it in the context of BioPerl.

Cheers,

Eric E.
Snyder
Associate Professor
Pennington Biomedical Research Center
6400 Perkins Road
Baton Rouge, LA 70808-4124
USA
Phone: (225) 763-3185
Fax: (225) 763-2525
Cell: (225) 235-6271
Email: eesnyder@pbrc.edu
ICBM: N 30 24'14.0", W 91 07'20.0"

From jason@chg.mc.duke.edu Wed Jan 17 22:07:29 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 17 Jan 2001 17:07:29 -0500 (EST) Subject: [Bioperl-l] Map Manipulation and Genetic Analysis

Eric -

Heikki and I had batted around the idea of a MarkerI for describing Markers, which can be used to build maps. I have some code that I am using for some analysis which I am happy to donate when it is finished. It doesn't do much to represent maps other than assume that markers with the same mapid are part of the same map (the data is stored in a db). But I think a good representation of Markers first and then Maps would be very good for bioperl and for those trying to bridge the gap between genetic analysis, maps, and sequence-based investigation.

-Jason

On Tue, 16 Jan 2001, Eric Snyder wrote:

> Hello BioPerl Folks,
>
> I was thumbing through the BioPerl modules list and noticed that there
> was not any coverage in the area of processing (non-sequence) maps and
> genetic data. I am working on some programs for processing physical
> and genetic maps, as well as genotypic and phenotypic data. I was
> wondering, is there any interest in these areas in the BioPerl
> community or, have I overlooked previous work on these things?
>
> I know of some of the stuff that Lincoln Stein has done (on ACEDB, RH
> mapping, etc.) but I have not seen anything in the form of reusable
> software components for basic map manipulation, comparison, etc. Nor
> am I aware of modules for manipulating raw data for genetic analysis.
> I am fairly new to working with genetic data. I would be interested
> in hearing of leads in this area. However, if it is not already done,
> I would be willing to write it in the context of BioPerl.
>
> Cheers,
>
>
> Eric E. Snyder

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From imre.vastrik@helsinki.fi Thu Jan 18 10:29:58 2001 From: imre.vastrik@helsinki.fi (Imre Vastrik) Date: Thu, 18 Jan 2001 12:29:58 +0200 Subject: [Bioperl-l] BPlite bug

Don't know if this one is for Lorenz or Jason:

BPlite seems to be unaware of ' Frame = ...' lines in NCBI TBLASTN etc. reports. Consequently, parsing of the alignment lines does not work properly. The bug does not show up with the current test, since it uses a BLASTP report (which lacks Frame lines).

A quick hack would be to introduce the following line between lines 115 and 120:

    elsif ($_ =~ /^\s*Frame/) {next}

However, the frame info, of course, will be lost.

Bug report filed.

Rgds.,
imre

From jason@chg.mc.duke.edu Thu Jan 18 17:55:53 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 18 Jan 2001 12:55:53 -0500 (EST) Subject: [Bioperl-l] split seq feature and fuzzy feature proposal

http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html

Please look it over. I didn't describe the details of the fuzzy feature methods because I'm not sure there will be extra methods, just overriding things like start/end to be remapped. The different feature types need to be differentiated so that Bio::SeqIO::FTHelper can handle them differently when parsing/writing.

Ewan, let me know what I've left off. Hilmar, does this sound reasonable and straightforward enough to you?
Some may have a beef with the name SplitSeqFeature; you are welcome to propose a better one. Send your comments or make corrections to the wiki (send a courtesy note to let us know to check the webpage). Thanks for your help.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From hlapp@gmx.net Thu Jan 18 19:11:57 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 18 Jan 2001 11:11:57 -0800 Subject: [Bioperl-l] Re: LocationI

Jason Stajich wrote:
>
> Interfaces:
>
> Bio::LocationI -> ISA RangeI
> Purpose: capture location information - such as in an EMBL/GenBank
> feature
> /source 1..345
> Methods: RangeI methods, and ...? [start/end/strand]
>
> Questions: How is a LocationI object going to be different from the
> vanilla SeqFeatureI or should be migrate some methods from
> SeqFeature (start/end/strand) to LocationI and make
> SeqFeaturesI more about tags (primary/source/has_tag/each_tag)
> and gff stuff?

In principle I think yes. SeqFeatureI could still keep start/end/strand and map these to calls into the location object. Or, SeqFeatureI loses them (i.e., they are no longer mandatory), but for simplicity SeqFeature::Generic keeps them.

>
> Bio::ComplexLocationI -> ISA Bio::LocationI
> Purpose: capture location information for features that are not linear
> as in an EMBL/Genbank join
> CDS join(544..589,688..1032)
>
> Methods:
> - sub_Locations() -> a list of LocationI objects that indicate
> start/stop boundaries for this object must override overlap,
> contains, etc from RangeI with since coordinates are not
> contiguous
>
> Objects:
> Bio::SeqFeature::Generic -> ISA Bio::SeqFeatureI, Bio::LocationI
> add the location() method to this object, the LocationI object
> returned will be a reference to $self.
>
> Bio::SeqFeature::Complex -> ISA Bio::SeqFeatureI, Bio::ComplexLocationI
> Purpose: implementation to handle those join() statements

This is the outline you pretty much follow in the proposal on the Wiki. The point I'm not so happy with is that purely location-specific issues change the class (type) of a SeqFeature.

>
> I'm still not clear on what a fuzzy location is supposed to represent
> ie - does that mean we know that the feature is located somewhere
> in the range, but we don't know the exact start/stop?

Exactly. At least to my understanding.

> Why can't you treat
> it like real start/stop since we don't have any more information? Or
> would union/intersection calculations need to behave differently?
>

Well, biologically you can't, because annotating a sequence with such a feature without indicating the uncertainty of start and end is deceptive. For cDNA entries this is sometimes crucial: <1..100 as a CDS location means that the entry doesn't even contain the start of the CDS, and it's totally unclear where that is.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From hlapp@gmx.net Thu Jan 18 19:26:57 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 18 Jan 2001 11:26:57 -0800 Subject: [Bioperl-l] split seq feature and fuzzy feature proposal

Jason Stajich wrote:
>
> http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
>
> Please look it over, I didn't describe the detail of the fuzzy feature
> methods because I'm not sure there will be extra methods, just overriding
> things like start,end to be remapped. The different feature types need to
> be differentiated so that Bio::SeqIO::FTHelper can handle then differently
> when parsing/writing.
>
> Ewan, Let me know what I've left off.
Hilmar does this sound reasonable,
> straightforward enough to you?
>

You didn't include actual interface definitions, did you? Just wondering whether I missed the link.

As mentioned before, what bothers me is that in this layout location-specific issues impact the class (type) of a SeqFeature. Why should any SeqFeature change its type only because its location becomes uncertain or compound, and vice versa?

I'd rather favor uncoupling a feature and its location, with features holding a reference to a location object which will give further details if the application cares. An application that doesn't do anything with the coordinates wouldn't notice a change, but an application that e.g. draws features on sequences will have to decide what to do if the location object says that the coordinates are not well determined. Retrieving the part of the attached seq that the feature refers to will also be affected: doing so for a feature with an uncertain location will result in an exception being thrown. Separating SeqFeatureI and LocationI also allows for the following: assume a feature with uncertain start and end. If you're satisfied with an average start and end, you can substitute the location object with a Range with certain start and end, and voila - drawing, sequence excision, etc. will just work fine on the very same feature object.

Maybe I'm missing something.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
-----------------------------------------------------------------

From jason@chg.mc.duke.edu Thu Jan 18 19:41:51 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 18 Jan 2001 14:41:51 -0500 (EST) Subject: [Bioperl-l] split seq feature and fuzzy feature proposal

On Thu, 18 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
> >
> > Please look it over, I didn't describe the detail of the fuzzy feature
> > methods because I'm not sure there will be extra methods, just overriding
> > things like start,end to be remapped. The different feature types need to
> > be differentiated so that Bio::SeqIO::FTHelper can handle then differently
> > when parsing/writing.
> >
> > Ewan, Let me know what I've left off. Hilmar does this sound reasonable,
> > straightforward enough to you?
> >
>
> You didn't include actual interface definitions, did you? Just
> wondering whether I missed the link.

No - I didn't describe the actual interfaces, since we are still struggling through this. I will do that when we agree enough.

>
> As mentioned before, what bothers me is that in this layout
> location-specific issues impact the class (type) of a SeqFeature.
> Why should any SeqFeature change it's type only because its
> location becomes uncertain or compound, and vice-versa?

Ewan and I had decoupled the LocationI from SeqFeature, but we saw no advantage, just interface mish-mash; perhaps we were too hasty?

What you suggest above could be done as:

Bio::SeqFeatureI ISA RangeI

method : location
desc : Get/Set method
args : LocationI object
returns: LocationI object

method : start()
desc : start location of seqfeature

sub start {
    my($self) = @_;
    return $self->location->start()
}

... similar for end ...
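The decoupled layout discussed here, a feature that holds a reference to a location object and delegates start()/end() to it, can be sketched in a few lines. This is a minimal illustration, not BioPerl code: the packages Sketch::Range, Sketch::FuzzyLocation and Sketch::Feature are hypothetical, and the fuzzy location borrows the min_start()/max_start() idea raised later in this thread, with start() and end() throwing when that end is uncertain.

```perl
use strict;
use warnings;

# Hypothetical exact location: both ends are known.
package Sketch::Range;

sub new {
    my ($class, %args) = @_;
    my $self = { start => $args{start}, end => $args{end} };
    return bless $self, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical fuzzy location: each end is bounded by a min and a max;
# undef means "unbounded" (e.g. the start of <1..100).
package Sketch::FuzzyLocation;

sub new {
    my ($class, %args) = @_;
    my $self = {
        min_start => $args{min_start},
        max_start => $args{max_start},
        min_end   => $args{min_end},
        max_end   => $args{max_end},
    };
    return bless $self, $class;
}
sub min_start { $_[0]{min_start} }
sub max_start { $_[0]{max_start} }
sub min_end   { $_[0]{min_end} }
sub max_end   { $_[0]{max_end} }

# An end is certain only when both bounds are known and equal.
sub _certain { defined $_[0] && defined $_[1] && $_[0] == $_[1] }

sub start {
    my ($self) = @_;
    die "start is uncertain\n" unless _certain($self->min_start, $self->max_start);
    return $self->min_start;
}
sub end {
    my ($self) = @_;
    die "end is uncertain\n" unless _certain($self->min_end, $self->max_end);
    return $self->min_end;
}

# Hypothetical feature that only holds a reference to its location.
package Sketch::Feature;

sub new {
    my ($class, %args) = @_;
    return bless { location => $args{location} }, $class;
}
sub location {
    my ($self, $loc) = @_;
    $self->{location} = $loc if defined $loc;
    return $self->{location};
}
# start()/end() delegate, so they throw whenever the location does.
sub start { $_[0]->location->start }
sub end   { $_[0]->location->end }

package main;

# A CDS annotated as <1..100: the true start lies before base 1.
my $cds = Sketch::Feature->new(
    location => Sketch::FuzzyLocation->new(
        min_start => undef, max_start => 1,
        min_end   => 100,   max_end   => 100,
    ),
);
print 'end: ', $cds->end, "\n";                    # the certain end works
print "start: $@" unless eval { $cds->start; 1 };  # the uncertain start throws

# Settling for an approximate start: swap in an exact range on the same object.
$cds->location(Sketch::Range->new(start => 1, end => 100));
print 'start: ', $cds->start, "\n";
```

Because the feature only delegates, replacing its location changes what start() returns without changing the feature's class, which is the property argued for in this thread.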
Bio::LocationI ISA RangeI

Bio::SplitLocationI ISA Bio::LocationI

method: sub_SeqFeatures()
desc : method for obtaining list of sub Locations - they could be
       SeqFeature::Exons, SeqFeature::Generic, or LocationI's?
returns: list of LocationI or SeqFeatureI objects?

Bio::FuzzyLocationI ISA Bio::LocationI

method: get_embl_fuzzy_string()
desc : possible method to return location as an embl string for a fuzzy
       location
returns: string

Does this seem more agreeable? The location is decoupled from SeqFeature, but we have to support backwards compatibility with SeqFeatureI ISA RangeI, which means all SeqFeatures have a start/end...

>
> I'd rather favor uncoupling a feature and its location, with
> features having a reference to a location object which will give
> further detailsif the application worries. An application that
> doesn't do anything with the coordinates wouldn't notice a change,
> but an application that e.g. draws features on sequences will have
> to decide what to do if the location object says that the
> coordinates are not well determined. Retrieving the sequence part
> the feature refers to on its attached seq will also be affected:
> doing so for a feature with an uncertain location will result in
> an exception being thrown. Separating SeqFeatureI and LocationI
> allows also for the following: assume a feature with uncertain
> start and end. If you're satisfied with an average start and end,
> you can substitute the location object by a Range with certain
> start and end, and voila - drawing, sequence excision etc will
> just work fine on the very same feature object.
>
> Maybe I'm missing something.
>
> Hilmar
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From hlapp@gmx.net Thu Jan 18 20:34:24 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 18 Jan 2001 12:34:24 -0800 Subject: [Bioperl-l] split seq feature and fuzzy feature proposal

Jason Stajich wrote:
>
> What you suggest above could be done as:
>
> Bio::SeqFeatureI ISA RangeI
>
> method : location
> desc : Get/Set method
> args : LocationI object
> returns: LocationI object
>
> method : start()
> desc : start location of seqfeature
>
> sub start {
> my($self) = @_;
> return $self->location->start()
> }
>

Note that, as one of the few noticeable changes in the SeqFeatureI API, this call should be allowed to throw an exception if

1) the start location is uncertain
2) the start location does not refer to the attached seq (to be disputed)

> ... similar for end ...
>
> Bio::LocationI ISA RangeI
>
> Bio::SplitLocationI ISA Bio::LocationI
>
> method: sub_SeqFeatures()
> desc : method for obtaining list of sub Locations - they could be
> SeqFeature::Exons, SeqFeature::Generic, or LocationI's?
> returns: list of LocationI or SeqFeatureI objects?
>

Yeah, that's the really hairy case. We should probably first define what we would like to be able to do with compound locations. This is a strong call for feedback: what do people out there using the package intend to do with compound locations? E.g. if you draw annotations, would you just draw the part referring to the attached seq? Ensembl people, any experience/wishlists for this?

An obvious requirement is the ability to recover the original GenEmbl location string, so all the information necessary should be present.
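A compound location can satisfy the requirement of recovering the original location string by keeping its sub-locations in order and rebuilding the join() string on demand. This sketch assumes hypothetical Sketch::SimpleLoc and Sketch::CompoundLoc classes: sub_Locations() borrows the name used in this thread, each part carries a seqname() as proposed in this thread, and parts on another entry are written in the feature-table accession:start..end form. Fuzzy and complement() parts are not handled.

```perl
use strict;
use warnings;

# Hypothetical simple sub-location: an exact range on a named sequence.
package Sketch::SimpleLoc;

sub new {
    my ($class, %args) = @_;
    my $self = { start => $args{start}, end => $args{end}, seqname => $args{seqname} };
    return bless $self, $class;
}
sub start   { $_[0]{start} }
sub end     { $_[0]{end} }
sub seqname { $_[0]{seqname} }

# Hypothetical compound location: an ordered list of sub-locations.
package Sketch::CompoundLoc;

sub new {
    my ($class, %args) = @_;
    my $self = { seqname => $args{seqname}, parts => [ @{ $args{parts} } ] };
    return bless $self, $class;
}
sub seqname       { $_[0]{seqname} }
sub sub_Locations { @{ $_[0]{parts} } }

# Rebuild the location string. Parts on another entry get an
# "accession:" prefix; parts on the attached sequence do not.
sub to_string {
    my ($self) = @_;
    my @pieces = map {
        my $range = $_->start . '..' . $_->end;
        $_->seqname eq $self->seqname ? $range : $_->seqname . ":$range";
    } $self->sub_Locations;
    return 'join(' . join(',', @pieces) . ')';
}

# True when every part lies on the attached sequence, i.e. when cutting
# the feature's sequence out of that one sequence would make sense.
sub is_local {
    my ($self) = @_;
    return !grep { $_->seqname ne $self->seqname } $self->sub_Locations;
}

package main;

my $cds = Sketch::CompoundLoc->new(
    seqname => 'SEQ1',
    parts   => [
        Sketch::SimpleLoc->new(seqname => 'SEQ1', start => 544, end => 589),
        Sketch::SimpleLoc->new(seqname => 'SEQ1', start => 688, end => 1032),
    ],
);
print $cds->to_string, "\n";
```

Storing parts by name rather than by sequence object keeps memory use flat however many remote parts there are; a seq() accessor could then refuse to work unless is_local() is true.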
A compound location is indeed somewhat of a hybrid between a location and a feature, because a sublocation clearly only makes sense if you also know the sequence it refers to. The sequence can be identified by its name (but then which name? The name in the location line as given in GenBank?), or by an object reference. The latter can be very expensive, because the sequence can be quite long, and if there are many such sublocations, you quickly eat up your memory. You could also construct the seq object as a sort of dummy, without really holding the seq string. Not really convincing.

So why not the simple case: a CompoundLocation has a method sub_Locations(). Each sublocation has a method seqname() (or seq_id() or whatever you prefer), which returns the same string as $feature->seqname() for subfeatures lying on the same seq, and a different name for those referring to other seqs. $feature->seq() for features with a compound location throws an exception, unless all sublocations are on the same (attached) sequence. Too simple?

> Bio::FuzzyLocationI ISA Bio::LocationI
>
> method: get_embl_fuzzy_string()
> desc  : possible method to return location as an embl string for a fuzzy
>         location
> returns: string
>

min_start()/max_start() etc. should also be included. start() and end() in an implementation are overridden and throw exceptions, depending on which end is uncertain (at least they should be expected to throw exceptions). Whether an end is certain can be determined by min_start() == max_start() (or min_end() == max_end(), respectively).

> Does this seem more agreeable - location is decoupled from SeqFeature, but
> we have to support backwards compatibility with SeqFeatureI ISA RangeI
> which means all SeqFeatures have a start/end...
>

I indeed like the decoupled approach much better.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                          email: hlapp@gmx.net
GNF, San Diego, Ca. 92122            phone: +1 858 812 1757
-----------------------------------------------------------------

From birney@ebi.ac.uk Thu Jan 18 23:27:53 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 18 Jan 2001 23:27:53 +0000 (GMT) Subject: [Bioperl-l] split seq feature and fuzzy feature proposal In-Reply-To: Message-ID:

On Thu, 18 Jan 2001, Jason Stajich wrote:

> On Thu, 18 Jan 2001, Hilmar Lapp wrote:
>
> > Jason Stajich wrote:
> > >
> > > http://www.bioperl.org/wiki/html/BioPerl/AdvancedSeqFeatureLocations.html
> > >
> > > Please look it over. I didn't describe the detail of the fuzzy feature methods because I'm not sure there will be extra methods, just overriding things like start, end to be remapped. The different feature types need to be differentiated so that Bio::SeqIO::FTHelper can handle them differently when parsing/writing.
> > >
> > > Ewan, let me know what I've left off. Hilmar, does this sound reasonable, straightforward enough to you?
> > >
> >
> > You didn't include actual interface definitions, did you? Just wondering whether I missed the link.
>
> No - didn't describe actual interfaces since we are still struggling through this. Will do that when we agree enough.
>
> >
> > As mentioned before, what bothers me is that in this layout location-specific issues impact the class (type) of a SeqFeature. Why should any SeqFeature change its type only because its location becomes uncertain or compound, and vice-versa?
> >
>
> Ewan and I had decoupled the LocationI from SeqFeature but there was no seen advantage, just interface mish-mash, perhaps we were too hasty?
Just to chime in, my original proposal had locations separate from SeqFeatures, but at the end of the day we seemed to be making two parallel interface hierarchies with no real gain in abstraction or understanding, and the potential for generating a lot of confusion.

So - I guess to flip around the question - what do we gain from hanging location "off" seqfeature rather than merging the interfaces? (Remember interface definitions can be implemented with any number of objects or object collections if so desired...)

e.

From birney@ebi.ac.uk Thu Jan 18 23:37:24 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 18 Jan 2001 23:37:24 +0000 (GMT) Subject: [Bioperl-l] split seq feature and fuzzy feature proposal In-Reply-To: <3A675350.39820E7B@gmx.net> Message-ID:

On Thu, 18 Jan 2001, Hilmar Lapp wrote:

>
> Note that as one of the few noticeable changes in the SeqFeatureI
> API this call should be allowed to throw an exception if
> 1) the start location is uncertain
> 2) the start location does not refer to the attached seq
> (to be disputed)

My feeling is that seqfeature->start should still be well defined. It is up to the SeqFeature implementing class to "make a sensible decision" about start/end points. If it is fuzzy/complex/strange the client can test. If the client does not want to test and just wants to "draw it", I think insisting that start/end/seqname return *something* is valid. Otherwise the client has no real option to figure out what to do with these things...

If we let the implementation objects get away with not implementing this, the interface becomes less useful...

> annotations, would you just draw the part referring to the
> attached seq? Ensembl people, any experience/wishlists for this?

Experience on our side is that 90% of things are either SeqFeatures or FeaturePairs and fit the simple seqfeature interface just fine; the remaining 10% are genes and could be handled via some sort of complex location thing.
As genes have transcripts, which in turn have exons, a simple mapping to complex locations is not on. For other internal reasons, Ensembl is very likely to stick with specialised adaptor classes which map Ensembl genes to Bioperl SeqFeatures, so we are flexible here...

>
> An obvious requirement is the ability to recover the original
> GenEmbl location string, so all the information necessary should
> be present.

Right.

>
>
> min_start()/max_start() etc. should also be included. start() and end() in an implementation are overridden and throw exceptions, depending on which end is uncertain (at least they should be expected to throw exceptions). Whether an end is certain can be determined by min_start() == max_start() (or min_end() == max_end(), respectively).

I would be in favour of min_start/max_start but against letting start throw an exception. The implementation has to decide how to "become a hard feature" from being Fuzzy. It is up to the implementation. As long as this is documented, this is no more arbitrary than letting the client decide.

>
> > Does this seem more agreeable - location is decoupled from SeqFeature, but
> > we have to support backwards compatibility with SeqFeatureI ISA RangeI
> > which means all SeqFeatures have a start/end...
> >
>
> I indeed like the decoupled approach much better.
>

If we go for a decoupled approach I am keen on it being justified by more than just "it feels good". We are increasing the complexity here a lot and we need justification...

> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                          email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122            phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From lapp@gnf.org Fri Jan 19 01:28:09 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 18 Jan 2001 17:28:09 -0800 Subject: [Bioperl-l] split seq feature and fuzzy feature proposal References: Message-ID: <3A679829.1A668563@gnf.org>

Ewan Birney wrote:
>
> >
> >
> > min_start()/max_start() etc. should also be included. start() and end() in an implementation are overridden and throw exceptions, depending on which end is uncertain (at least they should be expected to throw exceptions). Whether an end is certain can be determined by min_start() == max_start() (or min_end() == max_end(), respectively).
>
> I would be in favour of min_start/max_start but against letting start throw an exception. The implementation has to decide how to "become a hard feature" from being Fuzzy. It is up to the implementation. As long as this is documented, this is no more arbitrary than letting the client decide.
>

I think it is more arbitrary, and I'll tell you why. There is more than one interpretation of fuzzy locations. I name two for which I think the BioPerl core is not in a position to take the decision away from the client, which is why it shouldn't pretend that it is:

1) Uncertainty about the real location, that is, it is clear that the described feature sits at a particular position, but for one reason or another the producer of the feature can only give an estimated range for start and/or end. Now, we can implement (and document) the rule that in such cases $feature->start() and $feature->end() will always return the widest (or smallest, or average, make your choice) possible range. A client is then free to rely on it, thinking that what the BioPerl developers decided on is probably the wisest choice you can make. That's already catch #1. Catch #2 happens if there is a user of the client program who, because he's a good user, read the documentation of the client program, but not that of BioPerl.
Do we require users of programs that use BioPerl to read through the BioPerl documentation as well?

2) The location is undefined. A location saying <1..100 is undefined for that feature in its biological meaning. You're not supposed to make up a value for an undefined value. If you had an interface dividing two integers and returning an integer (which prevents you from returning NAN or INF), and the denominator is zero, what do you return?

I strongly believe that every client that does something sensible with the feature coordinates should know what type of coordinates it is dealing with, and should be required to check this in order to be safe from an exception. It is not the task of BioPerl to relieve the client from thinking, but it is its task to provide all the information the client needs for making an educated decision. You can always divide by a number without checking for zero, but by doing so you accept the risk that some day you might get an exception. The same holds for clients calling $feature->start() instead of obtaining the location object and examining it for its capabilities.

Maybe I'm missing an important point in having $feature->start() guaranteed to be exception-free.

> >
> > I indeed like the decoupled approach much better.
> >
>
> If we go for a decoupled approach I am keen on it being justified by more
> than just "it feels good". We are increasing the complexity here a lot and
> we need justification...
>

First, for clarification: I thought we agreed that we have different interfaces, that is, SeqFeatureI (ISA RangeI) and LocationI (ISA RangeI), didn't we?

Regarding complexity, the question is whether we should have subinterfaces for each of FuzzyLocation, CompoundLocation, etc. (what is etc.?), or whether we pack everything into one interface. I have a preference for the first, because it lets you find out the type of location by checking $loc->isa('Bio::SomeLocationInterface'). I may be missing another equally elegant way if everything's in one interface.
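Hilmar's alternative can be sketched in Perl. In his design, start() may throw, and a careful client inspects the location type before asking for coordinates. My::Location and My::FuzzyLocation are hypothetical stand-ins, not BioPerl modules. Following his suggestion, a terminus counts as certain only when min_start() == max_start() (or min_end() == max_end()).

```perl
use strict;
use warnings;

# Hypothetical classes illustrating a "start() may throw" design; the
# package and method names are assumptions, not BioPerl modules.
package My::Location;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package My::FuzzyLocation;
our @ISA = ('My::Location');

sub min_start { $_[0]{min_start} }
sub max_start { $_[0]{max_start} }
sub min_end   { $_[0]{min_end} }
sub max_end   { $_[0]{max_end} }

# A terminus is certain only when its minimum and maximum coincide;
# otherwise asking for it is treated as an error.
sub start {
    my $self = shift;
    die "start of fuzzy location is uncertain\n"
        unless $self->min_start == $self->max_start;
    return $self->min_start;
}
sub end {
    my $self = shift;
    die "end of fuzzy location is uncertain\n"
        unless $self->min_end == $self->max_end;
    return $self->min_end;
}

package main;

# A careful client checks the location type before asking for coordinates.
sub describe {
    my ($loc) = @_;
    if ($loc->isa('My::FuzzyLocation')) {
        return sprintf 'fuzzy: start %d-%d, end %d-%d',
            $loc->min_start, $loc->max_start, $loc->min_end, $loc->max_end;
    }
    return sprintf 'exact: %d..%d', $loc->start, $loc->end;
}

my $exact = My::Location->new(start => 5, end => 50);
# Start known only to lie somewhere between 10 and 20; end is exact.
my $fuzzy = My::FuzzyLocation->new(min_start => 10, max_start => 20,
                                   min_end => 100, max_end => 100);

print describe($exact), "\n";
print describe($fuzzy), "\n";

# A client that calls start() directly has to be ready for an exception.
my $start = eval { $fuzzy->start };
print defined $start ? "start: $start\n" : "no start: $@";
print 'end: ', $fuzzy->end, "\n";   # the end is certain, so this succeeds
```

The eval around start() in the last lines is the extra work that, in Ewan's view, every client would otherwise have to take on.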
The increase in complexity is fairly small, I think. All interfaces can be put into their own subdirectory (Bio::Loc?). The only people really concerned with it are those who want to deal with the coordinates in a very reliable way (that is, avoid exceptions and deal with any possible sort of location type). And these people really should care what types of location they could encounter, and what they mean. Everyone else could simply use LocationI, which in essence is probably the same as RangeI.

Regarding your point that there can be many implementations of an interface, sure, that's true. In principle I have no problem with $feature->location() returning $self, assuming that the SeqFeature object implements LocationI itself. But I do think it's bad if a SeqFeature implements every type of location interface itself, because if I wanted to change the type of a feature's location I would end up instantiating a SeqFeature and passing it to a SeqFeature as its location object, which is weird, isn't it? I say weird because it's not lightweight. No more of those beast-like classes, please. I don't think the reduction in hierarchy complexity achieved by beast classes makes them easier to learn, or to use.

You may ask why I wish to change the type of a location. Consider a client program that draws features. When it encounters a feature with a FuzzyLocation, it may want to ask the user what to do. The user may even be able to set a preference like 'always take the widest possible range'. Then the client program simply replaces the FuzzyLocation with a Range object denoting the widest possible range and passes the feature on to the drawing module. No code change is necessary there. And the user knows what he's doing; it's not just an arbitrary decision of a backend library.
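The substitution Hilmar describes might look like the Perl sketch below. Based on a user preference, the client replaces a FuzzyLocation with a plain range and leaves the feature object itself untouched. Everything here is hypothetical and not BioPerl API: My::Range, My::FuzzyLocation, My::Feature, harden_location(), and the 'widest'/'narrowest' policy names.

```perl
use strict;
use warnings;

# Hypothetical minimal classes (assumptions, not BioPerl modules).
package My::Range;
sub new   { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package My::FuzzyLocation;
sub new       { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub min_start { $_[0]{min_start} }
sub max_start { $_[0]{max_start} }
sub min_end   { $_[0]{min_end} }
sub max_end   { $_[0]{max_end} }

package My::Feature;
sub new { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub location {
    my ($self, $loc) = @_;
    $self->{location} = $loc if defined $loc;
    return $self->{location};
}

package main;

# Client-side policy picked by the user ('widest' or 'narrowest'); the
# helper and the policy names are hypothetical.
sub harden_location {
    my ($feature, $policy) = @_;
    my $loc = $feature->location;
    return unless $loc->isa('My::FuzzyLocation');   # exact locations are left alone
    my ($s, $e) = $policy eq 'widest'
        ? ($loc->min_start, $loc->max_end)
        : ($loc->max_start, $loc->min_end);
    $feature->location(My::Range->new(start => $s, end => $e));
    return 1;
}

my $feat = My::Feature->new(
    location => My::FuzzyLocation->new(min_start => 10, max_start => 20,
                                       min_end   => 90, max_end   => 100),
);
harden_location($feat, 'widest');

# Drawing code downstream can now call start/end on the same feature object.
printf "%d..%d\n", $feat->location->start, $feat->location->end;
```

The policy lives in the client, not in the library. A different user preference needs no change to the drawing code.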
So, I still think that having not only individual interfaces, but also individual implementations for the different location types is justified, doesn't add too much complexity (in fact, it reduces hidden complexity), and provides a clear API for programmers.

Long mail, sorry for taking up your time to read it, but you asked.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From birney@ebi.ac.uk Fri Jan 19 08:45:58 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 19 Jan 2001 08:45:58 +0000 (GMT) Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... Message-ID:

Ok. Hilmar and I are now probably into the "code aesthetics" part of this debate, which definitely is worth having, but someone sometime has to make a decision.

I suggest that we keep bashing this out on the list for a couple more days (please... other people... if you have a view, do chip in). If Hilmar and I still disagree on aesthetics, I would like to nominate Jason to tie-break on the way to go (is this ok with you, Hilmar and Jason...?)

We have two points of contention:

(a) Explicit Location objects or not.

Hilmar suggests an explicit location object:

   SeqFeatureI has-a LocationI
   LocationI is subclassed for Split (join statements) and Fuzzies

Benefits - (a) easy to mix and match implementations of locations to different feature objects, and (b) if mixing and matching locations to features is common, more realistic. Hilmar argues that it is clearer as well.

Against - more objects, and in fact the majority of seqfeatures are little more than the location and two extra strings.

For backwards compatibility, I think SeqFeatureI->start would *have* to be delegated to SeqFeatureI->location->start - otherwise too much code will break... (of course, this delegation could just be for a while as we move code and people over to using "proper" locations)

People might be interested that I originally argued for an explicit location object about 1 month ago. I don't now...

I am suggesting that SeqFeatures do not have an explicit location object, but that we subclass SeqFeatures into Split, Simple and Fuzzy, all inheriting from a common SeqFeature interface.

Benefits - (a) fewer objects, (b) only one place where the client gets the information, and (c) more backwardly compatible.

Effectively my main argument is that there will always be a pretty clear-cut relationship that "this type of SeqFeature" is always "this class of location", so splitting the location away from the SeqFeature is just suggesting a mix-and-match world which doesn't actually exist. Simpler and stronger to go for the combined interface in my view.

(b) ->start ->end throwing exceptions or not.

Hilmar says that for at least Fuzzies and possibly Splits the client should figure out, by rooting around the object, how to map these more complex locations to a simple start,end. The interface should allow exceptions to be thrown on ->start/->end indicating that the client should be treating this seqfeature somehow differently... Basically we pass the buck to the client.

I say that the implementation objects have to provide a default mapping of whatever ->start and ->end are. This means that clients can live in this happy world of "I have well-defined start/ends" if they so wish, without writing extra code. Smart clients are encouraged to root around in the objects for their "real" interpretation of the fuzziness.

There are three reasons why I favour this:

(a) Clients for dumping/drawing/manipulation have to treat large numbers of sequence features as a pretty homogeneous mass.
If we make seqfeatures less homogeneous then every client is going to have to figure out how to "homogenize" the seqfeatures. This will differ from client to client, although for the main case they just want a "default way" of handling them. We are encouraging a diversity of views when our clients really want us to solve the problems for them.

(b) As 99% of features are nice, well-behaved "hard features", many pieces of client code written with the bioperl libraries will just assume ->start,->end do not throw exceptions. When such a piece of code is used by another user with a fuzzy feature, there will be a rather deep exception thrown by bioperl through the client code. I think both the user and the client will, with some justification, blame bioperl for this, no matter how much we say "you should have read the documentation and written 3 different subroutines". Every time you write

    if( $one->start == $two->start ) { ... }

it gets replaced by

    if( &my_exact_function($one,$two) ) { ... }

    sub my_exact_function {
        my ($one, $two) = @_;
        # one of many if statements...
        if( $one->isa('Bio::FuzzyFeatureI') && $two->isa('Bio::SimpleFeatureI') ) {
            ...
        }
    }

(c) Long experience with seqfeatures has led me to claim that the following rules are generally just what people want:

  - simple features - easy
  - join statements - ignore leading and trailing '<' '>' and take the edge start/end points on the sequence you are looking at
  - fuzzy features - either skip them or, if you have to draw/compare them, take start/end as the minimum hard location mentioned and the maximum hard location mentioned, regardless of the internal grammar.

I reckon bioperl would do better to implement the (c) method by default, without preventing smart clients from making their own decisions.

Another long email, but I think it is worth knowing where we disagree...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From gert.thijs@esat.kuleuven.ac.be Fri Jan 19 12:14:53 2001 From: gert.thijs@esat.kuleuven.ac.be (gert thijs) Date: Fri, 19 Jan 2001 13:14:53 +0100 Subject: [Bioperl-l] split seq feature and fuzzy feature proposal References: <3A675350.39820E7B@gmx.net> Message-ID: <3A682FBD.8DB5C25D@esat.kuleuven.ac.be>

Hilmar Lapp wrote:
>
> Yeah, that's the really hairy case. We probably should define
> first what we would like to be able to do with compound locations.
> This is a strong call for feedback: what do people out there using
> the package intend to do with compound locations? E.g. if you draw
> annotations, would you just draw the part referring to the
> attached seq? Ensembl people, any experience/wishlists for this?
>

I hope you do not mind me giving some comments on this issue.

I am writing some programs to automatically extract genes and intergenic regions from DNA sequences. So, I am mostly interested in the type of a feature and also its start and end position in the sequence. The main problem I am facing is that sometimes a feature is not extracted from the sequence because it has a fuzzy location. E.g. if the location of a CDS is described as "join(AL101010.1:1..201,123..245)", this CDS is not added to the list of features and it is impossible for me to do anything useful with this sequence.

In my opinion, it is important that a feature is created even if the location is fuzzy. When there is a problem, it should be possible to access the description of the location.
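The Perl sketch below applies Ewan's rule of thumb (c) to the location string in Gert's example. parse_location and default_span are hypothetical helpers, not BioPerl functions. The parser assumes a flat join() or order() of simple spans and does not handle complement() or nested operators. The raw string is kept alongside the parsed sub-locations, so a feature can be created even when its location is not simple, and the original description remains accessible.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a flat GenBank/EMBL-style
# location such as "join(AL101010.1:1..201,123..245)" into sub-locations and
# keep the raw string. Nested operators such as complement(...) are not handled.
sub parse_location {
    my ($string) = @_;
    my $inner = $string;
    $inner =~ s/^\s*(?:join|order)\((.*)\)\s*$/$1/;   # strip one join()/order() wrapper
    my @subs;
    for my $part (split /,/, $inner) {
        $part =~ s/\s+//g;
        # An optional "ACCESSION.version:" prefix marks a span on another entry.
        my ($seq_id, $span) = $part =~ /^(?:([^:]+):)?(.+)$/;
        next unless defined $span;
        my @pos = $span =~ /(\d+)/g;    # '<' and '>' are simply ignored
        next unless @pos;
        push @subs, { seq_id => $seq_id, start => $pos[0], end => $pos[-1] };
    }
    return { raw => $string, subs => \@subs };
}

# Default span: take the outermost hard coordinates among the sub-locations
# that lie on the entry being viewed; sub-locations on other entries are skipped.
sub default_span {
    my ($loc) = @_;
    my @local = grep { !defined $_->{seq_id} } @{ $loc->{subs} };
    return unless @local;
    my @coords = sort { $a <=> $b } map { ($_->{start}, $_->{end}) } @local;
    return ($coords[0], $coords[-1]);
}

for my $string ('join(AL101010.1:1..201,123..245)', 'join(<1..201,300..>450)') {
    my $loc = parse_location($string);
    my ($start, $end) = default_span($loc);
    printf "%s -> %d..%d (%d sub-locations kept)\n",
        $loc->{raw}, $start, $end, scalar @{ $loc->{subs} };
}
```

For Gert's CDS, the default span on the viewed entry is 123..245. The remote span on AL101010.1 stays available in the sub-location list, and the original string is kept.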
Gert

--
+ Gert Thijs
+
+ email: gert.thijs@esat.kuleuven.ac.be
+ homepage: http://www.esat.kuleuven.ac.be/~thijs
+
+ K.U.Leuven
+ ESAT-SISTA
+ Kasteelpark Arenberg 10
+ B-3001 Leuven-Heverlee
+ Belgium
+ Tel : +32 16 32 18 84
+ Fax : +32 16 32 19 70

From arek@ebi.ac.uk Fri Jan 19 09:58:45 2001 From: arek@ebi.ac.uk (Arek Kasprzyk) Date: Fri, 19 Jan 2001 09:58:45 +0000 (GMT) Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl] In-Reply-To: <3A68082D.8CC2DD29@ebi.ac.uk> Message-ID:

On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote:

Hi guys,
I have not been following this discussion very closely but thought you might find it useful to poke around a set of ensembl modules called ensembl-map. I think that some of the ideas you are talking about have been implemented there.

Arek

> -------- Original Message --------
> Subject: Re: marker manipulation in bioperl
> Date: Thu, 18 Jan 2001 13:06:26 -0500 (EST)
> From: Jason Stajich
> To: Heikki Lehvaslaiho
> CC: Eric Snyder
>
> Heikki - yes I think going via Variation::VariantI is a good way - I am not as familiar as I'd like to be with the Variation objects, but this makes sense and I could imagine actually having ways to handle alleles later on which might become useful.
>
> I'd still like to have an interface describe a Marker so we can do some fun inheritance things later with different types of markers. So I'd make a MarkerI and it would subclass VariantI and add the methods pcr_fwd, pcr_rev (or a more appropriate function name).
>
> Eric [ might want to read below first ] does the OO stuff make sense here? If we make MarkerI with basic methods pcrprimers, chrom, sequence location then a concrete implementation of this can be GenericMarker, and various subclasses - RhMarker, STSMarker, MicrosatelliteMarker or GeneticMarker, RhMarker, ... depending on how you want to describe them, if they have specific attributes or methods that are particular to that type of marker.
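The class layout Jason proposes can be sketched in Perl as follows. The package names (My::MarkerI, My::GenericMarker, My::STSMarker) and the marker data are hypothetical. In the proposal, MarkerI would also inherit from Bio::Variation::VariantI; that inheritance is left out so the sketch runs on its own. The linkage map relies only on the shared get_location('cM') method.

```perl
use strict;
use warnings;

# Hypothetical interface: implementations must override every method.
# (The proposal would also inherit from Bio::Variation::VariantI; that is
# omitted here so the sketch is self-contained.)
package My::MarkerI;
sub _abstract { my ($self, $method) = @_; die ref($self) . " must implement $method\n" }
sub name         { $_[0]->_abstract('name') }
sub pcr_fwd      { $_[0]->_abstract('pcr_fwd') }
sub pcr_rev      { $_[0]->_abstract('pcr_rev') }
sub get_location { $_[0]->_abstract('get_location') }

# Concrete implementation holding locations keyed by map unit ('cM', 'bp', ...).
package My::GenericMarker;
our @ISA = ('My::MarkerI');
sub new {
    my ($class, %args) = @_;
    my $self = { %args, locations => $args{locations} || {} };
    return bless $self, $class;
}
sub name    { $_[0]{name} }
sub pcr_fwd { $_[0]{pcr_fwd} }
sub pcr_rev { $_[0]{pcr_rev} }
sub get_location {
    my ($self, $unit) = @_;
    return $self->{locations}{$unit};
}

# A subclass adds attributes specific to one type of marker.
package My::STSMarker;
our @ISA = ('My::GenericMarker');
sub product_size { $_[0]{product_size} }

package main;

# A linkage map only needs markers that can report a position in cM.
sub linkage_order {
    my @placed = grep { defined $_->get_location('cM') } @_;
    return map  { $_->name }
           sort { $a->get_location('cM') <=> $b->get_location('cM') } @placed;
}

# Made-up example data.
my @markers = (
    My::STSMarker->new(name => 'sts1', pcr_fwd => 'ACGTACGTAC',
                       pcr_rev => 'TTGCATGCAA', product_size => 150,
                       locations => { cM => 4.2 }),
    My::GenericMarker->new(name => 'marker2', locations => { cM => 1.5, bp => 10_000 }),
    My::GenericMarker->new(name => 'marker3', locations => { bp => 52_000 }),
);
print join(' ', linkage_order(@markers)), "\n";
```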
>
> Then on the Maps front, something like a LinkageMap could then be built using GeneticMarkers or STSMarkers, as they implement a function like get_genetic_location... or get_location('cM');
>
> Am I too far out there in interface land for you?
>
> -jason
> On Thu, 18 Jan 2001, Heikki Lehvaslaiho wrote:
> >
> > Jason,
> >
> > I finally found my notes on upgrading the Ensembl Variation class. The problem there is that the SNP with an ID can have several locations in a genome. At the moment, when several locations are needed, I simply return several Variation objects with the same ID. Not very pretty, but the interface requires me to return SeqFeature objects, not something that contains them.
> >
> > So, your needs. You said that you need the following methods:
> >
> > fwd_primer, rev_primer, length, genetic_location, marker_sequence
> >
> > The following lists where they could go (+) or are already in Variation classes (%):
> >
> > Bio::Variation::VariantI
> >    subclassed by DNAMutation, RNAChange, AAChange
> >
> >    + fwd_primer, (moltype not protein)
> >    + rev_primer, (moltype not protein)
> >    % length,
> >    % add_DBLink
> >    % each_DBLink
> >    % status
> >
> > Bio::Variation::SeqDiff (VariantI holder class)
> >    % chromosome
> >    + genetic_location, (for strings like 12p13.3 )
> >
> > Bio::Variation::Allele
> >    isa Bio::PrimarySeq
> >    % marker_sequence
> >       ->seq
> >    has additional methods repeat_unit and repeat_count
> >    to describe the sequence: e.g. (CA)5
> >
> >
> > Separately, these are the methods that I have in Variation:
> >
> > Bio::Ensembl::ExternalData::Variation
> > -------------------------------------
> > same inheritance as in VariantI
> >
> > in addition:
> >
> >    start_in_clone_coord
> >    end_in_clone_coord
> >    (status)
> >    alleles (string as opposed to Allele object in VariantI)
> >    (upStreamSeq) (same as in VariantI)
> >    (dnStreamSeq) (same as in VariantI)
> >
> >
> > So, it seems to me almost everything can be accommodated within VariantI implementing objects.
> >
> > Do you want to say if a marker is defined on DNA or RNA? moltype method? What additional methods can you think of having?
> >
> >
> > It might be enough just to have a Bio::Variation::Marker class (isa Bio::Variation::VariantI), add
> >    + fwd_primer, (moltype not protein)
> >    + rev_primer, (moltype not protein)
> > into Bio::Variation::VariantI,
> >
> > and have a method for genetic_location and override the status method to accept any scalar (it is now restricted to the values 'suspected'/'proven'). It might be a good idea to have a separate chromosome method a la GenBank/EMBL?
> >
> >    + chromosome
> >    + genetic_location
> >    + status
> >
> > You could use the Allele class and VariantI methods to manipulate the sequence data, or you could come up with a simpler implementation or interface.
> >
> > What do you think?
> >
> > Yours,
> >
> > -Heikki
> >
> >
> >
> > Jason Stajich wrote:
> > >
> > > I won't be writing anything substantial until the holidays are over; I have just been thinking about this and had some time to play last week as things were slow for me. I guessed you would have some ideas and insight. Let's see if we start coming up with an interface or extensions to VariationI after Jan 1st.
> > >
> > > Happy holidays.
> > > -jason
> > >
> > > On Sat, 23 Dec 2000, Heikki Lehvaslaiho wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Sorry I have not answered. I am on holiday and Christmas is in a day or two.
> > > >
> > > >
> > > > Jason Stajich wrote:
> > > > >
> > > > > I'm trying to write some code that allows me to manipulate marker information (SNPs, Microsatellites, STS). Thought it might be a useful bioperl object. Right now I want to associate the following data with a marker name - fwd_primer, rev_primer, length, genetic_location, marker_sequence. I am also querying GDB, genbank, and local databases for this and thought it would make sense to create a reusable object. Does any/all of this fit into any of the Variation modules? I feel like if
> > > >
> > > > It fits fine. You could also have a look at what I have put into ensembl-external as a Variation class. That is a rough and dirty class for holding SNP information.
> > > >
> > > > I have plans somewhere to extend it .... (I cannot find the text I wrote... have to look with more time on my hands.... )
> > > >
> > > > > there isn't one already this should somehow fall into the Variation category. I have already written many throwaway scripts to manipulate the information, but it seems to me that this should be an object. I can relate the information to physical sequence via blast and the marker_sequence or e-PCR and the primers, but often I might want to process the markers for something else.
> > > > >
> > > > > Bio::Variation::GeneticMarker? A SNP would be a sequence change, but also a marker ... I imagine this working on multiple levels - sequence, maps, etc.
> > > >
> > > > I think we should see what could be put into an interface file and what into an instantiable class.
> > > >
> > > >    Bio::Variation::MarkerI
> > > >    Bio::Variation::Marker
> > > >
> > > > Alternatively, Bio::Variation::VariationI is already there and can be extended.
> > > >
> > > > I have to go...
> > > > Are you going to do write this right now or can we think about this > > > > over the holidays? > > > > > > > > -Heikki > > > > > > > > > Jason Stajich > > > > > jason@chg.mc.duke.edu > > > > > Center for Human Genetics > > > > > Duke University Medical Center > > > > > http://www.chg.duke.edu/ > > > > > > > > -- > > > > ______ _/ _/_____________________________________________________ > > > > _/ _/ http://www.ebi.ac.uk/mutations/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > > > ___ _/_/_/_/_/________________________________________________________ > > > > > > > > > > Jason Stajich > > > jason@chg.mc.duke.edu > > > Center for Human Genetics > > > Duke University Medical Center > > > http://www.chg.duke.edu/ > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ http://www.ebi.ac.uk/mutations/ > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 > > ___ _/_/_/_/_/________________________________________________________ > > > > Jason Stajich > jason@chg.mc.duke.edu > Center for Human Genetics > Duke University Medical Center > http://www.chg.duke.edu/ > ------------------------------------------------------------------------------- Dr Arek Kasprzyk EMBL-European Bioinformatics Institute. Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. 
Tel: +44-(0)1223-494606 Fax: +44-(0)1223-494468 ------------------------------------------------------------------------------- From heikki@ebi.ac.uk Fri Jan 19 14:05:14 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Fri, 19 Jan 2001 14:05:14 +0000 Subject: [Bioperl-l] Re: [Fwd: Re: marker manipulation in bioperl] References: Message-ID: <3A68499A.EA3FF7BC@ebi.ac.uk> Arek Kasprzyk wrote: > > On Fri, 19 Jan 2001, Heikki Lehvaslaiho wrote: > > Hi guys, > I have not been following this discussion very closely but > thought you may find useful to poke around a set of ensembl modules which > called ensembl-map. I think that some of the ideas you are talking > about have been implemented there. The URl is: http://www.ensembl.org/cgi-bin/cvsweb/cvsweb.cgi/ensembl-map/modules/Bio/EnsEMBL/Map/ -Heikki > Arek > > > > -------- Original Message -------- > > Subject: Re: marker manipulation in bioperl > > Date: Thu, 18 Jan 2001 13:06:26 -0500 (EST) > > From: Jason Stajich > > To: Heikki Lehvaslaiho > > CC: Eric Snyder > > > > Heikki - yes I think going via Variation::VariantI is a good way - I > > am > > not as familiar as I'd like to be with the Variation objects, but this > > makes sense and I could imagine actually having ways to handle alleles > > later on which might become useful. > > > > I'd still like to have an interface describe a Marker so we can do > > some > > fun inheritance things later with different types of markers. So I'd > > make > > a MarkerI and it would subclasses VariantI and add the methods > > pcr_fwd, > > pcr_rev (or a more appropriate function name). > > > > Eric [ might want to read below first ] does the OO stuff make sense > > here? > > If we make MarkerI with basic methods pcrprimers, chrom, sequence > > location > > then a concrete implementation of this can be GenericMarker, and > > various > > subclasses - RhMarker, STSMarker, MicrosatteliteMarker or > > GeneticMarker, > > RhMarker, ... depending on how you want to describe them. 
If they > > have > > specific attributes or methods that are particular to that type of > > marker. > > > > Then on the Maps front, something like a > > LinkageMap could be then build using GeneticMarkers or STSMarkers > > as they implemented a function like get_genetic_location... or > > get_location('cM'); > > > > Am I too far out there in interface land for you? > > > > -jason > > On Thu, 18 Jan 2001, Heikki Lehvaslaiho wrote: > > > > > > Jason, > > > > > > I finally found my notes on upgrading the Ensembl Variation class. > > > The problem there is that the SNP with an ID can have several > > > locations in a genome. At the moment when several locations are needed > > > I simply return several Variation objects with same ID. Not very > > > pretty, but the interface requires me to return SeqFeature objects not > > > something that contains them. > > > > > > So, your needs. You said that you need the following methods: > > > > > > fwd_primer, rev_primer, length, genetic_location, marker_sequence > > > > > > The following lists where they could go (+) are are already in > > > Variation > > > classes(%) : > > > > > > Bio::Variation::VariantI > > > subclassed by DNAMutation, RNAChange, AAChange > > > > > > + fwd_primer, (moltype not protein) > > > + rev_primer, (moltype not protein) > > > % length, > > > % add_DBLink > > > % each_DBLink > > > % status > > > > > > Bio::Variation::SeqDiff (VariantI holder class) > > > % chromosome > > > + genetic_location, (for strings like 12p13.3 ) > > > > > > Bio::Variation::Allele > > > isa Bio::PrimarySeq > > > % marker_sequence > > > ->seq > > > has additional methods repeat_unit and repeat_count > > > to describe the sequence: e.g. 
> > >   (CA)5
> > >
> > >
> > > Separately, these are the methods that I have in Variation:
> > >
> > > Bio::Ensembl::ExternalData::Variation
> > > -------------------------------------
> > >   same inheritance as in VariantI
> > >
> > >   in addition:
> > >
> > >   start_in_clone_coord
> > >   end_in_clone_coord
> > >   (status)
> > >   alleles (string as opposed to Allele object in VariantI)
> > >   (upStreamSeq) (same as in VariantI)
> > >   (dnStreamSeq) (same as in VariantI)
> > >
> > >
> > > So, it seems to me almost everything can be accommodated within VariantI-implementing objects.
> > >
> > > Do you want to say whether a marker is defined on DNA or RNA? A moltype method? What additional methods can you think of having?
> > >
> > >
> > > It might be enough just to have a Bio::Variation::Marker class (isa Bio::Variation::VariantI), add
> > >   + fwd_primer, (moltype not protein)
> > >   + rev_primer, (moltype not protein)
> > > into Bio::Variation::VariantI,
> > >
> > > and have a method for genetic_location and override the status method to accept any scalar (it is now restricted to the values 'suspected'/'proven'). It might be a good idea to have a separate chromosome method a la GenBank/EMBL?
> > >
> > >   + chromosome
> > >   + genetic_location
> > >   + status
> > >
> > > You could use the Allele class and VariantI methods to manipulate the sequence data, or you could come up with a simpler implementation or interface.
> > >
> > > What do you think?
> > >
> > > Yours,
> > >
> > > -Heikki
> > >
> > >
> > >
> > > Jason Stajich wrote:
> > > >
> > > > I won't be writing anything substantial until the holidays are over; I have just been thinking about this and had some time to play last week as things were slow for me. I guessed you would have some ideas and insight. Let's see if we start coming up with an interface or extensions to VariationI after Jan 1st.
> > > >
> > > > Happy holidays.
> > > > -jason
> > > >
> > > > On Sat, 23 Dec 2000, Heikki Lehvaslaiho wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > Sorry I have not answered. I am on holiday and Christmas is in a day or two.
> > > > >
> > > > >
> > > > > Jason Stajich wrote:
> > > > > >
> > > > > > I'm trying to write some code that allows me to manipulate marker information (SNPs, microsatellites, STS). Thought it might be a useful bioperl object. Right now I want to associate the following data with a marker name - fwd_primer, rev_primer, length, genetic_location, marker_sequence. I am also querying GDB, GenBank, and local databases for this and thought it would make sense to create a reusable object. Does any/all of this fit into any of the Variation modules? I feel like if
> > > > >
> > > > > It fits fine. You could also have a look at what I have put into ensembl-external as a Variation class. That is a rough and dirty class for holding SNP information.
> > > > >
> > > > > I have plans somewhere to extend it... (I cannot find the text I wrote... have to look when I have more time on my hands...)
> > > > >
> > > > > > there isn't one already this should somehow fall into the Variation category. I have already written many throwaway scripts to manipulate the information, but it seems to me that this should be an object. I can relate the information to physical sequence via BLAST and the marker_sequence, or via e-PCR and the primers, but often I might want to process the markers for something else.
> > > > > >
> > > > > > Bio::Variation::GeneticMarker? A SNP would be a sequence change, but also a marker... I imagine this working on multiple levels - sequence, maps, etc.
> > > > >
> > > > > I think we should see what could be put into an interface file and what into an instantiable class.
> > > > >
> > > > > Bio::Variation::MarkerI
> > > > > Bio::Variation::Marker
> > > > >
> > > > > Alternatively, Bio::Variation::VariationI is already there and can be extended.
> > > > >
> > > > > I have to go... Are you going to write this right now, or can we think about this over the holidays?
> > > > >
> > > > > -Heikki
> > > > >
> > > > > > Jason Stajich
> > > > > > jason@chg.mc.duke.edu
> > > > > > Center for Human Genetics
> > > > > > Duke University Medical Center
> > > > > > http://www.chg.duke.edu/
> > > > >
> > > > > --
> > > > > ______ _/ _/_____________________________________________________
> > > > > _/ _/ http://www.ebi.ac.uk/mutations/
> > > > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
> > > > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> > > > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> > > > > _/ _/ _/ Cambs. CB10 1SD, United Kingdom
> > > > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> > > > > ___ _/_/_/_/_/________________________________________________________
> > > > >
> > > >
> > > > Jason Stajich
> > > > jason@chg.mc.duke.edu
> > > > Center for Human Genetics
> > > > Duke University Medical Center
> > > > http://www.chg.duke.edu/
> > >
> > > --
> > > ______ _/ _/_____________________________________________________
> > > _/ _/ http://www.ebi.ac.uk/mutations/
> > > _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
> > > _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> > > _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> > > _/ _/ _/ Cambs.
> > > CB10 1SD, United Kingdom
> > > _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> > > ___ _/_/_/_/_/________________________________________________________
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >
> -------------------------------------------------------------------------------
> Dr Arek Kasprzyk
> EMBL-European Bioinformatics Institute.
> Wellcome Trust Genome Campus, Hinxton,
> Cambridge CB10 1SD, UK.
> Tel: +44-(0)1223-494606
> Fax: +44-(0)1223-494468
> -------------------------------------------------------------------------------

--
 ______ _/ _/_____________________________________________________
 _/ _/ http://www.ebi.ac.uk/mutations/
 _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
 _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
 _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
 _/ _/ _/ Cambs. CB10 1SD, United Kingdom
 _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
 ___ _/_/_/_/_/________________________________________________________


From jason@chg.mc.duke.edu Fri Jan 19 15:00:46 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Fri, 19 Jan 2001 10:00:46 -0500 (EST) Subject: [Bioperl-l] Bio::Index::Abstract & bug #860 Message-ID:
Looking through this bug - I had 'fixed' it by adding

    use DB_File;

at the top, but now I realize that may not be the best fix, since it still causes an error when -type is specified as 'SDBM_File'. I could just add both in the 'use', but what if DB_File is not present...

The code for the method dbm_package assumes that if you specify a package, it will already have been 'included'. What to do... Try to require both in the BEGIN block so they are explicitly loaded no matter what? Trap errors if DB_File is not present and the user asks for it?

From Bio::Index::Abstract:

sub dbm_package {
    my( $self, $value ) = @_;

    if ($value) {
        $self->{'_dbm_package'} = $value;
    }
    elsif (!
             $self->{'_dbm_package'}) {
        if ($USE_DBM_TYPE) {
            $self->{'_dbm_package'} = $USE_DBM_TYPE;
        } else {
            my( $type );
            # DB_File isn't available on all systems
            eval {
                require DB_File;
                # single quotes: import the $DB_HASH symbol, don't interpolate it
                DB_File->import('$DB_HASH');
            };
            if ($@) {
                require SDBM_File;
                $type = 'SDBM_File';
            } else {
                $type = 'DB_File';
            }
            $USE_DBM_TYPE = $self->{'_dbm_package'} = $type;
        }
    }
    return $self->{'_dbm_package'};
}

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


---------- Forwarded message ----------
Date: Fri, 19 Jan 2001 09:55:29 +0000 (GMT)
From: K Howe
To: Jason Stajich
Subject: Re: biperl bug #860

Hi Jason,

We use the following command:

bpindex.pl -fmt EMBL -dir /nfs/disk92/Pfam/index -type DB_File pfamseq.index

where /nfs/disk92/Pfam/index is the intended location of the index file, and pfamseq.index is its name. The key thing is that we explicitly give the type as DB_File, and when this happens, it dies. (When you don't specify the type and it has to guess which DBM type to use, it works, but this is not scalable for us, since the default DBM file in bioperl may change from DB_File in the future.)

Hope this is enough information.

Best,

Kevin

On Thu, 18 Jan 2001, Jason Stajich wrote:

> Kevin - I'm trying to track down a bug you submitted for Bio::Index::Abstract - I may have fixed it, but I want to be sure. Can you give me an example of how to invoke bpfetch/bpindex so as to throw an error due to a potentially missing require?
>
> Thanks.
> -Jason


From birney@ebi.ac.uk Fri Jan 19 16:51:12 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 19 Jan 2001 16:51:12 +0000 (GMT) Subject: [Bioperl-l] Bio::Index::Abstract & bug #860 In-Reply-To: Message-ID:
On Fri, 19 Jan 2001, Jason Stajich wrote:

> Looking through this bug - I had 'fixed' it by adding use DB_File; at the top, but now I realize that may not be the best fix, since it still causes an error when -type is specified as 'SDBM_File'.
> I could just add both in the 'use', but what if DB_File is not present...
>
> The code for the method dbm_package assumes that if you specify a package, it will already have been 'included'. What to do... Try to require both in the BEGIN block so they are explicitly loaded no matter what? Trap errors if DB_File is not present and the user asks for it?

Go for a run-time require load... check out pSW.pm for an example, or SeqIO.pm for another run-time load.

>
> From Bio::Index::Abstract:
>
> sub dbm_package {
>     my( $self, $value ) = @_;
>
>     if ($value) {
>         $self->{'_dbm_package'} = $value;
>     }
>     elsif (! $self->{'_dbm_package'}) {
>         if ($USE_DBM_TYPE) {
>             $self->{'_dbm_package'} = $USE_DBM_TYPE;
>         } else {
>             my( $type );
>             # DB_File isn't available on all systems
>             eval {
>                 require DB_File;
>                 DB_File->import('$DB_HASH');
>             };
>             if ($@) {
>                 require SDBM_File;
>                 $type = 'SDBM_File';
>             } else {
>                 $type = 'DB_File';
>             }
>             $USE_DBM_TYPE = $self->{'_dbm_package'} = $type;
>         }
>     }
>     return $self->{'_dbm_package'};
> }
>
>
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
>
> ---------- Forwarded message ----------
> Date: Fri, 19 Jan 2001 09:55:29 +0000 (GMT)
> From: K Howe
> To: Jason Stajich
> Subject: Re: biperl bug #860
>
>
> Hi Jason,
>
> We use the following command:
>
> bpindex.pl -fmt EMBL -dir /nfs/disk92/Pfam/index -type DB_File pfamseq.index
>
> where /nfs/disk92/Pfam/index is the intended location of the index file, and pfamseq.index is its name. The key thing is that we explicitly give the type as DB_File, and when this happens, it dies. (When you don't specify the type and it has to guess which DBM type to use, it works, but this is not scalable for us, since the default DBM file in bioperl may change from DB_File in the future.)
>
> Hope this is enough information.
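Ewan's suggestion of a run-time require can be sketched roughly as below. This is a hypothetical stand-alone helper (load_dbm_package and %KNOWN_DBM are illustrative names), not the Bio::Index::Abstract method; it assumes the requested type is one of Perl's standard DBM modules, and it leaves out the $DB_HASH import that the original performs for DB_File. The point is that an explicitly requested package gets loaded the same way as the guessed one, so -type DB_File or -type SDBM_File no longer depends on a compile-time 'use'.

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual Bio::Index::Abstract code: choose a DBM
# package and load it with a run-time require, whether it was requested
# explicitly (e.g. -type DB_File) or has to be guessed.
my %KNOWN_DBM = map { $_ => 1 } qw(DB_File SDBM_File GDBM_File NDBM_File ODBM_File);

sub load_dbm_package {
    my ($requested) = @_;

    if (defined $requested) {
        # Only accept real DBM module names, so the string is safe to require.
        die "Unknown DBM package '$requested'\n" unless $KNOWN_DBM{$requested};
        eval { require "$requested.pm"; 1 }
            or die "DBM package '$requested' was requested but could not be loaded: $@";
        return $requested;
    }

    # Nothing requested: prefer DB_File, fall back to SDBM_File, which ships with Perl.
    return 'DB_File' if eval { require DB_File; 1 };
    require SDBM_File;
    return 'SDBM_File';
}

my $type = load_dbm_package('SDBM_File');   # explicit choice, as with -type SDBM_File
print "indexing with $type\n";
```

An unknown or missing package now fails with a message naming the package, rather than a later error from tie.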
>
> Best,
>
> Kevin
>
> On Thu, 18 Jan 2001, Jason Stajich wrote:
>
> > Kevin - I'm trying to track down a bug you submitted for Bio::Index::Abstract - I may have fixed it, but I want to be sure. Can you give me an example of how to invoke bpfetch/bpindex so as to throw an error due to a potentially missing require?
> >
> > Thanks.
> > -Jason
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------


From hlapp@gmx.net Fri Jan 19 19:13:57 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Fri, 19 Jan 2001 11:13:57 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: Message-ID: <3A6891F5.A0C2BEBE@gmx.net>
Ewan Birney wrote:
>
> Ok. Hilmar and I are now probably into the "code aesthetics" part of this debate, which is definitely worth having, but someone sometime has to make a decision.
>
> I suggest that we keep bashing this out on the list for a couple more days (please... other people... if you have a view, do chip in). If Hilmar and I still disagree on aesthetics, I would like to nominate Jason to tie-break on the way to go (is this ok with you, Hilmar and Jason...?)
>

Jason, you're going to play the Supreme Court judge here (no appeals possible) :-)

In fact, I'd like to hear more feedback from actual users of these features. It seems that most people are happy if only those special GenBank features no longer get completely lost. However, there are people who do want to do meaningful stuff with the coordinates. One of these is our group in Vienna (yes, we draw features, and yes, that adds to my concern). The other I know of is David with his GUI, which is why I put him on cc.
David, any strong or weak feelings about this issue from your perspective?

The BioJava project came up, as far as I can recall, with a Location class model separate from the Feature class. I put Matthew and Thomas on the cc to ask for their experience with this model, and what the feedback from the biojava community was so far.

> We have two points of contention:
>
> (a) Explicit Location objects or not.
>
> Hilmar suggests an explicit location object:
>
> SeqFeatureI has-a LocationI
>
> LocationI is subclassed for Split (join statements) and Fuzzies
>
> Benefits - (a) easy to mix and match implementations of locations with different feature objects, and (b) if mixing and matching locations to features is common, more realistic. Hilmar argues that it is clearer as well.
>
> Against - more objects, and in fact the majority of seqfeatures are little more than the location and two extra strings.
>
> For backwards compatibility, I think SeqFeatureI->start would *have* to be delegated to SeqFeatureI->location->start - otherwise too much code will break... (of course, this delegation could just be for a while as we move code and people over to using "proper" locations)
>

I agree completely here. I even think $feature->start() can stay there forever.

> People might be interested that I originally argued for an explicit location object about 1 month ago. I don't now...
Having less objects I think does constitute a benefit if it removes redundant definitions, and makes for a steeper learning curve of the API, that is, if they're easier to use. This is the point I doubt here: I think further inflating SeqFeatureI flattens the learning curve. And I think Location (where) and Feature (what) are not redundant. As for the backward compatibility, I think the only problem here is the exception yes/no issue, isn't it? So, backward compatibility does not argue against decoupling Location/Feature, does it? > Effectively my main argument is that there will always be a pretty clear > cut relationship that "this type of SeqFeature" is always "this class of > location" so the splitting of the location away from the SeqFeature is > just suggesting a mix-and-match world which doesn't actually exist. It does exist. It may not be the most frequent case, but it is a use case for us. And probably for everyone who draws features. > Simpler and stronger to go for the combined interface in my view. > > (b) ->start ->end throwing exceptions or not. > > Hilmar says that for at least Fuzzies and possibly Splits the client > should figure out by rooting around the object how to map these more > complex locations to a simple start,end. The interface should allow > exceptions to be thrown on ->start/->end indicating that the client should > be treating this seqfeature somehow differently... > > Basically we pass the buck to the client. > Right. And I said that's where it belongs. > I say that the implementation objects have to provide a default mapping > of whatever ->start and ->end are. This means that clients can live in > this happy world of "I have well defined start/ends" if they so wish > without writing extra code. Smart clients are encouraged to root around in > the objects for their "real" interpretation of the fuzziness. 
>
> There are three reasons why I favour this:
>
> (a) Clients for dumping/drawing/manipulation have to treat large numbers of sequence features as a pretty homogeneous mass. If we make seqfeatures less homogeneous, then every client is going to have to figure out how to "homogenize" the seqfeatures - this will differ from client to client, although for the main case they just want a "default way" of handling them. We are encouraging a diversity of views when our clients really want us to solve the problems for them.
>

This can be solved easily. For FuzzyLocation we implement a default way of computing valid start/end, which can be activated (globally) by client code. (I hear you saying that if we do it this way it should be activated by default :-)

> (b) As 99% of features are nice, well-behaved "hard features", many pieces of client code written with the bioperl libraries will just assume ->start, ->end do not throw exceptions. When this piece of code is used by another user with a fuzzy feature, there will be a rather deep exception thrown by bioperl through the client code. I think both the user and the client will, with some justification, blame bioperl for this, no matter how much we say "you should have read the documentation and written 3 different subroutines". Every time you go
>
>   if( $one->start == $two->start )
>
> it gets replaced by
>
>   if( &my_exact_function($one,$two) ) {
>
>   }
>
>   ...
>
>   sub my_exact_function {
>       my ($one, $two) = @_;
>
>       # one of many if statements...
>
>       if( $one->isa('Bio::FuzzyFeatureI') &&
>           $two->isa('Bio::SimpleFeatureI') ) {
>           ...
>       }
>   }
>

This can be accomplished much more simply:

if ($user_prefs{"fuzzyLocs"} eq "simplifyToWidest") {
    my $loc1 = $feat_one->location();
    my $range = Bio::Range->new(-start => $loc1->min_start(),
                                -end   => $loc1->max_end());
    $feat_one->location($range);
    # same for $feat_two follows ...
}
# carry on as if there were no fuzzy etc. features
# and you're safe from exceptions

> (c) Long experience with seqfeatures has led me to claim that the following rules are generally just what people want:
>
>  - simple features - easy
>
>  - join statements - ignore leading and trailing '<' '>' and take the edge start/end points on the sequence you are looking at
>
>  - fuzzy features - either skip them or, if you have to draw/compare them, take start/end as the minimum hard location mentioned and the maximum hard location mentioned, regardless of the internal grammar.
>
> I reckon bioperl would do better to implement the (c) method by default without preventing smart clients from making their own decisions.
>

Well, I think you can have a full model and still always provide simple implementations satisfying most people's use cases (to be activated by client code or activated by default; I think that's a matter of taste).

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From td2@sanger.ac.uk Fri Jan 19 19:45:44 2001 From: td2@sanger.ac.uk (Thomas Down) Date: Fri, 19 Jan 2001 19:45:44 +0000 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A6891F5.A0C2BEBE@gmx.net>; from hlapp@gmx.net on Fri, Jan 19, 2001 at 11:13:57AM -0800 References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <20010119194544.F9203@jabba.sanger.ac.uk>
On Fri, Jan 19, 2001 at 11:13:57AM -0800, Hilmar Lapp wrote:
>
> The BioJava project came up, as far as I can recall, with a Location class model separate from the Feature class. I put Matthew and Thomas on the cc to ask for their experience with this model, and what the feedback from the biojava community was so far.

Yes, we have this approach (well, strictly speaking we have a Location interface plus various implementations).
It's worked pretty well for us so far -- any type of feature can have any type of location attached to it (point, range, compound), and it's efficient in terms of memory usage. We've also found that the Location objects can be kind of useful on their own -- I've got all sorts of scripts which use bare Locations for tracking coverage, or even keeping track of available space when working out an optimal GUI layout. I don't know exactly how this experience would translate into your design, though.

> > People might be interested that I originally argued for an explicit location object about 1 month ago. I don't now...
> >
> > I am suggesting that SeqFeatures do not have an explicit location object, but that we subclass SeqFeatures into Split, Simple and Fuzzy, all inheriting from a common SeqFeature interface.

The only potential consideration is that this then makes any further polymorphism of SeqFeature quite difficult. We're experimenting with polymorphic features in BioJava -- look at the org.biojava.bio.seq.genomic package for lots of useful sub-interfaces of Feature. If you are ever thinking of going down this route, beware the possible explosion of combinations of feature type and location type.

> > Benefits - (a) fewer objects, (b) only one place where the client gets the information, and (c) more backwardly compatible.
>
> I'd like to note here that 'fewer objects' is not a benefit by itself, unless loading modules imposes a significant run-time performance hit, which I think we agree it doesn't. Having fewer objects does, I think, constitute a benefit if it removes redundant definitions and makes for a steeper learning curve of the API, that is, if they're easier to use. This is the point I doubt here: I think further inflating SeqFeatureI flattens the learning curve. And I think Location (where) and Feature (what) are not redundant.
Actually, my understanding is that the per-object overhead in Perl is pretty high, especially for objects implemented as hashes. If you ever want to hold millions of SeqFeatures in memory (a not unreasonable requirement, I'd suggest), a few hundred bytes per location might come back with a vengeance.

Of course, this can probably be mitigated by implementing the locations as C structs. Is this approach currently being used in BioPerl?

So I'm going to be inconclusive. I like the separate Locations design, but I'd suggest investigating the memory-usage issues before deciding one way or the other.

Thomas.


From lapp@gnf.org Fri Jan 19 20:53:00 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Fri, 19 Jan 2001 12:53:00 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: <3A6891F5.A0C2BEBE@gmx.net> <20010119194544.F9203@jabba.sanger.ac.uk> Message-ID: <3A68A92C.74811DA3@gnf.org>
Thomas Down wrote:
>
> Actually, my understanding is that the per-object overhead in perl is pretty high, especially for objects implemented as hashes. If you ever want to hold millions of SeqFeatures in memory (a not unreasonable requirement, I'd suggest), a few hundred bytes per location might come back with a vengeance.

Hmm. I guess I can't make a sensible comment on this. Is there anyone else out there who has experienced a performance drawback imposed by Perl's object handling (well, I know in fact it's not objects Perl handles ...)?

If this problem is real, is there any chance this will be mitigated in upcoming Perl releases (5.6? 6.0?)? In general I hate having to adapt an object model to the limitations of a language ... :(

>
> Of course, this can probably be mitigated by implementing the locations as C structs. Is this approach currently being used in BioPerl?
>

Well, given the users on Win32 and Mac, this is probably not an option for any module that is somewhat part of the core.
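One pure-Perl way to trim per-object overhead without C is to store an object's fields in a blessed array instead of a hash. The sketch below assumes a location only needs start and end; Sketch::ArrayLocation is a hypothetical name, not a BioPerl class. An array avoids the per-entry bookkeeping of a hash, which generally makes each object smaller, but the saving is not measured here, so Thomas's advice to investigate memory usage first still applies.

```perl
use strict;
use warnings;

package Sketch::ArrayLocation;   # hypothetical name, not a BioPerl class

# Field slots in the blessed array; constants are inlined at compile time,
# so $self->[LOC_START] costs no more than a literal index.
use constant { LOC_START => 0, LOC_END => 1 };

sub new {
    my ($class, %args) = @_;
    return bless [ $args{-start}, $args{-end} ], $class;
}

sub start { $_[0][LOC_START] }
sub end   { $_[0][LOC_END] }

package main;

# Many small locations, each a two-slot array instead of a hash with string keys.
my @locs = map { Sketch::ArrayLocation->new(-start => $_, -end => $_ + 99) } (1, 501, 1001);
print join(',', map { $_->start . '-' . $_->end } @locs), "\n";
```

The price is that every new field needs a new slot constant, so this layout suits small, stable classes such as simple ranges better than open-ended feature objects.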
Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From mwilkinson@gene.pbi.nrc.ca Fri Jan 19 20:49:30 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Fri, 19 Jan 2001 14:49:30 -0600 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <3A68A85A.CA49BC31@gene.pbi.nrc.ca>
Hi all!

> However, there are people who do want to do meaningful stuff with the coordinates. One of these is our group in Vienna (yes, we draw features, and yes, that adds to my concern). The other I know of is David with his GUI, which is why I put him on cc.

Hey! Don't forget the primary author of the SeqCanvas GUI :-) If it's okay, I have $0.02 to contribute too...

> I agree completely here. I even think $feature->start() can stay there forever.
> >snip<
> And I think Location (where) and Feature (what) are not redundant.

This, to me, is the crux of the argument, and I have to side with Hilmar on this. From a biological perspective, location and feature are absolutely *not* redundant. We are arguing about how to represent computationally something that has not been universally agreed upon even by the geneticists/MolBiologists themselves: what is a gene? I personally think that Hilmar's view is more "biologically correct" (tm): that a gene, or more generally a feature, is best described as it was described to me as a first-year undergraduate many years ago, "a functional unit of DNA". These "functional units" may overlap, even extensively, but if they do not have *exactly* the same function then they should probably be considered entirely different features, rather than a single feature with multiple compositions... (I hope I am not over-interpreting your views, Hilmar...).
This single-feature-multiple-function situation is an absolute nightmare for annotators!! So, in my world view, $Feature->start should only be ambiguous if that *unique functional unit* has a bona fide ambiguous start. In such a case, I would then side with Ewan in his proposal that there should, nevertheless, be a default $Feature->start value for these fuzzy features (NO EXCEPTION THROWING!!), but that they are somehow "flagged" such that smarter clients will be able to easily query these features for their fuzziness and display this fuzziness if they have the ability (interestingly, we just initiated a research project with several CompSci students to investigate how best to visualize exactly these kinds of "fuzzy" or ambiguous situations!!).

This was not my primary consideration when I was writing SeqCanvas, but I have already noticed that this module, as it stands, is nowhere near sufficient to represent "reality", and will need to be thought out from scratch over the next few months as our group trips over these kinds of problems more and more often. (Stay tuned! I intend to re-focus my energies on this code as soon as other more pressing issues are out of the way!) So, w.r.t. SeqCanvas & other GUIs which exist already, I would hope that these are not an issue in this debate!

My personal opinion is that BioPerl should make the capturing of biological reality its primary concern and, within reason, leave the problem of parsing and displaying this data to the client; "it's an S.E.P.". If it is generally agreed upon by the community that $Feature->start is no longer an adequate representation of "reality", then it should be dumped, regardless of what parsers may already exist. $Feature->start is not the holy grail; the biological data is. (Personally, I can't imagine a scenario where $Feature->start would no longer be useful... but you probably understand what I am getting at...)

> It does exist. It may not be the most frequent case, but it is a use case for us.
> And probably for everyone who draws features.

Indeed, it does exist! And it looks like it will only get worse as we learn more...

Anyway, for what it's worth, that's my two bits :-)

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From birney@ebi.ac.uk Fri Jan 19 21:10:16 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 19 Jan 2001 21:10:16 +0000 (GMT) Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A68A92C.74811DA3@gnf.org> Message-ID:
On Fri, 19 Jan 2001, Hilmar Lapp wrote:

Hilmar. I basically agree with all your statements, but I'm still sticking to my claims. I think we need to hear more feedback - I'd be really interested in David's call on the defaultness of ->start not throwing an exception. We then might need to call in the supreme court here...

> Thomas Down wrote:
> >
> > Actually, my understanding is that the per-object overhead in perl is pretty high, especially for objects implemented as hashes. If you ever want to hold millions of SeqFeatures in memory (a not unreasonable requirement, I'd suggest), a few hundred bytes per location might come back with a vengeance.
>
> Hmm. I guess I can't make a sensible comment on this. Is there anyone else out there who has experienced a performance drawback imposed by Perl's object handling (well, I know in fact it's not objects Perl handles ...)?

Oh yes ;) Ensembl can trivially generate more than 10,000 features in a modest-sized query. To get this to happen in any sensible way we have a packed C struct. I would be against insisting that two objects have to be present. Of course, having $seqfeature->location return $self for these cases could really solve it.

Therefore this is not a show-stopper for Ensembl, but be aware that if we made this the default for Bioperl we would be doubling our memory for feature-heavy queries, and I suspect suffering for it.
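Ewan's way out, a simple feature whose ->location returns $self, might look like the minimal sketch below. Sketch::SimpleFeature is a hypothetical class, not BioPerl's; the only point is that client code written against $feature->location->start keeps working without a second object being allocated for each feature.

```perl
use strict;
use warnings;

package Sketch::SimpleFeature;   # hypothetical, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    return bless { start       => $args{-start},
                   end         => $args{-end},
                   primary_tag => $args{-primary_tag} }, $class;
}

sub start       { $_[0]{start} }
sub end         { $_[0]{end} }
sub primary_tag { $_[0]{primary_tag} }

# A simple feature already carries everything a location needs, so it
# hands back itself instead of allocating a separate location object.
sub location    { $_[0] }

package main;

my $exon = Sketch::SimpleFeature->new(-start => 100, -end => 250, -primary_tag => 'exon');
my $loc  = $exon->location;
printf "%s at %d..%d (same object: %s)\n",
    $exon->primary_tag, $loc->start, $loc->end, ($loc == $exon ? 'yes' : 'no');
```

Feature classes with genuinely split or fuzzy coordinates could still return a real location object from the same method, so clients see one interface either way.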
>
> If this problem is real, is there any chance this will be mitigated in upcoming Perl releases (5.6? 6.0?)? In general I hate having to adapt an object model to the limitations of a language ... :(
>
> >
> > Of course, this can probably be mitigated by implementing the locations as C structs. Is this approach currently being used in BioPerl?
> >
>
> Well, given the users on Win32 and Mac, this is probably not an option for any module that is somewhat part of the core.
>
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp@gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------


From mdalphin@amgen.com Fri Jan 19 22:27:19 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Fri, 19 Jan 2001 14:27:19 -0800 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: Message-ID: <3A68BF46.10887F6F@amgen.com>
Ewan Birney wrote:

> On Fri, 19 Jan 2001, Hilmar Lapp wrote:
>
> Hilmar. I basically agree with all your statements, but I'm still sticking to my claims. I think we need to hear more feedback - I'd be really interested in David's call on the defaultness of ->start not throwing an exception.
>

I would rather NOT have an exception thrown. It is too hard to recover from. In fact, for much code, whether the maximum, minimum or average is used really doesn't matter (for example, on many displays the same set of pixels will be lit up), and the client program will need to make that choice in any case.
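Ewan's default mapping rule (c), combined with the "flagging" that Mark Wilkinson asks for, could be sketched like this: start and end never throw and report the widest hard span, while min_start/max_start/min_end/max_end and is_fuzzy let smarter clients see the uncertainty. The class name Sketch::FuzzyLocation and the [min, max] argument convention are assumptions made for illustration only.

```perl
use strict;
use warnings;

package Sketch::FuzzyLocation;   # hypothetical, illustrating Ewan's rule (c)

# A range whose ends may each be uncertain: every end is given as a
# [min, max] pair of hard coordinates (equal when the end is exact).
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{-start}, end => $args{-end} }, $class;
}

sub min_start { $_[0]{start}[0] }
sub max_start { $_[0]{start}[1] }
sub min_end   { $_[0]{end}[0] }
sub max_end   { $_[0]{end}[1] }

# The "flag" for smarter clients.
sub is_fuzzy {
    my $self = shift;
    return $self->min_start != $self->max_start
        || $self->min_end   != $self->max_end;
}

# Default mapping: never throw, just report the widest hard span.
sub start { $_[0]->min_start }
sub end   { $_[0]->max_end }

package main;

# Roughly the GenBank/EMBL-style location (10.20)..(300.310)
my $loc = Sketch::FuzzyLocation->new(-start => [10, 20], -end => [300, 310]);
printf "%d..%d%s\n", $loc->start, $loc->end, ($loc->is_fuzzy ? ' (fuzzy)' : '');
```

A naive client comparing $a->start == $b->start keeps working, while a drawing client can test is_fuzzy and render the uncertain ends differently.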
As a "compromise", I wonder about having ->start() return 'a value' in a
SCALAR context, basically continuing what it now does, while in a LIST
(array) context it might return something like:

    ($start, $fuzzy_type, $fuzzy_value) = $obj->start();

This way, the client can obtain the data, if desired, or ignore it. I don't
know how to define "fuzzy_type", but it would be something like
{ '<' | '>' | '.' }, representing the NCBI types '<', '>' and '.' (i.e.
extends-5-prime, extends-3-prime and between-two-values). Then $start would
be either the lowest, highest or average value, and $fuzzy_value would
represent the distance around it. (BTW, I tried using the average
+- $fuzzy_value and found the code messy. It seems to be tidier to use:
$start is the most extreme point (5' or 3') and $fuzzy_value brings you
inwards.)

Finally, the code could be written to make a SCALAR $start return one of
the above (min, max, avg) OR throw an exception, selected by a parameter.
But I don't think it is worth the trouble.

Just my $0.02.

Mark
--
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From mrp@sanger.ac.uk Mon Jan 22 13:07:52 2001 From: mrp@sanger.ac.uk (Matthew Pocock) Date: Mon, 22 Jan 2001 13:07:52 +0000 Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... References: <3A6891F5.A0C2BEBE@gmx.net> Message-ID: <3A6C30A7.842C63AD@sanger.ac.uk>
Hi.

Just thought I'd have a short inane ramble. Please ignore everything that
you don't agree with. I'm really looking at this more as a user of the
libraries rather than an implementer, so things may look different from
your side of the fence.

If you intend to end up with multiple feature implementations and multiple
types of locations (point, range, fuzzy etc.), then you should definitely
consider composition - Location interface, Feature interface hasA Location.
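The has-a layout can be shown in a few lines of Perl. This is an illustration under stated assumptions only. My::SimpleLocation and My::Feature are hypothetical classes invented here, not Bioperl modules, and min/max follow Matthew's definition: the lowest and highest index inside the location. The feature holds a location object and delegates min/max to it. Strand stays on the feature, so the location remains a plain range with no direction.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; not part of Bioperl.
package My::SimpleLocation;

sub new {
    my ($class, %args) = @_;
    return bless { min => $args{min}, max => $args{max} }, $class;
}

# Lowest and highest index inside the location; these never throw.
sub min { $_[0]{min} }
sub max { $_[0]{max} }

package My::Feature;

sub new {
    my ($class, %args) = @_;
    my $self = {
        location    => $args{location},     # the feature HAS-A location
        strand      => $args{strand} || 0,  # strand lives on the feature
        primary_tag => $args{primary_tag},
    };
    return bless $self, $class;
}

sub location    { $_[0]{location} }
sub strand      { $_[0]{strand} }
sub primary_tag { $_[0]{primary_tag} }

# Legacy-style accessors simply delegate to the contained location.
sub min { $_[0]->location->min }
sub max { $_[0]->location->max }

package main;

my $exon = My::Feature->new(
    location    => My::SimpleLocation->new(min => 100, max => 250),
    strand      => -1,
    primary_tag => 'exon',
);
printf "%s %d..%d strand %d\n",
    $exon->primary_tag, $exon->min, $exon->max, $exon->strand;
```

Code that calls only min/max on the feature keeps working whichever location class is plugged in. This is the property the thread relies on for backwards compatibility.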
Please don't do things like having FuzzyFeature extend both Feature and
FuzzyLocation - if Feature must extend Location, then it should be the
stupidest extension possible - otherwise people will get really confused
really quickly.

We make a lot of stuff very easy by defining that every Location has min &
max that are the lowest and highest index within the location. If Feature
must extend Location, then its min & max should delegate to min & max in its
location delegate. These methods should never throw exceptions.

If you go for the composition/delegation approach, then it feels wrong to me
that Feature extends Location - but there is no reason why the current
implementations of Feature shouldn't implement it, or the Feature interface
may choose to define min/max (or do you use start/end?) so that the legacy
code runs.

If you go for Location & Feature, the hierarchy of features should represent
the semantic knowledge about what you are annotating, and the (potential)
location hierarchy hanging off a feature should be shallow - just pertain to
that feature only. Locations are stupid math objects. For example, if you
have a gene feature, its location should span the entire gene area, whereas
the feature may contain child exon features that each span only part of
that region. Otherwise, you end up with two hierarchies that look nearly
exactly the same as each other & life gets confusing.

It works well for us putting strand info in features and leaving locations
a-directional. Strand stuff requires semantic knowledge (you need context),
and that belongs in features - they represent the biological information.

Horrible EMBL locations that reference other sequences could be handled with
complicated sequence/feature/location implementations/interfaces - or - you
could just build an assembly of the two entries and project the feature into
assembly-space to get out something that you can represent cleanly. I don't
know how well bioperl does assemblies...

Anyway, that's it.
These are the kind of details that give me a Hammer Horror tingly spine
every time I think about them. Eugh. EMBL locations suck.

Matthew


From dblock@gene.pbi.nrc.ca Mon Jan 22 18:20:16 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Mon, 22 Jan 2001 12:20:16 -0600 (CST) Subject: [Bioperl-l] Hilmar and Ewan debate SeqFeatures some more... In-Reply-To: <3A6C30A7.842C63AD@sanger.ac.uk> Message-ID:
Hello everyone!

Just back from Calgary, doing final bits of paperwork to prepare for my
defense. After Feb 20, my mind will be a lot clearer!

Okay, I just read through everybody's arguments, and since you want my
opinion, I'll give it to you.

Our pathway to enlightenment here has been that we started with simple
cases, then met complex cases and had to tear everything down multiple
times to accommodate complexity. So it looks like BioPerl is doing that now
with fuzzy locations (which have been tossed around the list for longer
than I've been on it). We should bite the bullet and build for posterity.
Extensibility is a major priority in this situation, and for that reason,
Hilmar wins my vote :)

Backwards compatibility: I would very much like a simple location object to
be created by default for simple cases. A complex location object should
only be created when complex location input is given. Then the familiar
start, end notation would refer to the default simple location object.

I like the idea of some sort of global environment-type variable that would
set the policy for fuzzy instances. A well-documented default would be fine
here as well. What Workbench would do is use the default behaviour
(widest, probably) for fuzzy locations, and then, when details were
requested, show that fuzziness at the base-pair level. So it would be
great if start, end returned hard locations according to some policy that
could be defined (at object creation?), and details were returned only
when requested.
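A minimal Perl sketch of a policy-driven start/end, assuming a hypothetical My::FuzzyLocation class in which each end is only known to lie within a range. The policy is fixed at object creation and defaults to 'widest'. start and end always return plain numbers and never throw. The hypothetical detailed_start/detailed_end methods return the underlying bounds only when a client asks for them. The policy names and method names are assumptions for illustration, not an agreed interface.

```perl
use strict;
use warnings;

# Hypothetical fuzzy location (illustration only): each end is known only
# to lie somewhere inside a range, e.g. a start between 95 and 102.
package My::FuzzyLocation;

my %KNOWN_POLICY = (widest => 1, narrowest => 1);

sub new {
    my ($class, %args) = @_;
    my $policy = $args{policy} || 'widest';    # relaxed, documented default
    die "unknown policy '$policy'\n" unless $KNOWN_POLICY{$policy};
    my $self = {
        min_start => $args{min_start},
        max_start => $args{max_start},
        min_end   => $args{min_end},
        max_end   => $args{max_end},
        policy    => $policy,
    };
    return bless $self, $class;
}

# start/end always return one number chosen by the policy; they never throw.
sub start {
    my $self = shift;
    return $self->{policy} eq 'widest' ? $self->{min_start} : $self->{max_start};
}

sub end {
    my $self = shift;
    return $self->{policy} eq 'widest' ? $self->{max_end} : $self->{min_end};
}

# The underlying bounds, returned only when a client asks for them.
sub detailed_start { my $self = shift; return ($self->{min_start}, $self->{max_start}) }
sub detailed_end   { my $self = shift; return ($self->{min_end},   $self->{max_end}) }

package main;

my %bounds = (min_start => 95, max_start => 102, min_end => 480, max_end => 500);
my $loc    = My::FuzzyLocation->new(%bounds);
my $narrow = My::FuzzyLocation->new(%bounds, policy => 'narrowest');
printf "widest:    %d..%d\n", $loc->start,    $loc->end;       # 95..500
printf "narrowest: %d..%d\n", $narrow->start, $narrow->end;    # 102..480
```

An unknown policy is rejected once, when the object is built. Calls to start and end therefore stay exception-free afterwards.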
In that case, could location be an optional object, only created when
needed? So start, end would return numbers, either hard numbers given to
them at creation, or numbers computed by a location object. A different
call ($feature->detailedstart or something) would call $feature->start if
there was no more info on the location, and would call the location object
otherwise. This could then return whatever array or hash we decide on.

That would take care of the memory concerns (we create a lot of objects
with Workbench as well), since in most cases the start/end pair would be
all that was stored. The complexities could be handled whenever the client
desired complexity.

Would it be necessary to flag objects that have detailed location
information? Well, that's a simple check for the presence of a LocationI
object attached to the SeqFeature object.

Okay, there's my opinion. Let me know what you think.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From birney@ebi.ac.uk Mon Jan 22 20:13:06 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 22 Jan 2001 20:13:06 +0000 (GMT) Subject: [Bioperl-l] conceeding to has-a location Message-ID:
Ok. It looks like I have to concede the has-a location, as long as I am
allowed to return $self for C extensions for Ensembl ;)

I think I have "won" on the no exception throwing (???)

Jason/Hilmar - what do you think?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Mon Jan 22 21:07:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 22 Jan 2001 16:07:20 -0500 (EST) Subject: [Bioperl-l] conceeding to has-a location In-Reply-To: Message-ID:
I think the discussion generated a number of good points.
I was ambivalent about separating the Location initially, but I can
definitely see the advantages of has-a location now as well. So I agree
that this model seems the best. I don't know if object creation penalties
will come back to haunt us, but this model seems the most biologically
applicable.

As for exception throwing: in the simple case (i.e. everything bioperl
currently supports) no exceptions are thrown. If we later want to define a
structure for delegating start/end calculation
(DetermineStartEndFromFuzzyLocationAdaptor), then maybe we can do that, and
exceptions could be thrown by that model. However, in the current model I'd
like to be able to rely on start/end being callable even if they delegate
to the Location object, and thus no exceptions at this time.

Are we going to end up ripping this out and rewriting again? I will update
the wiki text to reflect these agreements and we can see where we stand.
I'm hoping we can have a reasonable agreement by Thursday so the coding can
begin.

-Jason

On Mon, 22 Jan 2001, Ewan Birney wrote:

>
> Ok. It looks like I have to concede the has-a location, as long as I am
> allowed to return $self for C extensions for Ensembl ;)
>
>
> I think I have "won" on the no exception throwing (???)
>
>
> Jason/Hilmar - what do you think?
>
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Tue Jan 23 09:27:47 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 23 Jan 2001 01:27:47 -0800 Subject: [Bioperl-l] Feature/Location References: Message-ID: <3A6D4E93.DC069A8A@gmx.net>
Ewan Birney wrote:
>
> Ok.
> It looks like I have to concede the has-a location, as long as I am
> allowed to return $self for C extensions for Ensembl ;)
>
> I think I have "won" on the no exception throwing (???)
>

I think it's BioPerl that won -- by all the feedback we got. We can have
more confidence now that what we code makes some sense. Thanks to everyone,
and sorry for forgetting you, Mark; I'm glad you stepped in without being
asked.

I see that exception throwing in ->start()/end() is not the best idea for
many applications. In a sense the situation may be similar to SeqIO, where
we now have a client-controllable severity level for putative format
violations (which in fact mostly are BioPerl incapabilities). So, we can
design the start/end implementation along a client-controllable policy,
with a relaxed default.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Tue Jan 23 15:41:52 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 10:41:52 -0500 (EST) Subject: [Bioperl-l] Feature/Location In-Reply-To: <3A6D4E93.DC069A8A@gmx.net> Message-ID:
On Tue, 23 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > Ok. It looks like I have to concede the has-a location, as long as I am
> > allowed to return $self for C extensions for Ensembl ;)
> >
> > I think I have "won" on the no exception throwing (???)
> >
>
> I think it's BioPerl that won -- by all the feedback we got. We
> can have more confidence now that what we code makes some
> sense. Thanks to everyone, and sorry for forgetting you, Mark; I'm
> glad you stepped in without being asked.
>
> I see that exception throwing in ->start()/end() is not the best
> idea for many applications.
> In a sense the situation may be
> similar to SeqIO, where we now have a client-controllable severity
> level for putative format violations (which in fact mostly are
> BioPerl incapabilities). So, we can design the start/end
> implementation along a client-controllable policy, with a relaxed
> default.

I'll see how that shakes out as we start to look at implementation.

Also - should our locations go into a new directory?

Interfaces -
  Bio::Location::LocationI
  Bio::Location::SplitLocationI
  Bio::Location::FuzzyLocationI

Implementations -
  Bio::Location::SimpleLocation
  Bio::Location::SplitLocation
  Bio::Location::FuzzyLocation

I updated the wiki - please feel free to make corrections, clarifications,
or to elaborate on the interfaces. SplitLocationI will have a method
sub_Locations which returns the list of LocationI objects that represent
the sub locations of the, well, location. In code terms -

  # get a $geneobj somehow
  my $location = $geneobj->location;
  if( $location->isa('Bio::Location::SplitLocationI') ) {
      foreach my $exon ( $location->sub_Locations() ) {
          print "exon at ", $exon->start, "..", $exon->end, "\n";
      }
  }

One problem with this approach - what if I actually want the real Exon
object? Must I instead iterate through what is returned by sub_Features?
Or does SeqFeature::GeneStructureI handle all of this, so that I should
instead call $geneobj->exons() without touching the Location objects
(makes most sense to me)?

-jason

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From dblock@gene.pbi.nrc.ca Tue Jan 23 15:56:08 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 23 Jan 2001 09:56:08 -0600 (CST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:
>
> I updated the wiki - please feel free to make corrections, clarifications,
> or to elaborate on the interfaces. SplitLocationI will have a method
> sub_Locations which returns the list of LocationI objects that represent
> the sub locations of the, well, location. In code terms -
>
>   # get a $geneobj somehow
>   my $location = $geneobj->location;
>   if( $location->isa('Bio::Location::SplitLocationI') ) {
>       foreach my $exon ( $location->sub_Locations() ) {
>           print "exon at ", $exon->start, "..", $exon->end, "\n";
>       }
>   }
>
> One problem with this approach - what if I actually want the real Exon
> object? Must I instead iterate through what is returned by sub_Features?
> Or does SeqFeature::GeneStructureI handle all of this, so that I should
> instead call $geneobj->exons() without touching the Location objects
> (makes most sense to me)?
>
> -jason
>

That would be good. Then you could call that exon's location method to
get the location object of the exon. So you have two routes to the
start/end pair. That sounds good to me.
--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From birney@ebi.ac.uk Tue Jan 23 16:45:34 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 23 Jan 2001 16:45:34 +0000 (GMT) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:
On Tue, 23 Jan 2001, David Block wrote:

> >
> > I updated the wiki - please feel free to make corrections, clarifications,
> > or to elaborate on the interfaces. SplitLocationI will have a method
> > sub_Locations which returns the list of LocationI objects that represent
> > the sub locations of the, well, location. In code terms -
> >
> >   # get a $geneobj somehow
> >   my $location = $geneobj->location;
> >   if( $location->isa('Bio::Location::SplitLocationI') ) {
> >       foreach my $exon ( $location->sub_Locations() ) {
> >           print "exon at ", $exon->start, "..", $exon->end, "\n";
> >       }
> >   }
> >
> > One problem with this approach - what if I actually want the real Exon
> > object? Must I instead iterate through what is returned by sub_Features?
> > Or does SeqFeature::GeneStructureI handle all of this, so that I should
> > instead call $geneobj->exons() without touching the Location objects
> > (makes most sense to me)?
> >
> > -jason
> >
>
> That would be good. Then you could call that exon's location method to
> get the location object of the exon. So you have two routes to the
> start/end pair. That sounds good to me.

<>

I think we are giving ourselves *a lot of rope* to hang ourselves here, and
we will end up with different conventions about how to descend these
objects...

But... I guess I should roll with the has-a decision. So... my view here
would be that in "stupid" implementations location->sub_Locations() gives
separate location objects, but in "smart" implementations (perhaps
bioperl's gene/transcript object?)
it gives the same location object as the exon, therefore guaranteeing that
whichever route you take to an exon's location, you get the same thing...

i.e. this is up to the implementation, and the generic implementation has
to be "stupid" I guess....

>
> --
> David Block
> dblock@gene.pbi.nrc.ca
> http://bioinfo.pbi.nrc.ca/dblock/wiki
> Plant Biotechnology Institute
> National Research Council of Canada
> Saskatoon, Saskatchewan
>
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Tue Jan 23 18:00:37 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 23 Jan 2001 12:00:37 -0600 (CST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:
On Tue, 23 Jan 2001, Ewan Birney wrote:

> On Tue, 23 Jan 2001, David Block wrote:
>
> > >
> > > I updated the wiki - please feel free to make corrections, clarifications,
> > > or to elaborate on the interfaces. SplitLocationI will have a method
> > > sub_Locations which returns the list of LocationI objects that represent
> > > the sub locations of the, well, location. In code terms -
> > >
> > >   # get a $geneobj somehow
> > >   my $location = $geneobj->location;
> > >   if( $location->isa('Bio::Location::SplitLocationI') ) {
> > >       foreach my $exon ( $location->sub_Locations() ) {
> > >           print "exon at ", $exon->start, "..", $exon->end, "\n";
> > >       }
> > >   }
> > >
> > > One problem with this approach - what if I actually want the real Exon
> > > object? Must I instead iterate through what is returned by sub_Features?
> > > Or does SeqFeature::GeneStructureI handle all of this, so that I should
> > > instead call $geneobj->exons() without touching the Location objects
> > > (makes most sense to me)?
> > >
> > > -jason
> > >

Okay, for clarity, this is only relevant when there is a SplitLocationI
situation, correct?
So the implementation of SplitLocationI was going to be an array of simple
LocationI objects? If not, then what I'm talking about is irrelevant.

> >
> > That would be good. Then you could call that exon's location method to
> > get the location object of the exon. So you have two routes to the
> > start/end pair. That sounds good to me.
>
> <>
>
> I think we are giving ourselves *a lot of rope* to hang ourselves here, and
> we will end up with different conventions about how to descend these
> objects...

Different conventions for different situations? What I was talking about
was two different situations:

1) Gene drawing: I want to know all the locations that are 'gene' so I can
   draw them somehow -> sub_locations gives me a list of simple locations
   that I can iterate through. I don't care about the nature of the exons
   I am drawing, just that they belong to a gene.

2) Exon interrogation: I want to examine each exon individually. Now I want
   the gene/transcript's exons method to give me each exon. Each of those
   also has a location. The exon's location method links to the same
   location object that is returned by the sub_locations call, so there is
   no duplication of data.

And if any of these exon locations are fuzzy or split, etc., the location
object gives us that.

>
> But... I guess I should roll with the has-a decision.

Yes, you should (hee, hee, we win).

> So... my view here
> would be that in "stupid" implementations location->sub_Locations() gives
> separate location objects, but in "smart" implementations (perhaps
> bioperl's gene/transcript object?) it gives the same location object as
> the exon, therefore guaranteeing that whichever route you take to an
> exon's location, you get the same thing...

I think that's what I was thinking too, isn't it?

>
>
> i.e. this is up to the implementation, and the generic implementation has
> to be "stupid" I guess....

No comment.
>
> > --
> > David Block
> > dblock@gene.pbi.nrc.ca
> > http://bioinfo.pbi.nrc.ca/dblock/wiki
> > Plant Biotechnology Institute
> > National Research Council of Canada
> > Saskatoon, Saskatchewan
> >
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From jason@chg.mc.duke.edu Tue Jan 23 18:32:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 13:32:21 -0500 (EST) Subject: [Bioperl-l] Feature/Location In-Reply-To: Message-ID:
On Tue, 23 Jan 2001, David Block wrote:

> On Tue, 23 Jan 2001, Ewan Birney wrote:
>
> > On Tue, 23 Jan 2001, David Block wrote:
> >
> > > >
> > > > I updated the wiki - please feel free to make corrections, clarifications,
> > > > or to elaborate on the interfaces. SplitLocationI will have a method
> > > > sub_Locations which returns the list of LocationI objects that represent
> > > > the sub locations of the, well, location. In code terms -
> > > >
> > > >   # get a $geneobj somehow
> > > >   my $location = $geneobj->location;
> > > >   if( $location->isa('Bio::Location::SplitLocationI') ) {
> > > >       foreach my $exon ( $location->sub_Locations() ) {
> > > >           print "exon at ", $exon->start, "..", $exon->end, "\n";
> > > >       }
> > > >   }
> > > >
> > > > One problem with this approach - what if I actually want the real Exon
> > > > object? Must I instead iterate through what is returned by sub_Features?
> > > > Or does SeqFeature::GeneStructureI handle all of this, so that I should
> > > > instead call $geneobj->exons() without touching the Location objects
> > > > (makes most sense to me)?
> > > >
> > > > -jason
> > > >
>
> Okay, for clarity, this is only relevant when there is a SplitLocationI
> situation, correct? So the implementation of SplitLocationI was going to
> be an array of simple LocationI objects? If not, then what I'm talking
> about is irrelevant.

No, you're right, I imagine it will be a list of LocationI objects at some
point. sub_Locations will be a SplitLocationI method.

> > >
> > > That would be good. Then you could call that exon's location method to
> > > get the location object of the exon. So you have two routes to the
> > > start/end pair. That sounds good to me.
> >
> > <>
> >
> > I think we are giving ourselves *a lot of rope* to hang ourselves here, and
> > we will end up with different conventions about how to descend these
> > objects...

I agree; I think this is why you and I were leaning towards collapsing
Location into SeqFeature, but I also agree with many of the arguments for
splitting the two.

>
>
> Different conventions for different situations? What I was talking about
> was two different situations:
>
> 1) Gene drawing: I want to know all the locations that are 'gene' so I can
>    draw them somehow -> sub_locations gives me a list of simple locations
>    that I can iterate through. I don't care about the nature of the exons
>    I am drawing, just that they belong to a gene.
>
> 2) Exon interrogation: I want to examine each exon individually. Now I want
>    the gene/transcript's exons method to give me each exon. Each of those
>    also has a location. The exon's location method links to the same
>    location object that is returned by the sub_locations call, so there is
>    no duplication of data.

So we use the sub_SeqFeature method on a SeqFeatureI to get the list of
sub-features for a feature (since exons should be sub-features of a gene).
In specialized objects like Gene we could call exons() to get these
objects.

> And if any of these exon locations are fuzzy or split, etc., the location
> object gives us that.
>

Without getting lost in example land - here is one question about how to
instantiate these things:

Imagine the case of parsing a GenBank/EMBL file with annotated genes on a
genomic sequence via the bioperl SeqIO system. We get to a SplitLocation.
How should we represent the object? If primary_tag == 'CDS', do we
instantiate a GeneStructure object? Otherwise we will instantiate all
features with SeqFeature::Generic; some will have LocationI locations, some
will have SplitLocationI locations.

Assuming we sufficiently capture all of the information encoded about the
Feature, a user could write code to transform collections of CDS, source,
exon, etc. primary tags retrieved from a GenBank/EMBL parse into a
GeneStructure object. I am going to guess that at some point we'd like to
write an object that handles this gene instantiation in the common case, or
at least gives good examples of how to do it.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Tue Jan 23 22:08:20 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 23 Jan 2001 17:08:20 -0500 (EST) Subject: [Bioperl-l] Location naming semantics Message-ID:
Does anyone have a problem with these names? If so, please shout now.

Interfaces
  Bio::LocationI
  Bio::Location::SplitLocationI
  Bio::Location::FuzzyLocationI

Implementations
  Bio::Location::SimpleLocation
  Bio::Location::SplitLocation
  Bio::Location::FuzzyLocation

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Wed Jan 24 09:55:01 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 24 Jan 2001 01:55:01 -0800 Subject: [Bioperl-l] Re: [Bioperl-guts-l] RestrictionEnzyme.pm References: <5.0.2.1.2.20010124095100.00a81978@mailhost.curie.fr> Message-ID: <3A6EA675.ABBE91FF@gmx.net>
Paul-Christophe Varoutas wrote:
>
> Hi again,
>
> Last night I started experimenting with RestrictionEnzyme.pm.
>
> I very much liked the '-MAKE' => 'custom' switch in the constructor, but I
> think it would nevertheless be a good idea to write a public method which
> updates the enzyme list from the NEBASE site.
>
> I suggest writing a sub (let's call it update_list or update_RE_list) that:
>
> - goes to the NEBASE site and gets the latest version of the restriction
> enzyme list. We can choose between http/ftp and various types of
> lists/formats. My preference would be to go to their ftp site and get what
> they call "format 18": DNAStrider format, list of all commercially
> available enzymes. (The file is ftp://ftp.nebase.com/pub/nebase/striderc.*;
> the extension of the file reflects the version.)
> - saves this list in a text file, in the Bio/Tools/ directory. An
> alternative is to update the enzyme list in the RestrictionEnzyme.pm file
> itself, at the beginning of the file, within the definition of the %RE
> hash, but intuitively I would not tend to recommend it, as I don't know if
> writing to a file at the same time as it is being read by the perl
> interpreter will behave well on all operating systems. Tell me what you
> think about it.

You normally can't write to Bio/Tools as a user (under Unix), and a user
client shouldn't attempt to do so under any circumstances.

Regarding the ability to update the list of known REs, I see the following
options.

1) Accept an additional (named!) parameter at initialization that denotes
   a file (in DNAStrider format?) containing the enzymes to be known in
   addition to a collection of hard-coded enzymes.

2) Same as before, but the parameter denotes a URL from where to obtain
   this file.

3) Put all hard-coded enzymes into a file that resides at a known place
   within the Bio/ directory tree, and read (parse) that upon
   initialization of RestrictionEnzyme.pm. An update would mean updating
   that file.

I'm not sure option 3) would have compelling advantages over the present
layout.
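For options 1) and 2), a single parameter could accept either a file name or a URL. Below is a minimal Perl sketch built on a hypothetical helper. It assumes that anything starting with http://, https:// or ftp:// is a URL and that everything else is a local file. Remote data is fetched with LWP::Simple's get(), which returns undef on failure, and LWP::Simple is loaded only when a URL is actually given. Parsing the DNAStrider records themselves is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): decide whether the single
# enzyme-source parameter names a remote URL or a local file.
sub source_kind {
    my ($source) = @_;
    return $source =~ m{^(?:https?|ftp)://}i ? 'url' : 'file';
}

# Return the raw enzyme data as one string, whatever kind of source it is.
# Parsing the DNAStrider records is deliberately left out.
sub read_enzyme_source {
    my ($source) = @_;
    if (source_kind($source) eq 'url') {
        require LWP::Simple;    # loaded only when a URL is actually given
        my $data = LWP::Simple::get($source);
        die "could not fetch $source\n" unless defined $data;
        return $data;
    }
    open(my $fh, '<', $source) or die "cannot open $source: $!\n";
    local $/;                   # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return $data;
}
```

A constructor could call read_enzyme_source once with the user-supplied value and pass the text to its parser. Both options then share one code path for everything after the stream is opened.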
Options 1) and 2) are certainly worthwhile to pursue and in essence are
almost identical, the only difference being how to open the stream
containing the enzyme data. So, one could try to combine both into one
parameter, and have the code figure out whether it's a file or an http/ftp
URL.

Hilmar

Do you already have a CVS write account?

> - if the enzyme list is saved in a separate file, I will also modify the
> initialisation of the %RE hash, with code that reads and parses the enzyme
> list file.
>
> If this sounds OK to you, I will write it this weekend and submit it. Of
> course, if you had something completely different in mind, please say so;
> I will try to adapt to it.
>
> Paul-Christophe
>

--
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 16:18:56 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 11:18:56 -0500 (EST) Subject: [Bioperl-l] Location committed Message-ID:
Run cvs update -d in order to get the new directory.

Location objects have been created and extracted from the clutches of
SeqFeatureI. A LocationI object does support strandedness because that is
part of RangeI. A SeqFeatureI is still a RangeI for the practical purposes
of backwards compatibility and simplicity, but it actually delegates things
like start/end/strand to the LocationI object contained by the SeqFeatureI.

If you want to debate any parts of this object model, start now, because
the code is still in the early stages.

We currently have a Bio::Location::Simple to handle the current bioperl
location behavior. The next step is to write implementations of
SplitLocationI and hook them into the SeqIO parsing.
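Below is a rough Perl sketch of what a split location and its hookup to parsing might look like. It is based on hypothetical My::Simple and My::Split classes, not the committed Bio::Location code, and it handles only plain GenBank/EMBL join(a..b,c..d,...) strings. The split location keeps its parts in order and returns them from sub_Locations. Its overall start and end are the lowest start and highest end of the parts. Fuzzy (<, >), complement() and remote locations are rejected rather than guessed at.

```perl
use strict;
use warnings;

# Hypothetical simple location: one contiguous range (illustration only).
package My::Simple;

sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, end => $args{end} }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical split location: an ordered list of simple locations.
package My::Split;

sub new {
    my ($class, @parts) = @_;
    return bless { parts => [@parts] }, $class;
}

# The sub-locations, in the order they were given.
sub sub_Locations { @{ $_[0]{parts} } }

# Overall extent: the lowest start and the highest end of the parts.
sub start {
    my ($min) = sort { $a <=> $b } map { $_->start } $_[0]->sub_Locations;
    return $min;
}
sub end {
    my ($max) = sort { $b <=> $a } map { $_->end } $_[0]->sub_Locations;
    return $max;
}

package main;

# Turn a plain "join(a..b,c..d,...)" string into a split location.
# Fuzzy (<, >), complement() and remote locations are rejected, not guessed.
sub parse_join {
    my ($text) = @_;
    (my $clean = $text) =~ s/\s+//g;
    my ($inner) = $clean =~ /^join\((.*)\)$/
        or die "not a plain join(): $text\n";
    my @parts = map {
        my ($s, $e) = /^(\d+)\.\.(\d+)$/
            or die "unsupported range: $_\n";
        My::Simple->new(start => $s, end => $e);
    } split /,/, $inner;
    return My::Split->new(@parts);
}

my $cds = parse_join('join(100..250,400..520,700..810)');
for my $exon ($cds->sub_Locations) {
    print "exon at ", $exon->start, "..", $exon->end, "\n";
}
print "overall ", $cds->start, "..", $cds->end, "\n";   # prints "overall 100..810"
```

The split location stores only references to its parts. A smarter gene or transcript object could therefore hand out the very same part objects from its exons, which is the sharing discussed earlier in the thread.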
I'll be adding more module documentation soon, but wanted to get these
interfaces and the simple implementation out there first so that others can
help find problems AND possibly help write objects....

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Wed Jan 24 16:54:57 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 16:54:57 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 Message-ID:
Just to warn people, Test.pm does not ship with 5.004, patch level 4, which
is a relatively commonly installed perl version. It can be installed, but I
suspect there will be moans and complaints about this to some extent.

I will fix up the makefile to barf more intelligently if it can't find
Test.pm...

(Jason - I can't test your new objects at the moment due to the above
problem... I'll try to do it from my laptop, and I have asked Sanger
systems to install Test.pm...)

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 17:22:52 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 12:22:52 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID:
I was hoping no one would notice... ;)

I found this out last week when I tried to install Test.pm on a 5.00404
system: it won't install, as it requires at least 5.00504. This means our
test suite doesn't work under 5.00404. That is not good...

grrrr... not wanting to back-port all the Test.pm dependencies... Do we
have to roll our own replacement and have that be included?

On Wed, 24 Jan 2001, Ewan Birney wrote:

>
> Just to warn people, Test.pm does not ship with 5.004, patch level 4, which
> is a relatively commonly installed perl version. It can be installed,
> It can be installed, but I suspect there will be moans and complaints about this to some extent.
>
> I will fix up the makefile to barf more intelligently if it can't find Test.pm...
>
> (Jason - I can't test your new objects at the moment due to the above problem... I'll try to do it from my laptop and I have asked Sanger systems to install Test.pm...)
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Wed Jan 24 17:26:39 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 17:26:39 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4

On Wed, 24 Jan 2001, Jason Stajich wrote:

> I was hoping no one would notice... ;)
>
> I found this out last week when I tried to install Test.pm on a 5.00404 system: it won't install, because it requires at least 5.00504. This means our test suite doesn't work under 5.00404. That is not good...
>
> grrrr... not wanting to back port all the Test.pm dependencies... Do we have to roll our own replacement and have that be included?
>

This is a potential SHOW STOPPER. Time to think about this...

We have to either

(a) jettison 5.00404 compatibility OR

(b) back-port the test suite (sorry Jason) OR

(c) roll our own replacement

I don't like any of these. Comments?

(PS - I am not a big, huge, "boy this is really making my life easier" fan of Test.pm --- what is wrong with print "ok 2\n";? I don't need a module for this!)

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 24 18:00:46 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 18:00:46 +0000 (GMT) Subject: [Bioperl-l] BOSC 2001 [Bioinformatics Open Source Conference]

[PLEASE do not reply to this mail as it is cross-posted to many lists. PLEASE reply to bosc@bubbles.sonsorol.org. I am assuming that people KNOW HOW TO DRIVE THEIR MAIL CLIENTS. Think before hitting reply! This is an experiment to see how smart the general bioinformatics hacker is]

We will be attempting to run another Bioinformatics Open Source Conference just before ISMB 2001. We have received information that this is likely to be able to occur and will possibly have extensive computer support, therefore allowing development to occur as well as talks.

At the moment we are gathering our thoughts and generally mapping out the form of the conference. We would like input from the wider open source bioinformatics community for ideas about the conference.

The practical aims of this are to

(a) come up with a format for the day(s)
(b) appoint a committee to run the conference.

It is likely that Chris Dadigidan and I will be the core of the committee, as we've done this before and we know what is going on. (Frankly, if someone wants to take over my cheer-leading role, you are more than welcome! Endless patience and good email-discipline are a must...)

I would suggest the following committee membership:

Each of the major groups nominates one person to the committee. I would suggest: bioperl (possibly me or Chris), biojava, biopython, emboss, acedb, ensembl (possibly me) each has one person assigned to be on the committee.

Then I would like to see if we can reach out into the smaller projects, including ones I haven't listed here, such as the nascent bioLISPers; I believe there is an open source Bio PathWays group; the Apollo/Gadfly people might want to make sure they are represented.
(biocorba and bioxml - you are smaller projects at the moment)

Ideally one or two people can come from the smaller projects. The committee should total 8 or fewer.

[PS - if you know of people who "have a project" but are in the primordial soup stage of the project, please forward this mail on to them]

Comments should be addressed to bosc@bubbles.sonsorol.org - like I said, I expect the major projects to assign their own representative or say they are not interested.

Ewan Birney

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 24 18:40:53 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 24 Jan 2001 10:40:53 -0800 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 Message-ID: <3A6F21B5.BAD76917@gmx.net>

Ewan Birney wrote:
>
> On Wed, 24 Jan 2001, Jason Stajich wrote:
>
> > I was hoping no one would notice... ;)
> >
> > I found this out last week when I tried to install Test.pm on a 5.00404 system: it won't install, because it requires at least 5.00504. This means our test suite doesn't work under 5.00404. That is not good...
> >
> > grrrr... not wanting to back port all the Test.pm dependencies... Do we have to roll our own replacement and have that be included?
> >
>
> This is a potential SHOW STOPPER. Time to think about this...
>
> We have to either
>
> (a) jettison 5.00404 compatibility OR
>
> (b) back-port the test suite (sorry Jason) OR
>
> (c) roll our own replacement
>
> I don't like any of these. Comments?
>
> (PS - I am not a big, huge, "boy this is really making my life easier" fan of Test.pm --- what is wrong with print "ok 2\n";? I don't need a module for this!)
>

Well, I did find the test script code more concise after migrating to Test.pm. However, we don't use much of its functionality yet, only rather basic things.
Would it be that hard to roll our own Test.pm version that offers just the basic things we're currently using, maybe even by porting the original? That would make the switch to the system module easy once we drop 5.004 compatibility (we won't keep that eternally, will we?).

Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 18:49:45 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 13:49:45 -0500 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 Message-ID: <008f01c08636$6bba2090$61eb0398@mc.duke.edu>

----- Original Message -----
From: "Ewan Birney"
To: "Jason Stajich"
Sent: Wednesday, January 24, 2001 12:26 PM
Subject: Re: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4

> On Wed, 24 Jan 2001, Jason Stajich wrote:
>
> > I was hoping no one would notice... ;)
> >
> > I found this out last week when I tried to install Test.pm on a 5.00404 system: it won't install, because it requires at least 5.00504. This means our test suite doesn't work under 5.00404. That is not good...
> >
> > grrrr... not wanting to back port all the Test.pm dependencies... Do we have to roll our own replacement and have that be included?
> >
>
> This is a potential SHOW STOPPER. Time to think about this...
>
> We have to either
>
> (a) jettison 5.00404 compatibility OR
>
> (b) back-port the test suite (sorry Jason) OR
>
> (c) roll our own replacement
>

I can back port if necessary, but I'd rather we roll our own t/Test.pm that duplicates the Test.pm functionality we use. I do rather like not having to keep track of which test number I am at, so adding a test to the middle of the pack doesn't involve upping the ones that follow by 1.

> I don't like any of these. Comments?
>
> (PS - I am not a big, huge, "boy this is really making my life easier" fan of Test.pm --- what is wrong with print "ok 2\n";? I don't need a module for this!)
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> -----------------------------------------------------------------
>


From agoldman@bnl.gov Wed Jan 24 19:31:39 2001 From: agoldman@bnl.gov (Adrian Goldman) Date: Wed, 24 Jan 2001 14:31:39 -0500 Subject: [Bioperl-l] Re: Restriction Enzyme methods... In-Reply-To: <200101241705.f0OH5Dp16304@pw600a.bioperl.org> References: <200101241705.f0OH5Dp16304@pw600a.bioperl.org>

Hi,

As a _user_ of the RestrictionEnzyme module, I think it is worth noting that -- as far as I can divine from the 0.6 code and my (admittedly limited) knowledge of Perl -- neither the external interface (make=custom) nor the internals support offset cutters correctly. (The documentation also implies the same.) Obviously this would have to be fixed before anything useful could be done with updating from REBASE.

The problem, as I understand it, is that the module supports essentially only one index into the restriction site; it recognises offset recognition elements correctly (requiring use of the reverse complement) but it can't "cut the DNA" correctly. As it turns out, it is enough for my application that the recognition is correct, as I am looking for the absence of sites -- but I hardly think that that is usual.

If I'm wrong about the above, I'd _love_ to be told how to specify an offset cutter correctly through the make=custom switch!...

Adrian Goldman

> Paul-Christophe Varoutas wrote:
> >
> > Hi again,
> >
> > Yesterday night I started experimenting with RestrictionEnzyme.pm.
> >
> > I liked very much the '-MAKE' =>'custom' switch in the constructor but I think it would nevertheless be a good idea to write a public method which updates the enzyme list from the NEBASE site.
> >
> > I suggest writing a sub (let's call it update_list or update_RE_list) that:
> >
> > - goes to the NEBASE site and gets the latest version of the restriction enzyme list. We can choose between http/ftp and various types of lists/formats. My preference would be to go to their ftp site and get what they call "format 18": DNAStrider format, list of all commercially available enzymes. The file is ftp://ftp.nebase.com/pub/nebase/striderc.* (the extension of the file reflects the version).
> > - saves this list in a text file, in the Bio/Tools/ directory. An alternative is to update the enzyme list in the RestrictionEnzyme.pm file itself, at the beginning of the file, within the definition of the %RE hash, but intuitively I would not tend to recommend it, as I don't know if writing to a file at the same time it is being read by the perl interpreter will behave well in all operating systems. Tell me what you think about it.
>
> You normally can't write to Bio/Tools as a user (under Unix), and a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the following options.
> 1) Accept an additional (named!) parameter at initialization that denotes a file (in DNAStrider format?) containing the enzymes to be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known place within the Bio/ directory tree, and read (parse) that upon initialization of RestrictionEnzyme.pm. An update would mean updating that file.
>
> I'm not sure option 3) would have compelling advantages to the present layout.
> Options 1) and 2) are certainly worthwhile to pursue and in essence are almost identical, the only difference being how to open the stream containing the enzyme data. So, one could try to combine both into one parameter, and have the code figure out whether it's a file or a http/ftp URL.
>
> Hilmar
>
> Do you already have a CVS write account?
>
> > - if the enzyme list is saved in a separate file, I will also modify the initialisation of the %RE hash, with code that reads and parses the enzyme list file.
> >
> > If this sounds OK to you, I will write it this weekend and submit it. Of course if you had something completely different in mind please say it, I will try to adapt to it.
> >
> > Paul-Christophe
> >
>
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------

Professor Adrian Goldman,      | Phone:  358-(0)9-191 58923
Structural Biology Group,      | FAX:    358-(0)9-191 58952
Institute of Biotechnology     | Sec:    358-(0)9-191 58921
University of Helsinki,        | Mobile: 358-(0)50-336 8960
PL 56                          | Home:   358-(0)9-728 7103
00014 Helsinki                 | email:  Adrian.Goldman@Helsinki.fi
-- on sabbatical at Brookhaven National labs, June 2000-June 2001
Adrian Goldman, Biology Department, Building 463
50 Bell Ave., Brookhaven National Lab., Upton NY 11973.
Phone: 631-344-2671 (off) 631-344-3417 (lab), 631-344-3407 (FAX).
email: agoldman@bnl.gov


From birney@ebi.ac.uk Wed Jan 24 19:39:03 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:39:03 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: <3A6F21B5.BAD76917@gmx.net>

On Wed, 24 Jan 2001, Hilmar Lapp wrote:

>
> Well, I did find the test script code more concise after migrating to Test.pm. However, we don't use much of its functionality yet, only rather basic things.
> Would it be that hard to roll our own Test.pm version that offers just the basic things we're currently using, maybe even by porting the original? That would make the switch to the system module easy once we drop 5.004 compatibility (we won't keep that eternally, will we?).

This is an OK route for me as well. I guess it is not too hard.

Is this going to drop between the three of us? Jason, are you volunteering? (I am aware, Jason, that you have done the lion's share of the coding towards the branch so far...)

>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 19:50:37 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 14:50:37 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4

On Wed, 24 Jan 2001, Ewan Birney wrote:

> On Wed, 24 Jan 2001, Hilmar Lapp wrote:
>
> >
> > Well, I did find the test script code more concise after migrating to Test.pm. However, we don't use much of its functionality yet, only rather basic things. Would it be that hard to roll our own Test.pm version that offers just the basic things we're currently using, maybe even by porting the original? That would make the switch to the system module easy once we drop 5.004 compatibility (we won't keep that eternally, will we?).
> >
>
> This is an OK route for me as well. I guess it is not too hard.
>
> Is this going to drop between the three of us? Jason, are you volunteering?
Jason are you volunteering > (I am aware jason you have done the lion's share of towards branch coding > so far...) It's pretty mindless to make the corrections so I can do it while I'm waiting for some analysis to finish. Can we just be sure that we are doing what seems to be the RIGHT thing. I really don't want to break the build on 5.00404 so let's roll our own Test.pm with and ok() and skip() methods or backport (my vote) to the old way of sub test {}. I'd put the Test module in t/Test.pm. > > > > > > Hilmar > > -- > > ----------------------------------------------------------------- > > Hilmar Lapp email: hlapp@gmx.net > > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > > ----------------------------------------------------------------- > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From birney@ebi.ac.uk Wed Jan 24 19:54:24 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:54:24 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 In-Reply-To: Message-ID: On Wed, 24 Jan 2001, Jason Stajich wrote: > > On Wed, 24 Jan 2001, Ewan Birney wrote: > > > On Wed, 24 Jan 2001, Hilmar Lapp wrote: > > > > > > > > Well, I did find the test script code more concise after migrating > > > to Test.pm. However, we don't use much of its functionality yet, > > > only rather basic things. Would it be that hard to roll our own > > > Test.pm version that offers just the basic things we're currently > > > using, maybe even by porting the original? 
> > > That would make the switch to the system module easy once we drop 5.004 compatibility (we won't keep that eternally, will we?).
> >
> > This is an OK route for me as well. I guess it is not too hard.
> >
> > Is this going to drop between the three of us? Jason, are you volunteering? (I am aware, Jason, that you have done the lion's share of the coding towards the branch so far...)
>
> It's pretty mindless to make the corrections, so I can do it while I'm waiting for some analysis to finish. Can we just be sure that we are doing what seems to be the RIGHT thing? I really don't want to break the build on 5.00404, so let's roll our own Test.pm with ok() and skip() methods or backport (my vote) to the old way of sub test {}. I'd put the Test module in t/Test.pm.

I don't understand the backport sub test {} method - does this mean each test uses this routine?

I trust your call on this...


From jason@chg.mc.duke.edu Wed Jan 24 19:57:26 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 14:57:26 -0500 (EST) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4

On Wed, 24 Jan 2001, Ewan Birney wrote:

> On Wed, 24 Jan 2001, Jason Stajich wrote:
>
> >
> > It's pretty mindless to make the corrections, so I can do it while I'm waiting for some analysis to finish. Can we just be sure that we are doing what seems to be the RIGHT thing? I really don't want to break the build on 5.00404, so let's roll our own Test.pm with ok() and skip() methods or backport (my vote) to the old way of sub test {}. I'd put the Test module in t/Test.pm.
>
>
> I don't understand the backport sub test {} method - does this mean each test uses this routine?
>
> I trust your call on this...

Stupid me: I meant to vote for rolling our own Test.pm with ok and skip methods.

The other option is to go back to what we had, which was to define a method test() in every .t file.
But it is sort of dumb to copy and paste that method into every .t file. We should at a minimum make one file that has the necessary method(s) - why not mimic Test.pm in this case with ok() and skip()...

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Wed Jan 24 19:59:50 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 19:59:50 +0000 (GMT) Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4

On Wed, 24 Jan 2001, Jason Stajich wrote:

> On Wed, 24 Jan 2001, Ewan Birney wrote:
>
> > On Wed, 24 Jan 2001, Jason Stajich wrote:
> >
> > >
> > > It's pretty mindless to make the corrections, so I can do it while I'm waiting for some analysis to finish. Can we just be sure that we are doing what seems to be the RIGHT thing? I really don't want to break the build on 5.00404, so let's roll our own Test.pm with ok() and skip() methods or backport (my vote) to the old way of sub test {}. I'd put the Test module in t/Test.pm.
> >
> >
> > I don't understand the backport sub test {} method - does this mean each test uses this routine?
> >
> > I trust your call on this...
>
> Stupid me: I meant to vote for rolling our own Test.pm with ok and skip methods.
>
> The other option is to go back to what we had, which was to define a method test() in every .t file. But it is sort of dumb to copy and paste that method into every .t file. We should at a minimum make one file that has the necessary method(s) - why not mimic Test.pm in this case with ok() and skip()...

Sounds good to me. I can help at least with the testing of the test suite, and possibly some of the leg work (not tonight... about to get dinner...)
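A minimal sketch of the kind of in-tree replacement agreed on here: one shared module offering plan(), ok() and skip(), with its own test counter so that a test added in the middle of a script does not force renumbering. BioTestSketch is a hypothetical name (in the distribution it would presumably be t/Test.pm), and the comparison rules are simplified assumptions: two-argument ok() is a plain string comparison, with none of the standard module's pattern or todo handling. The sketch uses `use vars` rather than `our`, because `our` needs perl 5.6 and the point is to keep running on 5.004.

```perl
use strict;

# Hypothetical stand-in for the standard Test module (e.g. t/Test.pm).
# Only plan(), ok() and skip() are provided, in simplified form.
package BioTestSketch;

use Exporter;
use vars qw(@ISA @EXPORT);    # 'use vars' rather than 'our', which needs 5.6
@ISA    = ('Exporter');
@EXPORT = qw(plan ok skip);

my $ntest = 0;    # number of the most recently reported test

# plan(tests => N) prints the "1..N" header line read by the test harness.
sub plan {
    my %args = @_;
    print "1..$args{tests}\n";
}

# ok($value): passes if $value is true.
# ok($got, $expected): passes if both are undef or both are equal as strings.
sub ok {
    my $pass;
    if (@_ == 1) {
        $pass = $_[0] ? 1 : 0;
    } else {
        my ($got, $expected) = @_;
        if (!defined $got || !defined $expected) {
            $pass = (!defined $got && !defined $expected) ? 1 : 0;
        } else {
            $pass = ($got eq $expected) ? 1 : 0;
        }
    }
    $ntest++;
    my $status = $pass ? 'ok' : 'not ok';
    print "$status $ntest\n";
    return $pass;
}

# skip($why, @args): if $why is true, report a passing skipped test (using
# $why as the reason unless it is just 1); otherwise run ok(@args).
sub skip {
    my $why = shift;
    return ok(@_) unless $why;
    $ntest++;
    my $reason = ($why eq '1') ? '' : " $why";
    print "ok $ntest # skip$reason\n";
    return 1;
}

package main;
BioTestSketch->import;    # a real t/Test.pm would be loaded with 'use'

plan(tests => 3);
ok(1);                               # prints "ok 1"
ok('ACGT', 'ACGT');                  # prints "ok 2"
skip('no network connection', 0);    # prints "ok 3 # skip no network connection"
```

If the scripts only ever call these three functions, switching to the system module later should, as Hilmar suggests, come down to changing the use line at the top of each test script.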
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 20:00:55 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 15:00:55 -0500 (EST) Subject: [Bioperl-l] SplitFeature parsing

Does anyone want to help look at adding SplitLocation in SeqIO/FTHelper? We do this sub_SeqFeature addition with the code 'EXPAND', but it never gets reinterpreted when writing the feature table. Anyway, I suspect we chuck all this and go to a SplitLocation - right? All are invited to help here, but otherwise I'll go ahead and try and do it myself. I will first need to implement a Bio::Location::SplitLocation and then change the FTHelper code. Diving in soon...

Hoping we can make the end of the month goal. Hilmar, it might be helpful to recap the todo list by email so anyone who wants to join in knows what is left to do...

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Wed Jan 24 20:06:03 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 24 Jan 2001 20:06:03 +0000 (GMT) Subject: [Bioperl-l] SplitFeature parsing

On Wed, 24 Jan 2001, Jason Stajich wrote:

> Does anyone want to help look at adding SplitLocation in SeqIO/FTHelper? We do this sub_SeqFeature addition with the code 'EXPAND', but it never gets reinterpreted when writing the feature table. Anyway, I suspect we chuck all this and go to a SplitLocation - right? All are invited to help here, but otherwise I'll go ahead and try and do it myself. I will first need to implement a Bio::Location::SplitLocation and then change the FTHelper code. Diving in soon...

Go for it.
I can certainly review it. I still have RichSeqI stuff to do. Not enough time...

>
> Hoping we can make the end of the month goal. Hilmar, it might be helpful to recap the todo list by email so anyone who wants to join in knows what is left to do...
>
> -jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
-----------------------------------------------------------------


From lapp@gnf.org Wed Jan 24 20:11:48 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 12:11:48 -0800 Subject: [Bioperl-l] Test.pm does not ship with 5.004 patch level 4 Message-ID: <3A6F3704.5C90DFB@gnf.org>

Jason Stajich wrote:
>
> I meant to vote for rolling our own Test.pm with ok and skip methods.
>
> The other option is to go back to what we had, which was to define a method test() in every .t file. But it is sort of dumb to copy and paste that method into every .t file. We should at a minimum make one file that has the necessary method(s) - why not mimic Test.pm in this case with ok() and skip()...
>

That's exactly what I meant. Then the whole test code migration was not in vain, because in the end (whenever that is :) we simply exchange a use statement at the top of each test script to use the system-supplied Test.pm.

Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 24 20:43:57 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 15:43:57 -0500 (EST) Subject: [Bioperl-l] Bio::SeqIO::FTHelper

Looking at the FTHelper _parse_loc method.

Do we want our good buddy FTHelper to continue to create tag/value pairs for sub_Features to represent things like '_part_feature' and '_zero_width_feature'? Or are we happy with having Split/Fuzzy Locations handle this representation?

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Wed Jan 24 21:01:54 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 13:01:54 -0800 Subject: [Bioperl-l] Bio::SeqIO::FTHelper Message-ID: <3A6F42C2.3C1D65CC@gnf.org>

Jason Stajich wrote:
>
> Looking at the FTHelper _parse_loc method.
>
> Do we want our good buddy FTHelper to continue to create tag/value pairs for sub_Features to represent things like '_part_feature' and '_zero_width_feature'? Or are we happy with having Split/Fuzzy Locations handle this representation?
>

Maybe I'm missing something, but I think migrating semantics from undocumented tags to explicit types was one of the objectives. If it was really undocumented, there shouldn't be client code relying on those tags outside of the Bioperl core itself. In theory :)

People out there, if you have a client that relies on undocumented tags in a SeqFeature::Generic, please shout. Otherwise you can reckon that these tags will be gone. Please also shout if you have clients relying on documented tags pertaining to location and length (I recall having added some tags to the documentation in summer or fall last year, but hopefully no-one noticed :o)

Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From MEColosimo@alumni.carnegiemellon.edu Wed Jan 24 23:31:27 2001 From: MEColosimo@alumni.carnegiemellon.edu (Marc Colosimo) Date: Wed, 24 Jan 2001 18:31:27 -0500 Subject: [Bioperl-l] Re: [Bioperl-guts-l] RestrictionEnzyme.pm (Hilmar Lapp) References: <200101241703.f0OH3ip16053@pw600a.bioperl.org> Message-ID: <3A6F65CC.3FB627FA@alumni.carnegiemellon.edu>

> Paul-Christophe Varoutas wrote:
> >
> > Hi again,
> >
> > Yesterday night I started experimenting with RestrictionEnzyme.pm.
> >
> > I liked very much the '-MAKE' =>'custom' switch in the constructor but I think it would nevertheless be a good idea to write a public method which updates the enzyme list from the NEBASE site.
> >
> > I suggest writing a sub (let's call it update_list or update_RE_list) that:
> >
> > - goes to the NEBASE site and gets the latest version of the restriction enzyme list. We can choose between http/ftp and various types of lists/formats. My preference would be to go to their ftp site and get what they call "format 18": DNAStrider format, list of all commercially available enzymes. The file is ftp://ftp.nebase.com/pub/nebase/striderc.* (the extension of the file reflects the version).
> > - saves this list in a text file, in the Bio/Tools/ directory. An alternative is to update the enzyme list in the RestrictionEnzyme.pm file itself, at the beginning of the file, within the definition of the %RE hash, but intuitively I would not tend to recommend it, as I don't know if writing to a file at the same time it is being read by the perl interpreter will behave well in all operating systems. Tell me what you think about it.
>
> You normally can't write to Bio/Tools as a user (under Unix), and a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the following options.
> 1) Accept an additional (named!) parameter at initialization that denotes a file (in DNAStrider format?) containing the enzymes to be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known place within the Bio/ directory tree, and read (parse) that upon initialization of RestrictionEnzyme.pm. An update would mean updating that file.

I would like to point out that not all systems are as mean as Unix. Also, it would be nice to read them in and save them in the local space. That way the user can just tell it to use the one in his/her space, and they can have different ones (for whatever reason).

If you're going to the trouble of doing this, could you also add the ability to use multiple enzymes and/or list multiple enzymes?

>
> I'm not sure option 3) would have compelling advantages to the present layout. Options 1) and 2) are certainly worthwhile to pursue and in essence are almost identical, the only difference being how to open the stream containing the enzyme data. So, one could try to combine both into one parameter, and have the code figure out whether it's a file or a http/ftp URL.
>
> Hilmar

Marc


From jason@chg.mc.duke.edu Wed Jan 24 23:44:06 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 18:44:06 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split

I'd just like to reiterate - beware: bioperl-live is development code.

I added these handlers for Fuzzy and Split features. I decided to create methods start_fuzzy, end_fuzzy for Bio::Location::Fuzzy to handle whether or not we saw the <, > descriptors.
I probably need some more test cases to make sure we are really getting everything to work, but the t/SeqIO test (reading test.genbank, writing genbank.out) seems to work for most things except the variation feature type, which uses the operator 'replace'. We'll have to define that in the FTHelper model; I didn't plan for it. So the checked-in code will screw up the variation features, but everything else seems to work.

I'd like to do a better job detecting the feature location type from [.., ., ^] and using that to describe the Location object better, but we have the case of '<' and '>', which are technically fuzzy, so I'm not sure how I really want to store these types of locations.

Anyway, I'm not being very clear, so have a look. I know there are areas for improvement; you too can help us make this a robust parser....

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From jason@chg.mc.duke.edu Wed Jan 24 23:57:45 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 24 Jan 2001 18:57:45 -0500 (EST) Subject: [Bioperl-l] failing test (singular!)

So I don't owe beer in the Ensembl method of open source...

I am getting errors from t/LiveSeq.t because start/end are not defined when a new LiveSeq Exon is instantiated. I'll be sure to look at it Thursday when I get a chance. Apologies for somehow breaking that; it is not clear where the error is, but it has something to do with either the FTHelper changes or the way SeqFeatures get their start/end/strand information (my money is on this) with the new Location model.

Everything else seems to pass except for the occasional crapout on the NCBI website connection in the t/DB.t test.

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Thu Jan 25 00:22:40 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 16:22:40 -0800 Subject: [Bioperl-l] failing test (singular!)
References: Message-ID: <3A6F71D0.601FF22E@gnf.org>

Jason Stajich wrote:
> 
> So I don't owe beer in the Ensembl method of open source...
> I am getting errors from t/LiveSeq.t because start/end are not defined
> when a new LiveSeq Exon is instantiated. I'll be sure and look at it
> Thursday when I get a chance. Apologies for somehow breaking that, it is
> not clear where the error is, but it has something to do with either
> FTHelper changes or the way SeqFeatures get their start/end/strand
> information (my money is on this) with the new Location model.
> 
> Everything else seems to pass except for the occasional crapout on NCBI
> website connection in the t/DB.t test.
> 

Cool, Jason. I have the feeling 0.7 can become a release we can really be
proud of.

        Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lapp@gnf.org Thu Jan 25 00:36:29 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Wed, 24 Jan 2001 16:36:29 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A6F750D.367663F8@gnf.org>

Jason Stajich wrote:
> 
> I'd just like to reiterate - beware bioperl-live is development code.
> 
> I added these handlers for Fuzzy and Split features. I decided to create
> methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> or not we saw the <, > descriptors. I probably need some more test cases

I may have missed the obvious solution, but how are we going to
distinguish 'unknown start/end' and 'somewhere in between'? That is,
'<150' meaning 'before position 150', making it non-obvious how to
return a minimal start, and '120.130' meaning it's between two known
positions. Will I have to test fuzzy_start() before I'm allowed to
safely call min_start()? (no, I don't want to suggest exceptions ...
:O)

> to make sure we are really getting everything to work, but the test in
> t/SeqIO test.genbank in genbank.out seem to work for most things except
> the variation feature type which uses the operator 'replace'. We'll have
> to define that in the FTHelper model, I didn't plan for it.
> 

I'm not sure the 'replace' operator is still standard (i.e., allowed).
I seem to recall that it is no longer among the allowed operators, so
you might wish to double-check on NCBI's feature table grammar
definition.

        Hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Thu Jan 25 15:06:17 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 10:06:17 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: <3A6F750D.367663F8@gnf.org> Message-ID:

On Wed, 24 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> > 
> > I'd just like to reiterate - beware bioperl-live is development code.
> > 
> > I added these handlers for Fuzzy and Split features. I decided to create
> > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > or not we saw the <, > descriptors. I probably need some more test cases
> 
> I may have missed the obvious solution, but how are we going to
> distinguish 'unknown start/end' and 'somewhere in between'? That is,
> '<150' meaning 'before position 150', making it non-obvious how to
> return a minimal start, and '120.130' meaning it's between two known
> positions. Will I have to test fuzzy_start() before I'm allowed to
> safely call min_start()? (no, I don't want to suggest exceptions ...
> :O)

Hmm, perhaps I was confused. I thought Split Location would deal with
min_start/max_end.
I believe fuzzy can have 3 qualities, a fuzzy start
(<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
better word, suggestions welcome]. All 3 can be present in the same
location so they have to be independent operators. When you call
start, it will return what it thinks is the start but you'll have to
test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
$loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
exception here, but can be persuaded.

Feel free to suggest a better set of methods for this. Now I'm cheating
because I just added range_fuzzy this morning since I wanted to think
about that some more. Learning by doing....

Oh and I think I just messed up - I'm not handling the 3'/5' different
for the fuzziness, (< vs >). Will fix that by start_fuzzy/end_fuzzy
returning -1, 0, 1 meaning 5', not fuzzy, on 3'. Unless you think it
should return "<100" or "100>" instead?

> 
> > to make sure we are really getting everything to work, but the test in
> > t/SeqIO test.genbank in genbank.out seem to work for most things except
> > the variation feature type which uses the operator 'replace'. We'll have
> > to define that in the FTHelper model, I didn't plan for it.
> > 
> 
> I'm not sure the 'replace' operator is still standard (i.e., allowed).
> I seem to recall that it is no longer among the allowed operators, so
> you might wish to double-check on NCBI's feature table grammar
> definition.

okay, well, we'll have to think about whether or not we want to just
handle non-standard operators in a bulk way 'NonStandardLocation' which
stores a tag that describes the operator so that we can preserve the tag
name, or if we should build the flexibility some other way.
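A minimal Perl sketch of the -1/0/1 encoding proposed in this message for start_fuzzy/end_fuzzy, plus a separate flag for the 'A.B' ("fuzzy range") case. It assumes the feature-table convention that '<' and '>' precede the position they qualify (as in `<1..>10`); parse_endpoint and parse_location are hypothetical helpers, not the Bio::Location::Fuzzy API, and join(), complement() and '^' are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::Location::Fuzzy API): parse one
# endpoint into (position, fuzziness) using the -1/0/1 encoding:
#   '<N' -> -1 (open towards 5'),  'N' -> 0,  '>N' -> 1 (open towards 3')
sub parse_endpoint {
    my ($text) = @_;
    return ($1, -1) if $text =~ /^<(\d+)$/;
    return ($1,  1) if $text =~ /^>(\d+)$/;
    return ($1,  0) if $text =~ /^(\d+)$/;
    die "unrecognised endpoint '$text'\n";
}

# Hypothetical helper: parse 'S..E' (a range whose ends may be fuzzy)
# or 'A.B' (one base somewhere between A and B, the "fuzzy range" case).
sub parse_location {
    my ($text) = @_;
    if ($text =~ /^(\d+)\.(\d+)$/) {
        return { start => $1, end => $2,
                 start_fuzzy => 0, end_fuzzy => 0, range_fuzzy => 1 };
    }
    my ($s, $e) = $text =~ /^([<>]?\d+)\.\.([<>]?\d+)$/
        or die "unrecognised location '$text'\n";
    my ($start, $start_fuzzy) = parse_endpoint($s);
    my ($end,   $end_fuzzy)   = parse_endpoint($e);
    return { start => $start, end => $end,
             start_fuzzy => $start_fuzzy, end_fuzzy => $end_fuzzy,
             range_fuzzy => 0 };
}

# Usage: the caller checks the flags before trusting start/end.
my $loc = parse_location('<1..>10');
print "start is fuzzy\n" if $loc->{start_fuzzy} || $loc->{range_fuzzy};
```

Keeping the three flags independent mirrors the point above that all three kinds of fuzziness can occur in the same location.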
Clearly what is being output right now

     variation       2913^2913
                     /replace="g"

is relatively different from

     variation       replace(347,"c")

> 
> Hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Thu Jan 25 15:19:53 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 25 Jan 2001 15:19:53 +0000 (GMT) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:

On Thu, 25 Jan 2001, Jason Stajich wrote:

> On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> 
> > Jason Stajich wrote:
> > > 
> > > I'd just like to reiterate - beware bioperl-live is development code.
> > > 
> > > I added these handlers for Fuzzy and Split features. I decided to create
> > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > or not we saw the <, > descriptors. I probably need some more test cases
> > 
> > I may have missed the obvious solution, but how are we going to
> > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > '<150' meaning 'before position 150', making it non-obvious how to
> > return a minimal start, and '120.130' meaning it's between two known
> > positions. Will I have to test fuzzy_start() before I'm allowed to
> > safely call min_start()? (no, I don't want to suggest exceptions ...
> > :O)
> 
> Hmm, perhaps I was confused. I thought Split Location would deal with
> min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> better word, suggestions welcome].
> All 3 can be present in the same
> location so they have to be independent operators. When you call
> start, it will return what it thinks is the start but you'll have to
> test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> exception here, but can be persuaded.

In my experience it is crucial to treat
join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
as a class of FuzzyFeature.

The above syntax is the most used "fuzziness" and nearly everyone discards
the leading and trailing '<' '>' as it means "partial gene" with the
coordinates interpreted in a hard way.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From Shailesh L Mistry"

Hi All,

Here is the latest news about testing bioperl on WinNT :-

1) The BPbl2seq.t fails on test 7 because it wants the value to be 2e-053
   but it gets 2e-53. (not sure if this is worth pursuing).

2) The alarm function has not been fixed and so blast.t, html.t and
   SimilarityPair.t fail. A decision needs to be made about whether to
   avoid using it or to just put a switch in to detect for Win32.

3) Index.t still has a file handle bug in it (Bug 865), so it can't be
   checked any further.

4) There is an intermittent problem with gdb.t and liveseq.t, both of
   which are proving difficult to track down.

I hope this helps.

Shelly.

PS. My email is still stuffed so replies from me may be delayed.
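For items 1 and 2 above, a minimal Perl sketch under stated assumptions. The 2e-053 vs 2e-53 mismatch is only a difference in how the same number is printed, so a test can compare e-values numerically rather than as strings. For the timeout, it assumes (as item 2 reports) that alarm() cannot be relied on under Win32, and only arms it when `$^O` is not 'MSWin32'. same_evalue and run_with_timeout are hypothetical helper names, not part of the Bioperl test suite.

```perl
use strict;
use warnings;

# Item 1 (hypothetical helper): "2e-053" and "2e-53" are the same number
# printed differently, so compare e-values numerically, not as strings.
sub same_evalue {
    my ($got, $want) = @_;
    return $got == $want;
}

# Item 2 (hypothetical helper): only arm alarm() where it is assumed to
# work; on Win32 ($^O eq 'MSWin32') run the code without a timeout.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    return $code->() if $^O eq 'MSWin32';
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm $seconds;
    my @result = eval { $code->() };
    my $error = $@;
    alarm 0;                      # always disarm, even after a failure
    die $error if $error;
    return wantarray ? @result : $result[0];
}

print same_evalue('2e-053', '2e-53') ? "same\n" : "different\n";
```

The Win32 switch trades a missing timeout for a test that at least runs; skipping those tests on Win32 would be the other option the message mentions.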
From jason@chg.mc.duke.edu Thu Jan 25 16:27:39 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 11:27:39 -0500 (EST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:

On Thu, 25 Jan 2001, Ewan Birney wrote:

> On Thu, 25 Jan 2001, Jason Stajich wrote:
> 
> > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> > 
> > > Jason Stajich wrote:
> > > > 
> > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > > 
> > > > I added these handlers for Fuzzy and Split features. I decided to create
> > > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > or not we saw the <, > descriptors. I probably need some more test cases
> > > 
> > > I may have missed the obvious solution, but how are we going to
> > > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > > '<150' meaning 'before position 150', making it non-obvious how to
> > > return a minimal start, and '120.130' meaning it's between two known
> > > positions. Will I have to test fuzzy_start() before I'm allowed to
> > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > :O)
> > 
> > Hmm, perhaps I was confused. I thought Split Location would deal with
> > min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> > (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> > better word, suggestions welcome]. All 3 can be present in the same
> > location so they have to be independent operators. When you call
> > start, it will return what it thinks is the start but you'll have to
> > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> > exception here, but can be persuaded.
> 
> In my experience it is crucial to treat
> join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
> as a class of FuzzyFeature.
> 
> The above syntax is the most used "fuzziness" and nearly everyone discards
> the leading and trailing '<' '>' as it means "partial gene" with the
> coordinates interpreted in a hard way.

Okay I was interpreting this as a
SplitLocation with
3 LocationI objects
2 of which are Fuzzy Locations...

I just wasn't handling all the possible cases of

10..<100
10..100>
<10..100
10>..100

I consider this fuzzy -- since a start or end point is not well defined.
I also consider 5.12 fuzzy since its 'range' is not fuzzy.

> 
> 
> 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
> 
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From birney@ebi.ac.uk Thu Jan 25 16:38:21 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 25 Jan 2001 16:38:21 +0000 (GMT) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: Message-ID:

On Thu, 25 Jan 2001, Jason Stajich wrote:

> 
> On Thu, 25 Jan 2001, Ewan Birney wrote:
> 
> > On Thu, 25 Jan 2001, Jason Stajich wrote:
> > 
> > > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> > > 
> > > > Jason Stajich wrote:
> > > > > 
> > > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > > > 
> > > > > I added these handlers for Fuzzy and Split features. I decided to create
> > > > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > > or not we saw the <, > descriptors. I probably need some more test cases
> > > > 
> > > > I may have missed the obvious solution, but how are we going to
> > > > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > > > '<150' meaning 'before position 150', making it non-obvious how to
> > > > return a minimal start, and '120.130' meaning it's between two known
> > > > positions.
> > > > Will I have to test fuzzy_start() before I'm allowed to
> > > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > > :O)
> > > 
> > > Hmm, perhaps I was confused. I thought Split Location would deal with
> > > min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> > > (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> > > better word, suggestions welcome]. All 3 can be present in the same
> > > location so they have to be independent operators. When you call
> > > start, it will return what it thinks is the start but you'll have to
> > > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > > $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> > > exception here, but can be persuaded.
> > 
> > In my experience it is crucial to treat
> > join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
> > as a class of FuzzyFeature.
> > 
> > The above syntax is the most used "fuzziness" and nearly everyone discards
> > the leading and trailing '<' '>' as it means "partial gene" with the
> > coordinates interpreted in a hard way.
> 
> Okay I was interpreting this as a
> SplitLocation with
> 3 LocationI objects
> 2 of which are Fuzzy Locations...

Ok. This is a good solution here, but the trouble about this recursion is
that of course it allows SplitLocationI has-a SplitLocationI etc, which
now becomes

  (a) a nightmare to do anything with
  (b) impossible to represent in EMBL/GenBank
  (c) generally lots of rope to hang ourselves with

Two options - punt on these cases in the code... or pop in another
inheritance layer in the interfaces:

             LocationI
                 ^
                 |
        ------------------------
   SingleLocationI        SplitLocationI
        |                 sub_Locations defined to return SingleLocationI array
        |
   -----------------
SimpleLocationI   FuzzyLocationI


(does the above crappy ascii art make sense to you?)
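A minimal Perl sketch of the layered interfaces in the diagram above, assuming nested splits are flattened when a split location is constructed, so that sub_Locations can only ever return single locations. The package names follow the diagram, but these are hypothetical illustration classes, not the Bioperl modules, and every class is made instantiable for brevity.

```perl
use strict;
use warnings;

# Hypothetical classes named after the diagram; a sketch, not the
# Bioperl modules. Every class can be instantiated directly for brevity.
{
    package LocationI;
    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }
}
{ package SingleLocationI; our @ISA = ('LocationI'); }
{ package SimpleLocationI; our @ISA = ('SingleLocationI'); }
{ package FuzzyLocationI;  our @ISA = ('SingleLocationI'); }
{
    package SplitLocationI;
    our @ISA = ('LocationI');

    # Nested splits are flattened when the object is built, so
    # sub_Locations only ever returns SingleLocationI objects and a
    # split-inside-a-split never survives into the object tree.
    sub new {
        my ($class, @locations) = @_;
        my @flat = map { $_->isa('SplitLocationI') ? $_->sub_Locations : $_ }
                   @locations;
        return bless { locations => \@flat }, $class;
    }

    sub sub_Locations { return @{ $_[0]{locations} } }
}

my $split = SplitLocationI->new(
    SimpleLocationI->new(start => 10, end => 100),
    SplitLocationI->new(FuzzyLocationI->new(start => 200, end => 300),
                        SimpleLocationI->new(start => 400, end => 500)));
print scalar($split->sub_Locations), " sub-locations\n";   # prints "3 sub-locations"
```

Flattening at construction keeps the has-a chain one level deep, at the cost of discarding the original join(...join(...)) nesting, which matters if the entry has to be written back out exactly as read.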
I guess this says that all FuzzyLocations can be made as combination of a
single SplitLocation with a set of FuzzyLocations.

????

(ewan sighs again about fuzziness. It is just a can of worms that no one
needs and no one should use)

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Thu Jan 25 18:35:24 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 25 Jan 2001 10:35:24 -0800 Subject: [Bioperl-l] multiple blast script Message-ID: <3A7071EC.810AABA1@gmx.net>

To bring it to the right audience. Please do not post such things to
bioperl-guts-l, they will be ignored normally.

        Hilmar

-------- Original Message --------
Subject: [Bioperl-guts-l] multiple blast script
Date: Wed, 24 Jan 2001 13:22:50 -0600
From: Willy Valdivia
To: bioperl-guts-l@bioperl.org

Dear Group:

I am looking for a Perl script that may allow me to perform multiple
sequences alignment at once using BLAST.

Thank you,

Willy Valdivia Granda
Plant Sciences Dept
North Dakota State University
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-guts-l


From hlapp@gmx.net Thu Jan 25 18:38:59 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Thu, 25 Jan 2001 10:38:59 -0800 Subject: [Bioperl-l] RestrictionEnzyme.pm Message-ID: <3A7072C3.82932227@gmx.net>

To bring it to the right audience. Please note that despite some module
PODs still saying that bioperl-guts-l is for technical discussion, it is
not in fact. The guts-list is for CVS messages and similar stuff most
people never want to hear about.
        Hilmar

-------- Original Message --------
Subject: RE: [Bioperl-guts-l] RestrictionEnzyme.pm
Date: Wed, 24 Jan 2001 14:00:30 +0100
From: "Paul-Christophe Varoutas"
To:

You *are* right about not writing to the Bio/Tools directory, I guess I
was rather sleepy when I wrote my previous mail %-P. And using my Win2000
and Linux as root all the time doesn't help things either ;-).

It is a good idea to incorporate the RE list update in the object
constructor, and combining Hilmar's options (1) and (2) seems great
because it's a flexible solution and should suit most needs. For the URL
retrieval, I guess http will be more suitable; I will contact NEBASE to be
sure that the URL we decide to use will remain stable.

This solution raises a small question: that of multiple occurrences. The
fact that we are using hashes will take care of eliminating multiple
occurrences of enzymes (one from the hard-coded collection, one from the
file / URL). Perhaps a minor issue would be to decide whether we just
"let perl do the work" or if we do verifications while replacements are
done, and/or define how they are done. We can make the assumption that,
say, AatII always has the same recognition site, but if I make an issue
out of this, it is because I don't know yet how this module is being used,
and especially if it is only used for what it was initially designed for.
Do you know if there are users out there using this module in an
unorthodox way, defining enzyme names/recognition sequences that don't
exist, but could risk creating conflicts/unusual behavior?

Another issue is enzymes cutting asymmetrically. For the moment the other
RestrictionEnzyme methods don't know how to deal with them (as far as I
understood), so the code will just ignore them while parsing the RE list
file.

One remaining question is about the RE list file format: is the DNAStrider
format OK for everybody, or is there another suggestion?
An alternative would be to contact NEBASE and ask them to add a new
'bioperl' format to their database, and then define a format that
minimizes parsing and best suits our needs. On their web site they say:
"As REBASE expands, new data formats are provided. Requests for
specialized formats are welcome, as we are prepared to support each major
sequence analysis package". (The URL is:
http://rebase.neb.com/rebase/rebase.serv.html )

So what do you think about this idea?

> Do you already have a CVS write account?

I have already successfully anonymously CVSed from my home PC (under
Win2000 and linux), but I don't have a write account yet. I will contact
Ewan / Chris about that.

Paul-Christophe

> You normally can't write to Bio/Tools as a user (under Unix), and
> a user client shouldn't attempt to do so under any circumstances.
> Regarding the ability to update the list of known REs, I see the
> following options.
> 1) Accept an additional (named!) parameter at initialization that
> denotes a file (in DNAStrider format?) containing the enzymes to
> be known in addition to a collection of hard-coded enzymes.
> 2) Same as before, but the parameter denotes a URL from where to
> obtain this file.
> 3) Put all hard-coded enzymes into a file that resides at a known
> place within the Bio/ directory tree, and read (parse) that upon
> initialization of RestrictionEnzyme.pm. An update would mean
> updating that file.
> 
> I'm not sure option 3) would have compelling advantages to the
> present layout. Options 1) and 2) are certainly worthwhile to
> pursue and in essence are almost identical, the only difference
> being how to open the stream containing the enzyme data. So, one
> could try to combine both into one parameter, and have the code
> figure out whether it's a file or a http/ftp URL.
> 
> Hilmar
> 
> Do you already have a CVS write account?
> > 
> > - if the enzyme list is saved in a separate file, I will also modify the
> > initialisation of the %RE hash, with code that reads and parses the
> > enzyme list file.
> > 
> > If this sounds OK to you, I will write it this weekend and submit it. Of
> > course if you had something completely different in mind please say it,
> > I will try to adapt to it.
> > 
> > Paul-Christophe
> > 
> 
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> 

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-guts-l


From lapp@gnf.org Thu Jan 25 21:10:23 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 13:10:23 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A70963F.34F10F@gnf.org>

First, I think it is better to bring this back to the list, because users
*will* be affected by the final design and implementation (i.e., Mark &
David & others, watch out, don't complain afterwards).

Jason Stajich wrote:
> 
> So that I have really clearly solved this -
> please correct me if any of the following statements is false. ( N is a
> location point)
> 
> - start/end can be fuzzy at both points and it could be
> (on 3') at either start/end point. However, N< and >N are invalid fuzzy
> point descriptions.
> If they are indeed true then my start_fuzzy will
> need to be more than just (-1, 0, 1) -- (5', not fuzzy, 3') but 5
> points (5' before, 5' after, 0, 3' before, 3' after) and I really don't
> even know what that would mean since I would be so wrapped up in strand
> coordinates - would think a 'complement' would simplify it ( no, not a
> pat on the back, that's when we get to the release)
> 
> - in plain simple genbank/embl terms
> <5..12> and <5.12>
> are valid, but
> >5..12, 5<..12, 5..12<, 5..>12
> are invalid.

The GenBank documentation is somewhat inconsistent here. Let me quote:

From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB

  If the "<" symbol precedes a base span, the sequence is partial on the
  5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the
  sequence is partial on the 3' end (e.g., CDS 435..915>).

From http://www.ncbi.nlm.nih.gov/collab/FT/index.html

     CDS             <1..>336
                     /codon_start=1
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"
     V_region        <1..>336
                     /gene="IGHV1"
                     /product="immunoglobulin heavy chain variable region"

From the BNF grammar definition of the feature table, to be found at
http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur

  local_location ::= | |
  base_position ::= | | |
  low_base_bound ::= >
  high_base_bound ::= <
  two_base_bound ::= .
  between_position ::= ^
  base_range ::= ..

The sample record link seems to be pretty new, but I'm not sure. Shall we
simply build upon the BNF? Maybe we should ask someone from NCBI.

> 
> Questions:
> 1. Do we need to override the famous pocock RangeI contains/overlaps
> methods for a Split location to take into account where the pieces
> of the contained LocationI are?
> Or do we take the easy route and just use min_start/max_end? I think
> that right now start/end return 0 for a split location since they are
> not explicitly set, should they default to delegating to
> min_start/max_start? I think so.
> 
> What about in Fuzzy, do we want to throw exceptions or do we just use
> the best information we have and do some logic and coordinate
> gymnastics to try and return a reasonable value or else throw an
> exception?
> 

As I understood the comments from users, exceptions should be avoided
here whenever possible. However, since there are different policies one
can think of, a mechanism should be provided to switch between them.

> 2. Deep Split/Fuzziness - [copying famous artwork from Ewan's latest
> email]
> 
>              LocationI
>                  ^
>                  |
>         ------------------------
>    SingleLocationI        SplitLocationI
>         |                 sub_Locations defined to return SingleLocationI array
>         |
>    -----------------
> SimpleLocationI   FuzzyLocationI
> 
> 
> (does the above crappy ascii art make sense to you?)
> 
> I guess this says that all FuzzyLocations can be made as combination of
> a single SplitLocation with a set of FuzzyLocations.
> 
> [ end Ewan's included message ]
> 
> This is exactly what I have assumed. I see SplitLocation as simply a
> Collection of LocationI objects some of which may be fuzzy. The only
> problem is how to define min_start/max_end for a
> SplitLocation when the beginning and end of the locations are fuzzy?
> 
> As for deep SplitLocation (ie SplitLocation containing Location objects
> that are SplitLocations), this will work in a very gross way just like
> perl flattens arrays, except I don't plan to simplify the join(...join())
> code into a single join() unless you guys think it's worth it. It wouldn't
> be hard, just let perl collapse the arrays...
> 

Be aware that you don't lose information you need for recovering the
original location entry upon writing. If that seems to inflate the object
tree unnecessarily, we can also store the original location string as a
property. Not beautiful, but KISS is not a bad principle.

> Any other problems you guys can think of.
> 
> So close... I wonder if we should include Alan on this so we can see if
> the biocorba IDL will really handle all of this now? I guess I could
I guess I could To my understanding BioCorba and BioPerl pretty much affect each other, don't they? If so, we should definitely get a comment from him. Hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp@gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From lapp@gnf.org Thu Jan 25 21:18:44 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 13:18:44 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> Message-ID: <3A709834.64E33D50@gnf.org> Bringing this (in part) back to the list, too. Matthew Pocock wrote: > > Jason Stajich wrote: > > Questions: > > 1. Do we need to override the famous pocock RangeI contains/overlaps > > methods for a Split location to take into account where the pieces > > of the contained LocationI are? > > Or do we take the easy route and just use min_start/max_end? I think > > that right now start/end return 0 for a split location since they are > > not explictly set, should they default to delegating to > > min_start/max_start? I think so. > > > Originaly the BioJava Locations just used min/max for all location > operators - this turned out to be a *very bad thing* under most > conditions. You are better off having operators that use split locations > return split locations - also, the union of two ranges that don't > overlap is the split location containing both ranges. It is more work to > set up, but it pays off & if you don't do it you get confusing bugs later. > > > What about in Fuzzy, do we want to throw exceptions or do we just use > > the best information we have and do some logic and coordinate > > gymnastics to try and return a reasonable value or else throw an > > exception? 
> My gut says to return the inner-most coordinate that is known but
> provide API to get the full fuzzy coordinates out - so
> 
> full loc            -> start..end : minStart..maxEnd
> <50..100>           -> 50..100    : -INF..+INF
> (78.90)..(100.107)  -> 90..100    : 78..107
> 

I think I am much more in favor of returning the outer-most
coordinates as the default policy. David, Mark?

I'm also not sure
whether INF or NaN are good return values in perl (i.e., can you test for
INF or NaN by numeric comparison? I figured that e.g. you can't obtain NaN
by sqrt(-1), as would be the result in C).

        Hilmar

> > 
> > As for deep SplitLocation (ie SplitLocation containing Location objects
> > that are SplitLocations), this will work in a very gross way just like
> > perl flattens arrays, except I don't plan to simplify the join(...join())
> > code into a single join() unless you guys think it's worth it. It wouldn't
> > be hard, just let perl collapse the arrays...
> Should work - there is the pathological case where an index is included
> via two paths. CompoundLocation in BioJava does all the collapsing at
> constructor time. All our Location objects are immutable, so once
> constructed, you can't change their contained indexes in any way. The
> hierarchy of Location containment is never exposed to the user - we may
> have to expose it if we provide a full fuzzy-location editor, though.
> Now I come to think of it, I have seen Embl CDS entries with internal
> exons that have < or > operators on them. Pants.
> 

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Thu Jan 25 21:48:47 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Thu, 25 Jan 2001 15:48:47 -0600 (CST) Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split In-Reply-To: <3A709834.64E33D50@gnf.org> Message-ID:

This is my Modest Proposal for resolving some things:

If there is a defined location, return the defined location, ie:

    my $start=$feature->start;

$start equals 42.

IF there is not a hard location (it is fuzzy, split, whatever), return the
location object, and let the client suss out what it wants to do with it.

    my $start=$feature->start;
    if (ref($start) && $start->isa('Bio::LocationI')) { # a location object, not a number
        $start=myLocParser($start);
    }

Then $start could be made to be min_start, max_start,
an_array_of_start_values, or whatever was convenient for the client.

$0.02 Cdn is pretty cheap nowadays.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From mdalphin@amgen.com Thu Jan 25 21:51:03 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Thu, 25 Jan 2001 13:51:03 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A70963F.34F10F@gnf.org> Message-ID: <3A709FC7.84F164F9@amgen.com>

Hilmar Lapp wrote:
> > - in plain simple genbank/embl terms
> > <5..12> and <5.12>
> > are valid, but
> > >5..12, 5<..12, 5..12<, 5..>12
> > are invalid.
> 
> The GenBank documentation is somewhat inconsistent here. Let me quote:
> 
> From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
> 
> If the "<" symbol precedes a base span, the sequence is partial on the
> 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the
> sequence is partial on the 3' end (e.g., CDS 435..915>).
> 
> From http://www.ncbi.nlm.nih.gov/collab/FT/index.html
> 
>      CDS             <1..>336
>                      /codon_start=1
>                      /gene="IGHV1"
>                      /product="immunoglobulin heavy chain variable region"
>      V_region        <1..>336
>                      /gene="IGHV1"
>                      /product="immunoglobulin heavy chain variable region"
> 
> From the BNF grammar definition of the feature table, to be found at
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur
> 
> local_location ::= | |
> base_position ::= | | |
> low_base_bound ::= >
> high_base_bound ::= <
> two_base_bound ::= .
> between_position ::= ^
> base_range ::= ..
> 

I just looked for an example at NCBI and found this:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=234355&dopt=GenBank

As you can see, the symbol '>' does end up BEFORE the position it is
modifying, which is consistent with the BNF.

Hope this helps...

LOCUS       S52564                    10 bp    DNA             PRI       05-APR-1999
DEFINITION  Homo sapiens phenylalanine hydroxylase (PAH) gene, partial cds.
ACCESSION   S52564
VERSION     S52564.1  GI:234355
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /note="missense mutation"
                     /codon_start=2
                     /product="phenylalanine hydroxylase"
                     /protein_id="AAD14912.2"
                     /db_xref="GI:4559419"
                     /translation="HGV"
     variation       5..7
                     /gene="PAH"
                     /note="Gly for Glu221"
BASE COUNT        3 a      2 c      3 g      2 t
ORIGIN
        1 ccatggagta
//

Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From mwilkinson@gene.pbi.nrc.ca Thu Jan 25 21:44:36 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Thu, 25 Jan 2001 15:44:36 -0600 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> <3A709834.64E33D50@gnf.org> Message-ID: <3A709E44.7B69B527@gene.pbi.nrc.ca>
Hilmar Lapp wrote:

> > full loc            -> start..end : minStart..maxEnd
> > <50..100>           -> 50..100    : -INF..+INF
> > (78.90)..(100.107)  -> 90..100    : 78..107
>
> I think I am much more in favor of returning the outer-most
> coordinates as the default policy. David, Mark?

In my gut I would also favour outer-most, only because, even with a simple
scan of the data, you are able to say "there's something there" or not.
However, the phrase "$Feature->start/stop returns the outer-most start/stop
positions unless either is undefined in which case that one (or both) return
the minimum" gives me the shivers! Still, this is more of a problem for
unsophisticated parsers, which presumably will be asking unsophisticated
questions; what will be most important for them (I think) is to be given
the coordinates which span the maximum "secure" region. So, yes, I agree
that outermost is preferable to innermost.

> whether INF or NaN are good return values in perl

YUCK! Please don't go there... Perhaps returning undef in a call to
maxStart or maxEnd would be better... it functions nicely in testing
statements.
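The undef suggestion and Hilmar's inner/outer table can be combined in a few lines of plain Perl. This is a minimal sketch, not a Bioperl method. `position_bounds` and `inner_outer` are hypothetical helpers, and the sketch assumes the BNF style where `>` is written before the end position (`<50..>100`). Each end of a location is turned into the smallest and largest position it could denote. The inner-most span is then (largest possible start, smallest possible end), and the outer-most span is (smallest possible start, largest possible end). An open side comes back as undef rather than INF or NaN.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl method): return the (min, max) range of
# positions one end of a location could denote. undef marks an open side,
# which callers can check with defined() instead of comparing against INF.
sub position_bounds {
    my ($pos) = @_;
    return ($1, $1)    if $pos =~ /^(\d+)$/;             # exact:  42
    return ($1, $2)    if $pos =~ /^\((\d+)\.(\d+)\)$/;  # within: (78.90)
    return (undef, $1) if $pos =~ /^<(\d+)$/;            # before: <50
    return ($1, undef) if $pos =~ /^>(\d+)$/;            # after:  >100
    die "unrecognised position '$pos'\n";
}

# Split "start..end" and report the inner-most (certain) and outer-most
# (possible) coordinates, in the spirit of the start..end : minStart..maxEnd table.
sub inner_outer {
    my ($loc) = @_;
    my ($start, $end) = split /\.\./, $loc, 2;
    die "not a start..end range: '$loc'\n" unless defined $end;
    my ($smin, $smax) = position_bounds($start);
    my ($emin, $emax) = position_bounds($end);
    return {
        inner => [ $smax, $emin ],   # narrowest span the entry guarantees
        outer => [ $smin, $emax ],   # widest span the entry allows
    };
}
```

With these definitions, `inner_outer('(78.90)..(100.107)')` gives inner `[90, 100]` and outer `[78, 107]`, matching the quoted table. For `<50..>100` the inner span is `[50, 100]` and both outer ends are undef.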
[[ Dave just told me he would prefer to return a Location object in a call
to Feature->start that needed to return a fuzzy value, and let the parser
choke on the resulting errors :-)  Although this is nice OO Perl, I doubt
that most existing parsers (or their authors) would be very happy with that
solution! ]]

-- 
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From mwilkinson@gene.pbi.nrc.ca Thu Jan 25 21:46:53 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Thu, 25 Jan 2001 15:46:53 -0600 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: Message-ID: <3A709ECD.5B0EA636@gene.pbi.nrc.ca>
David Block wrote:

> $0.02 Cdn is pretty cheap nowadays.

Your head will be worth more than that if we go that route... ;-)

M

-- 
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From mdalphin@amgen.com Thu Jan 25 22:06:48 2001 From: mdalphin@amgen.com (Mark Dalphin) Date: Thu, 25 Jan 2001 14:06:48 -0800 Subject: [Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split References: <3A706F7F.6020908@sanger.ac.uk> <3A709834.64E33D50@gnf.org> Message-ID: <3A70A377.C99B79B1@amgen.com>
Hilmar Lapp wrote:
>
> > > What about in Fuzzy, do we want to throw exceptions or do we just use
> > > the best information we have and do some logic and coordinate
> > > gymnastics to try and return a reasonable value or else throw an
> > > exception?
> > My gut says to return the inner-most coordinate that is known but
> > provide API to get the full fuzzy coordinates out - so
> >
> > full loc            -> start..end : minStart..maxEnd
> > <50..100>           -> 50..100    : -INF..+INF
> > (78.90)..(100.107)  -> 90..100    : 78..107
> >
>
> I think I am much more in favor of returning the outer-most
> coordinates as the default policy. David, Mark?
> I'm also not sure
> whether INF or NaN are good return values in perl (i.e., can you test
> for INF or NaN by numeric comparison? I figured that e.g. you can't
> obtain NaN by sqrt(-1), as would be the result in C).
>
>         Hilmar

My inclination is also to select the outer-most ranges for the defined
regions. I understand the reason for selecting the "certainty" of the inner
ranges, but most of the biologists here (it seems to me...) would rather
have "weak data showing some potential" than "more certain data which risks
missing something". This is a philosophical issue that involves many
end-users. I would end up writing it to take the outer-most to please my
customers, but I am not sure that it doesn't just give them more noise to
wade through.

For the uncertain edges, ie '<' and '>', I am not certain how best to handle
them in Perl. There are really several cases here:

1) The most common in GenBank, I believe, is where you just don't have more
   sequence, so you end up with:

       CDS   <1..>$Seq_Len

   Here we are saying that we don't even have sequence to go with.
   Displaying it is not really a problem, usually.

2) An uglier problem is when a gene-prediction program predicts an initial
   "exon". This "exon" is really only part of an exon, as the program only
   predicts coding sequence and ignores the 5'-UTR. This might lead to:

       exon  <105..300
       CDS   join(105..300, 405..1004)

   Here we have the upstream sequence (5'UTR) and know it extends directly
   upstream of position 105, but we don't really know where.

I don't really know what to do with these. I think the best we can do is
indicate it with a flag, similar to '<' or '>', whether we are drawing a
picture or trying to extract "interesting" sequence from a genomic fragment.
I don't think returning NAN or INF is correct; we have an uncertainty, but
we certainly don't have INF or even NAN. We need to pass on this
"uncertainty" to the calling program for it to express to the user.
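The flag approach described above can be sketched in plain Perl for the two cases quoted (`<1..>336` and `<105..300`). This is a minimal sketch under stated assumptions, not a Bioperl API. `parse_partial_range` is hypothetical, and it only accepts the BNF form where the symbol is written before the position. Within-positions, `^`, `join()`, `order()` and `complement()` are deliberately rejected. The coordinates stay plain integers, and the uncertainty travels in separate flags for the caller to display.

```perl
use strict;
use warnings;

# Hypothetical sketch (not a Bioperl API): split a simple GenBank range such
# as "<1..>336" or "<105..300" into integer coordinates plus flags that say
# which ends are partial, instead of encoding the uncertainty as INF/NaN.
# Only the BNF form (symbol written before the position) is handled.
sub parse_partial_range {
    my ($loc) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or die "unsupported location '$loc'\n";
    return {
        start         => $start + 0,
        end           => $end + 0,
        partial_start => $lt eq '<' ? 1 : 0,   # feature may begin before start
        partial_end   => $gt eq '>' ? 1 : 0,   # feature may end after end
    };
}
```

A drawing routine could then, for example, give the box a ragged edge wherever `partial_start` or `partial_end` is set, and an extraction routine could warn that the returned sequence may be incomplete at that end.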
Mark
-- 
Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)


From jason@chg.mc.duke.edu Thu Jan 25 23:17:27 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Thu, 25 Jan 2001 18:17:27 -0500 (EST) Subject: [Bioperl-l] test suite upgrade Message-ID:
A smarter test suite and some code corrections to make perl 5.00404 happy
have been checked in.

Please check out the live version and give it a try.

Notes:

- This involved copying Test.pm (version 1.15, I believe), removing the line
  that required Test::Harness of a certain version (which won't install on
  5.00404), and doing some fun "use lib" stuff in the BEGIN block of each
  test. Unfortunately my first hope of just pushing the 't' dir on the @INC
  stack did not work under 5.00404 - it was not being recognized. I'm not
  sure if that was not available in earlier versions of perl or what. At any
  rate it was solved by our good friend eval...

  BEGIN {
      # to handle systems with no installed Test module
      # we include the t dir (where a copy of Test.pm is located)
      # as a fallback
      eval { require Test; };
      if( $@ ) {
          use lib 't';
      }
      use Test;
      plan tests => 35 }

  All new test modules should follow this format or they won't be able to
  use Test.pm on platforms where Test.pm is not installed.

- The LiveSeq test is still not working, but that's probably because I
  haven't really dug much to find out why it is failing.

- I get strange errors (the ever cryptic 'dubious' message) in 5.00404 when
  exit is called in the BEGIN block, which is necessary for tests where not
  all of the necessary modules are installed on the system.

- Things to deal with platform compatibility have not been addressed (alarm
  still called, index.t). I tried to work on the Index.t problem but didn't
  get the general solution to work (because we can't depend on File::Spec to
  be installed, grrr), so I will probably have to rely on the fix suggested
  by Shailesh.
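One subtlety in the BEGIN block above: `use lib 't'` and `use Test` are compile-time statements. Both take effect while the BEGIN block is being compiled, before the `eval` ever runs, so the `if( $@ )` does not actually gate anything. 't' is always put at the front of @INC, and the bundled t/Test.pm is always the copy that gets loaded. That still works. If the intent is to fall back only when Test is missing, a run-time version might look like this. It is a sketch that has not been tried on 5.004_04, and the plan count of 35 is simply carried over.

```perl
use strict;
use warnings;

BEGIN {
    # Prefer an installed Test module; only if loading it fails, fall back
    # to the bundled copy in t/. Nothing below is a compile-time "use", so
    # the fallback really is conditional.
    eval { require Test; };
    if ($@) {
        unshift @INC, 't';    # run-time stand-in for "use lib 't'"
        require Test;         # dies with a clear message if t/ lacks it too
    }
    Test->import;             # exports plan(), ok(), ... into this package
    plan(tests => 35);        # parentheses: plan() was not known at compile time
}
```

Because the BEGIN block finishes before the rest of the file is compiled, later `ok(...)` calls in the test still see the imported subroutines.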
-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From lapp@gnf.org Fri Jan 26 00:42:13 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Thu, 25 Jan 2001 16:42:13 -0800 Subject: [Bioperl-l] test suite upgrade References: Message-ID: <3A70C7E5.1DD88031@gnf.org>
Jason Stajich wrote:
>       use lib 't';

Isn't it required to append a slash to the directory name? I thought I read
about that, but right now I can't verify it in the lib POD (there is no
mention of a trailing slash being either required or forbidden).

Does anyone know for sure? Could it even be that Perl is smart here and
allows both?

        Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                                email: lapp@gnf.org
GNF, San Diego, Ca. 92121                  phone: +1-858-812-1757
-------------------------------------------------------------


From birney@ebi.ac.uk Fri Jan 26 09:46:28 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 26 Jan 2001 09:46:28 +0000 (GMT) Subject: [Bioperl-l] test suite upgrade In-Reply-To: Message-ID:
On Thu, 25 Jan 2001, Jason Stajich wrote:

> A Smarter test suite and some code corrections to make perl 5.00404 happy
> have been checked in.
>
> Please checkout the live version and give it a try.
>
> Notes:
>
> - This involved copying Test.pm version 1.15 I believe, removing the line
>   that required Test::Harness of a certain version (which won't install on
>   5.00404) and doing some fun use lib stuff in the BEGIN block of t test.
>   Unfortunately my first hope of just pushing the 't' dir on the @INC
>   stack did not work under 5.00404 - it was not being recognized. I'm not
>   sure if that was not available in earlier versions of perl or what. At
>   ant rate it was solved by our good friend eval...

Jason is becoming THE MAN for this release. I'll check this out and report
back. Awesome Jason!
>
>   BEGIN {
>       # to handle systems with no installed Test module
>       # we include the t dir (where a copy of Test.pm is located)
>       # as a fallback
>       eval { require Test; };
>       if( $@ ) {
>           use lib 't';
>       }
>       use Test;
>       plan tests => 35 }
>
>   All new test modules should follow this format or they won't be able to
>   use Test.pm on platforms with Test.pm not installed.
>
> - The LiveSeq test is still not working, but that's probably because I
>   haven't really dug much to find out why it is failing.
>
> - I get strange errors (the ever cryptic 'dubious' message) in 5.00404
>   when exit is called in the BEGIN block which is necessary for tests
>   where all the necessary modules are not installed on the system.
>

I've worked around this one before. I'll see what I can do here...

> - Things to deal with platform compatibility have not been addressed
>   (alarm still called,index.t). I tried to work on the Index.t problem
>   but didn't get the general solution to work (because we can't depend on
>   File::Spec to be installed,grrr) so I will have to probably rely on the
>   suggested fix by Shailesh.
>
> -Jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Fri Jan 26 16:09:15 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Fri, 26 Jan 2001 10:09:15 -0600 (CST) Subject: [Bioperl-l] test suite upgrade In-Reply-To: <3A70C7E5.1DD88031@gnf.org> Message-ID:
On Thu, 25 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >       use lib 't';
>
> Isn't it required to append a slash to the directory name?
> I thought I
> read about that, but right now can't verify in the lib POD (there is
> no notion about a trailing slash requirement or absence requirement).
>
> Does anyone know for sure? Could it even be that Perl is smart here
> and allows both?

It does, in my experience.

>
>         Hilmar
>

-- 
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From jason@chg.mc.duke.edu Fri Jan 26 23:12:05 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Fri, 26 Jan 2001 18:12:05 -0500 (EST) Subject: [Bioperl-l] more fuzziness checked in Message-ID:
More robust fuzzy and split feature handling has been checked in.

FTHelper will check whether start == end; if so, and there is no split
location delimiter, the code will return just a single number representing
the location, ie

     variation       500
                     /allele="C"
                     /allele="T"

Bio::Location::Split - added method 'splittype' to capture more directives
than just 'join', ie 'order'. This is then called in FTHelper when
constituting a feature table for output.

Bio::Location::Fuzzy - renamed methods to fuzzy_start, fuzzy_end and
fuzzy_range, which return strings representing the fuzzy points and the
range type (. or ^). These methods validate a string to be sure it is a
valid type of fuzzy location or range delimiter and then store it literally
so it can be returned later. I also added methods called _fuzzypointencode
and _fuzzyrangeencode, which return integers intended to represent the type
of fuzzy location; this doesn't put any burden on the parser to interpret
that <3..12 means starting on the 5', etc.

All of these methods were added to the interface
Bio::Location::FuzzyLocationI. So FTHelper just calls fuzzy_start and
fuzzy_end to get the end points (a sane number is returned if the point is
not fuzzy).
start/end were overridden by Location::Fuzzy to pass their values to
fuzzy_start/end if they were indeed fuzzy, and to parse out the basic
integer to store in start. This means length will return something even if
it is not technically correct, ie for <3..12 length() will return 9.

Enjoy...

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Sun Jan 28 07:26:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sat, 27 Jan 2001 23:26:44 -0800 Subject: [Bioperl-l] Bio::PrimarySeq Message-ID: <3A73C9B4.7147B41A@gmx.net>
Is there any particular reason that length() and subseq() in
Bio::PrimarySeq obtain the sequence string by accessing the hash directly
instead of calling seq()? This is potentially dangerous if a derived object
overrides seq().

If there's no real reason I'll fix it.

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 10:04:09 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 10:04:09 +0000 (GMT) Subject: [Bioperl-l] Bio::PrimarySeq In-Reply-To: <3A73C9B4.7147B41A@gmx.net> Message-ID:
On Sat, 27 Jan 2001, Hilmar Lapp wrote:

> Is there any particular reason that length() and subseq() in
> Bio::PrimarySeq obtain the sequence string by a direct access of
> the hash instead of calling seq()? This is potentially dangerous
> if a derived object overrides seq().

Not that I know of...

>
> If there's no real reason I'll fix it.
>
>         Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 10:23:17 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 02:23:17 -0800 Subject: [Bioperl-l] Empty Seqs Message-ID: <3A73F315.50AA1F74@gmx.net>
I added the possibility to create empty sequences to Bio::PrimarySeq (and
thereby Bio::Seq), and support for reading and writing empty sequences in
fasta format to Bio::SeqIO. Entries with an empty line following the
description line are supported, as are those without the additional empty
line.

Note that if you initialize an explicitly empty sequence you MUST provide
the -moltype parameter. The reason is that a sequence must have a moltype,
and for an empty sequence it cannot be guessed (which it otherwise is).

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 10:24:49 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 02:24:49 -0800 Subject: [Bioperl-l] Bio::PrimarySeq References: Message-ID: <3A73F371.8706C95E@gmx.net>
Ewan Birney wrote:
>
> > Is there any particular reason that length() and subseq() in
> > Bio::PrimarySeq obtain the sequence string by a direct access of
> > the hash instead of calling seq()? This is potentially dangerous
> > if a derived object overrides seq().
>
> Not that I know of...
>

Okay. I fixed it.
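Why direct hash access is fragile can be shown with two hypothetical packages. This is a sketch, not the real Bio::PrimarySeq code. `My::PrimarySeq` and `My::MaskedSeq` are made-up names, and the subclass's `seq()` (which prepends a masked region) is only an illustration. A subclass that overrides `seq()` gets a stale answer from any method that reads `$self->{seq}` directly. Methods routed through the accessor stay consistent.

```perl
use strict;
use warnings;

package My::PrimarySeq;    # hypothetical stand-in, not Bio::PrimarySeq

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}
sub seq { return $_[0]->{seq} }

# Fragile: reads the hash slot directly, ignoring any overridden seq().
sub length_direct { return CORE::length($_[0]->{seq}) }

# Robust: goes through the accessor, so subclasses are honoured.
sub length { return CORE::length($_[0]->seq) }

sub subseq {
    my ($self, $start, $end) = @_;    # 1-based, inclusive coordinates
    return substr($self->seq, $start - 1, $end - $start + 1);
}

package My::MaskedSeq;     # hypothetical subclass that derives its sequence
our @ISA = ('My::PrimarySeq');
sub seq { return 'NNNN' . $_[0]->{seq} }

package main;
my $s = My::MaskedSeq->new(-seq => 'ACGT');
print $s->length_direct, ' ', $s->length, ' ', $s->subseq(4, 6), "\n";
# prints "4 8 NAC": only the accessor-based methods see the derived sequence
```

Because the package defines its own `length` method, the built-in is called as `CORE::length` throughout. Otherwise Perl would report an "Ambiguous call resolved as CORE::length()" warning.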
        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 14:29:52 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 14:29:52 +0000 (GMT) Subject: [Bioperl-l] all tests but LiveSeq.t pass Message-ID:
All tests but LiveSeq.t pass. Jason - I am going to start looking at your
sensational Location stuff to give it another pair of eyes over the code.

I still need to look at getting RichSeq or something similar in...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Sun Jan 28 20:34:36 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Sun, 28 Jan 2001 12:34:36 -0800 Subject: [Bioperl-l] flexible warning/exception in SeqIO Message-ID: <3A74825C.E024BA97@gmx.net>
This is on our tasklist. To reiterate the background briefly: we had a
discussion a while ago that many applications would rather lose an entry of
a databank file, or a feature of an entry, than choke on an exception being
thrown. Such exceptions are caused by entries which are either misformatted
or contain syntax not yet understood by BioPerl (though there will be
significantly fewer of these due to the new location model).

The conclusion was that we want to have some flexibility on the client
side: the client can turn such incidents into exceptions if he/she wants
to, but the default would be to only warn.

I'm not sure, but as I understood the changes to RootI, every object has
the ability to turn warn() into throw() by saying $obj->verbose(2). Is that
right, and if so, do people agree that this fulfills the requirements for
SeqIO warn/throw flexibility (which implies that the SeqIO code only
warn()s)?
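The policy under discussion is to warn by default and escalate to an exception only when the client raises the verbosity. It can be sketched in plain Perl. The package below is hypothetical and only mimics the idea. It is not Bio::Root::RootI, and the threshold of 2 is taken from the `$obj->verbose(2)` description in this message, not from the RootI source. `next_entry` is an illustrative stand-in for a SeqIO parsing step.

```perl
use strict;
use warnings;

package My::Reader;    # hypothetical stand-in for a SeqIO parser

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

# Level 2 or above escalates recoverable warnings into exceptions.
sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

sub throw {
    my ($self, $msg) = @_;
    die "EXCEPTION: $msg\n";
}

sub warn {
    my ($self, $msg) = @_;
    return $self->throw($msg) if $self->verbose >= 2;
    CORE::warn("WARNING: $msg\n");    # CORE:: avoids the ambiguous-call warning
    return;
}

# A recoverable problem: by default warn and drop the entry, so the caller
# can keep reading the rest of the file.
sub next_entry {
    my ($self, $text) = @_;
    return $text if $text =~ /^>\S/;    # looks like a FASTA header line
    $self->warn("skipping unparseable entry: $text");
    return;
}

package main;
```

With the default verbosity, a bad entry produces one warning and an undef result. After `$reader->verbose(2)` the same entry dies with the exception message, and the client can catch it with `eval`.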
If people agree, this point becomes light green.

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Sun Jan 28 21:53:09 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Sun, 28 Jan 2001 21:53:09 +0000 (GMT) Subject: [Bioperl-l] flexible warning/exception in SeqIO In-Reply-To: <3A74825C.E024BA97@gmx.net> Message-ID:
On Sun, 28 Jan 2001, Hilmar Lapp wrote:

> This is on our tasklist. To reiterate briefly the background, we
> had a discussion a while ago that there are many applications
> which would rather lose an entry of a databank file or a feature
> of an entry than choking due to an exception being thrown. The
> reason for such exceptions are entries which are either
> misformatted or contain syntax not yet understood by BioPerl
> (there will be significantly less though due to the new location
> model).
>
> The conclusion was that we want to have some flexibility on the
> client side, who can turn such incidents into exceptions if he/she
> wants to, but the default would be to only warn.
>
> I'm not sure but as I understood the changes to RootI every object
> has the ability to turn warn() into throw() by saying
> $obj->verbose(2). Is that right, and if so, do people agree that
> this fulfills the requirements in SeqIO warn/throw flexibility
> (which implies that the SeqIO code only warn()s).

I agree. So the SeqIO code should ->warn in recoverable positions and
->throw on utterly non-recoverable positions.

>
> If people agree, this point becomes light green.
>
>         Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca.
92122 phone: +1 858 812 1757
> -----------------------------------------------------------------

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From paul-christophe.varoutas@curie.fr Mon Jan 29 00:12:49 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Mon, 29 Jan 2001 01:12:49 +0100 Subject: [Bioperl-l] Vector.pm commit Message-ID: <5.0.2.1.2.20010128232227.00b38e90@pop.wanadoo.fr>
CVS checkout under cygwin (win2000).

I noticed that when I run make test under perl 5.6.1 / cygwin, these two
lines appear a *lot* of times:

Ambiguous call resolved as CORE::shift(), qualify as such or use & at blib/lib/Bio/Root/Vector.pm line 948.
Ambiguous call resolved as CORE::shift(), qualify as such or use & at blib/lib/Bio/Root/Vector.pm line 973.

I guess this is because there is a shift() sub in Vector.pm, line 722:

#---------
sub shift {
#---------
    my($self,%param) = @_;
    $self = $self->first();
    $self = $self->remove(%param);
}

Lines 948 and 973 are of this type:

#-------------
sub valid_any {
#-------------
    my $self = shift;
    ...

I didn't get these warnings when I ran make test with perl 5.004_04 /
SunOS 5.6.

I replaced

    $self = shift;

by

    $self = CORE::shift(@_);

(thanks Ewan) in both lines. Seems to be OK with perl 5.6.1 / cygwin and
perl 5.004_04 / SunOS 5.6.

paulc


From krbou@pgsgent.be Mon Jan 29 07:55:13 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Mon, 29 Jan 2001 08:55:13 +0100 Subject: [Bioperl-l] all tests but LiveSeq.t pass In-Reply-To: ; from birney@ebi.ac.uk on Sun, Jan 28, 2001 at 02:29:52PM +0000 References: Message-ID: <20010129085513.C6855@gryzo.pgsgent.be>
Quoting Ewan Birney (birney@ebi.ac.uk):
>
> All tests but LiveSeq.t pass.
> Jason - I am going to start looking at your
> sensational Location stuff to give it another pair of eyes over the code.
>
>
> I still need to look at getting RichSeq or something similar in...
>
Will this make it into 0.7, or should I have a go at cleaning up some of the
Swiss-Prot issues I found?

Kris,


From paul-christophe.varoutas@curie.fr Mon Jan 29 10:28:02 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Mon, 29 Jan 2001 11:28:02 +0100 Subject: [Bioperl-l] RestrictionEnzyme.pm: a proposal Message-ID: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr>
Yesterday I studied RestrictionEnzyme.pm in more depth. I haven't yet added
the methods I wanted to, because in my opinion it is far more urgent for
this module to get some redesigning. The module suffers somewhat from poor
design, and just adding methods to it will only worsen the situation.

RestrictionEnzyme has methods which are proper to the restriction enzymes:

- seq() is the accessor method for the enzyme's recognition sequence.
- cut_seq() "cuts" a Bio::Seq-derived object and generates an array of
  restriction site fragments.
- cuts_seq_at() does the same, but generates an array of restriction site
  coordinates instead.

and methods which are proper to the list of enzymes:

- is_available() says if a particular enzyme is in the list.
- available_list() gives the list of all enzymes, or the list of n-base
  cutters.

Steve Chervitz already suggested in the module's documentation that
is_available() "may be more appropriate for a REData.pm class", and I share
his opinion. From a conceptual point of view, the existing
RestrictionEnzyme.pm module corresponds to two object classes, not one.

Here is an outline of my proposal:

Separate RestrictionEnzyme into two classes:

RestrictionEnzymeDBase (or whatever is more appropriate):
- members: the list of restriction enzymes.
- methods:
  - constructor using hardwired list of enzymes OR user file OR URL.
  - add/remove enzyme to/from list (adding will be the equivalent of
    _make_custom()).
  - member accessor methods: the already existing methods is_available()
    and available_list().

RestrictionEnzyme:
- members: the same as now (_name, _seq, _site, _cuts_after).
- methods:
  - constructor (equivalent to the constructor calling the
    _make_standard() sub).
  - already existing accessor methods.
  - already existing methods: cut_seq, cuts_seq_at, etc.

This design, apart from being more "correct", will facilitate any future
extensions of the two modules.

The drawback of separating RestrictionEnzyme into two classes is that all
code using RestrictionEnzyme.pm will have to be modified. Perhaps we should
take advantage of the imminent release of version 0.7 and decide to proceed
with the redesign. If we change the design, this will also be the
opportunity to slightly change/extend its public interface to add small new
functionalities, such as being able to add and use asymmetric cutters and
enzymes which cut outside the recognition site (perhaps just incorporating
small changes now in order to be in time for the 0.7 release, and leaving
extensions for afterwards, especially if I do this alone based on what we
decide).

Tell me what you think about it:

- First of all, is redesigning possible, or are we obliged to maintain
  compatibility? In the latter case I will just add functionality,
  maintaining the poor design of the module.
- If redesigning is possible, please make comments/suggestions.

Paul-Christophe


From jason@chg.mc.duke.edu Mon Jan 29 14:02:33 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 09:02:33 -0500 (EST) Subject: [Bioperl-l] flexible warning/exception in SeqIO In-Reply-To: Message-ID:
On Sun, 28 Jan 2001, Ewan Birney wrote:

> On Sun, 28 Jan 2001, Hilmar Lapp wrote:
>
> > This is on our tasklist.
> > To reiterate briefly the background, we
> > had a discussion a while ago that there are many applications
> > which would rather lose an entry of a databank file or a feature
> > of an entry than choking due to an exception being thrown. The
> > reason for such exceptions are entries which are either
> > misformatted or contain syntax not yet understood by BioPerl
> > (there will be significantly less though due to the new location
> > model).
> >
> > The conclusion was that we want to have some flexibility on the
> > client side, who can turn such incidents into exceptions if he/she
> > wants to, but the default would be to only warn.
> >
> > I'm not sure but as I understood the changes to RootI every object
> > has the ability to turn warn() into throw() by saying
> > $obj->verbose(2). Is that right, and if so, do people agree that
> > this fulfills the requirements in SeqIO warn/throw flexibility
> > (which implies that the SeqIO code only warn()s).
>
> I agree. So the SeqIO code should ->warn in recoverable positions and
> ->throw on utterly non-recoverable positions

This is exactly what I think as well. It gives the most flexibility. I
think with RichSeq we can handle things like parsing optional qualifiers
(bug #160 -- PID) from GenBank format and any other lost features.

> >
> >
> > If people agree, this point becomes light green.

Grün ist gut (green is good).

> >
> >         Hilmar
> > --
> > -----------------------------------------------------------------
> > Hilmar Lapp                                email: hlapp@gmx.net
> > GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> > -----------------------------------------------------------------
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> .
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From hlapp@gmx.net Mon Jan 29 17:24:03 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 09:24:03 -0800 Subject: [Bioperl-l] Genscan exon frame computation Message-ID: <3A75A733.AA342904@gmx.net>
A revisit of this is on the task list. I had a discussion a while ago with
Mark Dalphin, because he claimed that he had managed to figure out the exon
frame based on start coordinate and frame value. I still don't fully
understand his code sample, as he was also using his own definition of
frame. Still, the discussion let me see how one can figure out the frame.
I've enclosed the relevant code section of my implementation below. Whoever
feels in a position to do so, please review and double-check.

This will add a frame attribute to each individual exon, which makes it
possible to deliberately shuffle exons from one prediction (for those who
aren't aware: Genscan with default parameters outputs only exons in the
'optimal path'; there may be other exons which also achieve very good
scores, and their output can be triggered by -subopt).

Things still to do in this respect comprise a rigorous test (take all exons
of each prediction, translate them individually in the frame they've been
assigned, and check that there are no intervening stops) and an adaptation
of cds() in GeneStructure.pm (when concatenating exons, make sure that the
frame of one and the frame/phase of the previous one match, and if not,
fill with Ns).

If anyone volunteers to add the test to Genpred.t I'd be really glad. This
does not involve module design, just plain application coding, and anyone
literate in Perl/Bioperl should be able to jump in here.

Comments welcome, esp.
regarding the cds() comment I made above.

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------

    # Figure out the frame of this exon. This is NOT the frame
    # given by Genscan, which is the absolute frame of the base
    # starting the first predicted complete codon. By comparing
    # to the absolute frame of the first base we can compute the
    # offset of the first complete codon to the first base of the
    # exon, which determines the frame of the exon.
    my $cod_offset;
    if($predobj->strand() == 1) {
        $cod_offset = $flds[6] - (($predobj->start()-1) % 3);
        # Possible values are -2, -1, 0, 1, 2. -1 and -2 correspond
        # to offsets 2 and 1, resp. Offset 3 is the same as 0.
        $cod_offset += 3 if($cod_offset < 1);
    } else {
        # On the reverse strand the Genscan frame also refers to
        # the first base of the first complete codon, but viewed
        # from forward, which is the third base viewed from
        # reverse.
        # Note that end() is in fact start() here because we always
        # annotate in forward direction (otherwise we wouldn't need
        # strand()).
        $cod_offset = $flds[6] - (($predobj->end()-3) % 3);
        # Possible values are -2, -1, 0, 1, 2. Due to the reverse
        # situation, {2,-1} and {1,-2} correspond to offsets
        # 1 and 2, resp. Offset 3 is the same as 0.
        $cod_offset -= 3 if($cod_offset >= 0);
        $cod_offset = -$cod_offset;
    }
    # Offsets 2 and 1 correspond to frame 1 and 2 (frame of exon
    # is the frame of the first base relative to the exon, or the
    # number of bases the first codon is missing).
    $predobj->frame(3 - $cod_offset);


From insana@ebi.ac.uk Mon Jan 29 17:48:19 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Mon, 29 Jan 2001 17:48:19 +0000 (GMT) Subject: [Bioperl-l] LiveSeq back working In-Reply-To: Message-ID:
LiveSeq is back working now.
The BioPerl loader was not working anymore because of the SplitLocation
change.
It was using the subfeature method.

Joseph Insana


From mwilkinson@gene.pbi.nrc.ca Mon Jan 29 17:38:24 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Mon, 29 Jan 2001 11:38:24 -0600 Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm Message-ID: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca>
Dear Group,

I just cvs-updated and noticed that SeqFeature::Generic does not appear
to be functional anymore. It is calling on Bio/Location/Simple.pm
(line 122), which apparently does not exist. Is it just my installation
which is wonky, or is this a genuine bug?

any advice appreciated.

cheers all!

M


-- 
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From jason@chg.mc.duke.edu Mon Jan 29 18:05:34 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 13:05:34 -0500 (EST) Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm In-Reply-To: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca> Message-ID:
you need to do
% cvs update -d
to get newly created directories.

On Mon, 29 Jan 2001, Mark Wilkinson wrote:

> Dear Group,
>
> I just cvs-updated and noticed that SeqFeature::Generic does not appear
> to be functional anymore. It is calling on Bio/Location/Simple.pm
> (line 122), which apparently does not exist. Is it just my installation
> which is wonky, or is this a genuine bug?
>
> any advice appreciated.
>
> cheers all!
>
> M
>
>
> -- 
> ---
> Dr. Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


From jason@chg.mc.duke.edu Mon Jan 29 18:05:59 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 13:05:59 -0500 (EST) Subject: [Bioperl-l] LiveSeq back working In-Reply-To: Message-ID:
Thanks for fixing this, I wasn't sure where to go to look.

On Mon, 29 Jan 2001, Joseph Insana wrote:

> LiveSeq is back working now.
> The BioPerl loader was not working anymore because of the SplitLocation
> change. It was using the subfeature method.
>
> Joseph Insana
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


From birney@ebi.ac.uk Mon Jan 29 18:14:14 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 29 Jan 2001 18:14:14 +0000 (GMT) Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm In-Reply-To: <3A75AA90.1C3F92EC@gene.pbi.nrc.ca> Message-ID:
On Mon, 29 Jan 2001, Mark Wilkinson wrote:

> Dear Group,
>
> I just cvs-updated and noticed that SeqFeature::Generic does not appear
> to be functional anymore. It is calling on Bio/Location/Simple.pm
> (line 122), which apparently does not exist. Is it just my installation
> which is wonky, or is this a genuine bug?

cvs update -d

>
> any advice appreciated.
>
> cheers all!
>
> M
>
>
> -- 
> ---
> Dr. Mark Wilkinson
> Bioinformatics Group
> National Research Council of Canada
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK
> Canada
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From darrochi@dcs.gla.ac.uk Mon Jan 29 18:40:07 2001 From: darrochi@dcs.gla.ac.uk (Iain Darroch) Date: Mon, 29 Jan 2001 18:40:07 +0000 (GMT) Subject: [Bioperl-l] Bio Framework and XML Message-ID:
Hi All,

I am currently looking at ways of integrating biological systems. I saw
mentioned in some of the documentation that a Bio-Object Framework was
proposed. Also that XML could be used in meta data for describing
bioinformatics objects.

I was wondering what the current status of both of these is.

Has anyone implemented parsers yet?

Thanks in advance

Iain


From jason@chg.mc.duke.edu Mon Jan 29 19:33:59 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 14:33:59 -0500 (EST) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature Message-ID:
What is the feeling here, we have this old way of doing things which
included using the value 'EXPAND' to determine if we should expand the
start/end space for a feature when adding a sub_SeqFeature to a feature?

I think this should likely be better modeled through a SplitLocationI
which is just a container of LocationObjects. So I propose to remove all
references to 'EXPAND' which means removing the method _expand_region and
updating add_sub_Feature to deal with adding the locations. Similarly the
flush_sub_SeqFeature should flush the locations, but I'm not sure about
what the start/end should be reset to...
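The 'EXPAND' behaviour Jason describes above can be sketched in plain Perl. In that behaviour, adding a sub-feature grows the parent feature's start/end so that the parent covers it. This is a hypothetical illustration only. new_feature(), expand_region() and add_sub_feature() are invented names that operate on plain hashes. They are not the Bio::SeqFeature::Generic API or its _expand_region() method. Dying on an out-of-range sub-feature when 'EXPAND' is not given is also an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature: a plain hash with start, end and
# a list of sub-features. This is NOT the Bio::SeqFeature::Generic API.
sub new_feature {
    my ($start, $end) = @_;
    return { start => $start, end => $end, sub => [] };
}

# Grow the parent's region so that it spans the child (the behaviour the
# 'EXPAND' flag switched on). Kept separate so it can be called on its own.
sub expand_region {
    my ($parent, $child) = @_;
    $parent->{start} = $child->{start}
        if !defined $parent->{start} || $child->{start} < $parent->{start};
    $parent->{end} = $child->{end}
        if !defined $parent->{end} || $child->{end} > $parent->{end};
    return $parent;
}

# Attach a sub-feature. With 'EXPAND' the parent grows to fit; without it,
# this sketch assumes a child outside the parent's region is an error.
sub add_sub_feature {
    my ($parent, $child, $expand) = @_;
    if (defined $expand && $expand eq 'EXPAND') {
        expand_region($parent, $child);
    }
    elsif ($child->{start} < $parent->{start} || $child->{end} > $parent->{end}) {
        die "sub-feature lies outside its parent; pass 'EXPAND'\n";
    }
    push @{ $parent->{sub} }, $child;
    return $parent;
}

my $gene = new_feature(100, 200);
add_sub_feature($gene, new_feature(150, 180));           # fits, no change
add_sub_feature($gene, new_feature(50, 120), 'EXPAND');  # start grows to 50
add_sub_feature($gene, new_feature(190, 260), 'EXPAND'); # end grows to 260
printf "%d..%d with %d sub-features\n",
    $gene->{start}, $gene->{end}, scalar @{ $gene->{sub} };
# prints "50..260 with 3 sub-features"
```

Keeping expand_region() separate from add_sub_feature() lets a caller grow a region without registering a sub-feature at the same time.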
I also had to update FeaturePair to add the method location() which
delegates to feature1()->location() otherwise things won't work correctly.
start/end are defined by feature1 object so location should also reside
in feature1.

Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


From lapp@gnf.org Mon Jan 29 21:09:40 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Mon, 29 Jan 2001 13:09:40 -0800 Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature References: Message-ID: <3A75DC14.529E0072@gnf.org>
Jason Stajich wrote:
>
> What is the feeling here, we have this old way of doing things which
> included using the value 'EXPAND' to determine if we should expand the
> start/end space for a feature when adding a sub_SeqFeature to a feature?
>
> I think this should likely be better modeled through a SplitLocationI
> which is just a container of LocationObjects. So I propose to remove all
> references to 'EXPAND' which means removing the method _expand_region and
> updating add_sub_Feature to deal with adding the locations. Similarly

Can't we keep a separate method for coping with region extension due
to a new subfeature, in whatever way the extension is done? As far as
I can remember I had a good reason to put it into its own method, I
needed it separately from add_sub_SeqFeature().

    Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason@chg.mc.duke.edu Mon Jan 29 21:26:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Mon, 29 Jan 2001 16:26:21 -0500 (EST) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature In-Reply-To: <3A75DC14.529E0072@gnf.org> Message-ID:
On Mon, 29 Jan 2001, Hilmar Lapp wrote:

> Jason Stajich wrote:
> >
> > What is the feeling here, we have this old way of doing things which
> > included using the value 'EXPAND' to determine if we should expand the
> > start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> > I think this should likely be better modeled through a SplitLocationI
> > which is just a container of LocationObjects. So I propose to remove all
> > references to 'EXPAND' which means removing the method _expand_region and
> > updating add_sub_Feature to deal with adding the locations. Similarly
>
> Can't we keep a separate method for coping with region extension due
> to a new subfeature, in whatever way the extension is done? As far as
> I can remember I had a good reason to put it into its own method, I
> needed it separately from add_sub_SeqFeature().

I guess it is more sane to let SeqFeature::Generic handle the common case
and the split location case will need to be handled elsewhere.

In the special case of a feature with multiple locations that feature (or
object creating it) will take care of updating the location object to
point to a splitlocation object. For example, if we choose to have CDS be
represented as a SplitLocation with the exons being the parts in the
join(...) statement. This will have to be negotiated by the object
creating the Gene/CDS object.

Okay so no changes to check in for Generic.

>
>     Hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp@gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


From lapp@gnf.org Mon Jan 29 21:57:13 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Mon, 29 Jan 2001 13:57:13 -0800 Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal References: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr> Message-ID: <3A75E739.3E0EC94E@gnf.org>
Paul-Christophe Varoutas wrote:
>
> Tell me what you think about it:
> - First of all, is redesigning possible or are we obliged to maintain
> compatibility ? In the latter case I will just add functionality,
> maintaining the poor design of the module.
> - If redesigning is possible, please make comments/suggestions.
>

First of all, keeping compatibility is a very good thing. Every user
of your software will appreciate it if he/she knows that this is taken
seriously.

In general, my opinion is if there's no strong reason to break
compatibility, then don't break it. On the other hand, if there is a
good reason, then don't hesitate.

This means, yes, redesigning is possible, but a nicer design by itself
is not a good reason to break compatibility. If the existing design is
sort of prohibitive for adding certain new functionality, this might
justify breaking compatibility. An example is the new location model,
but in fact Jason could manage to keep compatibility. I suggest that
you carefully examine whether you indeed can't redesign and at the
same time keep compatibility. Based on your proposal I don't see the
prohibitive point yet.

As for the release, this issue is not on the task list, which means
that you are on your own. There's a deadline next week, and we don't
want to lose focus. If you finish the code and submit an accompanying
rigorous test in t/* on time, it can make it into the release though,
provided that there are no objections should you introduce
incompatibilities.

As a last remark, a design that isn't prepared very well for an
extension one has in mind is not necessarily poor. It may just have
been perfect for its original scope. And: I really think that there is
no such thing as a "correct" design. Design may be bad or may be good,
generic or tailored, or whatever, it just depends on your viewpoint,
that is, on the particular problem you want to solve.

    Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:02:45 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:02:45 -0800 Subject: [Bioperl-l] missing use statements References: <5.0.2.1.2.20010129204942.00b32638@mailhost.curie.fr> Message-ID: <3A766715.45FA0EEC@gmx.net>
Paul-Christophe Varoutas wrote:
>
> so I just added one line at the beginning of the module to load Bio::Seq:
>
> use Bio::Seq;
>

Thanks for pointing this out. The reason this became necessary all
of a sudden was probably that I removed the respective lines from
SeqIO.pm, because there was no obvious reason to keep them. Since
I still think that the 'use' statements are better in those files
where the modules are really used, I left it that way and added
the necessary use statements to all other SeqIO modules (which
probably would all have complained sooner or later).

> and edited the @ISA array initialization line:
>
> @ISA = qw(Bio::SeqIO Bio::Seq);
>

We don't want SeqIO modules to inherit from Bio::Seq.

    Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:04:21 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:04:21 -0800 Subject: [Bioperl-l] Root::Object in bioxml.pm Message-ID: <3A766775.F21CDB21@gmx.net>
SeqIO::bioxml.pm still inherits from Root::Object. Is there a
particular reason that this one's an exception?

    Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 07:10:26 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Mon, 29 Jan 2001 23:10:26 -0800 Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/888 References: <200101291951.f0TJptp29320@pw600a.bioperl.org> Message-ID: <3A7668E2.5E40F6A5@gmx.net>
bioperl-bugs@bioperl.org wrote:
>
> Generic Features created from a GFF string do not
> record Frame information, and when dumping the feature
> out as GFF it is invariably reported as frame = 0.
>
> The problem is multi-fold:
>
> (1) the _from_gff_string and _from_gff2_string
> subroutines in Generic.pm do not contain any code to handle the
> recording of Frame information in the feature object
>
> (2) GFF allows a "." as the frame (meaning info not available),
> while $Feature only allows 0,1, or 2. Thus it isn't clear how a
> GFF frame of "." should be recorded. My first thought was that a
> value of undef might return "." in a call to SeqFeatureI::gff_string,
> however...
>
> (3) ...it appears that even if there is no frame information
> available in a Feature object, it nevertheless passes the
> $Feature->can('frame') test in SeqFeatureI::gff_string
> and returns a (default??) value of 0 for the $Feature->frame call
> (though there *is* code there to assign the frame to
> "." if it fails the ->can test...)
>
> I am willing to fix this problem myself, but I would appreciate having
> a consensus from the group about which level of the problem needs to be
> fixed to keep everyone else's code happy.
>

I think that frame information should be consistent between GFF
representation and object representation. '.' is equivalent to undef,
and otherwise the frame should be 0, 1, or 2, regardless of object or
GFF string.

    Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 30 09:14:42 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 09:14:42 +0000 (GMT) Subject: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature In-Reply-To: Message-ID:
On Mon, 29 Jan 2001, Jason Stajich wrote:

> What is the feeling here, we have this old way of doing things which
> included using the value 'EXPAND' to determine if we should expand the
> start/end space for a feature when adding a sub_SeqFeature to a feature?
>
> I think this should likely be better modeled through a SplitLocationI
> which is just a container of LocationObjects. So I propose to remove all
> references to 'EXPAND' which means removing the method _expand_region and
> updating add_sub_Feature to deal with adding the locations. Similarly the
> flush_sub_SeqFeature should flush the locations, but I'm not sure about
> what the start/end should be reset to...

I guess I agree (I am wincing at every one of these decisions, you know.
It just pains me to see us have to handle this object complexity in
essentially simple objects. Bugger-it! I know there is no way out here,
but... it goes against the grain).

>
> I also had to update FeaturePair to add the method location() which
> delegates to feature1()->location() otherwise things won't work correctly.
> start/end are defined by feature1 object so location should also reside
> in feature1.
>

That is the consistent route here...

> Jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 30 09:41:50 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 09:41:50 +0000 (GMT) Subject: [Bioperl-l] Root::Object in bioxml.pm In-Reply-To: <3A766775.F21CDB21@gmx.net> Message-ID:
On Mon, 29 Jan 2001, Hilmar Lapp wrote:

> SeqIO::bioxml.pm still inherits from Root::Object. Is there a
> particular reason that this one's an exception?
>

I think this is a dead object? Brad.....???

> Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Tue Jan 30 13:48:16 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 30 Jan 2001 08:48:16 -0500 (EST) Subject: [Bioperl-l] Re: Root::Object in bioxml.pm In-Reply-To: <3A766775.F21CDB21@gmx.net> Message-ID:
I skipped it because I thought it was to be removed for the release; Brad
Marshall would know.
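The FeaturePair change discussed earlier in this thread has location() delegate to feature1()->location(), which Ewan calls the consistent route. A minimal sketch of that delegation pattern might look like the following. The My::Location, My::Feature and My::FeaturePair packages are hypothetical stand-ins written for illustration. They are not the real Bio::SeqFeature::FeaturePair or Bio::Location::Simple modules.

```perl
use strict;
use warnings;

# Hypothetical minimal classes illustrating delegation only; these are
# not the real Bio::SeqFeature::FeaturePair or Bio::Location::Simple.
package My::Location;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package My::Feature;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub location { $_[0]{location} }
# start/end are read through the location object
sub start    { $_[0]->location->start }
sub end      { $_[0]->location->end }

package My::FeaturePair;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub feature1 { $_[0]{feature1} }
sub feature2 { $_[0]{feature2} }
# start, end and location() all delegate to feature1, so the pair's
# coordinates and its location object cannot disagree.
sub start    { $_[0]->feature1->start }
sub end      { $_[0]->feature1->end }
sub location { $_[0]->feature1->location }

package main;

my $pair = My::FeaturePair->new(
    feature1 => My::Feature->new(location => My::Location->new(start => 10, end => 90)),
    feature2 => My::Feature->new(location => My::Location->new(start => 1,  end => 81)),
);
print $pair->location->start, '..', $pair->location->end, "\n";   # 10..90
```

The real module may hold more state than this. The point of the sketch is only that location() lives where start/end already live, in feature1.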
On Mon, 29 Jan 2001, Hilmar Lapp wrote:

> SeqIO::bioxml.pm still inherits from Root::Object. Is there a
> particular reason that this one's an exception?
>
> Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 


From ydzhang@iastate.edu Tue Jan 30 14:44:25 2001 From: ydzhang@iastate.edu (Yuandan Zhang) Date: Tue, 30 Jan 2001 08:44:25 -0600 Subject: [Bioperl-l] Re: Bioperl-l digest, Vol 1 #200 - 15 msgs In-Reply-To: <200101300917.f0U9HLp20264@pw600a.bioperl.org> Message-ID: <4.2.0.58.20010130084139.00ad6560@ydzhang.mail.iastate.edu>
Hi,

I am new to bioperl, very patinate in it. Is there any tutorial materials
available or any collection of example scripts for beginners to make a start?

Yuandan
92122 phone: +1 858 812 1757 >----------------------------------------------------------------- > >--__--__-- > >Message: 13 >Date: Mon, 29 Jan 2001 23:04:21 -0800 >From: Hilmar Lapp >Organization: Nereis 4 >To: Bioperl >CC: Jason Stajich >Subject: [Bioperl-l] Root::Object in bioxml.pm > >SeqIO::bioxml.pm still inherits from Root::Object. Is there a >particular reason that this one's an exception? > > Hilmar >-- >----------------------------------------------------------------- >Hilmar Lapp email: hlapp@gmx.net >GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 >----------------------------------------------------------------- > >--__--__-- > >Message: 14 >Date: Mon, 29 Jan 2001 23:10:26 -0800 >From: Hilmar Lapp >Organization: Nereis 4 >To: Mark Wilkinson >CC: bioperl-l@bioperl.org >Subject: [Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/888 > >bioperl-bugs@bioperl.org wrote: > > > > Generic Features created from a GFF string do not > > record Frame information, and when dumping the feature > > out as GFF it is invariably reported as frame = 0. > > > > The problem is multi-fold: > > > > (1) the _from_gff_string and _from_gff2_string > > subroutines in Generic.pm do not contain any code to handle the > > recording of Frame information in the feature object > > > > (2) GFF allows a "." as the frame (meaning info not available), > > while $Feature only allows 0,1, or 2. Thus it isn't clear how a > > GFF frame of "." should be recorded. My first thought was that a > > value of undef might return "." in a call to SeqFeatureI::gff_string, > > however... > > > > (3) ...it appears that even if there is no frame information > > available in a Feature object, it nevertheless passes the > > $Feature->can('frame') test in SeqFeatureI::gff_string > > and returns a (default??) value of 0 for the $Feature->frame call > > (though there *is* code there to assign the frame to > > "." if it fails the ->can test...) 
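Point (2) of the bug report above, together with Hilmar's reply to it, comes down to one convention. A GFF frame of "." maps to an undefined frame on the feature object, and an undefined frame is written back out as ".". Any other frame must be 0, 1, or 2. The Perl below is a minimal sketch of that round trip. The two helpers, gff_frame_to_feature and feature_frame_to_gff, are hypothetical names used only for illustration and are not existing Bioperl methods. In Bioperl, the first would presumably be called from _from_gff_string/_from_gff2_string and the second from SeqFeatureI::gff_string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): turn the GFF frame column
# into the value stored on a feature object. '.' means "no frame
# information" and becomes undef; anything else must be 0, 1 or 2.
sub gff_frame_to_feature {
    my ($gff_frame) = @_;
    return undef if !defined($gff_frame) || $gff_frame eq '.';
    die "invalid GFF frame '$gff_frame'\n" unless $gff_frame =~ /\A[012]\z/;
    return $gff_frame + 0;    # store the frame as a number
}

# Hypothetical helper (not part of Bioperl): turn a feature's frame back
# into the GFF frame column, writing undef out as '.'.
sub feature_frame_to_gff {
    my ($frame) = @_;
    return '.' unless defined $frame;
    die "invalid frame '$frame'\n" unless $frame =~ /\A[012]\z/;
    return "$frame";
}

# Round trip: '.', 0, 1 and 2 should survive GFF -> object -> GFF unchanged.
for my $col ('.', '0', '1', '2') {
    my $obj = gff_frame_to_feature($col);
    printf "%s -> %s -> %s\n", $col,
        (defined $obj ? $obj : 'undef'), feature_frame_to_gff($obj);
}
```

A missing frame stays undef instead of defaulting to 0. This lets the output side tell "no frame information" apart from frame 0, which is the distinction that point (3) of the report says is currently lost.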
> >
> > I am willing to fix this problem myself, but I would appreciate having
> > a consensus from the group about which level of the problem needs to be
> > fixed to keep everyone else's code happy.
> >
>
>I think that frame information should be consistent between GFF
>representation and object representation. '.' is equivalent to
>undef, and otherwise the frame should be 0, 1, or 2, regardless of
>object or GFF string.
>
> Hilmar
>
>--
>-----------------------------------------------------------------
>Hilmar Lapp email: hlapp@gmx.net
>GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
>-----------------------------------------------------------------
>
>--__--__--
>
>Message: 15
>Date: Tue, 30 Jan 2001 09:14:42 +0000 (GMT)
>From: Ewan Birney
>To: Jason Stajich
>cc: Bioperl
>Subject: Re: [Bioperl-l] Bio::SeqFeature::Generic add_sub_SeqFeature
>
>On Mon, 29 Jan 2001, Jason Stajich wrote:
>
> > What is the feeling here, we have this old way of doing things which
> > included using the value 'EXPAND' to determine if we should expand the
> > start/end space for a feature when adding a sub_SeqFeature to a feature?
> >
> > I think this should likely be better modeled through a SplitLocationI
> > which is just a container of LocationObjects. So I propose to remove all
> > references to 'EXPAND' which means removing the method _expand_region and
> > updating add_sub_Feature to deal with adding the locations. Similarly the
> > flush_sub_SeqFeature should flush the locations, but I'm not sure about
> > what the start/end should be reset to...
>
>I guess agree (I am wincing at every one of these decisions you know. It
>just pains me to see us have to handle this object complexity in
>essentially simple objects. Bugger-it! I know there is no way out here,
>but .... it goes against the grain).
>
> >
> > I also had to update FeaturePair to add the method location() which
> > delegates to feature1()->location() otherwise things won't work correctly.
> > start/end are defined by feature1 object so location should also reside
> > in feature1.
> >
>
>That is the consistent route here...
>
> >
> > Jason
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >
>
>-----------------------------------------------------------------
>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
>.
>-----------------------------------------------------------------
>
>--__--__--
>
>End of Bioperl-l Digest

--
Yuandan Zhang

From birney@ebi.ac.uk Tue Jan 30 15:08:35 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 15:08:35 +0000 (GMT) Subject: [Bioperl-l] Re: Bioperl-l digest, Vol 1 #200 - 15 msgs In-Reply-To: <4.2.0.58.20010130084139.00ad6560@ydzhang.mail.iastate.edu>

On Tue, 30 Jan 2001, Yuandan Zhang wrote:

> Hi,
> I am new to bioperl, very patinate in it. Is there any tutorial materials
> available or any collection of example scripts for beginners to make a start?

A tutorial will be available in the 0.7 release (due to be branched soon).

>
> Yuandan

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
----------------------------------------------------------------- From hlapp@gmx.net Tue Jan 30 18:28:59 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 10:28:59 -0800 Subject: [Bioperl-l] Status 0.7 Message-ID: <3A7707EB.C318F6EF@gmx.net> We're rapidly approaching next Monday's deadline for the 0.7 code freeze. I've indicated on the tasklist where in the sequence of tasks I define the freeze to be. In essence, to be realistic the freeze will in fact be a functionality freeze, that is, on Monday next week all tasks on the list before the freeze should be completed up to the stage of remaining bug-fixes (i.e. dark green). Tasks not completed until then are likely to be dropped. As I said, bug fixes and documentation additions/fixes (I consider every piece of added documentation essentially a bug-fix, because missing documentation constitutes a bug) are exempt from the freeze. I suggest, however, that these fixes begin immediately after the freeze, and do not take longer than 1 week. In parallel to fixing known bugs (known from the bug-tracker) the package shall be tested on various systems and against the projects we want to be compatible with (Mac, Win32, Perl 5.004, Ensembl, bioperl-gui, bioperl-corba, which will certainly reveal additional bugs/problems. The goal is to have the code branch-ready within one week after the freeze, the quicker the better. Just to note the obvious: to keep the release phase as free as possible from unnecessary interference, I will not accept module or code submissions from the point of freeze until actually branching off the release. The situation actually doesn't look bad, the patchwork carpet is more and more greenish. 
The remaining sore points are

 o RichSeq interface, implementation, and adoption by parsers (Ewan)
 o SeqAnalysisParser/SeqFeatureProducer revisit (Hilmar & Jason, Ewan)
 o Transcript/GeneStructure & frame-aware cds() (Hilmar)
 o BioCorba 0.2 interoperability (Jason)

Others than the three of us mentioned probably can't sensibly jump in any of these. However, you can provide support by testing code, looking through documentation, pointing out errors and undocumented methods etc, and, most importantly, development-wise by implementing tests which is BTW a good way of learning the package.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 18:34:43 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 10:34:43 -0800 Subject: [Bioperl-l] SeqFeatureI review Message-ID: <3A770943.8A3A89EA@gmx.net>
I think I posted this already but can't dig it up any more. This
is on our tasklist. Are there any other issues in SeqFeatureI we
wanted to revisit apart from location-related stuff?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Tue Jan 30 19:08:36 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 13:08:36 -0600 (CST) Subject: [Bioperl-l] SeqFeatureI review In-Reply-To: <3A770943.8A3A89EA@gmx.net> Message-ID:
Has the Err problem been fixed? Bugs were posted numerous times. I think the error came from Root::Object - so it is irrelevant, correct?

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> I think I posted this already but can't dig it up any more. This
> is on our tasklist. Are there any other issues in SeqFeatureI we
> wanted to revisit apart from location-related stuff?
>
> Hilmar
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From hlapp@gmx.net Tue Jan 30 19:20:51 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:20:51 -0800 Subject: [Bioperl-l] SeqFeatureI review References: Message-ID: <3A771413.44154823@gmx.net>
David Block wrote:
>
> Has the Err problem been fixed? Bugs were posted numerous times. I think
> the error came from Root::Object - so it is irrelevant, correct?
>

I'm not sure. Can you dig up such a report, point to the
respective number in the bug tracker?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Tue Jan 30 19:26:35 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 13:26:35 -0600 (CST) Subject: [Bioperl-l] SeqFeatureI review In-Reply-To: <3A771413.44154823@gmx.net> Message-ID:
One of them was # 855. It still bites us. We also got mail off-list from
others who were unable to use 6.2 because of it.

On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> David Block wrote:
> >
> > Has the Err problem been fixed? Bugs were posted numerous times. I think
> > the error came from Root::Object - so it is irrelevant, correct?
> >
>
> I'm not sure. Can you dig up such a report, point to the
> respective number in the bug tracker?
>
> Hilmar
>

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From hlapp@gmx.net Tue Jan 30 19:35:46 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:35:46 -0800 Subject: [Bioperl-l] Bio::Root::Object cleanup Message-ID: <3A771792.DB06ACA6@gmx.net>
In an attempt to tidy up our transition to Bio::Root::RootI I
added a note about deprecation to Root/Object.pm and a warning to
_initialize(). The warning will be suppressed for modules of which
we know that they're in love with Root::Object.

These revealed some to me unexpected modules. In total, the
following modules contain a 'use Bio::Root::Object' statement:

Bio/Root/Global.pm (*)
Bio/Root/Err.pm (*)
Bio/Root/IOManager.pm (*)
Bio/Root/Utilities.pm (*)
Bio/Root/Vector.pm (*)
Bio/Root/Xref.pm (*)
Bio/Search/Hit/HitI.pm (?)
Bio/SeqIO/bioxml.pm (?)
Bio/Tools/Blast/Sbjct.pm (*)
Bio/Tools/Blast/HSP.pm (*)
Bio/Tools/Blast/Run/LocalBlast.pm (*)
Bio/Tools/AlignFactory.pm
Bio/Tools/IUPAC.pm
Bio/Tools/SeqAnal.pm (*)
Bio/Tools/SeqPattern.pm
Bio/Tools/WWW.pm
Bio/Tools/PPSEARCH/Parse.pm
Bio/Tools/PRFSCAN/Parse.pm
Bio/Tools/PRINTS/Parse.pm

Those marked with (*) are obvious. Those marked with (?) are
likely to be absent from the release. I presently have no overview
of the others. Jason, did you leave them out on purpose?

In addition, the Variation code contains the line

Bio/Variation/IO.pm: return Bio::Root::Object::new($class, %param);

Heikki, I don't know about the context, just wanted to make sure
this is indispensable.

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 19:41:32 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:41:32 -0800 Subject: [Bioperl-l] LiveSeq tests warn Message-ID: <3A7718EC.736A4F85@gmx.net>
Just to let you know, I'm getting warnings on my machine from
LiveSeq.t and Mutator.t. Could you check whether this might
indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)

Hilmar

t/LiveSeq...........Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 380.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 385.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 393.
ok
t/Mutator...........Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
Argument "LiveSeq" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 380.
Argument "ARRAY" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 385.
Argument "HASH" isn't numeric in ne at blib/lib/Bio/LiveSeq/Gene.pm line 393.
ok
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From hlapp@gmx.net Tue Jan 30 19:49:22 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Tue, 30 Jan 2001 11:49:22 -0800 Subject: [Bioperl-l] SeqFeatureI review References: Message-ID: <3A771AC2.F61D9FBB@gmx.net>
David Block wrote:
>
> One of them was # 855. It still bites us. We also got mail off-list from
> others who were unable to use 6.2 because of it.
>

This one should be gone, first because it was fixed, second, because Err.pm shouldn't be used in many modules (basically only the Blast modules are left). Does it really still exist in a main-trunk checkout?

Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Tue Jan 30 20:40:38 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Tue, 30 Jan 2001 15:40:38 -0500 (EST) Subject: [Bioperl-l] Bio::Root::Object cleanup In-Reply-To: <3A771792.DB06ACA6@gmx.net> Message-ID:
On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> In an attempt to tidy up our transition to Bio::Root::RootI I
> added a note about deprecation to Root/Object.pm and a warning to
> _initialize(). The warning will be suppressed for modules of which
> we know that they're in love with Root::Object.
>
> These revealed some to me unexpected modules. In total, the
> following modules contain a 'use Bio::Root::Object' statement:

I skipped most of SteveC's modules initially because he likes to utilize the functionality he built into Bio::Root::Object and Bio::Root::Global (understandably). Below is a log of what I just checked in.

>
> Bio/Root/Global.pm (*)
> Bio/Root/Err.pm (*)
> Bio/Root/IOManager.pm (*)
> Bio/Root/Utilities.pm (*)
> Bio/Root/Vector.pm (*)
> Bio/Root/Xref.pm (*)
> Bio/Search/Hit/HitI.pm (?)

I believe the entire Search dir is to be removed per Aaron Mackey saying that it is not in a usable state and probably never will....

> Bio/SeqIO/bioxml.pm (?)

I can fix it if we are definitely keeping it, I am under the impression it is to be trashed... not sure though.
> Bio/Tools/Blast/Sbjct.pm (*)
> Bio/Tools/Blast/HSP.pm (*)
> Bio/Tools/Blast/Run/LocalBlast.pm (*)
> Bio/Tools/AlignFactory.pm

fixed

> Bio/Tools/IUPAC.pm

fixed

> Bio/Tools/SeqAnal.pm (*)
> Bio/Tools/SeqPattern.pm

fixed

> Bio/Tools/WWW.pm

dependence on Bio::Root::Global and the $AUTHORITY variable, which Steve had coded to his old Stanford email address. I removed this dependence and hardcoded the $AUTHORITY var to be local to WWW.pm and have the value 'nobody@localhost'. If anyone is using it they probably should speak up....

> Bio/Tools/PPSEARCH/Parse.pm

depends on old Bio::SeqFeatureSet which no longer exists, I'm not sure
what to do here

> Bio/Tools/PRFSCAN/Parse.pm

ditto

> Bio/Tools/PRINTS/Parse.pm

ditto

>
> Those marked with (*) are obvious. Those marked with (?) are
> likely to be absent from the release. I presently have no overview
> of the others. Jason, did you leave them out on purpose?
>
> In addition, the Variation code contains the line
> Bio/Variation/IO.pm: return Bio::Root::Object::new($class,
> %param);
> Heikki, I don't know about the context, just wanted to make sure
> this is indispensable.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From mwilkinson@gene.pbi.nrc.ca Tue Jan 30 21:35:03 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Tue, 30 Jan 2001 15:35:03 -0600 Subject: [Bioperl-l] GO ontology browser module available Message-ID: <3A773386.CF87115A@gene.pbi.nrc.ca>
Dear Group,

Most of you will be familiar with the GO consortium project of putting together a common nomenclature for genome annotation. As part of the development of Workbench, I have thrown together a fairly simplistic Gene Ontology ("GO") parser/browser widget. It is able to parse the XML files available on the GO website, clean up the XML to make it compatible with the XML::Parser module (available from CPAN), and then dump the resulting hash using Data::Dumper. The dumped file can then be read into the GO_browser (which is an extension of a Tk::Text widget) and browsed as if it were a directory window, with double-clicks to navigate up and down the tree, and color coding of what are 'branches' and what are 'leaves'. Middle-clicks can be trapped in the external Tk::MainWindow to extract the selected ontology term and definition.

It is more or less a "plug in" module, similar in design to SeqCanvas - you create a Text widget, pass the Text widget to GO_Browser->new and it gives you back a browsable GO ontology.

Parsing the GO ontology files themselves takes about 4-5 minutes each, but this only has to be done once per GO release; the resulting hash-dump can be slurped into the GO_browser widget in a couple of seconds.

I parse the GO ontology tree only to the point where GO-terms end and hard gene-names, examples, and bibliographic data begin. This could easily be modified, however, as you wish.

Because this module doesn't really "fit" anywhere in the current BioPerl
structure, and because the .xml files that it is based on are still
quite fluid (and thus the module will likely have to be tweaked quite
extensively until things settle down), I don't feel that it is worth
adding into the BioPerl repository at this time. However, I would be
glad to share it with anyone who might find it useful, with all the
usual disclaimers :-)

Let me know,

Cheers all!

M

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From birney@ebi.ac.uk Tue Jan 30 21:57:48 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 21:57:48 +0000 (GMT) Subject: [Bioperl-l] RichSeqI Message-ID:
To prove to hilmar that I am doing the RichSeqI stuff, I have committed
the interface. Basically this is a trivial recasting of the "additional
support" currently in Seq.pm which I will move out into
Bio::Seq::RichSeq.pm

currently the interface looks like...

=head1 NAME

Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated
sequences

=head1 SYNOPSIS

    @secondary   = $richseq->get_secondary_accessions;
    $division    = $richseq->division;
    $mol         = $richseq->molecule;
    @dates       = $richseq->get_dates;
    $seq_version = $richseq->seq_version;

=head1 DESCRIPTION

This interface extends the Bio::SeqI interface to give additional
functionality to sequences with richer data sources, in particular from
database sequences (EMBL, GenBank and Swissprot).

Kris, Jason, Hilmar --- comments?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From birney@ebi.ac.uk Tue Jan 30 22:38:16 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 30 Jan 2001 22:38:16 +0000 (GMT) Subject: [Bioperl-l] RichSeq Message-ID:
I have committed Bio::Seq::RichaSeq which implement the interface. I have
adapted embl, genbank and swiss IO to work with it....

all very painless...

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From lapp@gnf.org Tue Jan 30 23:05:52 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:05:52 -0800 Subject: [Bioperl-l] RichSeqI References: Message-ID: <3A7748D0.CD638437@gnf.org>
Ewan Birney wrote:
>
> To prove to hilmar that I am doing the RichSeqI stuff, I have committed
> the interface. Basically this is a trivial recasting of the "additional
> support" currently in Seq.pm which I will move out into
> Bio::Seq::RichSeq.pm
>
> currently the interface looks like...
>
> =head1 NAME
>
> Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated
> sequences
>
> =head1 SYNOPSIS
>
>     @secondary   = $richseq->get_secondary_accessions;
>     $division    = $richseq->division;
>     $mol         = $richseq->molecule;
>     @dates       = $richseq->get_dates;
>     $seq_version = $richseq->seq_version;
>
> =head1 DESCRIPTION
>
> This interface extends the Bio::SeqI interface to give additional
> functionality to sequences with richer data sources, in particular from
> database sequences (EMBL, GenBank and Swissprot).
>
> Kris, Jason, Hilmar --- comments?
>

Sounds good. This is really the right direction.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From lapp@gnf.org Tue Jan 30 23:09:44 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:09:44 -0800 Subject: [Bioperl-l] RichSeq References: Message-ID: <3A7749B8.39CE1934@gnf.org>
Ewan Birney wrote:
>
> I have committed Bio::Seq::RichaSeq which implement the interface. I
                             ----^---  Typo?
> have
> adapted embl, genbank and swiss IO to work with it....
>
> all very painless...
>

Cool. I'm really glad that this makes it into the release. The reddish colors on the patchwork carpet retreat.

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From lapp@gnf.org Tue Jan 30 23:10:38 2001 From: lapp@gnf.org (Hilmar Lapp) Date: Tue, 30 Jan 2001 15:10:38 -0800 Subject: [Bioperl-l] GO ontology browser module available References: <3A773386.CF87115A@gene.pbi.nrc.ca> Message-ID: <3A7749EE.576E0D9D@gnf.org>
Mark Wilkinson wrote:
>
> Because this module doesn't really "fit" anywhere in the current BioPerl
> structure, and because the .xml files that it is based on are still
> quite fluid (and thus the module will likely have to be tweaked quite
> extensively until things settle down), I don't feel that it is worth
> adding into the BioPerl repository at this time. However, I would be
> glad to share it with anyone who might find it useful, with all the
> usual disclaimers :-)
>
> Let me know,
>

Wouldn't it make sense to add it to bioperl-gui?

Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------


From dblock@gene.pbi.nrc.ca Tue Jan 30 23:22:16 2001 From: dblock@gene.pbi.nrc.ca (David Block) Date: Tue, 30 Jan 2001 17:22:16 -0600 (CST) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To: <3A7749EE.576E0D9D@gnf.org> Message-ID:
On Tue, 30 Jan 2001, Hilmar Lapp wrote:

> Mark Wilkinson wrote:
> >
> > Because this module doesn't really "fit" anywhere in the current BioPerl
> > structure, and because the .xml files that it is based on are still
> > quite fluid (and thus the module will likely have to be tweaked quite
> > extensively until things settle down), I don't feel that it is worth
> > adding into the BioPerl repository at this time. However, I would be
> > glad to share it with anyone who might find it useful, with all the
> > usual disclaimers :-)
> >
> > Let me know,
> >
>
> Wouldn't it make sense to add it to bioperl-gui?
>
> Hilmar
>
Inasmuch as it is completely separate from SeqCanvas, and we are still
thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
than SeqCanvas, maybe. Mark? I think it would be okay.

--
David Block
dblock@gene.pbi.nrc.ca
http://bioinfo.pbi.nrc.ca/dblock/wiki
Plant Biotechnology Institute
National Research Council of Canada
Saskatoon, Saskatchewan


From insana@ebi.ac.uk Wed Jan 31 00:08:03 2001 From: insana@ebi.ac.uk (Joseph Insana) Date: Wed, 31 Jan 2001 00:08:03 +0000 (GMT) Subject: [Bioperl-l] Re: LiveSeq tests warn In-Reply-To: <3A7718EC.736A4F85@gmx.net> Message-ID:
> Just to let you know, I'm getting warnings on my machine from
> LiveSeq.t and Mutator.t. Could you check whether this might
> indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)

Strange, I have nothing like that.
Hmmmm. It seems it's complaining because I used "ne" instead than "!="
to test for something to be -1 or not -1.
My perl is not complaining.
I am running perl v5.6.0 on linux 2.4.0.
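A minimal Perl sketch of the underlying issue, assuming the LiveSeq code compares a value that may be either the sentinel -1 or a non-numeric string such as "LiveSeq". In the perl 5.005 warnings quoted above, `ne` is the name of Perl's internal opcode for the numeric `!=` operator (string `ne` has the opcode name `sne`), so the warnings appear to come from a numeric comparison applied to non-numeric strings. The helper names `is_not_sentinel` and `count_warnings` are hypothetical and are not part of Bio::LiveSeq.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LiveSeq): true unless $value is the
# sentinel -1. The string operator 'ne' never converts its operands to
# numbers, so values such as "LiveSeq" do not trigger "isn't numeric".
sub is_not_sentinel {
    my ($value) = @_;
    return 0 unless defined $value;    # undef is treated like the sentinel here
    return $value ne '-1' ? 1 : 0;
}

# Run a code block and count the warnings it emits.
sub count_warnings {
    my ($code) = @_;
    my $count = 0;
    local $SIG{__WARN__} = sub { $count++ };
    $code->();
    return $count;
}

my @values = ('LiveSeq', 'ARRAY', 'HASH', -1, 42);

# Numeric comparison: the non-numeric strings produce warnings.
my $numeric = count_warnings(sub { my @r = map { $_ != -1 ? 1 : 0 } @values });

# String comparison via the helper: no warnings.
my $string = count_warnings(sub { my @r = map { is_not_sentinel($_) } @values });

print "warnings from numeric != : $numeric\n";
print "warnings from string ne  : $string\n";
```

Comparing as strings treats -1 and "-1" alike, but it would not treat "-1.0" as the sentinel; whether that matters depends on how the LiveSeq code produces the value.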
Try please putting "!=" instead than "ne" and see if it gets fixed.

Joseph


From icarus@caffeine.doit.wisc.edu Wed Jan 31 00:17:39 2001 From: icarus@caffeine.doit.wisc.edu (Christopher Solomon) Date: Tue, 30 Jan 2001 18:17:39 -0600 (CST) Subject: [Bioperl-l] introduction Message-ID:
Heyas.

I've been looking over the bioperl site for a few weeks and thought it was about time I signed up for the mailing list. I'm eager to start learning about bioperl and how to use computation in general to solve biological problems.

As a person who is interested in this subject, I'd like to hear a little about what your backgrounds are. To be fair, I'll tell you a little of mine. I got my B.S. in Biology from U of Illinois, then went to grad school in Molecular Biology at U of Wisconsin. After two years in the grad program, I got sick of it and left. I then got a job on campus for the university help desk. The next year introduced me to linux and system administration. Which brought me out here to California. Now I'm doing perl development for an internet company. I would eventually like to get back into science, but doing computational biology or bioinformatics. I figured getting involved with the bioperl project was as good a way to start as any, so here I am.

I'd love to help out with anything I can, so if there are any lingering jobs nobody seems to want, well, I might take a crack, or any modules or such that just needs some cleaning up, I'm willing to help out where I can.

So please tell me a little about yourselves and what (if anything) bio and/or perl has to do with your current employment situation.

Christopher Solomon
Jr. Application Developer
ValueClick, Inc.
icarus@caffeine.doit.wisc.edu


From petertait@sympatico.ca Wed Jan 31 05:05:02 2001 From: petertait@sympatico.ca (Peter Tait) Date: Tue, 30 Jan 2001 21:05:02 -0800 Subject: [Bioperl-l] quit Message-ID: <3A779CFE.2D861F2D@sympatico.ca>
I would like to quit bioperl.
Thanks From hlapp@gmx.net Wed Jan 31 08:18:29 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 00:18:29 -0800 Subject: [Bioperl-l] RichSeqI References: <3A7748D0.CD638437@gnf.org> Message-ID: <3A77CA55.87E06E1@gmx.net> The interface looks slim, in fact very slim. Intentional or did you forget to commit? Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From hlapp@gmx.net Wed Jan 31 08:51:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 00:51:44 -0800 Subject: [Bioperl-l] Bio::Root::Object cleanup References: Message-ID: <3A77D220.1D20E5E1@gmx.net> Jason Stajich wrote: > > > Bio/Tools/PPSEARCH/Parse.pm > depends on old Bio::SeqFeatureSet which no longer exists, I'm not sure > what to do here > > Bio/Tools/PRFSCAN/Parse.pm > ditto > > Bio/Tools/PRINTS/Parse.pm > ditto These modules are obviously not being maintained, nor are they functional, let alone test scripts. Does anyone know what the intended destiny for these modules is? The author seems to be Evgueni; is he still with EBI? There was another module Tools::SeqWords which escaped my grep; I fixed it to inherit from RootI and fixed also a couple of other bugs there. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From hlapp@gmx.net Wed Jan 31 09:08:46 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 01:08:46 -0800 Subject: [Bioperl-l] Bio::Factory Message-ID: <3A77D61E.CF9D1413@gmx.net> In an attempt to address revisit/finalization of the SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept the design change Ewan proposed couple of weeks ago: ------ Why not have Bio::SeqAnalysisParserFactoryI $parser = $factory->create_parser(-fh => \*FILE); Bio::SeqAnalyisParserI while( $next_feature = $parser->next_feature ) { } same number of functions defined. Twice the number of interfaces, but these are the interfaces I would argue we want. An implementation could implement ParserFactoryI and ParserI in the same module if so wished. ------ For the factory interface I propose to open a new directory Bio::Factory, first to avoid cluttering of other directories, and second because there are many places in BioPerl that can eventually take advantage of a factory design (basically, wherever hard-coded object creation occurs, e.g. in SeqIO::* etc), so that directory hopefully won't stay empty for long. Any objections? If not, I'll give it a go soon. Hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hlapp@gmx.net GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 ----------------------------------------------------------------- From birney@ebi.ac.uk Wed Jan 31 09:26:51 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 09:26:51 +0000 (GMT) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To: Message-ID: On Tue, 30 Jan 2001, David Block wrote: > On Tue, 30 Jan 2001, Hilmar Lapp wrote: > > > Mark Wilkinson wrote: > > > > > > Because this module doesn't really "fit" anywhere in the current BioPerl > > > structure, and because the .xml files that it is based on are still > > > quite fluid (and thus the module will likely have to be tweaked quite > > > extensively until things settle down), I don't feel that it is worth > > > adding into the BioPerl repository at this time. However, I would be > > > glad to share it with anyone who might find it useful, with all the > > > usual disclaimers :-) > > > > > > Let me know, > > > > > > > Wouldn't it make sense to add it to bioperl-gui? > > > > Hilmar > > > Inasmuch as it is completely separate from SeqCanvas, and we are still > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater > than SeqCanvas, maybe. Mark? I think it would be okay. Sounds like the right place to me.... > > -- > David Block > dblock@gene.pbi.nrc.ca > http://bioinfo.pbi.nrc.ca/dblock/wiki > Plant Biotechnology Institute > National Research Council of Canada > Saskatoon, Saskatchewan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
----------------------------------------------------------------- From birney@ebi.ac.uk Wed Jan 31 09:32:43 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 09:32:43 +0000 (GMT) Subject: [Bioperl-l] Bio::Factory In-Reply-To: <3A77D61E.CF9D1413@gmx.net> Message-ID: On Wed, 31 Jan 2001, Hilmar Lapp wrote: > In an attempt to address revisit/finalization of the > SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept > the design change Ewan proposed couple of weeks ago: > > ------ > Why not have > > Bio::SeqAnalysisParserFactoryI > > $parser = $factory->create_parser(-fh => \*FILE); > > Bio::SeqAnalyisParserI > > while( $next_feature = $parser->next_feature ) { > > } > > same number of functions defined. Twice the number of interfaces, > but > these are the interfaces I would argue we want. > > An implementation could implement ParserFactoryI and ParserI in > the > same module if so wished. > ------ > > For the factory interface I propose to open a new directory > Bio::Factory, first to avoid cluttering of other directories, and > second because there are many places in BioPerl that can > eventually take advantage of a factory design (basically, wherever > hard-coded object creation occurs, e.g. in SeqIO::* etc), so that > directory hopefully won't stay empty for long. > > Any objections? If not, I'll give it a go soon. This sounds really good.... Definitely needed/wanted... > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
----------------------------------------------------------------- From jason@chg.mc.duke.edu Wed Jan 31 13:56:21 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 08:56:21 -0500 (EST) Subject: [Bioperl-l] Bio::Factory In-Reply-To: <3A77D61E.CF9D1413@gmx.net> Message-ID: On Wed, 31 Jan 2001, Hilmar Lapp wrote: > In an attempt to address revisit/finalization of the > SeqAnalysisParser/SeqFeatureProducer stuff, I suggest to accept > the design change Ewan proposed couple of weeks ago: > > ------ > Why not have > > Bio::SeqAnalysisParserFactoryI > > $parser = $factory->create_parser(-fh => \*FILE); > > Bio::SeqAnalyisParserI > > while( $next_feature = $parser->next_feature ) { > > } > > same number of functions defined. Twice the number of interfaces, > but > these are the interfaces I would argue we want. > > An implementation could implement ParserFactoryI and ParserI in > the > same module if so wished. > ------ > > For the factory interface I propose to open a new directory > Bio::Factory, first to avoid cluttering of other directories, and > second because there are many places in BioPerl that can > eventually take advantage of a factory design (basically, wherever > hard-coded object creation occurs, e.g. in SeqIO::* etc), so that > directory hopefully won't stay empty for long. > > Any objections? If not, I'll give it a go soon. Great idea and it is a good place to put these things and can help cleanup some of the clutter for sure. > > Hilmar > > -- > ----------------------------------------------------------------- > Hilmar Lapp email: hlapp@gmx.net > GNF, San Diego, Ca. 
92122 phone: +1 858 812 1757 > ----------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > Jason Stajich jason@chg.mc.duke.edu Center for Human Genetics Duke University Medical Center http://www.chg.duke.edu/ From jason@chg.mc.duke.edu Wed Jan 31 14:01:23 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 09:01:23 -0500 (EST) Subject: [Bioperl-l] RichSeqI In-Reply-To: Message-ID: On Tue, 30 Jan 2001, Ewan Birney wrote: > > To prove to hilmar that I am doing the RichSeqI stuff, I have committed > the interface. Basically this is a trivial recasting of the "additional > support" currently in Seq.pm which I will move out into > Bio::Seq::RichSeq.pm > > > currently the interface looks like... > > > =head1 NAME > > Bio::Seq::RichSeqI - RichSeq interface, mainly for database orientated > sequences > > =head1 SYNOPSIS > > @secondary = $richseq->get_secondary_accessions; > $division = $richseq->division; > $mol = $richseq->molecule; > @dates = $richseq->get_dates; > $seq_version = $richseq->seq_version; > > > =head1 DESCRIPTION > > This interface extends the Bio::SeqI interface to give additional > functionality to sequences with richer data sources, in particular from > database sequences (EMBL, GenBank and Swissprot). > > > Kris, Jason, Hilmar --- comments? We have static set of methods for handling the fields you describe above as well as a set of dynamic methods (via AUTOLOAD) to deal with things like PID (bug #160), genbankid. Or does most of that get wrapped into secondard_accessions? I guess are there any other fields we are missing? > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . 
> -----------------------------------------------------------------
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From heikki@ebi.ac.uk Wed Jan 31 15:33:57 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:33:57 +0000 Subject: [Bioperl-l] more fuzziness checked in References: Message-ID: <3A783065.9A629C67@ebi.ac.uk>
Jason Stajich wrote:
>
> more robust fuzzy and split feature handling checked in.
>
> FTHelper will try and see if start==end, if it does and there is no
> splitlocation delimiter then the code will return just a single number
> representing the location ie
>
>      variation       500
>                      /allele="C"
>                      /allele="T"
>

I am just back from a one-week holiday. I'll catch up with the list in a day or two.

Jason,

In case you really are going to use the above format: it is not valid according to the DDBJ/EMBL/GenBank Feature Table Definition. The allele qualifier gives a common name of the allele in free text, e.g.:

                     /allele="adh1-1"

In general, the rule is that there should not be identical feature keys on the same location, but 'variation' is an exception. When we are dealing with SNPs, we do not generally know which of the alleles is present in the particular sequence the SNP is mapped to (unless you want to check the sequence). The correct way to represent a diallelic variation in the DDBJ/EMBL/GenBank feature table is to repeat the feature key for each allele and use the /replace qualifier:

     variation       500
                     /replace="C"
     variation       500
                     /replace="T"

It is ugly, but that's what they (the EMBL database people) told me to do a few weeks ago when I was writing the to_FTHelper method for SNPs in EnsEMBL.
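A minimal sketch of the layout described above, using a hypothetical Perl helper (`variation_features` is illustrative only; it is neither part of BioPerl nor the to_FTHelper method mentioned). It repeats the `variation` key once per allele, each with its own /replace qualifier, and assumes the usual GenBank column layout: feature key from column 6, location and qualifiers from column 22.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: format a single-position
# variation as GenBank-style feature-table text, repeating the
# 'variation' key once per allele with a /replace qualifier.
sub variation_features {
    my ($position, @alleles) = @_;
    my $text = '';
    for my $allele (@alleles) {
        # 5 spaces, then the key padded to 16 chars: location starts in column 22
        $text .= sprintf("     %-16s%s\n", 'variation', $position);
        # qualifiers are indented to the same column (21 leading spaces)
        $text .= (' ' x 21) . qq{/replace="$allele"\n};
    }
    return $text;
}

print variation_features(500, 'C', 'T');
```

Called as shown, it prints the same four lines as the /replace example above; adding a third allele simply adds a third variation/replace pair.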
-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


From paul-christophe.varoutas@curie.fr Wed Jan 31 15:19:06 2001 From: paul-christophe.varoutas@curie.fr (Paul-Christophe Varoutas) Date: Wed, 31 Jan 2001 16:19:06 +0100 Subject: [Bioperl-l] Re: LiveSeq tests warn In-Reply-To: References: <3A7718EC.736A4F85@gmx.net> Message-ID: <5.0.2.1.2.20010131160802.00a62a98@mailhost.curie.fr>
I guess you are talking about the small bug I fixed yesterday in /Bio/LiveSeq/SeqI.pm and Bio/LiveSeq/Gene.pm:

http://bioperl.org/pipermail/bioperl-guts-l/2001-January/002957.html

(I committed after Hilmar's mail and before Joseph's answer.)

Paul-Christophe

At 00:08 31/01/2001 +0000, Joseph Insana wrote:
> > Just to let you know, I'm getting warnings on my machine from
> > LiveSeq.t and Mutator.t. Could you check whether this might
> > indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)
>
>Strange, I have nothing like that.
>Hmmmm. It seems it's complaining because I used "ne" instead of "!="
>to test for something to be -1 or not -1.
>My perl is not complaining.
>I am running perl v5.6.0 on linux 2.4.0.
>
>Please try putting "!=" instead of "ne" and see if it gets fixed.
>
>Joseph

At 11:41 30/01/2001 -0800, Hilmar Lapp wrote:
>Just to let you know, I'm getting warnings on my machine from
>LiveSeq.t and Mutator.t. Could you check whether this might
>indicate an error? (I'm running Perl 5.005_03 on Linux 2.2.10.)
>
>         Hilmar
>
>t/LiveSeq...........Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
>Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 380.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 385.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 393.
>ok
>
>t/Mutator...........Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1202.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1207.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/SeqI.pm line 1215.
>Argument "LiveSeq" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 380.
>Argument "ARRAY" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 385.
>Argument "HASH" isn't numeric in ne at
>blib/lib/Bio/LiveSeq/Gene.pm line 393.
>ok


From mwilkinson@gene.pbi.nrc.ca Wed Jan 31 15:25:51 2001 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Wed, 31 Jan 2001 09:25:51 -0600 Subject: [Bioperl-l] GO ontology browser module available References: Message-ID: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca>
Ewan Birney wrote:

> > > Wouldn't it make sense to add it to bioperl-gui?
> > >
> > >         Hilmar
> > >
> > Inasmuch as it is completely separate from SeqCanvas, and we are still
> > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
> > than SeqCanvas, maybe. Mark? I think it would be okay.
>
> Sounds like the right place to me....

indeed - that was where I intended to put it when it was a little more "polished"...
I am just hesitant to use the BioPerl CVS repository to store my half-baked code.

There are several things which "don't work right" (tm). I think a lot of this has to do with the fact that I cannot get my hands on the GO.dtd - it isn't available on the GO website, though all of the other XML files are (yet they reference the DTD in these same XML files). Neither do I receive a response to inquiries sent to the consortium e-mail address.

The consequence is that XML::Parser doesn't know what to do with the HTML-like formatting tags that they are using in some of their "free text", and in some cases tries to treat them as sub-level tags (for example, what should be a subscript or superscript will become a sub-element of the preceding word, so Carbon14 parses as $GO->{Carbon}->{14}... which is ridiculous of course....). In addition, they use HTML designations for the Greek alpha, beta, gamma, and so on, preceded by an ampersand and ending with a semicolon. These cannot be parsed by XML::Parser *at all* unless it is specifically told that these are going to be #CDATA elements... which requires a DTD.... which I don't have.

So, GO_Browser (for the time being) hacks away at the XML in its first parsing pass, replacing these tags with things that will not break XML::Parser, and then reads from this hacked data. As a result, what you get is not "strict" GO ontology, but a slightly modified version of the same.... which effectively defeats the purpose of GO, which is that everyone should use a consensus nomenclature. :-(

In any case, after all that griping, I am perfectly willing to cvs add this module to bioperl-gui, so long as I am not judged too harshly by it - I know it's a hack!! :-)

I'll get on to that later this afternoon.

b.t.w. If anyone can assist me in getting hold of a GO.dtd, please speak up! It would make my miserable life a bit brighter!!


-- 
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada


From heikki@ebi.ac.uk Wed Jan 31 15:43:53 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:43:53 +0000 Subject: [Bioperl-l] RetrictionEnzyme.pm: a proposal References: <5.0.2.1.2.20010129100143.00b32138@pop.wanadoo.fr> Message-ID: <3A7832B9.ED647AAC@ebi.ac.uk>
Paul-Christophe,

Please have a look at Bio::Variation::VariantI::restriction_changes, too. I would have preferred to use Bio::Tools::RestrictionEnzyme but decided not to depend on it as I found it too complicated.

It would be great not to have to duplicate restriction enzyme lists and functionality. If you come up with a solution, I'd be happy to remove or modify the restriction_changes method.

-Heikki

Paul-Christophe Varoutas wrote:
>
> Yesterday I studied RestrictionEnzyme.pm in more depth. I haven't yet added
> the methods I wanted to, because in my opinion it is far more urgent for
> this module to get some redesigning.
>
> The module somewhat suffers from poor design, and just adding methods to it
> will only worsen the situation.
>
> RestrictionEnzyme has methods which are specific to the restriction enzymes:
> - seq() is the accessor method to the enzyme's recognition sequence.
> - cut_seq() "cuts" a Bio::Seq-derived object and generates an array of
>   restriction site fragments.
> - cuts_seq_at() does the same but this time generates an array of
>   restriction site coordinates.
>
> and methods which are specific to the list of enzymes:
> - is_available() says if a particular enzyme is in the list.
> - available_list() gives the list of all enzymes or list of n-base cutters.
>
> Steve Chervitz already suggested in the module's documentation that
> is_available() "may be more appropriate for a REData.pm class", and I share
> his opinion.
> From a conceptual point of view, the existing
> RestrictionEnzyme.pm module corresponds to two object classes, not one.
>
> Here is an outline of my proposal:
>
> Separate RestrictionEnzyme into two classes:
>
> RestrictionEnzymeDBase (or whatever is more appropriate):
> - members: the list of restriction enzymes.
> - methods:
>   - constructor using a hardwired list of enzymes OR a user file OR a URL.
>   - add/remove enzyme to/from list (adding will be the equivalent of
>     _make_custom() ).
>   - member accessor methods: the already existing is_available() and
>     available_list().
>
> RestrictionEnzyme:
> - members: the same as now (_name, _seq, _site, _cuts_after).
> - methods:
>   - constructor (equivalent to the constructor calling the
>     _make_standard() sub).
>   - already existing accessor methods.
>   - already existing methods: cut_seq, cuts_seq_at, etc.
>
> This design, apart from being more "correct", will facilitate any future
> extensions of the two modules. The drawback of separating RestrictionEnzyme
> into two classes is that all code using RestrictionEnzyme.pm will have to be
> modified.
>
> Perhaps we should take advantage of the imminent release of the 0.7 version
> and decide to proceed with the redesign. If we change the design, this will
> also be the opportunity to slightly change/extend its public interface to
> add small new functionality, such as being able to add and use asymmetric
> cutters and enzymes which cut outside the recognition site (perhaps just
> incorporating small changes now in order to be in time for the 0.7 release
> and leaving extensions for afterwards, especially if I do this alone based
> on what we decide).
>
> Tell me what you think about it:
> - First of all, is redesigning possible, or are we obliged to maintain
>   compatibility? In the latter case I will just add functionality,
>   maintaining the poor design of the module.
> - If redesigning is possible, please make comments/suggestions.
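As a rough illustration of the two-class split proposed above, here is a minimal Perl sketch. The package names follow the proposal's RestrictionEnzymeDBase/RestrictionEnzyme naming, but they are hypothetical stand-ins, not the BioPerl module. To keep it self-contained, sequences are plain strings rather than Bio::Seq objects, only the top strand is scanned, `get_enzyme` is an invented accessor, and `-cuts_after` is assumed to count the bases of the site that precede the cut.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed split. These packages are
# illustrations only, not the BioPerl RestrictionEnzyme module.

package RestrictionEnzyme;

# A single enzyme: name, recognition site, and (by assumption) the
# number of site bases that lie before the top-strand cut.
sub new {
    my ($class, %args) = @_;
    my $self = {
        _name       => $args{-name},
        _site       => uc $args{-site},
        _cuts_after => $args{-cuts_after},
    };
    return bless $self, $class;
}

sub name { return $_[0]{_name} }
sub site { return $_[0]{_site} }

# 1-based positions after which the top strand of $seq is cut.
sub cuts_seq_at {
    my ($self, $seq) = @_;
    $seq = uc $seq;
    my $site = $self->{_site};
    my @cuts;
    my $pos = index($seq, $site);
    while ($pos >= 0) {
        push @cuts, $pos + $self->{_cuts_after};
        $pos = index($seq, $site, $pos + 1);
    }
    return @cuts;
}

package RestrictionEnzymeDBase;

# The enzyme list, kept separate from any individual enzyme.
sub new {
    my ($class, @enzymes) = @_;
    my $self = bless { _enzymes => {} }, $class;
    $self->add_enzyme($_) for @enzymes;
    return $self;
}

sub add_enzyme {
    my ($self, $enzyme) = @_;
    $self->{_enzymes}{ $enzyme->name } = $enzyme;
    return $enzyme;
}

sub remove_enzyme { my ($self, $name) = @_; return delete $self->{_enzymes}{$name} }
sub is_available  { my ($self, $name) = @_; return exists $self->{_enzymes}{$name} }
sub get_enzyme    { my ($self, $name) = @_; return $self->{_enzymes}{$name} }

# All enzyme names, or only those whose recognition site is $size bases long.
sub available_list {
    my ($self, $size) = @_;
    my @names = sort keys %{ $self->{_enzymes} };
    return @names unless defined $size;
    return grep { length($self->{_enzymes}{$_}->site) == $size } @names;
}

package main;

my $db = RestrictionEnzymeDBase->new(
    RestrictionEnzyme->new(-name => 'EcoRI',   -site => 'GAATTC', -cuts_after => 1),
    RestrictionEnzyme->new(-name => 'HindIII', -site => 'AAGCTT', -cuts_after => 1),
);

print join(',', $db->available_list(6)), "\n";    # EcoRI,HindIII
print join(' ', $db->get_enzyme('EcoRI')->cuts_seq_at('ccGAATTCaaGAATTCgg')), "\n";    # 3 11
```

The point of the split is visible in the usage: code that only asks "which enzymes do we know about?" talks to the collection, while cutting logic lives on the individual enzyme, so a user-supplied file or URL would only change how the collection is populated.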
>
> Paul-Christophe


From heikki@ebi.ac.uk Wed Jan 31 15:51:18 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 15:51:18 +0000 Subject: [Bioperl-l] SeqFeature::Generic broken? no Location::Simple.pm References: Message-ID: <3A783476.5DE01D5D@ebi.ac.uk>
I recently read "A Really Good Book" about CVS and found out that you can put a .cvsrc file in your home directory with, for example, the following lines:

    update -d
    cvs -q -z9

After that, 'cvs update' is automatically expanded to 'cvs -q -z9 update -d'!

-Heikki

Ewan Birney wrote:
>
> On Mon, 29 Jan 2001, Mark Wilkinson wrote:
>
> > Dear Group,
> >
> > I just cvs-updated and noticed that SeqFeature::Generic does not appear
> > to be functional anymore. It is calling on Bio/Location/Simple.pm
> > (line 122), which apparently does not exist. Is it just my installation
> > which is wonky, or is this a genuine bug?
>
> cvs update -d
>
> >
> > any advice appreciated.
> >
> > cheers all!
> >
> > M


From heikki@ebi.ac.uk Wed Jan 31 16:52:20 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 16:52:20 +0000 Subject: [Bioperl-l] Incompatibility with Perl v5.6.0 [Fwd: XML::Parse test fails] Message-ID: <3A7842C4.6F05ED9D@ebi.ac.uk>
It might be worth adding this to the release notes of the upcoming 0.7 release. As a result, Bio::Variation XML input and output do not work under Perl v5.6.0. We have to pray that 5.6.1 will be out soon.

-Heikki

David Megginson wrote:
>
> Heikki Lehvaslaiho writes:
>
> > I recently upgraded to Perl v5.6.0. As a result the XML::Parser test
> > script fails and CPAN does not install it:
>
> There is a known bug in Perl 5.6 when passing array references.
>
> All the best,
>
> David
>
> --
> David Megginson                 david@megginson.com
>            http://www.megginson.com/


From heikki@ebi.ac.uk Wed Jan 31 17:11:26 2001 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 31 Jan 2001 17:11:26 +0000 Subject: [Bioperl-l] Re: Bio::Root::Object cleanup References: <3A771792.DB06ACA6@gmx.net> Message-ID: <3A78473E.554A78C1@ebi.ac.uk>
Hilmar Lapp wrote:
...
> In addition, the Variation code contains the line
> Bio/Variation/IO.pm:    return Bio::Root::Object::new($class,
> %param);
> Heikki, I don't know about the context, just wanted to make sure
> this is indispensable.

It is not. I copied it over from Bio::SeqIO at some point. Removed.

-Heikki

>         Hilmar


From hlapp@gmx.net Wed Jan 31 17:58:44 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 09:58:44 -0800 Subject: [Bioperl-l] RichSeqI References: Message-ID: <3A785254.7E3A11BD@gmx.net>
Ewan Birney wrote:
>
> =head1 SYNOPSIS
>
>   @secondary   = $richseq->get_secondary_accessions;
>   $division    = $richseq->division;
>   $mol         = $richseq->molecule;
>   @dates       = $richseq->get_dates;
>   $seq_version = $richseq->seq_version;
>

What about species()? Just popped into my head.
Right now a class implementing both SeqI and RichSeqI doesn't have to have that, even though it's present in probably most 'rich' databanks. What do you think about moving it, too? (It's now in Seq.pm.)

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From cjm@fruitfly.bdgp.berkeley.edu Wed Jan 31 18:02:05 2001 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Wed, 31 Jan 2001 10:02:05 -0800 (PST) Subject: [Bioperl-l] GO ontology browser module available In-Reply-To: <3A782E7E.5EEC35CA@gene.pbi.nrc.ca> Message-ID:

Hi Mark

Sorry you haven't heard back from us GO people; all the GO developers are working full time on another project at the moment. Just keep at us and we'll respond eventually.

We should fix the problem of the SGML embedded within XML - Brad, can you see to this?

In the meantime, have you tried using either the flat files or the MySQL database? There are Perl modules for using either of these in the GO repository.

As to where you deposit your code, I'd love to keep all the GO code together in one CVS repository. Unfortunately, the Stanford CVS server is highly restricted. I was considering moving the Perl software portion of GO away from the Stanford CVS server into the Berkeley one for this reason. Another option would be to use the bioperl CVS for all of GO-perl, if people are willing.

If anyone's interested, the GO module docs are here:

http://www.fruitfly.org/annot/go/database/modules/GO::AppHandle.html

On Wed, 31 Jan 2001, Mark Wilkinson wrote:

> Ewan Birney wrote:
> [...]


From birney@ebi.ac.uk Wed Jan 31 18:10:21 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 18:10:21 +0000 (GMT) Subject: [Bioperl-l] RichSeqI In-Reply-To: <3A785254.7E3A11BD@gmx.net> Message-ID:
On Wed, 31 Jan 2001, Hilmar Lapp wrote:

> Ewan Birney wrote:
> >
> > =head1 SYNOPSIS
> >
> >   @secondary   = $richseq->get_secondary_accessions;
> >   $division    = $richseq->division;
> >   $mol         = $richseq->molecule;
> >   @dates       = $richseq->get_dates;
> >   $seq_version = $richseq->seq_version;
> >
>
> What about species()? Just popped into my head. Right now a class
> implementing both SeqI and RichSeqI doesn't have to have that,
> even though it's present in probably most 'rich' databanks. What
> do you think about moving it, too? (It's now in Seq.pm.)

Hmmmm. I would guess it would go to SeqI. It should be somewhere. I'm agnostic. If we move it out to RichSeq, the genbank/embl IO has to be able to generate dummy Species lines...

>
>         Hilmar

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From hlapp@gmx.net Wed Jan 31 19:10:30 2001 From: hlapp@gmx.net (Hilmar Lapp) Date: Wed, 31 Jan 2001 11:10:30 -0800 Subject: [Bioperl-l] Bio::Factory::SeqAnalysisParserFactoryI Message-ID: <3A786326.D0DCBFFE@gmx.net>
Interface committed. Check out the documentation. If you approve it, I'll add the implementation.

The obvious question with regard to SeqFeatureProducer is what will happen to the add_features() method. In principle the implementation is simple enough to just dismiss it; as we already felt a couple of times it doesn't really add that much value. So, let me know what you think.

        Hilmar

-------- Original Message --------
Subject: Bio::Factory
Date: Wed, 31 Jan 2001 01:08:46 -0800
From: Hilmar Lapp
Organization: Nereis 4
To: Bioperl

In an attempt to revisit/finalize the SeqAnalysisParser/SeqFeatureProducer stuff, I suggest we accept the design change Ewan proposed a couple of weeks ago:

------
Why not have

   Bio::SeqAnalysisParserFactoryI

     $parser = $factory->create_parser(-fh => \*FILE);

   Bio::SeqAnalysisParserI

     while( $next_feature = $parser->next_feature ) {

     }

same number of functions defined. Twice the number of interfaces, but these are the interfaces I would argue we want.

An implementation could implement ParserFactoryI and ParserI in the same module if so wished.
------

For the factory interface I propose to open a new directory Bio::Factory, first to avoid cluttering other directories, and second because there are many places in BioPerl that can eventually take advantage of a factory design (basically, wherever hard-coded object creation occurs, e.g. in SeqIO::* etc.), so that directory hopefully won't stay empty for long.

Any objections? If not, I'll give it a go soon.

        Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
-----------------------------------------------------------------


From birney@ebi.ac.uk Wed Jan 31 19:15:37 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 19:15:37 +0000 (GMT) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: <3A786326.D0DCBFFE@gmx.net> Message-ID:
On Wed, 31 Jan 2001, Hilmar Lapp wrote:

> Interface committed. Check out the documentation. If you approve
> it, I'll add the implementation.
>
> The obvious question with regard to SeqFeatureProducer is what
> will happen to the add_features() method. In principle the
> implementation is simple enough to just dismiss it; as we already
> felt a couple of times it doesn't really add that much value. So,
> let me know what you think.
>

I don't like the add_features method much myself... Jason?

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From jason@chg.mc.duke.edu Wed Jan 31 20:17:10 2001 From: jason@chg.mc.duke.edu (Jason Stajich) Date: Wed, 31 Jan 2001 15:17:10 -0500 (EST) Subject: [Bioperl-l] Re: Bio::Factory::SeqAnalysisParserFactoryI In-Reply-To: Message-ID:
Kill it, that's fine. We should instead be providing better example scripts rather than wrapping something that simple into an object, since all the work is done by the Seq object.

On Wed, 31 Jan 2001, Ewan Birney wrote:

> On Wed, 31 Jan 2001, Hilmar Lapp wrote:
>
> > Interface committed. Check out the documentation. If you approve
> > it, I'll add the implementation.
> >
> > The obvious question with regard to SeqFeatureProducer is what
> > will happen to the add_features() method. In principle the
> > implementation is simple enough to just dismiss it; as we already
> > felt a couple of times it doesn't really add that much value. So,
> > let me know what you think.
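The factory/parser split under discussion, together with the caller-side loop that would replace an add_features() wrapper, can be illustrated with a minimal Perl sketch. Everything here is hypothetical: ToyParserFactory, ToyFeatureParser, the tab-separated "start, end, tag" input format and the hash-based features are stand-ins, not the committed Bio::Factory::SeqAnalysisParserFactoryI or SeqAnalysisParserI interfaces.

```perl
use strict;
use warnings;

# Hypothetical factory: its only job is to build a parser for a stream.
package ToyParserFactory;

sub new { my ($class) = @_; return bless {}, $class }

sub create_parser {
    my ($self, %args) = @_;
    return ToyFeatureParser->new(-fh => $args{-fh});
}

# Hypothetical parser: reads "start<TAB>end<TAB>tag" lines and returns one
# feature (a plain hash ref) per next_feature call, or undef at end of input.
package ToyFeatureParser;

sub new {
    my ($class, %args) = @_;
    return bless { _fh => $args{-fh} }, $class;
}

sub next_feature {
    my ($self) = @_;
    my $fh = $self->{_fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($start, $end, $tag) = split /\t/, $line;
        return { start => $start, end => $end, primary_tag => $tag };
    }
    return;
}

package main;

my $data = "10\t50\texon\n# a comment\n80\t120\texon\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $factory = ToyParserFactory->new;
my $parser  = $factory->create_parser(-fh => $fh);

# The caller-side loop that an add_features() wrapper would hide.
my @features;
while (my $feature = $parser->next_feature) {
    push @features, $feature;
}
printf "%d features\n", scalar @features;
```

With real BioPerl objects the loop body would typically hand each feature to the sequence object (Bio::Seq provides add_SeqFeature), which is essentially all an add_features() wrapper would do; the factory, meanwhile, is the only place that needs to know which parser class to instantiate.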
> >
>
> I don't like the add_features method much myself... Jason?
>
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/


From krbou@pgsgent.be Wed Jan 31 21:43:09 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 22:43:09 +0100 Subject: [Bioperl-l] Cruft in module documentation ? Message-ID: <20010131224309.B24431@gryzo.pgsgent.be>
In testing the documentation (SYNOPSIS) part I already fixed some errors (more to come during the coming days), but I don't know what to do with this one (I guess it can be removed). The SYNOPSIS for Bio::Annotation contains

[ ...]
    #
    # Making an annotation object from scratch
    #

    $ann = Bio::Pfam::Annotation->new();

    $ann->description("Description text");
    print "Annotation description is ", $ann->description, "\n";


I can't find any reference to Bio::Pfam::Annotation, is this a remainder of history ?

Kris,


From birney@ebi.ac.uk Wed Jan 31 22:03:29 2001 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 31 Jan 2001 22:03:29 +0000 (GMT) Subject: [Bioperl-l] Cruft in module documentation ? In-Reply-To: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID:
On Wed, 31 Jan 2001, Kris Boulez wrote:

> In testing the documentation (SYNOPSIS) part I already fixed some errors
> (more to come during the coming days), but I don't know what to do with
> this one (I guess it can be removed).
> The SYNOPSIS for Bio::Annotation contains
>
> [ ...]
>     #
>     # Making an annotation object from scratch
>     #
>
>     $ann = Bio::Pfam::Annotation->new();
>
>     $ann->description("Description text");
>     print "Annotation description is ", $ann->description, "\n";
>
>
> I can't find any reference to Bio::Pfam::Annotation, is this a remainder
> of history ?

This is historical cruft.
s/Pfam:://g;

>
> Kris,
>

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------


From krbou@pgsgent.be Wed Jan 31 22:32:21 2001 From: krbou@pgsgent.be (Kris Boulez) Date: Wed, 31 Jan 2001 23:32:21 +0100 Subject: [Bioperl-l] Cruft in module documentation ? In-Reply-To: ; from birney@ebi.ac.uk on Wed, Jan 31, 2001 at 10:03:29PM +0000 References: <20010131224309.B24431@gryzo.pgsgent.be> Message-ID: <20010131233221.A24783@gryzo.pgsgent.be>
Quoting Ewan Birney (birney@ebi.ac.uk):
> On Wed, 31 Jan 2001, Kris Boulez wrote:
>
> >
> > I can't find any reference to Bio::Pfam::Annotation, is this a remainder
> > of history ?
>
> This is historical cruft. s/Pfam:://g;
>
Done.

Kris,


From Cox, Greg" I know that there are some people on the BioPerl list who ran into the same trouble and managed to have some success. Please reply directly to Greg, as it wasn't me who had the question.

        Hilmar

-------- Original Message --------
Subject: [Biojava-l] WinCVS and SSH
Date: Wed, 31 Jan 2001 14:08:06 -0500
From: "Cox, Greg"
To: biojava-l@biojava.org

I'm having problems convincing WinCVS and SSH to play nicely together. I followed the instructions on WinCVS' page, but I can't log in. I can log in with ssh (I'm using ssh-1.2.14-win32bin) without typing a password, but when I try to log in to cvs, I get "Set the password authentication first in the preferences !"

Did anyone else run across this, and how did you fix it?

Greg
_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l