[Bioperl-l] Use of Bio namespace

Keith James kdj@sanger.ac.uk
10 Oct 2000 14:07:04 +0100


>>>>> "Ewan" == Ewan Birney <birney@ebi.ac.uk> writes:

    Ewan> I would hope that sequence/feature stuff could be merged
    Ewan> inside bioperl but there

    Ewan> 	(a) may be good reasons not to

    Ewan> 	(b) you may not want to ;)

    Ewan> both of which are sensible complaints.

I've been having a look at making my Sequence class SeqI compliant and
my Feature class SeqFeatureI compliant. It's been a bit tricky trying
to work out what is the best way to treat fuzzy ranges (which I've
supported) in a bioperl Seq.

    Ewan> I guess - hmmmmmm - this is hard. I suspect the right thing
    Ewan> to do is

    Ewan> 	- for really different stuff, eg Ecology, it should
    Ewan> get its own top-level namespace.

    Ewan> 	- for similar stuff, people should negotiate a
    Ewan> namespace that can be kept separate for their work, for
    Ewan> example, I could imagine

    Ewan> 	Bio::Expression::

    Ewan> being given out to a separate expression focused
    Ewan> group. Bio::TreeOfLife would be another one.

    Ewan> 	I guess anything molecular biology orientated should
    Ewan> end up inside Bio:: but by no means handled by Bioperl.

    Ewan> I certainly don't want to stop anyone submitting anything to
    Ewan> CPAN, so make a proposal for what you want to submit or how
    Ewan> you would best like it done.

Okay. It's good to know roughly where things are going, even if none
of our modules are released (for example, if I only put things under
Bio:: in my own or the Pathogen Sequencing Unit's local Perl lib).

    Ewan> I would also encourage you to

    Ewan> 	- if possible, work with bioperl or criticise bioperl
    Ewan> if it wasn't good enough for what you wanted to do.

It seems like bad form to criticise when I haven't contributed very
much to bioperl (if I don't like it, I should fix it...). I had a go
at hacking bioperl a while back but ran into my own limitations (I had
never written a Perl module and knew nothing about OO coding), so I
needed to write some stuff from scratch to see how it all worked.

Stuff I wanted was:

 Non-fussy but fairly complete EMBL parsing

 Terse, but intuitive manipulation of feature qualifiers in scripts

 Features with & without sequence

 Clone, trim, reverse-complement sequences with all the features
 attached

 Fuzzy ranges (parsed from EMBL, supported in other operations)

 Low memory Blast parsing

 Fasta search output parsing
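
To make the "reverse-complement with all the features attached" and
"fuzzy ranges" items concrete, here is a minimal sketch of the
coordinate bookkeeping involved. It is written in Python for brevity
and is only an illustration: the Feature class, its field names and
revcomp_with_features are hypothetical, not code from bioperl or from
the modules discussed here. It assumes 1-based inclusive coordinates
and EMBL-style "<start" / ">end" fuzzy boundaries.

```python
from dataclasses import dataclass

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature record: 1-based, inclusive coordinates."""
    key: str
    start: int
    end: int
    strand: int = 1
    fuzzy_start: bool = False  # EMBL "<start": true start lies before this point
    fuzzy_end: bool = False    # EMBL ">end": true end lies beyond this point


def revcomp_with_features(seq, features):
    """Reverse-complement seq and remap every feature onto the new strand."""
    length = len(seq)
    new_seq = seq.translate(_COMPLEMENT)[::-1]
    new_features = [
        Feature(
            key=f.key,
            # The old end becomes the new start, counted from the other side.
            start=length - f.end + 1,
            end=length - f.start + 1,
            strand=-f.strand,
            # Fuzziness travels with the boundary, so the flags swap too.
            fuzzy_start=f.fuzzy_end,
            fuzzy_end=f.fuzzy_start,
        )
        for f in features
    ]
    return new_seq, new_features
```

With a 10-base sequence, a feature at <1..4 on the forward strand would
come out as 7..>10 on the reverse strand. Trimming needs the same kind
of remapping, plus a decision about features that straddle the cut.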

I'm in a better position to work on bioperl now, but still find a lot
of it hard to follow (esp. where the methods have no documentation -
this isn't just me, I know others who have been discouraged from
working on it for this reason).

As I'm sure you can appreciate, there is the time aspect to this as
well. Annotation projects need to keep to deadlines and if writing a
new module is significantly quicker than modifying an existing one,
that's the way it goes.

To be honest, these modules were not originally intended for release
(hence their cutesy and non-CPAN-acceptable names). However, they have
since been used in some scripts (cos we've found them easier than
bioperl) which we now need to distribute, so the issue has come up. I
would prefer to integrate at some point, if possible.

cheers,

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA