[Bioperl-l] Re: No joins
Hilmar Lapp
hlapp@gnf.org
Fri, 16 Aug 2002 10:52:44 -0700
Can someone point me to a sensible location string (fuzzy or not) that the following code does not round-trip in a semantically correct way? (You need a recent CVS snapshot for this to work.)
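#!/usr/local/bin/perl -w
use strict;
use Bio::Factory::FTLocationFactory;

# Expect the feature-table location string as the first argument.
die "give location string as first arg\n" unless @ARGV;

# Parse the string into a location object, then write it back out.
my $locfact = Bio::Factory::FTLocationFactory->new();
my $loc = $locfact->from_string($ARGV[0]);
print $loc->to_FTstring(), "\n";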
The "semantically correct" caveat means that complement(join(...)) will come out as join(complement(...),complement(...),...), so the output need not be textually identical to the input. The "sensible" caveat means a string without nested complements and nested joins.
The idea is to give you the best of both worlds, as far as reasonably possible. If you intend to do something with fuzzies other than yanking them back out as strings, you can, but you don't have to. If you want to strip features with fuzzy locations from your annotation, so that no attempt is even made to parse them, here's how to do it:
1) Create a module, say NoFuzziesLocFact.pm.
2) Make it inherit off Bio::Factory::FTLocationFactory.
3) Override method from_string() as follows:
    sub from_string {
        my ($self, $locstr) = @_;
        # Skip fuzzy locations entirely; parse everything else as usual.
        return undef if $self->is_fuzzy_loc($locstr);
        return $self->SUPER::from_string($locstr);
    }
4) Implement is_fuzzy_loc($locstr) to return TRUE or FALSE, e.g. by regexp matching (*)
5) Whenever you instantiate a SeqIO stream, add the argument
-location_factory => NoFuzziesLocFact->new()
6) When parsing the stream, ignore the warnings, or call $seqio->verbose(-1)
What I'm trying to say is that the layout is quite flexible and doesn't (shouldn't) lock you into either camp.
(*) There's a certain complication though: (126^130) is a fuzzy location, whereas (126^127) is not.
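
As a rough illustration of step 4 and of the complication above, here is a minimal sketch of such a check written as a plain function, so it runs without BioPerl. The rules it applies are assumptions about what should count as fuzzy ('<' or '>' boundaries, a single '.' between two positions, and '^' between non-adjacent positions). It is not exhaustive, and it is not what FTLocationFactory itself does. In a subclass it would take $self as its first argument.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical helper for step 4: returns 1 if the location string
# looks fuzzy, 0 otherwise. The rules below are assumptions.
sub is_fuzzy_loc {
    my ($locstr) = @_;

    # Drop remote-entry prefixes such as "J00194.1:" so that the dot in
    # an accession version is not mistaken for a within-range marker.
    (my $s = $locstr) =~ s/[\w.]+://g;

    # '<' or '>' marks an open-ended boundary, e.g. <1..200
    return 1 if $s =~ /[<>]/;

    # A single '.' between two numbers, e.g. (102.110), means
    # "somewhere within this range". "1..200" does not match.
    return 1 if $s =~ /\d\.\d/;

    # start^end is exact only when the two positions are adjacent:
    # 126^127 is a definite site, 126^130 is not.
    while ($s =~ /(\d+)\^(\d+)/g) {
        return 1 if $2 - $1 != 1;
    }
    return 0;
}

# Print the verdict for a few example strings.
my @examples = ('<1..200', '(126^130)', '(126^127)',
                'join(1..50,60..100)', 'J00194.1:100..202');
printf "%-22s %s\n", $_, (is_fuzzy_loc($_) ? 'fuzzy' : 'exact')
    for @examples;
```

This sketch does not special-case circular sequences, where a site spanning the last and first base might be written with '^' as well.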
-hilmar
> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Friday, August 16, 2002 8:15 AM
> To: lstein@cshl.org
> Cc: Ewan Birney; brian.king@animorphics.net; bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Re: No joins
>
>
> If you talk to the curation teams, either for EMBL or SWISSPROT, they
> always say that fuzzies are purely for human consumption and come with a
> public health warning stronger than that attached to any tobacco products.
>
> IMHO the only point in representing them in all their gory detail is
> that some people want to be able to round-trip fuzzies through our
> object models, and the effort associated with producing an API that does
> much more than this is not worth the hassle.
>
> Matthew
>
> Lincoln Stein wrote:
> > Can someone point me to an example of an algorithm that makes use of
> > fuzzies, either from the ASCII art representation or from ASN.1? Or are
> > the fuzzies intended only for human consumption?
> >
> > Lincoln
> >
> > On Thursday 15 August 2002 12:36 pm, Ewan Birney wrote:
> >
> >>Brian -
> >>
> >>
> >>I would certainly agree with you that joins are bad, and in fact Bioperl
> >>originally had a hierarchical feature-only system and joins implicitly
> >>went into these cases.
> >>
> >>
> >>
> >>However, as more people used it, being able to store and process 100% of
> >>EMBL/GenBank became a priority, and we bolted on the location stuff -
> >>location stuff was really driven in by the fuzzies (aaaah, the fuzzies),
> >>which are distinctly hard to handle inside hierarchical features (what
> >>does biojava do with the fuzzies?), but most fuzzies are also joins (in
> >>fact a lot of joins have fuzzy ends), so... it became the de facto way to
> >>handle joins.
> >>
> >>
> >>Of course the frustrating thing is that no one *can* use the fuzzies but
> >>the semantic interpretation of fuzzies is just... impossible to keep
> >>consistent across more than 2 records. Fuzzies are for human warm-fuzzy
> >>feelings that the data format is representing everything they know, and
> >>are just a semantic mire for computers.
> >>
> >>
> >>
> >>I agree it gives us so much semantic rope to hang ourselves with that it
> >>is scary. But there is not an obvious ideal solution:
> >>
> >> - somehow represent all things inside hierarchical features, including
> >>the fuzzies (brain-ache)
> >>
> >> - not handle 100% of GenBank (means a large number of use cases fail)
> >>
> >>
> >>
> >>If there is something obvious I am missing here, shout, but this is
> >>somewhere between rock-and-hard-place in my experience.
> >>
> >>
> >>
> >>Practical question - what does BioJava do with the Fuzzies?
> >>
> >>-----------------------------------------------------------------
> >>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> >><birney@ebi.ac.uk>.
> >>-----------------------------------------------------------------
> >>
> >>_______________________________________________
> >>Bioperl-l mailing list
> >>Bioperl-l@bioperl.org
> >>http://bioperl.org/mailman/listinfo/bioperl-l
> >
> >
>