[Bioperl-l] my bioperl-db hacks
Hilmar Lapp
hlapp at gmx.net
Tue Dec 30 14:14:09 EST 2003
Note that bioperl-db 0.1 has been outdated for about a year now. It
won't work with the present biosql schema either. In order to use 0.1
you will also need to use a pre-Singapore version of biosql.
The current and interoperating versions of bioperl-db and biosql are
the respective cvs HEADs.
-hilmar
On Tuesday, December 30, 2003, at 09:00 AM, T.D. Houfek wrote:
> I'm monkeying around with bioperl-db 0.1, trying to see what I can get
> it to do. I set about following some instructions that tell you how to
> use the "load_seqdatabase.pl" script to fill your bioperl database with
> sequence from a swissprot release file (I am using sprot42.dat). This
> did not work for me initially, but I made some vicious hacks to the code
> and now the script seems to work, more or less. It's this "more or less"
> I'd like comments on: I suspect other things may have broken because of
> what I have done, and that someone who knows the code can help me find a
> more stable solution.
>
> I think the problem arises when, in parsing the sprot42.dat file,
> Bioperl encounters a record with a feature whose location must be
> expressed as a Bio::Location::Fuzzy object. The inline documentation of
> biosqldb-mysql indicates that Fuzzy objects are not supported yet (but
> gives you an idea of where you could start if you wished to do so).
>
> Anyway, I first encountered an exception around line 169 of
> Bio/DB/SQL/SeqLocationAdaptor.pm, where $location->isa() is used to
> check whether $location is the right kind of object.
>
> I just added the Fuzzy objects to the list of invited guests:
>
> # --start snippet ---------------------
> if ( $location->isa('Bio::Location::SplitLocationI') ) {
>     my $rank = 1;
>     foreach my $sub ( $location->sub_Location ) {
>         $self->_store_component($sub, $seqfeature_id, $rank);
>         $rank++;
>     }
> } elsif ( $location->isa('Bio::Location::Simple') ) {
>     $self->_store_component($location, $seqfeature_id, 1);
> } elsif ( $location->isa('Bio::Location::Fuzzy') ) {   # added branch
>     $self->_store_component($location, $seqfeature_id, 1);
> } else {
>     # Method calls do not interpolate inside strings, so report ref() instead
>     $self->throw("Not a simple location nor a split nor a fuzzy. "
>                . "Says it's a " . ref($location) . ". Yikes");
> }
> # -- end snippet ----------------------
>
>
> Once I fixed this, the only thing that broke was around line 208.
> Probably because of the normal behavior supporting Fuzzy locations (but
> of course I mention it in case it is bad behavior), some locations
> passing through this section of code were missing either starts or
> ends. The $start and $end variables were set to the empty string, and
> the SQL insert statement they were interpolated into failed. A failure
> to deposit one entry would terminate the script (but did not undo prior
> inserts).
>
> With a two-line hack circa line 208 I sidestepped the outright failures.
> I just forced uninitialized endpoints to zero:
>
> # -- start snippet -------
>
> unless ($end) { $end=0; } ## ADDED THESE TWO
> unless ($start) { $start=0; } ## LINES HERE
>
> my $sth = $self->prepare("insert into seqfeature_location
>   (seqfeature_location_id,seqfeature_id,seq_start,seq_end,seq_strand,location_rank)
>   VALUES (NULL,$seqfeature_id,$start,$end,$strand,$rank)");
>
> # -- end snippet ---------
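One possible alternative to coercing missing endpoints to 0 is to keep them as SQL NULL, so that "unknown" is not mistaken for a real coordinate by downstream queries. The sketch below is only an illustration, not bioperl-db code: the helpers endpoint_or_null and location_bind_values are hypothetical, it assumes the seq_start and seq_end columns of your seqfeature_location table are nullable, and it relies on DBI binding an undef placeholder value as NULL. Using "?" placeholders also avoids interpolating values into the SQL text, which is what turned an empty $start or $end into an invalid statement in the first place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): map a missing or empty
# location endpoint to undef. Passed through a "?" placeholder, DBI
# binds undef as SQL NULL instead of a made-up coordinate such as 0.
sub endpoint_or_null {
    my ($pos) = @_;
    # Explicit undef: a bare "return" would yield an empty list in list
    # context and shift every later bind value by one position.
    return undef unless defined $pos && $pos ne '';
    return $pos;
}

# Column names follow the INSERT statement quoted above; the
# auto-increment seqfeature_location_id column is left to the database.
my $insert_sql = 'INSERT INTO seqfeature_location '
               . '(seqfeature_id, seq_start, seq_end, seq_strand, location_rank) '
               . 'VALUES (?, ?, ?, ?, ?)';

# Hypothetical helper: build the bind values for one location component.
sub location_bind_values {
    my ($seqfeature_id, $start, $end, $strand, $rank) = @_;
    return ($seqfeature_id, endpoint_or_null($start),
            endpoint_or_null($end), $strand, $rank);
}

# With a real DBI handle this would be used as (assumed, not run here):
#   my $sth = $dbh->prepare($insert_sql);
#   $sth->execute(location_bind_values($id, $start, $end, $strand, $rank));

my @vals = location_bind_values(42, '', 120, 1, 1);
print defined $vals[1] ? "start=$vals[1]\n" : "start=NULL\n";   # prints "start=NULL"
```

Whether NULL endpoints are acceptable to the rest of the adaptor code (for example, code that later reads the location back) is a separate question this sketch does not answer.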
>
> Of course all I have really done is provide for a completely buggy
> persistence of Fuzzy objects.
>
> My guess is that SeqLocationAdaptor needs to be upgraded to handle the
> Fuzzy locations that Bioperl wants to make out of the Swissprot input.
> Is anyone already undertaking this? Does anyone have any insight about
> what problems this hack of mine will cause downstream?
>
>
> -------------------------------
> T.D. Houfek
> (email sound-alike: tdhoufek-AT-unity-DOT-ncsu-DOT-edu)
> bioinformatics development lead
> Tobacco Genome Initiative
> North Carolina State University
> -------------------------------
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------