[Bioperl-l] New package to compare two SeqI-implementing objects

Mark A. Jensen maj at fortinbras.us
Mon Feb 1 03:47:05 UTC 2010


Daniel-- this sounds interesting and useful, I +1 it. Your intuition about
in-memory vs streaming sounds correct to me; features can be many, and
diffing many (MANY) sequences may bork. Maybe our feature-rich users
can chime in. (...however, I did just hear about a magic spell called
'File::Map'; might check that out on CPAN.)
cheers- MAJ
----- Original Message ----- 
From: "Daniel Renfro" <bluecurio at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 31, 2010 10:22 PM
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects


> Hello all,
>
> A colleague and I have been working on a (Bio)Perl package to compare two
> Seq objects. This is in response to a need we found in our lab -- we wanted
> to see the changes to GenBank files through time, but wanted an automated
> way to do this. This led to what I'm calling the SeqDiff.pm package. I
> thought it would be a good idea to inform the community and get some
> feedback.
>
> The package takes two Seq objects as arguments, arbitrarily called "old" and
> "new." It then matches the features from the old object with the new object.
> This is done based on some criteria -- in our case we decided the features
> must be of the same type (have the same primary_tag) and have at least one
> matching database cross-reference (db_xref) in common.  The left-over
> features (ones that did not have a match) are dropped into arrays called
> "lost" and "gained." The matching is done in about N log N time, as each
> matching pair is removed from subsequent searches.
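
For anyone following along, here is a rough sketch of the matching step as
described above (same primary_tag, at least one db_xref in common, leftovers
sorted into "lost" and "gained"). It is written in Python only to keep it
short and is not the SeqDiff code: the Feature class, the match_features name
and the hash-index lookup are all assumptions made for illustration, and the
index is a swapped-in technique rather than the N log N search mentioned
above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""
    primary_tag: str
    db_xrefs: frozenset = frozenset()
    name: str = ""


def match_features(old, new):
    """Pair old and new features that share a primary_tag and a db_xref.

    Returns (matched, lost, gained). Each new feature is used at most once,
    so a matched pair drops out of later searches.
    """
    # Index the new features by (primary_tag, db_xref) for direct lookup.
    index = defaultdict(list)
    for pos, feat in enumerate(new):
        for xref in feat.db_xrefs:
            index[(feat.primary_tag, xref)].append(pos)

    used = set()          # positions in `new` that are already paired
    matched, lost = [], []
    for feat in old:
        hit = None
        for xref in sorted(feat.db_xrefs):
            for pos in index.get((feat.primary_tag, xref), []):
                if pos not in used:
                    hit = pos
                    break
            if hit is not None:
                break
        if hit is None:
            lost.append(feat)             # no partner in the new record
        else:
            used.add(hit)
            matched.append((feat, new[hit]))

    # Anything in `new` that was never paired counts as gained.
    gained = [f for pos, f in enumerate(new) if pos not in used]
    return matched, lost, gained
```

With two short feature lists, match_features(old, new) hands back the pairs
to be compared plus the two leftover lists. A feature whose primary_tag
changed between versions shows up once in "lost" and once in "gained", which
follows directly from the matching criteria.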
>
> The matched features are then iterated through and the differences are
> calculated. Each feature is examined recursively and any differences are
> reported. Optionally you can give the new() method a flag so that everything
> is returned (differences and similarities.) You can set callbacks for
> different types of objects (like anything that isa('Bio::LocationI')) if you
> want a custom comparison for specific BioPerl objects. This comparison step
> is the computationally slow part, and currently everything is held in
> memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
> next() and last() methods.
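
A rough sketch of how the comparison step could be done piecemeal, again in
Python for brevity and not taken from SeqDiff. It assumes each feature has
been flattened into nested dicts and lists with string keys; the diff
function, the Location class and the compare_location callback are
hypothetical names. The comparison is a generator, so differences are
produced one at a time instead of being collected in memory, and a callback
registered for a type takes over the comparison of objects of that type.

```python
def diff(old, new, path="", callbacks=None, report_same=False):
    """Lazily yield (kind, path, old_value, new_value) tuples.

    `callbacks` maps a class to a generator function (old, new, path) that
    replaces the default comparison for instances of that class.
    """
    callbacks = callbacks or {}
    for cls, callback in callbacks.items():
        if isinstance(old, cls) and isinstance(new, cls):
            yield from callback(old, new, path)
            return

    if isinstance(old, dict) and isinstance(new, dict):
        # Keys are assumed to be strings so they can be sorted.
        for key in sorted(set(old) | set(new)):
            sub = f"{path}/{key}"
            if key not in new:
                yield ("lost", sub, old[key], None)
            elif key not in old:
                yield ("gained", sub, None, new[key])
            else:
                yield from diff(old[key], new[key], sub, callbacks, report_same)
    elif isinstance(old, (list, tuple)) and isinstance(new, (list, tuple)):
        for i in range(max(len(old), len(new))):
            sub = f"{path}[{i}]"
            if i >= len(new):
                yield ("lost", sub, old[i], None)
            elif i >= len(old):
                yield ("gained", sub, None, new[i])
            else:
                yield from diff(old[i], new[i], sub, callbacks, report_same)
    elif old != new:
        yield ("changed", path, old, new)
    elif report_same:
        # Optional flag: report similarities as well as differences.
        yield ("same", path, old, new)


class Location:
    """Hypothetical location object with start and end coordinates."""
    def __init__(self, start, end):
        self.start, self.end = start, end


def compare_location(old, new, path):
    """Custom comparison: report a location change as one 'start..end' pair."""
    if (old.start, old.end) != (new.start, new.end):
        yield ("changed", path,
               f"{old.start}..{old.end}", f"{new.start}..{new.end}")
```

A caller would loop over diff(old_feature, new_feature,
callbacks={Location: compare_location}) and handle each tuple as it arrives,
which is roughly what a next()-style interface would give you.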
>
> Maybe this was a little verbose, but that is the SeqDiff package in a
> nutshell. I hope to soon release v1.0. If you have any questions or comments
> I'd love to hear them.
>
> -Daniel Renfro
>
> Hu Lab Research Associate
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4055
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 



