[Bioperl-l] Reading sequences without parsing them

Karger, Amir AKarger@CuraGen.com
Mon, 16 Jul 2001 10:04:44 -0400


Amir Karger
Curagen Corporation 

> -----Original Message-----
> 
> Amir - Either you don't understand the bioperl objects very 
> well or I don't understand what you want to do very well.
> 
> Of course we keep a record of the sequence string when 
> reading in sequence
> data, that would defeat the purpose of parsing the data in 
> the first place.

I apologize for not being clear. The problem is that I was referring to Seq
objects as "sequences", when a Seq object is really a sequence plus
annotation. (Is there a different word I should use for this? How about
"entry"?)

> > I want to create a fingerprint for each sequence I read in, 
> > so that when
> > updated versions of the database come in, I can check to see if the
> > fingerprint changed before I bother doing all the work of parsing &
> > otherwise analyzing the sequence. 
> 
> See the PrimarySeqI::seq method.

I realize that if I just want to fingerprint the sequence, that's easy,
using Seq->seq. But I'm interested in more than just the sequence
information; I would like to know if sequence *or* annotation information
has changed. So I'd like to fingerprint the entire entry, i.e., the whole
string that gets printed out when you do RichSeq->write_seq.
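
To make the idea concrete, here is a minimal sketch of fingerprinting whole
entries without parsing them. It is in Python rather than bioperl, and it
assumes a Swiss-Prot-style flat file in which every entry ends with a line
containing only "//". The function names and the sample records are made up
for illustration; none of this is an existing bioperl API.

```python
import hashlib
import io


def iter_raw_entries(handle):
    """Yield each entry as one raw string, up to and including its '//' line.

    Assumption: entries are terminated by a line holding only '//'.
    Trailing lines with no terminator are silently dropped.
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []


def fingerprint(entry_text):
    """Digest of the whole entry text: sequence and annotation together."""
    return hashlib.sha256(entry_text.encode("utf-8")).hexdigest()


# Made-up sample data, not real Swiss-Prot records.
sample = (
    "ID   FAKE1_HUMAN\n"
    "AC   P00001;\n"
    "DE   Made-up protein one.\n"
    "//\n"
    "ID   FAKE2_HUMAN\n"
    "AC   P00002;\n"
    "DE   Made-up protein two.\n"
    "//\n"
)

entries = list(iter_raw_entries(io.StringIO(sample)))
for entry in entries:
    print(entry.splitlines()[0], fingerprint(entry))
```

Because the digest covers the raw text, any change to a single annotation
line changes the fingerprint, and no Seq object is ever built.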
 
> Before we go any further, most major sequence databases have 
> version numbers now so you should not even need to be doing this yourself,
> can't you verify the version number instead of doing this or are you
> getting sequence from someone who does not do this?

Well, take Swiss-Prot, for example. Obviously, they're going to have a new
version every once in a while. But when they do, I'd prefer to update only
those entries that have changed (and add any new entries) rather than
parsing all 80000 entries and examining the sequence and each annotation to
see if it has changed.
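
The update step I have in mind could look roughly like the sketch below
(Python again, with made-up records). It assumes each raw entry has an
"AC   " line whose first accession can serve as a stable key, and that the
fingerprints from the previous release were saved somewhere; here they are
simply rebuilt in memory. The helper names are hypothetical.

```python
import hashlib


def accession(entry_text):
    """Return the first accession on the entry's first 'AC' line."""
    for line in entry_text.splitlines():
        if line.startswith("AC   "):
            return line[5:].split(";")[0].strip()
    raise ValueError("entry has no AC line")


def index_release(entries):
    """Map accession -> digest of the raw entry text."""
    return {
        accession(e): hashlib.sha256(e.encode("utf-8")).hexdigest()
        for e in entries
    }


def entries_to_update(old_index, new_index):
    """Accessions that are new, or whose raw text changed, in the new release."""
    return sorted(
        acc for acc, digest in new_index.items()
        if old_index.get(acc) != digest
    )


# Made-up sample data, not real Swiss-Prot records.
old_release = [
    "ID   FAKE1_HUMAN\nAC   P00001;\nDE   Made-up protein one.\n//\n",
    "ID   FAKE2_HUMAN\nAC   P00002;\nDE   Made-up protein two.\n//\n",
]
new_release = [
    "ID   FAKE1_HUMAN\nAC   P00001;\nDE   Made-up protein one.\n//\n",
    "ID   FAKE2_HUMAN\nAC   P00002;\nDE   Made-up protein two, revised.\n//\n",
    "ID   FAKE3_HUMAN\nAC   P00003;\nDE   Made-up protein three.\n//\n",
]

to_update = entries_to_update(index_release(old_release),
                              index_release(new_release))
print(to_update)  # ['P00002', 'P00003']
```

Only the accessions in that list would then go through the full parse and
annotation comparison; unchanged entries are skipped.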
  
So, sorry about the lack of clarity. Do a s/sequence/entry/g on my original
email.

-Amir