[Bioperl-l] Reading sequences without parsing them
Ewan Birney
birney@ebi.ac.uk
Mon, 16 Jul 2001 15:20:14 +0100 (BST)
On Mon, 16 Jul 2001, Karger, Amir wrote:
>
> So, sorry about the lack of clarity. Do a s/sequence/entry/g on my original
> email.
>
There is no nice built-in way to do this inside Bioperl.

Options:

(a) use IO::String to write each entry out to a string and compare the
strings. That depends on the Bioperl write_seq output though - ie, it is
not what you want, because when we change the Bioperl write_seq for a
format you will think all your sequences have been updated

(b) trust the built-in accession.version system, which covers sequence
changes but not annotation changes

(c) trust the Date line for annotation updates (available in swissprot,
embl, genbank)
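
As a rough illustration of (b) and (c), here is a minimal sketch that pulls
the accession.version and the date out of a single GenBank-format entry
held in a string. The helper name version_and_date is made up for this
example, and it assumes the usual flat-file layout (a VERSION line, and a
DD-MON-YYYY date at the end of the LOCUS line); swissprot and embl keep
their dates on DT lines and would need different patterns.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): return the accession.version
# and the LOCUS date of one GenBank-format entry passed in as a string.
# Either value is undef if the corresponding line is not found.
sub version_and_date {
    my ($entry) = @_;

    # VERSION     U49845.1  GI:1293613
    my ($version) = $entry =~ /^VERSION\s+(\S+)/m;

    # The date is the last field on the LOCUS line, e.g. 21-JUN-1999
    my ($date) = $entry =~ /^LOCUS\b.*?(\d{2}-[A-Z]{3}-\d{4})\s*$/m;

    return ($version, $date);
}
```

Comparing these two values against the ones you stored last time tells you
whether the sequence (version) or the annotation (date) is claimed to have
changed, as far as you trust the database to maintain them.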
If you are paranoid you will need to write your own Digest::MD5 system
based on the text of each record in the files, ie the string from one //
line to the next. This could perhaps become quite a nice system integrated
into the SeqIO system: for example, I could imagine a complex system like:

  # fictional class
  use Bio::DB::AutoUpdate;

  my $auto = Bio::DB::AutoUpdate->new( -file   => 'some/file',
                                       -md5    => '/some/place/with/md5',
                                       -record => '//',
                                       -seqio  => 'swiss',
                                       -update => 1   # means update md5 on reading
                                     );

  # auto update complies with the implicit SeqIO interface of next_seq
  # but only gives back entries whose MD5 is new

  while( (my $updated_entry = $auto->next_seq()) ) {
      # do something with the updated entry
  }

The store of MD5s is probably best implemented as a DBM file.
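
To make the idea a bit more concrete, here is a minimal sketch of the
digest bookkeeping on its own, without any SeqIO integration. Everything
named here is an assumption for illustration: the function changed_entries
is hypothetical, SDBM_File is used only because it ships with Perl (any DBM
module would do), records are assumed to end with a line containing just
//, and the entry name is taken from the ID (swissprot, embl) or LOCUS
(genbank) line.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Fcntl;        # O_RDWR, O_CREAT
use SDBM_File;    # core DBM implementation; any DBM module would do

# Hypothetical helper: return the raw text of every entry in $file whose
# MD5 differs from the one stored in the DBM file at $dbm_path (or which
# has no stored MD5 yet), updating the stored MD5s along the way.
sub changed_entries {
    my ($file, $dbm_path) = @_;

    tie my %md5_for, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666
        or die "Cannot open DBM file $dbm_path: $!";

    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/ = "\n//\n";    # read one whole record per <$fh>

    my @changed;
    while (my $entry = <$fh>) {
        next unless $entry =~ /\S/;    # skip trailing whitespace

        # Assumption: the entry name is on the ID or LOCUS line
        my ($id) = $entry =~ /^(?:ID|LOCUS)\s+(\S+)/m or next;

        my $digest = md5_hex($entry);
        next if defined $md5_for{$id} && $md5_for{$id} eq $digest;

        $md5_for{$id} = $digest;       # remember the new digest
        push @changed, $entry;
    }

    close $fh;
    untie %md5_for;
    return @changed;
}
```

Called as changed_entries('some/file', '/some/place/with/md5'), the first
run reports every entry and later runs report only entries whose text has
changed. Because the digest is taken over the raw text, it does not depend
on how write_seq formats things.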
If you wrote something like this that would be great! If you wait 6 months
or so I'll probably get bored on a train sometime and might do it,
assuming half a ton of other interesting things are not happening ;)

Any other thoughts from people?
> -Amir
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------