[Bioperl-l] Reading sequences without parsing them

Ewan Birney birney@ebi.ac.uk
Mon, 16 Jul 2001 17:15:56 +0100 (BST)


On Mon, 16 Jul 2001, Andrew Dalke wrote:

> Elia:
> >I wonder if I am on the right track, but doesn't this problem sound
> >like it would benefit a lot from the biopython solution, where the
> >parser parses only some parts, speeding up the finding of updated ones?
> 
> Yes, it does.  You can tell the biopython parser to generate
> events for the sequence and feature blocks and get all the text from
> those areas to generate your fingerprint.
> 
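
To illustrate the idea without relying on any particular biopython API,
here is a minimal sketch in plain Python. It assumes SWISS-PROT flat-file
conventions (feature lines start with "FT", the sequence block runs from
the "SQ" line to the "//" terminator). The function name
fingerprint_record and the choice of MD5 are assumptions made for
illustration only, not anything from biopython or bioperl.

```python
import hashlib


def fingerprint_record(lines):
    """Hash only the feature (FT) and sequence blocks of one record.

    `lines` is an iterable of text lines for a single SWISS-PROT style
    record. All other line types (ID, DE, RX, ...) are ignored, so
    changes there do not alter the fingerprint.
    """
    digest = hashlib.md5()
    in_sequence = False
    for line in lines:
        if line.startswith("//"):
            break  # end of record
        if line.startswith("SQ"):
            in_sequence = True
            continue  # skip the SQ header line itself
        if in_sequence:
            # Drop all whitespace so layout changes do not matter.
            digest.update("".join(line.split()).encode("ascii"))
        elif line.startswith("FT"):
            digest.update(line.rstrip().encode("ascii") + b"\n")
    return digest.hexdigest()
```

Two versions of a record that differ only in, say, the DE line get the
same fingerprint, while any change to a feature or to the sequence gives
a different one.
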
> The usefulness of this approach depends on how many trivial
> changes occur in the database record.  Suppose there are none.
> From my timings with SWISS-PROT, reading only a few records (needed
> for FASTA) is about 60% faster than reading all the records needed
> for the full object model.  A changed record needs two passes, so
> with T the time for a full parse:
>   time to check = 0.4 T
>   time to check then parse = 1.4 T
> 
> If x is the fraction of records that are unchanged, then
> x * 0.4 + (1-x) * (1 + 0.4) == 1 when x == 40%, so you only get
> a win if fewer than 60% of the records have changed.
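
The break-even arithmetic above can be checked numerically. This is a
small sketch, assuming x is the fraction of records that are unchanged
(so they cost only the 0.4 T check) and that T is normalised to 1. The
function names are made up for illustration.

```python
def expected_cost(unchanged_fraction, check_cost=0.4, parse_cost=1.0):
    """Average per-record cost of check-then-parse, in units of T."""
    changed_fraction = 1.0 - unchanged_fraction
    return (unchanged_fraction * check_cost
            + changed_fraction * (check_cost + parse_cost))


def break_even_unchanged_fraction(check_cost=0.4, parse_cost=1.0):
    """Unchanged fraction at which checking first costs the same as
    always doing a full parse.

    expected_cost simplifies to check_cost + (1 - x) * parse_cost;
    setting that equal to parse_cost gives x = check_cost / parse_cost.
    """
    return check_cost / parse_cost
```

With the timings quoted above, checking first is cheaper than always
parsing whenever more than 40% of the records are unchanged.
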


But the win is not in the parsing but in what you do with the parsed
object. Anyway, I think this will end up as post-1.0 fodder, not
pre-1.0. To be hashed out at BOSC.

> 
>                     Andrew
>                     dalke@dalkescientific.com

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------