[Bioperl-l] Reading sequences without parsing them
Ewan Birney
birney@ebi.ac.uk
Mon, 16 Jul 2001 17:15:56 +0100 (BST)
On Mon, 16 Jul 2001, Andrew Dalke wrote:
> Elia:
> >I wonder if I am on the right track, but doesn't it sound very much like
> >this problem would benefit very much from the biopython solution, where
> >the parser only parses some parts speeding up the finding of updated ones?
>
> Yes, it does. You can tell the biopython parser to generate
> events for the sequence and feature blocks and get all the text from
> those areas to generate your fingerprint.
>
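A rough illustration of the fingerprint idea, as a minimal sketch only.
It is plain Python and does not use the biopython event parser. It
assumes SWISS-PROT style flat-file records, where feature lines start
with "FT", the sequence header with "SQ", and sequence data lines with
leading spaces. The names record_fingerprint and FINGERPRINT_CODES are
made up for this example.

```python
import hashlib

# Two-character line prefixes that feed the fingerprint (assumed layout):
#   "FT" = feature table, "SQ" = sequence header,
#   "  " = sequence data lines, which begin with spaces.
FINGERPRINT_CODES = ("FT", "SQ", "  ")


def record_fingerprint(record_text):
    """Hash only the feature and sequence lines of one flat-file record."""
    digest = hashlib.sha256()
    for line in record_text.splitlines():
        if line[:2] in FINGERPRINT_CODES:
            digest.update(line.encode("ascii", "replace"))
            digest.update(b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    old = (
        "ID   TEST_HUMAN\n"
        "DT   01-JAN-2000\n"
        "FT   DOMAIN 1 10\n"
        "SQ   SEQUENCE 10 AA;\n"
        "     MKTAYIAKQR\n"
        "//\n"
    )
    # Only the date line differs, so the fingerprints should match.
    new = old.replace("01-JAN-2000", "16-JUL-2001")
    print(record_fingerprint(old) == record_fingerprint(new))
```

A record whose fingerprint matches the stored one can be skipped; only
records with a new fingerprint would go on to the full parse.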
> The usefulness of this approach depends on how many trivial
> changes occur in the database record. Suppose there are none.
> From my timings with SWISS-PROT, reading only a few records (needed
> for FASTA) is about 60% faster than reading all the records needed
> for the full object model. You'll need to do two passes over
> the record, so
> time to check = 0.4 T
> time to check then parse = 1.4 T
>
> x * 0.4 + (1-x) * (1 + 0.4) == 1 when x == 40%, where x is the
> fraction of unchanged records, so you only get a win if more than
> 40% of the records are unchanged (fewer than 60% have changed).
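The break-even arithmetic can be checked with a few lines of Python.
This is a sketch under the figures quoted above (a check pass costs
0.4 T, a full parse costs 1 T), with x taken as the fraction of
records that are unchanged, since those are the ones that only pay the
0.4 T check. The function names are made up for this example.

```python
def expected_cost(unchanged_fraction, check_cost=0.4, full_parse_cost=1.0):
    """Average cost per record, in units of T, for check-then-parse.

    Unchanged records pay only the check; changed records pay the
    check plus a full parse.
    """
    if not 0.0 <= unchanged_fraction <= 1.0:
        raise ValueError("unchanged_fraction must be between 0 and 1")
    changed_fraction = 1.0 - unchanged_fraction
    return (unchanged_fraction * check_cost
            + changed_fraction * (check_cost + full_parse_cost))


def break_even_unchanged_fraction(check_cost=0.4, full_parse_cost=1.0):
    """Unchanged fraction at which check-then-parse costs one full parse.

    Solving x*c + (1 - x)*(c + F) == F for x gives x == c / F.
    """
    return check_cost / full_parse_cost


if __name__ == "__main__":
    print(break_even_unchanged_fraction())
    for x in (0.1, 0.4, 0.9):
        print(x, expected_cost(x))
```

With these numbers the two-pass approach only costs less than always
doing the full parse once more than 40% of the records are unchanged.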
But the win is not in the parsing but in what you do with the parsed
object. Anyway, I think this will end up as post-1.0 fodder, not
pre-1.0. To be hashed out at BOSC.
>
> Andrew
> dalke@dalkescientific.com
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------