[Biopython] matching sequences from fasta files
    Ivan Rossi 
    ivan at biodec.com
       
    Wed Mar 10 11:15:38 UTC 2010
    
    
  
On Wed, 10 Mar 2010, Peter wrote:
> For the special case of looking for perfect matches, you would be fine
> with just Python - depending on your data files, you may be able to
> match on the record identifiers
Don't trust that. We have seen many, many times that a sequence changes
over time (across different releases of the databases) while keeping the
same ID. It is much more robust to compare SHA1 (or MD5) hashes of the
sequences, or to do string comparisons.
> or simply do string comparisons of the sequences.
This is OK.
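
To make the hash comparison concrete, here is a minimal sketch using only the
Python standard library. The helper names (read_fasta, seq_digest,
match_by_sequence) and the toy records are made up for illustration, and
upper-casing the sequence before hashing is an assumption that may not suit
every data set. In practice you could get the records from Biopython's
Bio.SeqIO.parse(handle, "fasta") instead of the small reader below.

```python
import hashlib


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines.

    Hypothetical minimal reader, just enough for this example.
    """
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first word of the header line.
            parts = line[1:].split()
            identifier = parts[0] if parts else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def seq_digest(sequence):
    """Return the SHA1 hex digest of a sequence.

    Assumption: case is not significant, so the sequence is upper-cased.
    """
    return hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()


def match_by_sequence(lines_a, lines_b):
    """Return (id_in_a, id_in_b) pairs whose sequences hash identically."""
    by_digest = {}
    for ident, seq in read_fasta(lines_a):
        by_digest.setdefault(seq_digest(seq), []).append(ident)
    matches = []
    for ident, seq in read_fasta(lines_b):
        for other in by_digest.get(seq_digest(seq), []):
            matches.append((other, ident))
    return matches


# Toy data: "P1" keeps its ID but its last residue changes between releases,
# while the sequence of "P2" reappears under a different ID.
old_release = [">P1", "MKTAYIAK", "QRQISF", ">P2", "MSLLTEVE"]
new_release = [">P1", "MKTAYIAKQRQISW", ">X9", "msllteve"]

print(match_by_sequence(old_release, new_release))  # [('P2', 'X9')]
```

Matching on identifiers alone would pair the two "P1" records even though
their sequences differ, and would miss the P2/X9 match entirely.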
--
Ivan Rossi, PhD - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, I-40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com
    
    