[Bioperl-l] parsing protein accession numbers and types from >fasta headers

Wed Sep 13 10:50:08 UTC 2006

I'd like to write a script that parses the headers of FASTA-formatted protein
databases and extracts the protein accession numbers and identifiers they
contain (UniProt, IPI, gi, RefSeq, Ensembl...). The idea is to build a simple
local database that links each protein accession number to all of its valid
identifiers and to the FASTA files on my system where they were found, or to
check, for instance, whether a UniProt accession exists for a given gi.
However, the structure of the FASTA header varies considerably depending on
the source. Any suggestions?
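A minimal sketch of one approach, written in Python (the same regular expressions carry over directly to Perl): take only the first word of each header, split it on `|`, and recognise three assumed layouts: pipe-style `tag|identifier` pairs (NCBI/UniProt, e.g. `gi|4504347|ref|NP_000549.1|` or `sp|P69905|HBA_HUMAN`), colon-style `TAG:identifier` fields (assumed from IPI headers), and untagged identifiers matched by pattern (e.g. a bare Ensembl `ENSP...`). The tag table, the patterns, and the function names `parse_header` and `add_fasta` are hypothetical illustrations, not a Bioperl API; check them against the headers in your own files.

```python
import re

# Hypothetical mapping from header tags to a normalised source label.
# Pipe-style tags (sp, tr, gi, ref) follow NCBI/UniProt conventions;
# colon-style tags (IPI, SWISS-PROT, TREMBL, ENSEMBL, REFSEQ) are assumed.
TAGS = {
    "sp": "uniprot", "tr": "uniprot", "swiss-prot": "uniprot", "trembl": "uniprot",
    "gi": "gi", "ref": "refseq", "refseq": "refseq",
    "ipi": "ipi", "ensembl": "ensembl",
}

# Assumed formats for identifiers that appear without any tag.
BARE_PATTERNS = [
    ("ipi", re.compile(r"IPI\d{8}(?:\.\d+)?$")),
    ("ensembl", re.compile(r"ENS[A-Z]*P\d{11}(?:\.\d+)?$")),
    ("refseq", re.compile(r"(?:NP|XP|YP|WP|AP)_\d+(?:\.\d+)?$")),
]


def parse_header(header):
    """Return (source, identifier) pairs from the ID word of one FASTA header."""
    tokens = header.lstrip(">").split(None, 1)
    if not tokens:
        return []
    fields = tokens[0].split("|")  # only the first word carries the IDs
    found = []
    i = 0
    while i < len(fields):
        field = fields[i]
        if field.lower() in TAGS and i + 1 < len(fields):
            # Pipe style: "tag|identifier", e.g. gi|4504347 or sp|P69905
            found.append((TAGS[field.lower()], fields[i + 1]))
            i += 2
            continue
        if ":" in field:
            # Colon style: "TAG:identifier", e.g. IPI:IPI00000001.2
            tag, _, ident = field.partition(":")
            found.append((TAGS.get(tag.lower(), tag.lower()), ident))
        else:
            # Untagged field: try each bare pattern in order
            for source, pattern in BARE_PATTERNS:
                if pattern.match(field):
                    found.append((source, field))
                    break
        i += 1
    return found


def add_fasta(index, name, lines):
    """Record every identifier in one FASTA file's headers, with the file name."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        ids = parse_header(line)
        for key in ids:
            entry = index.setdefault(key, {"ids": set(), "files": set()})
            entry["ids"].update(ids)  # all IDs seen in the same header
            entry["files"].add(name)
    return index
```

Filling the index from disk could look like `with open(path) as fh: add_fasta(index, path, fh)` for each file; `index[("gi", "4504347")]["ids"]` then lists every identifier seen alongside that gi. This only links identifiers that occur together in the same header, so finding a UniProt accession for a gi still depends on some file (or a separate mapping table) listing both. Persisting the dictionary in a real database, for example with Python's `sqlite3` module, would be the next step.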
