[Bioperl-l] World's smallest format converter

Mike Muratet muratem at eng.uah.edu
Fri Sep 15 15:37:34 UTC 2006


This subject comes up periodically, but I have a question about the design 
approach. I have a simple script that reads a GenBank record, checks the 
species data, and writes the record out as FASTA; I use it to build 
species-specific BLAST databases from GenBank files. I noticed that 
sometimes the FASTA header contains the locus rather than the accession. 
Looking at the source for the SeqIO methods, the default for writing FASTA 
is the display id, which defaults to the locus when reading GenBank. Often 
the locus equals the accession, but sometimes it does not. There is a 
comment in genbank.pm saying "there can be multiple accessions". Does 
anybody have experience with this, and what happens when there are several? 
Was the locus picked for the display id because it is more likely to be 
unique? I see that one can select which flavor of id gets printed in the 
FASTA header, but I'm curious about what to expect if I select 'accession'.
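To make the locus-versus-accession distinction concrete, here is a 
plain-Python sketch (not BioPerl) that pulls the LOCUS name and the 
ACCESSION list out of a made-up GenBank header. It assumes the usual 
GenBank convention that the first token on the ACCESSION line is the 
primary accession and any further tokens are secondary accessions. The 
sample record, `parse_ids`, and `fasta_header` are hypothetical names for 
illustration only.

```python
# Hypothetical illustration, NOT BioPerl: a minimal parser for the LOCUS and
# ACCESSION lines of a GenBank record. Assumes the first ACCESSION token is
# the primary accession and any further tokens are secondary accessions.

# Made-up record in which the locus name differs from the accession.
SAMPLE_RECORD = """\
LOCUS       HSU12345                1200 bp    mRNA    linear   PRI 01-JAN-2006
DEFINITION  Hypothetical example record.
ACCESSION   U12345 U00001 U00002
VERSION     U12345.1
ORIGIN
"""


def parse_ids(record):
    """Return the locus name plus primary and secondary accessions."""
    locus = None
    accessions = []
    in_accession = False
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            locus = line.split()[1]
            in_accession = False
        elif line.startswith("ACCESSION"):
            accessions.extend(line.split()[1:])
            in_accession = True
        elif in_accession and line.startswith(" "):
            # Indented continuation line of a long ACCESSION list.
            accessions.extend(line.split())
        else:
            in_accession = False
    return {
        "locus": locus,
        "primary_accession": accessions[0] if accessions else None,
        "secondary_accessions": accessions[1:],
    }


def fasta_header(ids, id_type="display"):
    """Build a FASTA header; 'display' uses the locus, as described above."""
    if id_type == "display":
        return ">" + ids["locus"]
    if id_type == "accession":
        return ">" + ids["primary_accession"]
    raise ValueError("unknown id_type: " + id_type)


if __name__ == "__main__":
    ids = parse_ids(SAMPLE_RECORD)
    print(fasta_header(ids, "display"))    # >HSU12345
    print(fasta_header(ids, "accession"))  # >U12345
    print(ids["secondary_accessions"])     # ['U00001', 'U00002']
```

Under these assumptions the accession-based header carries only the 
primary accession, while the secondary accessions are kept separately 
rather than appearing in the header.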
