[Bioperl-l] World's smallest format converter
Mike Muratet
muratem at eng.uah.edu
Fri Sep 15 15:37:34 UTC 2006
Greetings
This subject comes up periodically, but I have a question about the design
approach. I have a simple script that reads a GenBank record, checks the
species data, and writes the record out as FASTA; I use the output to build
species-specific BLAST databases from GenBank files. I noticed that the
FASTA header sometimes contains the locus rather than the accession. I looked
at the source for the SeqIO methods: the default ID when writing FASTA is the
display id, which defaults to the locus when reading GenBank. Often the
locus equals the accession, but sometimes it does not. There is a
comment in genbank.pm, "there can be multiple accessions". Does anybody
have any experience with this, and what happens if there are? Was locus
picked for the display id because it is more likely to be unique? I see
that one can select which flavor of id gets printed in the FASTA header,
but I'm curious about what to expect if I select 'accession'.
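
To make the question concrete, here is a minimal sketch in plain Python (not
BioPerl, and not a reproduction of what genbank.pm does) of where the two
candidate ids come from in a GenBank flat file. The record is made up for
illustration, and the names genbank_ids and fasta_header are hypothetical.
It assumes the usual GenBank convention that the first entry on the
ACCESSION line is the primary accession and any further entries are
secondary ones.

```python
# Hypothetical, trimmed GenBank header made up for illustration;
# real records carry many more fields.
RECORD = """\
LOCUS       HUMBB                   1200 bp    DNA     linear   PRI 01-JAN-2000
DEFINITION  Example record for illustration only.
ACCESSION   U00001 J00002
            M00003
VERSION     U00001.1
"""


def genbank_ids(text):
    """Return (locus, primary_accession, secondary_accessions)."""
    locus = None
    accessions = []
    in_accession = False
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation line: it belongs to the previous keyword.
            if in_accession:
                accessions.extend(fields)
            continue
        keyword, values = fields[0], fields[1:]
        in_accession = keyword == "ACCESSION"
        if keyword == "LOCUS" and values:
            locus = values[0]
        elif in_accession:
            accessions.extend(values)
    # Assumption: the first accession listed is the primary one.
    primary = accessions[0] if accessions else None
    return locus, primary, accessions[1:]


def fasta_header(text, id_type="display"):
    """Build a FASTA header from the locus (default) or the primary accession."""
    locus, primary, _ = genbank_ids(text)
    chosen = primary if id_type == "accession" else locus
    return ">" + chosen


if __name__ == "__main__":
    print(genbank_ids(RECORD))
    print(fasta_header(RECORD))
    print(fasta_header(RECORD, "accession"))
```

In this made-up record the locus and the primary accession differ, so the
header changes depending on which id is selected, and the secondary
accessions appear in neither header.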
Thanks
Mike