[BioSQL-l] Are there any tools that can convert a bacterial accession number (whole genome) to ffn format (gene multi-FASTA)?

Peter biopython at maubp.freeserve.co.uk
Wed Jan 5 09:39:08 UTC 2011


On Wed, Jan 5, 2011 at 8:58 AM, 徐朋 <xupeng86 at gmail.com> wrote:
>
> Are there any tools that can convert a bacterial accession number (whole genome) to
> ffn format (gene multi-FASTA),

You can download *.ffn files from the NCBI's FTP site, e.g.
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

If you want most/all of the available genomes as ffn files, I would
just download them all as a gzipped file:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.ffn.tar.gz

Alternatively, you can probably do this via the NCBI Entrez API.
I've not tried this though. My guess is you'd need to map the genome
accession to a list of gene IDs (using ELink), then fetch them
as FASTA entries (using EFetch).
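
As a rough, untested-against-your-data sketch of the Entrez route: instead of
the ELink step guessed at above, this swaps in a single EFetch call using the
`fasta_cds_na` return type, which should return the coding sequences annotated
on a nucleotide record as nucleotide FASTA. The accession NC_000913, the email
address, the tool name and the output filename are placeholders chosen for
illustration, and the result may not match NCBI's own *.ffn file line for line.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_cds_url(accession, email):
    """Build an EFetch URL asking for the CDS features of a record as FASTA."""
    params = {
        "db": "nuccore",            # nucleotide database holding the genome record
        "id": accession,            # genome accession, e.g. NC_000913 (placeholder)
        "rettype": "fasta_cds_na",  # nucleotide FASTA of the annotated CDS features
        "retmode": "text",
        "email": email,             # NCBI asks callers to identify themselves
        "tool": "ffn_sketch",       # hypothetical tool name for this example
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def download_ffn(accession, email, out_path):
    """Fetch the CDS FASTA for one accession and save it to out_path."""
    url = efetch_cds_url(accession, email)
    with urlopen(url) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Calling `download_ffn("NC_000913", "you@example.org", "NC_000913.ffn")` would
then write one FASTA entry per annotated CDS. For many genomes the FTP
download is likely to be kinder to the NCBI servers.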

> or that can convert sequences in BioSQL to GenBank files?
>
> Many thanks!

If you have loaded the genomes into a BioSQL database (e.g.
from the GenBank files), then you can easily get the genomes
back again as SeqRecord objects, and save those as GenBank
files. However, in order to get the nucleotide sequences of the
genes you would have to use the SeqFeature objects and their
extract method.
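
The BioSQL route described above might look roughly like this with Biopython.
It is only a sketch: the function names, the choice of CDS features, the use
of the locus_tag qualifier for naming, and all connection details are
assumptions for illustration, so adjust them to your own database.

```python
def genes_to_fasta(record, handle, feature_type="CDS"):
    """Write the nucleotide sequence of each matching feature as FASTA.

    Returns the number of entries written.
    """
    count = 0
    for feature in record.features:
        if feature.type != feature_type:
            continue
        # SeqFeature.extract() takes the parent sequence and applies the
        # feature's location, including strand and joined sub-locations.
        gene_seq = feature.extract(record.seq)
        # Qualifier values are lists; fall back to a positional name if
        # the feature has no locus_tag.
        name = feature.qualifiers.get("locus_tag", ["feature_%d" % count])[0]
        handle.write(">%s %s\n%s\n" % (name, record.id, str(gene_seq)))
        count += 1
    return count


def export_from_biosql(accession, namespace, genbank_path, ffn_path, **connection):
    """Load one genome from BioSQL, save it as GenBank and as gene FASTA."""
    # Imported here so genes_to_fasta() can be used without a database driver.
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase

    # The connection keywords (driver, user, passwd, host, db) are whatever
    # your own BioSQL installation needs.
    server = BioSeqDatabase.open_database(**connection)
    record = server[namespace].lookup(accession=accession)

    # The whole genome goes back out as a GenBank file...
    SeqIO.write(record, genbank_path, "genbank")
    # ...and the individual gene sequences as a multi-FASTA file.
    with open(ffn_path, "w") as handle:
        return genes_to_fasta(record, handle)
```

For example, `export_from_biosql("NC_000913", "bacteria", "NC_000913.gbk",
"NC_000913.ffn", driver="MySQLdb", user="me", passwd="secret",
host="localhost", db="bioseqdb")`, where the namespace name and the
credentials are made up.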

Peter



