[Bioperl-l] Downloading multiple contigs using bioperl
Chris Fields
cjfields at uiuc.edu
Mon Sep 18 16:13:37 UTC 2006
> Hello,
> I think this might be a simple question - but I'm yet a novice...
>
> Is there any way I can download, automatically and at once, all contigs of a
> given genome in Genebank, and ideally merge them all into one file? Or do I
> have to download every contig separately in order to receive the full
> genome?
>
> In the latter case, is there some sort of list that provides the identifiers
> of all contigs of the genome I'm interested in?
>
> Thank you very much,
> Schragi
It depends on the type of sequence record. WGS records contain a WGS line
that gives the accession range of the sequence records that can be retrieved:
LOCUS       AAFC03000000          131728 rc    DNA     linear   MAM 28-AUG-2006
DEFINITION  Bos taurus whole genome shotgun sequencing project.
ACCESSION   AAFC00000000
VERSION     AAFC00000000.3  GI:112180191
KEYWORDS    WGS.
....
FEATURES             Location/Qualifiers
     source          1..131728
                     /organism="Bos taurus"
                     /mol_type="genomic DNA"
                     /isolate="L1 Dominette 01449"
                     /db_xref="taxon:9913"
                     /sex="female"
                     /note="breed: Hereford"
WGS         AAFC03000001-AAFC03131728
WGS_SCAFLD  CM000177-CM000206
WGS_SCAFLD  CH974204-CH980624
//
The WGS line gives the accession range of the individual sequences, and the
WGS_SCAFLD lines give the accession ranges of the different scaffold or
supercontig builds. The contig files contain the list of subsequences for the
build (which can be pretty complex), but these aren't necessary if you only
want the sequence itself. The sequence can be retrieved directly from GenBank
using Bio::DB::GenBank with the default settings; if you use the web Entrez
interface, you can get the full sequences by selecting the format
'GenBank(full)'.
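
To get an explicit list of contig identifiers from a range such as the WGS
line above, the range can be expanded locally. The following is a minimal
sketch in plain Python (no BioPerl calls). It assumes that both ends of the
range share the same letter prefix and the same number of zero-padded digits;
the function name expand_accession_range is made up for this illustration.

```python
import re

# Assumption: an accession is a run of letters followed by zero-padded digits.
ACCESSION = re.compile(r"([A-Za-z]+)(\d+)")


def expand_accession_range(range_text):
    """Yield every accession in a range like 'AAFC03000001-AAFC03131728'."""
    start, end = (part.strip() for part in range_text.split("-"))
    m_start = ACCESSION.fullmatch(start)
    m_end = ACCESSION.fullmatch(end)
    if not m_start or not m_end:
        raise ValueError(f"unrecognised accession range: {range_text!r}")

    prefix, first = m_start.groups()
    end_prefix, last = m_end.groups()
    if prefix != end_prefix or len(first) != len(last):
        raise ValueError(f"range ends do not match: {range_text!r}")

    width = len(first)  # keep the zero padding of the original identifiers
    for number in range(int(first), int(last) + 1):
        yield f"{prefix}{number:0{width}d}"


if __name__ == "__main__":
    # Print the first three identifiers of the WGS range from the record above.
    for index, accession in enumerate(
        expand_accession_range("AAFC03000001-AAFC03131728")
    ):
        if index == 3:
            break
        print(accession)
```

The same function should also handle the WGS_SCAFLD ranges, since they follow
the same prefix-plus-digits pattern. The resulting identifiers can then be
passed, in batches, to whichever retrieval method you choose.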
Depending on what you are after, you may be better off downloading the
sequences via FTP, though. Some of these files are very large (~100 MB or
more). Retrieval via Bio::DB::GenBank converts everything into BioPerl
objects before saving, so fetching files of this size that way may take a
long time, if it works at all.
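
If the files are downloaded separately, merging them into one file does not
require parsing them. Below is a minimal sketch in plain Python that
concatenates already-downloaded files in a streaming fashion, so even large
files are never held in memory at once. The function name merge_files and the
file names in the usage comment are hypothetical, and the sketch assumes the
inputs are text-based records (e.g. FASTA or GenBank flat files) that can
simply be appended one after another.

```python
import os
import shutil


def merge_files(input_paths, output_path):
    """Append each input file to output_path without loading it into memory."""
    with open(output_path, "wb") as merged:
        for path in input_paths:
            with open(path, "rb") as source:
                # Copy in fixed-size chunks rather than reading the whole file.
                shutil.copyfileobj(source, merged)

                # If the file did not end with a newline, add one so the next
                # record starts on its own line.
                if os.path.getsize(path) > 0:
                    source.seek(-1, os.SEEK_END)
                    if source.read(1) != b"\n":
                        merged.write(b"\n")


# Example usage (hypothetical file names):
# merge_files(["contig_part1.fasta", "contig_part2.fasta"], "genome.fasta")
```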
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign