[Biopython] fetching chromosome sizes without gff file?
Tommy Carstensen
tommy.carstensen at gmail.com
Wed Mar 22 19:15:17 UTC 2017
Hi Peter,
I ended up doing it like this:

import urllib.request
import operator

url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.14_GRCh37.p13/GCA_000001405.14_GRCh37.p13_assembly_report.txt'

with urllib.request.urlopen(url) as response:
    d_lengths = {}
    for l in filter(
            ## Skip if Sequence-Role is not assembled-molecule.
            lambda x: x[1] == 'assembled-molecule',
            ## Split line string into a list.
            map(operator.methodcaller('split', '\t'),
                ## Skip header/comment lines and strip newline characters.
                map(str.rstrip, filter(
                    lambda x: x[0] != '#',
                    ## Decode with utf-8 from bytes to string.
                    map(bytes.decode, response))))):
        chrom = l[0]
        length = l[9]
        d_lengths[chrom] = length
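
The same idea can be sketched with the columns looked up by name instead of by position, and with the lengths stored as integers. This is only a rough sketch under some assumptions: the function names parse_assembly_report and fetch_chromosome_lengths are made up for illustration, and it assumes the last '#' line before the data rows is a tab-separated header containing the columns Sequence-Name, Sequence-Role and Sequence-Length.

```python
import urllib.request


def parse_assembly_report(lines):
    """Return {sequence name: length} for assembled molecules.

    `lines` is any iterable of text lines from an assembly report.
    Hypothetical helper; the column names below are assumptions about
    the report's header line.
    """
    header = None
    lengths = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        if line.startswith('#'):
            # Assumption: the last comment line before the data rows
            # holds the tab-separated column names.
            header = line.lstrip('# ').split('\t')
            continue
        if header is None:
            raise ValueError('no header line found before the data rows')
        row = dict(zip(header, line.split('\t')))
        # Keep chromosomes only; skip scaffolds, patches and so on.
        if row['Sequence-Role'] == 'assembled-molecule':
            lengths[row['Sequence-Name']] = int(row['Sequence-Length'])
    return lengths


def fetch_chromosome_lengths(url):
    """Download an assembly report and parse it (needs network access)."""
    with urllib.request.urlopen(url) as response:
        # Decode each bytes line to str while the connection is still open.
        return parse_assembly_report(
            line.decode('utf-8') for line in response)
```

Calling fetch_chromosome_lengths(url) with the URL above should then give a dictionary keyed by sequence name, provided the report has the assumed header. Looking columns up by name means the code fails loudly with a KeyError if the layout differs, rather than silently reading the wrong column.
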
On Wed, 22 Mar 2017 at 16:59 Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hmm.
>
> Using the NCBI Entrez API, you could certainly download these as
> FASTA or GenBank files, either of which would give you the length.
> But I don't think that offers GFF files.
>
> I don't work on model organisms, but I'd suggest ENSEMBL might
> be a good bet - but we don't yet have a Biopython module for that?
>
> http://www.ensembl.org/
> https://github.com/biopython/biopython/issues/512
>
> It might be worth looking at bioservices for this?
>
> https://github.com/cokelaer/bioservices
>
> Peter
>
> On Wed, Mar 22, 2017 at 4:24 PM, Tommy Carstensen
> <tommy.carstensen at gmail.com> wrote:
> > Is it possible to get the chromosome lengths in maize (Zea mays), zebra fish
> > and humans with Biopython without having the relevant gff files? How would I
> > go about doing that? Basically I just want to be able to fetch the gff by
> > typing in species='homo sapiens' and build=37 or something like that without
> > having to worry about URLs.
> >
> > Could Biopython also give me the position of the pseudoautosomal regions on
> > the X chromosome in Homo sapiens?
> >
> > Thanks,
> > Tommy