[Biopython] fetching chromosome sizes without gff file?

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 22 21:34:17 UTC 2017


Hi Tommy,

I'm glad you've found a solution, and thank you for sharing it here.

Peter


On Wed, Mar 22, 2017 at 7:15 PM, Tommy Carstensen
<tommy.carstensen at gmail.com> wrote:
> Hi Peter,
>
> I ended up doing it like this:
>
> import urllib.request
>
> url = 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.14_GRCh37.p13/GCA_000001405.14_GRCh37.p13_assembly_report.txt'
>
> d_lengths = {}
> with urllib.request.urlopen(url) as response:
>     for line in response:
>         ## Decode with utf-8 from bytes to string.
>         line = line.decode()
>         ## Skip header/comment lines.
>         if line.startswith('#'):
>             continue
>         ## Strip newline characters and split line string into a list.
>         fields = line.rstrip('\r\n').split('\t')
>         ## Skip if Sequence-Role is not assembled-molecule.
>         if fields[1] != 'assembled-molecule':
>             continue
>         chrom = fields[0]
>         ## Sequence-Length is column index 8 in the report header;
>         ## index 9 is UCSC-style-name. Store the length as an integer.
>         length = int(fields[8])
>         d_lengths[chrom] = length
>
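A variation on the snippet above, offered only as a sketch: rather than hard-coding column positions, the columns can be looked up by name. This assumes the last comment line before the data rows is the tab-separated column header and that it contains columns named Sequence-Name, Sequence-Role and Sequence-Length. The function name chromosome_lengths and the sample rows are made up for illustration; the rows only mimic the report's layout and stand in for the real file.

```python
def chromosome_lengths(lines):
    """Return {Sequence-Name: Sequence-Length} for assembled molecules.

    `lines` is any iterable of decoded text lines, e.g. a file object.
    """
    header = None    # most recent comment line seen so far
    columns = None   # column names, resolved at the first data row
    lengths = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line
            continue
        if columns is None:
            # Assumption: the comment line right before the data is the header.
            if header is None:
                raise ValueError("no header line found before the data rows")
            columns = header.lstrip("#").strip().split("\t")
            name_col = columns.index("Sequence-Name")
            role_col = columns.index("Sequence-Role")
            length_col = columns.index("Sequence-Length")
        fields = line.split("\t")
        if fields[role_col] == "assembled-molecule":
            lengths[fields[name_col]] = int(fields[length_col])
    return lengths


# Illustrative rows in the style of an assembly report (not the real file).
sample = [
    "# Assembly name:  GRCh37.p13\n",
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\tSequence-Length\n",
    "1\tassembled-molecule\t1\t249250621\n",
    "X\tassembled-molecule\tX\t155270560\n",
    "SCAFFOLD_A\tunlocalized-scaffold\t1\t106433\n",
]

print(chromosome_lengths(sample))
```

With a downloaded report, the same function could be fed the decoded lines of the response, for example chromosome_lengths(map(bytes.decode, response)). If one of the expected column names is missing, columns.index raises ValueError instead of silently reading the wrong field.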
>
> On Wed, 22 Mar 2017 at 16:59 Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Hmm.
>>
>> Using the NCBI Entrez API, you could certainly download these as
>> FASTA or GenBank files, either of which would give you the length.
>> But I don't think that offers GFF files.
>>
>> I don't work on model organisms, but I'd suggest ENSEMBL might
>> be a good bet, although we don't yet have a Biopython module for that.
>>
>> http://www.ensembl.org/
>> https://github.com/biopython/biopython/issues/512
>>
>> It might be worth looking at bioservices for this?
>>
>> https://github.com/cokelaer/bioservices
>>
>> Peter
>>
>> On Wed, Mar 22, 2017 at 4:24 PM, Tommy Carstensen
>> <tommy.carstensen at gmail.com> wrote:
>> > Is it possible to get the chromosome lengths in maize (Zea mays),
>> > zebrafish and humans with Biopython without having the relevant gff
>> > files? How would I go about doing that? Basically I just want to be
>> > able to fetch the gff by typing in species='homo sapiens' and build=37
>> > or something like that without having to worry about URLs.
>> >
>> > Could Biopython also give me the position of the pseudoautosomal
>> > regions on the X chromosome in Homo sapiens?
>> >
>> > Thanks,
>> > Tommy
>> >
>> > _______________________________________________
>> > Biopython mailing list  -  Biopython at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython

