[Biopython] renaming IDs in gff3 with BCBio.GFF
Mic
mictadlo at gmail.com
Thu Jan 23 05:11:24 UTC 2020
Hi,
I wrote a script that should change the IDs in a GFF3 file. Unfortunately,
the script below has two problems.
1. It appends the new ID to `Parent`, so `Parent` ends up containing both
the old and the new ID. How can I keep only the new one?
2. How can I access the chromosome name?
This is how I run the script: `python renameIDgff3.py --gff3 braker_utr.gff3
--prefix ACTG --output braker_utr-newID.gff3`
    #!/usr/bin/python3
    import click
    from BCBio.GFF import GFFExaminer
    from BCBio import GFF

    @click.command()
    @click.option('--gff3', help="Provide GFF3 file", required=True)
    @click.option('--prefix', help="e.g. ASSCTG", required=True)
    @click.option('--output', help="Keep GFF3 file", required=True)
    def run(gff3, prefix, output):
        print("Hello")
        with open(output, "w") as out_handle:
            for rec in GFF.parse(gff3):
                for count, feature in enumerate(rec.features):
                    print("count", count)
                    print(feature)
                    print(feature.qualifiers.get("Name"))
                    print(feature.sub_features)
                    print(feature.sub_features[0].qualifiers.get("Name"))
                    print("!!!!change")
                    feature.qualifiers["ID"] = prefix + str(count).zfill(6)
                    print(feature.sub_features[0].qualifiers["ID"])
                    id_extension = feature.sub_features[0].qualifiers["ID"][0].split('.')[1]
                    feature.sub_features[0].qualifiers["ID"] = prefix + str(count).zfill(6) + '.' + id_extension
                    print(feature.sub_features[0].qualifiers["Parent"])
                    print("-----------")
                GFF.write([rec], out_handle)

    if __name__ == '__main__':
        run()
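One possible direction for question 1, as a rough sketch rather than a confirmed fix: the output further down suggests that `GFF.write` adds the enclosing feature's ID to each child's `Parent` when it writes nested features. If that is what happens, removing the stale `Parent` qualifier from every sub-feature before writing should leave only the new ID. The helper `rename_tree` below is hypothetical and only assumes objects with a `qualifiers` dict and a `sub_features` list; it is demonstrated on stand-in objects, not on real BCBio.GFF features. For question 2, the records yielded by `GFF.parse` are Biopython `SeqRecord` objects, so the chromosome name should be available as `rec.id`.

```python
from types import SimpleNamespace


def rename_tree(feature, old_root, new_root):
    """Rename IDs in a feature tree and drop stale Parent qualifiers.

    Hypothetical helper: IDs equal to old_root, or starting with
    old_root + ".", get new_root substituted for that leading part.
    Parent is removed on the ASSUMPTION that the writer rebuilds it
    from the nesting of sub_features.
    """
    ids = feature.qualifiers.get("ID")
    if ids:
        feature.qualifiers["ID"] = [
            new_root + i[len(old_root):]
            if i == old_root or i.startswith(old_root + ".")
            else i
            for i in ids
        ]
    feature.qualifiers.pop("Parent", None)
    for sub in getattr(feature, "sub_features", []):
        rename_tree(sub, old_root, new_root)


# Stand-in objects that mimic gene -> mRNA -> exon nesting.
exon = SimpleNamespace(
    qualifiers={"ID": ["g1.t1.exon1"], "Parent": ["g1.t1"]}, sub_features=[]
)
mrna = SimpleNamespace(
    qualifiers={"ID": ["g1.t1"], "Parent": ["g1"]}, sub_features=[exon]
)
gene = SimpleNamespace(qualifiers={"ID": ["g1"]}, sub_features=[mrna])

rename_tree(gene, "g1", "ACTG000000")
print(mrna.qualifiers)
```

Note that qualifier values are kept as lists here; the script above assigns a plain string to `qualifiers["ID"]`, which may or may not be handled the same way by the writer. Features with more than one parent would lose that information with this approach.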
Input file:
NbV1Ch08 AUGUSTUS gene 7015 29794 0.01 - . ID=g1;
NbV1Ch08 AUGUSTUS mRNA 7015 29794 0.01 - . ID=g1.t1;Parent=g1
NbV1Ch08 AUGUSTUS transcription_end_site 7015 7015 . - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS three_prime_utr 7015 8531 0.2 - . ID=g1.t1.3UTR1;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 7015 8747 . - . ID=g1.t1.exon1;Parent=g1.t1;
NbV1Ch08 AUGUSTUS stop_codon 8532 8534 . - 0 Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 8532 8747 0.31 - 0 ID=g1.t1.CDS1;Parent=g1.t1
NbV1Ch08 AUGUSTUS intron 8748 9191 0.49 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 9192 9342 0.66 - 1 ID=g1.t1.CDS2;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 9192 9342 . - . ID=g1.t1.exon2;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 9343 9915 0.58 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 9916 10006 0.71 - 2 ID=g1.t1.CDS3;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 9916 10006 . - . ID=g1.t1.exon3;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 10007 10101 0.74 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 10102 10201 0.78 - 0 ID=g1.t1.CDS4;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 10102 10201 . - . ID=g1.t1.exon4;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 10202 10712 0.8 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 10713 11107 0.11 - 2 ID=g1.t1.CDS5;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 10713 11107 . - . ID=g1.t1.exon5;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 11108 11569 0.07 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 11570 12151 0.09 - 2 ID=g1.t1.CDS6;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 11570 12151 . - . ID=g1.t1.exon6;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 12152 12588 0.34 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 12589 12717 0.39 - 2 ID=g1.t1.CDS7;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 12589 12717 . - . ID=g1.t1.exon7;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 12718 12789 0.42 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 12790 13075 0.39 - 0 ID=g1.t1.CDS8;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 12790 13075 . - . ID=g1.t1.exon8;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 13076 14832 0.51 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 14833 15009 0.39 - 0 ID=g1.t1.CDS9;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 14833 15009 . - . ID=g1.t1.exon9;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15010 15278 0.59 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15279 15415 0.56 - 2 ID=g1.t1.CDS10;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15279 15415 . - . ID=g1.t1.exon10;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15416 15487 0.58 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15488 15612 0.96 - 1 ID=g1.t1.CDS11;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15488 15612 . - . ID=g1.t1.exon11;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15613 15706 0.96 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15707 15957 0.98 - 0 ID=g1.t1.CDS12;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15707 15958 . - . ID=g1.t1.exon12;Parent=g1.t1;
NbV1Ch08 AUGUSTUS start_codon 15955 15957 . - 0 Parent=g1.t1;
NbV1Ch08 AUGUSTUS five_prime_utr 15958 15958 0.99 - . ID=g1.t1.5UTR1;Parent=g1.t1
NbV1Ch08 AUGUSTUS five_prime_utr 27458 28250 0.37 - . ID=g1.t1.5UTR2;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 27458 28250 . - . ID=g1.t1.exon13;Parent=g1.t1;
NbV1Ch08 AUGUSTUS five_prime_utr 29272 29794 0.08 - . ID=g1.t1.5UTR3;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 29272 29794 . - . ID=g1.t1.exon14;Parent=g1.t1;
NbV1Ch08 AUGUSTUS transcription_start_site 29794 29794 . - . Parent=g1.t1;
Output file:
##gff-version 3
##sequence-region NbV1Ch08 1 129222376
NbV1Ch08 AUGUSTUS gene 7015 29794 0.01 - . ID=ACTG000000
NbV1Ch08 AUGUSTUS mRNA 7015 29794 0.01 - . ID=ACTG000000.t1;Parent=g1,ACTG000000
NbV1Ch08 AUGUSTUS transcription_end_site 7015 7015 . - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS three_prime_utr 7015 8531 0.2 - . ID=g1.t1.3UTR1;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 7015 8747 . - . ID=g1.t1.exon1;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS stop_codon 8532 8534 . - 0 Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 8532 8747 0.31 - 0 ID=g1.t1.CDS1;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 8748 9191 0.49 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 9192 9342 0.66 - 1 ID=g1.t1.CDS2;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 9192 9342 . - . ID=g1.t1.exon2;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 9343 9915 0.58 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 9916 10006 0.71 - 2 ID=g1.t1.CDS3;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 9916 10006 . - . ID=g1.t1.exon3;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 10007 10101 0.74 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 10102 10201 0.78 - 0 ID=g1.t1.CDS4;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 10102 10201 . - . ID=g1.t1.exon4;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 10202 10712 0.8 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 10713 11107 0.11 - 2 ID=g1.t1.CDS5;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 10713 11107 . - . ID=g1.t1.exon5;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 11108 11569 0.07 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 11570 12151 0.09 - 2 ID=g1.t1.CDS6;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 11570 12151 . - . ID=g1.t1.exon6;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 12152 12588 0.34 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 12589 12717 0.39 - 2 ID=g1.t1.CDS7;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 12589 12717 . - . ID=g1.t1.exon7;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 12718 12789 0.42 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 12790 13075 0.39 - 0 ID=g1.t1.CDS8;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 12790 13075 . - . ID=g1.t1.exon8;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 13076 14832 0.51 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 14833 15009 0.39 - 0 ID=g1.t1.CDS9;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 14833 15009 . - . ID=g1.t1.exon9;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 15010 15278 0.59 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 15279 15415 0.56 - 2 ID=g1.t1.CDS10;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 15279 15415 . - . ID=g1.t1.exon10;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 15416 15487 0.58 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 15488 15612 0.96 - 1 ID=g1.t1.CDS11;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 15488 15612 . - . ID=g1.t1.exon11;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS intron 15613 15706 0.96 - . Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS CDS 15707 15957 0.98 - 0 ID=g1.t1.CDS12;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 15707 15958 . - . ID=g1.t1.exon12;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS start_codon 15955 15957 . - 0 Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS five_prime_utr 15958 15958 0.99 - . ID=g1.t1.5UTR1;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS five_prime_utr 27458 28250 0.37 - . ID=g1.t1.5UTR2;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 27458 28250 . - . ID=g1.t1.exon13;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS five_prime_utr 29272 29794 0.08 - . ID=g1.t1.5UTR3;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS exon 29272 29794 . - . ID=g1.t1.exon14;Parent=g1.t1,ACTG000000.t1
NbV1Ch08 AUGUSTUS transcription_start_site 29794 29794 . - . Parent=g1.t1,ACTG000000.t1
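For comparison, the result I am aiming for can be sketched without any GFF library by rewriting column 9 line by line. This is only an illustration under some assumptions: the file is tab-separated as GFF3 requires, every ID follows the `g1.t1.exon1` pattern seen above (gene ID first, then dot-separated suffixes), and the function names `seqid_of` and `rename_line` are made up for this example. It also shows that the chromosome name is simply column 1 of each feature line.

```python
def seqid_of(line):
    """Return column 1 (chromosome / contig name) of a GFF3 feature line."""
    return line.split("\t", 1)[0]


def rename_line(line, id_map):
    """Return a GFF3 line with ID and Parent values remapped.

    id_map maps an old gene ID (e.g. "g1") to its replacement. Child IDs
    such as "g1.t1.exon1" keep everything after the first dot.
    Comment, directive, and blank lines are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")

    def remap(value):
        root, dot, rest = value.partition(".")
        return id_map.get(root, root) + dot + rest

    attrs = []
    for attr in cols[8].split(";"):
        if not attr:  # tolerate a trailing ";" as in the input above
            continue
        key, _, val = attr.partition("=")
        if key in ("ID", "Parent"):
            val = ",".join(remap(v) for v in val.split(","))
        attrs.append(key + "=" + val)
    cols[8] = ";".join(attrs)
    return "\t".join(cols) + "\n"


line = "NbV1Ch08\tAUGUSTUS\tmRNA\t7015\t29794\t0.01\t-\t.\tID=g1.t1;Parent=g1\n"
print(seqid_of(line))
print(rename_line(line, {"g1": "ACTG000000"}), end="")
```

With this kind of mapping, `Parent` holds only the new ID because the old value is replaced rather than extended. It does not validate the file or handle escaped characters in attribute values.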
Thank you in advance