[Biopython] gff3 file
p.j.a.cock at googlemail.com
Tue Jun 2 10:49:41 UTC 2015
On Tue, Jun 2, 2015 at 11:32 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Jun 2, 2015 at 11:11 AM, <Atteyet-Alla.Yassin at ukb.uni-bonn.de> wrote:
>> I would like to convert a gff file (which I received on converting a
>> sequence in Genbank format using bioperl) into a table, e.g. like the following:
>> Seqname Source feature Start End Score Strand Frame Attributes
>> chr1 hg19_gold exon 67088326 67183780 0,000000 + . gene_id "AL139147.7";
>> transcript_id "AL139147.7"
>> In my gff file you will observe the following :
>> Lines are doubled, i.e. repeated, e.g.
>> CP008802 Genbank gene 417 638 . + . ID=FB03_00010
>> CP008802 Genbank CDS 417 638 . + .
> I assume this is a continuation of your past email, i.e.
> You posted the full GFF file then:
> Note that these "repeated" GFF lines are normal - you have a line
> describing a "gene" at 417..638, and a matching "CDS" at 417..638.
> In the original GenBank file there would also have been two entries
> for the "gene" and "CDS".
> So, given this example gene/CDS, what would you like to have
> in the output file? Maybe something like this?
> Seqname Source feature Start End Score Strand Frame Attributes
> CP008802 Genbank gene 417 638 0,000000 + . gene_id "FB03_00010";
> transcript_id "FB03_00010"
You've not explained this file format, so I am guessing here
(e.g. should start/end be counting from one, should the frame
be just plus or minus, should feature be of type "gene"?).
I would work from the original GenBank file rather than a
conversion to GFF which may introduce additional problems.
There's an example at the end of this email - but note this
does not handle complex locations like FB03_00005 which
appears to span the origin.
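from Bio import SeqIO

# Column order follows the example table above; tab separation and the
# constant "0,000000" score are guesses, since the format is not specified.
header = "Seqname\tSource\tfeature\tStart\tEnd\tScore\tStrand\tFrame\tAttributes\n"
row = '%s\tGenbank\t%s\t%i\t%i\t0,000000\t%s\t.\tgene_id "%s"; transcript_id "%s"\n'

with open("CP008802.txt", "w") as output:
    output.write(header)
    for record in SeqIO.parse("CP008802.gbk", "genbank"):
        print("Converting %s" % record.name)
        for f in record.features:
            if f.type != "gene":
                continue  # only report the gene features
            locus_tag = f.qualifiers["locus_tag"][0]  # qualifiers are lists
            if len(f.location.parts) > 1:
                print("What should we do for %s (compound location)? %s"
                      % (locus_tag, f.location))
                continue
            strand = "-" if f.location.strand == -1 else "+"
            output.write(row
                         % (record.name, f.type,
                            f.location.start + 1, f.location.end,
                            strand, locus_tag, locus_tag))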
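
One possible way to deal with the compound locations is to write one row per simple part, so a gene spanning the origin would give two rows. This is only a sketch and may not be what the table format expects. The Part tuple and the table_rows function are made-up names for illustration; Part stands in for the entries of f.location.parts (zero-based start, end exclusive), and the coordinates in the example are invented rather than taken from CP008802.

```python
from collections import namedtuple

# Hypothetical stand-in for one simple part of a Biopython location:
# zero-based start, exclusive end, strand as +1 / -1 / None.
Part = namedtuple("Part", ["start", "end", "strand"])


def table_rows(seqname, feature_type, locus_tag, parts):
    """Return one tab-separated row per simple location part.

    The column layout (and the constant "0,000000" score) is assumed
    from the example table earlier in this thread.
    """
    rows = []
    for part in parts:
        # Map the numeric strand to a symbol; "." when the strand is unknown.
        strand = {1: "+", -1: "-"}.get(part.strand, ".")
        attributes = 'gene_id "%s"; transcript_id "%s"' % (locus_tag, locus_tag)
        rows.append("\t".join([
            seqname, "Genbank", feature_type,
            str(int(part.start) + 1),  # zero-based start -> one-based
            str(int(part.end)),        # exclusive end equals one-based inclusive end
            "0,000000", strand, ".", attributes,
        ]))
    return rows


# Invented coordinates: a gene wrapping the origin of a 1000 bp circular record.
example_parts = [Part(989, 1000, 1), Part(0, 20, 1)]
for line in table_rows("CP008802", "gene", "FB03_00005", example_parts):
    print(line)
```

In the script above, the same idea would mean looping over f.location.parts instead of skipping the feature. Whether two rows sharing a gene_id are acceptable depends on whatever will read the table.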