[Bioperl-l] generate ptt file from Genbank file

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Mon Sep 18 00:36:48 UTC 2006


Rafi,

> I am trying to generate a .ptt file like the NCBI ptt file, which basically contains the gene coordinate information, strand, and name. I have a GenBank file from which I want to generate this ptt file.
> Is there any BioPerl module which can do this, or any sample script which I could modify and use?
> Thanks in advance for your reply.

I don't think there is any BioPerl script to do it.
And Bio::FeatureIO doesn't support PTT - I will try and add it soon.
Until then, below is a sample script to work with!

Hope it helps,

--Torsten


#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# This script takes a GenBank file as input, and produces a
# NCBI PTT file (protein table) as output. A PTT file is
# a line based, tab separated format with fixed column types.
#
# Written by Torsten Seemann
# 18 September 2006

my $gbk = Bio::SeqIO->new(-fh=>\*STDIN, -format=>'genbank');
my $seq = $gbk->next_seq;
my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;

print $seq->description, " - 0..",$seq->length,"\n";
print scalar(@cds)," proteins\n";
print join("\t", qw(Location Strand Length PID Gene Synonym Code COG Product)),"\n";

for my $f (@cds) {
   my $gi = '-';
   $gi = $1 if tag($f, 'db_xref') =~ m/\bGI:(\d+)\b/;
   my $cog = '-';
   $cog = $1 if tag($f, 'product') =~ m/^(COG\S+)/;
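   # One value per header column, in the same order as the header line.
   my @col = (
     $f->start.'..'.$f->end,        # Location
     $f->strand >= 0 ? '+' : '-',   # Strand
     ($f->length/3)-1,              # Length in amino acids, excluding the stop codon
     $gi,                           # PID
     tag($f, 'gene'),               # Gene
     tag($f, 'locus_tag'),          # Synonym
     '-',                           # Code (no value available; keeps rows aligned with the header)
     $cog,                          # COG
     tag($f, 'product'),            # Product
   );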
   print join("\t", @col), "\n";
}

sub tag {
   my($f, $tag) = @_;
   return '-' unless $f->has_tag($tag);
   return join(' ', $f->get_tag_values($tag));
}
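
To sanity-check the output, here is a minimal sketch of reading a PTT data row back in. It assumes each row carries the nine tab-separated columns named in the header line. The parse_ptt_line helper is hypothetical (it is not a BioPerl function), and the sample row is invented for illustration.

```perl
use strict;
use warnings;

# Column names as printed in the PTT header line.
my @PTT_COLUMNS = qw(Location Strand Length PID Gene Synonym Code COG Product);

# Hypothetical helper (not part of BioPerl): split one tab-separated PTT
# data line into a hash keyed by column name, adding numeric Start/End.
# Returns nothing if the column count or the Location format is wrong.
sub parse_ptt_line {
    my ($line) = @_;
    chomp $line;
    my @field = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return unless @field == @PTT_COLUMNS;
    my %row;
    @row{@PTT_COLUMNS} = @field;
    @row{qw(Start End)} = $row{Location} =~ m/^(\d+)\.\.(\d+)$/
        or return;
    return \%row;
}

# Invented example row, for illustration only.
my $row = parse_ptt_line(join("\t",
    '190..255', '+', 21, '-', 'geneA', 'LOC_0001', '-', '-', 'hypothetical protein'));
print "$row->{Synonym} $row->{Start}-$row->{End} ($row->{Strand})\n" if $row;
```

A row with the wrong number of columns is rejected rather than guessed at, which makes a header/row mismatch easy to spot.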



