[Bioperl-l] Write a fasta file with custom title line.

Siddhartha Basu basu at pharm.sunysb.edu
Thu Aug 31 19:54:26 UTC 2006


Staffa, Nick (NIH/NIEHS) [C] wrote:
> I would like to construct title lines for the fasta sequences I want to write to a file.
> I don't see in the documentation on-line for SeqIO or write_seq how to specify this. 
> Please point the way. 
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 

Hi Nick,

You could use Bio::Seq::BaseSeqProcessor to customize the title line.
Write your own title-processing class that inherits from
Bio::Seq::BaseSeqProcessor and overrides its "process_seq" method. For example:



#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;
use MySeqProcessor;

die "no file given\n" unless @ARGV;

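# Input: the Swiss-Prot file named on the command line
my $seqin = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");
# Output: FASTA, written to STDOUT because no -file is given
my $seqio = Bio::SeqIO->new(-format => "fasta");
# Wrap the output stream so each record passes through process_seq() before it is written
my $pipe = MySeqProcessor->new(-source_stream => $seqio);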

my $count = 0;
while (my $inseq = $seqin->next_seq()) {
    $pipe->write_seq($inseq);
    # stop after four records, as in the sample output
    last if ++$count >= 4;
}

$pipe->close();


## The title processing class (save as MySeqProcessor.pm on the module search path) ##

package MySeqProcessor;

use strict;
use Bio::Seq::BaseSeqProcessor;
use base qw(Bio::Seq::BaseSeqProcessor);

sub process_seq {
    my ($self,$seq) = @_;
    ## $seq is a Bio::PrimarySeqI-compliant object

    ## As I understand it, to customize the fasta title you need to
    ## set the display_id and desc of the Bio::PrimarySeq object
    $seq->display_id(int rand(100));
    $seq->desc(sprintf("%s [%d]",$seq->desc,$seq->length));

    $seq;
}

1;

The output (the IDs are random, so they differ on each run):

>2770 Protein 108 precursor. [102]
MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSP
TASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN
>1683 10 kDa protein precursor (Clone PSAS10). [75]
MEKKSIAGLCFLFLVLFVAQEVVVQSEAKTCENLVDTYRGPCFTTGSCDDHCKNKEHLLS
GRCRDDVRCWCTRNC
>1167 110 kDa antigen (PK110) (Fragment). [296]
FNSNMLRGSVCEEDVSLMTSIDNMIEEIDFYEKEIYKGSHSGGVIKGMDYDLEDDENDED
EMTEQMVEEVADHITQDMIDEVAHHVLDNITHDMAHMEEIVHGLSGDVTQIKEIVQKVNV
AVEKVKHIVETEETQKTVEPEQIEETQNTVEPEQTEETQKTVEPEQTEETQNTVEPEQIE
ETQKTVEPEQTEEAQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTE
ETQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTEETQNTVEPEPTQETQNTVEP
>860 104 kDa microneme-rhoptry antigen. [924]
MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL
QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG
DAWVTLSEHEYLAKLQEIRQAVHIESVFSLNMAFQLENNKYEVETHAKNGANMVTFIPRN
GHICKMVYHKNVRIYKATGNDTVTSVVGFFRGLRLLLINVFSIDDNGMMSNRYFQHVDDK
YVPISQKNYETGIVKLKDYKHAYHPVDLDIKDIDYTMFHLADATYHEPCFKIIPNTGFCI
TKLFDGDQVLYESFNPLIHCINEVHIYDRNNGSIICLHLNYSPPSYKAYLVLKDTGWEAT
THPLLEEKIEELQDQRACELDVNFISDKDLYVAALTNADLNYTMVTPRPHRDVIRVSDGS
EVLWYYEGLDNFLVCAWIYVSDGVASLVHLRIKDRIPANNDIYVLKGDLYWTRITKIQFT
QEIKRLVKKSKKKLAPITEEDSDKHDEPPEGPGASGLPPKAPGDKEGSEGHKGPSKGSDS
SKEGKKPGSGKKPGPAREHKPSKIPTLSKKPSGPKDPKHPRDPKEPRKSKSPRTASPTRR
PSPKLPQLSKLPKSTSPRSPPPPTRPSSPERPEGTKIIKTSKPPSPKPPFDPSFKEKFYD
DYSKAASRSKETKTTVVLDESFESILKETLPETPGTPFTTPRPVPPKRPRTPESPFEPPK
DPDSPSTSPSEFFTPPESKRTRFHETPADTPLPDVTAELFKEPDVTAETKSPDEAMKRPR
SPSEYEDTSPGDYPSLPMKRHRLERLRLTTTEMETDPGRMAKDASGKPVKLKRSKSFDDL
TTVELAPEPKASRIVVDDEGTEADDEETHPPEERQKTEVRRRRPPKKPSKSPRPSKPKKP
KKPDSAYIPSILAILVVSLIVGIL
             ------------------
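
If a separate processor class is more than you need, the same idea (the fasta title is built from display_id and desc) can be applied directly before calling write_seq. This is only a minimal sketch: the helper name "retitle", the ID 'my_custom_id' and the example sequence are made up for illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Hypothetical helper (not a BioPerl function): sets a new ID and
# appends the sequence length to the description, as process_seq does above.
sub retitle {
    my ($seq, $new_id) = @_;
    $seq->display_id($new_id);
    my $desc = defined $seq->desc ? $seq->desc : '';
    $seq->desc(sprintf("%s [%d]", $desc, $seq->length));
    return $seq;
}

# Made-up example record; in practice this would come from next_seq()
my $seq = Bio::Seq->new(
    -display_id => 'old_id',
    -desc       => 'Example protein',
    -seq        => 'MASVKSSSSSSSSSFISLLLLILLVIVLQSQ',
    -alphabet   => 'protein',
);

# Write FASTA to STDOUT; the title line is ">display_id desc"
my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);
$out->write_seq(retitle($seq, 'my_custom_id'));
```

The processor class is still the better fit when the same retitling has to be reused across several scripts or streams; the inline version is just shorter for a one-off.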

-siddhartha




