[Bioperl-l] Write a fasta file with custom title line.
Siddhartha Basu
basu at pharm.sunysb.edu
Thu Aug 31 19:54:26 UTC 2006
Staffa, Nick (NIH/NIEHS) [C] wrote:
> I would like to construct title lines for the fasta sequences I want to write to a file.
> I don't see in the documentation on-line for SeqIO or write_seq how to specify this.
> Please point the way.
>
>
> Nick Staffa
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>
Hi Nick,
You could use Bio::Seq::BaseSeqProcessor to customize the title line.
Write your own title-processing class that inherits from
Bio::Seq::BaseSeqProcessor and overrides its "process_seq" method. For example:
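#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use MySeqProcessor;    # the class shown below, saved as MySeqProcessor.pm

die "no file given\n" unless @ARGV;

# Read Swiss-Prot records; write FASTA (to STDOUT, since no -file is given).
my $seqin = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");
my $seqio = Bio::SeqIO->new(-format => "fasta");

# write_seq() on the processor runs process_seq() first, then hands the
# sequence to the wrapped FASTA stream.
my $pipe = MySeqProcessor->new(-source_stream => $seqio);

my $count = 0;
while (my $inseq = $seqin->next_seq()) {
    $pipe->write_seq($inseq);
    last if ++$count >= 4;    # stop after four records
}
$pipe->close();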
## The title processing class (MySeqProcessor.pm) ##
package MySeqProcessor;
use strict;
use Bio::Seq::BaseSeqProcessor;
use base qw(Bio::Seq::BaseSeqProcessor);

sub process_seq {
    my ($self, $seq) = @_;
    ## $seq is a Bio::PrimarySeqI compliant object.
    ## As I understand it, to customize the fasta title you need to
    ## set display_id and desc on the Bio::PrimarySeq object.
    $seq->display_id(int rand(100));
    $seq->desc(sprintf("%s [%d]", $seq->desc, $seq->length));
    return $seq;
}
1;
The output:
>2770 Protein 108 precursor. [102]
MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSP
TASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN
>1683 10 kDa protein precursor (Clone PSAS10). [75]
MEKKSIAGLCFLFLVLFVAQEVVVQSEAKTCENLVDTYRGPCFTTGSCDDHCKNKEHLLS
GRCRDDVRCWCTRNC
>1167 110 kDa antigen (PK110) (Fragment). [296]
FNSNMLRGSVCEEDVSLMTSIDNMIEEIDFYEKEIYKGSHSGGVIKGMDYDLEDDENDED
EMTEQMVEEVADHITQDMIDEVAHHVLDNITHDMAHMEEIVHGLSGDVTQIKEIVQKVNV
AVEKVKHIVETEETQKTVEPEQIEETQNTVEPEQTEETQKTVEPEQTEETQNTVEPEQIE
ETQKTVEPEQTEEAQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTE
ETQKTVEPEQTEETQKTVEPEQTEETQKTVEPEQTEETQNTVEPEPTQETQNTVEP
>860 104 kDa microneme-rhoptry antigen. [924]
MKFLILLFNILCLFPVLAADNHGVGPQGASGVDPITFDINSNQTGPAFLTAVEMAGVKYL
QVQHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLQFFIKSG
DAWVTLSEHEYLAKLQEIRQAVHIESVFSLNMAFQLENNKYEVETHAKNGANMVTFIPRN
GHICKMVYHKNVRIYKATGNDTVTSVVGFFRGLRLLLINVFSIDDNGMMSNRYFQHVDDK
YVPISQKNYETGIVKLKDYKHAYHPVDLDIKDIDYTMFHLADATYHEPCFKIIPNTGFCI
TKLFDGDQVLYESFNPLIHCINEVHIYDRNNGSIICLHLNYSPPSYKAYLVLKDTGWEAT
THPLLEEKIEELQDQRACELDVNFISDKDLYVAALTNADLNYTMVTPRPHRDVIRVSDGS
EVLWYYEGLDNFLVCAWIYVSDGVASLVHLRIKDRIPANNDIYVLKGDLYWTRITKIQFT
QEIKRLVKKSKKKLAPITEEDSDKHDEPPEGPGASGLPPKAPGDKEGSEGHKGPSKGSDS
SKEGKKPGSGKKPGPAREHKPSKIPTLSKKPSGPKDPKHPRDPKEPRKSKSPRTASPTRR
PSPKLPQLSKLPKSTSPRSPPPPTRPSSPERPEGTKIIKTSKPPSPKPPFDPSFKEKFYD
DYSKAASRSKETKTTVVLDESFESILKETLPETPGTPFTTPRPVPPKRPRTPESPFEPPK
DPDSPSTSPSEFFTPPESKRTRFHETPADTPLPDVTAELFKEPDVTAETKSPDEAMKRPR
SPSEYEDTSPGDYPSLPMKRHRLERLRLTTTEMETDPGRMAKDASGKPVKLKRSKSFDDL
TTVELAPEPKASRIVVDDEGTEADDEETHPPEERQKTEVRRRRPPKKPSKSPRPSKPKKP
KKPDSAYIPSILAILVVSLIVGIL
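
If a separate processor class is more than you need, the same two accessors
can be set directly on each sequence before it is written. The following is
only a minimal sketch under some assumptions: retitle_as_fasta is a made-up
helper name (not part of BioPerl), the toy sequence and its title are invented
for illustration, and the record is written to an in-memory filehandle so that
it comes back as a string instead of going to a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Hypothetical helper: return one FASTA record for $seq, using a
# caller-supplied ID and description for the title line.
sub retitle_as_fasta {
    my ($seq, $id, $desc) = @_;
    $seq->display_id($id);    # first word after ">"
    $seq->desc($desc);        # remainder of the title line

    # Write to an in-memory filehandle so the record is returned as a string.
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    my $out = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');
    $out->write_seq($seq);
    $out->close();
    return $buffer;
}

# Toy sequence, invented for illustration only.
my $seq = Bio::Seq->new(
    -seq        => 'MASVKSSSSS',
    -display_id => 'tmp',
    -alphabet   => 'protein',
);
print retitle_as_fasta($seq, 'sample_1',
                       sprintf('toy peptide [%d]', $seq->length));
```

To write to a real file instead, pass -file => ">out.fasta" to
Bio::SeqIO->new in place of -fh. The processor class above is still the
better fit when the same retitling has to be reused across several scripts.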
------------------
-siddhartha