[Bioperl-l] Help with Bio::SeqIO
Chris Fields
cjfields at uiuc.edu
Mon Nov 5 00:08:34 UTC 2007
Pass in (-nosort => 1) to spliced_seq:
print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
This ensures no sorting of sublocations occurs, if you want for
instance typical GenBank/EMBL 'join' behavior.
To the other devs: shouldn't -nosort be the default behavior when the
split location is a 'join'? In other words, should spliced_seq() be
modified to take into account the split location type when returning
sequence? GB/EMBL/DDBJ rel. notes indicate a 'join' explicitly
indicates the order of the sequences is important when joined
together; the current behavior is more like that for 'order'.
chris
On Nov 4, 2007, at 12:39 PM, download on demand wrote:
> Hi to all.
>
> I have a problem with a simplest script:
>
>
>
> use Bio::SeqIO;
> # get command-line arguments, or die with a usage statement
> my $usage = "x2y.pl infile infileformat outfile
> outfileformat\n";
> my $infile = shift or die $usage;
> my $infileformat = shift or die $usage;
> # my $outfile = shift or die $usage;
> my $outfileformat = shift or die $usage;
>
> # create one SeqIO object to read in,and another to write out
> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
> '-format' => $infileformat);
> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
> '-format' => $outfileformat);
>
> # write each entry in the input file to the output file
> while (my $inseq = $seq_in->next_seq) {
>
> # $seq_out->write_seq($inseq); # Whole sequence not needed
>
> for my $feat_object ($inseq->get_SeqFeatures)
> {
> if ($feat_object->primary_tag eq "CDS")
> {
> print $feat_object->get_tag_values('product'),"\n";
> print
> $feat_object->location->start,"..",$feat_object->location->end,"\n";
> print $feat_object->spliced_seq->seq,"\n\n";
> }
> }
>
>
>
> The result seems OK to me, but in case of first CDS of
> NC_005213.gbk from
> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/
> > the
> output is wrong:
>
> It is:
> hypothetical protein
> 1..490885
> TAAATGCGATTGCTATTAGAA..................................Truncated
> sequence...................................
>
> Should be:
> hypothetical protein
> 879..490883
> ATGCGATTGCTATTAGAA...................................Truncated
> sequence....................................TAA
>
>
>
> This CDS have an unnatural location string:
> CDS complement(join(490883..490885,1..879)), but
> spliced_seq
> should handle these things?
>
> Please help me!
> Best regards, N.
> _______________________________________________
>
More information about the Bioperl-l
mailing list