[Bioperl-l] Help with Bio::SeqIO
Hilmar Lapp
hlapp at duke.edu
Mon Nov 5 16:03:16 UTC 2007
I agree that there should be a meaningful default that results in
"doing the right thing" in most cases if the user doesn't intervene.
I'm not sure I understand all the details, but it sounds sorting or
not sorting should depend on the split location type unless the user
overrides it by argument. That's what you're suggesting, right?
-hilmar
On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when
> the split location is a 'join'? In other words, should spliced_seq
> () be modified to take into account the split location type when
> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join'
> explicitly indicates the order of the sequences is important when
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>> use Bio::SeqIO;
>> # get command-line arguments, or die with a usage statement
>> my $usage = "x2y.pl infile infileformat outfile
>> outfileformat\n";
>> my $infile = shift or die $usage;
>> my $infileformat = shift or die $usage;
>> # my $outfile = shift or die $usage;
>> my $outfileformat = shift or die $usage;
>>
>> # create one SeqIO object to read in,and another to write
>> out
>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>> '-format' => $infileformat);
>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>> '-format' => $outfileformat);
>>
>> # write each entry in the input file to the output file
>> while (my $inseq = $seq_in->next_seq) {
>>
>> # $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>> for my $feat_object ($inseq->get_SeqFeatures)
>> {
>> if ($feat_object->primary_tag eq "CDS")
>> {
>> print $feat_object->get_tag_values('product'),"\n";
>> print
>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>> print $feat_object->spliced_seq->seq,"\n\n";
>> }
>> }
>>
>>
>>
>> The result seems OK to me, but in case of first CDS of
>> NC_005213.gbk from
>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>> Nanoarchaeum_equitans/> the
>> output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS have an unnatural location string:
>> CDS complement(join(490883..490885,1..879)), but
>> spliced_seq
>> should handle these things?
>>
>> Please help me!
>> Best regards, N.
>> _______________________________________________
>>
>
>
>
More information about the Bioperl-l
mailing list