[Bioperl-l] Help with Bio::SeqIO

Hilmar Lapp hlapp at duke.edu
Mon Nov 5 16:03:16 UTC 2007


I agree that there should be a meaningful default that results in  
"doing the right thing" in most cases if the user doesn't intervene.  
I'm not sure I understand all the details, but it sounds sorting or  
not sorting should depend on the split location type unless the user  
overrides it by argument. That's what you're suggesting, right?

	-hilmar

On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:

> Pass in (-nosort => 1) to spliced_seq:
>
> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>
> This ensures no sorting of sublocations occurs, if you want for  
> instance typical GenBank/EMBL 'join' behavior.
>
> To the other devs: shouldn't -nosort be the default behavior when  
> the split location is a 'join'?  In other words, should spliced_seq 
> () be modified to take into account the split location type when  
> returning sequence?  GB/EMBL/DDBJ rel. notes indicate a 'join'  
> explicitly indicates the order of the sequences is important when  
> joined together; the current behavior is more like that for 'order'.
>
> chris
>
> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>
>> Hi to all.
>>
>> I have a problem with a simplest script:
>>
>>
>>
>>          use Bio::SeqIO;
>>          # get command-line arguments, or die with a usage statement
>>          my $usage = "x2y.pl infile infileformat outfile  
>> outfileformat\n";
>>          my $infile = shift or die $usage;
>>          my $infileformat = shift or die $usage;
>> #         my $outfile = shift or die $usage;
>>          my $outfileformat = shift or die $usage;
>>
>>          # create one SeqIO object to read in,and another to write  
>> out
>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>                                       '-format' => $infileformat);
>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>                                        '-format' => $outfileformat);
>>
>>          # write each entry in the input file to the output file
>>          while (my $inseq = $seq_in->next_seq) {
>>
>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>
>> for my $feat_object ($inseq->get_SeqFeatures)
>>     {
>>     if ($feat_object->primary_tag eq "CDS")
>>         {
>>         print $feat_object->get_tag_values('product'),"\n";
>>         print
>> $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>         print $feat_object->spliced_seq->seq,"\n\n";
>>         }
>>     }
>>
>>
>>
>> The result seems OK to me, but in case of first CDS of  
>> NC_005213.gbk from
>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ 
>> Nanoarchaeum_equitans/> the
>> output is wrong:
>>
>> It is:
>> hypothetical protein
>> 1..490885
>> TAAATGCGATTGCTATTAGAA..................................Truncated
>> sequence...................................
>>
>> Should be:
>> hypothetical protein
>> 879..490883
>> ATGCGATTGCTATTAGAA...................................Truncated
>> sequence....................................TAA
>>
>>
>>
>> This CDS have an unnatural location string:
>> CDS             complement(join(490883..490885,1..879)), but  
>> spliced_seq
>> should handle these things?
>>
>> Please help me!
>> Best regards, N.
>> _______________________________________________
>>
>
>
>







More information about the Bioperl-l mailing list