[Bioperl-l] Help with Bio::SeqIO

Mon Nov 5 17:41:27 UTC 2007

The sorting may have something to do with remote locations or with
setting strand() in sublocations.  This may have come up in relation
to a LocationI code audit I proposed on the list a while back but
never got around to.  Oh well...

I did at least manage to get a wiki page started in case we decide to
make changes, with the intention of making it a HOWTO at some point:

http://www.bioperl.org/wiki/BioPerl_Locations

If we go through with the changes to spliced_seq(), should it be  
implemented for inclusion in v1.6 or wait until v1.7?
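
To make the sorting question concrete, here is a minimal plain-Perl
sketch. It does not use BioPerl and is not how spliced_seq() is
implemented; toy_spliced_seq(), resolve_nosort() and the 20-base toy
genome are all made up for illustration. It mimics the
complement(join(490883..490885,1..879)) case from the original report
on a tiny circular sequence, and the split-type default in
resolve_nosort() is only the proposal under discussion:

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical default: an explicit user setting wins; otherwise keep
# the written order for a 'join' and sort for any other split type.
sub resolve_nosort {
    my ($splittype, $user_nosort) = @_;
    return $user_nosort ? 1 : 0 if defined $user_nosort;
    return (defined $splittype && lc($splittype) eq 'join') ? 1 : 0;
}

# Hypothetical stand-in for spliced_seq(): concatenate 1-based,
# inclusive [start, end] ranges, then reverse-complement the result
# if the feature is on the minus strand.
sub toy_spliced_seq {
    my (%arg) = @_;
    my @parts  = @{ $arg{ranges} };
    my $nosort = resolve_nosort($arg{splittype}, $arg{nosort});
    @parts = sort { $a->[0] <=> $b->[0] } @parts unless $nosort;
    my $joined = join '',
        map { substr($arg{genome}, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @parts;
    return $arg{complement} ? revcomp($joined) : $joined;
}

# Toy circular genome: the CDS starts near the end and wraps to the start,
# i.e. complement(join(18..20,1..6)).
my $genome  = 'GGGCAT' . 'ACACACACACA' . 'TTA';
my %feature = (
    genome     => $genome,
    ranges     => [ [18, 20], [1, 6] ],
    complement => 1,
    splittype  => 'join',
);

print toy_spliced_seq(%feature, nosort => 0), "\n";  # sorted:        TAAATGCCC
print toy_spliced_seq(%feature, nosort => 1), "\n";  # written order: ATGCCCTAA
print toy_spliced_seq(%feature), "\n";               # 'join' default: ATGCCCTAA
```

With sorting forced on, the stop codon ends up in front of the start
codon, which is the same symptom as in the report; keeping the written
order gives ATG...TAA.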

chris

On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:

>
> At one point the location order was not respected/saved, I believe.
> I guess we will just assume the user will build up a SplitLocation
> in order (i.e. via add_sub_Location).  I'll try and remember if there
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds like sorting
>> or not sorting should depend on the split location type unless the
>> user overrides it by argument. That's what you're suggesting, right?
>>
>> 	-hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'?  In other words, should
>>> spliced_seq() be modified to take into account the split location
>>> type when returning sequence?  GB/EMBL/DDBJ rel. notes indicate a
>>> 'join' explicitly indicates the order of the sequences is important
>>> when joined together; the current behavior is more like that for
>>> 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a very simple script:
>>>>
>>>>
>>>>
>>>>          use Bio::SeqIO;
>>>>          # get command-line arguments, or die with a usage statement
>>>>          my $usage = "x2y.pl infile infileformat outfileformat\n";
>>>>          my $infile = shift or die $usage;
>>>>          my $infileformat = shift or die $usage;
>>>> #         my $outfile = shift or die $usage;
>>>>          my $outfileformat = shift or die $usage;
>>>>
>>>>          # create one SeqIO object to read in, and another to write out
>>>>          my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>>                                       '-format' => $infileformat);
>>>>          my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>>                                        '-format' => $outfileformat);
>>>>
>>>>          # write each entry in the input file to the output file
>>>>          while (my $inseq = $seq_in->next_seq) {
>>>>
>>>> #            $seq_out->write_seq($inseq); # Whole sequence not needed
>>>>
>>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>>     {
>>>>     if ($feat_object->primary_tag eq "CDS")
>>>>         {
>>>>         print $feat_object->get_tag_values('product'),"\n";
>>>>         print $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>>         print $feat_object->spliced_seq->seq,"\n\n";
>>>>         }
>>>>     }
>>>>          }   # end of while loop
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in the case of the first CDS of
>>>> NC_005213.gbk from
>>>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/>
>>>> the output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS has an unusual location string:
>>>> CDS             complement(join(490883..490885,1..879))
>>>> but shouldn't spliced_seq handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign