[Bioperl-l] Help with Bio::SeqIO
Jason Stajich
jason at bioperl.org
Mon Nov 5 17:07:10 UTC 2007
At one point the location order was not respected/saved, I believe. I
guess we will just assume the user will build up a SplitLocation in
order (i.e. via add_sub_Location). I'll try to remember if there were
any other particular reasons.
-jason
On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
> I agree that there should be a meaningful default that results in
> "doing the right thing" in most cases if the user doesn't intervene.
> I'm not sure I understand all the details, but it sounds like sorting or
> not sorting should depend on the split location type unless the user
> overrides it by argument. That's what you're suggesting, right?
>
> -hilmar
>
> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>
>> Pass in (-nosort => 1) to spliced_seq:
>>
>> print $feat_object->spliced_seq(-nosort => 1)->seq,"\n\n";
>>
>> This ensures no sorting of sublocations occurs, if you want for
>> instance typical GenBank/EMBL 'join' behavior.
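
To make the effect of sorting concrete, here is a minimal pure-Perl sketch on a toy 12-base sequence. It is a simplified model, not BioPerl's implementation: the helper `splice_minus_strand`, the toy genome, and the location complement(join(10..12,1..6)) are all made up for illustration, chosen to mirror the origin-spanning CDS reported in this thread.

```perl
use strict;
use warnings;

# Toy 12-base "genome" (hypothetical); the feature wraps around the origin.
my $genome = 'TCGCATGGGTTA';

# Sub-locations of complement(join(10..12,1..6)), in the order written.
my @sublocs = ([10, 12], [1, 6]);

# Reverse-complement a DNA string.
sub revcomp {
    my ($s) = @_;
    $s = scalar reverse $s;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical helper: concatenate the pieces (optionally sorted by start
# coordinate first), then reverse-complement the result.
sub splice_minus_strand {
    my ($seq, $nosort, @locs) = @_;
    @locs = sort { $a->[0] <=> $b->[0] } @locs unless $nosort;
    my $joined = join '',
        map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @locs;
    return revcomp($joined);
}

my $sorted   = splice_minus_strand($genome, 0, @sublocs);
my $unsorted = splice_minus_strand($genome, 1, @sublocs);

print "sorted:   $sorted\n";    # stop codon ends up in front: TAAATGCGA
print "unsorted: $unsorted\n";  # written order is kept:       ATGCGATAA
```

With sorting, the three bases from the end of the sequence land in front of the start codon, which is the same symptom as in the reported output; keeping the written order gives ATG first and TAA last.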
>>
>> To the other devs: shouldn't -nosort be the default behavior when
>> the split location is a 'join'? In other words, should spliced_seq()
>> be modified to take into account the split location type when
>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join'
>> explicitly indicates the order of the sequences is important when
>> joined together; the current behavior is more like that for 'order'.
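
The proposed default could be sketched roughly as follows. This is only an illustration of the idea under discussion, not BioPerl code: `should_sort` is a hypothetical helper, and treating every non-'join' split type as "sort" is an assumption made here to keep the existing behavior for those types.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: decide whether sub-locations
# should be sorted. An explicit -nosort argument always wins; otherwise a
# 'join' keeps the written order and any other split type (e.g. 'order')
# falls back to sorting.
sub should_sort {
    my ($splittype, %args) = @_;
    return $args{-nosort} ? 0 : 1 if exists $args{-nosort};
    return lc($splittype) eq 'join' ? 0 : 1;
}

for my $type (qw(join order)) {
    printf "%-5s default: %s\n", $type,
        should_sort($type) ? 'sort' : 'keep order';
}
printf "order with -nosort => 1: %s\n",
    should_sort('order', -nosort => 1) ? 'sort' : 'keep order';
```

Keeping the explicit argument as an override means existing callers that pass -nosort would see no change; only the no-argument case would depend on the split type.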
>>
>> chris
>>
>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>
>>> Hi to all.
>>>
>>> I have a problem with a very simple script:
>>>
>>>
>>>
>>> use strict;
>>> use warnings;
>>> use Bio::SeqIO;
>>> # get command-line arguments, or die with a usage statement
>>> my $usage = "x2y.pl infile infileformat outfileformat\n";
>>> my $infile = shift or die $usage;
>>> my $infileformat = shift or die $usage;
>>> # my $outfile = shift or die $usage;
>>> my $outfileformat = shift or die $usage;
>>>
>>> # create one SeqIO object to read in, and another to write out
>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>                              '-format' => $infileformat);
>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>                               '-format' => $outfileformat);
>>>
>>> # print product, coordinates and spliced sequence of each CDS
>>> while (my $inseq = $seq_in->next_seq) {
>>>
>>>     # $seq_out->write_seq($inseq); # Whole sequence not needed
>>>
>>>     for my $feat_object ($inseq->get_SeqFeatures) {
>>>         if ($feat_object->primary_tag eq "CDS") {
>>>             print $feat_object->get_tag_values('product'),"\n";
>>>             print $feat_object->location->start,"..",$feat_object->location->end,"\n";
>>>             print $feat_object->spliced_seq->seq,"\n\n";
>>>         }
>>>     }
>>> }
>>>
>>>
>>>
>>> The result seems OK to me, but in the case of the first CDS of
>>> NC_005213.gbk from
>>> <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Nanoarchaeum_equitans/>
>>> the output is wrong:
>>>
>>> It is:
>>> hypothetical protein
>>> 1..490885
>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>> sequence...................................
>>>
>>> Should be:
>>> hypothetical protein
>>> 879..490883
>>> ATGCGATTGCTATTAGAA...................................Truncated
>>> sequence....................................TAA
>>>
>>>
>>>
>>> This CDS has an unusual location string:
>>> CDS complement(join(490883..490885,1..879))
>>> but shouldn't spliced_seq handle such cases?
>>>
>>> Please help me!
>>> Best regards, N.
--
Jason Stajich
jason at bioperl.org