[Bioperl-l] Help with Bio::SeqIO
Chris Fields
cjfields at uiuc.edu
Mon Nov 5 17:41:27 UTC 2007
It may have something to do with remote locations or setting strand()
in sublocations. This may have popped up in relation to a LocationI
code audit I proposed a while back on the list which I never got
around to. Oh well...
I at least managed getting a wiki page started in case we decided to
make changes, with the intention of making it a HOWTO at some point:
http://www.bioperl.org/wiki/BioPerl_Locations
If we go through with the changes to spliced_seq(), should it be
implemented for inclusion in v1.6 or wait until v1.7?
chris
On Nov 5, 2007, at 11:07 AM, Jason Stajich wrote:
>
> At one point the location order was not respected/saved I believe.
> I guess we will just assume the user will build up a SplitLocation
> in order (i.e. add_SubLocation). I'll try and remember if there
> were any other particular reasons.
>
>
> -jason
> On Nov 5, 2007, at 11:03 AM, Hilmar Lapp wrote:
>
>> I agree that there should be a meaningful default that results in
>> "doing the right thing" in most cases if the user doesn't intervene.
>> I'm not sure I understand all the details, but it sounds sorting or
>> not sorting should depend on the split location type unless the user
>> overrides it by argument. That's what you're suggesting, right?
>>
>> -hilmar
>>
>> On Nov 4, 2007, at 7:08 PM, Chris Fields wrote:
>>
>>> Pass in (-nosort => 1) to spliced_seq:
>>>
>>> print $feat_object->spliced_seq(-no_sort =>1)->seq,"\n\n";
>>>
>>> This ensures no sorting of sublocations occurs, if you want for
>>> instance typical GenBank/EMBL 'join' behavior.
>>>
>>> To the other devs: shouldn't -nosort be the default behavior when
>>> the split location is a 'join'? In other words, should spliced_seq
>>> () be modified to take into account the split location type when
>>> returning sequence? GB/EMBL/DDBJ rel. notes indicate a 'join'
>>> explicitly indicates the order of the sequences is important when
>>> joined together; the current behavior is more like that for 'order'.
>>>
>>> chris
>>>
>>> On Nov 4, 2007, at 12:39 PM, download on demand wrote:
>>>
>>>> Hi to all.
>>>>
>>>> I have a problem with a simplest script:
>>>>
>>>>
>>>>
>>>> use Bio::SeqIO;
>>>> # get command-line arguments, or die with a usage
>>>> statement
>>>> my $usage = "x2y.pl infile infileformat outfile
>>>> outfileformat\n";
>>>> my $infile = shift or die $usage;
>>>> my $infileformat = shift or die $usage;
>>>> # my $outfile = shift or die $usage;
>>>> my $outfileformat = shift or die $usage;
>>>>
>>>> # create one SeqIO object to read in,and another to write
>>>> out
>>>> my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>>>> '-format' => $infileformat);
>>>> my $seq_out = Bio::SeqIO->new('-fh' => \*STDOUT,
>>>> '-format' =>
>>>> $outfileformat);
>>>>
>>>> # write each entry in the input file to the output file
>>>> while (my $inseq = $seq_in->next_seq) {
>>>>
>>>> # $seq_out->write_seq($inseq); # Whole sequence not
>>>> needed
>>>>
>>>> for my $feat_object ($inseq->get_SeqFeatures)
>>>> {
>>>> if ($feat_object->primary_tag eq "CDS")
>>>> {
>>>> print $feat_object->get_tag_values('product'),"\n";
>>>> print
>>>> $feat_object->location->start,"..",$feat_object->location-
>>>> >end,"\n";
>>>> print $feat_object->spliced_seq->seq,"\n\n";
>>>> }
>>>> }
>>>>
>>>>
>>>>
>>>> The result seems OK to me, but in case of first CDS of
>>>> NC_005213.gbk from
>>>> here <ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>>>> Nanoarchaeum_equitans/> the
>>>> output is wrong:
>>>>
>>>> It is:
>>>> hypothetical protein
>>>> 1..490885
>>>> TAAATGCGATTGCTATTAGAA..................................Truncated
>>>> sequence...................................
>>>>
>>>> Should be:
>>>> hypothetical protein
>>>> 879..490883
>>>> ATGCGATTGCTATTAGAA...................................Truncated
>>>> sequence....................................TAA
>>>>
>>>>
>>>>
>>>> This CDS have an unnatural location string:
>>>> CDS complement(join(490883..490885,1..879)), but
>>>> spliced_seq
>>>> should handle these things?
>>>>
>>>> Please help me!
>>>> Best regards, N.
>>>> _______________________________________________
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
More information about the Bioperl-l
mailing list