[Bioperl-l] Odd problem with get_tag_values
Adlai Burman
adlai at refenestration.com
Fri Feb 24 21:57:24 UTC 2012
On Feb 24, 2012, at 10:46 PM, Fields, Christopher J wrote:
> Using has_tag('gene') as a pre-screen works for me for both example seqs.
>
Me too :-)
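Applied to the one-liner from the original post, the has_tag pre-screen can be folded into a grep in front of the map. The sketch below assumes a hypothetical `MockFeature` class standing in for Bio::SeqFeature::Generic, so it runs without BioPerl. The class implements only `strand`, `has_tag` and `get_tag_values`, and the gene names are invented sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Generic (not BioPerl itself).
package MockFeature;
sub new {
    my ($class, %args) = @_;
    return bless { strand => $args{strand}, tags => $args{tags} || {} }, $class;
}
sub strand  { $_[0]{strand} }
sub has_tag { exists $_[0]{tags}{ $_[1] } }
sub get_tag_values {
    my ($self, $tag) = @_;
    # Mimics the behaviour reported in this thread: a missing tag is fatal.
    die "asking for tag value that does not exist $tag\n"
        unless exists $self->{tags}{$tag};
    return @{ $self->{tags}{$tag} };
}

package main;

# Invented sample data: the second CDS has no gene tag, the third has two.
my @cds_features = (
    MockFeature->new(strand => 1,  tags => { gene => ['geneA'] }),
    MockFeature->new(strand => -1, tags => { locus_tag => ['LOCUS_0002'] }),
    MockFeature->new(strand => -1, tags => { gene => ['geneB', 'geneB_alias'] }),
);

# Pre-screen with has_tag, then keep only the first gene value of each feature.
my %strands = map  { ( ($_->get_tag_values('gene'))[0] => $_->strand ) }
              grep { $_->has_tag('gene') } @cds_features;

print "$_\t$strands{$_}\n" for sort keys %strands;
```

Taking only the first value with `[0]` also avoids a subtler problem in the original map. A feature carrying two gene values would emit three items instead of two, which shifts every later key/value pair in the hash.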
Good night and cheers,
Adlai
> chris
>
> On Feb 24, 2012, at 3:33 PM, Adlai Burman wrote:
>
>> Thanks so much, Jason.
>> I will give that a try after I get a few hours of much-needed sleep :-)
>>
>>
>> On Feb 24, 2012, at 10:21 PM, Jason Stajich wrote:
>>
>>> Not all CDS features will be annotated with a 'gene' tag. This is due to variation in how annotation is done; there is no requirement that every CDS feature carry a gene tag.
>>>
>>> You can protect your query by testing with has_tag first; we often do this when dealing with data from the wild.
>>>
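>>> use Bio::SeqIO;
>>>
>>> my %strands;
>>> # $file is one of the files from your loop over @files
>>> for my $cds ( grep { $_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
>>>     if ( $cds->has_tag('gene') ) {
>>>         my ($gene) = $cds->get_tag_values('gene'); # take the 1st value; get_tag_values returns a list
>>>         $strands{$gene} = $cds->strand;
>>>     } else {
>>>         # look in alternative places for a name, e.g. the locus_tag qualifier;
>>>         # '...' is Perl's placeholder statement and dies "Unimplemented" if reached
>>>         ...
>>>     }
>>> }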
>>>
>>> An alternative is to loop through your list of tags in order of preference:
>>>
>>> use Bio::SeqIO;
>>>
>>> my %strands;
>>> for my $cds ( grep { $_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures ) {
>>>     my $seen = 0;
>>>     for my $tag ( qw(gene locus_tag name product accession note) ) {
>>>         if ( $cds->has_tag($tag) ) {
>>>             my ($name) = $cds->get_tag_values($tag); # take the 1st value; this returns a list
>>>             $strands{$name} = $cds->strand;
>>>             $seen = 1;
>>>             last;
>>>         }
>>>     }
>>>     if ( ! $seen ) {
>>>         warn("no tag found for feature at ", $cds->location->to_FTstring, "\n");
>>>     }
>>> }
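The preference loop above can also be factored into a small helper. The helper returns the first value of the first tag present, which keeps the fallback order in one place and lets you collect unnamed features rather than warning inline. This is a sketch only. `first_tag_value` and `MockFeature` are hypothetical names, and `MockFeature` stands in for Bio::SeqFeature::Generic so the example runs without BioPerl. The tag values are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Generic (not BioPerl itself).
package MockFeature;
sub new {
    my ($class, %args) = @_;
    return bless { strand => $args{strand}, tags => $args{tags} || {} }, $class;
}
sub strand         { $_[0]{strand} }
sub has_tag        { exists $_[0]{tags}{ $_[1] } }
sub get_tag_values { @{ $_[0]{tags}{ $_[1] } || [] } }

package main;

# Hypothetical helper: first value of the first tag in @tags that the
# feature carries; undef (in scalar context) if it carries none of them.
sub first_tag_value {
    my ($feature, @tags) = @_;
    for my $tag (@tags) {
        next unless $feature->has_tag($tag);
        my ($value) = $feature->get_tag_values($tag);   # first value only
        return $value;
    }
    return;
}

# Invented sample data: named by gene, named by locus_tag, and unnamed.
my @cds_features = (
    MockFeature->new(strand => 1,  tags => { gene => ['geneA'], locus_tag => ['LOCUS_0001'] }),
    MockFeature->new(strand => -1, tags => { locus_tag => ['LOCUS_0002'] }),
    MockFeature->new(strand => 1,  tags => {}),
);

my @preference = qw(gene locus_tag product note);
my (%strands, @unnamed);
for my $cds (@cds_features) {
    my $name = first_tag_value($cds, @preference);
    if (defined $name) { $strands{$name} = $cds->strand }
    else               { push @unnamed, $cds }
}
warn scalar(@unnamed), " CDS feature(s) carried none of: @preference\n" if @unnamed;
```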
>>>
>>> On Feb 24, 2012, at 12:43 PM, Adlai Burman wrote:
>>>
>>>> I have come across a perplexing problem when trying to parse sequence features from GenBank records into hashes. This is the minimal code that shows my problem:
>>>>
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use IO::String;
>>>> use Bio::Perl;
>>>> use Bio::SeqIO;
>>>>
>>>> my @files = </Users/adlai/Dropbox/atrsh/*>;
>>>> foreach my $file (@files) {
>>>>     my @cds_features = grep { $_->primary_tag eq 'CDS' } Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
>>>>     my %strands = map { $_->get_tag_values('gene'), $_->strand } @cds_features; ## This is the culprit.
>>>>     # ...
>>>>     # do nifty stuff
>>>> }
>>>>
>>>> For some files this approach works just fine.
>>>> For others the script dies immediately with the error message:
>>>>
>>>> ------------- EXCEPTION -------------
>>>> MSG: asking for tag value that does not exist gene
>>>> STACK Bio::SeqFeature::Generic::get_tag_values /Users/adlai/Downloads/BioPerl-1.6.1/Bio/SeqFeature/Generic.pm:517
>>>> STACK toplevel tosend.pl:16
>>>> -------------------------------------
>>>>
>>>> The difference between the files that parse and those that don't seems to be that the files that crash have "intron" and "exon" tags. They ALL have "gene" tags.
>>>> Does anyone know why this is a problem and what can be done to circumvent it?
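One way to see what is going on is to check, feature by feature, which CDS features lack a gene tag. A file can contain gene tags elsewhere and still have individual CDS features without one. It is those features that make get_tag_values throw, and the map dies on the first of them. The sketch below assumes a hypothetical `MockFeature` class in place of Bio::SeqFeature::Generic, so it runs without BioPerl, and uses invented sample data. In real code, `@cds_features` would come from the grep over get_SeqFeatures as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Generic (not BioPerl itself).
package MockFeature;
sub new {
    my ($class, %args) = @_;
    return bless { strand => $args{strand}, tags => $args{tags} || {} }, $class;
}
sub strand  { $_[0]{strand} }
sub has_tag { exists $_[0]{tags}{ $_[1] } }
sub get_tag_values {
    my ($self, $tag) = @_;
    # Mimics the reported behaviour: a missing tag is a fatal error.
    die "asking for tag value that does not exist $tag\n"
        unless exists $self->{tags}{$tag};
    return @{ $self->{tags}{$tag} };
}

package main;

# Invented sample: one of the three CDS features has no gene tag.
my @cds_features = (
    MockFeature->new(strand => 1,  tags => { gene => ['geneA'] }),
    MockFeature->new(strand => 1,  tags => { product => ['hypothetical protein'] }),
    MockFeature->new(strand => -1, tags => { gene => ['geneB'] }),
);

# Which CDS features would make get_tag_values('gene') throw?
my @missing = grep { !$_->has_tag('gene') } @cds_features;
printf "%d of %d CDS features lack a gene tag\n", scalar @missing, scalar @cds_features;

# Reproduce the original failure inside eval so the script keeps running.
my %strands = eval { map { $_->get_tag_values('gene'), $_->strand } @cds_features };
my $error = $@;
print "original map died: $error" if $error;
```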
>>>>
>>>> Thanks,
>>>> Adlai
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>>
>>>