[Bioperl-l] strange error parsing a specific NCBI gff file
Sendu Bala
bix at sendu.me.uk
Wed Jun 28 08:25:52 UTC 2006
William Hsiao wrote:
>
> sub process_attributes {
> my $attr_string = shift;
> my @attributes = split (/\;/, $attr_string);
> my %attr;
> foreach (@attributes){
> my ($key, $value) = split /=/;
> if ($value=~/\:/){
> my ($subkey, $subvalue) = split (/:/, $value);
# assign hashref to $key, assign key => value pair to that
> $attr{$key}{$subkey}=$subvalue;
> }
> else{
# assign scalar $key
> $attr{$key}=$value;
> }
> }
> return \%attr;
> }
> NC_005966.1 RefSeq CDS 635836 636489 . - 0 locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1
> They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
> The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.
The problem is that these lines contain function=x twice, and the
second x contains a colon.
So your code first assigns $attr{function} = $scalar, and then tries to
do $attr{function}{before_colon} = "after_colon".
Normally the latter would autovivify $attr{function} as a hash
reference: $attr{function} would become HASH(xyz), and before_colon =>
after_colon would be set as a key/value pair of HASH(xyz). But in this
case $attr{function} already exists and holds the string
"adaptation%20to%20stress", so you end up trying to set before_colon =>
after_colon as a key/value pair of that string, which you can't do
under strict refs. That is also why renaming the first "function" makes
the line parse: the two keys no longer collide.
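To make the failure described above concrete, here is a minimal sketch that reproduces it outside the parser. The two assignments mirror what process_attributes ends up doing for the two function= attributes; the %fresh hash and the eval wrapper are only there for illustration and are not part of the original code.

```perl
use strict;
use warnings;

my %attr;

# First occurrence: the value has no colon, so a plain string is stored.
$attr{function} = 'adaptation%20to%20stress';

# Second occurrence: the value contains a colon, so the code treats
# $attr{function} as a hash reference. It holds a string, so strict refs
# makes this die at run time; eval catches the error for inspection.
my $ok = eval { $attr{function}{'protection%20%28MultiFun'} = '5.5%29'; 1 };
my $error = $ok ? '' : $@;
print "Error: $error";

# With a key that has not been assigned before, autovivification quietly
# creates the inner hash, which is why db_xref=GI:... causes no trouble.
my %fresh;
$fresh{db_xref}{GI} = '50083879';
print ref($fresh{db_xref}), "\n";
```

Without "use strict" the first case would not die; Perl would instead treat the string as a symbolic reference, which hides the bug rather than fixing it.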
Basically, your data structure isn't so great, mixing scalars and hash
references as values of %attr.
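If you do want to keep a hand-rolled parser, one way to avoid the mix is to give every key the same kind of value. The following is only a sketch under that assumption: parse_attributes is a hypothetical replacement for process_attributes, every key maps to an array reference so repeated keys accumulate, and values are kept whole (colon included) after decoding the %XX escapes.

```perl
use strict;
use warnings;

# Hypothetical replacement for process_attributes: every key maps to an
# array reference, so repeated keys such as "function" simply accumulate.
sub parse_attributes {
    my ($attr_string) = @_;
    my %attr;
    for my $pair (split /;/, $attr_string) {
        # Limit the split to 2 fields so an '=' inside a value is preserved.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key && defined $value;
        # Decode %XX escapes (for example, %20 becomes a space).
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

# Shortened version of the attribute column from the failing line.
my $attr = parse_attributes(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;'
  . 'db_xref=GI:50083879;db_xref=GeneID:2878732'
);

print "$_\n" for @{ $attr->{function} };
```

Callers then always read a list, e.g. $attr->{locus_tag}[0], and can split db_xref values on the colon themselves when they need the database name separately.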
The solution may be to parse using Bioperl instead ;).