[Bioperl-l] SeqIO::refseq
ybcho
ybcho at biomics.org
Wed Dec 1 21:59:52 EST 2004
I have been parsing RefSeq gpff files using Bioperl-1.4,
but I found that the taxon ID is missing from the output of the script below.
The script produced the following:
taxon:9606
taxon:9606
taxon:9606
taxon:9606
GeneID:26278 LocusID:26278 MIM:604490
.....
from the statement print "@db_xref\n";
I cannot work out why this happens.
However, I can get the taxonomy ID with $taxonomy_id = $species->ncbi_taxid;
after removing "&& ( $species->ncbi_taxid())" at line 508 of genbank.pm,
because it always has a null value.
Can anyone correct this?
Cheers.
============= refseq parsing script ====================
foreach $feature (@features = $seq->get_SeqFeatures) {
    $location_type = $feature->location->location_type;
    $feature_type  = $feature->primary_tag;
    %seen_tag = ();
    @tags = ();
    foreach $tag (@tags = $feature->get_all_tags) {
        $seen_tag{$tag}++;
    }
    $organism = $db_xref = $taxonomy_id = $strain = $plasmid = ();
    if ($feature_type eq "source") {
        @db_xref = $feature->get_tag_values('db_xref') if exists $seen_tag{'map'};
        print "@db_xref\n";
        ($taxonomy_id) = $db_xref[0] =~ /taxon\:(\d+)/;
        ($strain)  = $feature->get_tag_values('strain')  if exists $seen_tag{'strain'};
        ($plasmid) = $feature->get_tag_values('plasmid') if exists $seen_tag{'plasmid'};
        print "$internal_id\t$organism\t$strain\t$taxonomy_id\t$plasmid\n";
    }
.............
}
================================================================
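
A minimal sketch of the db_xref lookup used in the script above, written in
plain Perl without BioPerl so that it runs on its own. The helper name
taxon_id_from_db_xrefs is hypothetical (it is not part of BioPerl), and the
sample lists are copied from the output shown earlier. Unlike the script, it
scans every db_xref value instead of assuming the taxon entry is $db_xref[0].

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the first NCBI taxon
# ID found in a list of db_xref strings, or undef if there is none.
sub taxon_id_from_db_xrefs {
    my @db_xrefs = @_;
    for my $xref (@db_xrefs) {
        return $1 if $xref =~ /^taxon:(\d+)$/;
    }
    return;
}

# Sample values taken from the printed output in this post.
my @source_xrefs = ('taxon:9606');
my @gene_xrefs   = ('GeneID:26278', 'LocusID:26278', 'MIM:604490');

my $source_taxid = taxon_id_from_db_xrefs(@source_xrefs);
my $gene_taxid   = taxon_id_from_db_xrefs(@gene_xrefs);

print "source: ", (defined $source_taxid ? $source_taxid : 'none'), "\n";
print "gene:   ", (defined $gene_taxid   ? $gene_taxid   : 'none'), "\n";
```

In a BioPerl script the list would presumably come from
$feature->get_tag_values('db_xref'), fetched only when
$feature->has_tag('db_xref') is true and stored in a variable declared with
my inside the loop, so that values from an earlier feature cannot carry over.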