[Bioperl-l] Parsing entrezgene with Bio::SeqIO
Liisa Koski
koski at cenix-bioscience.com
Fri Mar 17 09:10:57 UTC 2006
Thanks Stefan,
Unfortunately I only parse out the URL and not the primary_id or any comments.
print "\n\nDBlinks for geneid: ",$gene->id, "\t",
"acc: ", $gene->accession_number,"\n";
my @dblinks= $ann->get_Annotations('dblink');
foreach my $dblink (@dblinks) {
next unless ($dblink->database eq "KEGG");
print "primary_id:", "\t",$dblink->primary_id,"\n";
print "url:", "\t", $dblink->url, "\n";
print "as_text:", "\t", $dblink->as_text, "\n";
print "optional_id:","\t",$dblink->optional_id,"\n" ;
print "comment:", "\t", $dblink->comment, "\n" ;
print "object_id:", "\t", $dblink->object_id, "\n";
print "namespase:", "\t", $dblink->namespace, "\n" ;
print "authority:", "\t", $dblink->authority, "\n" ;
print "\nhash_tree\n";
my $hash_ref = $dblink->hash_tree;
for my $key (keys %{$hash_ref}) {
print $key,": ",$hash_ref->{$key},"\n";
}
}
Output:
------------------------------------
DBlinks for geneid: ABAT acc: 18
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 42.
primary_id:
url: http://www.genome.jp/dbget-bin/www_bget?hsa:18
Use of uninitialized value in concatenation (.) or string
at /netshare/home/koski/perl_modules/Bio/Annotation/DBLink.pm line 146.
as_text: Direct database link to in database KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 48.
optional_id:
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 50.
comment:
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 52.
object_id:
namespase: KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 56.
authority:
printing hash_tree
database: KEGG
Use of uninitialized value in print at ./entrez_gene_seqio.pl line 62.
primary_id:
-----------------------------------------
I see that on the gene page for ABAT (acc: 18) there are KEGG pathways:
KEGG pathway: Alanine and aspartate metabolism 00252
KEGG pathway: Butanoate metabolism 00650
KEGG pathway: Glutamate metabolism 00251
KEGG pathway: Propanoate metabolism 00640
KEGG pathway: Valine, leucine and isoleucine degradation 00280
KEGG pathway: beta-Alanine metabolism 00410
Is it possible to pull out these pathway names?
Thanks,
Liisa
On Thursday 16 March 2006 17:29, Stefan Kirov wrote:
> Do this:
> my @dblinks=$ann->get_Annotations('dblink');
> foreach my $link (@dblinks) {
> next unless ($dblink->database eq 'KEGG");
> print $dblink->primary_id,"\t",$dblink->url,"\n";
> }
> This works for me, hopefully it will for you too. Let me know if
> something is not right.
> Stefan
>
> Liisa Koski wrote:
> >Hi,
> >I'm using Bio::SeqIO to parse the EntrezGene file Homo_sapiens (from
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_OLD/Mammalia/Homo_sapiens.gz).
> >
> >I'm using bioperl-1.5.1.
> >
> >I want to extract the KEGG annotations.
> >See code below.
> >
> >use Bio::SeqIO;
> >use Bio::ASN1::EntrezGene;
> >
> >my $seqio = Bio::SeqIO->new(-format => 'entrezgene',
> > -file => 'Homo_sapiens');
> >while (my $gene = $seqio->next_seq){
> > print "\n",$gene->id, "\t", $gene->accession_number, "\n";
> > my $ann = $gene->annotation();
> > foreach my $key ( $ann->get_all_annotation_keys() ) {
> > my @values = $ann->get_Annotations($key);
> > foreach my $value ( @values ) {
> > print $key, "\t", "=", "\t", $value->as_text,"\n";
> > }
> > }
> >}
> >
> >Unfortunately the only KEGG annotation I see in the results looks like:
> >dblink = Direct database link to in database KEGG
> >(Notice the space between 'to in')
> >
> >Anyone have any ideas how to get the KEGG annotation results?
> >
> >Note: I also tried parsing the file
> >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.
> >gz but I got the below error:
> >
> >./entrez_gene_seqio.pl Homo_sapiens.ags
> >Data Error: none conforming data found on line 1 in Homo_sapiens.ags!
> >first 20 (or till end of input) characters including the non-conforming
> > data: 00
> > at /netshare/home/koski/perl_modules/bioperl-live/Bio/SeqIO/entrezgene.pm
> >line 138
> >
> >
> >Thanks,
> >Liisa
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
--
Liisa Koski
Bioinformatics Software Engineer
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Phone: +49(351)4173-149
More information about the Bioperl-l
mailing list