[Bioperl-l] Bio:Seq $seq_obj->accession_number not
returning accession number?
Barry Moore
bmoore at genetics.utah.edu
Sun Dec 4 16:23:48 EST 2005
Sam-
The fasta parser makes no attempt to parse the fasta header since there
is no standard format for what should be in a fasta header. Parse the
accession out of the primary_id field with a regular expression in your
script or use GenBank or ENSEMBL format sequences to get all the goodies
parsed for you. Google on "accession fasta parse site:bioperl.org" to
read other posts on this topic.
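As a minimal sketch of the regular-expression approach, assuming the ID
follows the NCBI layout gi|<number>|<db>|<accession.version>|<locus>
shown in the original message below: accession_from_ncbi_id is a
hypothetical helper written for this example, not a BioPerl method, and
IDs in other layouts would need a different pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): extract the accession, and the
# version if present, from an NCBI-style FASTA ID.
sub accession_from_ncbi_id {
    my ($id) = @_;
    # Assumed layout: gi|<number>|<db>|<accession>[.<version>]|<locus>
    if ($id =~ /^gi\|\d+\|[a-z]+\|([^|.]+)(?:\.(\d+))?\|/) {
        # List context returns (accession, version); scalar context
        # returns the accession only.
        return wantarray ? ($1, $2) : $1;
    }
    return;    # no match: empty list, or undef in scalar context
}

# In a script, the string would come from $seq_obj->primary_id.
my $id = 'gi|17066572|gb|AF410462.1|AF410462';
my ($acc, $version) = accession_from_ncbi_id($id);
print "Accession Number is $acc\n";    # prints "Accession Number is AF410462"
```

The extracted value could then be stored on the object with
$seq_obj->accession_number($acc) so later code can read it back.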
Barry
-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Sam
Al-Droubi
Sent: Sunday, December 04, 2005 1:18 PM
To: BioPerl list BioPerl list
Subject: [Bioperl-l] Bio:Seq $seq_obj->accession_number not
returning accession number?
The fasta format for this sequence, AF410462 from NCBI, looks like this:
>gi|17066572|gb|AF410462.1|AF410462 Mus musculus PEM homeobox (Pem)
gene, promoter region and partial cds
ATGCGTGTGGGCATGCGCTCATGCCCACTTGCTTGAGCACATGTGTGCTCACATGGACGTTAGAGGCAAC
TTTCAGGAGTTATTTTTTTCCCTTCTAACTTGAGTTCCTGGACCTCAGACTTGTATAATAGGTACTTTCC
CAACTTAAGTCTTACTGGCTCCAGGGTATCTGGTATACTCTTCTAGCCTCCAAGGGCAGCCACTCATGCT
TCTTCAGGTGTGAAGAGGTGAGCCAGATACAACGGTGGGAGGCAGTGTGCCCTCAGTGTGTAGACTCTTT
ATGCCCTTGGGGATTAGCGCCTCTAGCTGCCAGTCGGGTCTCTGGGTCCCTCCTGCTAAGGCCACTCTCG
TCATGGTTCCTCTTGTCCTGGTGAGCCATTACGACCCTCTCACTTCCTTGTGTTCTCTTCCCTGTGTTCT
CTCTCTGCTGCTGTGGCCATTCTAGCTCCCTGCACAGTCCTTCAAGCTCACCTCCTGCCTTCCGTGGACA
AGAGGAAGCACAAAGAATCATCCAGTATGTATGCTCATGGCATAAGGGGATCCTGGGGAAGGGCTGAAGC
CTGAGCCGGGCTGGTCAACAGAATCTCCCTCTCCCTAACTCCATCTCCCTCTCCTTCCCTCTTCCTCTCT
CTATCCCTCCCCCCTCTCTCCCCCCACCACCGCATGTTTTGGGTCAGCTGACTGCTCTAGCCTTGATGAG
ATATCTTCCCAGGAAGAGTTGGTGCTGACTGTACAGATTGAGTTAGAGGGAGGGAAGAAAGCTCCTGTTT
GATCACTGGAGATCTTTATGCCTAGCTACATGTCTTACCAAAGCCAGGGGAGTCAGCTGAGCTGTAACTG
GGCACCCTAAGTTCTGCACACCCACATGCCCATGAACTGTGTCCATCTTGCAAGCACATCGTGCTCATTA
CATCCCCAAACTGCTATCACTTGTGTACCCCAAAGGCTCGGCCCACAGGAACGTCCTGTGAGCAAATCAC
AAAGACCAGCTTAGGGCTGGAAACATTGTAACCTGAAGTAGGCCAGAGGAGATCCCTGCCAGGTTGAGCA
TCACAGATCTCATTCTGTTCCCGGGGACACCAGGGGCCCAAGCTCAGAATCTGCCGAAGCATAACTTCAT
CATTGATCCTATTCAGGGTATGGAAGCTGAGGGTTCCAGCCGCAAGGTCACCAGGCTACTCCGCCTGGGA
GTCAAGGAAG
When I read this from a file as a sequence object using Bio::Seq, I get
"unknown" for accession_number, even though the accession number is in
the header of the fasta file. Does anyone know why this happens?
My code looks like this:
print "primary id is: ",$seq_obj->primary_id."\n";
print "Description is ",$seq_obj->desc."\n";
print "Accession Number is ",$seq_obj->accession_number."\n";
Output looks like this:
primary id is: gi|17066572|gb|AF410462.1|AF410462
Description is Mus musculus PEM homeobox (Pem) gene, promoter region
and partial cds
Accession Number is unknown
Thank you.
Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com
_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l