[Bioperl-l] Re: AF165282 and Bio::DB::GenBank
Jason Stajich
jason@chg.mc.duke.edu
Tue, 24 Apr 2001 17:33:14 -0400 (EDT)
Anton, thank you for your request to look into this. In the future, please
submit this type of thing as a bug report via the bioperl bug submission
form. I have taken care of it this time.
It appears that the problem is not in the database download
(Bio::DB::GenBank) but in the GenBank parsing. You may need to clarify
what you mean by 'it does not work', though. I can download the sequence
and get at least the sequence information. The problem is that not all of
the features are being parsed correctly.
The offending features look something like the one below. I'm guessing
the regexps aren't handling the references to other entries (e.g.
AF165283.1:1..197) inside the join().
gene join(<1..226,AF165283.1:1..197,AF165284.1:1..243,
AF165285.1:1..242,AF165286.1:1..225,AF165287.1:1..152,
AF165288.1:1..163,AF165289.1:1..158,AF165290.1:1..241,
AF165291.1:1..93,AF165292.1:1..223,AF165293.1:1..69,
AF165294.1:1..134,AF165295.1:1..169,AF165296.1:1..145,
AF165297.1:1..119,AF165298.1:1..209,AF165299.1:1..115,
AF165300.1:1..53,AF165301.1:1..126,AF165302.1:1..95,
AF165303.1:1..190,AF165304.1:1..198,AF165305.1:1..136,
AF165306.1:1..165,AF165307.1:1..150,AF165308.1:1..141,
AF165309.1:1..83,AF165310.1:1..>264)
/gene="ABC1"
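The following sketch shows what the parser has to cope with. It is a minimal pure-Perl example and is not BioPerl's actual location code. It splits a join() like the one above into its parts and marks which parts point at other entries. The helper name split_join_location and the hash keys are hypothetical. The sketch assumes a flat join() of simple start..end ranges, with no nested complement() or order().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's location parser: split a flat
# GenBank join(...) location into its parts and classify each one.
# Assumes no nested operators such as complement() or order().
sub split_join_location {
    my ($loc) = @_;
    $loc =~ s/\s+//g;    # drop whitespace left over from line wrapping
    my ($inner) = $loc =~ /^join\((.*)\)$/
        or return;       # not a join(); let the caller handle it
    my @parts;
    for my $part (split /,/, $inner) {
        # Optional remote prefix "ACCESSION.VERSION:", then start..end;
        # either end may carry a fuzzy '<' or '>' marker.
        if ($part =~ /^(?:([A-Za-z_]+\d+\.\d+):)?([<>]?)(\d+)\.\.([<>]?)(\d+)$/) {
            push @parts, {
                seq_id     => $1,    # undef means local to this entry
                start_fuzz => $2,
                start      => $3,
                end_fuzz   => $4,
                end        => $5,
            };
        }
        else {
            warn "unrecognised sublocation '$part'\n";
        }
    }
    return @parts;
}

# A shortened version of the ABC1 gene location from the entry above
my $loc = "join(<1..226,AF165283.1:1..197,\n"
        . "AF165310.1:1..>264)";
for my $p (split_join_location($loc)) {
    printf "%-10s %s%d..%s%d\n",
        $p->{seq_id} // 'local',
        $p->{start_fuzz}, $p->{start}, $p->{end_fuzz}, $p->{end};
}
```

A lenient parser along these lines could keep the local span (<1..226) and skip or record the remote ones. That would avoid dropping the whole gene, mRNA and CDS features, as happens in the warnings from the run below.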
556 magrathea tests $ cat genbank_dbtests.t
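#!/usr/local/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;

# fetch the entry from GenBank over the network
my $db  = Bio::DB::GenBank->new();
my $seq = $db->get_Seq_by_acc('AF165282');

# write it straight back out in GenBank format to see what survived parsing
my $seqout = Bio::SeqIO->new(-format => 'genbank',
                             -fh     => \*STDOUT);
$seqout->write_seq($seq);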
557 magrathea tests $ perl genbank_dbtests.t
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:1..197,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature gene in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:16..192,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature mRNA in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:16..192,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature CDS in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
LOCUS HSATPCB01 226 bp DNA PRI 17-AUG-1999
DEFINITION Homo sapiens ATP cassette binding transporter 1 (ABC1) gene,
exon 12.
ACCESSION AF165282
VERSION AF165282.1 GI:5734104
KEYWORDS .
SOURCE Homo sapiens.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 226)
AUTHORS Rust S., Rosier M., Funke H., Real J., Amoura Z., Piette J.C.
Deleuze J.F., Brewer H.B., Duverger N., Denefle P. and Assmann G.
TITLE Tangier disease is caused by mutations in the gene encoding
ATP-binding cassette transporter 1
JOURNAL Nat. Genet. 22 (4), 352-355 (1999)
REFERENCE 2 (bases 1 to 226)
AUTHORS Rust S., Rosier M., Funke H., Real J., Amoura Z., Piette J.C.
Deleuze J.F., Brewer H.B., Duverger N., Denefle P. and Assmann G.
TITLE Direct Submission
JOURNAL Submitted (06-JUL-1999) Genomics, Rhone-Poulenc Rorer, 2 rue
GastonCremieux, Evry 91006, France
FEATURES Location/Qualifiers
source 1..226
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome=9
/map="9q31"
exon 16..221
/number=12
/gene="ABC1"
BASE COUNT 69 a 46 c 58 g 53 t
ORIGIN
1 ctgttcttct atcagtgtgt caacctgaac aagctagaac ccatagcaac agaagtctgg
61 ctcatcaaca agtccatgga gctgctggat gagaggaagt tctgggctgg tattgtgttc
121 actggaatta ctccaggcag cattgagctg ccccatcatg tcaagtacaa gatccgaatg
181 gacattgaca atgtggagag gacaaataaa atcaaggatg ggtaag
//
-Jason
On Tue, 24 Apr 2001, Anton Nekrutenko wrote:
> Dear Aaron and Jason,
>
> It seems like there is a bug in Bio::DB::GenBank
>
> The following string does not work:
>
> $seq = $gb->get_Seq_by_acc('AF165282');
>
> There seems to be something magical about this particular accession
> number.
>
> Thanks for your kind help and time.
>
> Anton
>
> --
> -----------------------------------
> Anton Nekrutenko, Ph. D.
> Department of Ecology and Evolution
> The University of Chicago
> anton@nekrut.uchicago.edu
> http://nekrut.uchicago.edu
> (773) 834-3965
> (773) 702-9740 (fax)
> -----------------------------------
>
Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/