[Bioperl-l] GFF file and load_gff.pl
Richard Harrison
richard.harrison at ed.ac.uk
Wed Jan 28 17:36:16 UTC 2009
Thank you Chris, Scott and Adam,
You are right, I was confused. I have now managed to create a
Bio::DB::GFF database with my genome annotation loaded into it. One
further question.
I am having trouble retrieving the desired info from the database.
Shown below is a typical entry in the GFF file for a gene:
#chr01 SGD gene 33449 34702 . + . ID=YAL061W;Name=YAL061W;gene=BDH2;Alias=BDH2;Ontology_term=GO:0008150,GO:0005634,GO:0005737,GO:0016616,GO:0008270,GO:0016491,GO:0046872;Note=Putative%20medium-chain%20alcohol%20dehydrogenase%20with%20similarity%20to%20BDH1%3B%20transcription%20induced%20by%20constitutively%20active%20PDR1%20and%20PDR3%3B%20BDH2%20is%20an%20essential%20gene;dbxref=SGD:S000000057;orf_classification=Uncharacterized
#chr01 SGD CDS 33449 34702 . + 0 Parent=YAL061W;Name=YAL061W;gene=BDH2;Alias=BDH2;Ontology_term=GO:0008150,GO:0005634,GO:0005737,GO:0016616,GO:0008270,GO:0016491,GO:0046872;Note=Putative%20medium-chain%20alcohol%20dehydrogenase%20with%20similarity%20to%20BDH1%3B%20transcription%20induced%20by%20constitutively%20active%20PDR1%20and%20PDR3%3B%20BDH2%20is%20an%20essential%20gene;dbxref=SGD:S000000057;orf_classification=Uncharacterized
I would like to search the database for YAL061W and retrieve the CDS
coordinates, details about introns, etc. I don't need the sequence, as
I have separate multiple genome alignments.
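To make the goal concrete, here is a minimal sketch of that lookup done directly on GFF3 text, without going through Bio::DB::GFF at all. It is not the database API. It assumes tab-separated GFF3 records and that the leading '#' on the records pasted above is not part of the real file. The helper names (gff3_unescape, parse_gff3_line, features_for) are hypothetical, and the sample records are shortened copies of the two shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode GFF3 percent-escapes such as %20 (space) and %3B (semicolon).
sub gff3_unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split one GFF3 line into its nine columns and parse column 9 into
# a hash of attribute name => array ref of decoded values.
sub parse_gff3_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line;
    return unless @cols == 9;
    my %attr;
    for my $pair (split /;/, $cols[8]) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        # Split multi-valued attributes on commas before decoding escapes.
        $attr{$tag} = [ map { gff3_unescape($_) } split /,/, $value ];
    }
    return {
        seqid  => $cols[0], source => $cols[1], type  => $cols[2],
        start  => $cols[3], end    => $cols[4], score => $cols[5],
        strand => $cols[6], phase  => $cols[7], attributes => \%attr,
    };
}

# Return every feature whose ID or Parent attribute equals $name.
sub features_for {
    my ($name, @lines) = @_;
    my @hits;
    for my $line (@lines) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blanks
        my $f = parse_gff3_line($line) or next;
        my @keys = map { @{ $f->{attributes}{$_} || [] } } qw(ID Parent);
        push @hits, $f if grep { $_ eq $name } @keys;
    }
    return @hits;
}

# Shortened, illustrative versions of the two records above.
my @lines = (
    join("\t", qw(chr01 SGD gene 33449 34702 . + .),
         'ID=YAL061W;Name=YAL061W;gene=BDH2;Note=Putative%20medium-chain%20alcohol%20dehydrogenase'),
    join("\t", qw(chr01 SGD CDS 33449 34702 . + 0),
         'Parent=YAL061W;Name=YAL061W;gene=BDH2'),
);

# Print type, chromosome, start, end and strand for each matching feature.
for my $f (features_for('YAL061W', @lines)) {
    print join("\t", @{$f}{qw(type seqid start end strand)}), "\n";
}
```

Matching on both ID and Parent means the gene record and its child CDS record come back together, which is the kind of result wanted from the database query.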
At present, all I can work out how to do is get all the feature types and
classes in the database (see code below).
use Bio::DB::GFF;

my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                            -dsn     => 'dbi:mysql:biosql',
                            -user    => 'root',
                            -pass    => '*******'
                          );
#get types
my @types = $db->types;
e.g.:
#telomere:SGD intron:SGD insertion:SGD chromosome:SGD region:landmark ncRNA:SGD transposable_element_gene:SGD region:SGD ARS:SGD snRNA:SGD snoRNA:SGD nc_primary_transcript:SGD rRNA
etc...
#get classes
my @classes = $db->classes;
ID=YKR067W
ID=YKR068C
ID=YKR069W
ID=YKR070W
ID=YKR071C
ID=YKR072C
ID=YKR073C
ID=YKR074W
etc...
Could someone point me towards some useful pointers for this? I've
tried reading the documentation, but it doesn't seem to illustrate what
I want to do.
Best wishes and thanks for the help so far,
Richard
On 28 Jan 2009, at 16:15, Scott Cain wrote:
> Hi Richard,
>
> You're mixing up two database schemas. Do you want to use a BioSQL
> database (bioperl-db) or a Bio::DB::GFF database? I'm guessing that
> you want the latter, so I'll answer that question (as it's the easier
> one anyway). You need to add the "-c" flag (for --create) to the
> load_gff.pl command to create the Bio::DB::GFF schema.
>
> If you really wanted a BioSQL database, you'll have to wait for help
> from someone else more knowledgeable about it.
>
> Scott
>
>
>
>
> On Wed, Jan 28, 2009 at 10:22 AM, Richard Harrison
> <richard.harrison at ed.ac.uk> wrote:
>> Dear all,
>>
>> I am running BioPerl 1.6 on OS X Leopard on a MacBook Pro.
>>
>> I have installed mysql-5.1.30-osx10.5-x86, DBD-mysql-4.010, the
>> biosql-schema for mysql and bioperl-db. As per the instructions, I have a
>> database called biosql, which I associated with the SQL dialect
>> biosqldb-mysql.sql.
>>
>> After much fannying, the install seems fine... although I can't be sure
>> (never used mysql before).
>>
>> I am having problems with the script load_gff.pl
>>
>> I want to load a database with the data from a genome.gff file (for
>> Saccharomyces cerevisiae). I don't want to add sequence to it, as all I
>> need is the annotation.
>>
>> I have tried the following command(s):
>>
>> ./bp_load_gff.pl -d biosql -user root -pass mypassword genome.gff
>> ./bp_load_gff.pl -d biosql -user root -pass mypassword --adaptor=dbi::mysql genome.gff
>>
>> With both I get the following error:
>>
>> No ftype id for CDS:SGD Table 'biosql.ftype' doesn't exist Record skipped.
>> (then another few '000 of these)
>> then..
>>
>> genome.gff: 16379 records loaded
>>
>>
>> Any ideas where I'm going wrong?
>>
>> Thanks,
>>
>> Richard
>>
>> ____________________________
>> Dr Richard Harrison
>> 127 Ashworth Labs
>> Institutes of Evolutionary Biology
>> King's Buildings
>> West Mains Road
>> Edinburgh EH9 3JT
>>
>>
>>
>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                        scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)       216-392-3087
> Ontario Institute for Cancer Research
>