[Bioperl-l] GFF file and load_gff.pl

Richard Harrison richard.harrison at ed.ac.uk
Wed Jan 28 17:36:16 UTC 2009


Thank you Chris, Scott and Adam,
You are right, I was confused. I have now managed to create a  
Bio::DB::GFF database with my genome annotation loaded into it. One  
further question.
I am having trouble retrieving the desired info from the database.
Shown below is a typical entry in the GFF file for a gene:


#chr01	SGD	gene	33449	34702	.	+	.	ID=YAL061W;Name=YAL061W;gene=BDH2;Alias=BDH2;Ontology_term=GO:0008150,GO:0005634,GO:0005737,GO:0016616,GO:0008270,GO:0016491,GO:0046872;Note=Putative%20medium-chain%20alcohol%20dehydrogenase%20with%20similarity%20to%20BDH1%3B%20transcription%20induced%20by%20constitutively%20active%20PDR1%20and%20PDR3%3B%20BDH2%20is%20an%20essential%20gene;dbxref=SGD:S000000057;orf_classification=Uncharacterized

#chr01	SGD	CDS	33449	34702	.	+	0	Parent=YAL061W;Name=YAL061W;gene=BDH2;Alias=BDH2;Ontology_term=GO:0008150,GO:0005634,GO:0005737,GO:0016616,GO:0008270,GO:0016491,GO:0046872;Note=Putative%20medium-chain%20alcohol%20dehydrogenase%20with%20similarity%20to%20BDH1%3B%20transcription%20induced%20by%20constitutively%20active%20PDR1%20and%20PDR3%3B%20BDH2%20is%20an%20essential%20gene;dbxref=SGD:S000000057;orf_classification=Uncharacterized
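
For readers unfamiliar with the ninth GFF column shown above: it is a semicolon-separated list of tag=value pairs, with commas separating multiple values and %XX escapes for reserved characters. The following is only a minimal sketch in plain Perl of how such a column might be unpacked; the sub name parse_gff3_attributes is hypothetical (it is not part of BioPerl), and the example string is a shortened copy of the CDS entry above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split a GFF3 attribute
# column into a hash of tag => [ list of decoded values ].
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair ( split /;/, $column9 ) {
        my ( $tag, $value ) = split /=/, $pair, 2;
        next unless defined $tag && defined $value;
        # Values are comma-separated; decode %XX escapes only after splitting,
        # so that an escaped ';' or ',' inside a value is not misread.
        for my $v ( split /,/, $value ) {
            $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            push @{ $attr{$tag} }, $v;
        }
    }
    return \%attr;
}

# Shortened version of the CDS attributes from the entry above.
my $attr = parse_gff3_attributes(
    'Parent=YAL061W;Name=YAL061W;gene=BDH2;'
  . 'Ontology_term=GO:0008150,GO:0005634;'
  . 'Note=Putative%20medium-chain%20alcohol%20dehydrogenase'
);

print "Parent:   $attr->{Parent}[0]\n";
print "GO terms: @{ $attr->{Ontology_term} }\n";
print "Note:     $attr->{Note}[0]\n";
```

Note that the CDS line carries Parent=YAL061W rather than ID=YAL061W, which is how it is tied back to the gene line.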


I would like to search the database for YAL061W and retrieve the CDS
coordinates, details about introns, etc. I don't need the sequence, as
I have separate multiple genome alignments.


At present, all I can work out how to do is get all the feature types and
classes in the database (see code below).


use Bio::DB::GFF;

# connect to the Bio::DB::GFF database
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                            -dsn     => 'dbi:mysql:biosql',
                            -user    => 'root',
                            -pass    => '*******'
                          );
# get types
my @types = $db->types;

E.g.:
#telomere:SGDintron:SGDinsertion:SGDchromosome:SGDregion:landmarkncRNA:SGDtransposable_element_gene:SGDregion:SGDARS:SGDsnRNA:SGDsnoRNA:SGDnc_primary_transcript:SGDrRNA 
  etc...



# get classes
my @classes = $db->classes;

ID=YKR067W
ID=YKR068C
ID=YKR069W
ID=YKR070W
ID=YKR071C
ID=YKR072C
ID=YKR073C
ID=YKR074W

etc...

Could someone give me some useful pointers for this? I've
tried reading the documentation, but it doesn't seem to illustrate what
I want to do.
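
For reference, one way a lookup of this kind might be approached with Bio::DB::GFF is sketched below. It is only a sketch under stated assumptions, not a tested answer: it reuses the connection details from the snippet above, it uses get_feature_by_name with the -class and -name arguments, and it assumes that the name YAL061W was stored under a class called 'gene'. The class listing above (ID=YKR067W etc.) suggests the class and name in this particular database may have been stored differently, so $class and $name would need to be adjusted to match what $db->classes actually reports.

```perl
use strict;
use warnings;
use Bio::DB::GFF;

# Connection details copied from the snippet above; adjust as needed.
my $db = Bio::DB::GFF->new(
    -adaptor => 'dbi::mysql',
    -dsn     => 'dbi:mysql:biosql',
    -user    => 'root',
    -pass    => '*******',
);

# ASSUMPTION: the class and name under which the feature group was loaded.
# Check the output of $db->classes and change these to match.
my $class = 'gene';
my $name  = 'YAL061W';

# Fetch every feature that belongs to the named group.
my @features = $db->get_feature_by_name( -class => $class, -name => $name );
warn "No features found for $class:$name\n" unless @features;

# Keep only the CDS and intron features and print their coordinates.
for my $f (@features) {
    my $method = $f->method;
    next unless $method eq 'CDS' || $method eq 'intron';
    print join( "\t",
        $f->refseq, $method, $f->source, $f->start, $f->end, $f->strand ),
        "\n";
}
```

No sequence is needed for this; only the annotation tables are queried.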

Best wishes and thanks for the help so far,

Richard







On 28 Jan 2009, at 16:15, Scott Cain wrote:

> Hi Richard,
>
> You're mixing up two database schemas.  Do you want to use a BioSQL
> database (bioperl-db) or a Bio::DB::GFF database?  I'm guessing that
> you want the latter, so I'll answer that question (as it's the easier
> one anyway).  You need to add the "-c" flag (for --create) to the
> load_gff.pl command to create the Bio::DB::GFF schema.
>
> If you really wanted a BioSQL database, you'll have to wait for help
> from someone else more knowledgeable about it.
>
> Scott
>
>
>
>
> On Wed, Jan 28, 2009 at 10:22 AM, Richard Harrison
> <richard.harrison at ed.ac.uk> wrote:
>> Dear all,
>>
>> I am running BioPerl 1.6 on OS X Leopard on a MacBook Pro.
>>
>> I have installed mysql-5.1.30-osx10.5-x86, DBD-mysql-4.010, the
>> biosql-schema for mysql and bioperl-db.  As per the instructions, I have a
>> database called biosql, which I associated with the SQL dialect
>> biosqldb-mysql.sql.
>>
>> After much fannying, the install seems fine... although I can't be sure
>> (never used mysql before).
>>
>> I am having problems with the script load_gff.pl
>>
>> I want to load a database with the data from a genome.gff file (for
>> Saccharomyces cerevisiae). I don't want to add sequence to it, as all I need
>> is the annotation.
>>
>> I have tried the following command(s):
>>
>> ./bp_load_gff.pl -d biosql -user root -pass mypassword genome.gff
>> ./bp_load_gff.pl -d biosql -user root -pass mypassword --adaptor=dbi::mysql genome.gff
>>
>> With both I get the following error:
>>
>> No ftype id for CDS:SGD Table 'biosql.ftype' doesn't exist Record skipped.
>> (then another few '000 of these)
>> then..
>>
>> genome.gff: 16379 records loaded
>>
>>
>> Any ideas where I'm going wrong?
>>
>> Thanks,
>>
>> Richard
>>
>> ____________________________
>> Dr Richard Harrison
>> 127 Ashworth Labs
>> Institutes of Evolutionary Biology
>> King's Buildings
>> West Mains Road
>> Edinburgh EH9 3JT
>>
>>
>>
>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>





