[Bioperl-l] Re: load_gff.pl question

Marcel van Batenburg marcelvb at nikhef.nl
Mon Aug 25 17:07:21 EDT 2003


Hi Shin,


I would not use load_gff.pl at all for that many lines.
Try bulk_load_gff.pl (to build a new database from scratch, ab initio) or
fast_load_gff.pl (to append to an existing one).

Greetings,
Marcel





On 6 Aug 2003, Scott Cain wrote:

> Shin,
> 
> The problem you are running into is not really with load_gff.pl but
> with the database schema.  Assuming you are using MySQL, the CREATE
> TABLE statement for fdata looks like this:
> 
>  create table fdata (
>     fid                 int not null  auto_increment,
>     fref                varchar(100) not null,
>     fstart              int unsigned   not null,
>     fstop               int unsigned   not null,
>     fbin                double(20,6)  not null,
>     ftypeid             int not null,
>     fscore              float,
>     fstrand             enum('+','-'),
>     fphase              enum('0','1','2'),
>     gid                 int not null,
>     ftarget_start       int unsigned,
>     ftarget_stop        int unsigned,
>     primary key(fid),
>     unique index(fref,fbin,fstart,fstop,ftypeid,gid),
>     index(ftypeid),
>     index(gid)
>  );
> 
> The problem you have is with that unique index on
> (fref,fbin,fstart,fstop,ftypeid,gid).  This index conflicts with your
> data: similar lines are being assigned the same gid (group id) because
> they look like the same feature, so two rows can end up identical in
> all six indexed columns, and the unique index allows only one of them.
> The quick fix is to remove the 'unique' from the index declaration,
> which can be found in Bio/DB/GFF/Adaptor/dbi/mysql.pm, and then run
> load_gff.pl as usual.  The longer fix is to look at your data, figure
> out why these lines are all getting assigned the same group id, and
> make them sufficiently different so that they don't.
> 
> Hope that helps,
> Scott
> 
> On Wed, 2003-08-06 at 13:31, bioperl-l-request at portal.open-bio.org
> wrote:
> > Where do I start to customize this script to allow loading of a large
> > number of similar entities?
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.org
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
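The unique-index conflict Scott describes can be reproduced in miniature. The sketch below is a hedged illustration, not the Bio::DB::GFF loader. It uses Python's built-in sqlite3 module as a stand-in for MySQL and keeps only the columns of fdata that appear in the six-column index. The function name count_loaded, the index name fdata_loc and the sample rows are hypothetical. It shows that two rows identical in (fref,fbin,fstart,fstop,ftypeid,gid) cannot both be stored under a unique index, but can be stored under a plain index, or once their group ids differ.

```python
import sqlite3


def count_loaded(rows, unique=True):
    """Insert feature rows into a cut-down fdata table; return how many were kept.

    SQLite stands in for MySQL here, and only the columns used by the
    six-column index are modelled (an assumption made for brevity).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table fdata ("
        " fid integer primary key autoincrement,"
        " fref varchar(100) not null,"
        " fstart int not null,"
        " fstop int not null,"
        " fbin double not null,"
        " ftypeid int not null,"
        " gid int not null)"
    )
    # Same column list as the fdata index; 'fdata_loc' is a made-up index name.
    kind = "unique index" if unique else "index"
    conn.execute(
        f"create {kind} fdata_loc on fdata (fref, fbin, fstart, fstop, ftypeid, gid)"
    )
    kept = 0
    for row in rows:
        try:
            conn.execute(
                "insert into fdata (fref, fstart, fstop, fbin, ftypeid, gid)"
                " values (?, ?, ?, ?, ?, ?)",
                row,
            )
            kept += 1
        except sqlite3.IntegrityError:
            # A row identical on all six indexed columns violates the unique index.
            pass
    conn.close()
    return kept


# Two hypothetical features sharing location, type and group id (gid 7).
same_group = [("chr1", 1000, 1500, 1.0, 3, 7), ("chr1", 1000, 1500, 1.0, 3, 7)]
print(count_loaded(same_group, unique=True))   # 1
print(count_loaded(same_group, unique=False))  # 2

# Giving the second feature its own group id (gid 8) avoids the collision.
own_group = [("chr1", 1000, 1500, 1.0, 3, 7), ("chr1", 1000, 1500, 1.0, 3, 8)]
print(count_loaded(own_group, unique=True))    # 2
```

Passing unique=False mirrors the quick fix of dropping 'unique' from the index; changing the gid mirrors the longer fix of making the features distinct.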


