[Bioperl-l] Bio::DB::BioDB - insert failed. Duplicate entry '' for key 2?

Sun Mar 5 20:07:38 UTC 2006

Start looking through load_seqdatabase.pl, the other scripts, and the  
test suite to get an idea of the internals.

It looks like you've loaded the biosql schema, so you must have read  
the INSTALL instructions to get that far.  This is from the INSTALL  
file:

With bioperl and bioperl-db installed you are ready to load some data.
It is advisable to pre-load the NCBI taxonomy database (use
scripts/load_taxonomy.pl in the biosql-schema package, the details are
in its documentation). Otherwise you'll see errors from misparsed
organisms.

The actual script is load_ncbi_taxonomy.pl and it ships with the
biosql-schema package (the INSTALL file needs to be updated), but
everything else is the same.
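
As a rough sketch of what the taxonomy step might look like: the
database name, user, and driver below are placeholders, and the option
names are as I remember them from the script's documentation, so check
perldoc load_ncbi_taxonomy.pl for your version before running anything.
The snippet only prints the command rather than executing it.

```shell
#!/bin/sh
# Placeholder connection settings (assumptions); substitute your own.
DBNAME=biosql
DBUSER=root
DRIVER=mysql

# Assemble the taxonomy load command. The option names are assumed from
# the script's documentation; confirm them with perldoc first.
CMD="perl load_ncbi_taxonomy.pl --dbname $DBNAME --driver $DRIVER --dbuser $DBUSER --download true"

# Print instead of running, so this is safe to paste as-is.
echo "$CMD"
```

Once the printed command looks right for your setup, run it from the
directory in biosql-schema that holds the script.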

On Mar 5, 2006, at 9:36 AM, Jay Hannah wrote:

> Chris Fields wrote:
>> Sorry if I'm a bit off (pub you know) but have you tried the
>> bioperl-db script load_seqdatabase.pl (scripts dir)?
>
> I poked around in the scripts directory, but am trying to learn the  
> guts well enough to roll my own since I have some point-and-click  
> CGI interfacing in mind. (I'll be posting about the project to this  
> list once we get our thoughts together).
>
>> Have you loaded taxonomy?
>
> No, I'm not familiar with that. I'll read up on it.
>
> Marc Logghe wrote:
>> Yes, I agree with Chris. I also think you'd be better off with
>> load_seqdatabase.pl.
>
> I'm sure I would be for general loading. I'm sure the scripts are  
> far more robust than my little piecemeal stab at it, but I'm not  
> sure I'll learn the guts if I just use scripts. Reading the code  
> there are many nuances I don't understand so I'm trying to learn  
> from the ground up, and I'm not sure what I'm doing wrong in my  
> first baby steps. :)
>
>>>> mysql> select * from biodatabase;
>>>> +----------------+------+-----------+-------------+
>>>> | biodatabase_id | name | authority | description |
>>>> +----------------+------+-----------+-------------+
>>>> |             23 |      | NULL      | NULL        |
>>>> +----------------+------+-----------+-------------+
>>
>> BTW, here you actually did not delete your sequence but the  
>> namespace.
>> If you want to check 'sequences' you should look into the bioentry
>> table.
>
> The data also disappeared out of the biosequence table. That  
> indicates I deleted the sequence, right? (I didn't check bioentry  
> at the time.) I have a question out to the BioSQL-l mailing list  
> about the purpose of the biodatabase table. (I assume this mailing  
> list isn't the right forum for that question.) I've been poking  
> around in the BioSQL ERD, trying to understand the purpose of each  
> of the tables.
>
>> Using load_seqdatabase.pl the namespace is set automatically to the
>> default ('bioperl') but you can set it as well with the --namespace
>> option.
>
> Am I foolhardy to think that I can roll my own simplistic load via  
> the code I posted?
>
> If I do get it working should I write up a HOWTO? I can put a big  
> "For robust file loading, please see load_seqdatabase.pl" warning  
> at the top. But in our case, we're using Bio::SeqIO to walk through  
> tens of thousands of flat file sequences to find the hundred or so  
> we're interested in, and are trying to store only the ones we want  
> into MySQL. (And we're trying to automate this process for rapid  
> subsequent runs: Load my database w/ only those sequences that X.)
>
> Thanks for the quick help!
>
> j
> Omaha Perl Mongers
> http://omaha.pm.org

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign