[BioSQL-l] BioSQL conflict with swissprot and NCBI
Hilmar Lapp
hlapp at gnf.org
Mon Nov 10 12:38:02 EST 2003
Some SwissProt records still won't parse properly because of problems
parsing their species. Try running load_seqdatabase.pl with --safe
(that's always a good idea anyway, unless you want to be thrown out
immediately at the first troublemaker), then check the accession
numbers of those records. A complete parse of SwissProt and TrEMBL
should give you a failure count in the low double digits (out of a
total of more than 1 million records).
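As a rough illustration only (this is not part of load_seqdatabase.pl), the failing
accessions from a --safe run could be collected from the loader's captured output
(e.g. redirected with `> load.log 2>&1`). The sketch assumes each failed record is
reported by a line of the form "Could not store <accession>:", as in the error
output quoted further down in this thread. The script name and the function name
failed_accessions are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Matches the per-record failure line seen in the load_seqdatabase.pl output
# quoted in this thread, e.g. "Could not store O18759:".
# ASSUMPTION: each record that fails under --safe is reported by one such line.
FAILED_RE = re.compile(r"^Could not store (\S+?):")


def failed_accessions(lines: Iterable[str]) -> List[str]:
    """Return the identifiers of records that failed to load.

    Order of first appearance is kept and duplicates are dropped.
    """
    seen = set()
    result: List[str] = []
    for line in lines:
        match = FAILED_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            result.append(match.group(1))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python failed_accessions.py load.log
    with open(sys.argv[1]) as log:
        ids = failed_accessions(log)
    for acc in ids:
        print(acc)
    print(f"{len(ids)} record(s) failed to load", file=sys.stderr)
```

Running it as `python failed_accessions.py load.log` would print one accession per
line, with the failure count on stderr, which makes it easy to compare against the
"low double digits" expected for a full SwissProt/TrEMBL load.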
-hilmar
On Monday, November 10, 2003, at 08:30 AM, Raphael Bauer wrote:
>> I've got some trouble loading the NCBI taxonomy into an existing
>> BioSQL schema populated with SwissProt.
> ...cut out...
>>
>> ...in my opinion it is due to the fact that SwissProt has some kind of
>> taxonomy in its OC lines that is also part of the NCBI taxonomy
>> (already parsed into the term table).
>>
>> So my question is whether there is a way to integrate SwissProt and NCBI
>> in one BioSQL schema, or whether it is better to keep NCBI and SwissProt
>> separated in their own BioSQL schemas and map them together later on to
>> get a mapping between NCBI and SwissProt...
>>
>>> So my question is whether there is a way to integrate SwissProt and
>>> NCBI in one BioSQL schema.
>>
>> Absolutely, but in the opposite order from the one you used. The problem
>> with loading SwissProt first is that you then get about 6000-7000 taxa
>> with unreliable (measured against the NCBI taxonomy as the standard)
>> and/or incomplete lineages.
>>
>> First load the NCBI taxonomy database, and only then a sequence
>> database. BTW, that should also rid you of some of the errors you will
>> have seen when you loaded SwissProt.
>>
>
> Hi Hilmar,
> thanks for the fast reply.
> I just tried it the other way round (first NCBI, then SwissProt), but
> the problem still remains...
> ... I also tried loading SwissProt with load_seqdatabase.pl both with
> and without --lookup, but it makes no difference (which to some extent
> I expected).
> ...
> My command lines and the error message:
> NCBI:
> ----
> perl load_ncbi_taxonomy.pl --dbname NCBIdannSprot --driver Pg \
>     --host localhost --dbuser biosql --download --directory ~/wbi/.
> ...works fine
>
> Swissprot with lookup:
> ----------------------
> perl load_seqdatabase.pl --lookup --host localhost --dbuser biosql \
>     --dbname NCBIdannSprot_mitlookup --namespace swissprot --driver Pg \
>     --format swiss /local/sprot_weekly.dat
> Loading /local/sprot_weekly.dat ...
> DBD::Pg::st execute failed: ERROR: Cannot insert a duplicate key into unique index taxon_pkey at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Pg/SpeciesAdaptorDriver.pm line 356, <GEN0> line 385883.
> Could not store O18759:
> ------------- EXCEPTION -------------
> MSG: create: object (Bio::Species) failed to insert or to be found by unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
> STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:243
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) load_seqdatabase.pl:446
> STACK toplevel load_seqdatabase.pl:429
>
> --------------------------------------
>
> ...perhaps there is something wrong with my command-line options, but I
> can't see it...
>
> Thanks for your help,
>
> Raphael Bauer
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------