[BioSQL-l] error loading uniprot release 49.6 into mysql

s.rayner at att.net s.rayner at att.net
Mon May 15 12:34:15 UTC 2006


I found where the script is hiccuping.

The UniProt release contains two different sequence entries whose RL (reference location) lines are identical.

___________________

First occurrence...
___________________

ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
AC   Q5RFJ2; Q5RDK2;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein theta.
GN   Name=YWHAQ;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain cortex, and Kidney;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.  <======  Not Unique


___________________

Second occurrence...
___________________


ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
AC   Q5RC20;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein gamma.
GN   Name=YWHAG;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Heart;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <======  Not Unique



In these two cases the generated CRC key is identical, so MySQL throws a wobbly on the second insert.
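
To make the collision concrete, here is a minimal Python sketch. It assumes the key is a checksum computed only over the authors, title and location of a reference, which matches the insert values in the error message but is not confirmed against the loader's source. `reference_key` is a hypothetical helper, and `zlib.crc32` (32-bit) is only a stand-in for the 64-bit CRC that the real keys appear to be.

```python
import zlib


def reference_key(authors, title, location):
    """Hypothetical stand-in for the loader's reference checksum."""
    # Assumption: only these three fields feed the checksum, so anything
    # else that differs between entries (RC, RG, the ID) is ignored.
    payload = "|".join([authors or "", title or "", location or ""])
    return "CRC-%08X" % (zlib.crc32(payload.encode("utf-8")) & 0xFFFFFFFF)


RL = "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

# Neither entry has authors or a title, so only the RL text is hashed.
key_theta = reference_key(None, None, RL)  # reference in 1433T_PONPY
key_gamma = reference_key(None, None, RL)  # reference in 1433G_PONPY

print(key_theta == key_gamma)  # prints True
```

Under that assumption, any two references that carry only the same RL text end up with the same key, whatever sequence they belong to.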

If I look at the MySQL entry in the REFERENCE table for the first sequence:
+-----+------+----------------------------------------------------------+------+------+----------------------+
| 139 | NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC |
+-----+------+----------------------------------------------------------+------+------+----------------------+

And the error when the script choked was:

 MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were 
 ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ 
 databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
 Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3

hence the problem.
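
The failure mode can be reproduced outside BioSQL. The sketch below uses Python's `sqlite3` with a simplified, hypothetical `reference` table (not the real BioSQL schema) that has a unique `crc` column. A blind second insert is rejected, while looking the key up first and reusing the existing row is not. `insert_reference` and `find_or_insert_reference` are illustrative names only; whether the loader is supposed to do such a lookup itself is not something this sketch settles.

```python
import sqlite3

# Simplified, hypothetical table: only the columns needed for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference ("
    " reference_id INTEGER PRIMARY KEY,"
    " location TEXT NOT NULL,"
    " crc TEXT UNIQUE)"
)


def insert_reference(conn, location, crc):
    """Blind insert: fails if the crc is already present."""
    cur = conn.execute(
        "INSERT INTO reference (location, crc) VALUES (?, ?)",
        (location, crc),
    )
    return cur.lastrowid


def find_or_insert_reference(conn, location, crc):
    """Reuse the row that already holds this crc, else insert a new one."""
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is not None:
        return row[0]
    return insert_reference(conn, location, crc)


RL = "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."
CRC = "CRC-E7973FEA4B5611DC"

first_id = insert_reference(conn, RL, CRC)  # reference of the first entry

try:
    insert_reference(conn, RL, CRC)         # same reference, second entry
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True               # the unique key rejects it

reused_id = find_or_insert_reference(conn, RL, CRC)
print(duplicate_rejected, reused_id == first_id)
```

With the lookup in place, both sequence entries would simply share one reference row instead of competing for the same key.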

I'm guessing I'm not the first person to encounter this, but I don't see any hints for an easy way around it.

Any suggestions?

ta



