[Bioperl-l] One more load_seqdatabase.pl question

Hilmar Lapp hlapp at gmx.net
Thu Nov 30 23:28:40 UTC 2006


Right. You need to tell it to look up sequences first if you know that
you are loading sequences which may already be in the database (see
the POD of load_seqdatabase.pl, switch --lookup; several other
command-line options control what happens if a sequence entry is
already present in the database).
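
To illustrate the lookup-before-insert idea in general terms, here is a
minimal sketch in Python with SQLite. It is not how load_seqdatabase.pl
or bioperl-db implement it; the table `entry`, its columns, and the
function `store` are hypothetical stand-ins, and the unique key on
(namespace, accession) is an assumption made for the example.

```python
import sqlite3

# Hypothetical, simplified stand-in table; this is NOT the real BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    " id INTEGER PRIMARY KEY,"
    " namespace TEXT NOT NULL,"
    " accession TEXT NOT NULL,"
    " description TEXT,"
    " UNIQUE (namespace, accession))"
)


def store(conn, namespace, accession, description, lookup=True):
    """Insert an entry, or update it when the lookup finds it already present."""
    if lookup:
        # Look the entry up by its (assumed) unique key first.
        row = conn.execute(
            "SELECT id FROM entry WHERE namespace = ? AND accession = ?",
            (namespace, accession),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE entry SET description = ? WHERE id = ?",
                (description, row[0]),
            )
            return "updated"
    # Without a lookup (or when nothing was found) a plain insert is attempted;
    # for an existing entry this violates the unique key.
    conn.execute(
        "INSERT INTO entry (namespace, accession, description) VALUES (?, ?, ?)",
        (namespace, accession, description),
    )
    return "inserted"


first = store(conn, "test", "AP000868", "first load")
second = store(conn, "test", "AP000868", "second load")
print(first, second)
```

Calling `store(..., lookup=False)` for an entry that already exists
raises `sqlite3.IntegrityError`, which is the SQLite counterpart of a
unique constraint violation.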

The messages in your report are warnings, not errors. It looks like
some of the comments are duplicated for a sequence, which doesn't look
like a reason for concern. It would be more of a problem if you got
errors thrown.
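
As a rough illustration of the difference between a warning and a
thrown error, the following Python/SQLite sketch reports a duplicate
comment and carries on instead of aborting the load. The table
`comment`, the assumed unique key on (entry_id, comment_rank), and the
function `add_comment` are hypothetical and only loosely modelled on
the values shown in the warnings; this is not the bioperl-db code.

```python
import sqlite3
import sys

# Hypothetical, simplified comment table with an assumed unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment ("
    " entry_id INTEGER NOT NULL,"
    " comment_rank INTEGER NOT NULL,"
    " comment_text TEXT NOT NULL,"
    " UNIQUE (entry_id, comment_rank))"
)


def add_comment(conn, entry_id, comment_rank, text):
    """Insert a comment; report a duplicate as a warning instead of raising."""
    try:
        conn.execute(
            "INSERT INTO comment (entry_id, comment_rank, comment_text)"
            " VALUES (?, ?, ?)",
            (entry_id, comment_rank, text),
        )
        return True
    except sqlite3.IntegrityError as exc:
        # The duplicate is skipped and the load continues.
        print(f"WARNING: insert of comment failed: {exc}", file=sys.stderr)
        return False


results = [
    add_comment(conn, 389109, 1, "This sequence was reannotated."),
    add_comment(conn, 389109, 1, "This sequence was reannotated."),  # duplicate
    add_comment(conn, 389109, 2, "A second, distinct comment."),
]
stored = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(results, stored)
```

The duplicate row is simply not stored a second time, while the
remaining comments are still loaded.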

	-hilmar

On Nov 30, 2006, at 5:08 PM, gang wu wrote:

> Thanks Hilmar. Do you mean the NVL() clause will make
> load_seqdatabase.pl not work when updating?
>
> I have a problem with updating. It seems load_seqdatabase.pl only
> tries to insert instead of update. I used one of the test GenBank
> files that come with bioperl-db. Please take a look at the attached
> output.
>
> Thanks.
>
> Gang
>
> =========================================
> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace test /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
> Loading /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,  
> values were ("This sequence was reannotated via the Ensembl system.  
> Please visit the Ensembl web site, http://www.ensembl.org/ for more  
> information. ","1") FKs (389109)
> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT)  
> violated (DBD ERROR: OCIStmtExecute)
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,  
> values were ("The /gene indicates a unique id for a gene, /cds a  
> unique id for a translation and a /exon a unique id for an exon.  
> These ids are maintained wherever possible between versions. For  
> more information on how to interpret the feature table, please  
> visit http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT)  
> violated (DBD ERROR: OCIStmtExecute)
> ---------------------------------------------------
> ...
> ...
> ==========================================================
> Hilmar Lapp wrote:
>> These are the protein translations stored in the feature table as  
>> tags of features, right? You can change the type of the column  
>> (although there may be some issues when you update the column  
>> because the NVL() clause won't work if I recall that correctly),  
>> but doing so will deprive you of any 'normal' searches against  
>> that column. (You can still use functions from the DBMS_LOB
>> package, but they will be much slower and are completely
>> non-standard.) It is up to you whether that is too big of a price to
>> pay for having some redundant protein translations (translating  
>> the feature's DNA sequence should give you the same) in the  
>> database. I always trimmed those feature tags off (using a custom  
>> SeqProcessor). An alternative is to convert these feature tags  
>> into actual bioentries (i.e., Bio::Seq objects; again, a custom  
>> SeqProcessor will allow you to do that).
>>
>> 	-hilmar
>>
>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>> Hi everyone,
>>>
>>> I'm using load_seqdatabase.pl to upload some GenBank genome
>>> sequences to my Oracle BioSQL database. I saw some errors (see
>>> attached warning message) related to seqfeature_qualifier_value
>>> (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE column), which has a VARCHAR2
>>> data type with a maximum of 4000 bytes. Did anybody mention this
>>> issue before? Should I just modify the column to a type able to
>>> store more data, such as LONG or CLOB?
>>>
>>> Thanks.
>>>
>>> Gang
>>>
>>> Log information:
>>> ============================================
>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace genbank /genomeseq/arabidopsis//NC_003070.gbk
>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>
>>> -------------------- WARNING ---------------------
>>> MSG: SimpleValueAdaptor::add_assoc: unexpected failure of statement
>>> execution: ORA-01461: can bind a LONG value only for insert into a
>>> LONG column (DBD ERROR: error possibly near <*> indicator at char 12
>>> in 'INSERT INTO <*>seqfeature_qualifier_value (fea_oid, trm_oid,
>>> value, rank) VALUES (:p1, :p2, :p3, :p4)')
>>> name: INSERT ASSOC [2] Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue
>>> values: FK[Bio::SeqFeature::Generic]:14898,
>>> FK[Bio::Annotation::SimpleValue]:800,
>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSG 
>>> FV  
>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASA 
>>> DR  
>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSS 
>>> EI  
>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNS 
>>> FP  
>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFE 
>>> RA  
>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYT 
>>> RY  
>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRT 
>>> EA  
>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIV 
>>> DI  
>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENI 
>>> DW  
>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRF 
>>> AL  
>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGS 
>>> DN  
>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPT 
>>> RY  
>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYL 
>>> YT  
>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLR 
>>> GL VQATYQASA!  
>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGG 
>>> KV  
>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDED 
>>> AY  
>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSD 
>>> FV  
>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSD 
>>> SE  
>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGT 
>>> LG  
>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQ 
>>> EV  
>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLG 
>>> QL  
>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVP 
>>> TL  
>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRT 
>>> VT  
>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEM 
>>> RL  
>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRIL 
>>> AV  
>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGID 
>>> EY  
>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDG 
>>> VD  
>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKL 
>>> LR VKLDFNFM!  
>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNL 
>>> QS  
>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAV 
>>> SN  
>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVY 
>>> CL  
>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSG 
>>> WD  
>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADD 
>>> SE  
>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEA 
>>> SV  
>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLD 
>>> GL  
>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQ 
>>> SS  
>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDED 
>>> DF  
>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAK 
>>> GL  
>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKG 
>>> VA  
>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNG 
>>> CL  
>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSM 
>>> RN  
>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPK 
>>> VE  
>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSK 
>>> PL WLSVGADAS!  
>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRP 
>>> VY  
>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWIT 
>>> ND  
>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQ 
>>> ES  
>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNM 
>>> TS  
>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPF 
>>> SV  
>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSE 
>>> SW  
>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKN 
>>> IV  
>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILY 
>>> MS  
>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDL 
>>> RV  
>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSM 
>>> AM  
>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGV 
>>> DI  
>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEG 
>>> AK  
>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSL 
>>> LR  
>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNI 
>>> MG QRKFIPAK!  
>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAI 
>>> DQ AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS",  
>>> rank:"1"
>>> ---------------------------------------------------
>>> =============================================
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







