[Biojava-l] Issues with BioSqlRichSequenceDB.java class

Deepak Sheoran sheoran143 at gmail.com
Thu Feb 11 09:00:15 UTC 2010


Hi,
This class (BioSQLRichSequenceDB) has methods to retrieve records from a
local instance of the BioSQL schema. When you pass in an accession number
it usually returns the record, but in some cases (e.g. the record with
accession M97762) it gives the following error:

    Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_,
        sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_,
        sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_,
        sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_,
        sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_,
        sequence0_.length as length13_, sequence0_.alphabet as alphabet13_,
        sequence0_.seq as seq13_
        from biosequence sequence0_
        inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id
        where sequence0_1_.name=?
    Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762
            at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212)
            at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355)
            at trashtesting.Main.main(Main.java:39)
    Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762
            at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206)
            ... 2 more
    Java Result: 1

The only way to find this record in my database is to search by its LOCUS
name ("BTVNS1TUBA") instead of its accession number. The Javadoc for the
BioSQLRichSequenceDB class says the id should be a GenBank id, and I can't
tell what that means. When I investigated the matter, I found that the
error is in the following method (the failing line and my proposed fix
are marked in the comments):

    public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db)
            throws BioException, IllegalIDException {
        if (db == null) db = new HashRichSequenceDB();
        try {
            for (Iterator i = ids.iterator(); i.hasNext(); ) {
                String id = (String) i.next();
                // Build the query object.
                // ERROR: the current query matches on bioentry.name, which
                // holds the GenBank LOCUS:
                //     String queryText = "from Sequence where name = ?";
                // SOLUTION: match on the accession instead. The LOCUS name has
                // no unique constraint, so it is a poor key for finding a single
                // record, and people usually refer to a GenBank record by its
                // accession number rather than by its LOCUS.
                String queryText = "from Sequence where accession = ?";
                Object query = this.createQuery.invoke(this.session, new Object[]{queryText});
                // Set the parameters
                query = this.setParameter.invoke(query, new Object[]{Integer.valueOf(0), id});
                // Get the results
                List result = (List) this.list.invoke(query, (Object[]) null);
                // If nothing matched, throw an exception
                if (result.size() == 0) throw new IllegalIDException("Id not found: " + id);
                // Add the results to the results db.
                for (Iterator j = result.iterator(); j.hasNext(); )
                    db.addRichSequence((RichSequence) j.next());
            }
        } catch (Exception e) {
            // Throw the exception with our nice message
            throw new RuntimeException("Error while trying to load by ids: " + ids, e);
        }
        return db;
    }
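To make the failure mode concrete, here is a minimal stand-alone sketch that models the bioentry table as an in-memory list instead of going through Hibernate. The class and method names (LocusVsAccessionDemo, findByName, findByAccession) are hypothetical and are not part of BioJava; the two methods only mirror the two HQL queries above. The single row uses the record from this report (LOCUS BTVNS1TUBA, accession M97762).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the BioSQL bioentry table; not BioJava's API.
public class LocusVsAccessionDemo {

    // One bioentry row: "name" holds the GenBank LOCUS, "accession" the accession number.
    public static final class Entry {
        public final String name;
        public final String accession;

        public Entry(String name, String accession) {
            this.name = name;
            this.accession = accession;
        }
    }

    static final List<Entry> BIOENTRY = new ArrayList<>();

    static {
        // The record from the report: LOCUS BTVNS1TUBA, accession M97762.
        BIOENTRY.add(new Entry("BTVNS1TUBA", "M97762"));
    }

    // Mirrors "from Sequence where name = ?"
    public static List<Entry> findByName(String id) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : BIOENTRY) {
            if (e.name.equals(id)) hits.add(e);
        }
        return hits;
    }

    // Mirrors "from Sequence where accession = ?"
    public static List<Entry> findByAccession(String id) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : BIOENTRY) {
            if (e.accession.equals(id)) hits.add(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // An accession passed to the name query finds nothing, which is the reported error.
        System.out.println("name = M97762      -> " + findByName("M97762").size() + " hit(s)");
        System.out.println("accession = M97762 -> " + findByAccession("M97762").size() + " hit(s)");
        System.out.println("name = BTVNS1TUBA  -> " + findByName("BTVNS1TUBA").size() + " hit(s)");
    }
}
```

Running it prints 0, 1 and 1 hits respectively: the accession only matches when the query looks at the accession column, which is why the current query can only find this record by its LOCUS.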

Even NCBI says: "It is better to search for the actual accession number
rather than the locus name, because the accessions are stable and locus
names can change."
REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB

So my suggestion is to change the query in this method so that it searches
by accession instead of by name.
Also, if you try to download the record from NCBI through the Java
interface using accession M97762 (as the GenBank id), you get it, but when
you try to get it using the LOCUS name you get a "bad section" exception
around the REFERENCE section, and I don't know why.
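One possible variant of this suggestion, shown only as a hedged sketch and not as BioJava code, is to try the accession first and fall back to the name, so callers that already pass LOCUS names keep working. Everything here is hypothetical: AccessionFirstLookup, Entry and lookup are illustrative names over an in-memory list, and IllegalArgumentException stands in for BioJava's IllegalIDException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: look an id up as an accession first, then fall back to the LOCUS name.
public class AccessionFirstLookup {

    public static final class Entry {
        public final String name;      // GenBank LOCUS, as stored in bioentry.name
        public final String accession; // as stored in bioentry.accession

        public Entry(String name, String accession) {
            this.name = name;
            this.accession = accession;
        }
    }

    private final List<Entry> rows = new ArrayList<>();

    public void add(Entry e) {
        rows.add(e);
    }

    // Returns every row whose chosen column equals id.
    private List<Entry> where(boolean byAccession, String id) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : rows) {
            String column = byAccession ? e.accession : e.name;
            if (column.equals(id)) hits.add(e);
        }
        return hits;
    }

    // Accession first, because accessions are stable; the name fallback is an
    // assumption added for backward compatibility, not part of the original proposal.
    // IllegalArgumentException stands in for BioJava's IllegalIDException.
    public List<Entry> lookup(String id) {
        List<Entry> hits = where(true, id);
        if (hits.isEmpty()) hits = where(false, id);
        if (hits.isEmpty()) throw new IllegalArgumentException("Id not found: " + id);
        return Collections.unmodifiableList(hits);
    }

    public static void main(String[] args) {
        AccessionFirstLookup db = new AccessionFirstLookup();
        db.add(new Entry("BTVNS1TUBA", "M97762"));
        System.out.println(db.lookup("M97762").get(0).name);          // BTVNS1TUBA
        System.out.println(db.lookup("BTVNS1TUBA").get(0).accession); // M97762
    }
}
```

The trade-off is that an id which happens to be both an accession and some other record's LOCUS resolves to the accession match; a strict accession-only query, as proposed above, avoids that ambiguity entirely.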


Deepak Sheoran





