[Biojava-l] Issues with BioSqlRichSequenceDB.java class

Richard Holland holland at eaglegenomics.com
Thu Feb 11 22:18:18 UTC 2010


My preference would be to leave the existing method as-is, and modify the javadocs so that it is explicit that it is searching by name and not accession. This is to prevent breaking any code that may rely on this behaviour (as Genbank is not the only kind of sequence that can be stored in BioSQL, we can't guarantee that other sequence types are not using name as the unique identifier instead).

Instead, I would propose adding a second method called getRichSequencesByAccession, with the modification you suggest.
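
To make the proposal concrete, here is a minimal in-memory sketch of the two lookups side by side. It is not the BioJava implementation: the SequenceLookupSketch class, its Entry type, and the getByName/getByAccession method names are hypothetical stand-ins for the Hibernate-backed queries, and the list-backed store is an assumption made to keep the example self-contained. Only the name-versus-accession distinction and the M97762 / BTVNS1TUBA record are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an in-memory stand-in for the BioSQL bioentry table.
class SequenceLookupSketch {

    /** Hypothetical row type holding only the two fields discussed in this thread. */
    static final class Entry {
        final String name;       // GenBank LOCUS name, e.g. "BTVNS1TUBA"
        final String accession;  // GenBank accession, e.g. "M97762"

        Entry(String name, String accession) {
            this.name = name;
            this.accession = accession;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(Entry e) {
        entries.add(e);
    }

    /** Existing behaviour, left unchanged: match on name. */
    List<Entry> getByName(String name) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.name.equals(name)) hits.add(e);
        }
        if (hits.isEmpty()) throw new IllegalArgumentException("Id not found: " + name);
        return hits;
    }

    /** Proposed addition: the same lookup, but matching on accession. */
    List<Entry> getByAccession(String accession) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.accession.equals(accession)) hits.add(e);
        }
        if (hits.isEmpty()) throw new IllegalArgumentException("Id not found: " + accession);
        return hits;
    }

    public static void main(String[] args) {
        SequenceLookupSketch db = new SequenceLookupSketch();
        db.add(new Entry("BTVNS1TUBA", "M97762"));

        // The accession lookup finds the record; the name lookup needs the LOCUS.
        System.out.println(db.getByAccession("M97762").get(0).name);
        System.out.println(db.getByName("BTVNS1TUBA").get(0).accession);

        // Passing an accession to the name lookup fails, as in the report below.
        try {
            db.getByName("M97762");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Keeping both methods means callers that store a unique identifier in name keep working, while GenBank users can opt in to the accession lookup.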

With regard to your last point about bad section errors, could you post the stack trace and the code that causes it?

cheers,
Richard

On 11 Feb 2010, at 22:00, Deepak Sheoran wrote:

> Hi
> This class (BioSQLRichSequenceDB) has methods to retrieve records from a local instance of the BioSQL schema. When you pass in the accession number of a record it usually returns the record, but in some cases (for example, the record with accession M97762) it gives the following error:
> 
>    Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_, sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_, sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_, sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_, sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_, sequence0_.length as length13_, sequence0_.alphabet as alphabet13_, sequence0_.seq as seq13_ from biosequence sequence0_ inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id where sequence0_1_.name=?
> Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212)
>        at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355)
>        at trashtesting.Main.main(Main.java:39)
> Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206)
>        ... 2 more
> Java Result: 1
> 
> The only way to find this record in my database is to search by its LOCUS name, "BTVNS1TUBA", instead of its accession number. The Javadoc for the BioSQLRichSequenceDB class says the id should be the GenBank id, and I can't tell what that means. When I investigated, the problem turned out to be in the following method:
> 
> public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db) throws BioException, IllegalIDException {
>        if (db==null) db = new HashRichSequenceDB();
>        try {
>            for (Iterator i = ids.iterator(); i.hasNext(); ) {
>                String id = (String)i.next();
>                // Build the query object
>                // Current query (the error): it matches on name, i.e. the GenBank LOCUS:
>                //   String queryText = "from Sequence where name = ?";
>                // Proposed fix: match on accession instead. The name holds the LOCUS from
>                // the GenBank record and has no unique constraint, so it is not a good key
>                // for retrieving unique records. People also usually refer to a GenBank
>                // record by its accession number rather than by its LOCUS.
>                String queryText = "from Sequence where accession = ?";
>                Object query = this.createQuery.invoke(this.session, new Object[]{queryText});
>                // Set the parameters
>                query = this.setParameter.invoke(query, new Object[]{new Integer(0), id});
>                // Get the results
>                List result = (List)this.list.invoke(query,(Object[]) null);
>                // If the query returned no results, throw an exception
>                if (result.size()==0) throw new IllegalIDException("Id not found: "+id);
>                // Add the results to the results db.
>                for (Iterator j = result.iterator(); j.hasNext(); ) db.addRichSequence((RichSequence)j.next());
>            }
>        } catch (Exception e) {
>            // Throw the exception with our nice message
>            throw new RuntimeException("Error while trying to load by ids: "+ids,e);
>        }
>        return db;
>    }
> 
> Even NCBI says: "It is better to search for the actual accession number rather than the locus name, because the accessions are stable and locus names can change."
> REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB
> 
> So my suggestion is to change the query in this method so that it looks up accession instead of name.
> Also, if you try to download the record from NCBI through the Java interface using the accession M97762 (as the genbank_id), you get it. But if you try to fetch it using the LOCUS name, you get a "bad section" exception around the reference section, and I don't know why.
> 
> 
> Deepak Sheoran
> 
> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/




