[Biojava-l] Getting a part of a sequence

Richard Holland holland at eaglegenomics.com
Fri Oct 10 14:30:03 UTC 2008


This looks like a bug in BioJavaX (BJX). I have just committed what I think
is a fix to the head of Subversion. Can you check out the latest source,
compile it, and try your program again?

cheers,
Richard

2008/10/9 Gabrielle Doan <gabrielle_doan at gmx.net>

> Hi Richard,
>
> thanks a lot for your mail. I have successfully retrieved the subsequence
> of a sequence as a String. Now I am trying to get the features for a
> particular range with the following code:
>
> <code>
>        public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>                RichLocation rl = new SimpleRichLocation(
>                                new SimplePosition(startpos),
>                                new SimplePosition(endpos), 0);
>                BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>                                new BioSQLFeatureFilter.BySequenceName(name),
>                                new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>                return filter(filter);
>        }
> <\code>
>
> Unfortunately I received these errors:
> <message>
> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>        ... 3 more
> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>        ... 8 more
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>        ... 21 more
> Caused by: java.lang.NullPointerException
>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>        ... 26 more
> <\message>
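
The trace above nests four "Caused by" levels, and the innermost one (the
NullPointerException thrown from PositionResolver$AverageResolver.getMin) is
the actual failure; the outer exceptions are only reflection and Hibernate
wrappers around it. A minimal sketch, in plain Java with no BioJava or
Hibernate dependency, of unwrapping such a chain programmatically. The class
name RootCause and its method are hypothetical helpers, not part of either
library:

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper for illustration; not part of BioJava or Hibernate.
final class RootCause {
    // Follow getCause() until the innermost exception is reached.
    static Throwable of(Throwable t) {
        Throwable current = t;
        // The second check guards against a cause that points back at itself.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild a small wrapper chain similar in shape to the one in the trace.
        Throwable nested = new RuntimeException(
                new InvocationTargetException(
                        new NullPointerException("position was null")));
        System.out.println(of(nested));
    }
}
```

Catching the RuntimeException around the filter() call and printing
RootCause.of(e) would show the NullPointerException directly.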
>
> Why do I get these errors?
> BioSQLFeatureFilter.BySequenceName(name) needs a seqName as its parameter. How
> can I find out the sequence name? Is it the value "name" in the table
> "Bioentry"? As the built-in subSequence method takes a long time, I intend to
> get the subsequence as a String myself and add the features to it. What
> do you think about this?
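
One way to read the plan in the paragraph above: once the slice has been
fetched as a String, every feature that overlaps the requested range has to
be shifted into the slice's own coordinates and clipped at its ends. A rough
sketch of that arithmetic, assuming 1-based inclusive positions. The class
SliceCoordinates, the method rebase and the int-array return value are made up
for this illustration and are not BioJava API:

```java
// Hypothetical helper for illustration; not part of BioJava.
final class SliceCoordinates {
    // Map a feature [featMin, featMax] on the parent sequence onto the slice
    // [sliceStart, sliceEnd]. All positions are 1-based and inclusive.
    // Returns {min, max} in slice coordinates, clipped to the slice,
    // or null when the feature does not overlap the slice at all.
    static int[] rebase(int featMin, int featMax, int sliceStart, int sliceEnd) {
        if (featMax < sliceStart || featMin > sliceEnd) {
            return null;
        }
        int min = Math.max(featMin, sliceStart) - sliceStart + 1;
        int max = Math.min(featMax, sliceEnd) - sliceStart + 1;
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] r = rebase(150, 250, 100, 200);
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

Clipping loses the information that a feature extended past the slice, so a
real implementation would probably want to record that in some way.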
>
> I'm grateful for any hints.
> cheers,
>
> Gabrielle
>
>
>
> Richard Holland wrote:
>
>> Hello.
>>
>> Your code is pretty good already - but you're right, it will load the
>> whole chromosome into memory before you can chop out the interesting
>> bit you actually need.
>>
>> As you observed, by using ThinRichSequence in your query it will load
>> only the initial shell of a sequence object to start with, but the
>> moment you try and sub-sequence it, it will immediately load the whole
>> sequence data into memory in order to perform the operation.
>>
>> If you only want the sequence data, as a string, you can do this by
>> specifying the sequence attribute in the query and bypassing the
>> sequence object entirely:
>>
>>  select rs.stringSequence from Sequence as rs where rs.description
>> like '%hromosome :num%'
>>
>> This will return a String instead of a RichSequence object. You can
>> use HQL operators to perform substrings etc. on the string inside the
>> query itself - see
>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>> , particularly section 14.9.
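
As a sketch of the in-query substring idea described above: the entity name
Sequence and the properties stringSequence and description are taken from the
query quoted above, while the class SubsequenceQuery, its method names and the
parameter names :start, :len and :desc are invented for this example. Whether
substring() is accepted in the select clause may depend on the Hibernate
version and database dialect, so treat the query text as an assumption to
verify rather than a tested statement:

```java
// Hypothetical helper for illustration; it only prepares the query text and
// its parameter values and does not touch Hibernate itself.
final class SubsequenceQuery {
    // HQL text using named parameters instead of string replacement.
    static final String HQL =
            "select substring(rs.stringSequence, :start, :len) "
          + "from Sequence as rs where rs.description like :desc";

    // Length of a 1-based inclusive range, as needed for the :len parameter.
    static int length(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid range " + start + ".." + end);
        }
        return end - start + 1;
    }

    // LIKE pattern in the same style as the queries in this thread.
    static String descriptionPattern(String chrNo) {
        return "%hromosome " + chrNo + "%";
    }

    public static void main(String[] args) {
        System.out.println(HQL);
        System.out.println(descriptionPattern("7") + " length=" + length(100, 200));
    }
}
```

With a Hibernate Session the three values would be bound through
Query.setParameter rather than spliced into the string. Note also that a
pattern such as '%hromosome 1%' matches "chromosome 10" through
"chromosome 19" as well as "chromosome 1", so taking the first result of such
a query may not return the intended chromosome.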
>>
>> If you only want the features, you can do this by using the
>> BioSQLFeatureFilter technique. In particular you will want the
>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>> filter. You construct a filter then pass it to the filter() method in
>> BioSQLRichSequenceDB. The database will return to you all the
>> RichFeature objects that match your criteria. Note that it searches
>> the whole database so you really must use a BySequenceName filter at
>> the very least in order to make the results useful!
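
To make the combined condition described above concrete, here is a standalone
sketch of what And(BySequenceName, OverlapsRichLocation) is meant to select.
It deliberately does not use BioJava: the Feature class and the filter method
are stand-ins invented for this example, positions are assumed to be 1-based
and inclusive, and the real filter is translated into a database query whose
exact edge handling is not shown here:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for illustration; not the BioJava API.
final class OverlapFilterSketch {
    static final class Feature {
        final String sequenceName;
        final int min;
        final int max;

        Feature(String sequenceName, int min, int max) {
            this.sequenceName = sequenceName;
            this.min = min;
            this.max = max;
        }
    }

    // Two closed intervals overlap when each one starts no later than the other ends.
    static boolean overlaps(int aMin, int aMax, int bMin, int bMax) {
        return aMin <= bMax && bMin <= aMax;
    }

    // Keep features that sit on the named sequence AND overlap [start, end].
    static List<Feature> filter(List<Feature> all, String name, int start, int end) {
        List<Feature> hits = new ArrayList<Feature>();
        for (Feature f : all) {
            if (f.sequenceName.equals(name) && overlaps(f.min, f.max, start, end)) {
                hits.add(f);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Feature> all = new ArrayList<Feature>();
        all.add(new Feature("chr1", 50, 120));
        all.add(new Feature("chr2", 90, 110));
        System.out.println(filter(all, "chr1", 100, 200).size());
    }
}
```

Dropping the name condition in this sketch returns overlapping features from
every sequence, which is the situation the warning above is about.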
>>
>> However, you can't use HQL to construct a complete slice of a sequence
>> directly in the database before returning it to the program for use as
>> a ready-made RichSequence object. This would require Hibernate to know
>> what a BioJava sub-sequence object is and how it behaves in relation
>> to an 'unsliced' one, which is beyond the scope of its job as a
>> persistence framework.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/7 Gabrielle Doan <gabrielle_doan at gmx.net>:
>>
>>> Hi all,
>>> I have a BioSQL database which contains all human chromosomes. My intention
>>> is to get the information about a particular gene. How can I get a part of a
>>> particular chromosome with all associated features? At the moment I use the
>>> following code to create my new sequence:
>>>
>>> <code>
>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>       position[0], position[1], ns, geneName, parent.getAccession(),
>>>       parent.getIdentifier(), parent.getVersion() + 1,
>>>       (Double) (parent.getVersion() + 1.0));
>>> <\code>
>>>
>>> Here is how I get the parent sequence:
>>> <code>
>>>       public static RichSequence getChromosome(String chrNo) {
>>>               Transaction tx = session.beginTransaction();
>>>               RichSequence ret = null;
>>>
>>>               String query;
>>>
>>>               try {
>>>                       if (chrNo.equals("MT")) {
>>>                               query = "from BioEntry as be where "
>>>                                       + "be.description like '%:num%'";
>>>                               query = query.replaceAll(":num", "mitochondrion");
>>>                       } else {
>>>                               query = "from BioEntry as be where "
>>>                                       + "be.description like '%hromosome :num%'";
>>>                               query = query.replaceAll(":num", chrNo);
>>>                       }
>>>
>>>                       Query q = session.createQuery(query);
>>>
>>>                       ret = (RichSequence) q.list().get(0);
>>>                       tx.commit();
>>>               } catch (Exception e) {
>>>                       tx.rollback();
>>>                       e.printStackTrace();
>>>               }
>>>               return ret;
>>>       }
>>> <\code>
>>>
>>> I always have to load the whole chromosome to get a part of it, so it takes
>>> a very long time and I get a lot of unused information (a waste of memory). I
>>> also tried to use <code>ThinRichSequence<\code> instead of
>>> <code>RichSequence<\code>, but I didn't notice any difference.
>>> Can you give me a hint on how to speed up the code?
>>> I am grateful for any hints.
>>>
>>> cheers,
>>> Gabrielle
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


