[Biojava-l] Getting a part of a sequence

Gabrielle Doan gabrielle_doan at gmx.net
Thu Oct 9 12:22:01 UTC 2008


Hi Richard,

thanks a lot for your mail. I have successfully retrieved the 
subsequence of a sequence as a String. Now I am trying to get the features 
for a particular range with the following code:

<code>
	public FeatureHolder filterFeature(String name, int startpos, int endpos) {
		RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
				new SimplePosition(endpos), 0);
		BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
				new BioSQLFeatureFilter.BySequenceName(name),
				new BioSQLFeatureFilter.OverlapsRichLocation(rl));
		return filter(filter);
	}
<\code>

Unfortunately I received these errors:
<message>
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
	at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
	at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
	... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
	at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
	at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
	at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
	at org.hibernate.loader.Loader.doQuery(Loader.java:729)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
	at org.hibernate.loader.Loader.doList(Loader.java:2213)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
	at org.hibernate.loader.Loader.list(Loader.java:2099)
	at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
	at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
	at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
	... 21 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
	at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
	at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
	at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
	at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
	at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
	... 26 more
<\message>

Why do I get these errors?
BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter. 
How can I find out the sequence name? Is it the value "name" in the 
table "Bioentry"? As the built-in subSequence method takes a long time, I 
intend to get the subsequence as a String myself and add the features 
to it. What do you think about this?

I'm grateful for any hints.
cheers,

Gabrielle
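
For illustration, the manual approach described above (take the subsequence as a String, then re-attach the features) could look roughly like the sketch below. It uses plain Java only. The Feature class and the slice and shiftFeatures helpers are hypothetical stand-ins, not BioJava API, and coordinates are assumed to be 1-based and inclusive.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsequenceSketch {

    /** Hypothetical stand-in for a feature: a name plus 1-based inclusive coordinates. */
    static final class Feature {
        final String name;
        final int start;
        final int end;

        Feature(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns residues start..end (1-based, inclusive) of the full sequence string. */
    static String slice(String sequence, int start, int end) {
        if (start < 1 || end > sequence.length() || start > end) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        return sequence.substring(start - 1, end);
    }

    /**
     * Keeps only features overlapping start..end, clips them to that window,
     * and shifts them so that position 1 is the first residue of the slice.
     */
    static List<Feature> shiftFeatures(List<Feature> features, int start, int end) {
        List<Feature> out = new ArrayList<>();
        for (Feature f : features) {
            if (f.start > end || f.end < start) {
                continue; // no overlap with the requested window
            }
            int clippedStart = Math.max(f.start, start);
            int clippedEnd = Math.min(f.end, end);
            out.add(new Feature(f.name, clippedStart - start + 1, clippedEnd - start + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String chromosome = "ACGTACGTAC"; // toy data, not a real sequence
        List<Feature> features = new ArrayList<>();
        features.add(new Feature("geneA", 1, 4));
        features.add(new Feature("geneB", 5, 9));
        features.add(new Feature("geneC", 8, 10));

        System.out.println(slice(chromosome, 3, 6));
        for (Feature f : shiftFeatures(features, 3, 6)) {
            System.out.println(f.name + " " + f.start + ".." + f.end);
        }
    }
}
```

Whether features that only partly overlap the window should be clipped, as here, or dropped is a design choice; the sketch clips them.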



Richard Holland wrote:
> Hello.
> 
> Your code is pretty good already - but you're right, it will load the
> whole chromosome into memory before you can chop out the interesting
> bit you actually need.
> 
> As you observed, by using ThinRichSequence in your query it will load
> only the initial shell of a sequence object to start with, but the
> moment you try and sub-sequence it, it will immediately load the whole
> sequence data into memory in order to perform the operation.
> 
> If you only want the sequence data, as a string, you can do this by
> specifying the sequence attribute in the query and bypassing the
> sequence object entirely:
> 
>  select rs.stringSequence from Sequence as rs where rs.description like '%hromosome :num%'
> 
> This will return a String instead of a RichSequence object. You can
> use HQL operators to perform substrings etc. on the string inside the
> query itself - see
> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
> , particularly section 14.9.
> 
> If you only want the features, you can do this by using the
> BioSQLFeatureFilter technique. In particular you will want the
> BySequenceName filter, the And filter, and the OverlapsRichLocation
> filter. You construct a filter then pass it to the filter() method in
> BioSQLRichSequenceDB. The database will return to you all the
> RichFeature objects that match your criteria. Note that it searches
> the whole database so you really must use a BySequenceName filter at
> the very least in order to make the results useful!
> 
> However, you can't use HQL to construct a complete slice of a sequence
> directly in the database before returning it to the program for use as
> a ready-made RichSequence object. This would require Hibernate to know
> what a BioJava sub-sequence object is and how it behaves in relation
> to an 'unsliced' one, which is beyond the scope of its job as a
> persistence framework.
> 
> cheers,
> Richard
> 
> 
> 
> 2008/10/7 Gabrielle Doan <gabrielle_doan at gmx.net>:
>> Hi all,
>> I have a BioSQL database which contains all human chromosomes. My intention
>> is to get the information about a particular gene. How can I get a part of a
>> particular chromosome with all associated features? At the moment I use the
>> following code to create my new sequence:
>>
>> <code>
>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>        position[0], position[1], ns, geneName, parent.getAccession(),
>>        parent.getIdentifier(), parent.getVersion() + 1,
>>        (Double) (parent.getVersion() + 1.0));
>> <\code>
>>
>> Here is how I get the parent sequence:
>> <code>
>>        public static RichSequence getChromosome(String chrNo) {
>>                Transaction tx = session.beginTransaction();
>>                RichSequence ret = null;
>>
>>                String query;
>>
>>                try {
>>                        if (chrNo.equals("MT")) {
>>                                query = "from BioEntry as be where be.description like '%:num%'";
>>                                query = query.replaceAll(":num", "mitochondrion");
>>                        } else {
>>                                query = "from BioEntry as be where be.description like '%hromosome :num%'";
>>                                query = query.replaceAll(":num", chrNo);
>>                        }
>>
>>                        Query q = session.createQuery(query);
>>
>>                        ret = (RichSequence) q.list().get(0);
>>                        tx.commit();
>>                } catch (Exception e) {
>>                        tx.rollback();
>>                        e.printStackTrace();
>>                }
>>                return ret;
>>        }
>> <\code>
>>
>> I always have to load the whole chromosome to get a part of it, so it takes a
>> very long time and I get a lot of unused information (a waste of memory). I
>> also tried to use <code>ThinRichSequence<\code> instead of
>> <code>RichSequence<\code>, but I didn't notice any difference.
>> Can you give me a hint on how to speed up the code?
>> I am grateful for any hints.
>>
>> cheers,
>> Gabrielle
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> 
> 
> 
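
As a rough illustration of the suggestion earlier in this thread to cut the string inside the HQL query itself, the sketch below builds a query that uses the HQL substring() function, together with its named parameters. The entity name Sequence and the property stringSequence are taken from the reply above; the class name, the params helper, and the parameter names start, len, and pattern are hypothetical. Whether a given database dialect accepts bound parameters as substring() arguments is an assumption that has not been tested here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HqlSubstringSketch {

    // HQL substring(string, start, length) takes a 1-based start position.
    static final String HQL =
            "select substring(rs.stringSequence, :start, :len) "
          + "from Sequence as rs where rs.description like :pattern";

    /**
     * Converts a 1-based inclusive range into the named parameters of HQL.
     * The description pattern mirrors the one used in the original query.
     */
    static Map<String, Object> params(String chrNo, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("start", start);
        p.put("len", end - start + 1); // inclusive range, so add one
        p.put("pattern", "%hromosome " + chrNo + "%");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(HQL);
        System.out.println(params("7", 101, 150));
    }
}
```

With a Hibernate Session at hand, one would presumably pass HQL to session.createQuery(), bind each entry with Query.setParameter(name, value), and read the String result from the query. Binding parameters this way also avoids the replaceAll() string substitution used in the original code. Note that a pattern such as '%hromosome 1%' would also match descriptions for chromosomes 10 to 19, so a stricter pattern may be needed.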



