[Biojava-l] locating genes in a genomic sequence

Moses Hohman mmhohman@northwestern.edu
Tue, 7 Jan 2003 16:58:04 -0600


Small correction to this: Strings can be larger than 64kB. Prior to
Java 1.3, if you tried to serialize a String larger than 64kB (or did
anything that involved serializing it), you would get a
java.io.UTFDataFormatException.

See http://java.sun.com/j2se/1.3/docs/guide/serialization/relnotes.html
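A minimal sketch, assuming a current JDK rather than a 1.3-era one, that shows where the 64kB limit actually lives: DataOutputStream.writeUTF still rejects any string whose encoded form exceeds 65535 bytes, while ObjectOutputStream.writeObject (Java 1.3 and later) serializes a much longer String without complaint. The class name LargeStringSerialization and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class LargeStringSerialization {
    // Hypothetical helper: true if DataOutputStream.writeUTF refuses the string
    // (it throws UTFDataFormatException when the encoded length exceeds 65535 bytes).
    static boolean writeUtfFails(String s) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        }
    }

    // Hypothetical helper: serialize s with object serialization and read it back.
    static String roundTrip(String s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (String) in.readObject();
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws Exception {
        String big = repeat('A', 100_000); // 100,000 ASCII chars, well past 65535 bytes
        System.out.println("writeUTF fails: " + writeUtfFails(big));
        System.out.println("round trip ok: " + big.equals(roundTrip(big)));
    }
}
```

On a modern JDK both lines print true: the String itself is fine, and only the length-prefixed writeUTF format is capped.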

The only reason I know this is that I've run into this problem myself ( :

Moses

On Tuesday, January 7, 2003, at 02:28 PM, Schreiber, Mark wrote:

> Hi -
>
> If you already have the SimpleGene features constructed, these will
> contain a Location object. However, I think you are asking how to
> find a subsequence in your genomic sequence and locate the gene that way.
>
> To rapidly find exact matches you can use the biojava
> KnuthMorrisPrattSearch object from the org.biojava.bio.search package.
> It contains a main method that demonstrates its use. This is a very
> efficient algorithm for finding exact matches.
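For readers who want to see what Knuth-Morris-Pratt does, here is a standalone sketch of the algorithm in plain Java. It is not BioJava's KnuthMorrisPrattSearch class or its API; the class and method names (KmpSketch, failureTable, findMatches) are my own. It reports 0-based start offsets of every exact match, including overlapping ones, and scans the text once without backing up, which is why KMP stays linear in the length of the genome.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpSketch {
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] failure = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Returns the 0-based start offsets of every (possibly overlapping) exact match.
    // Accepting a CharSequence means the text need not be a String.
    static List<Integer> findMatches(CharSequence text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;
        }
        int[] failure = failureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (k > 0 && c != pattern.charAt(k)) {
                k = failure[k - 1]; // fall back without re-reading the text
            }
            if (c == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                hits.add(i - k + 1);
                k = failure[k - 1]; // continue so overlapping matches are found
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findMatches("acgtacgtacg", "acg"));
    }
}
```

Running main prints [0, 4, 8]. Because findMatches takes a CharSequence, sequence data held in a char[] can be searched directly via java.nio.CharBuffer.wrap(chars).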
>
> Note: if the genome is larger than 64kb you will not be able to dump it
> as a String as that is the maximum String length. You could dump it as a
> char[].
>
> - Mark
>
>
>> -----Original Message-----
>> From: Karin Lagesen [mailto:karin.lagesen@labmed.uio.no]
>> Sent: Tuesday, 7 January 2003 10:53 p.m.
>> To: biojava-l@biojava.org
>> Subject: [Biojava-l] locating genes in a genomic sequence
>>
>>
>> Hi!
>>
>> I am trying to build a small program for finding intergenic
>> areas. I am planning to do this by locating all mRNAs in a
>> genome and outputting all the areas in between. Biojava seems
>> to be able to help with most of my tasks. However, I have a
>> few questions. From what I understand, I can have the
>> genomic sequence as a SimpleSequence with all of the genes as
>> SimpleGenes attached via a FeatureHolder to the genomic
>> sequence. However, I have not figured out a smart way of
>> finding the location of each of the genes in the genomic
>> sequence. Since genomic sequences can be large, I hoped to
>> avoid having to dump the sequence into a String and use
>> indexOf(gene sequence) to find the position. Is there
>> something I am missing here, or have I just misunderstood all of this?
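Once each gene's position is known (from its Location, or from an exact-match search), the intergenic step itself is simple interval arithmetic. The sketch below assumes the gene spans have already been extracted as plain {start, end} integer pairs in 1-based, inclusive coordinates; it does not use BioJava types, and the class and method names (IntergenicSketch, intergenic) are hypothetical. It sorts the spans by start, merges overlapping genes, and reports every stretch of the sequence that no gene covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IntergenicSketch {
    // Given gene spans as {start, end} pairs (assumed 1-based, inclusive) on a
    // sequence of length seqLength, return the spans not covered by any gene.
    static List<int[]> intergenic(int[][] genes, int seqLength) {
        int[][] sorted = genes.clone(); // leave the caller's array order untouched
        Arrays.sort(sorted, Comparator.comparingInt(g -> g[0]));
        List<int[]> gaps = new ArrayList<>();
        int nextFree = 1; // first position not yet covered by any gene
        for (int[] g : sorted) {
            if (g[0] > nextFree) {
                gaps.add(new int[] {nextFree, g[0] - 1});
            }
            nextFree = Math.max(nextFree, g[1] + 1); // overlapping genes merge here
        }
        if (nextFree <= seqLength) {
            gaps.add(new int[] {nextFree, seqLength}); // trailing region after the last gene
        }
        return gaps;
    }

    public static void main(String[] args) {
        int[][] genes = {{50, 120}, {10, 30}, {100, 150}};
        for (int[] gap : intergenic(genes, 200)) {
            System.out.println(gap[0] + ".." + gap[1]);
        }
    }
}
```

With the example genes, main prints 1..9, 31..49 and 151..200; the overlapping genes at 50..120 and 100..150 are treated as one covered block.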
>>
>> Thank you in advance for your help.
>>
>> Karin
>> -- 
>> Karin Lagesen, PhD student
>> karin.lagesen@labmed.uio.no
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l@biojava.org
>> http://biojava.org/mailman/listinfo/biojava-l
>>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================