[Biojava-dev] Code Update
Scooter Willis
HWillis at scripps.edu
Tue Jan 26 20:17:47 UTC 2010
Andy
Let me know when you have that in a healthy state, and I will work on the GTF/GFF3 parser code that creates gene -> transcript -> (exon) -> protein.
Scooter
On Jan 26, 2010, at 2:58 PM, Andy Yates wrote:
> Talking about code updates, I've got DNA -> RNA -> Peptide working
> quite well. It's about a day or two of tinkering away from being in a
> sensible state. There are also some utilities I've created;
> they've gone into org.biojava3.core.util ... has anyone got any better
> suggestions as to where they should live?
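
For readers unfamiliar with the DNA -> RNA -> Peptide step mentioned above, here is a minimal sketch of the idea. It is not the BioJava 3 code: the class and method names are hypothetical, it assumes the input is the coding (sense) strand, it only uses the standard genetic code, and unknown codons are written as 'X'.

```java
/** Hypothetical illustration only; not the org.biojava3 implementation. */
class CentralDogmaSketch {
    // Standard genetic code, codons ordered by base U, C, A, G at each position.
    private static final String BASES = "UCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Transcribe a coding-strand DNA string to RNA by replacing T with U. */
    static String transcribe(String dna) {
        return dna.toUpperCase().replace('T', 'U');
    }

    /** Translate RNA in frame 1; '*' marks a stop codon, 'X' an unknown codon. */
    static String translate(String rna) {
        StringBuilder peptide = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            int a = BASES.indexOf(rna.charAt(i));
            int b = BASES.indexOf(rna.charAt(i + 1));
            int c = BASES.indexOf(rna.charAt(i + 2));
            if (a < 0 || b < 0 || c < 0) {
                peptide.append('X');
            } else {
                peptide.append(AMINO_ACIDS.charAt(16 * a + 4 * b + c));
            }
        }
        return peptide.toString();
    }
}
```

For example, `CentralDogmaSketch.translate(CentralDogmaSketch.transcribe("ATGGCTTAA"))` walks the codons AUG, GCU, UAA. Trailing bases that do not fill a codon are ignored in this sketch.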
>
> Andy
>
> On 26 Jan 2010, at 18:45, Andreas Prlic wrote:
>
>> The cookbook approach seems to work quite well. You could start a new
>> "Chapter" in the book and make it clear that this will only be
>> available once BioJava 3 has been released (or via SVN checkout).
>>
>> Andreas
>>
>> On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis
>> <HWillis at scripps.edu> wrote:
>>>
>>> I checked in updates with test cases for FASTA file parsing, where
>>> the main focus is on the FASTA header. The test cases are based on
>>> the Wikipedia examples, so results will vary with actual files. It
>>> is now very easy to write a custom header parser, so we have lots of
>>> flexibility. I also started the code for the file pointer sequence
>>> proxy where the key usage is creating a sequence with the header
>>> and storing a reference to the file and offset in the file for the
>>> start of the sequence. When a method is called related to getting a
>>> sequence/subsequence the init() method is called to load the
>>> sequence data via RandomAccessFile with a seek to the offset. It
>>> turns out that none of the java.io reader classes will actually
>>> return the byte offset of the data read so far. This also gets
>>> complicated with the readLine() methods, where the CR and/or LF is
>>> stripped off when the string is returned, so you can't keep track
>>> of it externally. I copied the BufferedReader.java class to
>>> BufferedReaderBytesRead.java and keep track of the file pointer
>>> internally. This code still needs to be tested. This should be a
>>> great way to load large data sets with minimal memory. To complete
>>> this approach I will
>>> probably do a collection that is proxy aware that can go through
>>> and free up storage by returning a sequence to its proxy state.
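
The file pointer proxy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the checked-in code: `FileProxySequence` and all of its methods are hypothetical names. For simplicity the index step uses `RandomAccessFile.readLine()` together with `getFilePointer()`, which is unbuffered and therefore slow on large files. The custom buffered reader mentioned in the message is presumably meant to avoid exactly that cost.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lazy FASTA record: the header is in memory, the residues stay on disk. */
class FileProxySequence {
    private final String path;
    private final String header;
    private final long offset;   // byte offset of the first sequence line
    private String residues;     // null while in the proxy state

    FileProxySequence(String path, String header, long offset) {
        this.path = path;
        this.header = header;
        this.offset = offset;
    }

    String getHeader() { return header; }

    boolean isLoaded() { return residues != null; }

    /** Seek to the stored offset and read lines until the next '>' or end of file. */
    private void init() throws IOException {
        if (residues != null) return;
        StringBuilder sb = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                sb.append(line.trim());
            }
        }
        residues = sb.toString();
    }

    String getSequence() throws IOException {
        init();
        return residues;
    }

    /** 1-based, inclusive coordinates (an assumption of this sketch). */
    String getSubSequence(int start, int end) throws IOException {
        init();
        return residues.substring(start - 1, end);
    }

    /** Drop the loaded residues and return to the proxy state. */
    void release() { residues = null; }

    /** Index a FASTA file: one proxy per record, no residues loaded. */
    static List<FileProxySequence> index(String path) throws IOException {
        List<FileProxySequence> records = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // After readLine() the pointer sits just past the line
                    // terminator, i.e. at the first byte of the sequence data.
                    records.add(new FileProxySequence(
                        path, line.substring(1).trim(), raf.getFilePointer()));
                }
            }
        }
        return records;
    }
}
```

A proxy-aware collection, as proposed above, could then call `release()` on records that have not been used recently to free memory, since each record can reload itself from the stored offset.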
>>>
>>> I will work this week on getting some wiki pages created to give
>>> examples on using the header parsing interface and proxy sequences.
>>> How do we want to organize wiki pages related to biojava3 work?
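
As one possible shape for a wiki example on custom header parsing, here is a minimal sketch. The `HeaderParser` interface and `PipeHeaderParser` class are hypothetical and are not the BioJava 3 header parsing interface. The sketch only recognises two of the pipe-delimited identifier styles listed in the Wikipedia FASTA article (`sp|accession|name` and `gi|number|database|accession|...`) and falls back to a plain id otherwise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical interface for this sketch; not the BioJava API. */
interface HeaderParser {
    Map<String, String> parse(String header);
}

/** Parses a few pipe-delimited FASTA header styles into named fields. */
class PipeHeaderParser implements HeaderParser {
    @Override
    public Map<String, String> parse(String header) {
        String h = header.startsWith(">") ? header.substring(1) : header;
        Map<String, String> fields = new LinkedHashMap<>();
        // Split the identifier from the free-text description at the first whitespace.
        String[] idAndDesc = h.trim().split("\\s+", 2);
        String[] parts = idAndDesc[0].split("\\|");
        if (parts.length >= 3 && parts[0].equals("sp")) {
            fields.put("database", "sp");
            fields.put("accession", parts[1]);
            fields.put("name", parts[2]);
        } else if (parts.length >= 4 && parts[0].equals("gi")) {
            fields.put("gi", parts[1]);
            fields.put("database", parts[2]);
            fields.put("accession", parts[3]);
        } else {
            // Unknown style: keep the whole identifier as-is.
            fields.put("id", idAndDesc[0]);
        }
        if (idAndDesc.length == 2) {
            fields.put("description", idAndDesc[1]);
        }
        return fields;
    }
}
```

Keeping the parser behind a small interface means a file-specific header format can be handled by swapping in another implementation without touching the file reading code.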
>>>
>>> Thanks
>>>
>>> Scooter
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>