[Biojava-dev] reading a subsequence from a .nib file

mark.schreiber at novartis.com
Tue Apr 3 01:03:20 UTC 2007


Hi -

To my knowledge, nothing like this exists in BioJava. Could someone take
it the last mile and make it produce SymbolLists?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910





Josh Burdick <jburdick at keyfitz.org>
Sent by: biojava-dev-bounces at lists.open-bio.org
01/23/2007 12:29 AM

 
        To:     biojava-dev at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-dev] reading a subsequence from a .nib file


  I wrote some code to read a chunk of DNA sequence from a file in Jim
Kent's blat ".nib" file format.  This is a simple format using four
bits/base.
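For readers unfamiliar with the format, here is a minimal sketch of random-access reading. It is not NibFile.java. The layout is assumed from the UCSC kent source (nib.c): an 8-byte header holding the signature 0x6BE93D3A and the base count, both 32-bit ints in the writer's byte order. Two bases follow per byte, with the first base in the high nibble. Nibble values 0-4 are assumed to mean T, C, A, G, N, and the 0x8 bit is assumed to mark a repeat-masked (lowercase) base. The class name NibSketch is hypothetical. Because each byte holds two bases, the base at 0-based position p sits in byte 8 + p/2. A subsequence therefore needs one seek rather than a scan of the whole file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch of a .nib subsequence reader. Layout assumed from the
 * UCSC kent source: 4-byte signature, 4-byte base count, then 4 bits/base,
 * first base of each byte in the high nibble.
 */
public final class NibSketch {
    private static final int NIB_SIG = 0x6BE93D3A;
    private static final int HEADER_BYTES = 8;
    // Assumed kent codes for the low 3 bits: 0=T, 1=C, 2=A, 3=G, 4=N.
    private static final String BASES = "TCAGN";
    // Assumed: this bit is set for repeat-masked bases, reported in lowercase.
    private static final int MASK_BIT = 0x8;

    /** Returns bases [start, start + length), 0-based and half-open. */
    public static String read(String path, long start, int length) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] header = new byte[HEADER_BYTES];
            f.readFully(header);
            ByteBuffer bb = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
            int sig = bb.getInt(0);
            if (sig != NIB_SIG) {
                // A byte-reversed signature means the file was written big-endian.
                if (Integer.reverseBytes(sig) != NIB_SIG) {
                    throw new IOException("not a .nib file: " + path);
                }
                bb.order(ByteOrder.BIG_ENDIAN);
            }
            long size = bb.getInt(4) & 0xFFFFFFFFL; // base count, read as unsigned
            if (start < 0 || length < 0 || start + length > size) {
                throw new IllegalArgumentException("range outside sequence of length " + size);
            }
            if (length == 0) {
                return "";
            }
            // Only the bytes covering the requested bases are read (2 bases per byte).
            long firstByte = start / 2;
            int nBytes = (int) ((start + length - 1) / 2 - firstByte + 1);
            byte[] packed = new byte[nBytes];
            f.seek(HEADER_BYTES + firstByte);
            f.readFully(packed);

            StringBuilder sb = new StringBuilder(length);
            for (long pos = start; pos < start + length; pos++) {
                int b = packed[(int) (pos / 2 - firstByte)] & 0xFF;
                // Even positions use the high nibble, odd positions the low nibble.
                int nib = (pos % 2 == 0) ? (b >>> 4) : (b & 0xF);
                int val = nib & 0x7;
                char c = val < BASES.length() ? BASES.charAt(val) : 'N';
                sb.append((nib & MASK_BIT) != 0 ? Character.toLowerCase(c) : c);
            }
            return sb.toString();
        }
    }
}
```

As a usage example, NibSketch.read("chr1.nib", 1000000, 500) would return 500 bases starting at 0-based position 1,000,000. The file name is only illustrative.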

  I didn't attach the code, to avoid spamming the whole list, but it
and a (very crude!) JUnit test are at

http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java

  You could use 2 bits/base, but then you can't represent ambiguous
bases.  Four bits/base seems like a reasonable compromise.  Also, sites
that have blat installed will already need the .nib files on a server
somewhere, and with four bits/base, repeat-masking can be included,
which may be convenient.
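The trade-off above comes down to code space. Two bits give exactly four values, enough for A, C, G and T but with no room for N or a mask flag. Four bits leave room for both. The sketch below packs bases using the same assumed kent nibble codes (0-4 for T, C, A, G, N plus a 0x8 mask bit for lowercase). It only illustrates the encoding; it is not faToNib, and it writes no header. The class name NibPackSketch is hypothetical. Collapsing other ambiguity codes to N is a simplification made in this sketch.

```java
/**
 * Hypothetical sketch: packs bases two per byte, 4 bits each, using assumed
 * kent nibble codes. Illustrates the encoding only; no .nib header is written.
 */
public final class NibPackSketch {
    // Assumed codes for the low 3 bits: 0=T, 1=C, 2=A, 3=G, 4=N.
    private static final String BASES = "TCAGN";
    // Assumed mask bit for repeat-masked (lowercase) bases.
    private static final int MASK_BIT = 0x8;

    /** 4-bit code for one base; anything other than TCAGN becomes N here. */
    public static int nibble(char c) {
        int val = BASES.indexOf(Character.toUpperCase(c));
        if (val < 0) {
            val = 4; // simplification in this sketch: other IUPAC codes collapse to N
        }
        return Character.isLowerCase(c) ? (val | MASK_BIT) : val;
    }

    /** Packs a sequence, first base of each pair in the high nibble; odd length is padded with 0. */
    public static byte[] pack(String seq) {
        byte[] out = new byte[(seq.length() + 1) / 2];
        for (int i = 0; i < seq.length(); i++) {
            int nib = nibble(seq.charAt(i));
            out[i / 2] |= (byte) (i % 2 == 0 ? nib << 4 : nib);
        }
        return out;
    }

    /** Bytes of base data (excluding any header) for n bases at the given bits per base. */
    public static long dataBytes(long n, int bitsPerBase) {
        return (n * bitsPerBase + 7) / 8;
    }
}
```

For example, "ACgTN" packs to the three bytes 0x21, 0xB0, 0x40. The lowercase g keeps its mask bit, and N gets a code of its own, neither of which a 2-bit encoding could hold. The cost is size: dataBytes(n, 4) is twice dataBytes(n, 2).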

  Also, it doesn't support writing a .nib file; again, presumably people
will be using Jim Kent's faToNib program to do that.

  It would need some tweaking to be included in BioJava, because it
returns a plain String of ACGT, instead of a PackedSequence object.
(Probably this would just involve rewriting the setupBuffer() and
addToBuffer() methods in the code.)  Also, the coordinate information
could come from a Range object.

  If similar code is already somewhere in BioJava, please ignore this;
but I couldn't find it with thirty seconds of Googling, so I figured it
hadn't been written...

Josh Burdick
programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia
jburdick at keyfitz.org


_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev
