[Biojava-l] BioJava-X parsing of RichSequences

Richard Holland richard.holland at ebi.ac.uk
Thu May 4 15:45:38 UTC 2006


The UniProt file format has apparently changed since I wrote the parser,
and the date lines now take a different format:

DT   01-OCT-1994, integrated into UniProtKB/Swiss-Prot.
DT   27-APR-2001, sequence version 3.
DT   18-APR-2006, entry version 85.

The parser does not recognise these lines and throws an exception.
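
For anyone who wants to check their own files against the new layout, here is a minimal sketch of how the three DT lines above can be picked apart. This is not the code in CVS. The class and method names (UniProtDateLineSketch, parse, versionOf) are made up for illustration, and it assumes every new-style line is a date, a comma, a short description and a trailing full stop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not the BioJavaX parser. All names here are hypothetical.
class UniProtDateLineSketch {

    // Assumed shape: "DT", spaces, DD-MMM-YYYY, comma, description, optional full stop.
    private static final Pattern NEW_DT =
        Pattern.compile("^DT\\s+(\\d{2}-[A-Z]{3}-\\d{4}),\\s+(.*?)\\.?\\s*$");

    // Matches the two versioned descriptions shown in the example lines.
    private static final Pattern VERSION =
        Pattern.compile("^(?:sequence|entry) version (\\d+)$");

    // Returns {date, description}, or null if the line is not a new-style DT line.
    static String[] parse(String line) {
        Matcher m = NEW_DT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Returns the version number carried by a description, or -1 if there is none.
    static int versionOf(String description) {
        Matcher m = VERSION.matcher(description);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String[] lines = {
            "DT   01-OCT-1994, integrated into UniProtKB/Swiss-Prot.",
            "DT   27-APR-2001, sequence version 3.",
            "DT   18-APR-2006, entry version 85."
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts[0] + " | " + parts[1] + " | " + versionOf(parts[1]));
        }
    }
}
```

A line that does not have the comma after the date makes parse return null, so the caller can decide how to report it.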

UniProt has also changed its Feature Table format, and I've updated the parser for that as well.

I've updated the parser in CVS to (hopefully) cope with this, although
it no longer recognises the old format (which was the same as the
EMBL format). Can someone test it thoroughly please?
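
If you need to read files in both layouts in the meantime, one option is to look at a DT line and decide which style it is before handing it on. The sketch below is only a rough illustration and is not what is in CVS. The names (DtLineDispatchSketch, Style, classify) are hypothetical, and the old EMBL-style shape it assumes, e.g. "DT   01-OCT-1994 (Rel. 30, Created)", is written from memory and should be checked against real files.

```java
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not part of BioJavaX.
class DtLineDispatchSketch {

    enum Style { NEW, OLD, UNKNOWN }

    // New style: date followed directly by a comma and a description.
    private static final Pattern NEW_STYLE =
        Pattern.compile("^DT\\s+\\d{2}-[A-Z]{3}-\\d{4},\\s+.+$");

    // Old style (assumed): date followed by "(Rel. N, description)".
    private static final Pattern OLD_STYLE =
        Pattern.compile("^DT\\s+\\d{2}-[A-Z]{3}-\\d{4}\\s+\\(Rel\\. \\d+, .+\\)$");

    // Classifies a single line; anything that fits neither pattern is UNKNOWN.
    static Style classify(String line) {
        if (NEW_STYLE.matcher(line).matches()) {
            return Style.NEW;
        }
        if (OLD_STYLE.matcher(line).matches()) {
            return Style.OLD;
        }
        return Style.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("DT   18-APR-2006, entry version 85."));
        System.out.println(classify("DT   01-OCT-1994 (Rel. 30, Created)"));
    }
}
```

Reporting UNKNOWN explicitly, rather than guessing, makes it easier to spot the next format change.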

cheers,
Richard

On Thu, 2006-05-04 at 11:48 +0200, Ola Spjuth wrote:
> Richard,
> 
> This is what I tried:
> Class.forName("org.biojavax.bio.seq.io.EMBLFormat");
> Class.forName("org.biojavax.bio.seq.io.EMBLxmlFormat");
> Class.forName("org.biojavax.bio.seq.io.FastaFormat");
> Class.forName("org.biojavax.bio.seq.io.GenbankFormat");
> Class.forName("org.biojavax.bio.seq.io.INSDseqFormat");
> Class.forName("org.biojavax.bio.seq.io.RichSequenceFormat");
> Class.forName("org.biojavax.bio.seq.io.UniProtFormat");
> Class.forName("org.biojavax.bio.seq.io.UniProtXMLFormat");
> 
> Namespace ns = RichObjectFactory.getDefaultNamespace();         
> RichSequenceIterator seqit;
> seqit = RichSequence.IOTools.readFile(new File(MyFilename),ns);
> 
> ArrayList<RichSequence> seqs=new ArrayList<RichSequence>();
> while (seqit.hasNext()){
>   RichSequence rseq=null;
>   Sequence seq=null;
>   rseq = seqit.nextRichSequence();
>   if (rseq!=null)
>      seqs.add(rseq);
> }
> 
> --
> 
> Seems that seqit.hasNext() returns true, but seqit.nextRichSequence()
> throws an exception.
> 
> It works with my FASTA sequences but not with the attached UniProt
> sequence (or else I'm doing something wrong). The test file was attached
> by Mark Southern (thanks Mark!) and works with BioJava's SeqIOTools.
> 
> I'd be glad if you could have a look at it!
> 
> Cheers,
> 
>    .../Ola
> 
> 
> On Wed, 2006-05-03 at 11:18 +0100, Richard Holland wrote:
> > Interesting - the code and file would be useful in trying to work out
> > what is happening.
> > 
> > cheers,
> > Richard
> > 
> > On Wed, 2006-05-03 at 00:24 +0200, Ola Spjuth wrote:
> > > Hi Richard,
> > > 
> > > Thanks a lot, I really appreciate that! I think Bioclipse will serve as
> > > an excellent showcase for what can easily be achieved with Biojava.
> > > 
> > > Another problem I found was that parsing a UniProt format file
> > > resulted in no RichSequences, while it worked with the old BioJava
> > > SeqIOTools. If you like, I can provide the file and the code I used
> > > to read it.
> > > 
> > > Cheers,
> > > 
> > >    .../Ola
> > > 
> > > 
> > > On Tue, 2006-05-02 at 16:33 +0100, Richard Holland wrote:
> > > > Hi Ola. I'll look into implementing something that'll help you. Give me
> > > > a day or two and see what happens... :)
> > > > 
> > > > cheers,
> > > > Richard
> > > > 
> > > > 
> > > > On Tue, 2006-05-02 at 15:15 +0200, Ola Spjuth wrote:
> > > > > Hi,
> > > > > 
> > > > > While implementing a BioJava reader/parser for sequences in
> > > > > Bioclipse [1,2], I have come up with a few questions:
> > > > > 
> > > > > 1) I'd like to use Biojava-X with Bioclipse. Are there any problems
> > > > > running it with Java 1.5 (as is required by Bioclipse)?
> > > > > 
> > > > > 2) I would propose the addition of a readStream(...) method in
> > > > > RichSequence.IOTools in addition to readFile(...). For the Bioclipse
> > > > > project it would be most useful to be able to guess the format of a
> > > > > Stream. As IOTools is marked final it cannot be subclassed.
> > > > > 
> > > > > 3) Is HashBioEntryDB a suitable base object for storing 1-N
> > > > > RichSequences in memory or should I use RichSequence[]? Which solution
> > > > > has the simplest toByte() method for writing to e.g. a File?
> > > > > 
> > > > > So, basically I am looking for the most convenient way of doing:
> > > > > 
> > > > > i)   Read byte[] (from a File containing 1-N sequences) into a base
> > > > > object in memory (HashBioEntryDB or RichSequence[])
> > > > > ii) Write the (HashBioEntryDB or RichSequence[]) to byte[] (and then
> > > > > later to File using Bioclipse-methods)
> > > > > 
> > > > > Cheers,
> > > > > 
> > > > >    .../Ola
> > > > > 
> > > > > [1] http://www.bioclipse.net
> > > > > [2] http://wiki.bioclipse.net
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > > 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



