[Biojava-l] Exception when reading protein fasta file

Ren, Zhen zren@amylin.com
Thu, 17 Oct 2002 15:12:19 -0700


Hi,

Shortening the name of the sequence does make the problem disappear.  I deleted that sequence from my input file anyway, but the same problem then occurred with another sequence, and this one has a fairly short header:

>AAR14362 E.histolytica protein. 167 bp 
mkimkmmkmkkqvqvitqnqaqvinqiinqkqvqvinqkqvqvinqiinqkqvqvinqii
nqkqvqvinqiinqkqvqvinqiinqkqaqvinqiinqkqaqlinqkqaqlinqkqaqli
nqkqaqlinqkqaqlviqminqevvqitiiitlmlhqvhslfsvlsl
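
To check whether header length alone explains which records fail, it may help to list the header length and residue count of every record in the file. The following is only a sketch: the class name FastaRecordSizes and its output layout are made up for illustration, and it uses plain java.io/java.nio rather than any BioJava API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of BioJava): summarizes each FASTA record.
final class FastaRecordSizes {

    /** Returns one "name headerLength residueCount" row per record. */
    static List<String> summarize(List<String> lines) {
        List<String> rows = new ArrayList<>();
        String name = null;
        int headerLength = 0;
        int residues = 0;
        for (String line : lines) {
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (name != null) {
                    rows.add(name + " " + headerLength + " " + residues);
                }
                headerLength = line.length();
                // The name is taken to be the first token after '>'.
                name = line.substring(1).trim().split("\\s+")[0];
                residues = 0;
            } else {
                residues += line.trim().length();
            }
        }
        if (name != null) {
            rows.add(name + " " + headerLength + " " + residues);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        for (String row : summarize(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(row);
        }
    }
}
```

Comparing the rows just before a failing record with the rows of records that parse cleanly would show whether long headers are the only trigger.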

Any idea?  Thanks again.

Zhen

-----Original Message-----
From: Schreiber, Mark [mailto:mark.schreiber@agresearch.co.nz]
Sent: Thursday, October 17, 2002 3:01 PM
To: Ren, Zhen; biojava-l@biojava.org
Subject: RE: [Biojava-l] Exception when reading protein fasta file


Hi -

I suspect what is happening is that the header line of the sequence is
too long for the read-ahead facility of the input stream. When you mark
a point on a stream you can only read ahead a certain distance before
the mark is lost, and a later reset() then fails. I'll see if I can fix
it. The temporary solution is to trim the fasta header to a smaller size.
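
The mark/reset behaviour described above can be reproduced with java.io.BufferedReader on its own. This is a minimal sketch, not the FastaFormat code: the class name MarkResetDemo, the 16-character buffer and the read-ahead limits are arbitrary values chosen for illustration. The Javadoc only says that reset() "may fail" once more than readAheadLimit characters have been read; with OpenJDK's BufferedReader and a buffer this small it does fail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, independent of BioJava.
final class MarkResetDemo {

    /**
     * Marks the start of a 200-character stream, reads charsToRead characters,
     * then reports whether reset() back to the mark still works.
     */
    static boolean resetSucceeds(int readAheadLimit, int charsToRead) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            text.append((char) ('a' + i % 26));
        }
        // A deliberately small buffer (16 chars) forces refills, which is
        // where an over-run mark gets discarded.
        try (BufferedReader in =
                 new BufferedReader(new StringReader(text.toString()), 16)) {
            in.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return true;
            } catch (IOException markLost) {
                return false; // the mark was invalidated by reading too far
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Reading 40 chars past a mark that only allows 8: reset is expected to fail.
        System.out.println("limit 8,  read 40: " + resetSucceeds(8, 40));
        // Reading 40 chars past a mark that allows 64: reset must succeed.
        System.out.println("limit 64, read 40: " + resetSucceeds(64, 40));
    }
}
```

The same pattern would explain why a record parses fine on its own but fails in a larger file: whether the mark survives depends on where the record falls relative to the reader's buffer, not only on the record itself.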

- Mark


> -----Original Message-----
> From: Ren, Zhen [mailto:zren@amylin.com] 
> Sent: Friday, 18 October 2002 10:22 a.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] Exception when reading protein fasta file
> 
> 
> Hi,
> 
> What's wrong with the code below when testing with a file 
> containing this sequence in FASTA format?
> 
> >AAP40098 Sequence at antigenic site of a VP1 capsid protein of foot and mouth disease virus (FMDV) selected from serotypes A24 Cruzeiro, C3 Indaial or O1 BFS. 10 bp
> vhvsgnqhtl
> 
> Here is the exception message I got:
> 
> BioException: Could not read sequence
> java.io.IOException: Can't reset: Mark invalid parseStart=12 bytesRead=512
>         at org.biojava.bio.seq.io.FastaFormat.readSequenceData(FastaFormat.java:173)
>         at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:123)
>         at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
> rethrown as org.biojava.bio.BioException: Could not read sequence
>         at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)
>         at CTerm.computeLength(CTerm.java:61)
>         at CTerm.run(CTerm.java:94)
> 
> The strange thing is that when I run this program against 
> this sequence alone, it works fine.  However, when I run 
> the program against a protein FASTA file containing this 
> sequence, the above exception always occurs.  I really 
> don't see anything special about this sequence.  The file is 
> just a plain text file.  I tried to attach the original file and 
> to include the whole file in this email, but both attempts were 
> rejected by the mailing list.  Please help!  Thanks a lot.
> 
> Zhen
> 
> code:
> 
> import java.io.*;
> import org.biojava.bio.*;
> import org.biojava.bio.seq.*;
> import org.biojava.bio.seq.io.*;
> 
> public class TestSeqIOTools {
> 
>     public static void main(String[] args) {
> 
>         if (args.length != 1) {
>             System.out.println("Usage: java TestSeqIOTools filename.fasta");
>             System.exit(1);
>         }
> 
>         try {
>             // Read every protein sequence in the FASTA file and print its residues.
>             BufferedReader fin = new BufferedReader(new FileReader(args[0]));
>             SequenceIterator stream = SeqIOTools.readFastaProtein(fin);
>             while (stream.hasNext()) {
>                 Sequence seq = stream.nextSequence();
>                 System.out.println(seq.seqString());
>             }
>             fin.close();
>         } catch (BioException e) {
>             System.err.println("BioException: " + e.getMessage());
>             e.printStackTrace();
>             System.exit(1);  // non-zero status: this is a failure
>         } catch (IOException ex) {
>             System.err.println("IOException: " + ex.getMessage());
>             System.exit(1);
>         }
>     }
> }
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 