[Biojava-l] bug with SeqIOTools.readFastaDNA?

Thomas Down thomas at derkholm.net
Mon Jul 21 13:23:28 EDT 2003


Once upon a time, Karin Lagesen wrote:
> I am using the SeqIOTools.readFastaDNA() method to get hold of sequences
> which are stored in a file. These are as far as I can tell in the
> correct fasta format. However, whenever the fasta description line
> contains a parenthesis, like this for instance:
> 
> >(gi|16127994:1080570-1080686, 1080677-1081408)
> 
> this sequence does not get read. Is this a bug or is it a feature? And
> if it is a feature, could somebody tell me how to work around it?

What errors are you seeing?  Or is the sequence just
disappearing completely?

Reading FASTA files containing parentheses works for me.
The one caveat is that BioJava determines the name of
the sequence from the text between the '>' and the
first ' ' character.  So in this case, BioJava will,
by default, name your sequence "(gi|16127994:1080570-1080686,",
which might not be what you want.  
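
To make that naming rule concrete, here is a rough sketch in plain Java with
no BioJava dependency. The class and the nameFromDescriptionLine helper are
made up for illustration and are not BioJava's actual parser; they only
mimic the "text between '>' and the first space" behaviour described above.

```java
final class FastaNameSketch {
    // Hypothetical helper (not a BioJava method): returns the text between
    // the leading '>' and the first space, or the whole line if no space.
    static String nameFromDescriptionLine(String line) {
        // Drop the leading '>' if it is present.
        String body = line.startsWith(">") ? line.substring(1) : line;
        int space = body.indexOf(' ');
        return space < 0 ? body : body.substring(0, space);
    }

    public static void main(String[] args) {
        String line = ">(gi|16127994:1080570-1080686, 1080677-1081408)";
        // prints "(gi|16127994:1080570-1080686,"
        System.out.println(nameFromDescriptionLine(line));
    }
}
```

The space after the comma is what cuts the name short, so the second
coordinate range never makes it into the sequence name.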

Where did you get this file?  Is this another special
case of FASTA format that BioJava ought to understand?

In the meantime, you can get at the complete description
line of a sequence using code like:

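    // readFastaDNA takes a BufferedReader over the FASTA file
    SequenceIterator si = SeqIOTools.readFastaDNA(...);
    while (si.hasNext()) {
        Sequence seq = si.nextSequence();
        // The complete description line is kept as an annotation property
        System.out.println(seq.getAnnotation().getProperty(
            FastaFormat.PROPERTY_DESCRIPTIONLINE
        ));
    }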
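
Once you have the complete description line, you can derive whatever name
suits you. As a sketch only: the helper below is hypothetical plain Java
(not part of BioJava), and it assumes you want the text up to the first ':'
with any leading '>' and '(' stripped. Adjust it to whatever convention
your file really follows.

```java
final class DescriptionNameSketch {
    // Hypothetical helper: pulls a short identifier out of a full
    // description line such as "(gi|16127994:1080570-1080686, ...)".
    static String accessionFrom(String description) {
        String s = description.trim();
        // Tolerate a leading '>' in case the line still carries it.
        if (s.startsWith(">")) {
            s = s.substring(1);
        }
        // Drop the opening parenthesis seen in this particular file.
        if (s.startsWith("(")) {
            s = s.substring(1);
        }
        // Keep everything before the first ':' (the coordinate ranges follow it).
        int colon = s.indexOf(':');
        return colon < 0 ? s : s.substring(0, colon);
    }

    public static void main(String[] args) {
        String desc = "(gi|16127994:1080570-1080686, 1080677-1081408)";
        // prints "gi|16127994"
        System.out.println(accessionFrom(desc));
    }
}
```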

Thomas.

