[Biojava-l] Reading sequence identifier from file in FASTA format

Thu Nov 2 21:37:58 UTC 2006

Hello everybody, this is my first post to the list. I'm a computer science
student and a real newbie in this subject, so I'm experimenting with BioJava
and comparing it with BioPerl...

My problem is that I'm trying to read a FASTA file with
///////////////////
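import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class ReadFasta {
    public static void main(String[] args) throws Exception {
        // Open the NCBI genome file and parse it as FASTA DNA
        BufferedReader input = new BufferedReader(new FileReader("NC_008009.fna"));
        RichSequenceIterator seqIter = RichSequence.IOTools.readFastaDNA(
            input, RichObjectFactory.getDefaultNamespace());

        // Print a few fields of the first record in the file
        if (seqIter.hasNext()) {
            RichSequence rseq = seqIter.nextRichSequence();
            System.out.println("Identifier: " + rseq.getIdentifier());
            System.out.println("Description: " + rseq.getDescription());
            System.out.println("SubList: " + rseq.subList(10, 20).seqString());
        }
        input.close();
    }
}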
///////////////////
but I'm getting "Identifier: null" in the output, where I expected
"Identifier: 94967031". The description and the sublist work fine. The
documentation (http://biojava.org/wiki/BioJava:BioJavaXDocs#Reading) says
that for a header of the form
>gi|<identifier>|<namespace>|<accession>.<version>|<name> <description>
the identifier is read with setIdentifier() and is available through
getIdentifier(), but I get null.
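The documented header layout can be made concrete with a small regular expression. The sketch below is a hypothetical helper written only for illustration (the class name FastaHeader and its pattern are assumptions, not BioJava's actual FASTA parser); it maps each part of a header that follows the layout to its field, and returns null for a header that doesn't fit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of BioJava): splits a header laid out as
// >gi|<identifier>|<namespace>|<accession>.<version>|<name> <description>
public class FastaHeader {
    // Groups: 1=identifier, 2=namespace, 3=accession, 4=version, 5=name, 6=description
    private static final Pattern LAYOUT = Pattern.compile(
        "^>gi\\|([^|]+)\\|([^|]+)\\|([^|.]+)\\.(\\d+)\\|(\\S+)\\s*(.*)$");

    // Returns the six fields in order, or null if the header does not follow the layout
    public static String[] parse(String header) {
        Matcher m = LAYOUT.matcher(header);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < 6; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // A made-up header that follows the documented layout exactly
        String[] f = parse(">gi|12345|ref|NC_000001.2|myseq Some description");
        System.out.println("identifier=" + f[0] + " namespace=" + f[1]
            + " accession=" + f[2] + " version=" + f[3]
            + " name=" + f[4] + " description=" + f[5]);
    }
}
```

With this strict pattern the <name> slot must contain at least one non-space character, which is a choice made for the sketch, not a claim about how BioJava treats that field.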

So... is this file really in FASTA format, or am I doing this the wrong way?

The test file is:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Acidobacteria_bacterium_Ellin345/NC_008009.fna

And the header is:
>gi|94967031|ref|NC_008009.1| Acidobacteria bacterium Ellin345, complete genome
CCGTGTGTTGCGCGGCCAGATGAGAAATTTCTATGTCCCTCTCGACCACGACTCCACCAGCTCCGAACCC
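One way to see how this NCBI header lines up with the documented gi|...|...|...|<name> layout is to split it on `|` and print each field. This is a hypothetical diagnostic written for illustration (HeaderFields is an assumed name, not part of BioJava), and it says nothing about how BioJava's own FASTA parser behaves internally; it only shows where the fields fall.

```java
// Hypothetical diagnostic (not part of BioJava): splits a FASTA header on '|'
// so its fields can be compared with the documented layout
// >gi|<identifier>|<namespace>|<accession>.<version>|<name> <description>
public class HeaderFields {
    // Drops the leading '>' and splits on '|', keeping any trailing empty field
    public static String[] fields(String header) {
        String body = header.startsWith(">") ? header.substring(1) : header;
        return body.split("\\|", -1);
    }

    public static void main(String[] args) {
        String ncbi = ">gi|94967031|ref|NC_008009.1| Acidobacteria bacterium Ellin345, complete genome";
        String[] f = fields(ncbi);
        // Print each field in brackets so leading spaces are visible
        for (int i = 0; i < f.length; i++) {
            System.out.println(i + ": [" + f[i] + "]");
        }
    }
}
```

For this header, field 1 holds the gi number 94967031, and field 4 begins directly with a space: there is no <name> token between the last `|` and the description, which is the visible difference from the documented layout.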

Where can I find a beginner's tutorial for getting started with BioJava?

I'd appreciate any help... thanks.

Farewell
Alan Acosta
Cali - Colombia