[Biojava-l] Newbie question: species info from EMBL?

Andreas Matern andreas.matern@lbri.lionbioscience.com
Wed, 11 Dec 2002 18:35:06 -0500


I should know better, but it's been a while since I've used biojava...

I need to parse an EMBL file and write out, in FASTA format, the
sequences for my organisms of interest.

In bioperl I would do something like:

use Bio::SeqIO;

$in  = Bio::SeqIO->new(-file => "$file",      '-format' => 'EMBL');
$out = Bio::SeqIO->new(-file => ">$file.out", '-format' => 'Fasta');
while ( $seq = $in->next_seq() ) {
   $acc = $seq->accession_number;
   $species = $seq->species();
   $species = $species->binomial();
   # there exists %myfavoritespecies somewhere...
   if (exists($myfavoritespecies{$species})) {
      # use the accession as the FASTA header
      $seq->display_id($acc);
      $out->write_seq($seq);
   }
}

In biojava it's something like (TOTALLY stolen from bioconf.otago.ac.nz):

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;

public class TestEMBLParsing {
    public static void main(String[] args) {
        BufferedReader br = null;

        try {
            br = new BufferedReader(new FileReader(args[0]));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to parse without the input file
        }
        // read the EMBL file
        SequenceIterator sequences = SeqIOTools.readEmbl(br);
        while (sequences.hasNext()) {
            try {
                Sequence seq = sequences.nextSequence();
                String accession = seq.getName();
                String fasta = seq.seqString();
                // how do I check to see if it's my species of interest?
                // how do I create a FASTA output stream?
                System.out.println(accession); // for testing
                System.out.println(fasta); // for testing
            } catch (NoSuchElementException e) {
                e.printStackTrace();
            } catch (BioException e) {
                e.printStackTrace();
            }
        }
    }
}
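
To make the goal concrete, here is a rough plain-Java sketch of the
filtering step that uses no biojava calls at all. It is not the biojava
way of doing this, just a hand-rolled illustration. The class name
EmblSpeciesFilter and the method toFasta are made up for this example.
It assumes the usual EMBL flat-file layout (AC and OS line codes, an SQ
line starting the sequence block, // ending each record) and that the
binomial is simply the first two words of the first OS line.

```java
import java.util.Set;

// Hypothetical helper, not part of biojava: filters EMBL records by species.
final class EmblSpeciesFilter {

    // Takes the lines of an EMBL flat file and returns FASTA text for every
    // record whose binomial (first two words of the first OS line) is in
    // `wanted`. The first accession on the first AC line becomes the header.
    // Each sequence is written on a single, unwrapped line.
    static String toFasta(Iterable<String> lines, Set<String> wanted) {
        StringBuilder out = new StringBuilder();
        StringBuilder residues = new StringBuilder();
        String accession = null;
        String species = null;
        boolean inSequence = false;

        for (String line : lines) {
            if (line.startsWith("//")) {
                // End of record: emit it if the species matches, then reset.
                if (accession != null && species != null && wanted.contains(species)) {
                    out.append('>').append(accession).append('\n')
                       .append(residues).append('\n');
                }
                residues.setLength(0);
                accession = null;
                species = null;
                inSequence = false;
            } else if (inSequence) {
                // Sequence lines: keep letters, drop spaces and the trailing count.
                residues.append(line.replaceAll("[^A-Za-z]", ""));
            } else if (line.startsWith("AC") && accession == null) {
                // "AC   X56734; S46826;" -> "X56734"
                accession = line.substring(2).trim().split(";")[0].trim();
            } else if (line.startsWith("OS") && species == null) {
                // "OS   Trifolium repens (white clover)" -> "Trifolium repens"
                String[] words = line.substring(2).trim().split("\\s+");
                species = words.length >= 2 ? words[0] + " " + words[1] : words[0];
            } else if (line.startsWith("SQ")) {
                // Everything from here to "//" is sequence data.
                inSequence = true;
            }
        }
        return out.toString();
    }
}
```

The lines could come from java.nio.file.Files.readAllLines, and the
returned string could be written straight to the output file. What I
would really like is the equivalent using biojava's own parser and
FASTA writer.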


Thanks so much, sorry if this has been posted somewhere... just couldn't 
find it looking around the website...

-Andreas



--------------
Andreas Matern
Bioinformatician
Bioinformatics - Research and Development
Lion Bioscience Research Inc.
141 Portland Street, 10th floor
Cambridge, MA 02139  USA
Phone: 617-245-5483
Fax: 617-245-5499
amatern@lbri.lionbioscience.com
www.lionbioscience.com