[Biojava-l] Newbie question: species info from EMBL?
Andreas Matern
andreas.matern@lbri.lionbioscience.com
Wed, 11 Dec 2002 18:35:06 -0500
I should know better, but it's been a while since I've used biojava...
I need to parse an EMBL file and write out FASTA for the sequences from
the organisms I'm interested in.
In BioPerl I would do something like:
use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-file => $file,        -format => 'EMBL');
my $out = Bio::SeqIO->new(-file => ">$file.out", -format => 'Fasta');
while ( my $seq = $in->next_seq() ) {
    my $acc     = $seq->accession_number;
    my $species = $seq->species->binomial();
    # there exists a %myfavoritespecies hash somewhere...
    if ( exists $myfavoritespecies{$species} ) {
        $seq->display_id($acc);
        $out->write_seq($seq);
    }
}
In BioJava it's something like (TOTALLY stolen from bioconf.otago.ac.nz):
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;

public class TestEMBLParsing {
    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(args[0]));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to parse without the input file
        }
        // read the EMBL file
        SequenceIterator sequences = SeqIOTools.readEmbl(br);
        while (sequences.hasNext()) {
            try {
                Sequence seq = sequences.nextSequence();
                String accession = seq.getName();
                String fasta = seq.seqString();
                // how do I check to see if it's my species of interest?
                // how do I create a FASTA output stream?
                System.out.println(accession); // for testing
                System.out.println(fasta);     // for testing
            } catch (NoSuchElementException e) {
                e.printStackTrace();
            } catch (BioException e) {
                e.printStackTrace();
            }
        }
    }
}
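
For reference, here is one possible shape for the two open questions in the
comments above (species check, FASTA output). It is only a rough sketch and it
deliberately does NOT use the BioJava API: it reads the EMBL flat-file line
codes directly (AC for accession, OS for organism, SQ for the start of the
sequence, // for the end of an entry). The class name EmblSpeciesToFasta and
the methods filter() and binomial() are made up for illustration. It assumes
the first two words of the OS line are the binomial name, and it writes each
sequence on a single line rather than wrapping it.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of BioJava: filters EMBL entries by species
// and renders the matching ones as FASTA text.
public class EmblSpeciesToFasta {

    /** Returns FASTA text for every entry whose OS binomial is in wanted. */
    public static String filter(BufferedReader in, Set<String> wanted) throws IOException {
        StringBuilder out = new StringBuilder();
        StringBuilder residues = new StringBuilder();
        String accession = null;
        String species = null;
        boolean inSequence = false;

        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of entry: emit it if the species matched, then reset state.
                if (accession != null && species != null && wanted.contains(species)) {
                    out.append('>').append(accession).append('\n');
                    out.append(residues).append('\n');
                }
                accession = null;
                species = null;
                residues.setLength(0);
                inSequence = false;
            } else if (inSequence) {
                // Sequence lines hold blocks of residues plus a trailing
                // position count; keep the letters only.
                for (char c : line.toCharArray()) {
                    if (Character.isLetter(c)) {
                        residues.append(c);
                    }
                }
            } else if (line.startsWith("AC") && accession == null) {
                // Use the first accession on the first AC line, e.g. "AC   X56734; S46826;"
                String body = line.substring(2).trim();
                int semi = body.indexOf(';');
                accession = semi >= 0 ? body.substring(0, semi) : body;
            } else if (line.startsWith("OS") && species == null) {
                species = binomial(line.substring(2).trim());
            } else if (line.startsWith("SQ")) {
                inSequence = true;
            }
        }
        return out.toString();
    }

    /** Assumption: the first two words of the OS line form the binomial name. */
    static String binomial(String os) {
        String[] words = os.split("\\s+");
        return words.length >= 2 ? words[0] + " " + words[1] : os;
    }

    // Usage: java EmblSpeciesToFasta file.embl "Homo sapiens" "Mus musculus"
    public static void main(String[] args) throws IOException {
        Set<String> wanted = new HashSet<>(Arrays.asList(args).subList(1, args.length));
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            System.out.print(filter(br, wanted));
        }
    }
}
```

Redirecting stdout to a file gives the equivalent of the "$file.out" stream in
the BioPerl version. Matching on the OS line by hand is a stand-in for whatever
species accessor BioJava itself provides.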
Thanks so much, sorry if this has been posted somewhere... just couldn't
find it looking around the website...
-Andreas
--------------
Andreas Matern
Bioinformatician
Bioinformatics - Research and Development
Lion Bioscience Research Inc.
141 Portland Street, 10th floor
Cambridge, MA 02139 USA
Phone: 617-245-5483
Fax: 617-245-5499
amatern@lbri.lionbioscience.com
www.lionbioscience.com