[Biojava-l] sort fasta file

xyz mitlox at op.pl
Thu Mar 25 13:23:37 UTC 2010


Hi James,
Thank you for the solution, but I get this 
7
13
23
30
as output for this input file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc

How can I fix this, and why did you choose Comparator rather than
Comparable?

Thank you in advance.

Best regards,


On Sun, 21 Mar 2010 16:56:35 -0400
James Swetnam <jswetnam at gmail.com> wrote:

> Just hacked this together, warning: I am new to both java and biojava.
> 
> import java.io.*;
> import java.util.*;
> 
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
> 
> import java.util.Comparator;
> 
> public class SortFasta {
> 
>     static private class RichSequenceComparator implements
> Comparator<RichSequence> {
> 
>     public int compare(RichSequence seq1, RichSequence seq2)
>     {
>         return seq1.length() - seq2.length();
>     }
> 
> 
>     }
> 
>     // Usage:  SortFasta unsortedFile.fasta
>     public static void main(String[] args) throws
> FileNotFoundException, BioException {
> 
>     String fastaFile = args[0];
> 
>     BufferedReader br = new BufferedReader(new FileReader(fastaFile));
>     SimpleNamespace ns = new SimpleNamespace("biojava");
> 
>     Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
> 
>     RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
>                                   protein.getTokenization("token"),
>                                   ns);
> 
>     SortedSet<RichSequence> sorted = new TreeSet<RichSequence>( new
> SortFasta.RichSequenceComparator());
> 
>     while (rsi.hasNext()) {
>         sorted.add(rsi.nextRichSequence());
>     }
> 
>     Iterator<RichSequence> sortedIt = sorted.iterator();
> 
>     // Do whatever you want here with the ascending list of
>     // RichSequences by length, I'll just print them.
>     while(sortedIt.hasNext())
>         {
>         System.out.println(((RichSequence) sortedIt.next()).length());
>         }
>     }
> }
> 
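
A likely explanation for the missing fifth line: a TreeSet treats two
elements as duplicates whenever its comparator returns 0, so a comparator
that looks only at length keeps just one of the two 30-residue sequences.
The sketch below reproduces this with plain strings in place of
RichSequence objects (an assumption made so that it runs without BioJava),
and shows one possible fix, sorting a List, which keeps ties. The class
and method names are made up for illustration. On the second question:
Comparable has to be implemented by the element class itself, and
RichSequence is a BioJava type you do not control, so an external
Comparator is the natural choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortByLengthDemo {

    // Orders strings by length only, like the comparator in the quoted code.
    static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length);

    // TreeSet approach: elements that compare as equal are silently dropped.
    public static List<Integer> lengthsViaTreeSet(List<String> seqs) {
        TreeSet<String> sorted = new TreeSet<String>(BY_LENGTH);
        sorted.addAll(seqs);
        List<Integer> lengths = new ArrayList<Integer>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    // List approach: sorting a copy keeps every element, including ties.
    public static List<Integer> lengthsViaSortedList(List<String> seqs) {
        List<String> sorted = new ArrayList<String>(seqs);
        Collections.sort(sorted, BY_LENGTH);
        List<Integer> lengths = new ArrayList<Integer>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        // The five sequences from the input file in the message.
        List<String> seqs = Arrays.asList(
                "atccccc",
                "atccccctttttt",
                "atccccccccccccccccctttt",
                "tttttttccccccccccccccccccccccc",
                "tttttttccccccccccccccccccccccc");

        // The set version loses one of the two equal-length sequences.
        System.out.println(lengthsViaTreeSet(seqs));
        // The list version reports a length for every sequence.
        System.out.println(lengthsViaSortedList(seqs));
    }
}
```

In the original program the same idea would mean collecting the
RichSequence objects into an ArrayList and calling Collections.sort with
the existing RichSequenceComparator. Another option would be to keep the
TreeSet and break ties in the comparator, for example on the sequence
name, so that it returns 0 only for sequences that really are the same.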


