[Biojava-l] sort fasta file

James Swetnam jswetnam at gmail.com
Sun Mar 21 20:56:35 UTC 2010


I just hacked this together. Warning: I am new to both Java and BioJava.

import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

import java.util.Comparator;

public class SortFasta {

    // Orders sequences by length, shortest first.
    static private class RichSequenceComparator implements Comparator<RichSequence> {

        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq1.length() - seq2.length();
        }
    }

    // Usage:  SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException,
                          BioException {

    String fastaFile = args[0];

    BufferedReader br = new BufferedReader(new FileReader(fastaFile));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");

    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
                                  protein.getTokenization("token"),
                                  ns);

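    // A TreeSet would treat sequences of equal length as duplicates and keep
    // only one of them, so collect everything in a list and sort that instead.
    List<RichSequence> sorted = new ArrayList<RichSequence>();

    while (rsi.hasNext()) {
        sorted.add(rsi.nextRichSequence());
    }

    // Ascending by length; wrap the comparator in Collections.reverseOrder(...)
    // to get the longest sequence first.
    Collections.sort(sorted, new SortFasta.RichSequenceComparator());

    // Do whatever you want here with the sorted RichSequences; I'll just
    // print their lengths.
    for (RichSequence seq : sorted) {
        System.out.println(seq.length());
    }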
    }
}
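
For what it's worth, here is a minimal sketch of the same sorting idea with no BioJava dependency, so it can be run on its own. Plain Strings stand in for RichSequence objects, and the names LengthSortSketch, BY_LENGTH and longestFirst are made up for illustration; none of them come from BioJava. It shows two things: a TreeSet ordered by a length-only comparator keeps just one sequence per length, and a list sort with the comparator reversed gives the longest-first order asked for in the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class LengthSortSketch {

    // Compares by length only, like a comparator on RichSequence.length().
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Returns a new list ordered longest first. The list sort is stable, so
    // entries of equal length are all kept, in their original relative order.
    static List<String> longestFirst(List<String> seqs) {
        List<String> sorted = new ArrayList<String>(seqs);
        sorted.sort(BY_LENGTH.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sequences; "MKV" and "GAT" have the same length.
        List<String> seqs = Arrays.asList("MKV", "MKVLA", "GAT", "M");

        // A TreeSet considers elements equal when the comparator returns 0,
        // so "GAT" is silently dropped here.
        TreeSet<String> set = new TreeSet<String>(BY_LENGTH);
        set.addAll(seqs);
        System.out.println(set.size());          // prints 3

        System.out.println(longestFirst(seqs));  // prints [MKVLA, MKV, GAT, M]
    }
}
```

With real RichSequence objects the same pattern should carry over by swapping String for RichSequence and comparing on length().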

On Sat, Mar 20, 2010 at 6:17 AM, xyz <mitlox at op.pl> wrote:

> Hello,
> I would like to sort a multi-FASTA file by sequence length,
> i.e. from the read with the longest sequence to the read with the
> shortest sequence.
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
>
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
>  public static void main(String[] args) throws FileNotFoundException,
>  BioException {
>
>    BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
>    SimpleNamespace ns = new SimpleNamespace("biojava");
>
>    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);
>
>    while (rsi.hasNext()) {
>      RichSequence rs = rsi.nextRichSequence();
>      System.out.println(rs.getName());
>      System.out.println(rs.seqString());
>    }
>  }
> }
>
> I have tried to do it, but I do not know how to continue.
>
> Thank you in advance.
>
> Best regards,


