[Biojava-l] Biojava-l Digest, Vol 102, Issue 4

Fri Jul 8 05:09:08 UTC 2011

The code follows. It is adapted from the BioJava Cookbook with small modifications. The method below is called from another class and takes a list of FASTA file names as its argument.

public void MSAFromFiles(List<String> ids) throws Exception {
    // Read one protein sequence per input file.
    List<ProteinSequence> lst = new ArrayList<ProteinSequence>();
    for (String id : ids) {
        ProteinSequence pSeq = getSequenceFromFiles(id);
        lst.add(pSeq);
        //System.out.println("seq==" + pSeq);
    }
    // 'profile' is declared elsewhere in the class (declaration not shown here).
    profile = Alignments.getMultipleSequenceAlignment(lst);
}

The getSequenceFromFiles() method is given below:

private ProteinSequence getSequenceFromFiles(String inputFile) throws Exception {
    ProteinSequence seq = null;
    //System.out.println("inputFile===" + inputFile);
    FileInputStream is = new FileInputStream(inputFile);
    try {
        FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
                new FastaReader<ProteinSequence, AminoAcidCompound>(
                        is,
                        new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                        new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
        LinkedHashMap<String, ProteinSequence> proteinSequences = fastaReader.process();
        //LinkedHashMap<String, ProteinSequence> a = FastaReaderHelper.readFastaProteinSequence(new File(inputFile));
        //FastaReaderHelper.readFastaDNASequence for DNA sequences
        // Note: if the file holds several records, only the last one is returned.
        for (Entry<String, ProteinSequence> entry : proteinSequences.entrySet()) {
            seq = new ProteinSequence(entry.getValue().getSequenceAsString());
            seq.setAccession(entry.getValue().getAccession());
        }
    } finally {
        is.close(); // release the file handle even if parsing fails
    }
    return seq;
}
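
As a side note on the loop above: it walks every record in the file but keeps only the last one. The plain-Java sketch below is not BioJava code; the class SimpleFasta and its parse() method are hypothetical names made up for illustration. It shows the same "header to sequence" map idea on a FASTA string and how the first record can be picked explicitly when a file contains more than one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the BioJava FastaReader.
final class SimpleFasta {

    /** Parses FASTA text into an insertion-ordered map of header -> sequence. */
    static LinkedHashMap<String, String> parse(String fastaText) {
        LinkedHashMap<String, String> records = new LinkedHashMap<String, String>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : fastaText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, seq.toString()); // close previous record
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequence may span several lines
            }
        }
        if (header != null) {
            records.put(header, seq.toString()); // close the final record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up example data.
        String text = ">seq1\nMKT\nAYI\n>seq2\nGGA\n";
        Map<String, String> records = parse(text);
        // Take the first record explicitly instead of whichever comes last.
        Map.Entry<String, String> first = records.entrySet().iterator().next();
        System.out.println(first.getKey() + " = " + first.getValue());
    }
}
```

If each input file is known to hold exactly one sequence, the first and last record coincide and the original loop behaves the same way.
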
After obtaining the Profile object, I wrote the following code to display the number of gaps:

List<AlignedSequence<ProteinSequence, AminoAcidCompound>> listOfalSeq = profile.getAlignedSequences();

AlignedSequence<ProteinSequence, AminoAcidCompound> alSeq;
String accessionId;
int noOfcompounds = 0;
int numOfGaps = 0;
StringBuilder html = new StringBuilder(
        "<html><body><table border=1><tr><td>Accession Id</td><td>Number of gaps</td></tr>");
for (int i = 0; i < listOfalSeq.size(); i++) {
    alSeq = listOfalSeq.get(i);
    accessionId = alSeq.getAccession().getID();
    noOfcompounds = alSeq.countCompounds();
    numOfGaps = alSeq.getNumGaps();
    // One table row per aligned sequence: accession ID and reported gap count.
    html.append("<tr><td>");
    html.append(accessionId);
    html.append("</td><td>");
    html.append(numOfGaps);
    html.append("</td></tr>");
    //System.out.println("accessionId==" + accessionId);
    //pSeq=new ProteinSequence(seq.getSequenceAsString(),seq.getCompoundSet());
    //pSeq.setAccession(seq.getAccession());
    //multipleSequenceAlignment.addAlignedSequence(pSeq);
}
html.append("</table></body></html>");
setText(html.toString());
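
To make the two numbers in question comparable, it can help to count the '-' characters directly. One possible source of a mismatch, stated here purely as an assumption and not confirmed anywhere in this thread, is that a gap count may refer to contiguous runs of '-' while the displayed string shows every individual gap position. The sketch below uses plain strings only; the class GapCounts and its methods are hypothetical names, not BioJava API.

```java
// Hypothetical helper for illustration; not part of BioJava.
final class GapCounts {

    /** Counts every individual '-' character in an aligned sequence string. */
    static int countGapPositions(String aligned) {
        int positions = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) == '-') {
                positions++;
            }
        }
        return positions;
    }

    /** Counts maximal runs of consecutive '-' characters. */
    static int countGapRuns(String aligned) {
        int runs = 0;
        boolean inGap = false;
        for (int i = 0; i < aligned.length(); i++) {
            boolean isGap = aligned.charAt(i) == '-';
            if (isGap && !inGap) {
                runs++; // a new run starts here
            }
            inGap = isGap;
        }
        return runs;
    }

    public static void main(String[] args) {
        String aligned = "MK--TAY---IAK-QR"; // made-up aligned sequence
        System.out.println("gap positions: " + countGapPositions(aligned)); // 6
        System.out.println("gap runs:      " + countGapRuns(aligned));      // 3
    }
}
```

In the loop above, the string passed to these helpers could be alSeq.getSequenceAsString(); comparing both results with alSeq.getNumGaps() for each sequence would show which of the two, if either, the method reports.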

Here setText() is the method of JEditorPane (or of JTextPane, which extends it).
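
The table-building step can also be separated from the Swing component, which makes it easier to check on its own. The following is a minimal sketch under that assumption; GapTableHtml, escape() and build() are hypothetical names, and the accession IDs and counts are made up. It additionally escapes the ID text, in case a FASTA header contains characters such as '<' or '&'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of BioJava or Swing.
final class GapTableHtml {

    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a two-column table: accession ID and number of gaps. */
    static String build(Map<String, Integer> gapsById) {
        StringBuilder html = new StringBuilder(
                "<html><body><table border=1>"
                + "<tr><td>Accession Id</td><td>Number of gaps</td></tr>");
        for (Map.Entry<String, Integer> e : gapsById.entrySet()) {
            html.append("<tr><td>").append(escape(e.getKey()))
                .append("</td><td>").append(e.getValue())
                .append("</td></tr>");
        }
        return html.append("</table></body></html>").toString();
    }

    public static void main(String[] args) {
        // Insertion-ordered map so rows appear in alignment order.
        Map<String, Integer> gaps = new LinkedHashMap<String, Integer>();
        gaps.put("P12345", 3);   // made-up accession IDs and gap counts
        gaps.put("sp|Q9<X>", 7);
        System.out.println(build(gaps));
    }
}
```

The resulting string can be passed to setText(). For the pane to render the markup instead of showing it as text, its content type generally needs to be HTML, for example by calling setContentType("text/html") beforehand.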

Tariq, Phd Scholar

> From: biojava-l-request at lists.open-bio.org
> Subject: Biojava-l Digest, Vol 102, Issue 4
> To: biojava-l at lists.open-bio.org
> Date: Thu, 7 Jul 2011 12:00:04 -0400
> 
> Today's Topics:
> 
>    1. BioJava Gene Hierarchies (Daniel Di Giulio)
>    2. Page creation in Biojava (Muhammad Tariq Pervez)
>    3. Re: No. of gaps in aligned sequences (Andreas Prlic)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 6 Jul 2011 12:07:07 -0400
> From: Daniel Di Giulio <drd6y at virginia.edu>
> Subject: [Biojava-l] BioJava Gene Hierarchies
> To: biojava-l at lists.open-bio.org
> Message-ID:
> 	<CAEb=YSPftXrqTGYqFueBsGPJnHQTNorP32WaFU91Seso-zRbAA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hello,
> 
> I'm currently using BioJava to upgrade a eukaryotic gene finder program
> (EVIGAN) to be compatible with the GFF3 formats.  Your BioJava genome
> package is very useful, but I had a question about implementing a sort of
> gene hierarchy from parsed files.  Essentially, I would like to be able to
> read in a GFF3 file of a region of interest, parse out the CDS segments, and
> then create a hierarchy of genes from the attribute tags, which I can then
> employ later in my program.  It seems as if the
> org.biojava3.genome.parsers.gff class is good for this, but there doesn't
> seem to be a data structure for organizing related "Feature" objects into a
> higher grouping based on similar attributes.  Does anyone know of a way to
> implement this, or a package within BioJava which could be useful?
> 
> Thanks a lot,
> Daniel
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 7 Jul 2011 08:48:08 +0000
> From: Muhammad Tariq Pervez <tariq_cp at hotmail.com>
> Subject: [Biojava-l] Page creation in Biojava
> To: <biojava-l at lists.open-bio.org>
> Message-ID: <SNT131-w40215BD2872B570D8D3A7FC410 at phx.gbl>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> 
> Hi, All
> I want to contribute to the BioJava CookBook. I already have a login account and can create internal and external links, but I do not know what to do next. How can I create a page and connect it to an internal or external link?
> 
> Regards.
>  
> 
> Tariq, Phd Scholar
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 7 Jul 2011 08:10:53 -0700
> From: Andreas Prlic <andreas at sdsc.edu>
> Subject: Re: [Biojava-l] No. of gaps in aligned sequences
> To: Muhammad Tariq Pervez <tariq_cp at hotmail.com>
> Cc: biojava-dev at biojava.org, biojava-l at biojava.org
> Message-ID:
> 	<CALthepw15crxKRkk5sYOdRruMCt_3xBrTdNa4=W7iCVy6KkBvg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi Tariq,
> 
> Can you send us the sample code / DB accession IDs so we can try to
> reproduce this?
> 
> Andreas
> 
> On Wed, Jul 6, 2011 at 4:37 AM, Muhammad Tariq Pervez
> <tariq_cp at hotmail.com> wrote:
> >
> >
> > Hi, Dear all,
> > I am developing an MSA application with BioJava and would like to clarify one thing. When two or more protein sequences are aligned, an aligned sequence shows more '-' characters than the number of gaps reported by alSeq.getNumGaps(), where 'alSeq' is an aligned sequence. For example, an aligned sequence may contain 50 '-' characters while the method reports only 30. What accounts for the difference between these two results?
> >
> > Best Regards
> >
> >
> > Tariq, Phd Scholar
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> 