[Biojava-l] No. of gaps in aligned sequences

Muhammad Tariq Pervez tariq_cp at hotmail.com
Fri Jul 8 05:09:51 UTC 2011


The code is as follows. It is taken from the BioJava Cookbook with minor
modifications. The following method is called from another class and takes
a list of FASTA file names as its argument.



public void MSAFromFiles(List<String> ids) throws Exception {
    // Read one protein sequence from each input file.
    List<ProteinSequence> lst = new ArrayList<ProteinSequence>();
    for (String id : ids) {
        lst.add(getSequenceFromFiles(id));
    }

    // 'profile' is declared elsewhere (presumably a field of the enclosing class).
    profile = Alignments.getMultipleSequenceAlignment(lst);
}


The getSequenceFromFiles() method is given below:



private ProteinSequence getSequenceFromFiles(String inputFile) throws Exception {
    ProteinSequence seq = null;
    FileInputStream is = new FileInputStream(inputFile);
    try {
        FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
                new FastaReader<ProteinSequence, AminoAcidCompound>(
                        is,
                        new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                        new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
        LinkedHashMap<String, ProteinSequence> proteinSequences = fastaReader.process();

        // Copy each record into a fresh ProteinSequence, keeping its accession.
        // If the file holds several records, only the last one is returned.
        // (Use FastaReaderHelper.readFastaDNASequence for DNA sequences.)
        for (Entry<String, ProteinSequence> entry : proteinSequences.entrySet()) {
            seq = new ProteinSequence(entry.getValue().getSequenceAsString());
            seq.setAccession(entry.getValue().getAccession());
        }
    } finally {
        is.close(); // release the file handle even if parsing fails
    }
    return seq;
}

After obtaining the Profile object, I wrote the following code to display the number of gaps:



List<AlignedSequence<ProteinSequence, AminoAcidCompound>> listOfalSeq = profile.getAlignedSequences();

// Build an HTML table with one row per aligned sequence.
StringBuilder html = new StringBuilder(
        "<html><body><table border=1><tr><td>Accession Id</td><td>Number of gaps</td></tr>");

for (AlignedSequence<ProteinSequence, AminoAcidCompound> alSeq : listOfalSeq) {
    String accessionId = alSeq.getAccession().getID();
    int numOfGaps = alSeq.getNumGaps(); // the value that looks too small

    html.append("<tr><td>");
    html.append(accessionId);
    html.append("</td><td>");
    html.append(numOfGaps);
    html.append("</td></tr>");
}

html.append("</table></body></html>");
setText(html.toString());



setText() is the method of JEditorPane (or JTextPane) used to display the HTML.
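
For comparison, the '-' characters can also be counted directly from the
aligned row. The sketch below is a minimal, self-contained illustration and
does not call BioJava: the class GapCount, its two methods, and the sample
row are hypothetical names introduced only for this example. It assumes the
aligned row is available as a plain string (for example from
alSeq.getSequenceAsString()) and counts gaps in two ways, per position and
per contiguous run. Whether getNumGaps() uses the run-based count is an
assumption, not something confirmed in this thread, but a run-based count
would be smaller than the number of '-' characters.

```java
// Hypothetical helper, not part of BioJava: two ways to count gaps in one aligned row.
final class GapCount {

    /** Counts every gap character ('-') in the aligned row. */
    static int countGapPositions(String alignedRow) {
        int positions = 0;
        for (int i = 0; i < alignedRow.length(); i++) {
            if (alignedRow.charAt(i) == '-') {
                positions++;
            }
        }
        return positions;
    }

    /** Counts contiguous runs of '-' (each run counted once, whatever its length). */
    static int countGapRuns(String alignedRow) {
        int runs = 0;
        boolean inGap = false;
        for (int i = 0; i < alignedRow.length(); i++) {
            boolean isGap = alignedRow.charAt(i) == '-';
            if (isGap && !inGap) {
                runs++; // a new run starts here
            }
            inGap = isGap;
        }
        return runs;
    }

    public static void main(String[] args) {
        String row = "MK--LV---A-G"; // made-up aligned row for illustration
        System.out.println("gap positions = " + countGapPositions(row)); // 6
        System.out.println("gap runs      = " + countGapRuns(row));      // 3
    }
}
```

Printing both numbers next to the value returned by getNumGaps() for each
aligned sequence would show which of the two counts, if either, the method
matches.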


Tariq, Phd Scholar

Muhammad Tariq Pervez

Assistant Professor,
Department of Computer Science
Virtual University of Pakistan, Lahore
Tel: (042) 9203114-7  
URL: www.vu.edu.pk
Mobile: +923364120541, +923214602694


> Date: Thu, 7 Jul 2011 08:10:53 -0700
> Subject: Re: [Biojava-l] No. of gaps in aligned sequences
> From: andreas at sdsc.edu
> To: tariq_cp at hotmail.com
> CC: biojava-l at biojava.org; biojava-dev at biojava.org
> 
> Hi Tariq,
> 
> Can you send us the sample code / DB accession IDs so we can try to
> reproduce this?
> 
> Andreas
> 
> On Wed, Jul 6, 2011 at 4:37 AM, Muhammad Tariq Pervez
> <tariq_cp at hotmail.com> wrote:
> >
> >
> > Hi, Dear all,
> > I am developing an MSA application using BioJava and would like to clarify one thing. When two or more protein sequences are aligned, the '-' character appears more times in an aligned sequence than the number of gaps reported by alSeq.getNumGaps(), where 'alSeq' is an aligned sequence. For example, an aligned sequence may actually contain 50 '-' characters while the method reports only 30. What accounts for the difference between these two results?
> >
> > Best Regards
> >
> >
> > Tariq, Phd Scholar
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >


