[Bioperl-l] Bio::SimpleAlign problems

Brian Osborne brian_osborne at cognia.com
Tue Sep 9 08:02:41 EDT 2003


Peter,

AlignIO will attempt to guess the format based on the file suffix. Here are
the rules, from Bio/AlignIO.pm:

  return 'fasta'   if /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i;
  return 'msf'     if /\.(msf|pileup|gcg)$/i;
  return 'pfam'    if /\.(pfam|pfm)$/i;
  return 'selex'   if /\.(selex|slx|selx|slex|sx)$/i;
  return 'phylip'  if /\.(phylip|phlp|phyl|phy|phy|ph)$/i;
  return 'nexus'   if /\.(nexus|nex)$/i;
  return 'mega'     if( /\.(meg|mega)$/i );
  return 'clustalw' if( /\.aln$/i );
  return 'meme'     if( /\.meme$/i );
  return 'emboss'   if( /\.(water|needle)$/i );
  return 'psi'      if( /\.psi$/i );
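
To see which format a given file name would be guessed as, the rules above
can be tried outside of BioPerl. This is only a minimal standalone sketch:
guess_format is a hypothetical helper written for illustration, not the
module's own routine, and it simply copies the patterns quoted above. It
returns undef when no suffix matches. What AlignIO itself does in that case
may depend on your BioPerl version.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): applies the suffix rules
# quoted from Bio/AlignIO.pm to a file name.
# Returns the format name, or undef if no suffix matches.
sub guess_format {
    my ($file) = @_;
    return unless defined $file;
    for ($file) {    # alias $_ to the file name so the bare regexes apply
        return 'fasta'    if /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i;
        return 'msf'      if /\.(msf|pileup|gcg)$/i;
        return 'pfam'     if /\.(pfam|pfm)$/i;
        return 'selex'    if /\.(selex|slx|selx|slex|sx)$/i;
        return 'phylip'   if /\.(phylip|phlp|phyl|phy|ph)$/i;
        return 'nexus'    if /\.(nexus|nex)$/i;
        return 'mega'     if /\.(meg|mega)$/i;
        return 'clustalw' if /\.aln$/i;
        return 'meme'     if /\.meme$/i;
        return 'emboss'   if /\.(water|needle)$/i;
        return 'psi'      if /\.psi$/i;
    }
    return;
}

# Report the guess for a few example file names.
for my $file (qw(testaln.aln testaln.msf testaln.phy testaln.mase)) {
    my $format = guess_format($file);
    printf "%-14s => %s\n", $file, defined $format ? $format : '(no match)';
}
```

If the suffix of your file is not covered by these rules (for example .mase),
or you would rather not rely on the guess, you can pass the format
explicitly, e.g. Bio::AlignIO->new(-file => 'testaln.aln', -format =>
'clustalw').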

If you suspect that AlignIO isn't parsing your alignment files correctly
then you may want to compare them to files that it certainly can parse.
These files are in the t/ directory and they're used by AlignIO.t, so
they're parseable:

  data/testaln.fasta
  data/testaln.pfam
  data/testaln.mase
  data/testaln.phylip
  data/testaln.prodom
  data/testaln.msf
  data/testaln.selex
  data/testaln.nexus

It's odd that there's no clustalw file there. I will add a test for it in
AlignIO.t.

Brian O.


-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Peter Stogios
Sent: Monday, September 08, 2003 3:22 PM
To: bioperl-l at portal.open-bio.org
Subject: [Bioperl-l] Bio::SimpleAlign problems

Hello,

I would like some help with Bio::SimpleAlign and Bio::AlignIO.

I am trying to do some VERY simple tasks but the AlignIO module is being
difficult in reading alignment files.  It does not seem to read many
formats correctly.

I am using the sample code included at the SimpleAlign documentation page.
The code of interest is:

$str = Bio::AlignIO->new('-file' => 'testaln.aln');
$aln = $str->next_aln();
print $aln->no_residues, "\n";
print $aln->no_sequences, "\n";

I have tried loading ClustalX format version 1.81, ClustalW version 1.5,
MSF format, and PHYLIP format, without having success.  I am sure the
alignment files are in their correct formats, since other programs can
read them.

Can someone please inform me what is the preferred format for reading by
AlignIO and SimpleAlign?  Also, should I specify the format of the
alignment in the Bio::AlignIO->new line?

Thank you very much in advance,

Peter Stogios

--
____________________________________________________________
Peter Stogios               |  Ontario Cancer Institute
Graduate Student            |  Princess Margaret Hospital
G. Prive Lab                |  610 University Ave. Rm.7-207
Dept. of Medical Biophysics |  M5G 2M9
University of Toronto       |  (416) 946-2000 ex. 5615

pstogios at uhnres.utoronto.ca
http://xtal.uhnres.utoronto.ca/prive
____________________________________________________________




