[Bioperl-l] alignIO::fasta bug

Jason Stajich jason at bioperl.org
Thu Jan 22 21:33:15 UTC 2009


FYI - it appears that if the last sequence in a FASTA MSA is all gaps  
(which happens in some whole genome synteny+multiple alignment  
chunking like Mercator) no alignment is returned.

yuck. It basically comes down to this bit of code where $end would  
equal sequence length.

	#  If $end <= 0, we have either reached the end of
	#  file in <> or we have encountered some other error
	if ( $end <= 0 &&  ) {
		undef $aln;
		return $aln;
	}

This start/end requirement of locatable seq is nice but kind of a pain  
where I am managing the map of sequences outside of alignment chunk.

Why not just check to see that the number of seq characters is 0 - an  
all-gapped sequence as the last sequence of the file should still be  
legal.

Instead:
	if ( length($seqchar) == 0 ) {
		undef $aln;
		return $aln;
	}

Although that would invalidate an empty alignment like this -- do we  
want to still permit these?
 >A

 >B

 >C



Also I've locally implemented possibility of parsing start/end from  
the header line that is part of Mercator output and I think a variant  
of UCSC headers that
look like this for MFA.
 >Cp scaffold_1.3086:644980-660265+

I can commit this to main-trunk so as to not interfere with branch  
release.  Do any of the LocatableSeq/AlignIO::Fasta maintainers want  
to comment?

-jason



More information about the Bioperl-l mailing list