[Bioperl-l] Is it possible to do contig alignments?
Chris Fields
cjfields at uiuc.edu
Fri Aug 24 17:16:10 UTC 2007
On Aug 24, 2007, at 11:07 AM, Florent Angly wrote:
...
> De-Jian,ZHAO wrote:
>> How do you pad the sequences with gaps manually? Just replace the
>> hyphens with blanks? If yes, you can program in perl to automate
>> this process.
>>
> How do I pad the sequences manually?? I calculate how many gaps have
> to go left and right of the aligned sequence based on its length, its
> position in the aligned consensus and the consensus length.
> my $newseq = '-' x $leftnum . $seq . '-' x $rightnum;
> By the way, the sequences cannot be stored with blanks in them...
>
> I think the best way to provide an out-of-the-box solution for
> displaying contigs the described way would be to _not_ use Bio::Align
> at all, but rather to create a new assembly IO module like
> Bio::Assembly::IO::simpleout for example. That would be useful.
>
> The reason I wanted to visualize these contigs is that I made a
> Bio::Assembly::IO module for TIGR Assembler files that I intend to
> submit to BioPerl. I wanted to make sure first that I did not have
> any obvious bugs in my contig coordinates. I've read the documentation
> on the Wiki, so if a BioPerl developer would like to step up and
> contact me directly to check my code, that would be nice =)
>
> Florent
A similar question has been asked before:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/2827/focus=2869
Jason's suggestion was to add a Bio::Assembly::Contig method, get_aln(),
that produces a Bio::SimpleAlign object containing appropriately padded
seqs suitable for AlignIO output. However, the method was never
implemented.
Personally, I would approach this by implementing the
Contig::get_aln() method, padding with bioperl-compliant alignment gap
symbols (currently -.*?=~), so that anyone who wanted to could write to
any AlignIO-implemented format (MSF, Clustal, etc). Then, in your
Bio::Assembly::IO::simpleout module, implement write_assembly() and use
Contig::get_aln() where needed to grab the SimpleAlign, substituting
spaces for the gap symbols when writing contig output.
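
As a rough illustration of the padding and substitution steps only, here
is a minimal sketch in plain Perl. It does not use any BioPerl classes,
and it is not the (unimplemented) get_aln() method. The helper names
pad_read() and gaps_to_spaces(), the example reads, and the assumption
that each read's start is a 1-based column in the consensus are all
hypothetical; wrapping the padded strings into a Bio::SimpleAlign is
left out.

```perl
use strict;
use warnings;

# Hypothetical helper: pad a read with gap symbols so that it spans the
# whole consensus. Assumes $start is the 1-based consensus column where
# the read begins.
sub pad_read {
    my ($seq, $start, $cons_len, $gap) = @_;
    $gap = '-' unless defined $gap;
    my $left  = $start - 1;
    my $right = $cons_len - $left - length($seq);
    die "read does not fit inside the consensus\n" if $left < 0 || $right < 0;
    return ($gap x $left) . $seq . ($gap x $right);
}

# Hypothetical helper: replace every gap symbol from the set -.*?=~ with
# a space for display. The hyphen goes last in the tr list so that it is
# taken literally rather than as a range.
sub gaps_to_spaces {
    my ($padded) = @_;
    (my $display = $padded) =~ tr/.*?=~-/ /;
    return $display;
}

# Made-up reads: [id, sequence, start column in a 12-column consensus]
my $cons_len = 12;
my @reads = (
    [ 'read1', 'ACGTAC', 1 ],
    [ 'read2', 'GTACGG', 3 ],
    [ 'read3', 'CGGTTA', 7 ],
);

for my $read (@reads) {
    my ($id, $seq, $start) = @$read;
    printf "%-6s %s\n", $id, gaps_to_spaces(pad_read($seq, $start, $cons_len));
}
```

Note that gaps_to_spaces() blanks internal gaps as well as the flanking
padding; if internal gaps should stay visible, only the leading and
trailing runs would need replacing.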
In general, any new code is attached to a bugzilla report as an
enhancement request:
http://bugzilla.open-bio.org/
One of the devs will work on getting the code incorporated into
bioperl. Make sure the code is documented
(http://www.bioperl.org/wiki/Advanced_BioPerl), and attach appropriate
tests (http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests) and
test data.
chris