[Bioperl-l] AlignIO problems
Chris Fields
cjfields at uiuc.edu
Sun Feb 25 19:58:23 UTC 2007
Bio::AlignIO::clustalw doesn't work with identity-masked alignments;
it parses the input literally, so any character matching [.-] is
treated as a gap. If a sequence is 100% identical to the reference,
you end up with a sequence that is 100% gaps and has no letters, which
gives you the warning and the exception you see.
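To make that concrete, here is a minimal plain-Perl sketch (no BioPerl involved) that counts how many real residue letters remain in a row once every '.' and '-' is discarded. The two rows are taken from the first block of the alignment quoted below; the helper name residue_count is hypothetical and only illustrates the effect, it is not how the parser is implemented.

```perl
use strict;
use warnings;

# Rows from the first block of the quoted alignment.
my %rows = (
    dere_GLEANR_7213 => '...V...................I....................................',
    dsim_GLEANR_9744 => '-----------------------------...............................',
);

# Hypothetical helper: number of characters that are neither '.' nor '-'.
sub residue_count {
    my ($row) = @_;
    (my $letters = $row) =~ tr/.-//d;   # delete every '.' and '-'
    return length $letters;
}

for my $name (sort keys %rows) {
    printf "%s: %d residue letters\n", $name, residue_count($rows{$name});
}
```

The dere row keeps its two substitutions, while the dsim row is left with nothing, which is consistent with the "no letters in it" warning.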
The best way to accomplish what you want is not to mask the sequence
alignment to begin with when running clustalw/muscle/whatever.
How exactly are you generating these alignments? When I use clustalw
no identity masking occurs by default.
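If regenerating the alignment without masking is not an option, one possible workaround is to undo the masking on the raw text before it reaches AlignIO. The following is only a sketch under assumptions: it presumes the first row of each block (dana_GLEANR_11249 here) is the unmasked reference, that '.' always means "same as the reference at this column", and that you split the file into rows yourself. The function name unmask_row is hypothetical, not a BioPerl method.

```perl
use strict;
use warnings;

# Replace every '.' in a masked row with the residue at the same column
# of the reference row; gaps ('-') and real letters are left untouched.
sub unmask_row {
    my ($reference, $masked) = @_;
    die "row lengths differ\n" unless length($reference) == length($masked);
    my $out = $masked;
    my $pos = 0;
    while (($pos = index($masked, '.', $pos)) >= 0) {
        substr($out, $pos, 1, substr($reference, $pos, 1));
        $pos++;
    }
    return $out;
}

# Reference row from the last block of the quoted alignment, plus a
# masked row with a single Q substitution, modelled on the dmoj row.
my $ref    = 'VTDRSDENWWNGEIGNRKGIFPATYVTPYHS';
my $masked = ('.' x 12) . 'Q' . ('.' x 18);
print unmask_row($ref, $masked), "\n";
```

Applied row by row within each block, this would give text that the clustalw parser can read as ordinary sequence, but I'd still treat it as a fallback to simply turning the masking off.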
chris
On Feb 25, 2007, at 7:28 AM, 江 文恺 wrote:
> Hi all,
> I am using the AlignIO module to convert an alignment file.
> My original file is:
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> dana_GLEANR_11249 MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
> dere_GLEANR_7213 ...V...................I....................................
> dgri_GLEANR_6962 .......................I....................................
> FBgn0004638 .......................I....................................
> dmoj_GLEANR_6118 ...........N...........I....................................
> dper_GLEANR_18885 ...V...................I....................................
> dpse_GLEANR_14384 ...V...................I....................................
> dsec_GLEANR_3096 .................N.....I....................................
> dsim_GLEANR_9744 -----------------------------...............................
> dvir_GLEANR_4811 .......................I....................................
> dwil_GLEANR_10869 .......................I....................................
> dyak_GLEANR_13576 .......................I....................................
>
>
>
> dana_GLEANR_11249 YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
> dere_GLEANR_7213 ............................................................
> dgri_GLEANR_6962 ............................................................
> FBgn0004638 ............................................................
> dmoj_GLEANR_6118 .................L..........................................
> dper_GLEANR_18885 ............................................................
> dpse_GLEANR_14384 ............................................................
> dsec_GLEANR_3096 ............................................................
> dsim_GLEANR_9744 ............................................................
> dvir_GLEANR_4811 ............................................................
> dwil_GLEANR_10869 ............................................................
> dyak_GLEANR_13576 ............................................................
>
>
>
> dana_GLEANR_11249 VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
> dere_GLEANR_7213 ............................................................
> dgri_GLEANR_6962 ............................................................
> FBgn0004638 ............................................................
> dmoj_GLEANR_6118 ..............................V.D...........................
> dper_GLEANR_18885 .......................E....................................
> dpse_GLEANR_14384 .......................E....................................
> dsec_GLEANR_3096 ............................................................
> dsim_GLEANR_9744 ............................................................
> dvir_GLEANR_4811 ............................................................
> dwil_GLEANR_10869 ............................................................
> dyak_GLEANR_13576 ............................................................
>
>
>
> dana_GLEANR_11249 VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
> dere_GLEANR_7213 ...............................
> dgri_GLEANR_6962 ...............................
> FBgn0004638 ...............................
> dmoj_GLEANR_6118 ............Q..................
> dper_GLEANR_18885 ...............................
> dpse_GLEANR_14384 ...............................
> dsec_GLEANR_3096 ...............................
> dsim_GLEANR_9744 ...............................
> dvir_GLEANR_4811 ...............................
> dwil_GLEANR_10869 ...............................
> dyak_GLEANR_13576 ...............................
>
>
> I want to change those "." characters back to the actual residue
> letters, so I wrote the following code:
> use Bio::AlignIO;
> my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
> -format => 'clustalw');
> my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
> -format =>'clustalw');
> while (my $aln=$in->next_aln() ){
> $aln->unmatch();
> $aln->set_displayname_flat();
> $out->write_aln($aln);
> }
>
> But when I run the code, I get messages like these:
>
> -------------------- WARNING ---------------------
> MSG: Got a sequence with no letters in it cannot guess alphabet []
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: No sequence with name [dsim_GLEANR_9744/1-182]
> STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
> STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
> STACK toplevel aligntest.pl:11
>
> --------------------------------------
>
> I don't know where the problem is.
>
> Jiang
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign