[Bioperl-l] extract sequences and save into files by genes
Cook, Malcolm
MEC at stowers.org
Mon Feb 27 15:47:51 UTC 2012
You don't need BioPerl for this one.
The following Perl one-liner will do it for you:
perl -p -e 'if (1==$.) {($species = $ARGV) =~ s|\.txt||}; if (s/^>(.*)/">${species}"/e) {$gene=$1; open($O{$gene},qq{>> ${gene}.txt}); select($O{$gene})} ; close ARGV if eof' *.txt
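For readers who find the one-liner hard to follow, the same idea can be sketched as a short script. This is only an illustration, not a drop-in replacement: the function name split_by_gene, the separate output directory, and the choice to take the gene name from the first whitespace-delimited word of the header are assumptions that are not in the one-liner above.

```perl
#!/usr/bin/env perl
# Sketch: split per-species sequence files into per-gene files.
# Assumption (not in the one-liner): output goes to a separate directory,
# so a second run with *.txt does not pick up earlier output as input.
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);

# split_by_gene($outdir, @files): append every record of every input file
# to "$outdir/<gene>.txt", replacing the ">gene" header with ">species".
sub split_by_gene {
    my ($outdir, @files) = @_;
    make_path($outdir);
    my %out;    # gene name => open output filehandle
    for my $file (@files) {
        # Species name = file name without directory and ".txt" suffix.
        my $species = basename($file, '.txt');
        open(my $in, '<', $file) or die "Cannot read $file: $!";
        my $fh;    # handle for the gene currently being copied
        while (my $line = <$in>) {
            if ($line =~ /^>(\S+)/) {
                # Header line: the first word after ">" is taken as the gene.
                my $gene = $1;
                unless ($out{$gene}) {
                    open($out{$gene}, '>>', "$outdir/$gene.txt")
                        or die "Cannot write $outdir/$gene.txt: $!";
                }
                $fh = $out{$gene};
                print {$fh} ">$species\n";
            }
            elsif (defined $fh) {
                print {$fh} $line;    # sequence line, copied unchanged
            }
        }
        close($in);
    }
    close($_) for values %out;
    return;
}

# When arguments are given: first the output directory, then the input files.
split_by_gene(@ARGV) if @ARGV > 1;
```

Saved under a hypothetical name such as split_by_gene.pl, it could be run as "perl split_by_gene.pl by_gene *.txt". Like the one-liner, it opens the output files in append mode, so running it twice would duplicate the records.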
~Malcolm
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of yang liu
> Sent: Saturday, February 25, 2012 12:52 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] extract sequences and save into files by genes
>
> Dear colleagues,
>
> I have multiple files, each named after a species. Each file contains ca. 100
> different genes. I want to extract the sequences and save them by gene.
> In each output file, the species name should replace the gene name in the
> header line. How should I do this?
>
> The input files look like this (with file names Acidosasa.txt,
> Acorus.txt....):
>
> >rps12
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
> >psbA
> TTATCCATTAAGAGATGGAACTTCAAGAACAGCTAGGTCTAGAGGGAAGTTGTG
> AGCATTACGTTCGTGC
> ATTACCTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAATAAC
> GCGACCTTGGCTAT
> .....
>
> I would like the output files to look like this, with file names rps12.txt, psbA.txt....
>
> Within rps12.txt, the sequences would look like this:
>
> >Acidosasa
>
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
>
> >Acorus
> ATGCCAACTATTAAACAACTTATTAGAAACACAAGACAGCCAATCCGAAATGTC
>
> I am not sure whether I have explained this clearly.
>
> Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l