[Bioperl-l] how to rename genbank header in fasta file?
Jason Stajich
jason.stajich at gmail.com
Sun Oct 21 00:15:01 UTC 2012
> perl -i -p -e 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa
That should have been -e, not -s, in my example.
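For context, a minimal sketch of why the switch matters and how the corrected command can be tried safely. With -e, perl treats the next argument as the code to run; without it, perl takes the first non-switch argument as the name of a script file, which is where the "Can't open perl script" message comes from. The file names sample.fa and renamed.fa are hypothetical, and the sketch assumes each FASTA header sits on a single line (the wrapped headers in the quoted message look like email line wrapping). It omits -i so the input file is left untouched.

```shell
# Build a small two-record FASTA file; sample.fa is a hypothetical name.
cat > sample.fa <<'EOF'
>lcl|NC_014487.1_cdsid_YP_003875479.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_003875479.1] [location=1..1575]
ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
>lcl|NC_014487.1_cdsid_YP_003875480.1 [gene=cox3] [protein=cytochrome c oxidase subunit 3] [protein_id=YP_003875480.1] [location=complement(13218..14015)]
ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
EOF

# -p wraps the code in a read-and-print loop over every input line.
# -e says the next argument is the code itself, not a script file name.
# Without -i the result goes to standard output, so sample.fa is not modified.
perl -p -e 's/>.+\[gene=([^\]]+)\].+/>$1/' sample.fa > renamed.fa

# Show the rewritten headers.
grep '^>' renamed.fa
```

Once the output looks right, adding -i back applies the same substitution to the file in place.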
You can name the file whatever you want; just replace that part in the command above. It sounds like you are really new to Perl in general, so I would recommend some basic books first if you are this new to programming and running scripts. Try Unix and Perl to the Rescue at http://unixandperl.com
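As an illustration of swapping in a different file name, here is a hedged sketch that also keeps a backup. The name my_genes.fa is hypothetical, and the .bak suffix is simply one choice: when -i is given a suffix, perl saves the original under that extension before rewriting the file.

```shell
# Create a one-record FASTA file under a different, hypothetical name.
cat > my_genes.fa <<'EOF'
>lcl|NC_014487.1_cdsid_YP_003875481.1 [gene=atp8] [protein=ATPase subunit 8] [protein_id=YP_003875481.1] [location=complement(15042..15548)]
ATGCCTCAACTGGATAAATTTACTTATTTCACACAATTCTTCTGGTCATGCCTTTTTTTCTTTACTTTCT
EOF

# -i.bak edits my_genes.fa in place and keeps the original as my_genes.fa.bak.
perl -i.bak -p -e 's/>.+\[gene=([^\]]+)\].+/>$1/' my_genes.fa

# The edited file now has the short header; the backup keeps the long one.
head -n 1 my_genes.fa
head -n 1 my_genes.fa.bak
```

Keeping the backup makes it easy to recover if the pattern does not match the headers the way you expected.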
Jason
On Oct 20, 2012, at 5:56 PM, yang liu <yang.liu0508 at gmail.com> wrote:
> Hello Jason,
>
> Thanks for your help. I tried the script, it returned:
> Can't open perl script "s/>.+\[gene=([^\]]+)\].+/>$1/": No such file or directory
>
> Don't know why.
>
> I named the fasta file as file.fa
>
> Yang.
>
> On Sat, Oct 20, 2012 at 1:43 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Are you parsing exactly this file? It is in FASTA format, not GenBank.
>
> You don't need bioperl for this:
> perl -i -p -s 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa
>
> I'd read up on regular expressions and Perl to learn more about string replacement and how to do this better.
>
>
> On Oct 19, 2012, at 11:23 PM, yang liu <yang.liu0508 at gmail.com> wrote:
>
>> Hello,
>>
>> I am a new user of BioPerl, can anyone help with this? I have multiple
>> sequences in a fasta file like the following,
>>
>>> lcl|NC_014487.1_cdsid_YP_003875479.1 [gene=cox1] [protein=cytochrome c
>> oxidase subunit 1] [protein_id=YP_003875479.1] [location=1..1575]
>> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
>> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>>> lcl|NC_014487.1_cdsid_YP_003875480.1 [gene=cox3] [protein=cytochrome c
>> oxidase subunit 3] [protein_id=YP_003875480.1]
>> [location=complement(13218..14015)]
>> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
>> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
>>
>>> lcl|NC_014487.1_cdsid_YP_003875481.1 [gene=atp8] [protein=ATPase subunit
>> 8] [protein_id=YP_003875481.1] [location=complement(15042..15548)]
>> ATGCCTCAACTGGATAAATTTACTTATTTCACACAATTCTTCTGGTCATGCCTTTTTTTCTTTACTTTCT
>> ATATTCTAATATGCAATGATAGAGATGGAGTACTTGGGATCAGCAGAATTCTAAAACTACGAAATCAACT
>>
>> I hope to rename the sequences by gene name, such as:
>>
>>> cox1
>> ATGACAAATCTGATTCGATGGCTCTTCTCTACTAATCACAAGGATATAGGGACTCTCTATTTCATCTTCG
>> GCGCCATTGCTGGAGTGATGGGCACATGCTTTTCAGTACTGATTCGTATGGAATTAGCACGCCCCGGCGA
>>> cox3
>> ATGATTGAATCTCAACGGCATTCTTTTCATTTGGTAGATCCAAGTCCATGGCCTATTTCGGGTTCACTCG
>> GAGCTTTGGCAACCACCGTAGGAGGTGTGATGTACATGCACTCATTTCAAGGGGGTGCAACACTTCTCAG
>>
>> Can anyone help? Thanks.
>>
>> Yang.
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org