[Bioperl-l] modify sequence names

Florent Angly florent.angly at gmail.com
Sun May 20 23:41:39 UTC 2012


Hi Yang,
If you'd rather learn Bioperl and use it to solve your problem, start here:
http://www.bioperl.org/wiki/HOWTO:Beginners
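
If you only need the headers rewritten, the job is small enough to do without Bioperl. Below is a rough sketch in plain Python (not Bioperl) of one way to do it. It assumes each FASTA header sits on a single line in your file (the line wrapping in the quoted message looks like an email artifact) and that the gene name always appears as a "[gene=...]" tag. The names rename_headers and GENE_TAG are made up for this illustration.

```python
import re

# Matches the "[gene=NAME]" tag seen in the quoted headers (assumed format).
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def rename_headers(lines):
    """Yield FASTA lines with each header reduced to '>' plus its gene name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = GENE_TAG.search(line)
            # Headers without a [gene=...] tag are passed through unchanged.
            yield (">" + match.group(1)) if match else line
        else:
            # Sequence lines are never modified.
            yield line


example = [
    ">lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] "
    "[protein=cytochrome c oxidase subunit 1] "
    "[protein_id=YP_006280919.1] [location=1..1584]",
    "ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG",
    ">lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] "
    "[protein=cytochrome c biogenesis FN] "
    "[protein_id=YP_006280920.1] [location=2225..3940]",
    "ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA",
]

result = list(rename_headers(example))
print("\n".join(result))
```

To run it on a real file, pass the open file handle as `lines` and write each yielded line to the output file followed by a newline.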
Florent


On 20/05/12 00:34, yang liu wrote:
> Dear colleagues,
>
> Would anyone please help me modify sequence names with Bioperl? I am
> editing them manually now; is there an easier way?
> I have a bunch of sequences in the format:
>
>> lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome c
> oxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
> ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
> GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
> TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
>
>> lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] [protein=cytochrome c
> biogenesis FN] [protein_id=YP_006280920.1] [location=2225..3940]
> ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
> AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
> CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
>
> I would like to keep only the gene name, i.e. the word after "gene=",
> like:
>> cox1
> ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
> GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
> TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
>
>> ccmFn
> ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
> AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
> CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
>
> Any help would be appreciated. Thanks,
>
> Yang.



