[Bioperl-l] modify sequence names
yang liu
yang.liu0508 at gmail.com
Sat May 19 14:34:04 UTC 2012
Dear colleagues,
Would anyone please help me modify sequence names with BioPerl? I am
editing them manually now; is there an easier way?
I have a bunch of sequences in the format:
>lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
>lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] [protein=cytochrome c biogenesis FN] [protein_id=YP_006280920.1] [location=2225..3940]
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
I would like to keep only the gene name, that is, the word after "gene=",
like:
>cox1
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
>ccmFn
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
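To make the intended renaming concrete, here is a minimal sketch in plain
Python rather than BioPerl. It is only an illustration of the transformation
described above, not a BioPerl solution. It assumes each header sits on a
single line in the actual file and contains a "[gene=...]" tag; the names
GENE_TAG and rename_by_gene are hypothetical, and the sequences in the
example are shortened.

```python
import re

# Matches the "[gene=NAME]" tag seen in the example headers (assumed format).
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def rename_by_gene(lines):
    """Yield FASTA lines with each header reduced to '>' plus the gene name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = GENE_TAG.search(line)
            # Headers without a gene tag are passed through unchanged.
            yield (">" + match.group(1)) if match else line
        else:
            # Sequence lines are never modified.
            yield line


# Shortened versions of the two records shown above.
example = [
    ">lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] "
    "[protein=cytochrome c oxidase subunit 1] "
    "[protein_id=YP_006280919.1] [location=1..1584]",
    "ATGACAAATCCGGTCCGATGG",
    ">lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] "
    "[protein=cytochrome c biogenesis FN] "
    "[protein_id=YP_006280920.1] [location=2225..3940]",
    "ATGTCAATAAATGCATTTTCT",
]

renamed = list(rename_by_gene(example))
print("\n".join(renamed))
```

For a real file, the same function could be fed an open file handle in
place of the list. Two genes with the same name would end up with identical
headers, so the output may need a uniqueness check.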
Any help would be appreciated. Thanks,
Yang.