[Bioperl-l] modify sequence names
Adam Sjøgren
asjo at koldfront.dk
Sat May 19 15:13:03 UTC 2012
On Sat, 19 May 2012 10:34:04 -0400, yang wrote:
> Would anyone please help me modify sequence names with BioPerl? I am
> editing them manually now; is there an easier way?
You don't need BioPerl specifically to do simple text manipulation.
>> lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome
>> coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
[... to ...]
>> cox1
Maybe you can use something like:
$ sed 's/^>.*\[gene=\([^]]*\)\].*$/>\1/'
>lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
>cox1
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
$
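Here is a sketch of the same sed substitution run on a file instead of pasted input. The file names cds.fasta and renamed.fasta are hypothetical, and the second record is made up to show that headers without a [gene=...] tag pass through unchanged. The replacement keeps the leading ">" so the output is still FASTA. The trailing \] is dropped from the pattern because .*$ already consumes the rest of the header.

```shell
#!/bin/sh
# Hypothetical input file: the cox1 record from this thread (sequence
# shortened to one line) plus a made-up record with no gene tag.
cat > cds.fasta <<'EOF'
>lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
>no_gene_tag [protein=example]
ATGAAATAA
EOF

# Header lines containing "[gene=NAME" become ">NAME"; sequence lines and
# headers without a gene tag do not match and are printed unchanged.
sed 's/^>.*\[gene=\([^]]*\).*$/>\1/' cds.fasta > renamed.fasta

cat renamed.fasta
```

Running it prints >cox1, the unchanged sequence line, and then the made-up record exactly as it was in the input.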
If you prefer Perl, you can use:
$ perl -pe 's/^>.*\[gene=([^]]+).*$/>$1/'
instead.
The easiest way is probably to learn a little programming and/or regular
expressions.
"Learning Perl" by Randal L. Schwartz, brian d foy, and Tom Phoenix could
be a starting point, as could many online tutorials.
Best regards,
Adam
--
"No matter how far you have come, there is always further to go."
Adam Sjøgren
asjo at koldfront.dk