[Bioperl-l] Bioperl-l Digest, Vol 114, Issue 11 regex in message 2
Tom Keller
kellert at ohsu.edu
Sat Oct 20 17:16:15 UTC 2012
The regex is clever and shows the power of regular expressions and Perl. Basically, the capturing parentheses enclose a negated character class, so the pattern says: after "gene=", save every character that is not ']' up to the next ']', which is exactly what you said you wanted.
But I think there is a typo: the -s should be -e, since -e is the switch that passes the code on the command line.
Thanks for the nice help, Jason.
Tom
OHSU, Portland OR
On Oct 20, 2012, at 9:00 AM, <bioperl-l-request at lists.open-bio.org> wrote:
Message: 2
Date: Fri, 19 Oct 2012 23:43:29 -0600
From: Jason Stajich <jason.stajich at gmail.com>
Subject: Re: [Bioperl-l] how to rename genbank header in fasta file?
To: yang liu <yang.liu0508 at gmail.com>
Cc: bioperl-l at lists.open-bio.org
Message-ID: <5611663A-0073-4D26-9DDF-D01BAFDCDC5D at gmail.com>
Content-Type: text/plain; charset=us-ascii
Are you parsing exactly this file? It is in FASTA format, not GenBank.
You don't need bioperl for this:
perl -i -p -s 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa
I'd read up on regular expressions and Perl to learn more about string replacement and how to do this better.