Suppose you want to replace bits of a string. For example, 'us' with 'them'.
$_='Us ? The bus usually waits for us, unless the driver forgets us.';
print "$_\n";
s/Us/them/;   # operates on $_, otherwise you need $foo=~s/Us/them/;
print "$_\n";
What happens here is that the string 'Us' is searched for, and when a match is found it is replaced with the right side of the expression, in this case 'them'. Simple.
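The comment in the example mentions the named-variable form. Here is a minimal sketch of it, where $foo is just an illustrative name. It also uses the fact that s/// returns the number of substitutions it made, which is a handy way to check whether anything matched at all.

```perl
use strict;
use warnings;

# Bind s/// to a named variable with =~ instead of relying on $_
my $foo = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# In scalar context s/// returns the number of substitutions made
# (at most 1 without /g), or a false value if nothing matched
my $count = ($foo =~ s/Us/them/);

print "$count substitution(s): $foo\n";
# prints: 1 substitution(s): them ? The bus usually waits for us, unless the driver forgets us.
```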
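You'll notice that only one substitution was made. To match globally, use /g, which runs through the entire string, changing every match it can. Try:
s/Us/them/g;
which doesn't help much: only the capitalized 'Us' at the start is changed. This is because regexes are, by default, case-sensitive, so 'Us' doesn't match the lowercase 'us'. The /i modifier makes the match case-insensitive, so:
s/us/them/gi;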
would be a better bet. Now everything is changed. A little too much ('bus' becomes 'bthem'), but one problem at a time. Everything you have learned about regexes so far can be used with s///, like parens, [ ] character classes, greedy and stingy matching and much more. Deleting things is easy too. Just specify nothing as the replacement, like so:
s/us//gi;
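A sketch comparing these variants on the same test sentence. Each one first copies the original into a fresh variable with the (my $copy = $orig) =~ s/.../.../ idiom, so the substitutions don't affect each other; the variable names are illustrative.

```perl
use strict;
use warnings;

my $orig = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# /g alone: still case-sensitive, so only the capitalized 'Us' matches
(my $g = $orig) =~ s/Us/them/g;

# /gi: global and case-insensitive, so every 'us' changes, even inside words
(my $gi = $orig) =~ s/us/them/gi;

# Empty replacement: every matched 'us' is simply deleted
(my $deleted = $orig) =~ s/us//gi;

print "$g\n";        # them ? The bus usually waits for us, unless the driver forgets us.
print "$gi\n";       # them ? The bthem themually waits for them, unless the driver forgets them.
print "$deleted\n";  #  ? The b ually waits for , unless the driver forgets .
```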
So we can use some of that knowledge to fix this problem. We need to make sure that a space precedes the 'us'. What about:
s/ us/them/g;
A small improvement. The first 'Us' is no longer changed, but one problem at a time! We'll first consider the problem of the regex changing 'usually' and other words that merely contain 'us'.
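Assuming the attempt is s/ us/them/g (the same shape as the character-class version that follows, with no space in the replacement), a quick run on the test sentence shows what is still wrong. 'usually' is still hit, and because the leading space is part of the match but not of the replacement, it vanishes.

```perl
use strict;
use warnings;

my $orig = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# Require a space before 'us'; 'bus' and the leading 'Us' no longer match
(my $out = $orig) =~ s/ us/them/g;

print "$out\n";
# prints: Us ? The busthemually waits forthem, unless the driver forgetsthem.
```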
What we are looking for is a space, then 'us', then a comma, period or space. We already know how to specify one of several options: the character class.
s/ us[. ,]/them/g;
Another tiny step. Unfortunately, it wasn't really in the right direction; it was more a step onto the slippery slope towards Poor Programming Practice. Why? Because we are limiting ourselves to the characters we happened to think of. Suppose someone wrote ' send it to us; when we get it'. The semicolon isn't in the class, so that 'us' is missed.
You can't think of all the possible permutations. It is often easier, and safer, to simply state what must not follow the match. In this case, it can be anything except a letter, which for lowercase text we can define as the range a-z. Negating that range in a character class gives:
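To see the limitation concretely, here is a small sketch running the character-class version on both the semicolon example and the original sentence. It also exposes a second problem: the character matched by [. ,] is part of the match, so the comma and full stop after 'us' are replaced along with it.

```perl
use strict;
use warnings;

# 'us' must be preceded by a space and followed by '.', ',' or a space
my $semi = ' send it to us; when we get it';
(my $semi_out = $semi) =~ s/ us[. ,]/them/g;
print "$semi_out\n";   # unchanged: ';' is not in the class

my $orig = 'Us ? The bus usually waits for us, unless the driver forgets us.';
(my $orig_out = $orig) =~ s/ us[. ,]/them/g;
print "$orig_out\n";
# prints: Us ? The bus usually waits forthem unless the driver forgetsthem
```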
s/ us[^a-z]/ them/g;
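The ^ at the start of the brackets negates the character class, and a-z specifies every letter from a to z inclusive, so [^a-z] matches any single character that is not a lowercase letter. A space has been added to the substitution part: the original space was matched, so it must be put back to maintain readability.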
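A quick check of the negated class on the test sentence, as a sketch. Like the earlier class, [^a-z] consumes the character after 'us', so the comma and full stop still disappear. The second substitution is one possible fix, not necessarily the tutorial's own next step: it uses parens to capture that character and puts it back with $1. The last line shows a remaining gap, since [^a-z] must match a real character, an 'us' at the very end of the string is not replaced.

```perl
use strict;
use warnings;

my $orig = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# The negated-class version: the character after 'us' is consumed
(my $v1 = $orig) =~ s/ us[^a-z]/ them/g;
print "$v1\n";   # Us ? The bus usually waits for them unless the driver forgets them

# Variant: capture the following character and restore it with $1
(my $v2 = $orig) =~ s/ us([^a-z])/ them$1/g;
print "$v2\n";   # Us ? The bus usually waits for them, unless the driver forgets them.

# The semicolon case from earlier now works too
(my $semi = ' send it to us; when we get it') =~ s/ us([^a-z])/ them$1/g;
print "$semi\n"; #  send it to them; when we get it

# Nothing follows the final 'us', so [^a-z] has nothing to match
(my $end = 'send it to us') =~ s/ us([^a-z])/ them$1/g;
print "$end\n";  # send it to us
```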