Robert's Perl Tutorial

http://www.sthomas.net/roberts-perl-tutorial.htm


Basic changes

Suppose you want to replace bits of a string. For example, 'us' with 'them'.

$_='Us ? The bus usually waits for us, unless the driver forgets us.';

print "$_\n";

s/Us/them/;   # operates on $_, otherwise you need $foo=~s/Us/them/;

print "$_\n";

What happens here is that the string 'Us' is searched for, and when a match is found it is replaced with the right side of the expression, in this case 'them'. Simple.
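
The comment above mentions that a named variable needs the =~ binding operator. Here is a minimal sketch of that form. The names $foo and $changed are only illustrative, and the return value of s/// (the number of substitutions made, or false if there were none) is used to report whether anything happened.

```perl
use strict;
use warnings;

# Illustrative variable; any scalar holding a string works the same way.
my $foo = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# =~ binds the substitution to $foo instead of the default $_ .
# Without /g, s/// returns 1 if it replaced something, false otherwise.
my $changed = ($foo =~ s/Us/them/);

print "$foo\n";
print "a substitution was made\n" if $changed;
```

Only the first match is replaced, exactly as with the $_ version.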

You'll notice that only one substitution was made. To match globally use /g which runs through the entire string, changing wherever it can. Try:

s/Us/them/g;

which fails to catch the rest. This is because regexes are, by default, case-sensitive, so the pattern 'Us' never matches the lowercase 'us'. So:

s/us/them/ig;

would be a better bet. Now, everything is changed. A little too much, but one problem at a time. Everything you have learned about regexes so far can be used with s/// , like parens, character classes [ ] , greedy and stingy matching and much more. Deleting things is easy too. Just specify nothing as the replacement, like so s/Us//; .
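
To compare these variations side by side, the sketch below applies each one to a fresh copy of the example string. The variable names are made up for illustration, and the copy-then-substitute idiom (my $copy = $original) =~ s/.../.../ is just one convenient way to leave the original untouched.

```perl
use strict;
use warnings;

my $original = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# Each line copies $original first, then runs the substitution on the copy.
(my $first_only    = $original) =~ s/Us/them/;     # first match only
(my $global        = $original) =~ s/Us/them/g;    # every 'Us', still case-sensitive
(my $global_nocase = $original) =~ s/us/them/ig;   # every 'us', ignoring case
(my $deleted       = $original) =~ s/Us//;         # empty replacement deletes the match

print "no flags : $first_only\n";
print "/g       : $global\n";
print "/ig      : $global_nocase\n";
print "deleted  : $deleted\n";
```

With this string, the first two results are identical because only one capitalised 'Us' exists, while /ig also rewrites 'bus' and 'usually'.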

So we can use some of that knowledge to fix this problem. We need to make sure that a space precedes the 'us'. What about:

s/ us/them/g;

A small improvement. The first 'Us' is now no longer changed, but one problem at a time! We'll first consider the problem of the regex changing 'usually' and other words with 'us' in them.

What we are looking for is a space, then 'us', then a comma, period or space. We know how to specify one of a number of options - the character class.

s/ us[. ,]/them/g;

Another tiny step. Unfortunately, that step wasn't really in the right direction, more on the slippery slope to Poor Programming Practice. Why? Because we are limiting ourselves. Suppose someone wrote ' send it to us; when we get it'.
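
A short sketch of the last two attempts, run on copies of the example sentence and on the semicolon case. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $sentence = 'Us ? The bus usually waits for us, unless the driver forgets us.';
my $other    = ' send it to us; when we get it';

# Space before 'us': still catches ' usually'.
(my $space_only = $sentence) =~ s/ us/them/g;

# Space, 'us', then one of period, space or comma.
(my $with_class = $sentence) =~ s/ us[. ,]/them/g;

# The semicolon is not in the class, so nothing matches here.
my $hits = ($other =~ s/ us[. ,]/them/g);

print "$space_only\n";
print "$with_class\n";
print "semicolon case left unchanged: $other\n" unless $hits;
```

The character class leaves 'usually' alone, but the sentence ending in 'us;' slips through untouched.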

You can't think of all the possible permutations. It is often easier, and safer, to simply state what must not follow the match. In this case, it can be anything except a letter. We can define that as a-z. So we can add that to the regex.

s/ us[^a-z]/ them/g;

The caret ^ negates the character class, and a-z represents every lowercase letter from a to z inclusive. A space has been added to the replacement part - as the original space was matched, it must be put back to keep the result readable.
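
The negated class can be tried on both test strings, as in the sketch below. The variable names are illustrative. The last substitution is a variant that is not part of the text above: it assumes you want to keep the character that follows 'us', so it captures that character in parens and puts it back with $1.

```perl
use strict;
use warnings;

my $sentence = 'Us ? The bus usually waits for us, unless the driver forgets us.';
my $other    = ' send it to us; when we get it';

# Anything except a lowercase letter may follow ' us'.
(my $negated       = $sentence) =~ s/ us[^a-z]/ them/g;
(my $negated_other = $other)    =~ s/ us[^a-z]/ them/g;

# Variant (an assumption, not from the text above): capture the following
# character and restore it, so punctuation survives the substitution.
(my $kept = $sentence) =~ s/ us([^a-z])/ them$1/g;

print "$negated\n";
print "$negated_other\n";
print "$kept\n";
```

The character matched by the class is part of the match, so the plain version replaces the comma, period or semicolon along with ' us'. The capturing variant is one way to keep it.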