
Re-using the match -- \1, $1...

Suppose we didn't know which HTML tag we had to match? It could be B, I, EM or whatever, and we want everything that is in between. Well, HTML container tags like B and EM have end tags which are the same as the start tag, except for the /. So what we could do is:

Find out what is inside the < >
Look for the same thing again, but with a / in front of it
Return whatever is in between

Can this be done? Of course. This is Perl, all things are possible. Now, remember the side effect of parens. I promise I'll explain the primary effect at some point. If whatever is in (parens) matches, the result is stored in a variable called $1. So we can use <(.*?)>, which will find us <, then as many anythings (the . and *) as it takes to reach the next >, not the last one (the ? forces stingy, or non-greedy, matching).
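To see the difference the ? makes, here is a small sketch comparing the stingy pattern with a greedy one on part of the example string used later. It assumes a plain Perl 5 interpreter; the variable names $text, $stingy and $greedy are just for illustration. Assigning a match in list context returns the captured groups, so each variable ends up holding what $1 would.

```perl
use strict;
use warnings;

my $text = 'HTML <I>munging</I> time is here';

# In list context a match returns its captures, so $stingy gets what $1 holds.
# Stingy: .*? gives up as early as possible, stopping at the first >.
my ($stingy) = $text =~ /<(.*?)>/;

# Greedy: .* grabs as much as it can, then backs off only as far as the last >.
my ($greedy) = $text =~ /<(.*)>/;

print "stingy: $stingy\n";    # prints "stingy: I"
print "greedy: $greedy\n";    # prints "greedy: I>munging</I"
```

Without the ?, $1 swallows the text and the closing tag, which is no use for finding the tag name.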

The result is stored in $1 because we used parens. Next, we need everything up to the closing tag. That's easy: (.*?) matches as little as possible, stopping as soon as whatever comes after it in the regex can match. Because it is the second set of parens, what it matches is stored in $2. And how exactly do we define where to stop?

We can use $1 even in the same regex it was found in. However, within a regex it is not referred to as $1, but as \1.

So we want to match </$1>, which in Perl regex code is <\/\1>. The / must be escaped because / is the delimiter that marks the end of the regex, and the 1 is escaped so that \1 refers to whatever the first set of parens matched, instead of matching the number 1.
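A quick sketch of what \1 buys us: the closing tag has to repeat exactly what the first parens captured, so a mismatched pair is rejected. The sample strings and the @found array are made up for illustration. The sketch also uses m{...}, a standard Perl alternative delimiter, so the / in </ no longer needs a backslash.

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen to show what \1 does.
my @samples = ('<B>bold</B>', '<B>mismatched</I>', '<EM>x</EM>');
my @found;

for my $s (@samples) {
    # \1 must repeat exactly what group 1 captured, so <B>...</I> fails.
    # Braces are the delimiters here, so </ is written without escaping the /.
    if ($s =~ m{<(.*?)>(.*?)</\1>}) {
        push @found, "$1:$2";
    }
}

print "$_\n" for @found;    # prints "B:bold" then "EM:x"
```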

Still here? This is what it looks like:

$_ = 'HTML <I>munging</I> time is here <I>again</I> !.';

# $1 holds the tag name, $2 the text, and \1 makes the closing tag repeat $1
if (/<(.*?)>(.*?)<\/\1>/i) {
	print "Found it! $2\n";
}
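The /i on the end of that regex makes the match case-insensitive, and in Perl that applies to \1 as well as to the literal text. A minimal sketch, using a made-up string whose opening and closing tags differ in case, shows the difference it makes:

```perl
use strict;
use warnings;

# Mixed-case tags: the opening tag is <I>, the closing tag is </i>.
my $html = 'HTML <I>munging</i> time';

# With /i, the literal text and the \1 backreference both ignore case.
my $with_i    = $html =~ /<(.*?)>(.*?)<\/\1>/i ? $2 : 'no match';

# Without /i, \1 must repeat "I" exactly, so </i> is rejected.
my $without_i = $html =~ /<(.*?)>(.*?)<\/\1>/  ? $2 : 'no match';

print "with /i: $with_i\n";          # prints "with /i: munging"
print "without /i: $without_i\n";    # prints "without /i: no match"
```

Since HTML tag names are not case-sensitive, keeping /i is usually what you want here.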

If you want to know how to return all the matches above, read on. But before that: