Re-using the match -- \1, $1...
Suppose we didn't know what HTML tag we had to match ? It could be B, I, EM or whatever, and we want everything that is in between. Well, HTML container tags like B and EM have end tags which are the same as the start tag, except for the / . So what we could do is:
- find out what is inside < >
- search for exactly the same tag, but with the closing /
- return whatever is in between.
Can this be done ? Of course. This is perl, all things are possible. Now, remember the
side effect of parens. I promise I'll explain the primary effect at some point. If
whatever is in (parens) matches, the result is stored in a variable called
$1 . So we can use <(.*?)> which will find us < then as many anythings
(the .* ) up to the next, not last, > (the ? forces stingy matching).
The result is stored in $1 because we used parens. Next, we need everything
up to the closing tag. That's easy : (.*?) matches everything up until the
next character or set of characters that follows it in the regex. And how
exactly do we define where to stop ? We can use $1 even in the same regex it
was found in. However, it is not referred to within a regex as $1 but as \1 .
So we want to match </$1> which in perl is <\/\1> . The / must be escaped
because it is the regex delimiter and an unescaped one would end the regex,
and the 1 is escaped so it refers to $1 instead of matching the number 1.
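To see what the ? actually buys us, here is a small sketch that runs the tag-finding step both ways. The sample string and the variable names ($text, $greedy, $stingy) are made up for illustration and are not part of the example that follows.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the difference.
my $text = 'some <B>bold</B> words';

# Greedy: .* runs on to the LAST > in the string.
my ($greedy) = $text =~ /<(.*)>/;

# Stingy: .*? stops at the NEXT > it meets.
my ($stingy) = $text =~ /<(.*?)>/;

print "greedy: $greedy\n";   # greedy: B>bold</B
print "stingy: $stingy\n";   # stingy: B
```

Only the stingy version leaves just the tag name in the parens, which is what we need if we are going to look for it again as the closing tag.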
Still here ? This is what it looks like:
$_ = 'HTML <I>munging</I> time is here <I>again</I> !.';
# $1 holds the tag name, $2 holds whatever is between the tags
print "Found it ! $2\n" if /<(.*?)>(.*?)<\/\1>/i;
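The point of \1 is that the same regex works for any tag, and refuses a closing tag that doesn't match the opening one. A quick sketch to check that, using three test strings invented for the purpose (they are assumptions, not part of the original example):

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen only to exercise the regex.
my @samples = (
    'a <B>bold</B> move',        # matching pair
    'an <EM>emphatic</EM> no',   # different tag, same regex
    'a <B>broken</I> pair',      # end tag does not match start tag
);

my @results;
for my $s (@samples) {
    # \1 must repeat whatever the first set of parens captured
    if ($s =~ /<(.*?)>(.*?)<\/\1>/i) {
        push @results, "$1: $2";
    } else {
        push @results, 'no match';
    }
}
print "$_\n" for @results;
```

The first two strings report the tag and its contents; the third reports no match, because </I> is not the closing tag for <B>.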
If you want to know how to return all the matches above, read on. But before that: