Re-using the match -- \1, $1...
Suppose we didn't know what HTML tag we had to match ? It could be B, I, EM or whatever, and we want everything that is in between the start tag and the end tag. Well, HTML container tags like B and EM have end tags which are the same as the start tag, except for the /. So what we could do is:
- find out what is inside < >
- search for exactly the same tag, but with the closing /
- return whatever is in between.
Can this be done ? Of course. This is Perl, all things are possible. Now, remember the side effect of parens. I promise I'll explain the primary effect at some point. If whatever is in (parens) matches, the result is stored in a variable called $1. So we can use <(.*?)> which will find us < then as many anythings (the . and *) up to the next, not the last, > (the ? forces stingy matching). The result is stored in $1 because we used parens.
Next, we need everything up to the closing tag. That's easy : (.*?) matches as little as it can, up to whatever comes next in the regex. As this is the second set of parens, its result is stored in $2. And how exactly do we define where to stop ?
We can use what was captured in $1 even in the same regex it was found in. However, within a regex it is not referred to as $1, but as \1. So we want to match </$1> which in Perl code is <\/\1>. The / must be escaped because / is the regex delimiter, and an unescaped one would end the regex. The 1 is escaped so that it refers to $1 instead of matching the number 1.
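If you would like to try the two ideas separately before putting them together, here is a small sketch. The variable names ($text, $tag, $mismatched, $result) and the mismatched sample string are made up for illustration. The first step only captures the tag name, and the second shows that \1 refuses to match when the closing tag differs from the opening one.

```perl
use strict;
use warnings;

# Sample text; the variable names are invented for this sketch.
my $text = 'HTML <I>munging</I> time is here <I>again</I> !.';

# Step 1: the parens capture whatever sits between the first < and the next >.
my $tag = '';
if ($text =~ /<(.*?)>/) {
    $tag = $1;
    print "Tag name: $tag\n";
}

# Step 2: \1 must match the same text that the first parens captured,
# so an opening B closed by an I is not accepted.
my $mismatched = '<B>bold</I>';
my $result = ($mismatched =~ /<(.*?)>(.*?)<\/\1>/) ? 'match' : 'no match';
print "Mismatched tags: $result\n";
```

The first print should show the tag name I, and the second should report no match, because there is no closing tag in the sample string that agrees with its opening tag.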
Still here ? This is what it looks like:
    $_='HTML <I>munging</I> time is here <I>again</I> !.';
    /<(.*?)>(.*?)<\/\1>/i;
    print "Found it ! $2\n";
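A slightly more cautious variant of the snippet above, offered as a sketch rather than the one true way. It assumes you would rather test whether the match succeeded before printing, since $1 and $2 are not reset by a failed match and may still hold values from an earlier one. It also uses m{...} with braces as delimiters, so the / in the closing tag needs no backslash. The names $text, $tag and $inside are made up for this example.

```perl
use strict;
use warnings;

my $text = 'HTML <I>munging</I> time is here <I>again</I> !.';

# With m{...} the braces delimit the regex, so / no longer ends it
# and needs no escaping. \1 is still needed for the backreference.
my ($tag, $inside) = ('', '');
if ($text =~ m{<(.*?)>(.*?)</\1>}i) {
    # Copy the captures straight away; a later match would overwrite them.
    ($tag, $inside) = ($1, $2);
    print "Found it ! <$tag> contains: $inside\n";
} else {
    print "No matching pair of tags found.\n";
}
```

Only the first pair of tags is reported here, just as in the original snippet.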
If you want to know how to return all the matches above, read on. But before that: