How to keep only some text with a regular expression, while removing complete lines


Regular expressions are extremely powerful for replacing one piece of text with another. If, however, you want to keep some text while removing everything else, including complete lines, things become less obvious.

Here is a regular expression that does the trick:

(?:^.*(the text that should be kept).*$)|^.*\r?\n

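The text to keep is captured by the first capturing group, so it is available in the replacement string as a backreference (typically \1). The first alternative matches a whole line that contains the text to keep; the second alternative is only tried when the first fails, and matches any other line together with its line break. In that case the group is empty, so the line is removed entirely. This requires that ^ and $ match at line breaks, not just at the start and end of the whole text.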

Here is an example; the text to be processed is:

<table>
<tr>
<td>row1</td>
<td>row2</td>
</tr>
</table>

Using the regular expression:

(?:^.*<td>(.*)</td>.*$)|^.*\r?\n

with \1 as the replacement string will result in:

row1
row2
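
For readers without EditPadPro, the same replacement can be sketched in Python. This is only an illustration under a few assumptions: the function name keep_cells and the sample string are made up for this example, the re.MULTILINE flag is added so that ^ and $ match at line breaks, and Python 3.5 or later is assumed so that an unset group expands to an empty string in the replacement.

```python
import re

# The pattern from the example above. re.MULTILINE is a Python-specific
# addition so that ^ and $ match at every line break.
PATTERN = re.compile(r"(?:^.*<td>(.*)</td>.*$)|^.*\r?\n", re.MULTILINE)


def keep_cells(text: str) -> str:
    """Hypothetical helper: keep only the text inside <td>...</td>."""
    # Lines containing a cell match the first alternative and are replaced
    # by group 1. All other lines match the second alternative, where
    # group 1 is unset and expands to "" (Python 3.5+), removing the line.
    return PATTERN.sub(r"\1", text)


sample = (
    "<table>\n"
    "<tr>\n"
    "<td>row1</td>\n"
    "<td>row2</td>\n"
    "</tr>\n"
    "</table>\n"
)

result = keep_cells(sample)
print(result, end="")
```

One detail worth knowing: the second alternative needs a line break to match, so in this sketch a final line that is not kept and has no trailing line break is left in place.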

Note that there are many regular expression implementations, each with slightly different syntax and behavior. I frequently use this type of regular expression with EditPad Pro.
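
As one illustration of such a difference, here is a small sketch using Python's re module (chosen only as an example). There, ^ and $ match at line breaks only when the re.MULTILINE flag, or the inline (?m) prefix, is used. The variable names and the sample text are assumptions made for this example.

```python
import re

PATTERN = r"(?:^.*<td>(.*)</td>.*$)|^.*\r?\n"
text = "<tr>\n<td>row1</td>\n</tr>\n"

# Default mode: ^ matches only at the very start of the text, so only the
# first line can match and everything after it is left untouched.
default_result = re.sub(PATTERN, r"\1", text)

# Multiline mode: ^ and $ match at every line break, which is what the
# pattern needs in order to process the text line by line.
multiline_result = re.sub(PATTERN, r"\1", text, flags=re.MULTILINE)

# The same mode selected with an inline flag at the start of the pattern.
inline_result = re.sub("(?m)" + PATTERN, r"\1", text)

print(repr(default_result))
print(repr(multiline_result))
print(repr(inline_result))
```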