Remove defined HTML tags including content from a string with PHP

Recently, a friend asked me for a script that removes pre tags, including their content, from a string. He uses a script to calculate the reading time of his content, but doesn't want the pre tags to count toward the result. If you ever have a similar problem, this post might help you.

Determine the tag

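In our example, the tag is pre, so every pre element should be matched. The regular expression looks like this:

/<pre\b[^>]*>([\s\S]*?)<\/pre\s*>/i

The \b after the tag name is a word boundary, so <pre does not also match the start of a longer tag name. The i flag makes the match case-insensitive, and [\s\S]*? lazily matches everything, including line breaks, up to the first closing tag.

If it should only affect h1 headings, it looks like this:

/<h1\b[^>]*>([\s\S]*?)<\/h1\s*>/i

And if all links are to be filtered out, it looks like this:

/<a\b[^>]*>([\s\S]*?)<\/a\s*>/i

Without the \b, this last pattern would also match tags such as <abbr> or <article>.

You can use the regular expression for pretty much any tag, as long as elements of the same type are not nested inside each other, because the lazy match stops at the first closing tag.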
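To see why the pattern uses the lazy quantifier *? rather than a plain *, it can help to list what each variant matches. This is only a small sketch with a pattern simplified to tags without attributes; the sample string and variable names are made up for illustration.

```php
<?php
$html = 'a <pre>one</pre> b <pre>two</pre> c';

// Lazy: stops at the first closing tag, so each block is matched on its own.
preg_match_all('/<pre>[\s\S]*?<\/pre>/', $html, $lazy);

// Greedy: runs to the last closing tag and swallows the text in between.
preg_match_all('/<pre>[\s\S]*<\/pre>/', $html, $greedy);

print_r($lazy[0]);   // two matches: <pre>one</pre> and <pre>two</pre>
print_r($greedy[0]); // one match: <pre>one</pre> b <pre>two</pre>
```

With the greedy variant, the text between two code blocks would be removed as well, which is usually not what you want.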

Replace content

Now we want to replace the tags, including their content, with an empty string so that they no longer appear in the string. This could look like this:

// \b keeps <pre from matching longer tag names; the i flag ignores case.
$regex = '/<pre\b[^>]*>([\s\S]*?)<\/pre\s*>/i';
$string = 'My long text with <pre>some code</pre> and so on.';
$string = preg_replace($regex, '', $string);
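If you want to check how many blocks were actually removed, preg_replace() accepts an optional fifth argument that receives the number of replacements. The following is a small sketch with a made-up input string that contains attributes and line breaks; the variable names are only for illustration.

```php
<?php
$regex  = '/<pre\b[^>]*>[\s\S]*?<\/pre\s*>/i';
$string = "Intro\n<pre class=\"php\">line 1\nline 2</pre>\nMiddle <pre>x</pre> end.";

// -1 means "no limit"; $count is filled with the number of replacements made.
$result = preg_replace($regex, '', $string, -1, $count);

echo $count, "\n"; // 2
echo $result;      // "Intro\n\nMiddle  end."
```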

Now all pre tags, together with their content, are removed from the string. From this, you can also build a function that is quite flexible:

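function pxbt_strip_tag(string $tag, string $string): string {
    // Escape the tag name so it is treated literally inside the pattern.
    $tag = preg_quote($tag, '/');
    $regex = '/<' . $tag . '\b[^>]*>([\s\S]*?)<\/' . $tag . '\s*>/i';
    // preg_replace() returns null on error; fall back to the unchanged input.
    return preg_replace($regex, '', $string) ?? $string;
}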

You can call this function as often as you like. Here are a few examples:

$string = 'My string';

// remove h1
$string = pxbt_strip_tag('h1', $string);

// remove p
$string = pxbt_strip_tag('p', $string);

// remove pre
$string = pxbt_strip_tag('pre', $string);
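The reading-time use case from the introduction could then look roughly like this. It is only a minimal sketch: the function name estimate_reading_minutes and the rate of 200 words per minute are assumptions, not part of my friend's actual script.

```php
<?php
// Hypothetical helper: estimates the reading time in minutes and ignores
// <pre> blocks. The default of 200 words per minute is an assumed rate.
function estimate_reading_minutes(string $html, int $wordsPerMinute = 200): int {
    // Drop <pre> elements together with their content.
    $withoutCode = preg_replace('/<pre\b[^>]*>[\s\S]*?<\/pre\s*>/i', '', $html) ?? $html;
    // Remove the remaining markup, then count the words that are left.
    $words = str_word_count(strip_tags($withoutCode));
    // Round up so that even a very short text counts as one minute.
    return max(1, (int) ceil($words / $wordsPerMinute));
}

// 300 words of prose plus a long code block that should not be counted.
$post = '<p>' . str_repeat('word ', 300) . '</p><pre>' . str_repeat('code ', 1000) . '</pre>';

echo estimate_reading_minutes($post); // 2
```

Without the removal step, the 1000 words inside the pre block would be counted too and the estimate would be noticeably higher.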
