
Recursively Loop Through The DOM Tree And Remove Unwanted Tags?

$tags = array( 'applet' => 1, 'script' => 1 );
$html = file_get_contents('test.html');
$dom = new DOMdocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($

Solution 1:

Have you considered HTML Purifier? Writing your own HTML sanitizer is just reinventing the wheel, and it is not easy to get right.

Furthermore, a blacklist approach is a bad idea in itself, because anything you forget to list gets through. See SO/why-use-a-whitelist-for-html-sanitizing.

You may also want to read how to configure allowed tags and attributes, or try the HTML Purifier demo.
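To make the whitelist idea concrete, here is a minimal sketch using only PHP's bundled DOM extension. It is not HTML Purifier and is far less thorough (for instance, it does not check URL schemes in attributes such as href). The function name whitelistSanitize and the shape of the $allowed map are assumptions for illustration.

```php
<?php
// Sketch only: whitelistSanitize() and the $allowed map are hypothetical.
// $allowed maps a tag name to the list of attribute names it may keep.
// Elements not in the map are removed together with their contents.
function whitelistSanitize(string $html, array $allowed): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed input
    $dom->loadHTML($html);
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // getElementsByTagName() returns a live list, so take a snapshot first.
    // The snapshot is in document order, so ancestors come before descendants.
    $elements = iterator_to_array($body->getElementsByTagName('*'));
    foreach ($elements as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($allowed[$tag])) {
            // Not whitelisted: drop the element and everything inside it.
            $el->parentNode->removeChild($el);
            continue;
        }
        // Keep only the attributes whitelisted for this tag.
        foreach (iterator_to_array($el->attributes) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $allowed[$tag], true)) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialize only the body's children, not the generated <html>/<body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$dirty = '<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>';
echo whitelistSanitize($dirty, array('p' => array(), 'b' => array()));
```

Dropping non-whitelisted elements along with their text is the conservative choice; unwrapping them (keeping their children) preserves more content but needs more care.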

Solution 2:

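$tags = array(
    "applet" => 1,
    "script" => 1
);

$html = file_get_contents("test.html");
$dom = new DOMDocument();
// Collect libxml warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// $tags is keyed by tag name, so loop over its keys;
// $tags[$i] with a numeric index would be undefined.
foreach (array_keys($tags) as $tag) {
   // DOMXPath::query() returns a snapshot of the matches,
   // so removing nodes while iterating over it is safe
   foreach ($xpath->query("//" . $tag) as $node) {
      $node->parentNode->removeChild($node);
   }
}

// saveHTML() serializes as HTML; saveXML() would produce XML output
$string = $dom->saveHTML();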

Something like that.
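The title asks for a recursive walk, while the XPath query above finds the nodes directly. For comparison, a minimal recursive sketch, assuming the same $tags array keyed by tag name; removeUnwantedTags is a hypothetical helper name, and the inline sample markup stands in for test.html.

```php
<?php
// Sketch only: removeUnwantedTags() is a hypothetical helper name.
// It walks the tree depth-first and removes every element whose
// lower-cased tag name is a key in $unwanted, together with its subtree.
function removeUnwantedTags(DOMNode $node, array $unwanted): void
{
    // childNodes is a live list, so copy it before removing anything;
    // otherwise removals would shift positions and skip siblings.
    $children = iterator_to_array($node->childNodes);
    foreach ($children as $child) {
        if ($child instanceof DOMElement
            && isset($unwanted[strtolower($child->nodeName)])) {
            $node->removeChild($child);            // drop element and its contents
        } elseif ($child->hasChildNodes()) {
            removeUnwantedTags($child, $unwanted); // recurse into kept nodes
        }
    }
}

$tags = array('applet' => 1, 'script' => 1);

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><p>Keep me</p><script>alert(1)</script>'
    . '<div><applet code="x"></applet><span>ok</span></div></body></html>');
libxml_clear_errors();

removeUnwantedTags($dom, $tags);
echo $dom->saveHTML();
```

Either version still removes only the listed tags, so it is a blacklist and the caveats from Solution 1 apply.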
