BS4 Breaks HTML Trying to Repair It

BS4 (Beautiful Soup 4) corrects faulty HTML. Usually this is not a problem. I tried parsing, altering and saving the HTML of this page: ulisses-regelwiki.de/index.php/sonderfertigkeiten.html In this

Solution 1:

Try the simplified_scrapy library:

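from simplified_scrapy import SimplifiedDoc

# Deliberately faulty HTML: both <center> tags are left unclosed.
html = '''
<!DOCTYPE html><center>
Some Test content
<!-- A comment --><center>
'''
doc = SimplifiedDoc(html)  # load the markup into a SimplifiedDoc
print(doc.html)  # print the HTML held by the document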

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
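If adding a third-party dependency is not an option, a similar effect can probably be had with the standard library alone. The following is only a minimal sketch and not part of the original answer: it swaps in Python's built-in html.parser, which tokenizes markup without building a tree, so it never inserts missing closing tags. The class name PassThroughEditor and the "Test" to "Edited" replacement are hypothetical, chosen only for illustration. One known limitation is that end tags are re-emitted in lowercase, so output is not guaranteed to be byte-identical for every input.

```python
from html.parser import HTMLParser


class PassThroughEditor(HTMLParser):
    """Hypothetical helper: re-emit every token as-is, editing only text nodes."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Note: the parser lowercases tag names here.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The only alteration in this sketch: edit the text content.
        self.out.append(data.replace("Test", "Edited"))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


# Same deliberately faulty markup as above: <center> is never closed.
html = '''
<!DOCTYPE html><center>
Some Test content
<!-- A comment --><center>
'''

editor = PassThroughEditor()
editor.feed(html)
editor.close()
result = "".join(editor.out)
print(result)
```

Because the parser only reports tokens and nothing is re-serialized from a tree, the unclosed center tags stay unclosed; only the text passed through handle_data changes.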
