Web Scraping with Beautiful Soup — Encoding
3 min read · Jan 31, 2021
We can get data from web pages with Beautiful Soup.
It lets us parse the DOM and extract the data we want.
In this article, we’ll look at how to control the way Beautiful Soup formats its output.
Output Formatters
The formatter argument controls how Beautiful Soup writes special characters and tags when it turns a document back into a string.
For example, we can write:
from bs4 import BeautifulSoup
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print(soup.prettify(formatter="html"))
This sets the formatter that prettify uses. With the "html" formatter, Beautiful Soup converts Unicode characters to HTML entities whenever possible.
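To make the effect of the formatter easier to see, here is a minimal sketch that renders the same markup with the default "minimal" formatter and with the "html" formatter. The variable names are ours, and the markup assumes the entity-encoded form of the French sentence.

```python
from bs4 import BeautifulSoup

# Entity-encoded markup (assumed input for this sketch)
markup = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(markup, "html.parser")

# "minimal" is the default: it only escapes what is needed for valid markup
minimal_output = soup.prettify(formatter="minimal")

# "html" converts Unicode characters to named HTML entities where it can
html_output = soup.prettify(formatter="html")

print(minimal_output)
print(html_output)
```

Both outputs keep the angle brackets escaped as &lt; and &gt;, but only the "html" formatter writes é back as &eacute;.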
We can also use the html5 formatter, which writes void tags such as br without the closing slash.
For example, we can write:
from bs4 import BeautifulSoup
br = BeautifulSoup("<br>", 'html.parser').br
print(br.prettify(formatter="html"))
print(br.prettify(formatter="html5"))
The first print outputs:
<br/>
And the second print outputs:
<br>
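The same difference shows up when a void tag sits inside other markup. The following is a small sketch, with a made-up paragraph as input, that uses decode instead of prettify so that the result is a single line without added indentation:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing a void <br> tag
soup = BeautifulSoup("<p>line one<br>line two</p>", "html.parser")

# "html" closes void tags with a slash, "html5" leaves them open
as_html = soup.p.decode(formatter="html")
as_html5 = soup.p.decode(formatter="html5")

print(as_html)
print(as_html5)
```

Only the way the br tag is written changes between the two strings. The text and the p tag are the same in both.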
We can also set formatter to None, in which case Beautiful Soup doesn't modify strings at all on output:
from bs4 import BeautifulSoup
link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', 'html.parser')…