DOM Manipulation with Beautiful Soup — Removing Nodes, Wrap and Unwrap Elements, and Printing
3 min read · Jan 31, 2021
We can get data from web pages with Beautiful Soup.
It lets us parse the DOM and extract the data we want.
In this article, we’ll look at how to manipulate HTML documents with Beautiful Soup.
extract()
The extract method removes a node from the tree and returns the removed node.
For example, we can write:
from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a
# extract() detaches the <i> tag from the tree and returns it
i_tag = soup.i.extract()
print(i_tag)
print(a_tag)
Then we get:
<i>example.com</i>
as the value of i_tag, and:
<a href="http://example.com/">I linked to </a>
as the value of a_tag.
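Because extract returns the removed node instead of discarding it, the node can be inspected or put back later. The following is a minimal sketch of that idea, reusing the markup from the example above; the variable names detached_parent and restored are only illustrative, and the expected values in the comments assume the html.parser backend.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')

# Detach the <i> tag; it becomes the root of its own small tree
i_tag = soup.i.extract()
detached_parent = i_tag.parent
print(detached_parent)  # None, since the tag no longer has a parent
print(soup.i)           # None, since the document has no <i> tag now

# The extracted tag is still usable, so we can append it back
soup.a.append(i_tag)
restored = str(soup)
print(restored)
```

After the append call, the document should serialize back to the original markup, which shows that extract only detaches the node and leaves it intact.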
decompose()
The decompose method removes a tag from the tree and completely destroys it and its contents.
So if we write:
from bs4 import BeautifulSoup, NavigableString
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag…