DOM Manipulation with Beautiful Soup — Removing Nodes, Wrap and Unwrap Elements, and Printing

John Au-Yeung
Jan 31, 2021

We can get data from web pages with Beautiful Soup.

It lets us parse the DOM and extract the data we want.

In this article, we’ll look at how to manipulate HTML documents with Beautiful Soup.

extract()

The extract method removes a tag or string from the tree and returns the removed node.

For example, we can write:

from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a
i_tag = soup.i.extract()
print(i_tag)
print(a_tag)

Then we get:

<i>example.com</i>

as the value of i_tag and:

<a href="http://example.com/">I linked to </a>

as the value of a_tag.
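
Because extract returns the node it removes, the node can be kept and reused. Here is a minimal sketch, assuming the same markup as above, that checks the extracted tag is detached from the original tree and then appends it back. The variable names detached_parent and remaining are only for illustration.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')

# extract() removes the <i> tag from the tree and returns it
i_tag = soup.i.extract()

# The extracted tag no longer has a parent in the original tree
detached_parent = i_tag.parent
print(detached_parent)

# The original tree no longer contains an <i> tag
remaining = soup.i
print(remaining)

# The extracted tag is still a usable node, so it can be put back
soup.a.append(i_tag)
print(soup.a)
```

Both of the first two print calls should show None, and the last one should show the original markup again.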

decompose()

The decompose method removes a tag from the tree and then completely destroys it and its contents.

So if we write:

from bs4 import BeautifulSoup, NavigableString
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag…
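
As a minimal sketch of how decompose behaves, assuming the same markup as in the extract example, we can remove the i tag and inspect what is left. Unlike extract, decompose returns None, so there is no removed node to reuse. The decomposed property used at the end is an assumption about the installed version: it is available in Beautiful Soup 4.9.0 and later.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')
a_tag = soup.a
i_tag = soup.i

# decompose() removes the tag from the tree and destroys it;
# it returns None rather than the removed node
result = i_tag.decompose()
print(result)

# The <a> tag no longer contains the <i> tag
print(a_tag)

# Assumes Beautiful Soup 4.9.0 or later: reports whether a tag was destroyed
print(i_tag.decomposed)
```

After this runs, a_tag holds only the text "I linked to " inside the link, the same as after extract, but the removed tag should not be used again.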