Member-only story
Web Scraping with Beautiful Soup — Siblings and Parent Nodes
3 min readJan 31, 2021
We can get data from web pages with Beautiful Soup.
It lets us parse the DOM and extract the data we want.
In this article, we’ll look at how to scrape HTML documents with Beautiful Soup.
find_parents()
and find_parent()
We can find parent elements of a given element with the find_parents
method.
The find_parent
method returns the first parent element only.
For example, we can write:
from bs4 import BeautifulSoup
import re
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""soup = BeautifulSoup(html_doc, 'html.parser')
a_string = soup.find(string="Lacie")
print(a_string.find_parents("a"))
And we get:
[<a class="sister"…