Collecting data from websites, commonly known as web scraping, is a practical technique for many projects. Libraries like BeautifulSoup work well with static HTML, but they struggle when pages rely heavily on JavaScript to render content. That is where Selenium comes in.
In this guide, you will learn how to use Selenium with Python to scrape websites effectively.
First Things First – What Is Selenium?
Selenium is a browser automation framework designed for testing web applications. It simulates real user behaviour by controlling an actual browser like Chrome or Firefox. Because of this, it can handle JavaScript-rendered content that other tools cannot.
This makes Selenium a great solution for scraping content from interactive websites, forms, infinite scrolls and more.
How To Install Selenium
To get started, make sure to install Selenium with pip:
pip install selenium
How To Set Up a WebDriver
Selenium requires a WebDriver to communicate with the browser. Here is a simple example using Chrome:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)
With Selenium 4.6 and later you can omit the Service entirely, since Selenium Manager downloads a matching driver automatically:
driver = webdriver.Chrome()
If you want to run the browser without opening a window (useful on servers), enable headless mode:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
How To Find Elements on the Page
You can use different strategies to locate HTML elements:
from selenium.webdriver.common.by import By
element = driver.find_element(By.CLASS_NAME, "product-title")
Other locator options are:
By.ID
By.TAG_NAME
By.CSS_SELECTOR
By.XPATH
Waiting for JavaScript to Load
Instead of using time.sleep(), Selenium supports smart waiting with WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "content"))
)
Executing JavaScript
If you need to scroll the page or trigger lazily loaded elements, you can run JavaScript:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
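A common pattern is to keep scrolling until the page height stops growing, which is one way to drain an infinite-scroll feed. The pause length is a guess you may need to tune per site:

```python
import time

def scroll_to_bottom(driver, pause=1.0):
    """Scroll down repeatedly until the document height stops increasing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded content a moment to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared, assume we hit the bottom
        last_height = new_height
```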
How To Take Screenshots
Capture a screenshot of the current view with:
driver.save_screenshot("screenshot.png")
Handling Pagination
To scrape multiple pages, you can loop through links or interact with a “Next” button:
next_button = driver.find_element(By.LINK_TEXT, "Next")
next_button.click()
Exporting Data
You can use the Pandas library to save your scraped data to a CSV file:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
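For example, collecting each scraped row as a dict keeps the export step trivial. The field names and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical rows as they might come out of a scraping loop
data = [
    {"title": "Widget A", "price": 9.99},
    {"title": "Widget B", "price": 14.50},
]

df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
print(df.shape)  # → (2, 2)
```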
Scrolling with Keys
To simulate pressing keys like PAGE_DOWN or END:
from selenium.webdriver.common.keys import Keys
body = driver.find_element(By.TAG_NAME, "body")
body.send_keys(Keys.END)
Blocking Images and Other Resources
To speed up scraping and reduce resource usage:
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setBlockedURLs", {"urls": ["*.jpg", "*.png"]})
Note that the Network domain must be enabled before setBlockedURLs takes effect.
How Does Selenium Compare to Other Tools?
| Tool | JavaScript Support | Speed | Ideal Use Case |
|---|---|---|---|
| Selenium | Full | Moderate | Interactive/dynamic pages |
| BeautifulSoup | None | Fast | Static HTML scraping |
| Scrapy | Optional (via Selenium) | Very fast | Large-scale scraping projects |
| Puppeteer | Full (Node.js only) | Moderate | Headless Chromium-based scraping |
When Should You Use Selenium?
Choose Selenium when:
- The website relies heavily on JavaScript
- You need to simulate user interactions (clicks, scrolls and inputs)
- You’re working on a small or medium-scale scraping task
For larger or faster scraping jobs, consider tools like Scrapy, or specialized APIs that take care of residential proxies, CAPTCHA and JavaScript for you.
Conclusion
Selenium is a strong option for scraping dynamic websites with Python. Once set up, it lets you extract content from complex, JavaScript-heavy pages. While it is not the fastest tool available, its ability to automate a real browser makes it incredibly flexible.