Collecting data from websites, commonly known as web scraping, is a practical technique for many projects. Libraries like BeautifulSoup work well with static HTML, but they struggle when pages rely heavily on JavaScript to render content. That is where Selenium comes in.
In this guide, you will learn how to use Selenium with Python to scrape websites effectively.
First Things First – What Is Selenium?
Selenium is a browser automation framework designed for testing web applications. It simulates real user behaviour by controlling an actual browser like Chrome or Firefox. Because of this, it can handle JavaScript-rendered content that other tools cannot.
This makes Selenium a great solution for scraping content from interactive websites, forms, infinite scrolls and more.
How To Install Selenium
To get started, make sure to install Selenium with pip:
pip install selenium
How To Set Up a WebDriver
Selenium requires a WebDriver to communicate with the browser. Here is a simple example using Chrome:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)
With Selenium 4.6 and later you can omit the Service entirely, since Selenium Manager downloads a matching driver automatically:
driver = webdriver.Chrome()
If you want to run the browser without opening a window (useful on servers), enable headless mode:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
How To Find Elements on the Page
You can use different strategies to locate HTML elements:
from selenium.webdriver.common.by import By
element = driver.find_element(By.CLASS_NAME, "product-title")
Other locator options are:
By.ID
By.TAG_NAME
By.CSS_SELECTOR
By.XPATH
Waiting for JavaScript to Load
Instead of using time.sleep(), Selenium supports smart waiting with WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "content"))
)
Executing JavaScript
If you need to scroll the page or trigger lazily loaded elements, you can run JavaScript:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
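A common pattern is to keep scrolling until the page height stops growing, which is one way to drain an infinite-scroll feed. The pause length is a guess you may need to tune per site:

```python
import time

def scroll_to_bottom(driver, pause=1.0):
    """Scroll down repeatedly until the document height stops increasing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded content a moment to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared, assume we hit the bottom
        last_height = new_height
```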
How To Take Screenshots
Capture a screenshot of the current view with:
driver.save_screenshot("screenshot.png")
Handling Pagination
To scrape multiple pages, you can loop through links or interact with a “Next” button:
next_button = driver.find_element(By.LINK_TEXT, "Next")
next_button.click()
Exporting Data
You can use the Pandas library to save your scraped data to a CSV file:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
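For example, collecting each scraped row as a dict keeps the export step trivial. The field names and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical rows as they might come out of a scraping loop
data = [
    {"title": "Widget A", "price": 9.99},
    {"title": "Widget B", "price": 14.50},
]

df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
print(df.shape)  # → (2, 2)
```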
Scrolling with Keys
To simulate pressing keys like PAGE_DOWN or END:
from selenium.webdriver.common.keys import Keys
body = driver.find_element(By.TAG_NAME, "body")
body.send_keys(Keys.END)
Blocking Images and Other Resources
To speed up scraping and reduce resource usage:
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setBlockedURLs", {"urls": ["*.jpg", "*.png"]})
Note that the Network domain must be enabled before setBlockedURLs takes effect.
How Does Selenium Compare to Other Tools?
| Tool | JavaScript Support | Speed | Ideal Use Case |
|---|---|---|---|
| Selenium | Full | Moderate | Interactive/dynamic pages |
| BeautifulSoup | None | Fast | Static HTML scraping |
| Scrapy | Optional (via Selenium) | Very fast | Large-scale scraping projects |
| Puppeteer | Full (Node.js only) | Moderate | Headless Chromium-based scraping |
When Should You Use Selenium?
Choose Selenium when:
- The website relies heavily on JavaScript
- You need to simulate user interactions (clicks, scrolls and inputs)
- You’re working on a small or medium-scale scraping task
For larger or faster scraping jobs, consider tools like Scrapy, or specialized APIs that take care of residential proxies, CAPTCHA and JavaScript for you.
Conclusion
Selenium is a strong option for scraping dynamic websites with Python. Once set up, it lets you extract content from complex, JavaScript-heavy pages. While it is not the fastest tool available, its ability to automate a real browser makes it incredibly flexible.