Harsh Maur
November 27, 2024

Handling Cookies in Playwright Python

Want to manage cookies easily for web scraping or automation? Playwright in Python makes it simple. Here's what you need to know:

Why cookies matter: Cookies store session data, keep you logged in, and help you act like a real user while scraping.
Key methods:
- context.add_cookies(): Add cookies to a browser context.
- context.cookies(): View the cookies currently stored in a context.
- context.clear_cookies(): Remove cookies from a context.
Save and restore cookies: Use JSON files to save cookies and reload them later, avoiding repeated logins.
Browser contexts: Create separate "browsers" to isolate cookies for different tasks or user accounts.

Quick Example:

context.add_cookies([
    {"name": "auth_token", "value": "xyz789", "domain": "example.com", "path": "/"}
])
page.goto("https://example.com")

This guide covers everything from setting cookies to saving sessions, ensuring seamless automation and scraping workflows.

How Playwright Handles Cookies


Playwright makes cookie management simple and effective through its browser contexts feature. This system helps you control cookies and session data when you're running web scraping tasks or automated tests. The best part? Each browser context keeps cookies separate, so you can run multiple tasks without them getting mixed up.

Think of browser contexts like separate browsers: each one has its own cookie jar. This comes in handy when you need to work with multiple user accounts at once. Want to scrape data from two different user profiles on the same website? Just create two browser contexts and you're good to go.

Here's a quick example of setting up a browser context with cookies:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    context.add_cookies([
        {"name": "session_id", "value": "abc123", "domain": "example.com", "path": "/"}
    ])
    page = context.new_page()
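    # A relative URL like "/" only works if the context has a base_url
    page.goto("https://example.com")
    browser.close()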

Browser contexts act just like real browsers. You can set up one context to act like a logged-in user and another as a guest - perfect for testing different user experiences on your site.

Key Methods for Managing Cookies in Playwright

Playwright gives you three main tools to work with cookies:

context.cookies() shows you all cookies in your current context
context.add_cookies() lets you add new cookies
context.clear_cookies() wipes the slate clean by removing all cookies

Here's how you might use these methods in practice:

# Take a look at your current cookies
cookies = context.cookies()
for cookie in cookies:
    print(f"Name: {cookie['name']}, Value: {cookie['value']}")

# Add a new cookie to the mix
context.add_cookies([
    {"name": "auth_token", "value": "xyz789", "domain": "example.com", "path": "/"}
])

How to Set Cookies in Playwright

Setting cookies in Playwright helps you control application states and test different scenarios. Here's how to handle cookies effectively in your tests.

Setting Cookies with `add_cookies()`

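The add_cookies() method lets you add cookies to a browser context. You'll need to provide a list of dictionaries with these key details:

name: Cookie name (required)
value: Cookie value (required)
url: Page URL the cookie belongs to (use this instead of domain and path)
domain: Cookie's domain (required together with path when url is omitted)
path: Server path where the cookie works (required together with domain when url is omitted)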
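Since Playwright rejects a cookie that has neither a url nor a domain/path pair, it can help to check cookie dictionaries before passing them to add_cookies(). The sketch below is a hypothetical helper (validate_cookie is not part of Playwright) that mirrors the documented rules: name and value are required, url is mutually exclusive with domain and path, and sameSite, if present, must be "Strict", "Lax", or "None".

```python
ALLOWED_SAME_SITE = {"Strict", "Lax", "None"}


def validate_cookie(cookie: dict) -> dict:
    """Hypothetical pre-check: raise ValueError if add_cookies() would reject this dict."""
    for key in ("name", "value"):
        if not isinstance(cookie.get(key), str):
            raise ValueError(f"cookie is missing a string '{key}'")

    has_url = bool(cookie.get("url"))
    has_domain_path = bool(cookie.get("domain")) and bool(cookie.get("path"))
    if not (has_url or has_domain_path):
        raise ValueError(f"cookie '{cookie['name']}' needs a 'url' or both 'domain' and 'path'")
    # Playwright also refuses a url combined with a domain or a path
    if has_url and (cookie.get("domain") or cookie.get("path")):
        raise ValueError(f"cookie '{cookie['name']}' mixes 'url' with 'domain'/'path'")

    same_site = cookie.get("sameSite")
    if same_site is not None and same_site not in ALLOWED_SAME_SITE:
        raise ValueError(f"sameSite must be one of {sorted(ALLOWED_SAME_SITE)}")
    return cookie


if __name__ == "__main__":
    cookies = [
        {"name": "auth_token", "value": "xyz789", "domain": "example.com", "path": "/"},
        {"name": "theme", "value": "dark", "url": "https://example.com"},
    ]
    # Fails fast before any browser is involved
    checked = [validate_cookie(c) for c in cookies]
    print(f"{len(checked)} cookies look valid")
```

Usage would then be context.add_cookies([validate_cookie(c) for c in cookies]), so a malformed cookie fails with a descriptive message before the browser is involved.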

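Here's how you could skip the login process by setting a session cookie for Amazon. The name and value below are placeholders; a real session needs the actual cookies from a logged-in browser:

context.add_cookies([
    {
        'name': 'auth_token',
        'value': 'xyz789',
        'domain': '.amazon.com',
        'path': '/'
    }
])
page.goto("https://www.amazon.com")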

Advanced Techniques for Setting Cookies

Need more than a basic cookie setup? Here's how to handle complex scenarios.

When you're working with server-generated cookies like auth tokens, you can capture and use them dynamically:

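# context.request shares cookie storage with the browser context (sync API, no await)
api_request_context = context.request
response = api_request_context.post(
    "https://example.com/login",
    data={"username": "user", "password": "pass"}
)
# Cookies set by the response are stored in the browser context automatically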

For projects involving multiple domains, you'll want to keep cookies separate. Here's how:

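# Set up contexts for each domain
# (base_url only resolves relative URLs such as page.goto("/");
# the separate contexts are what keep the cookies apart)
context1 = browser.new_context(base_url="https://domain1.com")
context2 = browser.new_context(base_url="https://domain2.com")

# Add domain-specific cookies (a domain must be paired with a path)
context1.add_cookies([
    {'name': 'user_id', 'value': '123', 'domain': 'domain1.com', 'path': '/'}
])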

Have cookie troubles? Check these common issues:

Match the domain and path exactly with your target site
Set cookies before loading the page
Use context.cookies() to check what cookies are active

This setup works great for testing apps that use multiple domains or third-party APIs.

Saving and Restoring Cookies in Playwright

Want to avoid those annoying repeated logins during web scraping? Let's look at how Playwright helps you save and reload cookies between sessions.

How to Save Cookies to a File

Here's a simple script to save browser cookies to a JSON file:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")

    # Get current cookies
    cookies = context.cookies()

    # Save to JSON file
    with open('cookies.json', 'w') as f:
        json.dump(cookies, f)

    browser.close()

How to Load Cookies from a File

Here's how to bring those cookies back to life in a new session:

import json
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    page = context.new_page()

    # Load cookies from file
    with open('cookies.json', 'r') as f:
        cookies = json.load(f)

    # Add cookies to context
    context.add_cookies(cookies)

    # Verify session
    page.goto("https://example.com")

    browser.close()

Tips for Saving and Restoring Cookies

Here are some pro tips to make your cookie management more effective:

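Watch Out for Dynamic Cookies: Some cookies, like auth tokens, change during a session. Requests sent through context.request (the context's APIRequestContext) share the browser context's cookie storage, so updated cookies land in the context automatically; save them again at the end of each run so your file stays current.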
Keep Test Sessions Clean: When running automated tests, use session-scoped fixtures. This keeps your cookie state consistent and your test results reliable.
Separate Your Cookie Jars: Working with multiple domains? Use different browser contexts to keep cookies from mixing where they shouldn't.

Tips for Managing Cookies and Fixing Issues

Let's dive into how to handle cookies effectively in Playwright for smooth web automation and scraping.

Keeping Cookies Isolated Between Contexts

Want to run multiple user sessions without them interfering with each other? Here's a simple way to keep cookies separate:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    # Create two separate contexts
    user1_context = browser.new_context()
    user2_context = browser.new_context()

    # Each context operates independently
    user1_page = user1_context.new_page()
    user2_page = user2_context.new_page()

    user1_page.goto("https://example.com")
    user2_page.goto("https://example.com")

Let's tackle the most common cookie headaches and how to fix them:

Cookie Persistence Issues: Having trouble keeping your cookies? Here's what to check:

Run context.cookies() to make sure they're stored
Save cookies to a file after each session
Pop open your browser's dev tools to check cookie details

Cookie Loading Problems: If your cookies won't load, check these three things:

Is your JSON formatted correctly?
Have your cookies expired?
Are the cookie settings (secure, httpOnly) set up right?
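For the first two checks, one option is to validate the file and drop expired entries before loading it. A minimal sketch, assuming the file was written from context.cookies(), where session cookies have expires set to -1 and other cookies a Unix timestamp in seconds; load_fresh_cookies is a hypothetical helper name:

```python
import json
import time


def load_fresh_cookies(path, now=None):
    """Load cookies saved from context.cookies() and drop the expired ones."""
    now = time.time() if now is None else now
    with open(path, "r", encoding="utf-8") as f:
        cookies = json.load(f)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(cookies, list):
        raise ValueError("expected a JSON list of cookie objects")
    # Keep session cookies (expires == -1) and cookies that expire in the future
    return [c for c in cookies if c.get("expires", -1) == -1 or c["expires"] > now]
```

You would then call context.add_cookies(load_fresh_cookies("cookies.json")). If the list comes back much shorter than expected, the saved session has probably expired and you need to log in again.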

Working with Cross-Domain Cookies: Cookies are picky about domains. Prefix the domain with a dot so the cookie also applies to subdomains, and pair the domain with a path:

context.add_cookies([
    {"name": "auth_token", "value": "xyz789", "domain": ".example.com", "path": "/"}
])

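Handling API Request Cookies: Need to keep your API and browser cookies in sync? Send requests through the context's request property, which shares its cookie storage with the browser context:

api_context = browser.new_context()
# api_context.request sends this context's cookies and stores any cookies the response sets
api_request = api_context.request
response = api_request.get("https://example.com/api")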

For more complex cookie scenarios, check out tools like Web Scraping HQ - they've got specialized tools to handle tricky cookie situations.

Conclusion

Good cookie management makes web scraping and automation with Playwright in Python work better. You can control cookies by using add_cookies() and save or load browser sessions with storage_state.

The storage_state feature helps you manage sessions by saving and loading browser data: cookies plus each origin's localStorage. Here's a quick example with API contexts:

# Save browser state including cookies
browser_context.storage_state(path="state.json")

# Create API context with saved state
api_context = browser.new_context(storage_state="state.json")
api_request = api_context.request
response = api_request.get("https://example.com/api")

Playwright's cookie handling works well for complex tasks too, like working across different domains. You can share cookies between browser contexts and API requests, which helps when you're building advanced automation scripts.

For bigger scraping projects or when you need extra help managing cookies, check out Web Scraping HQ - they've got tools that can make your web automation easier.

FAQs

How to set cookies in Playwright Python?

Setting cookies in Playwright Python is straightforward with the context.add_cookies() method. Here's how to add cookies to your browser context:

context.add_cookies([
    {'name': 'session_id', 'value': 'abc123', 'domain': 'example.com', 'path': '/'},
])

This comes in handy when you need to skip login steps or set up specific browser states. For example, if you're building a scraper that needs access to members-only content, you can add session cookies directly instead of going through the login process each time.

How to enable cookies in Playwright?

Good news - Playwright handles cookies automatically! They're ON by default, so you don't need to fiddle with any settings. Want to check your current cookies? Just use:

cookies = context.cookies()
print(cookies)

This makes it super easy to track and debug your cookie-related tasks. Whether you're testing a website or building a scraper, Playwright's cookie handling works right out of the box.

How do you save cookies in Playwright?

Want to keep your cookies for later? Here's a simple way to save and reload them:

# Save cookies to a file
import json
cookies = context.cookies()
with open('cookies.json', 'w') as file:
    json.dump(cookies, file)

# Load cookies from a file
with open('cookies.json', 'r') as file:
    cookies = json.load(file)
context.add_cookies(cookies)

This trick is perfect for saving login sessions or user settings. Instead of logging in every time you run your script, just load your saved cookies and you're good to go!

How do you save cookies in Playwright Python?

The cookie-saving process in Playwright Python works exactly as shown above. Just grab your cookies with context.cookies(), save them to a file, and load them back when needed. It's that simple! This approach works great for maintaining login states and user preferences across different script runs.