Software Engineering's New Tool: Automating Web Workflows with Skyvern
This article explains what Skyvern is, how it can help you as a software engineer, and how to get started.
Skyvern (GitHub: Skyvern-AI/skyvern) is an open-source framework for automating browser-based workflows with large language models (LLMs) and computer vision.
Think of it as a next-generation browser automation tool. With traditional tools such as Selenium or Puppeteer, you write brittle selectors (div#main > a.link-class). With Skyvern, you describe the desired action in plain English instead. Skyvern then uses its LLM and computer vision layer to interpret the page, identify the matching element (often even after the layout changes), and perform the action.
For a software engineer, Skyvern can boost productivity and solve problems in four areas: resilient web scraping, end-to-end testing, quick automation scripts, and dynamic single-page applications.
Problem
Traditional web scrapers often break because a website's frontend developers changed a CSS class name or shifted an element's position. This forces you to constantly maintain and update your scraping code.
Skyvern Solution
You don't target a specific CSS selector; you target the intent.
Instead of writing
driver.find_element(By.ID, "add-to-cart-2024").click()
you give an instruction such as
"Click the 'Add to Cart' button for the blue T-shirt."
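Skyvern's LLM and computer vision models interpret the rendered page and locate the element that matches the instruction. Your automation therefore depends on what the user sees, not on a particular ID or class name, which makes it much more resilient to UI changes.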
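The following toy sketch makes the selector-versus-intent contrast concrete without a browser or an LLM. It uses only Python's standard library to look up a button on two versions of a page: once by a hard-coded id, and once by its visible label. Matching on visible text is a deliberately crude stand-in for intent-based lookup and is not how Skyvern works internally. The page snippets and helper names are hypothetical.

```python
from html.parser import HTMLParser


class ButtonCollector(HTMLParser):
    """Collects every <button> on a page with its id attribute and visible text."""

    def __init__(self):
        super().__init__()
        self.buttons = []     # list of {"id": ..., "text": ...}
        self._current = None  # the button currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._current = {"id": dict(attrs).get("id"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "button" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.buttons.append(self._current)
            self._current = None


def parse_buttons(html):
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    return parser.buttons


def find_by_id(html, element_id):
    """Selector-style lookup: stops working as soon as the id is renamed."""
    return next((b for b in parse_buttons(html) if b["id"] == element_id), None)


def find_by_label(html, label):
    """Toy intent-style lookup: match on the text the user actually sees."""
    wanted = label.casefold()
    return next((b for b in parse_buttons(html) if b["text"].casefold() == wanted), None)


# Hypothetical "before" and "after" versions of the same page after a redesign.
OLD_PAGE = '<div><button id="add-to-cart-2024">Add to Cart</button></div>'
NEW_PAGE = '<main><button id="btn-cart-v2">Add to Cart</button></main>'

print(find_by_id(OLD_PAGE, "add-to-cart-2024"))  # found on the old page
print(find_by_id(NEW_PAGE, "add-to-cart-2024"))  # None: the redesign broke the selector
print(find_by_label(NEW_PAGE, "add to cart"))    # still found by its visible label
```

A real page can have several buttons with the same label, which is exactly the ambiguity an instruction like "the 'Add to Cart' button for the blue T-shirt" is meant to resolve. Plain string matching cannot do that.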
Problem
Writing and maintaining E2E tests for complex, multi-step user flows (like an entire checkout process or a user registration form) can be time-consuming and prone to selector-based failures.
Skyvern Solution
You can write test steps that read like human instructions. Such tests are easier to write and review, and they are less likely to break when a selector changes. Your test suite also doubles as documentation of user behavior.
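One way to organize such tests is to keep each flow as a plain list of English steps and run it through a pluggable executor. The sketch below assumes a hypothetical `execute` callable that sends one instruction to the browser agent (in practice, a wrapper around whichever Skyvern call you use) and raises an exception if the step cannot be completed. The flow, `run_flow`, and `FlowResult` are illustrative names, not part of Skyvern.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A hypothetical checkout test written as plain-English steps.
CHECKOUT_FLOW = [
    "Search for 'blue T-shirt'",
    "Click the 'Add to Cart' button for the blue T-shirt",
    "Open the cart and click 'Checkout'",
    "Verify that the order summary lists one blue T-shirt",
]


@dataclass
class FlowResult:
    passed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_step is None


def run_flow(steps: List[str], execute: Callable[[str], None]) -> FlowResult:
    """Run each step through `execute`, stopping at the first step that raises."""
    result = FlowResult()
    for step in steps:
        try:
            execute(step)
        except Exception as exc:
            result.failed_step = step
            result.error = str(exc)
            break
        result.passed.append(step)
    return result


# A fake executor for trying the runner without a browser:
# it simulates the 'Checkout' button not being found.
def fake_execute(step: str) -> None:
    if "Checkout" in step:
        raise RuntimeError("button not found")


report = run_flow(CHECKOUT_FLOW, fake_execute)
print(report.ok, report.failed_step, report.error)
```

Because the executor is injected, you can unit-test the runner itself with fakes and swap in the real browser agent for end-to-end runs.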
Problem
Need a quick script to log into an external vendor's portal and download a report every day, but don't want to spend an hour mapping out all the form fields and buttons?
Skyvern Solution
Quickly script the entire workflow using high-level commands. This is perfect for internal tools, one-off data migration tasks, or proofs of concept, where speed and resilience are key.
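For a recurring job like the daily vendor report, it helps to keep the instruction in a single template and fill in only what changes between runs. This is a minimal sketch. The portal wording, the report name, and the convention that each run fetches the previous day's report are assumptions. The resulting string would be handed to whatever Skyvern call you use. Keep credentials out of the instruction text and load them from the environment instead.

```python
import datetime as dt

# Hypothetical instruction template for a daily report download.
REPORT_TASK_TEMPLATE = (
    "Log in to the vendor portal. "
    "Open the '{report_name}' report for {report_date}. "
    "Download it as {file_format}."
)


def build_daily_report_task(report_name, file_format="CSV", today=None):
    """Return the plain-English instruction for the previous day's report.

    Assumption: each run fetches the report covering the day before `today`.
    """
    if today is None:
        today = dt.date.today()
    report_date = (today - dt.timedelta(days=1)).isoformat()
    return REPORT_TASK_TEMPLATE.format(
        report_name=report_name, report_date=report_date, file_format=file_format
    )


print(build_daily_report_task("Daily Sales", today=dt.date(2024, 3, 1)))
```

Passing `today` explicitly keeps the function deterministic, so you can test it without depending on the calendar. The call above asks for the report dated 2024-02-29 (2024 is a leap year).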
Problem
Single Page Applications (SPAs) that load content dynamically or use complex interactions (drag-and-drop, modals) are notoriously difficult for simple scrapers.
Skyvern Solution
Because Skyvern observes the rendered page with computer vision, it is better equipped to handle these complex, human-like interactions, which tools that only inspect the DOM might miss.
The initial setup is typically done using Python's package manager, pip.
# It's always a good idea to use a virtual environment
python -m venv skyvern_env
source skyvern_env/bin/activate # On Windows, use: skyvern_env\Scripts\activate
# Install the Skyvern library
pip install skyvern
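To confirm that the package was installed into the active virtual environment (and not into a global Python), you can query its version with the standard library's `importlib.metadata` (Python 3.8+). Running `pip show skyvern` gives the same information from the shell. The helper name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


skyvern_version = installed_version("skyvern")
if skyvern_version is None:
    print("skyvern is not installed in this environment")
else:
    print(f"skyvern {skyvern_version} is installed")
```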
You will also need to configure an API key for the LLM provider (such as OpenAI or another supported provider) that Skyvern uses as its "reasoning" layer. This key is usually set as an environment variable (e.g., OPENAI_API_KEY).
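A small helper can fail fast with a clear message when a required key is missing. It also treats an empty or whitespace-only value as unset, which catches a common mistake. This is a generic sketch, not part of Skyvern. `require_env` is a hypothetical name, and OPENAI_API_KEY is only the example variable from the text above.

```python
import os


def require_env(name):
    """Fetch a required setting from the environment, or raise with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it before running, e.g. "
            f"'export {name}=...' (Windows cmd: 'set {name}=...')."
        )
    return value


# Usage (raises RuntimeError if the variable is missing or empty):
# api_key = require_env("OPENAI_API_KEY")
```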
Let's imagine you want to log into a sample website and click a specific link.
The process involves importing the necessary components, setting up the client, and then running the desired workflow steps.
import os
import sys

# NOTE: SkyvernClient, new_session(), perform_action() and current_url are
# illustrative names. Check the Skyvern documentation for the exact client
# class and methods in the version you installed.
from skyvern.client import SkyvernClient

# --- 1. Configuration (ensure the required API key is set in your environment) ---
try:
    # Depending on your setup, the client may expect a Skyvern API key rather
    # than your LLM provider's key; see the Skyvern docs.
    skyvern = SkyvernClient(api_key=os.environ["OPENAI_API_KEY"])
except KeyError:
    print("Error: Please set the OPENAI_API_KEY environment variable.", file=sys.stderr)
    sys.exit(1)

# --- 2. Define the target URL ---
LOGIN_URL = "https://example-login-site.com/login"  # Replace with your actual target

# --- 3. Start the workflow ---
print(f"Navigating to {LOGIN_URL}...")
session = skyvern.new_session(url=LOGIN_URL)

try:
    # Action 1: Fill in the username field.
    # Notice the natural-language instruction instead of a CSS selector.
    print("Executing Action 1: Entering username...")
    session.perform_action(
        "Fill in the input field labeled 'Username' with the text 'my_test_user'"
    )

    # Action 2: Fill in the password field.
    # In real code, load credentials from the environment instead of hard-coding them.
    print("Executing Action 2: Entering password...")
    session.perform_action(
        "Fill in the input field labeled 'Password' with the text 'secure-password-123'"
    )

    # Action 3: Click the login button.
    print("Executing Action 3: Clicking the login button...")
    session.perform_action(
        "Click the button that says 'Log In'"
    )

    # Action 4: Navigate after a successful login.
    # This step assumes the page has changed after login.
    print("Executing Action 4: Navigating to the Reports page...")
    session.perform_action(
        "Click the link in the navigation bar titled 'Monthly Reports'"
    )

    # You can also read state back after navigation, e.g.:
    # print("Current page URL:", session.current_url)
    print("\nWorkflow complete! Check the browser window for results.")
except Exception as e:
    print(f"\nAn error occurred during the workflow: {e}")
finally:
    # --- 4. Clean up (important: closes the browser session) ---
    session.close()
    print("Session closed.")
This example shows how the intent-driven approach simplifies complex browser interactions. You're giving the system high-level goals, and Skyvern handles the low-level, error-prone details of locating the exact element on the page.
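If an individual step fails intermittently (for example, because the page has not finished loading), you can wrap each action in a generic retry with exponential backoff. This is a standard pattern sketched with the standard library, not a Skyvern feature. `with_retries` is a hypothetical helper. You would call it as `with_retries(lambda: session.perform_action("Click the button that says 'Log In'"))`, using the illustrative `perform_action` from the example above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3,
                 delay: float = 2.0, backoff: float = 2.0) -> T:
    """Call `action`, retrying on any exception with exponential backoff.

    Sleeps `delay` seconds after the first failure, multiplying the wait by
    `backoff` each time, and re-raises the last exception once all attempts
    are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(wait)
            wait *= backoff


# A fake flaky action that succeeds on its third call.
calls = {"n": 0}


def flaky_click():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("element not found yet")
    return "clicked"


print(with_retries(flaky_click, attempts=3, delay=0.01))
```

Retry only steps that are safe to repeat. Re-running a step that submits a form or places an order could perform the action twice.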