Playwright Core
Playwright Core is a wrapper around the Playwright browser automation library, designed to let Large Language Models (LLMs) control web browsers. It serves as the browser interaction layer of the Factifai Agent Suite, with an interface optimized for AI models.
Overview
Playwright Core offers simplified browser control, intelligent element detection, and rich visual debugging tools that make browser automation more reliable and easier to troubleshoot. It serves as the browser automation engine for Factifai Agent, providing a coordinate-based approach that allows AI models to control browsers without needing complex DOM selectors.
Key Features
- Enhanced Browser Control: Session-based browser management with improved stability
- Smart Element Detection: Automatic identification of interactive page elements
- Visual Debugging Tools: Visualization of detected elements with numbered overlays
- Simplified API: High-level functions that abstract away Playwright complexity
- LLM-friendly Interface: Streamlined coordinate-based approach optimized for AI models
Core Capabilities
Session Management
Playwright Core uses a session-based approach to manage browser instances, allowing multiple concurrent browser sessions with isolated contexts.
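// Assumes BrowserService is a named export of the package (see Installation)
import { BrowserService } from '@presidio-dev/playwright-core';

// Get a browser session
const sessionId = `test-${Date.now()}`;
const page = await BrowserService.getInstance().getPage(sessionId);

// Close a session when done
await BrowserService.getInstance().closePage(sessionId);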
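Because each session is keyed by its ID, concurrent sessions need distinct IDs. A timestamp on its own can repeat when two sessions start within the same millisecond. The sketch below shows one way to avoid that; `makeSessionId` is a hypothetical helper for illustration and is not part of Playwright Core.

```typescript
// Hypothetical helper (not part of Playwright Core): builds session IDs that
// stay unique even when several sessions start in the same millisecond.
let sessionCounter = 0;

function makeSessionId(prefix: string): string {
  sessionCounter += 1;
  // The timestamp keeps IDs readable; the counter disambiguates same-millisecond calls.
  return `${prefix}-${Date.now()}-${sessionCounter}`;
}

const first = makeSessionId('test');
const second = makeSessionId('test');
console.log(first !== second); // true
```

Each generated ID could then be passed to `getPage(sessionId)` and later to `closePage(sessionId)`.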
Visual Element Detection
Playwright Core can automatically detect the interactive elements on a page and visualize them with numbered overlays.
// `browser` is the BrowserService singleton
const browser = BrowserService.getInstance();
// Mark all interactive elements with numbered boxes
await browser.markVisibleElements(sessionId);
// Take a screenshot with marked elements
const screenshot = await browser.takeMarkedScreenshot(sessionId);
Coordinate-Based Interaction
Instead of relying on complex DOM selectors, Playwright Core uses a coordinate-based approach for interactions, making it ideal for AI models.
// Click at specific coordinates
await click(sessionId, { x: 150, y: 200 });
// Type text (after clicking on an input field)
await type(sessionId, 'Hello, World!');
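When the target comes from element detection rather than a hard-coded position, the click point has to be derived from the element's box. The following is a minimal sketch, assuming `x` and `y` give the top-left corner of the box in viewport pixels (this page does not say whether they denote the corner or the center). `ElementBox`, `Point`, and `centerOf` are hypothetical names.

```typescript
// Hypothetical types: only the field names x, y, width, height come from this page.
interface ElementBox { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

// Returns the center of a box, rounded to whole pixels.
// Assumption: (x, y) is the top-left corner of the box.
function centerOf(box: ElementBox): Point {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}

const target = centerOf({ x: 100, y: 150, width: 200, height: 50 });
console.log(target); // { x: 200, y: 175 }
```

The resulting point could then be passed as the `coordinates` argument of `click(sessionId, coordinates)`.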
Screenshot Tools
Capture screenshots with optional element highlighting for debugging and documentation.
// Take a regular screenshot
const screenshot = await browser.takeScreenshot(sessionId);
// Take a screenshot with marked elements
const markedScreenshot = await browser.takeMarkedScreenshot(sessionId);
API Reference
BrowserService
The main service for managing browser sessions and interactions.
- getInstance(): Get the singleton instance of BrowserService
- getPage(sessionId): Get the active page for a session
- takeScreenshot(sessionId, minWaitMs?): Capture a screenshot
- captureScreenshotAndInfer(sessionId): Capture screenshot with page element data
- getAllPageElements(sessionId): Get all clickable and input elements
- takeMarkedScreenshot(sessionId, options?): Take screenshot with marked elements
- closePage(sessionId): Close a session
- closeAll(): Close all sessions
Navigation Functions
- navigate(sessionId, url, options?): Navigate to a URL
- getCurrentUrl(sessionId): Get the current page URL
- reload(sessionId): Reload the current page
- goBack(sessionId): Navigate back in history
- goForward(sessionId): Navigate forward in history
- wait(sessionId, ms): Wait for a specified time
Interaction Functions
- click(sessionId, coordinates, options?): Click at specific coordinates
- type(sessionId, text, options?): Type text
- clear(sessionId, coordinates?): Clear input field
- scrollToNextChunk(sessionId): Scroll down one viewport
- scrollToPrevChunk(sessionId): Scroll up one viewport
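The two scroll functions move in viewport-sized steps, so a page of known height is covered by a predictable number of calls. The sketch below works through that arithmetic, assuming each chunk is exactly one viewport height with no overlap (the actual step size is not stated here). `chunkOffsets` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: lists the scroll offsets (top edge, in pixels) of each
// viewport-sized chunk needed to cover a page of the given height.
function chunkOffsets(pageHeight: number, viewportHeight: number): number[] {
  if (viewportHeight <= 0) {
    throw new RangeError('viewportHeight must be positive');
  }
  const offsets: number[] = [];
  for (let offset = 0; offset < pageHeight; offset += viewportHeight) {
    offsets.push(offset);
  }
  return offsets;
}

console.log(chunkOffsets(2000, 720)); // [ 0, 720, 1440 ]
```

Under these assumptions, three offsets mean the initial view plus two calls to `scrollToNextChunk(sessionId)` would cover the page.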
Element Marking Functions
- markVisibleElements(sessionId, options?): Mark elements with numbered boxes
- removeElementMarkers(sessionId): Remove element markers
Advanced Usage
Custom Element Marking
You can customize how elements are marked on the page:
// Mark interactive elements with custom colors
await browser.markVisibleElements(sessionId, {
  boxColor: 'blue',
  textColor: 'white',
  borderWidth: 2,
  elements: [
    { x: 100, y: 150, width: 200, height: 50, label: 'Search Box' }
  ]
});
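The options object can also be assembled programmatically. The sketch below types the shape shown in the example and filters out boxes with no visible area before they are marked. `MarkerBox`, `MarkOptions`, and `buildMarkOptions` are hypothetical names inferred only from the example above; the package's real option types may differ.

```typescript
// Hypothetical types inferred from the example on this page.
interface MarkerBox { x: number; y: number; width: number; height: number; label: string; }
interface MarkOptions {
  boxColor?: string;
  textColor?: string;
  borderWidth?: number;
  elements?: MarkerBox[];
}

// Combines style settings with a list of boxes, dropping boxes that have
// zero or negative width or height since they would not be visible.
function buildMarkOptions(
  boxes: MarkerBox[],
  style: { boxColor?: string; textColor?: string; borderWidth?: number } = {},
): MarkOptions {
  const visible = boxes.filter((box) => box.width > 0 && box.height > 0);
  return { ...style, elements: visible };
}

const options = buildMarkOptions(
  [
    { x: 100, y: 150, width: 200, height: 50, label: 'Search Box' },
    { x: 0, y: 0, width: 0, height: 0, label: 'Hidden' },
  ],
  { boxColor: 'blue', textColor: 'white', borderWidth: 2 },
);
console.log(options.elements?.length); // 1
```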
Page Element Data
Get detailed information about interactive elements on the page:
// Get detailed info about page elements
const elements = await browser.getAllPageElements(sessionId);
console.log(`Found ${elements.length} interactive elements:`);
elements.forEach(el => {
  console.log(`- ${el.tagName} at (${el.x}, ${el.y}), size: ${el.width}x${el.height}`);
});
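For an LLM, the element list can be easier to work with as a compact numbered legend that pairs with the overlays in a marked screenshot. The sketch below formats such a legend. `PageElement` mirrors only the fields used in the snippet above, while `formatLegend` and its 1-based numbering are assumptions rather than documented behavior.

```typescript
// Hypothetical type: mirrors only the fields used on this page.
interface PageElement { tagName: string; x: number; y: number; width: number; height: number; }

// Formats elements as one numbered line each, starting at 1.
function formatLegend(elements: PageElement[]): string {
  return elements
    .map((el, i) =>
      `[${i + 1}] ${el.tagName.toLowerCase()} at (${el.x}, ${el.y}), ${el.width}x${el.height}`)
    .join('\n');
}

const legend = formatLegend([
  { tagName: 'INPUT', x: 100, y: 150, width: 200, height: 50 },
  { tagName: 'BUTTON', x: 320, y: 150, width: 80, height: 50 },
]);
console.log(legend);
// [1] input at (100, 150), 200x50
// [2] button at (320, 150), 80x50
```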
Integration with Factifai Agent
Playwright Core is the browser automation engine that powers Factifai Agent. When you run a test with Factifai Agent, it:
- Parses your natural language instructions
- Converts them into a series of steps
- Uses Playwright Core to execute those steps in the browser
- Captures screenshots and generates reports
This separation of concerns allows Factifai Agent to focus on natural language processing and test orchestration, while Playwright Core handles the browser automation details.
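The hand-off between the two layers can be pictured as a list of typed steps that map onto the functions in the API reference. This is an illustrative sketch only: the `Step` union, `PlannedCall`, and `toCall` are hypothetical and do not describe Factifai Agent's actual internal format. Only the function names and argument order are taken from the API reference.

```typescript
// Hypothetical step format (not Factifai Agent's real representation).
type Step =
  | { kind: 'navigate'; url: string }
  | { kind: 'click'; x: number; y: number }
  | { kind: 'type'; text: string }
  | { kind: 'wait'; ms: number };

interface PlannedCall { fn: string; args: unknown[]; }

// Maps one step to the name and arguments of the documented function
// that would carry it out. Nothing is executed here.
function toCall(sessionId: string, step: Step): PlannedCall {
  switch (step.kind) {
    case 'navigate':
      return { fn: 'navigate', args: [sessionId, step.url] };
    case 'click':
      return { fn: 'click', args: [sessionId, { x: step.x, y: step.y }] };
    case 'type':
      return { fn: 'type', args: [sessionId, step.text] };
    case 'wait':
      return { fn: 'wait', args: [sessionId, step.ms] };
  }
}

const plan: Step[] = [
  { kind: 'navigate', url: 'https://example.com' },
  { kind: 'click', x: 150, y: 200 },
  { kind: 'type', text: 'Hello, World!' },
];
console.log(plan.map((step) => toCall('test-1', step).fn));
// [ 'navigate', 'click', 'type' ]
```

Keeping the plan as plain data makes it possible to log or review the steps before any browser action runs.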
Requirements
- Node.js 18+
- Playwright (peer dependency)
- Browser binaries (Chromium, Firefox, and/or WebKit)
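The Node.js requirement can be checked up front before any browser is launched. The sketch below is one way to do that. `meetsNodeRequirement` is a hypothetical helper, and it compares only the major version number.

```typescript
// Hypothetical helper: returns true when a Node.js version string such as
// "v18.19.0" or "20.11.1" has a major version of at least minMajor.
function meetsNodeRequirement(version: string, minMajor: number = 18): boolean {
  const majorText = version.replace(/^v/, '').split('.')[0] ?? '';
  const major = Number.parseInt(majorText, 10);
  // parseInt yields NaN for malformed input, which fails the integer check.
  return Number.isInteger(major) && major >= minMajor;
}

console.log(meetsNodeRequirement('v18.19.0')); // true
console.log(meetsNodeRequirement('16.20.2')); // false
```

In a Node.js script, the string to check would typically be `process.version`.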
Installation
# Install the package
npm install @presidio-dev/playwright-core
# IMPORTANT: Install Playwright globally before installing the browsers
npm install -g playwright
# Then install browser dependencies (required)
npx playwright install --with-deps
The order of the installation steps matters:
- First, install Playwright globally so that its CLI tools are properly recognized
- Then run npx playwright install --with-deps, which installs:
  - Browser binaries (Chromium, Firefox, WebKit)
  - Required system dependencies for proper browser operation
  - Font packages and media codecs needed for complete rendering