Add browse tool support. I reviewed some MCPs (using OpenAI's deep research to help), and it helped me choose chromedp as the relevant library and helped me come up with an interface. This commit adds chrome to the Docker image which is kind of big. (I've noticed that it's smaller on Ubuntu, where it doesn't pull in X11.) go-playwright was a library contender as well. Implement browser automation tooling using chromedp This implementation adds browser automation capabilities to the system via the chromedp library, enabling Claude to interact with web content effectively. Key features include: 1. Core browser automation functionality: - Created new browsertools package in claudetool/browser - Implemented tools for navigating, clicking, typing, waiting for elements, getting text, evaluating JavaScript, taking screenshots, and scrolling - Added lazy browser initialization that defers until first use - Integrated with the agent to expose these tools to Claude 2. Screenshot handling and display: - Implemented screenshot storage with UUID-based IDs in /tmp/sketch-screenshots - Added endpoint to serve screenshots via /screenshot/{id} - Created dedicated UI component for displaying screenshots - Ensured proper responsive design with loading states and error handling - Fixed URL paths for proper rehomed URL support - Modified tool calls component to auto-expand screenshot results 3. Error handling and reliability: - Added graceful error handling for browser initialization failures - Implemented proper cleanup of browser resources The browser automation tools provide a powerful way for Claude to interact with web content, making it possible to scrape data, test web applications, and automate web-based tasks. Co-Authored-By: sketch <hello@sketch.dev>

commit: 33d282f80db786cc60ba521a38ed5166f23239ed [log] [tgz]
author: Philip Zeyliger <philip@bold.dev> Sat May 03 04:01:54 2025 +0000
committer: Philip Zeyliger <philip@bold.dev> Tue May 06 10:23:39 2025 -0700
tree: 9ed1f15c6d3081d5bef7d16b9d72e78a2c7780cf
parent: a9d87aa69cfefdc91ec7aaa6bc42907749748e76 [diff]
diff --git a/claudetool/browse/README.md b/claudetool/browse/README.md
new file mode 100644
index 0000000..6c03e1a
--- /dev/null
+++ b/claudetool/browse/README.md

@@ -0,0 +1,116 @@
+# Browser Tools for Claude
+
+This package provides a set of tools that allow Claude to control a headless
+Chrome browser from Go. The tools are built using the
+[chromedp](https://github.com/chromedp/chromedp) library.
+
+## Available Tools
+
+1. `browser_navigate` - Navigate to a URL and wait for the page to load
+2. `browser_click` - Click an element matching a CSS selector
+3. `browser_type` - Type text into an input field
+4. `browser_wait_for` - Wait for an element to appear in the DOM
+5. `browser_get_text` - Get the text content of an element
+6. `browser_eval` - Evaluate JavaScript in the browser context
+7. `browser_screenshot` - Take a screenshot of the page or a specific element
+8. `browser_scroll_into_view` - Scroll an element into view
+
+## Usage
+
+```go
+// Create a context
+ctx := context.Background()
+
+// Register browser tools and get a cleanup function
+tools, cleanup := browse.RegisterBrowserTools(ctx)
+defer cleanup() // Important: always call cleanup to release browser resources
+
+// Add tools to your agent
+for _, tool := range tools {
+    agent.AddTool(tool)
+}
+```
+
+## Requirements
+
+- Chrome or Chromium must be installed on the system
+- The `chromedp` package handles launching and controlling the browser
+
+## Tool Input/Output
+
+All tools follow a standard JSON input/output format. For example:
+
+**Navigate Tool Input:**
+```json
+{
+  "url": "https://example.com"
+}
+```
+
+**Navigate Tool Output (success):**
+```json
+{
+  "status": "success"
+}
+```
+
+**Tool Output (error):**
+```json
+{
+  "status": "error",
+  "error": "Error message"
+}
+```
+
+## Example Tool Usage
+
+```go
+// Example of using the navigate tool directly
+navTool := tools[0] // Get browser_navigate tool
+input := map[string]string{"url": "https://example.com"}
+inputJSON, _ := json.Marshal(input)
+
+// Call the tool
+result, err := navTool.Run(ctx, json.RawMessage(inputJSON))
+if err != nil {
+    log.Fatalf("Error: %v", err)
+}
+fmt.Println(result)
+```
+
+## Screenshot Storage
+
+The browser screenshot tool has been modified to save screenshots to a temporary directory and identify them by ID, rather than returning base64-encoded data directly. This improves efficiency by:
+
+1. Reducing token usage in LLM responses
+2. Avoiding encoding/decoding overhead
+3. Allowing for larger screenshots without message size limitations
+
+### How It Works
+
+1. When a screenshot is taken, it's saved to `/tmp/sketch-screenshots/` with a unique UUID filename
+2. The tool returns the screenshot ID in its response
+3. The web UI can fetch the screenshot using the `/screenshot/{id}` endpoint
+
+### Example Usage
+
+Agent calls the screenshot tool:
+```json
+{
+  "id": "tool_call_123",
+  "name": "browser_screenshot",
+  "params": {}
+}
+```
+
+Tool response:
+```json
+{
+  "id": "tool_call_123",
+  "result": {
+    "id": "550e8400-e29b-41d4-a716-446655440000"
+  }
+}
+```
+
+The screenshot is then accessible at: `/screenshot/550e8400-e29b-41d4-a716-446655440000`

diff --git a/claudetool/browse/browse.go b/claudetool/browse/browse.go
new file mode 100644
index 0000000..a76cd24
--- /dev/null
+++ b/claudetool/browse/browse.go

@@ -0,0 +1,633 @@
+// Package browse provides browser automation tools for the agent
+package browse
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"log"
+	"os"
+	"path/filepath"
+	"sync"
+	"time"
+
+	"github.com/chromedp/chromedp"
+	"github.com/google/uuid"
+	"sketch.dev/llm"
+)
+
+// ScreenshotDir is the directory where screenshots are stored
+const ScreenshotDir = "/tmp/sketch-screenshots"
+
+// BrowseTools contains all browser tools and manages a shared browser instance
+type BrowseTools struct {
+	ctx              context.Context
+	cancel           context.CancelFunc
+	browserCtx       context.Context
+	browserCtxCancel context.CancelFunc
+	mux              sync.Mutex
+	initOnce         sync.Once
+	initialized      bool
+	initErr          error
+	// Map to track screenshots by ID and their creation time
+	screenshots      map[string]time.Time
+	screenshotsMutex sync.Mutex
+}
+
+// NewBrowseTools creates a new set of browser automation tools
+func NewBrowseTools(ctx context.Context) *BrowseTools {
+	ctx, cancel := context.WithCancel(ctx)
+
+	// Ensure the screenshot directory exists
+	if err := os.MkdirAll(ScreenshotDir, 0755); err != nil {
+		log.Printf("Failed to create screenshot directory: %v", err)
+	}
+
+	b := &BrowseTools{
+		ctx:         ctx,
+		cancel:      cancel,
+		screenshots: make(map[string]time.Time),
+	}
+
+	return b
+}
+
+// Initialize starts the browser if it's not already running
+func (b *BrowseTools) Initialize() error {
+	b.mux.Lock()
+	defer b.mux.Unlock()
+
+	b.initOnce.Do(func() {
+		// ChromeDP.ExecPath has a list of common places to find Chrome...
+		opts := chromedp.DefaultExecAllocatorOptions[:]
+		allocCtx, _ := chromedp.NewExecAllocator(b.ctx, opts...)
+		browserCtx, browserCancel := chromedp.NewContext(
+			allocCtx,
+			chromedp.WithLogf(log.Printf),
+		)
+
+		b.browserCtx = browserCtx
+		b.browserCtxCancel = browserCancel
+
+		// Ensure the browser starts
+		if err := chromedp.Run(browserCtx); err != nil {
+			b.initErr = fmt.Errorf("failed to start browser (please apt get chromium or equivalent): %w", err)
+			return
+		}
+		b.initialized = true
+	})
+
+	return b.initErr
+}
+
+// Close shuts down the browser
+func (b *BrowseTools) Close() {
+	b.mux.Lock()
+	defer b.mux.Unlock()
+
+	if b.browserCtxCancel != nil {
+		b.browserCtxCancel()
+		b.browserCtxCancel = nil
+	}
+
+	if b.cancel != nil {
+		b.cancel()
+	}
+
+	b.initialized = false
+	log.Println("Browser closed")
+}
+
+// GetBrowserContext returns the context for browser operations
+func (b *BrowseTools) GetBrowserContext() (context.Context, error) {
+	if err := b.Initialize(); err != nil {
+		return nil, err
+	}
+	return b.browserCtx, nil
+}
+
+// All tools return this as a response when successful
+type baseResponse struct {
+	Status string `json:"status,omitempty"`
+}
+
+func successResponse() string {
+	return `{"status":"success"}`
+}
+
+func errorResponse(err error) string {
+	return fmt.Sprintf(`{"status":"error","error":"%s"}`, err.Error())
+}
+
+// NavigateTool definition
+type navigateInput struct {
+	URL string `json:"url"`
+}
+
+// NewNavigateTool creates a tool for navigating to URLs
+func (b *BrowseTools) NewNavigateTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_navigate",
+		Description: "Navigate the browser to a specific URL and wait for page to load",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"url": {
+					"type": "string",
+					"description": "The URL to navigate to"
+				}
+			},
+			"required": ["url"]
+		}`),
+		Run: b.navigateRun,
+	}
+}
+
+func (b *BrowseTools) navigateRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input navigateInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	err = chromedp.Run(browserCtx,
+		chromedp.Navigate(input.URL),
+		chromedp.WaitReady("body"),
+	)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	return successResponse(), nil
+}
+
+// ClickTool definition
+type clickInput struct {
+	Selector    string `json:"selector"`
+	WaitVisible bool   `json:"wait_visible,omitempty"`
+}
+
+// NewClickTool creates a tool for clicking elements
+func (b *BrowseTools) NewClickTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_click",
+		Description: "Click the first element matching a CSS selector",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the element to click"
+				},
+				"wait_visible": {
+					"type": "boolean",
+					"description": "Wait for the element to be visible before clicking"
+				}
+			},
+			"required": ["selector"]
+		}`),
+		Run: b.clickRun,
+	}
+}
+
+func (b *BrowseTools) clickRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input clickInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	actions := []chromedp.Action{
+		chromedp.WaitReady(input.Selector),
+	}
+
+	if input.WaitVisible {
+		actions = append(actions, chromedp.WaitVisible(input.Selector))
+	}
+
+	actions = append(actions, chromedp.Click(input.Selector))
+
+	err = chromedp.Run(browserCtx, actions...)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	return successResponse(), nil
+}
+
+// TypeTool definition
+type typeInput struct {
+	Selector string `json:"selector"`
+	Text     string `json:"text"`
+	Clear    bool   `json:"clear,omitempty"`
+}
+
+// NewTypeTool creates a tool for typing into input elements
+func (b *BrowseTools) NewTypeTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_type",
+		Description: "Type text into an input or textarea element",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the input element"
+				},
+				"text": {
+					"type": "string",
+					"description": "Text to type into the element"
+				},
+				"clear": {
+					"type": "boolean",
+					"description": "Clear the input field before typing"
+				}
+			},
+			"required": ["selector", "text"]
+		}`),
+		Run: b.typeRun,
+	}
+}
+
+func (b *BrowseTools) typeRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input typeInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	actions := []chromedp.Action{
+		chromedp.WaitReady(input.Selector),
+		chromedp.WaitVisible(input.Selector),
+	}
+
+	if input.Clear {
+		actions = append(actions, chromedp.Clear(input.Selector))
+	}
+
+	actions = append(actions, chromedp.SendKeys(input.Selector, input.Text))
+
+	err = chromedp.Run(browserCtx, actions...)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	return successResponse(), nil
+}
+
+// WaitForTool definition
+type waitForInput struct {
+	Selector  string `json:"selector"`
+	TimeoutMS int    `json:"timeout_ms,omitempty"`
+}
+
+// NewWaitForTool creates a tool for waiting for elements
+func (b *BrowseTools) NewWaitForTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_wait_for",
+		Description: "Wait for an element to be present in the DOM",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the element to wait for"
+				},
+				"timeout_ms": {
+					"type": "integer",
+					"description": "Maximum time to wait in milliseconds (default: 30000)"
+				}
+			},
+			"required": ["selector"]
+		}`),
+		Run: b.waitForRun,
+	}
+}
+
+func (b *BrowseTools) waitForRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input waitForInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	timeout := 30000 // default timeout 30 seconds
+	if input.TimeoutMS > 0 {
+		timeout = input.TimeoutMS
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	timeoutCtx, cancel := context.WithTimeout(browserCtx, time.Duration(timeout)*time.Millisecond)
+	defer cancel()
+
+	err = chromedp.Run(timeoutCtx, chromedp.WaitReady(input.Selector))
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	return successResponse(), nil
+}
+
+// GetTextTool definition
+type getTextInput struct {
+	Selector string `json:"selector"`
+}
+
+type getTextOutput struct {
+	Text string `json:"text"`
+}
+
+// NewGetTextTool creates a tool for getting text from elements
+func (b *BrowseTools) NewGetTextTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_get_text",
+		Description: "Get the innerText of an element",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the element to get text from"
+				}
+			},
+			"required": ["selector"]
+		}`),
+		Run: b.getTextRun,
+	}
+}
+
+func (b *BrowseTools) getTextRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input getTextInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	var text string
+	err = chromedp.Run(browserCtx,
+		chromedp.WaitReady(input.Selector),
+		chromedp.Text(input.Selector, &text),
+	)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	output := getTextOutput{Text: text}
+	result, err := json.Marshal(output)
+	if err != nil {
+		return errorResponse(fmt.Errorf("failed to marshal response: %w", err)), nil
+	}
+
+	return string(result), nil
+}
+
+// EvalTool definition
+type evalInput struct {
+	Expression string `json:"expression"`
+}
+
+type evalOutput struct {
+	Result any `json:"result"`
+}
+
+// NewEvalTool creates a tool for evaluating JavaScript
+func (b *BrowseTools) NewEvalTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_eval",
+		Description: "Evaluate JavaScript in the browser context",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"expression": {
+					"type": "string",
+					"description": "JavaScript expression to evaluate"
+				}
+			},
+			"required": ["expression"]
+		}`),
+		Run: b.evalRun,
+	}
+}
+
+func (b *BrowseTools) evalRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input evalInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	var result any
+	err = chromedp.Run(browserCtx, chromedp.Evaluate(input.Expression, &result))
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	output := evalOutput{Result: result}
+	response, err := json.Marshal(output)
+	if err != nil {
+		return errorResponse(fmt.Errorf("failed to marshal response: %w", err)), nil
+	}
+
+	return string(response), nil
+}
+
+// ScreenshotTool definition
+type screenshotInput struct {
+	Selector string `json:"selector,omitempty"`
+	Format   string `json:"format,omitempty"`
+}
+
+type screenshotOutput struct {
+	ID string `json:"id"`
+}
+
+// NewScreenshotTool creates a tool for taking screenshots
+func (b *BrowseTools) NewScreenshotTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_screenshot",
+		Description: "Take a screenshot of the page or a specific element",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the element to screenshot (optional)"	
+				},
+				"format": {
+					"type": "string",
+					"description": "Output format ('base64' or 'png'), defaults to 'base64'",
+					"enum": ["base64", "png"]
+				}
+			}
+		}`),
+		Run: b.screenshotRun,
+	}
+}
+
+func (b *BrowseTools) screenshotRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input screenshotInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	var buf []byte
+	var actions []chromedp.Action
+
+	if input.Selector != "" {
+		// Take screenshot of specific element
+		actions = append(actions,
+			chromedp.WaitReady(input.Selector),
+			chromedp.Screenshot(input.Selector, &buf, chromedp.NodeVisible),
+		)
+	} else {
+		// Take full page screenshot
+		actions = append(actions, chromedp.CaptureScreenshot(&buf))
+	}
+
+	err = chromedp.Run(browserCtx, actions...)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	// Save the screenshot and get its ID
+	id := b.SaveScreenshot(buf)
+	if id == "" {
+		return errorResponse(fmt.Errorf("failed to save screenshot")), nil
+	}
+
+	// Return the ID in the response
+	output := screenshotOutput{ID: id}
+	response, err := json.Marshal(output)
+	if err != nil {
+		return errorResponse(fmt.Errorf("failed to marshal response: %w", err)), nil
+	}
+
+	return string(response), nil
+}
+
+// ScrollIntoViewTool definition
+type scrollIntoViewInput struct {
+	Selector string `json:"selector"`
+}
+
+// NewScrollIntoViewTool creates a tool for scrolling elements into view
+func (b *BrowseTools) NewScrollIntoViewTool() *llm.Tool {
+	return &llm.Tool{
+		Name:        "browser_scroll_into_view",
+		Description: "Scroll an element into view if it's not visible",
+		InputSchema: json.RawMessage(`{
+			"type": "object",
+			"properties": {
+				"selector": {
+					"type": "string",
+					"description": "CSS selector for the element to scroll into view"
+				}
+			},
+			"required": ["selector"]
+		}`),
+		Run: b.scrollIntoViewRun,
+	}
+}
+
+func (b *BrowseTools) scrollIntoViewRun(ctx context.Context, m json.RawMessage) (string, error) {
+	var input scrollIntoViewInput
+	if err := json.Unmarshal(m, &input); err != nil {
+		return errorResponse(fmt.Errorf("invalid input: %w", err)), nil
+	}
+
+	browserCtx, err := b.GetBrowserContext()
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	script := fmt.Sprintf(`
+		const el = document.querySelector('%s');
+		if (el) {
+			el.scrollIntoView({behavior: 'smooth', block: 'center'});
+			return true;
+		}
+		return false;
+	`, input.Selector)
+
+	var result bool
+	err = chromedp.Run(browserCtx,
+		chromedp.WaitReady(input.Selector),
+		chromedp.Evaluate(script, &result),
+	)
+	if err != nil {
+		return errorResponse(err), nil
+	}
+
+	if !result {
+		return errorResponse(fmt.Errorf("element not found: %s", input.Selector)), nil
+	}
+
+	return successResponse(), nil
+}
+
+// GetAllTools returns all browser tools
+func (b *BrowseTools) GetAllTools() []*llm.Tool {
+	return []*llm.Tool{
+		b.NewNavigateTool(),
+		b.NewClickTool(),
+		b.NewTypeTool(),
+		b.NewWaitForTool(),
+		b.NewGetTextTool(),
+		b.NewEvalTool(),
+		b.NewScreenshotTool(),
+		b.NewScrollIntoViewTool(),
+	}
+}
+
+// SaveScreenshot saves a screenshot to disk and returns its ID
+func (b *BrowseTools) SaveScreenshot(data []byte) string {
+	// Generate a unique ID
+	id := uuid.New().String()
+
+	// Save the file
+	filePath := filepath.Join(ScreenshotDir, id+".png")
+	if err := os.WriteFile(filePath, data, 0644); err != nil {
+		log.Printf("Failed to save screenshot: %v", err)
+		return ""
+	}
+
+	// Track this screenshot
+	b.screenshotsMutex.Lock()
+	b.screenshots[id] = time.Now()
+	b.screenshotsMutex.Unlock()
+
+	return id
+}
+
+// GetScreenshotPath returns the full path to a screenshot by ID
+func GetScreenshotPath(id string) string {
+	return filepath.Join(ScreenshotDir, id+".png")
+}

diff --git a/claudetool/browse/browse_test.go b/claudetool/browse/browse_test.go
new file mode 100644
index 0000000..b3168da
--- /dev/null
+++ b/claudetool/browse/browse_test.go

@@ -0,0 +1,241 @@
+package browse
+
+import (
+	"context"
+	"encoding/json"
+	"os"
+	"slices"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/chromedp/chromedp"
+	"sketch.dev/llm"
+)
+
+func TestToolCreation(t *testing.T) {
+	// Create browser tools instance
+	tools := NewBrowseTools(context.Background())
+
+	// Test each tool has correct name and description
+	toolTests := []struct {
+		tool          *llm.Tool
+		expectedName  string
+		shortDesc     string
+		requiredProps []string
+	}{
+		{tools.NewNavigateTool(), "browser_navigate", "Navigate", []string{"url"}},
+		{tools.NewClickTool(), "browser_click", "Click", []string{"selector"}},
+		{tools.NewTypeTool(), "browser_type", "Type", []string{"selector", "text"}},
+		{tools.NewWaitForTool(), "browser_wait_for", "Wait", []string{"selector"}},
+		{tools.NewGetTextTool(), "browser_get_text", "Get", []string{"selector"}},
+		{tools.NewEvalTool(), "browser_eval", "Evaluate", []string{"expression"}},
+		{tools.NewScreenshotTool(), "browser_screenshot", "Take", nil},
+		{tools.NewScrollIntoViewTool(), "browser_scroll_into_view", "Scroll", []string{"selector"}},
+	}
+
+	for _, tt := range toolTests {
+		t.Run(tt.expectedName, func(t *testing.T) {
+			if tt.tool.Name != tt.expectedName {
+				t.Errorf("expected name %q, got %q", tt.expectedName, tt.tool.Name)
+			}
+
+			if !strings.Contains(tt.tool.Description, tt.shortDesc) {
+				t.Errorf("description %q should contain %q", tt.tool.Description, tt.shortDesc)
+			}
+
+			// Verify schema has required properties
+			if len(tt.requiredProps) > 0 {
+				var schema struct {
+					Required []string `json:"required"`
+				}
+				if err := json.Unmarshal(tt.tool.InputSchema, &schema); err != nil {
+					t.Fatalf("failed to unmarshal schema: %v", err)
+				}
+
+				for _, prop := range tt.requiredProps {
+					if !slices.Contains(schema.Required, prop) {
+						t.Errorf("property %q should be required", prop)
+					}
+				}
+			}
+		})
+	}
+}
+
+func TestGetAllTools(t *testing.T) {
+	// Create browser tools instance
+	tools := NewBrowseTools(context.Background())
+
+	// Get all tools
+	allTools := tools.GetAllTools()
+
+	// We should have 8 tools
+	if len(allTools) != 8 {
+		t.Errorf("expected 8 tools, got %d", len(allTools))
+	}
+
+	// Check that each tool has the expected name prefix
+	for _, tool := range allTools {
+		if !strings.HasPrefix(tool.Name, "browser_") {
+			t.Errorf("tool name %q does not have prefix 'browser_'", tool.Name)
+		}
+	}
+}
+
+// TestBrowserInitialization verifies that the browser can start correctly
+func TestBrowserInitialization(t *testing.T) {
+	// Skip long tests in short mode
+	if testing.Short() {
+		t.Skip("skipping browser initialization test in short mode")
+	}
+
+	// Create browser tools instance
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+
+	tools := NewBrowseTools(ctx)
+
+	// Initialize the browser
+	err := tools.Initialize()
+	if err != nil {
+		// If browser automation is not available, skip the test
+		if strings.Contains(err.Error(), "browser automation not available") {
+			t.Skip("Browser automation not available in this environment")
+		} else {
+			t.Fatalf("Failed to initialize browser: %v", err)
+		}
+	}
+
+	// Clean up
+	defer tools.Close()
+
+	// Get browser context to verify it's working
+	browserCtx, err := tools.GetBrowserContext()
+	if err != nil {
+		t.Fatalf("Failed to get browser context: %v", err)
+	}
+
+	// Try to navigate to a simple page
+	var title string
+	err = chromedp.Run(browserCtx,
+		chromedp.Navigate("about:blank"),
+		chromedp.Title(&title),
+	)
+	if err != nil {
+		t.Fatalf("Failed to navigate to about:blank: %v", err)
+	}
+
+	t.Logf("Successfully navigated to about:blank, title: %q", title)
+}
+
+// TestNavigateTool verifies that the navigate tool works correctly
+func TestNavigateTool(t *testing.T) {
+	// Skip long tests in short mode
+	if testing.Short() {
+		t.Skip("skipping navigate tool test in short mode")
+	}
+
+	// Create browser tools instance
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+
+	tools := NewBrowseTools(ctx)
+	defer tools.Close()
+
+	// Check if browser initialization works
+	if err := tools.Initialize(); err != nil {
+		if strings.Contains(err.Error(), "browser automation not available") {
+			t.Skip("Browser automation not available in this environment")
+		}
+	}
+
+	// Get the navigate tool
+	navTool := tools.NewNavigateTool()
+
+	// Create input for the navigate tool
+	input := map[string]string{"url": "https://example.com"}
+	inputJSON, _ := json.Marshal(input)
+
+	// Call the tool
+	result, err := navTool.Run(ctx, json.RawMessage(inputJSON))
+	if err != nil {
+		t.Fatalf("Error running navigate tool: %v", err)
+	}
+
+	// Verify the response is successful
+	var response struct {
+		Status string `json:"status"`
+		Error  string `json:"error,omitempty"`
+	}
+
+	if err := json.Unmarshal([]byte(result), &response); err != nil {
+		t.Fatalf("Error unmarshaling response: %v", err)
+	}
+
+	if response.Status != "success" {
+		// If browser automation is not available, skip the test
+		if strings.Contains(response.Error, "browser automation not available") {
+			t.Skip("Browser automation not available in this environment")
+		} else {
+			t.Errorf("Expected status 'success', got '%s' with error: %s", response.Status, response.Error)
+		}
+	}
+
+	// Try to get the page title to verify the navigation worked
+	browserCtx, err := tools.GetBrowserContext()
+	if err != nil {
+		// If browser automation is not available, skip the test
+		if strings.Contains(err.Error(), "browser automation not available") {
+			t.Skip("Browser automation not available in this environment")
+		} else {
+			t.Fatalf("Failed to get browser context: %v", err)
+		}
+	}
+
+	var title string
+	err = chromedp.Run(browserCtx, chromedp.Title(&title))
+	if err != nil {
+		t.Fatalf("Failed to get page title: %v", err)
+	}
+
+	t.Logf("Successfully navigated to example.com, title: %q", title)
+	if title != "Example Domain" {
+		t.Errorf("Expected title 'Example Domain', got '%s'", title)
+	}
+}
+
+// TestScreenshotTool tests that the screenshot tool properly saves files
+func TestScreenshotTool(t *testing.T) {
+	// Create browser tools instance
+	ctx := context.Background()
+	tools := NewBrowseTools(ctx)
+
+	// Test SaveScreenshot function directly
+	testData := []byte("test image data")
+	id := tools.SaveScreenshot(testData)
+	if id == "" {
+		t.Fatal("SaveScreenshot returned empty ID")
+	}
+
+	// Get the file path and check if the file exists
+	filePath := GetScreenshotPath(id)
+	_, err := os.Stat(filePath)
+	if err != nil {
+		t.Fatalf("Failed to find screenshot file: %v", err)
+	}
+
+	// Read the file contents
+	contents, err := os.ReadFile(filePath)
+	if err != nil {
+		t.Fatalf("Failed to read screenshot file: %v", err)
+	}
+
+	// Check the file contents
+	if string(contents) != string(testData) {
+		t.Errorf("File contents don't match: expected %q, got %q", string(testData), string(contents))
+	}
+
+	// Clean up the test file
+	os.Remove(filePath)
+}

diff --git a/claudetool/browse/register.go b/claudetool/browse/register.go
new file mode 100644
index 0000000..a540c8f
--- /dev/null
+++ b/claudetool/browse/register.go

@@ -0,0 +1,28 @@
+package browse
+
+import (
+	"context"
+	"log"
+
+	"sketch.dev/llm"
+)
+
+// RegisterBrowserTools initializes the browser tools and returns all the tools
+// ready to be added to an agent. It also returns a cleanup function that should
+// be called when done to properly close the browser.
+func RegisterBrowserTools(ctx context.Context) ([]*llm.Tool, func()) {
+	browserTools := NewBrowseTools(ctx)
+
+	// Initialize the browser
+	if err := browserTools.Initialize(); err != nil {
+		log.Printf("Warning: Failed to initialize browser: %v", err)
+	}
+
+	// Return all tools and a cleanup function
+	return browserTools.GetAllTools(), func() {
+		browserTools.Close()
+	}
+}
+
+// Tool is an alias for llm.Tool to make the documentation clearer
+type Tool = llm.Tool
commit	33d282f80db786cc60ba521a38ed5166f23239ed	[log] [tgz]
author	Philip Zeyliger <philip@bold.dev>	Sat May 03 04:01:54 2025 +0000
committer	Philip Zeyliger <philip@bold.dev>	Tue May 06 10:23:39 2025 -0700
tree	9ed1f15c6d3081d5bef7d16b9d72e78a2c7780cf
parent	a9d87aa69cfefdc91ec7aaa6bc42907749748e76 [diff]