Add browse tool support.
I reviewed some MCPs (with help from OpenAI's deep research), which led
me to choose chromedp as the library and helped me come up with an
interface. This commit adds Chrome to the Docker image, which makes the
image fairly large. (I've noticed that Chrome is smaller on Ubuntu, where
it doesn't pull in X11.) go-playwright was also a contender for the library.
Implement browser automation tooling using chromedp
This implementation adds browser automation via the chromedp library,
so that Claude can interact with web content.
Key features include:
1. Core browser automation functionality:
- Created new browse package in claudetool/browse
- Implemented tools for navigating, clicking, typing, waiting for elements,
getting text, evaluating JavaScript, taking screenshots, and scrolling
- Added lazy browser initialization that defers until first use
- Integrated with the agent to expose these tools to Claude
2. Screenshot handling and display:
- Implemented screenshot storage with UUID-based IDs in /tmp/sketch-screenshots
- Added endpoint to serve screenshots via /screenshot/{id}
- Created dedicated UI component for displaying screenshots
- Ensured proper responsive design with loading states and error handling
- Fixed URL paths for proper rehomed URL support
- Modified tool calls component to auto-expand screenshot results
3. Error handling and reliability:
- Added graceful error handling for browser initialization failures
- Implemented proper cleanup of browser resources
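The lazy initialization mentioned above can be sketched with sync.Once. This is a minimal illustration, not the code in this commit: lazyBrowser and its start function are hypothetical names, and start stands in for whatever actually launches Chrome through chromedp.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// lazyBrowser defers an expensive startup step until first use.
// start is a hypothetical stand-in for launching Chrome via chromedp.
type lazyBrowser struct {
	once    sync.Once
	start   func() (string, error)
	session string
	err     error
}

// get runs start at most once and returns the cached result afterwards,
// including a cached error if startup failed.
func (b *lazyBrowser) get() (string, error) {
	b.once.Do(func() {
		b.session, b.err = b.start()
	})
	return b.session, b.err
}

func main() {
	launches := 0
	b := &lazyBrowser{start: func() (string, error) {
		launches++
		return "session-1", nil
	}}
	fmt.Println("launches before first use:", launches)

	s, err := b.get()
	b.get() // second call reuses the cached session
	fmt.Println(s, err, "launches after two calls:", launches)

	// A failed startup is reported to the caller instead of crashing.
	failing := &lazyBrowser{start: func() (string, error) {
		return "", errors.New("chrome not found")
	}}
	_, err = failing.get()
	fmt.Println("init failure surfaced to caller:", err)
}
```

One consequence of this pattern is that a failed start is cached as well, so a design that wants to retry would need something other than a plain sync.Once.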
With these tools, Claude can interact with web content directly,
making it possible to scrape data, test web applications, and automate web-based tasks.
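The UUID-based screenshot storage described above could look roughly like the following. This is a sketch under stated assumptions: newScreenshotID and screenshotPath are hypothetical helpers (the commit's own helper is browse.GetScreenshotPath, whose body is not shown here), the ".png" suffix is assumed, and the strict UUID check is one possible way to guard against directory traversal.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"path/filepath"
	"regexp"
)

// screenshotDir is the directory named in the commit message.
const screenshotDir = "/tmp/sketch-screenshots"

// idPattern accepts only lowercase hyphenated UUIDs, so path separators
// and ".." can never appear in an accepted ID.
var idPattern = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// newScreenshotID returns a random version 4 UUID string.
func newScreenshotID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// screenshotPath maps a validated ID to a file path (".png" is an assumption).
func screenshotPath(id string) (string, error) {
	if !idPattern.MatchString(id) {
		return "", fmt.Errorf("invalid screenshot ID %q", id)
	}
	return filepath.Join(screenshotDir, id+".png"), nil
}

func main() {
	id, err := newScreenshotID()
	if err != nil {
		panic(err)
	}
	path, _ := screenshotPath(id)
	fmt.Println("stored at:", path)

	_, err = screenshotPath("../etc/passwd")
	fmt.Println("traversal attempt:", err)
}
```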
Co-Authored-By: sketch <hello@sketch.dev>
diff --git a/loop/server/loophttp.go b/loop/server/loophttp.go
index 56fbdec..fa253e0 100644
--- a/loop/server/loophttp.go
+++ b/loop/server/loophttp.go
@@ -23,6 +23,7 @@
"sketch.dev/loop/server/gzhandler"
"github.com/creack/pty"
+ "sketch.dev/claudetool/browse"
"sketch.dev/llm/conversation"
"sketch.dev/loop"
"sketch.dev/webui"
@@ -522,6 +523,43 @@
json.NewEncoder(w).Encode(map[string]string{"prompt": suggestedPrompt})
})
+ // Handler for /screenshot/{id} - serves screenshot images
+ s.mux.HandleFunc("/screenshot/", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodGet {
+ http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
+ return
+ }
+
+ // Extract the screenshot ID from the path
+ pathParts := strings.Split(r.URL.Path, "/")
+ if len(pathParts) < 3 {
+ http.Error(w, "Invalid screenshot ID", http.StatusBadRequest)
+ return
+ }
+
+ screenshotID := pathParts[2]
+
+ // Validate the ID format (prevent directory traversal)
+ if strings.Contains(screenshotID, "/") || strings.Contains(screenshotID, "\\") {
+ http.Error(w, "Invalid screenshot ID format", http.StatusBadRequest)
+ return
+ }
+
+ // Get the screenshot file path
+ filePath := browse.GetScreenshotPath(screenshotID)
+
+ // Check if the file exists
+ if _, err := os.Stat(filePath); os.IsNotExist(err) {
+ http.Error(w, "Screenshot not found", http.StatusNotFound)
+ return
+ }
+
+ // Serve the file
+ w.Header().Set("Content-Type", "image/png")
+ w.Header().Set("Cache-Control", "max-age=3600") // Cache for an hour
+ http.ServeFile(w, r, filePath)
+ })
+
// Handler for POST /chat
s.mux.HandleFunc("/chat", func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {