browser: rename browser_read_image to read_image and auto-send screenshots to LLM

Rename browser_read_image tool to read_image and modify browser_take_screenshot
to automatically send image content to the LLM instead of requiring a separate
read_image tool call, streamlining the screenshot workflow.

Problem Analysis:
Previously, the browser screenshot workflow required two separate tool calls:
1. browser_take_screenshot - saves screenshot and returns file path
2. browser_read_image - reads saved screenshot and sends to LLM

This two-step process added an unnecessary round trip for every screenshot.
Additionally, the browser_read_image name tied the tool to browser
automation, even though reading and encoding images is general purpose.

Implementation Changes:

1. Screenshot Tool Behavior (claudetool/browse/browse.go):
   - Modified browser_take_screenshot to automatically return image content
   - Removed screenshotOutput struct as ID-only response no longer needed
   - Added base64 encoding of screenshot data directly in screenshotRun
   - Returns []llm.Content with both text description and image data
   - Still saves screenshot file for potential future reference
   - Uses same image encoding format as existing read_image tool

2. Tool Rename (claudetool/browse/browse.go):
   - Renamed browser_read_image tool to read_image
   - Updated tool name in NewReadImageTool from 'browser_read_image' to 'read_image'
   - Maintained all existing functionality and input/output format
   - Tool description and schema remain unchanged

3. UI Updates (termui/termui.go):
   - Updated template condition from 'browser_read_image' to 'read_image'
   - Maintains existing emoji and display format for read_image tool calls

4. WebUI Updates (webui/src/web-components/):
   - Updated sketch-tool-calls.ts to reference 'read_image' instead of 'browser_read_image'
   - Renamed sketch-tool-card-browser-read-image.ts to sketch-tool-card-read-image.ts
   - Updated component class name from SketchToolCardBrowserReadImage to SketchToolCardReadImage
   - Updated custom element name from 'sketch-tool-card-browser-read-image' to 'sketch-tool-card-read-image'
   - Updated import statement to reference new component file name
   - Removed old component file and updated TypeScript declarations

5. Test Updates (claudetool/browse/browse_test.go):
   - Modified TestGetTools to allow read_image tool without 'browser_' prefix
   - Added special case handling for read_image in tool naming convention check
   - All existing tests continue to pass with updated tool name
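
The naming-convention exception described in item 5 can be sketched as
follows. This is a minimal illustration, not the code in browse_test.go:
validToolName is a hypothetical helper, and the only assumption taken from
the text is that every tool keeps the 'browser_' prefix except read_image.

```go
package main

import (
	"fmt"
	"strings"
)

// validToolName is a hypothetical helper: it accepts names with the
// "browser_" prefix and special-cases read_image, which no longer has it.
func validToolName(name string) bool {
	if name == "read_image" {
		return true
	}
	return strings.HasPrefix(name, "browser_")
}

func main() {
	names := []string{"browser_take_screenshot", "read_image", "take_screenshot"}
	for _, name := range names {
		fmt.Printf("%-24s ok=%v\n", name, validToolName(name))
	}
}
```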

Technical Details:
- Screenshot auto-send uses same base64 encoding as existing read_image tool
- Content structure matches the read_image (formerly browser_read_image) output format for consistency
- File saving still occurs for potential debugging or future reference
- Error handling preserves existing behavior with proper fallbacks
- Tool count remains the same (12 tools with screenshots, 10 without)
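
A rough sketch of the auto-send response shape described above: a text part
followed by a base64-encoded image part. The Content struct here is a
hypothetical stand-in for llm.Content, whose real fields may differ, and
screenshotContents is an illustrative name, not the actual screenshotRun.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Content is a hypothetical stand-in for llm.Content; the real type in the
// repository may use different field names.
type Content struct {
	Type      string // "text" or "image"
	Text      string // set for text parts
	MediaType string // e.g. "image/png", set for image parts
	Data      string // base64-encoded image bytes, set for image parts
}

// screenshotContents returns a text description plus the encoded image,
// so the caller needs no separate read_image call.
func screenshotContents(id string, png []byte) []Content {
	return []Content{
		{Type: "text", Text: fmt.Sprintf("Screenshot taken (saved as %s)", id)},
		{Type: "image", MediaType: "image/png", Data: base64.StdEncoding.EncodeToString(png)},
	}
}

func main() {
	// A few placeholder bytes stand in for real screenshot data.
	parts := screenshotContents("example-id", []byte{0x89, 'P', 'N', 'G'})
	for _, p := range parts {
		fmt.Println(p.Type, p.MediaType, len(p.Data))
	}
}
```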

Benefits:
- Eliminates need for two-step screenshot workflow
- Reduces round trips and simplifies user experience
- More intuitive tool naming (read_image is general purpose)
- Maintains full backward compatibility for read_image functionality
- Consistent image encoding across all browser tools
- Automatic screenshot viewing improves debugging and validation workflows

Testing:
- All existing browser tool tests pass with updated expectations
- TestReadImageTool verifies renamed tool functionality
- Tool naming convention test updated to handle read_image exception
- TypeScript compilation successful with no type errors
- Web component functionality preserved across rename

This enhancement streamlines screenshot workflows while maintaining the
general-purpose read_image tool for reading arbitrary image files, creating
a more efficient and intuitive browser automation experience.

Co-Authored-By: sketch <hello@sketch.dev>
Change-ID: se3e81f997f30f01ek
7 files changed
tree: ea1a0743849495ca2489c6363d2dc689dd0a56a7
README.md

Sketch

Sketch is an agentic coding tool. It draws the 🦉

🚀 Overview

Sketch runs in your terminal, has a web UI, understands your code, and helps you get work done. To keep your environment pristine, Sketch starts a Docker container and outputs its work onto a branch in your host Git repository.

Sketch helps with most programming environments, but Sketch has extra goodies for Go.

📋 Quick Start

go install sketch.dev/cmd/sketch@latest
sketch

🔧 Requirements

Currently, Sketch runs on macOS and Linux. It uses Docker for containers.

Platform   Installation
macOS      brew install colima (or Docker Desktop/Orbstack)
Linux      apt install docker.io (or equivalent for your distro)
WSL2       Install Docker Desktop for Windows (docker entirely inside WSL2 is tricky)

The sketch.dev service is used to provide access to an LLM service and give you a way to access the web UI from anywhere.

🤝 Community & Feedback

📖 User Guide

Getting Started

Start Sketch by running sketch in a Git repository. It will open your browser to the Sketch chat interface, but you can also use the CLI interface. Use -open=false if you want to use just the CLI interface.

Ask Sketch about your codebase or ask it to implement a feature. It may take a little while for Sketch to do its work, so hit the bell (🔔) icon to enable browser notifications. We won't spam you or anything; it will notify you when the Sketch agent's turn is done, and there's something to look at.

How Sketch Works

When you start Sketch, it:

  1. Creates a Dockerfile
  2. Builds it
  3. Copies your repository into it
  4. Starts a Docker container with the "inside" Sketch running

This design lets you run multiple sketches in parallel since they each have their own sandbox. It also lets Sketch work without worry: it can trash its own container, but it can't trash your machine.

Sketch's agentic loop uses tool calls (mostly shell commands, but also a handful of other important tools) to allow the LLM to interact with your codebase.
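
As a rough illustration of what dispatching a tool call by name can look like, here is a minimal sketch. It is not Sketch's actual loop: toolFunc, dispatch, and the "echo" tool are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// toolFunc is a hypothetical tool signature: text input in, text result out.
type toolFunc func(input string) (string, error)

var errUnknownTool = errors.New("unknown tool")

// dispatch looks up the tool the model asked for and runs it.
func dispatch(tools map[string]toolFunc, name, input string) (string, error) {
	run, ok := tools[name]
	if !ok {
		return "", fmt.Errorf("%w: %s", errUnknownTool, name)
	}
	return run(input)
}

func main() {
	tools := map[string]toolFunc{
		// "echo" is a placeholder tool for the example.
		"echo": func(in string) (string, error) { return in, nil },
	}
	out, err := dispatch(tools, "echo", "hello")
	fmt.Println(out, err)
}
```

In a real loop, the result (or the error) would be sent back to the LLM as the outcome of the tool call, and the model would decide what to do next.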

Getting Your Git Changes Out

Sketch is trained to make Git commits. When those happen, they are automatically pushed to the Git repository where you started Sketch, under branch names of the form sketch/*.

Finding Sketch branches:

git branch -a --sort=creatordate | grep sketch/ | tail

The UI keeps track of the latest branch it pushed and displays it prominently. You can use standard Git workflows to pull those branches into your workspace:

git cherry-pick $(git merge-base origin/main sketch/foo)..sketch/foo

or merge the branch

git merge sketch/foo

or reset to the branch

git reset --hard sketch/foo

That is, use the same workflows you would if you were pulling in a friend's Pull Request.

Advanced: You can ask Sketch to git fetch sketch-host and rebase onto another commit. This also fetches from the repository where you started Sketch, and we do a bit of "git fetch refspec configuration" to make origin/main work as a Git reference.

Don't be afraid of asking Sketch to help you rebase, merge/squash commits, rewrite commit messages, and so forth; it's good at it!

Reviewing Diffs

The diff view shows you changes since Sketch started. Leaving comments on lines adds them to the chat box, and, when you hit Send (at the bottom of the page), Sketch goes to work addressing your comments.

Connecting to Sketch's Container

You can interact directly with the container in three ways:

  1. Web UI Terminal: Use the "Terminal" tab in the UI
  2. SSH: Look at the startup logs or click the information icon to see a command like ssh sketch-ilik-eske-tcha-lott. We have automatically configured your SSH configuration to make these special hostnames work.
  3. Visual Studio Code: Look for a command line or magic link behind the information icon, or when Sketch starts up. This starts a new VSCode session "remoted into" the container. You can edit the code, use the terminal, review diffs, and so forth.

Using SSH (and/or VSCode) allows you to forward ports from the container to your machine. For example, if you want to start your development webserver, you can do something like this:

# Forward container port 8888 to local port 8000
ssh -L8000:localhost:8888 sketch-ilik-epor-tfor-ward go run ./cmd/server

This makes http://localhost:8000/ on your machine point to localhost:8888 inside the container.

Using Browser Tools

You can ask Sketch to browse a web page and take screenshots. There are tools both for taking screenshots and "reading images", the latter of which sends the image to the LLM. This functionality is handy if you're working on a web page and want to see what the in-progress change looks like.

❓ FAQ

"No space left on device"

Docker images, containers, and so forth tend to pile up. Ask Docker to prune unused images and containers:

docker system prune -a

🛠️ Development

See CONTRIBUTING.md for development guidelines.

📄 Open Source

Sketch is open source. It is right here in this repository! Have a look around and mod away.

If you want to run Sketch entirely without the sketch.dev service, you can set the flag -skaband-addr="" and then provide an ANTHROPIC_API_KEY environment variable. (More LLM services coming soon!)