browser: rename browser_read_image to read_image and auto-send screenshots to LLM
Rename browser_read_image tool to read_image and modify browser_take_screenshot
to automatically send image content to the LLM instead of requiring a separate
read_image tool call, streamlining the screenshot workflow.
Problem Analysis:
The previous browser screenshot workflow required two separate tool calls:
1. browser_take_screenshot - saves the screenshot and returns its file path
2. browser_read_image - reads the saved screenshot and sends it to the LLM
This two-step process was inefficient and created unnecessary round trips.
Additionally, the browser_ prefix tied browser_read_image to browser automation,
even though reading and encoding images is general-purpose functionality.
Implementation Changes:
1. Screenshot Tool Behavior (claudetool/browse/browse.go):
- Modified browser_take_screenshot to automatically return image content
- Removed the screenshotOutput struct, since the ID-only response is no longer needed
- Added base64 encoding of screenshot data directly in screenshotRun
- Returns []llm.Content with both text description and image data
- Still saves the screenshot file for later reference
- Uses the same image encoding format as the existing read_image tool
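A minimal TypeScript sketch of the combined result shape, for illustration only: the real change is Go code in claudetool/browse/browse.go returning []llm.Content. The Content union, its field names, the "image/png" media type, the text wording, and screenshotResult are hypothetical stand-ins. The hand-rolled base64Encode is plain RFC 4648 base64 so the example needs no Node or browser APIs.

```typescript
// Hypothetical stand-ins for the Go llm.Content values; field names are assumed.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; mediaType: string; data: string };
type Content = TextContent | ImageContent;

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard RFC 4648 base64 with "=" padding, 3 input bytes -> 4 output chars.
function base64Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// One tool result carrying both a description and the image itself,
// so the model sees the screenshot without a follow-up read_image call.
// The text wording and media type here are assumptions.
function screenshotResult(savedPath: string, png: Uint8Array): Content[] {
  return [
    { type: "text", text: `Screenshot saved to ${savedPath}` },
    { type: "image", mediaType: "image/png", data: base64Encode(png) },
  ];
}
```

Returning both parts from a single call is what removes the second round trip; the file on disk is kept only for later reference.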
2. Tool Rename (claudetool/browse/browse.go):
- Renamed browser_read_image tool to read_image
- Updated tool name in NewReadImageTool from 'browser_read_image' to 'read_image'
- Maintained all existing functionality and input/output format
- Tool description and schema remain unchanged
3. UI Updates (termui/termui.go):
- Updated template condition from 'browser_read_image' to 'read_image'
- Maintains existing emoji and display format for read_image tool calls
4. WebUI Updates (webui/src/web-components/):
- Updated sketch-tool-calls.ts to reference 'read_image' instead of 'browser_read_image'
- Renamed sketch-tool-card-browser-read-image.ts to sketch-tool-card-read-image.ts
- Updated component class name from SketchToolCardBrowserReadImage to SketchToolCardReadImage
- Updated custom element name from 'sketch-tool-card-browser-read-image' to 'sketch-tool-card-read-image'
- Updated import statement to reference new component file name
- Removed old component file and updated TypeScript declarations
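The rename only works if the tool-name dispatch and the custom element tag agree. A rough sketch of that lookup, assuming a Map-based table: cardTagFor, CARD_FOR_TOOL, and the fallback tag sketch-tool-card-generic are hypothetical and not taken from sketch-tool-calls.ts.

```typescript
// Hypothetical table from tool name to custom element tag.
const CARD_FOR_TOOL = new Map<string, string>([
  ["read_image", "sketch-tool-card-read-image"],
]);

// Assumed fallback tag for tools without a dedicated card.
const GENERIC_CARD = "sketch-tool-card-generic";

// Pick the card element to render for a given tool call name.
function cardTagFor(toolName: string): string {
  return CARD_FOR_TOOL.get(toolName) ?? GENERIC_CARD;
}
```

In this sketch, a call still recorded under the old name browser_read_image would fall through to the generic card rather than the image card.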
5. Test Updates (claudetool/browse/browse_test.go):
- Modified TestGetTools to accept read_image without the 'browser_' prefix
- Added special case handling for read_image in tool naming convention check
- All existing tests continue to pass with updated tool name
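The exception added to TestGetTools can be restated as a small predicate. This TypeScript version is a hypothetical paraphrase of the Go test, not its code; namingViolations and NAMING_EXCEPTIONS are illustrative names.

```typescript
// Hypothetical allowlist of tools exempt from the "browser_" prefix rule.
const NAMING_EXCEPTIONS = new Set<string>(["read_image"]);

// Return every tool name that neither has the prefix nor is exempt.
function namingViolations(toolNames: string[]): string[] {
  return toolNames.filter(
    (name) => !name.startsWith("browser_") && !NAMING_EXCEPTIONS.has(name),
  );
}
```

For example, namingViolations(["browser_take_screenshot", "read_image"]) returns an empty list. Keeping exceptions in an explicit set makes each departure from the browser_ prefix a deliberate, reviewable decision.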
Technical Details:
- Screenshot auto-send uses the same base64 encoding as the existing read_image tool
- Content structure matches the read_image (formerly browser_read_image) output format for consistency
- File saving still occurs for potential debugging or future reference
- Error handling preserves existing behavior with proper fallbacks
- Tool count remains the same (12 tools with screenshots, 10 without)
Benefits:
- Eliminates need for two-step screenshot workflow
- Reduces round trips and simplifies user experience
- More intuitive tool naming (read_image is general purpose)
- Keeps read_image's input and output format unchanged; only the tool name differs
- Consistent image encoding between browser_take_screenshot and read_image
- Automatic screenshot viewing improves debugging and validation workflows
Testing:
- All existing browser tool tests pass with updated expectations
- TestReadImageTool verifies renamed tool functionality
- Tool naming convention test updated to handle read_image exception
- TypeScript compilation successful with no type errors
- Web component functionality preserved across rename
This enhancement streamlines screenshot workflows while maintaining the
general-purpose read_image tool for reading arbitrary image files, creating
a more efficient and intuitive browser automation experience.
Co-Authored-By: sketch <hello@sketch.dev>
Change-ID: se3e81f997f30f01ek
diff --git a/webui/src/web-components/sketch-tool-card-read-image.ts b/webui/src/web-components/sketch-tool-card-read-image.ts
new file mode 100644
index 0000000..ba9a5af
--- /dev/null
+++ b/webui/src/web-components/sketch-tool-card-read-image.ts
@@ -0,0 +1,65 @@
+import { css, html, LitElement } from "lit";
+import { customElement, property } from "lit/decorators.js";
+import { ToolCall } from "../types";
+
+@customElement("sketch-tool-card-read-image")
+export class SketchToolCardReadImage extends LitElement {
+ @property()
+ toolCall: ToolCall;
+
+ @property()
+ open: boolean;
+
+ static styles = css`
+ .summary-text {
+ font-family: monospace;
+ color: #444;
+ word-break: break-all;
+ }
+
+ .path-input {
+ font-family: monospace;
+ background: rgba(0, 0, 0, 0.05);
+ padding: 4px 8px;
+ border-radius: 4px;
+ display: inline-block;
+ word-break: break-all;
+ }
+ `;
+
+ render() {
+ // Parse the input to get path
+ let path = "";
+ try {
+ if (this.toolCall?.input) {
+ const input = JSON.parse(this.toolCall.input);
+ path = input.path || "";
+ }
+ } catch (e) {
+ console.error("Error parsing read image input:", e);
+ }
+
+ // Show just the filename in summary
+ const filename = path.split("/").pop() || path;
+
+ return html`
+ <sketch-tool-card .open=${this.open} .toolCall=${this.toolCall}>
+ <span slot="summary" class="summary-text"> 🖼️ ${filename} </span>
+ <div slot="input">
+ <div>Read image: <span class="path-input">${path}</span></div>
+ </div>
+ <div slot="result">
+ ${this.toolCall?.result_message?.tool_result
+ ? html`<pre>${this.toolCall.result_message.tool_result}</pre>`
+ : ""}
+ </div>
+ </sketch-tool-card>
+ `;
+ }
+}
+
+declare global {
+ interface HTMLElementTagNameMap {
+ "sketch-tool-card-read-image": SketchToolCardReadImage;
+ }
+}