task: task-1753636924-a1d4c708 - created
Change-Id: Ic78528c47ae38114b9b7504f1c4a76f95e93eb13
diff --git a/server/git/CONCURRENCY_README.md b/server/git/CONCURRENCY_README.md
new file mode 100644
index 0000000..1cbe184
--- /dev/null
+++ b/server/git/CONCURRENCY_README.md
@@ -0,0 +1,172 @@
+# Git Concurrency Solution: Per-Agent Repository Clones
+
+## Problem Statement
+
+Git commands that run concurrently in the same working tree contend for the index, `HEAD`, and lock files, which creates critical race conditions when multiple AI agents perform Git operations at once:
+
+- **Repository Corruption**: Multiple agents modifying the same `.git` folder simultaneously
+- **Branch Conflicts**: Agents creating branches with the same names or overwriting each other's work
+- **Push Failures**: Concurrent pushes to the same branch being rejected as non-fast-forward
+- **Index Lock Errors**: `index.lock` conflicts when multiple processes access the same repository
+
+## Solution: Per-Agent Git Clones
+
+Instead of using mutexes (which would serialize all Git operations and hurt performance), we give each agent its own Git repository clone:
+
+```
+workspace/
+├── agent-backend-engineer/ # Backend engineer's clone
+│ ├── .git/
+│ ├── tasks/
+│ └── ...
+├── agent-frontend-engineer/ # Frontend engineer's clone
+│ ├── .git/
+│ ├── tasks/
+│ └── ...
+└── agent-qa-engineer/ # QA engineer's clone
+ ├── .git/
+ ├── tasks/
+ └── ...
+```
+
+## Key Benefits
+
+### 🚀 **True Concurrency**
+- Multiple agents can work simultaneously without blocking each other
+- No waiting for Git lock releases
+- No shared Git lock to contend for as the number of agents grows
+
+### 🛡️ **Complete Isolation**
+- Each agent has its own `.git` directory and working tree
+- No shared local Git state, so agents cannot race on the index or working tree
+- Agent failures don't affect other agents
+
+### 🔄 **Automatic Synchronization**
+- Each clone automatically pulls latest changes before creating branches
+- All branches push to the same remote repository
+- PRs are created against the main repository
+
+### 🧹 **Easy Cleanup**
+- `staff cleanup-clones` removes all agent workspaces
+- Clones are recreated on-demand when agents start working
+- No manual Git state management required
+
+## Implementation Details
+
+### CloneManager (`git/clone_manager.go`)
+
+```go
+type CloneManager struct {
+ baseRepoURL string // Source repository URL
+ workspacePath string // Base workspace directory
+ agentClones map[string]string // agent name -> clone path
+ mu sync.RWMutex // Thread-safe map access
+}
+```
+
+**Key Methods:**
+- `GetAgentClonePath(agentName)` - Get/create agent's clone directory
+- `RefreshAgentClone(agentName)` - Pull latest changes for an agent
+- `CleanupAgentClone(agentName)` - Remove specific agent's clone
+- `CleanupAllClones()` - Remove all agent clones
+
+### Agent Integration
+
+Each agent's Git operations are automatically routed to its dedicated clone:
+
+```go
+// Get agent's dedicated Git clone
+clonePath, err := am.cloneManager.GetAgentClonePath(agent.Name)
+if err != nil {
+ return fmt.Errorf("failed to get agent clone: %w", err)
+}
+
+// All Git operations use the agent's clone directory
+gitCmd := func(args ...string) *exec.Cmd {
+ return exec.CommandContext(ctx, "git", append([]string{"-C", clonePath}, args...)...)
+}
+```
+
+## Workflow Example
+
+1. **Agent Starts Task**:
+ ```bash
+ Agent backend-engineer gets task: "Add user authentication"
+ Creating clone: workspace/agent-backend-engineer/
+ ```
+
+2. **Concurrent Operations**:
+ ```bash
+   # Each agent clones into its own directory, so clones never touch the same files:
+ Agent backend-engineer: git clone -> workspace/agent-backend-engineer/
+ Agent frontend-engineer: git clone -> workspace/agent-frontend-engineer/
+ Agent qa-engineer: git clone -> workspace/agent-qa-engineer/
+ ```
+
+3. **Branch Creation**:
+ ```bash
+ # Each agent creates branches in their own clone:
+ backend-engineer: git checkout -b task-123-auth-backend
+ frontend-engineer: git checkout -b task-124-auth-ui
+ qa-engineer: git checkout -b task-125-auth-tests
+ ```
+
+4. **Concurrent Pushes**:
+ ```bash
+ # All agents push to origin simultaneously:
+ git push -u origin task-123-auth-backend # ✅ Success
+ git push -u origin task-124-auth-ui # ✅ Success
+ git push -u origin task-125-auth-tests # ✅ Success
+ ```
+
+## Management Commands
+
+### List Agent Clones
+```bash
+staff list-agents # Shows which agents are running and their clone status
+```
+
+### Cleanup All Clones
+```bash
+staff cleanup-clones # Removes all agent workspace directories
+```
+
+### Monitor Disk Usage
+```bash
+du -sh workspace/ # Check total workspace disk usage
+```
+
+## Resource Considerations
+
+### Disk Space
+- Each clone takes roughly the full repository size (typically 10-100 MB per agent)
+- Example: 10 agents with a 50 MB repository use about 500 MB in total
+- Use `staff cleanup-clones` to free space when needed
+
+### Network Usage
+- Initial clone downloads full repository
+- Subsequent `git pull` operations are incremental
+- All agents share the same remote repository
+
+### Performance
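+- Clone creation: ~2-5 seconds per agent (one-time cost; `GetAgentClonePath` holds the manager lock while cloning, so clones are created one at a time)
+- Git operations: Full speed, no waiting for another agent's Git locks
+- Parallel processing: Agents do not block each other on local Git operations once their clones exist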
+
+## Comparison to Alternatives
+
+| Solution | Concurrency | Complexity | Performance | Risk |
+|----------|-------------|------------|-------------|------|
+| **Per-Agent Clones** | ✅ Full | 🟡 Medium | ✅ High | 🟢 Low |
+| Global Git Mutex | ❌ None | 🟢 Low | ❌ Poor | 🟡 Medium |
+| File Locking | 🟡 Limited | 🔴 High | 🟡 Medium | 🔴 High |
+| Separate Repositories | ✅ Full | 🔴 Very High | ✅ High | 🔴 High |
+
+## Future Enhancements
+
+- **Lazy Cleanup**: Auto-remove unused clones after N days
+- **Clone Sharing**: Share clones between agents with similar tasks
+- **Shallow Clones**: Use `git clone --depth=1` to reduce disk and network usage
+- **Remote Workspaces**: Support for distributed agent execution
+
+The per-agent clone solution balances performance, safety, and maintainability for concurrent AI agent operations, at the cost of extra disk space per agent.
\ No newline at end of file
diff --git a/server/git/clone_manager.go b/server/git/clone_manager.go
new file mode 100644
index 0000000..afedd65
--- /dev/null
+++ b/server/git/clone_manager.go
@@ -0,0 +1,160 @@
+package git
+
+import (
+ "context"
+ "fmt"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "sync"
+)
+
+// CloneManager manages separate Git repository clones for each agent
+// This eliminates Git concurrency issues by giving each agent its own working directory
+type CloneManager struct {
+ baseRepoURL string
+ workspacePath string
+ agentClones map[string]string // agent name -> clone path
+ mu sync.RWMutex
+}
+
+// NewCloneManager creates a new CloneManager
+func NewCloneManager(baseRepoURL, workspacePath string) *CloneManager {
+ return &CloneManager{
+ baseRepoURL: baseRepoURL,
+ workspacePath: workspacePath,
+ agentClones: make(map[string]string),
+ }
+}
+
+// GetAgentClonePath returns the Git clone path for a specific agent, creating the clone if needed.
+// The manager lock is held for the whole call, so clone creation is serialized across agents.
+func (cm *CloneManager) GetAgentClonePath(agentName string) (string, error) {
+	// Reject names with path separators so the clone path stays inside the workspace
+	if agentName == "" || agentName != filepath.Base(agentName) {
+		return "", fmt.Errorf("invalid agent name %q", agentName)
+	}
+
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+
+	// Reuse the tracked clone if it still exists on disk
+	if clonePath, exists := cm.agentClones[agentName]; exists {
+		if _, err := os.Stat(clonePath); err == nil {
+			return clonePath, nil
+		}
+		// Remove stale entry if directory doesn't exist
+		delete(cm.agentClones, agentName)
+	}
+
+	clonePath := filepath.Join(cm.workspacePath, fmt.Sprintf("agent-%s", agentName))
+
+	// Ensure workspace directory exists
+	if err := os.MkdirAll(cm.workspacePath, 0755); err != nil {
+		return "", fmt.Errorf("failed to create workspace directory: %w", err)
+	}
+
+	// Remove any leftover directory (for example from a previous run) before cloning
+	if err := os.RemoveAll(clonePath); err != nil {
+		return "", fmt.Errorf("failed to remove existing clone: %w", err)
+	}
+
+	if err := cm.cloneRepository(clonePath); err != nil {
+		return "", fmt.Errorf("failed to clone repository for agent %s: %w", agentName, err)
+	}
+
+	cm.agentClones[agentName] = clonePath
+	return clonePath, nil
+}
+
+// cloneRepository performs the actual Git clone operation.
+func (cm *CloneManager) cloneRepository(clonePath string) error {
+	ctx := context.Background()
+
+	// "--" ends option parsing; CombinedOutput captures Git's output for the error message
+	cmd := exec.CommandContext(ctx, "git", "clone", "--", cm.baseRepoURL, clonePath)
+	if out, err := cmd.CombinedOutput(); err != nil {
+		return fmt.Errorf("git clone failed: %w: %s", err, out)
+	}
+
+	return nil
+}
+
+// RefreshAgentClone pulls the latest changes for an agent's clone
+func (cm *CloneManager) RefreshAgentClone(agentName string) error {
+ cm.mu.RLock()
+ clonePath, exists := cm.agentClones[agentName]
+ cm.mu.RUnlock()
+
+ if !exists {
+ return fmt.Errorf("no clone exists for agent %s", agentName)
+ }
+
+ ctx := context.Background()
+
+	// "-C" runs git inside the clone; include Git's output in the error for diagnosis
+	cmd := exec.CommandContext(ctx, "git", "-C", clonePath, "pull", "origin")
+	if out, err := cmd.CombinedOutput(); err != nil {
+		return fmt.Errorf("failed to pull latest changes for agent %s: %w: %s", agentName, err, out)
+	}
+
+ return nil
+}
+
+// CleanupAgentClone removes the clone directory for an agent
+func (cm *CloneManager) CleanupAgentClone(agentName string) error {
+ cm.mu.Lock()
+ defer cm.mu.Unlock()
+
+ clonePath, exists := cm.agentClones[agentName]
+ if !exists {
+ return nil // Already cleaned up
+ }
+
+ // Remove the clone directory
+ if err := os.RemoveAll(clonePath); err != nil {
+ return fmt.Errorf("failed to remove clone for agent %s: %w", agentName, err)
+ }
+
+ // Remove from tracking
+ delete(cm.agentClones, agentName)
+
+ return nil
+}
+
+// CleanupAllClones removes all agent clone directories
+func (cm *CloneManager) CleanupAllClones() error {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+
+	var errs []error
+
+	for agentName, clonePath := range cm.agentClones {
+		if err := os.RemoveAll(clonePath); err != nil {
+			errs = append(errs, fmt.Errorf("failed to remove clone for agent %s: %w", agentName, err))
+			continue // keep tracking clones that could not be removed
+		}
+		// Deleting the current key while ranging over a map is safe in Go
+		delete(cm.agentClones, agentName)
+	}
+
+	if len(errs) > 0 {
+		return fmt.Errorf("cleanup errors: %v", errs)
+	}
+
+	return nil
+}
+
+// GetAllAgentClones returns a map of all agent clones
+func (cm *CloneManager) GetAllAgentClones() map[string]string {
+ cm.mu.RLock()
+ defer cm.mu.RUnlock()
+
+ // Return a copy to avoid race conditions
+ result := make(map[string]string)
+ for agent, path := range cm.agentClones {
+ result[agent] = path
+ }
+
+ return result
+}
\ No newline at end of file
diff --git a/server/git/mutex.go b/server/git/mutex.go
new file mode 100644
index 0000000..21bc25f
--- /dev/null
+++ b/server/git/mutex.go
@@ -0,0 +1,40 @@
+package git
+
+import (
+ "sync"
+)
+
+// GitMutex serializes Git operations that share a single repository.
+// Concurrent Git commands in the same working tree contend for index.lock and
+// other shared state, so callers take this lock around such operations.
+type GitMutex struct {
+ mu sync.Mutex
+}
+
+// NewGitMutex creates a new GitMutex instance
+func NewGitMutex() *GitMutex {
+ return &GitMutex{}
+}
+
+// Lock acquires the Git operation lock
+// This ensures only one agent can perform Git operations at a time
+func (gm *GitMutex) Lock() {
+ gm.mu.Lock()
+}
+
+// Unlock releases the Git operation lock
+func (gm *GitMutex) Unlock() {
+ gm.mu.Unlock()
+}
+
+// WithLock executes a function while holding the Git lock
+// This is a convenience method to ensure proper lock/unlock pattern
+func (gm *GitMutex) WithLock(fn func() error) error {
+ gm.Lock()
+ defer gm.Unlock()
+ return fn()
+}
+
+// GlobalGitMutex is the process-wide Git mutex shared across all agents.
+// It serializes only the operations that callers wrap in Lock/Unlock or WithLock.
+var GlobalGitMutex = NewGitMutex()
\ No newline at end of file