task: task-1753636924-a1d4c708 - created

Change-Id: Ic78528c47ae38114b9b7504f1c4a76f95e93eb13
diff --git a/server/git/CONCURRENCY_README.md b/server/git/CONCURRENCY_README.md
new file mode 100644
index 0000000..1cbe184
--- /dev/null
+++ b/server/git/CONCURRENCY_README.md
@@ -0,0 +1,172 @@
+# Git Concurrency Solution: Per-Agent Repository Clones
+
+## Problem Statement
+
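+Git does not support concurrent writers in a single working tree: commands that update the index or refs take lock files such as `.git/index.lock`. When multiple AI agents run Git operations in the same checkout at once, they race with each other: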
+
+- **Repository Corruption**: Multiple agents modifying the same `.git` folder simultaneously
+- **Branch Conflicts**: Agents creating branches with the same names or overwriting each other's work
+- **Push Failures**: Concurrent pushes to the same branch being rejected as non-fast-forward
+- **Index Lock Errors**: `index.lock` conflicts when multiple processes access the same repository
+
+## Solution: Per-Agent Git Clones
+
+Instead of using mutexes (which would serialize all Git operations and hurt performance), we give each agent its own Git repository clone:
+
+```
+workspace/
+├── agent-backend-engineer/     # Backend engineer's clone
+│   ├── .git/
+│   ├── tasks/
+│   └── ...
+├── agent-frontend-engineer/    # Frontend engineer's clone  
+│   ├── .git/
+│   ├── tasks/
+│   └── ...
+└── agent-qa-engineer/         # QA engineer's clone
+    ├── .git/
+    ├── tasks/
+    └── ...
+```
+
+## Key Benefits
+
+### 🚀 **True Concurrency**
+- Multiple agents can work simultaneously without blocking each other
+- No waiting for Git lock releases
+- Scaling is bounded by disk space and remote throughput rather than by lock contention
+
+### 🛡️ **Complete Isolation** 
+- Each agent has its own `.git` directory and working tree
+- No shared local Git state, so no local race conditions; the remote is the only shared resource
+- Agent failures don't affect other agents
+
+### 🔄 **Automatic Synchronization**
+- Each clone automatically pulls latest changes before creating branches
+- All branches push to the same remote repository
+- PRs are created against the main repository
+
+### 🧹 **Easy Cleanup**
+- `staff cleanup-clones` removes all agent workspaces
+- Clones are recreated on-demand when agents start working
+- No manual Git state management required
+
+## Implementation Details
+
+### CloneManager (`git/clone_manager.go`)
+
+```go
+type CloneManager struct {
+    baseRepoURL    string                // Source repository URL
+    workspacePath  string                // Base workspace directory  
+    agentClones    map[string]string     // agent name -> clone path
+    mu             sync.RWMutex          // Thread-safe map access
+}
+```
+
+**Key Methods:**
+- `GetAgentClonePath(agentName)` - Get/create agent's clone directory
+- `RefreshAgentClone(agentName)` - Pull latest changes for an agent
+- `CleanupAgentClone(agentName)` - Remove specific agent's clone
+- `CleanupAllClones()` - Remove all agent clones
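+
+The get-or-create behavior behind `GetAgentClonePath` can be sketched in isolation as follows. This is a minimal illustration, not the real implementation: `pathRegistry`, its `create` callback (standing in for `git clone`), and the `get` method are hypothetical names introduced here.
+
+```go
+package main
+
+import (
+	"fmt"
+	"path/filepath"
+	"sync"
+)
+
+// pathRegistry is a hypothetical stand-in for the clone bookkeeping:
+// a map from agent name to clone path, guarded by an RWMutex.
+type pathRegistry struct {
+	mu     sync.RWMutex
+	root   string
+	paths  map[string]string
+	create func(path string) error // stands in for "git clone"
+}
+
+// get returns the recorded path for agent, calling create only the
+// first time the agent is seen.
+func (r *pathRegistry) get(agent string) (string, error) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	if p, ok := r.paths[agent]; ok {
+		return p, nil
+	}
+	p := filepath.Join(r.root, "agent-"+agent)
+	if err := r.create(p); err != nil {
+		return "", err
+	}
+	r.paths[agent] = p
+	return p, nil
+}
+
+func main() {
+	calls := 0
+	r := &pathRegistry{
+		root:   "workspace",
+		paths:  make(map[string]string),
+		create: func(string) error { calls++; return nil },
+	}
+	first, _ := r.get("qa-engineer")
+	second, _ := r.get("qa-engineer")
+	// Both lookups yield the same path and create ran once.
+	fmt.Println(first == second, calls)
+}
+```
+
+Because the write lock is held while `create` runs, first-time lookups for different agents are handled one at a time in this sketch.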
+
+### Agent Integration
+
+Each agent's Git operations are automatically routed to its dedicated clone:
+
+```go
+// Get agent's dedicated Git clone
+clonePath, err := am.cloneManager.GetAgentClonePath(agent.Name)
+if err != nil {
+    return fmt.Errorf("failed to get agent clone: %w", err)
+}
+
+// All Git operations use the agent's clone directory
+gitCmd := func(args ...string) *exec.Cmd {
+    return exec.CommandContext(ctx, "git", append([]string{"-C", clonePath}, args...)...)
+}
+```
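+
+A self-contained sketch of the same routing idea follows. The `gitCommand` helper and the example path are illustrative assumptions, not names from the codebase. The only Git feature relied on is the `-C <path>` option, which makes Git behave as if it had been started in that directory.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"os/exec"
+)
+
+// gitCommand is a hypothetical helper: it builds a git invocation that
+// runs inside clonePath by prepending "-C <clonePath>" to the arguments.
+func gitCommand(ctx context.Context, clonePath string, args ...string) *exec.Cmd {
+	full := append([]string{"-C", clonePath}, args...)
+	return exec.CommandContext(ctx, "git", full...)
+}
+
+func main() {
+	cmd := gitCommand(context.Background(), "workspace/agent-backend-engineer", "status", "--short")
+	// Print the argument vector only; nothing is executed here.
+	fmt.Println(cmd.Args)
+}
+```
+
+Passing the directory through `-C` avoids `os.Chdir`, which changes the working directory of the whole process and would itself be unsafe with several agents in one process.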
+
+## Workflow Example
+
+1. **Agent Starts Task**:
+   ```text
+   Agent backend-engineer gets task: "Add user authentication"
+   Creating clone: workspace/agent-backend-engineer/
+   ```
+
+2. **Concurrent Operations**:
+   ```text
+   # Each agent gets its own directory, so the clones never touch the same files:
+   Agent backend-engineer:  git clone -> workspace/agent-backend-engineer/
+   Agent frontend-engineer: git clone -> workspace/agent-frontend-engineer/  
+   Agent qa-engineer:       git clone -> workspace/agent-qa-engineer/
+   ```
+
+3. **Branch Creation**:
+   ```text
+   # Each agent creates branches in their own clone:
+   backend-engineer:  git checkout -b task-123-auth-backend
+   frontend-engineer: git checkout -b task-124-auth-ui
+   qa-engineer:      git checkout -b task-125-auth-tests
+   ```
+
+4. **Concurrent Pushes**:
+   ```bash
+   # Each agent pushes its own branch, so the pushes do not contend for the same ref:
+   git push -u origin task-123-auth-backend    # ✅ Success
+   git push -u origin task-124-auth-ui         # ✅ Success  
+   git push -u origin task-125-auth-tests      # ✅ Success
+   ```
+
+## Management Commands
+
+### List Agent Clones
+```bash
+staff list-agents  # Shows which agents are running and their clone status
+```
+
+### Cleanup All Clones
+```bash
+staff cleanup-clones  # Removes all agent workspace directories
+```
+
+### Monitor Disk Usage
+```bash
+du -sh workspace/  # Check total workspace disk usage
+```
+
+## Resource Considerations
+
+### Disk Space
+- Each clone takes roughly the size of the repository, history plus working tree (typically 10-100MB per agent)
+- Example: 10 agents with a 50MB repository use about 500MB in total
+- Use `staff cleanup-clones` to free space when needed
+
+### Network Usage
+- Initial clone downloads full repository
+- Subsequent `git pull` operations are incremental
+- All agents share the same remote repository
+
+### Performance
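+- Clone creation: ~2-5 seconds per agent, paid the first time an agent requests its clone
+- Git operations: Full speed, no waiting on another agent's lock files
+- Parallel processing: Local Git work scales with agent count; the shared remote and disk I/O remain the limits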
+
+## Comparison to Alternatives
+
+| Solution | Concurrency | Complexity | Performance | Risk |
+|----------|-------------|------------|-------------|------|
+| **Per-Agent Clones** | ✅ Full | 🟡 Medium | ✅ High | 🟢 Low |
+| Global Git Mutex | ❌ None | 🟢 Low | ❌ Poor | 🟡 Medium |
+| File Locking | 🟡 Limited | 🔴 High | 🟡 Medium | 🔴 High |
+| Separate Repositories | ✅ Full | 🔴 Very High | ✅ High | 🔴 High |
+
+## Future Enhancements
+
+- **Lazy Cleanup**: Auto-remove unused clones after N days
+- **Clone Sharing**: Share clones between agents with similar tasks
+- **Shallow Clones**: Use `git clone --depth=1` to fetch only recent history and save space
+- **Remote Workspaces**: Support for distributed agent execution
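+
+The `--depth=1` idea from the list above might look like the sketch below. `shallowCloneCmd`, the example URL, and the destination path are hypothetical; only the `git clone --depth` option itself is taken from Git.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"os/exec"
+)
+
+// shallowCloneCmd is a hypothetical helper that prepares a clone
+// limited to the most recent commit via git's --depth option.
+func shallowCloneCmd(ctx context.Context, repoURL, dest string) *exec.Cmd {
+	return exec.CommandContext(ctx, "git", "clone", "--depth=1", repoURL, dest)
+}
+
+func main() {
+	cmd := shallowCloneCmd(context.Background(), "https://example.com/repo.git", "workspace/agent-qa-engineer")
+	// Print the argument vector only; the clone is not run here.
+	fmt.Println(cmd.Args)
+}
+```
+
+A shallow clone trades history for space, so operations that need older commits (for example, comparing against a distant base) would require fetching more history first.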
+
+The per-agent clone solution provides the optimal balance of performance, safety, and maintainability for concurrent AI agent operations.
\ No newline at end of file
diff --git a/server/git/clone_manager.go b/server/git/clone_manager.go
new file mode 100644
index 0000000..afedd65
--- /dev/null
+++ b/server/git/clone_manager.go
@@ -0,0 +1,160 @@
+package git
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"sync"
+)
+
+// CloneManager manages separate Git repository clones for each agent
+// This eliminates Git concurrency issues by giving each agent its own working directory
+type CloneManager struct {
+	baseRepoURL   string
+	workspacePath string
+	agentClones   map[string]string // agent name -> clone path
+	mu            sync.RWMutex
+}
+
+// NewCloneManager creates a new CloneManager
+func NewCloneManager(baseRepoURL, workspacePath string) *CloneManager {
+	return &CloneManager{
+		baseRepoURL:   baseRepoURL,
+		workspacePath: workspacePath,
+		agentClones:   make(map[string]string),
+	}
+}
+
+// GetAgentClonePath returns the Git clone path for a specific agent
+// Creates the clone if it doesn't exist
+func (cm *CloneManager) GetAgentClonePath(agentName string) (string, error) {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+
+	// Check if clone already exists
+	if clonePath, exists := cm.agentClones[agentName]; exists {
+		// Verify the clone still exists on disk
+		if _, err := os.Stat(clonePath); err == nil {
+			return clonePath, nil
+		}
+		// Remove stale entry if directory doesn't exist
+		delete(cm.agentClones, agentName)
+	}
+
+	// Create new clone for the agent
+	clonePath := filepath.Join(cm.workspacePath, fmt.Sprintf("agent-%s", agentName))
+	
+	// Ensure workspace directory exists
+	if err := os.MkdirAll(cm.workspacePath, 0755); err != nil {
+		return "", fmt.Errorf("failed to create workspace directory: %w", err)
+	}
+
+	// Remove existing clone directory if it exists
+	if err := os.RemoveAll(clonePath); err != nil {
+		return "", fmt.Errorf("failed to remove existing clone: %w", err)
+	}
+
+	// Clone the repository. cm.mu is held for the duration, so first-time clones for different agents run one at a time
+	if err := cm.cloneRepository(clonePath); err != nil {
+		return "", fmt.Errorf("failed to clone repository for agent %s: %w", agentName, err)
+	}
+
+	// Store the clone path
+	cm.agentClones[agentName] = clonePath
+	
+	return clonePath, nil
+}
+
+// cloneRepository performs the actual Git clone operation
+func (cm *CloneManager) cloneRepository(clonePath string) error {
+	ctx := context.Background()
+	
+	// Clone the repository
+	cmd := exec.CommandContext(ctx, "git", "clone", cm.baseRepoURL, clonePath)
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("git clone failed: %w", err)
+	}
+
+	return nil
+}
+
+// RefreshAgentClone pulls the latest changes for an agent's clone
+func (cm *CloneManager) RefreshAgentClone(agentName string) error {
+	cm.mu.RLock()
+	clonePath, exists := cm.agentClones[agentName]
+	cm.mu.RUnlock()
+
+	if !exists {
+		return fmt.Errorf("no clone exists for agent %s", agentName)
+	}
+
+	ctx := context.Background()
+	
+	// Change to clone directory and pull latest changes
+	cmd := exec.CommandContext(ctx, "git", "-C", clonePath, "pull", "origin")
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("failed to pull latest changes for agent %s: %w", agentName, err)
+	}
+
+	return nil
+}
+
+// CleanupAgentClone removes the clone directory for an agent
+func (cm *CloneManager) CleanupAgentClone(agentName string) error {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+
+	clonePath, exists := cm.agentClones[agentName]
+	if !exists {
+		return nil // Already cleaned up
+	}
+
+	// Remove the clone directory
+	if err := os.RemoveAll(clonePath); err != nil {
+		return fmt.Errorf("failed to remove clone for agent %s: %w", agentName, err)
+	}
+
+	// Remove from tracking
+	delete(cm.agentClones, agentName)
+	
+	return nil
+}
+
+// CleanupAllClones removes all agent clone directories
+func (cm *CloneManager) CleanupAllClones() error {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+
+	var errs []error
+
+	for agentName, clonePath := range cm.agentClones {
+		if err := os.RemoveAll(clonePath); err != nil {
+			errs = append(errs, fmt.Errorf("failed to remove clone for agent %s: %w", agentName, err))
+		}
+	}
+
+	// Clear all tracked clones
+	cm.agentClones = make(map[string]string)
+
+	if len(errs) > 0 {
+		return fmt.Errorf("cleanup errors: %v", errs)
+	}
+
+	return nil
+}
+
+// GetAllAgentClones returns a map of all agent clones
+func (cm *CloneManager) GetAllAgentClones() map[string]string {
+	cm.mu.RLock()
+	defer cm.mu.RUnlock()
+
+	// Return a copy to avoid race conditions
+	result := make(map[string]string)
+	for agent, path := range cm.agentClones {
+		result[agent] = path
+	}
+	
+	return result
+}
\ No newline at end of file
diff --git a/server/git/mutex.go b/server/git/mutex.go
new file mode 100644
index 0000000..21bc25f
--- /dev/null
+++ b/server/git/mutex.go
@@ -0,0 +1,40 @@
+package git
+
+import (
+	"sync"
+)
+
+// GitMutex provides thread-safe access to Git operations
+// Since Git is not thread-safe, we need to serialize all Git operations
+// across all agents to prevent repository corruption and race conditions
+type GitMutex struct {
+	mu sync.Mutex
+}
+
+// NewGitMutex creates a new GitMutex instance
+func NewGitMutex() *GitMutex {
+	return &GitMutex{}
+}
+
+// Lock acquires the Git operation lock
+// This ensures only one agent can perform Git operations at a time
+func (gm *GitMutex) Lock() {
+	gm.mu.Lock()
+}
+
+// Unlock releases the Git operation lock
+func (gm *GitMutex) Unlock() {
+	gm.mu.Unlock()
+}
+
+// WithLock executes a function while holding the Git lock
+// This is a convenience method to ensure proper lock/unlock pattern
+func (gm *GitMutex) WithLock(fn func() error) error {
+	gm.Lock()
+	defer gm.Unlock()
+	return fn()
+}
+
+// Global Git mutex instance - shared across all agents
+// This ensures no concurrent Git operations across the entire application
+var GlobalGitMutex = NewGitMutex()
\ No newline at end of file