Alex Netsch 5e6ce06759 Better matching for parallel tool call approvals (vibe-kanban) (#977)
* I've successfully implemented better matching for parallel tool call approvals. Here's what was changed:

**File: `crates/services/src/services/approvals.rs`**

1. **Replaced `get_last_tool_use()` with `find_matching_tool_use()`** (lines 274-326)
   - Now accepts `tool_name` and `tool_input` parameters
   - Searches backward through message history for a matching ToolUse entry
   - Only matches tools in `Created` state (not already pending approval)
   - Matches by tool name first, then by tool input for disambiguation
   - Adds debug logging to track which entry was matched

2. **Added `extract_tool_arguments()` helper** (lines 346-352)
   - Extracts tool arguments from ActionType for comparison
   - Converts ActionType to JSON for structured comparison with tool_input

3. **Updated `Approvals::create()` method** (lines 72-129)
   - Changed from `get_last_tool_use()` to `find_matching_tool_use(&request.tool_name, &request.tool_input)`
   - Added detailed debug logging when approval is created
   - Added warning when no matching tool is found

Previously, the system would just grab the **last** tool use entry, which failed for parallel calls. Now:

1. When an approval request comes in with `tool_name` and `tool_input`
2. The system searches backward through conversation history
3. It finds the most recent `ToolUse` entry that:
   - Has status = `Created` (hasn't been matched yet)
   - Has matching tool name
   - Has matching tool input (serialized to JSON for comparison)
4. That specific entry gets marked as `PendingApproval`

This ensures each approval correctly matches its corresponding tool use, even when multiple tools are called in parallel.
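
The backward search described above can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in `approvals.rs`: `ToolUseEntry`, `ToolStatus`, and the `input` helper are hypothetical stand-ins, and a string map replaces the JSON value that the real implementation compares.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the status of a tool-use entry.
#[derive(Debug, Clone, PartialEq)]
enum ToolStatus {
    Created,
    PendingApproval,
}

/// Hypothetical, simplified tool-use entry; the real entries carry more data.
#[derive(Debug, Clone)]
struct ToolUseEntry {
    tool_name: String,
    status: ToolStatus,
    input: BTreeMap<String, String>,
}

/// Search backward for the most recent `Created` entry whose name and
/// input both match the approval request.
fn find_matching_tool_use(
    history: &[ToolUseEntry],
    tool_name: &str,
    tool_input: &BTreeMap<String, String>,
) -> Option<usize> {
    history
        .iter()
        .enumerate()
        .rev()
        .find(|(_, e)| {
            e.status == ToolStatus::Created
                && e.tool_name == tool_name
                && &e.input == tool_input
        })
        .map(|(idx, _)| idx)
}

/// Illustrative helper that builds a single-parameter input map.
fn input(key: &str, value: &str) -> BTreeMap<String, String> {
    BTreeMap::from([(key.to_string(), value.to_string())])
}
```

With two `Read` entries for `foo.rs` and `bar.rs`, a request for `foo.rs` resolves to the first entry and a request for `bar.rs` to the second, whereas taking the last entry would return the second one for both.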

- Code compiles without errors or warnings
- All existing tests pass (3/3 in services crate)
- Full workspace check passes

Cleanup script changes for task attempt 182e4c44-adc2-40c6-a4a7-4c9b8528f974

Excellent! I've added comprehensive tests for the new feature. Here's what I added:

I added 3 tests to `crates/services/src/services/approvals.rs`:

`test_find_matching_tool_use_with_parallel_calls` tests the core functionality, matching the correct tool when multiple parallel calls exist:
- Creates two `Read` tools with different files (`foo.rs` and `bar.rs`)
- Verifies that each approval request correctly matches its specific tool
- Ensures `foo.rs` approval matches index 0 and `bar.rs` approval matches index 1

`test_find_matching_tool_use_skips_pending_approvals` tests that already-matched tools are skipped:
- Creates one tool in `PendingApproval` state and another in `Created` state
- Verifies that the matching function skips the pending approval
- Ensures it finds the tool that's still in `Created` state

`test_find_matching_tool_use_no_match` tests the negative case:
- Creates a `Read` tool
- Tries to match a `Write` tool
- Verifies that no match is found (returns `None`)

All 6 tests pass (3 new + 3 existing):
- `test_find_matching_tool_use_with_parallel_calls`
- `test_find_matching_tool_use_skips_pending_approvals`
- `test_find_matching_tool_use_no_match`
- Existing analytics and config tests

The tests comprehensively validate that the new matching logic correctly handles parallel tool calls, state filtering, and edge cases.


Perfect! Now let me write a summary and explain the fix:

I've successfully fixed the approval matching issue for parallel tool calls. The problem was that we were comparing the wrong data structures.

When approval requests come from the hook:
- `tool_input` contains the **raw tool parameters** from Claude Code API: `{"file_path": "...", "content": "..."}`

But we were comparing against:
- `ActionType` which is our **transformed internal representation**: `{"action": "file_edit", "path": "...", "changes": [...]}`

These structures are completely different, so matching always failed for parallel calls.

The fix was to compare `tool_input` against the **original tool parameters** stored in `entry.metadata.input`, which preserves the exact structure that the hook sends.
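
To make the mismatch concrete, here is a small std-only sketch. The key names come from the examples above, but the values and the flat string maps are illustrative assumptions (the real data is nested JSON, and `changes` is omitted here).

```rust
use std::collections::BTreeMap;

/// Flat string map used as a stand-in for a JSON object.
type Params = BTreeMap<&'static str, &'static str>;

/// Raw parameters as the hook sends them.
fn hook_input() -> Params {
    BTreeMap::from([("file_path", "foo.rs"), ("content", "fn main() {}")])
}

/// Transformed internal representation (flattened for illustration).
fn transformed_action() -> Params {
    BTreeMap::from([("action", "file_edit"), ("path", "foo.rs")])
}

/// Original parameters as preserved under `input` in the entry metadata.
fn stored_metadata_input() -> Params {
    BTreeMap::from([("file_path", "foo.rs"), ("content", "fn main() {}")])
}

/// Equality check used for matching: same keys and same values.
fn inputs_match(a: &Params, b: &Params) -> bool {
    a == b
}
```

Comparing `hook_input()` with `transformed_action()` can never succeed because the keys differ, while comparing it with `stored_metadata_input()` does.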

**File: `crates/services/src/services/approvals.rs`**

1. **Updated `find_matching_tool_use()` function** (lines 289-343):
   - Extract `entry_input` from `entry.metadata.get("input")`
   - Compare it directly against `tool_input` from the approval request
   - If metadata exists but input doesn't match, `continue` searching (don't fallback to name-only matching)
   - Only fallback to name-only matching if metadata is completely missing

2. **Updated test helper** (lines 354-382):
   - Added realistic metadata structure with `"input"` field
   - Matches the actual structure from Claude Code's tool_use events

3. **Updated test inputs** (lines 407-412, 465-467, 493-496):
   - Changed from transformed format to match actual hook format
   - Example: `{"file_path": "foo.rs"}` instead of `{"action": "file_read", "path": "foo.rs"}`

With this fix, the flow for parallel tool calls is:

1. Agent makes parallel tool calls (e.g., 3 Write tools with different files)
2. Each tool creates a `NormalizedEntry` with metadata containing the original `input` field
3. Hook fires for each tool and sends approval request with `tool_name` and `tool_input`
4. `find_matching_tool_use()` searches backward through history:
   - Filters by tool name ("Write")
   - Filters by status (`Created` only, skips already-pending tools)
   - Compares `metadata.input` against `tool_input` for exact match
5. Each approval correctly matches its specific tool use entry
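
A rough simulation of this flow, assuming a simplified entry that stores only a tool name, a file path, and a status. `request_approval` and `three_parallel_writes` are hypothetical names for illustration, and marking the entry inside the same function is a simplification of what `Approvals::create()` is described as doing.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolStatus {
    Created,
    PendingApproval,
}

/// Simplified entry: one file path stands in for the stored `input`.
#[derive(Debug)]
struct ToolUseEntry {
    tool_name: &'static str,
    file_path: &'static str,
    status: ToolStatus,
}

/// Find the most recent `Created` entry with this name and input,
/// then mark it as pending so later requests skip it.
fn request_approval(
    history: &mut [ToolUseEntry],
    tool_name: &str,
    file_path: &str,
) -> Option<usize> {
    let idx = history.iter().rposition(|e| {
        e.status == ToolStatus::Created
            && e.tool_name == tool_name
            && e.file_path == file_path
    })?;
    history[idx].status = ToolStatus::PendingApproval;
    Some(idx)
}

/// Three parallel `Write` calls targeting different files.
fn three_parallel_writes() -> Vec<ToolUseEntry> {
    ["a.rs", "b.rs", "c.rs"]
        .into_iter()
        .map(|file_path| ToolUseEntry {
            tool_name: "Write",
            file_path,
            status: ToolStatus::Created,
        })
        .collect()
}
```

Approval requests may arrive in any order; each one resolves to its own entry, and a repeated request for an entry that is already pending finds nothing.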

- All 6 tests pass (3 new + 3 existing)
- Workspace compiles successfully
- Ready for testing with real parallel tool calls

Excellent! Everything compiles and tests pass. Let me provide a summary of the review feedback changes:

**Before:** If metadata was missing or input field wasn't found, we'd fall back to matching by tool name only.

**After:** We now **only** match if both tool name AND input match exactly. If either doesn't match or metadata is missing, we continue searching and return `None` if no match is found.

This ensures we never show approval UI for the wrong tool when parallel calls are made.
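
The difference between the two behaviors can be sketched as a pair of predicates. The `Entry` type and both function names are hypothetical; `input` is `None` when the metadata carries no `input` field.

```rust
/// Hypothetical entry: `input` is `None` when metadata has no `input` field.
struct Entry {
    tool_name: &'static str,
    input: Option<&'static str>,
}

/// Earlier behavior as described above: fall back to name-only matching
/// when the stored input is missing.
fn matches_with_fallback(entry: &Entry, tool_name: &str, tool_input: &str) -> bool {
    if entry.tool_name != tool_name {
        return false;
    }
    match entry.input {
        Some(input) => input == tool_input,
        None => true, // name-only fallback
    }
}

/// Stricter behavior: name and input must both match, and a missing
/// input never matches.
fn matches_strict(entry: &Entry, tool_name: &str, tool_input: &str) -> bool {
    entry.tool_name == tool_name && entry.input == Some(tool_input)
}
```

The two predicates only disagree on entries without stored input, which is exactly the case where a name-only match could attach the approval to the wrong parallel call.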

**Before:** 3 separate tests:
- `test_find_matching_tool_use_with_parallel_calls`
- `test_find_matching_tool_use_skips_pending_approvals`
- `test_find_matching_tool_use_no_match`

**After:** 1 comprehensive test: `test_parallel_tool_call_approval_matching`

The consolidated test covers 4 scenarios in a single, easy-to-understand flow:
1. **Parallel matching**: 3 parallel Read tools correctly match by input
2. **Status filtering**: Tools in `PendingApproval` state are skipped
3. **Tool name mismatch**: Wrong tool name returns `None`
4. **Input mismatch**: Wrong input parameters return `None`

- All 4 tests pass (1 new consolidated test + 3 existing)
- Workspace compiles successfully
- Ready for production testing

The changes make the matching logic stricter (no fallback) and the tests more maintainable (single comprehensive test vs 3 scattered ones).


* Perfect! Everything works. Let me create a comprehensive summary of the final implementation:

## Summary: Deserialization-Based Tool Matching

I've successfully implemented proper deserialization-based comparison for approval matching. This is much more robust than raw JSON comparison.

### Key Changes

**File: `crates/services/src/services/approvals.rs` (lines 289-357)**

### How It Works

1. **Parse incoming approval request** (lines 302-315):
   ```rust
   let approval_tool_data: ClaudeToolData = serde_json::from_value(serde_json::json!({
       "name": tool_name,
       "input": tool_input
   }))
   ```
   - Deserializes the raw `tool_input` from the hook into `ClaudeToolData`
   - Handles all serde aliases (e.g., `"path"` vs `"file_path"`, `"cmd"` vs `"command"`)
   - Returns `None` with warning if deserialization fails

2. **Deserialize stored tool data** (line 341):
   ```rust
   if let Ok(entry_tool_data) = serde_json::from_value::<ClaudeToolData>(metadata.clone())
   ```
   - Deserializes the stored metadata into the same `ClaudeToolData` type

3. **Compare deserialized structures** (line 343):
   ```rust
   if entry_tool_data == approval_tool_data
   ```
   - Uses `ClaudeToolData`'s derived `PartialEq` implementation
   - Compares the actual structured data, not raw JSON strings
   - Handles all field variations automatically through serde
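
To show why comparing typed values tolerates aliases, here is a std-only sketch that hand-rolls the alias handling serde performs in the real code. `ToolData`, `first_of`, and `parse_tool` are hypothetical, and attaching the `path` and `cmd` aliases to `Read` and `Bash` is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `ClaudeToolData`; the real enum has more
/// variants and is deserialized with serde.
#[derive(Debug, PartialEq)]
enum ToolData {
    Read { file_path: String },
    Bash { command: String },
}

/// Return the value of the first key in `keys` that is present.
fn first_of(input: &BTreeMap<&str, &str>, keys: &[&str]) -> Option<String> {
    keys.iter().find_map(|k| input.get(*k).map(|v| v.to_string()))
}

/// Parse a tool name plus raw input into the typed form, accepting aliases.
fn parse_tool(name: &str, input: &BTreeMap<&str, &str>) -> Option<ToolData> {
    match name {
        "Read" => Some(ToolData::Read {
            file_path: first_of(input, &["file_path", "path"])?,
        }),
        "Bash" => Some(ToolData::Bash {
            command: first_of(input, &["command", "cmd"])?,
        }),
        _ => None,
    }
}
```

Two inputs that differ only in which alias they use parse to equal values, so the derived `PartialEq` treats them as the same call; a raw key-by-key comparison would not.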

### Benefits

- **Handles field aliases**: `file_path` vs `path`, `command` vs `cmd`, etc.
- **Type-safe comparison**: Uses Rust's type system instead of JSON string comparison
- **Future-proof**: Aliases and fields added to `ClaudeToolData` are picked up without changes to the matching code
- **Robust**: Deserialization validates the data structure
- **Clear errors**: Logs warnings when deserialization fails

### Test Results

- All 4 tests pass
- Workspace compiles successfully
- Ready for production testing with real parallel tool calls

### Example

For parallel Write tool calls:
```text
// Hook sends:
{"tool_name": "Write", "tool_input": {"file_path": "foo.rs", "content": "..."}}

// We deserialize both the incoming request and stored metadata to:
ClaudeToolData::Write { file_path: "foo.rs", content: "..." }

// Then compare the deserialized structures directly
```

This approach is much more maintainable and handles all the edge cases automatically!


* Perfect! Everything compiles and tests pass. Let me create a final summary:

## Final Implementation Summary

I've successfully implemented hybrid tool matching that handles both known and unknown tools correctly for parallel tool call approvals.

### How It Works

**File: `crates/services/src/services/approvals.rs` (lines 289-375)**

### Two-Strategy Approach

#### 1. Deserialize Approval Request (lines 302-318)
```rust
let approval_tool_data: ClaudeToolData = serde_json::from_value(...)
let is_unknown_tool = matches!(approval_tool_data, ClaudeToolData::Unknown { .. });
```
- Parse incoming `tool_input` into `ClaudeToolData`
- Check if it deserialized to the `Unknown` variant
- This tells us whether to use structured or raw comparison

#### 2. Match Based on Tool Type

**For Known Tools** (lines 355-366):
```rust
if let Ok(entry_tool_data) = serde_json::from_value::<ClaudeToolData>(metadata.clone()) {
    if entry_tool_data == approval_tool_data {
        return Some((idx, entry));
    }
}
```
- Deserialize stored metadata
- Compare using `ClaudeToolData`'s `PartialEq`
- Handles all serde aliases automatically (e.g., `path` vs `file_path`)

**For Unknown Tools** (lines 343-353):
```rust
if let Some(entry_input) = metadata.get("input") {
    if entry_input == tool_input {
        return Some((idx, entry));
    }
}
```
- Use raw JSON comparison
- Works for MCP tools, future tools, and any unrecognized tools
- No aliases to worry about for unknown tools

### Benefits

- **Known tools** (Read, Write, Edit, etc.): Proper deserialized comparison with alias handling
- **Unknown tools** (MCP, future): Raw JSON comparison, no false negatives
- **Clean logic**: Only use raw comparison specifically for `Unknown` variant
- **Type-safe**: Leverages Rust's type system and pattern matching
- **Future-proof**: Tools that `ClaudeToolData` does not recognize still match through raw JSON comparison

### Test Results

- All 4 tests pass
- Workspace compiles successfully
- Ready for production testing with parallel tool calls

This implementation correctly handles all tool types while maintaining clean, maintainable code!

* Perfect! All tests pass and the workspace compiles. Let me provide a final summary of the refactored implementation:

## Final Implementation Summary (After Review Feedback)

I've successfully refactored the code based on your review feedback to make it cleaner and more maintainable.

### Changes Made

**File: `crates/services/src/services/approvals.rs`**

### 1. Unknown Tool Check in Match Statement (Lines 352-355)

**Before:**
```rust
let approval_tool_data = match ... { Ok(data) => data, ... };
let is_unknown_tool = matches!(approval_tool_data, ClaudeToolData::Unknown { .. });
```

**After:**
```rust
let approval_tool_data: ClaudeToolData = match serde_json::from_value(...) {
    Ok(ClaudeToolData::Unknown { .. }) => {
        return match_by_raw_input(store, tool_name, tool_input);
    }
    Ok(data) => data,
    Err(e) => { ... }
};
```

**Benefit**: Early return for Unknown tools directly in the match, cleaner control flow

### 2. Extracted Raw JSON Comparison Function (Lines 289-334)

**New Function:**
```rust
fn match_by_raw_input(
    store: Arc<MsgStore>,
    tool_name: &str,
    tool_input: &serde_json::Value,
) -> Option<(usize, NormalizedEntry)>
```

**Benefits**:
- Dedicated function for Unknown tool matching
- Single responsibility - only handles raw JSON comparison
- Easy to test and maintain separately
- Clear separation between known vs unknown tool strategies

### Code Structure

```
match_by_raw_input()              // Lines 289-334: For Unknown tools
    └─> Raw JSON comparison

find_matching_tool_use()          // Lines 336-406: Main entry point
    ├─> Deserialize tool_input
    ├─> If Unknown → match_by_raw_input()
    └─> If Known → Deserialized comparison
```

### How It Works

1. **Deserialize approval request** (line 348)
2. **Check if Unknown in match** (line 352)
   - If `Unknown` → Early return using `match_by_raw_input()`
   - If known → Continue with deserialized comparison
3. **Known tools**: Compare using `ClaudeToolData`'s `PartialEq` (line 393)

### Test Results

- All 4 tests pass
- Workspace compiles successfully
- Cleaner, more maintainable code structure
- Ready for production

The refactored code is now cleaner with better separation of concerns!

* Perfect! All tests pass and the workspace compiles. Here's the final summary:

## Final Refactored Implementation

I've successfully refactored the code to have **a single loop** with a **match statement** to decide the comparison strategy.

### Key Changes

**File: `crates/services/src/services/approvals.rs` (lines 289-391)**

### 1. Added Strategy Enum (lines 289-295)

```rust
enum ToolComparisonStrategy {
    Deserialized(ClaudeToolData),  // For known tools
    RawJson,                        // For Unknown tools
}
```

### 2. Single Function with One Loop (lines 297-391)

**Before**: Two separate functions, two loops
- `match_by_raw_input()` - Loop for Unknown tools
- `find_matching_tool_use()` - Loop for known tools

**After**: One function, one loop
- Determine strategy upfront (lines 309-329)
- Single loop through history (lines 332-388)
- Match statement inside loop to apply strategy (lines 354-373)

### Code Structure

```rust
fn find_matching_tool_use(...) {
    // 1. Determine strategy based on tool type
    let strategy = match deserialize_tool_data(...) {
        Unknown => RawJson,
        Known(data) => Deserialized(data),
    };

    // 2. Single loop through history
    for entry in history {
        // Filter by status and name

        // 3. Match on strategy to compare
        let is_match = match &strategy {
            RawJson => compare_raw_json(...),
            Deserialized(data) => compare_deserialized(...),
        };

        if is_match { return Some(entry); }
    }
}
```
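
The outline above is pseudocode. A compilable, std-only approximation might look like the following, under stated assumptions: `ToolData`, `Entry`, `RawInput`, `parse`, and `raw` are simplified stand-ins, string maps replace `serde_json::Value`, and only `Read` is treated as a known tool.

```rust
use std::collections::BTreeMap;

type RawInput = BTreeMap<String, String>;

/// Simplified stand-in for `ClaudeToolData`.
#[derive(Debug, PartialEq)]
enum ToolData {
    Read { file_path: String },
    Unknown,
}

/// Typed parse that accepts the `path` alias; anything else is `Unknown`.
fn parse(name: &str, input: &RawInput) -> ToolData {
    match name {
        "Read" => match input.get("file_path").or_else(|| input.get("path")) {
            Some(p) => ToolData::Read { file_path: p.clone() },
            None => ToolData::Unknown,
        },
        _ => ToolData::Unknown,
    }
}

enum ToolComparisonStrategy {
    Deserialized(ToolData), // for known tools
    RawJson,                // for Unknown tools
}

struct Entry {
    tool_name: String,
    created: bool, // true while the entry is still in the `Created` state
    input: RawInput,
}

fn find_matching_tool_use(
    history: &[Entry],
    tool_name: &str,
    tool_input: &RawInput,
) -> Option<usize> {
    // 1. Determine the strategy once, before the loop.
    let strategy = match parse(tool_name, tool_input) {
        ToolData::Unknown => ToolComparisonStrategy::RawJson,
        data => ToolComparisonStrategy::Deserialized(data),
    };

    // 2. Single backward pass over the history.
    for (idx, entry) in history.iter().enumerate().rev() {
        if !entry.created || entry.tool_name != tool_name {
            continue;
        }
        // 3. Apply the chosen comparison.
        let is_match = match &strategy {
            ToolComparisonStrategy::RawJson => &entry.input == tool_input,
            ToolComparisonStrategy::Deserialized(data) => {
                &parse(&entry.tool_name, &entry.input) == data
            }
        };
        if is_match {
            return Some(idx);
        }
    }
    None
}

/// Illustrative helper that builds an input map from string pairs.
fn raw(pairs: &[(&str, &str)]) -> RawInput {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```

In this sketch a `Read` request that uses the `path` alias still matches an entry stored with `file_path`, while an unrecognized tool matches only when its raw input is identical.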

### Benefits

- **Single loop**: No code duplication, cleaner structure
- **Strategy pattern**: Clear separation of comparison logic
- **Type-safe**: Enum enforces correct strategy usage
- **Maintainable**: Easy to add new strategies if needed
- **Same performance**: Strategy determined once, not per iteration

### Test Results

- All 4 tests pass
- Workspace compiles successfully
- Ready for production

The refactored code is cleaner and more maintainable while preserving all functionality!

* Clippy
2025-10-10 16:57:52 +01:00