Politics, Power, and Science: Implementation Guide: Agent Test Suites

Monday, August 4, 2025

Implementation Guide: Agent Test Suites

This is a clear and pragmatic vision for integrating a multi-case test suite. By separating "Run" (for interactive debugging) from "Run All Tests" (for validation), we create a much cleaner and more intuitive user experience.

Here is a step-by-step implementation guide outlining the necessary changes, with key code snippets.


The goal is to allow each agent to store multiple, named test cases (inputs + assertions) and provide a UI to both interactively debug a single case and automatically validate all of them.

Step 1: Evolve the run_config Schema

The agent's definition in config.json will be updated to support an array of test cases.

File: config.json (Example for an agent)

     "my_agent_name": {
    "type": "proc",
    "inputs": ["x", "y"],
    "outputs": ["result"],
    "run_config": {
        "active_test_index": 0,
        "test_cases": [
            {
                "name": "Default Run Case",
                "inputs": { "x": "10", "y": "5" },
                "log_level": "INFO",
                "assertions": []
            },
            {
                "name": "Edge Case: Zero Value",
                "inputs": { "x": "100", "y": "0" },
                "log_level": "DEBUG",
                "assertions": [
                    { "output_variable": "__status__.status.value", "assertion_type": "Equals", "expected_value": "-1" }
                ]
            }
        ]
    }
}
    
  • active_test_index: The GUI will remember which test case was last selected.

  • test_cases: An array where each object is a complete test scenario.

  • Convention: The first case (index 0) is always the "Default Run Case" used by the simple "Run" button, as sketched below.
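
For reference, a minimal sketch of how this convention could be read outside the GUI, assuming a top-level config.json keyed by agent name as in the example above (the helper name is illustrative, not part of the existing codebase):

import json

def load_default_test_case(config_path: str, agent_name: str) -> dict:
    """Return the agent's 'Default Run Case' (index 0 by convention)."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    run_config = config[agent_name].get("run_config", {})
    cases = run_config.get("test_cases", [])
    # Fall back to an empty default if the agent has no test suite yet.
    return cases[0] if cases else {
        "name": "Default Run Case", "inputs": {}, "log_level": "INFO", "assertions": []
    }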


Step 2: Modify RunAgentModal UI and Logic

The modal needs to be updated to manage this new array structure.

File: editor_app.py (Inside RunAgentModal)

2.1: Add New Widgets
The top button_frame inside the modal is redesigned to hold the test-case selector, a "+ New Case" button, and the two run buttons.

# In RunAgentModal.create_widgets()

# --- Redesigned Top Button Frame ---
button_frame = ctk.CTkFrame(input_container, fg_color="transparent")
button_frame.pack(fill="x", padx=10, pady=(5, 10))

# Dropdown for selecting the test case
self.test_case_var = ctk.StringVar()
self.test_case_menu = ctk.CTkOptionMenu(button_frame, variable=self.test_case_var, command=self.on_test_case_selected)
self.test_case_menu.pack(side="left")

# Button to create a new test case
ctk.CTkButton(button_frame, text="+ New Case", command=self.add_new_test_case).pack(side="left", padx=10)

# The primary "Run" button for interactive debugging
run_button = ctk.CTkButton(button_frame, text="▶️ Run Current Case", command=self.execute_single_run)
run_button.pack(side="right")

# The "Run All Tests" validation button
run_all_button = ctk.CTkButton(button_frame, text="✅ Run All Tests", command=self.execute_all_tests, fg_color="green")
run_all_button.pack(side="right", padx=10)
    
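
The "+ New Case" button references an add_new_test_case method that the guide does not show; a possible sketch, relying on the save_current_ui_state and load_ui_for_case helpers defined later in this guide:

# In RunAgentModal class (illustrative sketch; assumes helpers from 2.2 and Step 3)

def add_new_test_case(self):
    """Appends a fresh, empty test case and switches the UI to it."""
    self.save_current_ui_state()  # keep any edits to the case we are leaving
    new_index = len(self.run_config["test_cases"])
    self.run_config["test_cases"].append({
        "name": f"Test Case {new_index + 1}",
        "inputs": {},
        "log_level": "INFO",
        "assertions": []
    })
    # Refresh the dropdown and select the new case
    names = [tc.get("name", f"Test Case {i+1}") for i, tc in enumerate(self.run_config["test_cases"])]
    self.test_case_menu.configure(values=names)
    self.test_case_var.set(names[new_index])
    self.load_ui_for_case(new_index)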

2.2: Load and Switch Between Test Cases
The modal needs logic to populate and switch the UI based on the dropdown selection.

# In RunAgentModal class

def load_run_config(self):
    """Loads the entire test suite and populates the dropdown."""
    self.run_config = self.agent_data.get("run_config", {
        "active_test_index": 0,
        "test_cases": [{
            "name": "Default Run Case", "inputs": {}, "log_level": "INFO", "assertions": []
        }]
    })
    
    # Populate the dropdown menu
    test_case_names = [tc.get("name", f"Test Case {i+1}") for i, tc in enumerate(self.run_config["test_cases"])]
    self.test_case_menu.configure(values=test_case_names)
    
    # Select the active case
    active_index = self.run_config.get("active_test_index", 0)
    if active_index < len(test_case_names):
        self.test_case_var.set(test_case_names[active_index])
        self.load_ui_for_case(active_index)

def on_test_case_selected(self, selected_name):
    """Triggered when the user selects a new test case from the dropdown."""
    # First, save the state of the UI for the case we are leaving
    self.save_current_ui_state()
    
    # Find the index of the newly selected case
    test_case_names = [tc.get("name") for tc in self.run_config["test_cases"]]
    new_index = test_case_names.index(selected_name)
    
    # Load the UI for the new case
    self.load_ui_for_case(new_index)

def load_ui_for_case(self, index):
    """Populates the entire UI (inputs, logs, tests) for a given test case index."""
    self.current_test_index = index
    self.run_config["active_test_index"] = index
    
    case_data = self.run_config["test_cases"][index]
    
    # Populate inputs (clear existing, then add new)
    # ... logic to clear and populate input_entries ...
    
    # Populate log level
    self.log_level_var.set(case_data.get("log_level", "INFO"))
    
    # Populate assertions in the "Unit Tests" tab
    # ... logic to clear and populate test_widgets ...
    

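
The elided input-population step depends on how the input rows were built; a possible sketch, assuming self.input_entries maps each input name to a CTkEntry widget (as implied by save_current_ui_state in Step 3):

# In RunAgentModal class (illustrative helper, not in the original guide)

def populate_inputs(self, case_data):
    """Overwrites each input entry with the values stored in the selected case."""
    saved_inputs = case_data.get("inputs", {})
    for name, entry in self.input_entries.items():
        entry.delete(0, "end")                      # clear the previous case's value
        entry.insert(0, saved_inputs.get(name, ""))
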
2.3: Separate Execution Logic
The old execute_run is split into two entry points, execute_single_run and execute_all_tests, with the core execution logic factored into a shared _run_agent helper.

# In RunAgentModal class

def execute_single_run(self):
    """Runs ONLY the currently selected test case and updates the output."""
    self.save_current_ui_state() # Save any UI changes first
    case_to_run = self.run_config["test_cases"][self.current_test_index]
    
    # ... perform execution with case_to_run['inputs'] ...
    final_result, logs = self._run_agent(case_to_run['inputs'], case_to_run['log_level'])

    # ... display final_result and logs in the textboxes ...
    
    # Run assertions for this single case
    self.run_assertions(final_result, case_to_run['assertions'])

    # Only switch to the tests tab if there are actually tests defined
    if case_to_run['assertions']:
        self.tab_view.set("Unit Tests")
    else:
        self.tab_view.set("Final Output")

def execute_all_tests(self):
    """Runs all test cases sequentially and updates only the test tab."""
    self.save_current_ui_state() # Save any changes before running
    all_results = []
    
    for i, case in enumerate(self.run_config["test_cases"]):
        final_result, _ = self._run_agent(case['inputs'], case['log_level'])
        pass_fail_list = self.run_assertions(final_result, case['assertions'], update_ui=False)
        all_results.append({
            'name': case['name'],
            'results': pass_fail_list
        })
        
    # After all tests are run, update the UI of the Test Tab with a summary
    # ... logic to render the results for all_results into the test tab ...
    
    self.tab_view.set("Unit Tests")

def _run_agent(self, inputs, log_level):
    """Helper function that contains the core execution logic."""
    # This contains the code from the old execute_run:
    # 1. Set global log_text_limit
    # 2. Promote to workflow if needed
    # 3. Set up log capturing
    # 4. Call exec_workflow
    # 5. Return (final_result, captured_logs)
    pass
    
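
The run_assertions helper referenced above is not spelled out in the guide; the following is a minimal sketch, assuming the result is a nested dict/object that dotted paths such as "__status__.status.value" can index into, and handling only the "Equals" assertion type from the schema example:

# In RunAgentModal class (sketch; only the "Equals" assertion type is handled)

def run_assertions(self, final_result, assertions, update_ui=True):
    """Evaluates each assertion against the result and returns a pass/fail list."""
    results = []
    for assertion in assertions:
        # Walk the dotted output_variable path through dicts or attributes.
        value = final_result
        for part in assertion["output_variable"].split("."):
            if isinstance(value, dict):
                value = value.get(part)
            else:
                value = getattr(value, part, None)
        passed = (assertion["assertion_type"] == "Equals"
                  and str(value) == str(assertion["expected_value"]))
        results.append({"assertion": assertion, "passed": passed})
    if update_ui:
        # ... render pass/fail rows in the "Unit Tests" tab ...
        pass
    return results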

Step 3: Update save_current_run_config

This function now needs to save the state of the current UI back into the correct index in the test_cases array before saving the whole object to the main app's data.

File: editor_app.py (Inside RunAgentModal)

# In RunAgentModal class

def save_current_ui_state(self):
    """Saves the current state of the UI back to the in-memory run_config object."""
    if not hasattr(self, 'current_test_index'): return

    current_case = self.run_config["test_cases"][self.current_test_index]
    
    # Save inputs
    current_case["inputs"] = {key: entry.get() for key, entry in self.input_entries.items()}
    
    # Save log level
    current_case["log_level"] = self.log_level_var.get()

    # Save assertions
    current_case["assertions"] = [
        # ... logic to read from test_widgets and create assertion objects ...
    ]

def save_current_run_config(self):
    """Saves the entire run_config object back to the main editor frame."""
    self.save_current_ui_state() # Ensure the last active case is saved
    self.app.editor_frame_instance.data['run_config'] = self.run_config
    
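
The elided assertion-saving step mirrors the input handling; a sketch assuming self.test_widgets is a list of per-assertion row dicts (the widget key names here are hypothetical), which save_current_ui_state could call as current_case["assertions"] = self.collect_assertions():

# In RunAgentModal class (hypothetical widget names: each row in self.test_widgets
# holds the entry/menu widgets for one assertion)

def collect_assertions(self):
    """Reads the assertion rows in the "Unit Tests" tab back into plain dicts."""
    return [
        {
            "output_variable": row["variable_entry"].get(),
            "assertion_type": row["type_menu"].get(),
            "expected_value": row["expected_entry"].get(),
        }
        for row in self.test_widgets
    ]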

This guide provides a clear path to implementing this highly advanced testing feature. By separating the "run" and "validate" actions and managing a structured array of test cases, you create a system that is both intuitive for interactive debugging and powerful for automated regression testing.
