Mastodon Politics, Power, and Science: Implementation Guide: Agent Test Suites

Monday, August 4, 2025

Implementation Guide: Agent Test Suites

Separating "Run" (for interactive debugging) from "Run Tests" (for validation) is a pragmatic way to integrate a multi-case test suite: each button does one job, which makes the workflow cleaner and more intuitive.

Here is a step-by-step implementation guide outlining the necessary changes, with key code snippets.



The goal is to allow each agent to store multiple, named test cases (inputs + assertions) and provide a UI to both interactively debug a single case and automatically validate all of them.

Step 1: Evolve the run_config Schema

The agent's definition in config.json will be updated to support an array of test cases.

File: config.json (Example for an agent)

Generated json
"my_agent_name": {
    "type": "proc",
    "inputs": ["x", "y"],
    "outputs": ["result"],
    "run_config": {
        "active_test_index": 0,
        "test_cases": [
            {
                "name": "Default Run Case",
                "inputs": { "x": "10", "y": "5" },
                "log_level": "INFO",
                "assertions": []
            },
            {
                "name": "Edge Case: Zero Value",
                "inputs": { "x": "100", "y": "0" },
                "log_level": "DEBUG",
                "assertions": [
                    { "output_variable": "__status__.status.value", "assertion_type": "Equals", "expected_value": "-1" }
                ]
            }
        ]
    }
}
    
  • active_test_index: The index of the test case that was last selected, so the GUI can reopen on the same case.

  • test_cases: An array where each object is a complete test scenario: a name, input values, a log level, and a list of assertions.

  • Convention: The first case, at index 0, is always the "Default Run Case" used by the simple "Run" button.
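
To make the assertion fields concrete, here is a minimal sketch of how one entry from "assertions" might be checked against an agent's result. It assumes the result is a nested dict (or an object with attributes) and that "output_variable" is a dotted path into it. The helper names resolve_path and check_assertion are hypothetical, and only the "Equals" type shown in the example config is handled.

```python
from typing import Any


def resolve_path(result: Any, dotted_path: str) -> Any:
    """Walk a dotted path through nested dicts or object attributes."""
    current = result
    for part in dotted_path.split("."):
        if isinstance(current, dict):
            current = current[part]
        else:
            current = getattr(current, part)
    return current


def check_assertion(result: Any, assertion: dict) -> bool:
    """Return True if one assertion passes (hypothetical helper, "Equals" only)."""
    try:
        actual = resolve_path(result, assertion["output_variable"])
    except (KeyError, AttributeError):
        # A missing output counts as a failed assertion rather than a crash
        return False
    if assertion["assertion_type"] == "Equals":
        # expected_value is stored as a string in config.json, so compare as strings
        return str(actual) == assertion["expected_value"]
    raise ValueError(f"Unsupported assertion_type: {assertion['assertion_type']}")
```

Comparing as strings is a design choice that follows from the config storing "expected_value" as text; a stricter implementation might convert types explicitly instead.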


Step 2: Modify RunAgentModal UI and Logic

The modal needs to be updated to manage this new array structure.

File: editor_app.py (Inside RunAgentModal)

2.1: Add New Widgets
The top button_frame inside the modal is redesigned.

Generated python
# In RunAgentModal.create_widgets()

# --- Redesigned Top Button Frame ---
button_frame = ctk.CTkFrame(input_container, fg_color="transparent")
button_frame.pack(fill="x", padx=10, pady=(5, 10))

# Dropdown for selecting the test case
self.test_case_var = ctk.StringVar()
self.test_case_menu = ctk.CTkOptionMenu(button_frame, variable=self.test_case_var, command=self.on_test_case_selected)
self.test_case_menu.pack(side="left")

# Button to create a new test case
ctk.CTkButton(button_frame, text="+ New Case", command=self.add_new_test_case).pack(side="left", padx=10)

# The primary "Run" button for interactive debugging
run_button = ctk.CTkButton(button_frame, text="▶️ Run Current Case", command=self.execute_single_run)
run_button.pack(side="right")

# The "Run All Tests" validation button
run_all_button = ctk.CTkButton(button_frame, text="✅ Run All Tests", command=self.execute_all_tests, fg_color="green")
run_all_button.pack(side="right", padx=10)
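
The "+ New Case" button refers to add_new_test_case, which the guide does not define. The sketch below shows one possible shape for the data side of that handler, kept free of UI code so it can be tested on its own. The helper names unique_case_name and append_test_case are hypothetical, and copying the inputs of an existing case is an assumption about the desired behavior. Unique names matter because the dropdown identifies cases by name.

```python
import copy


def unique_case_name(existing_names, base="New Case"):
    """Return base, or base plus a numeric suffix, so it is not in existing_names."""
    if base not in existing_names:
        return base
    n = 2
    while f"{base} {n}" in existing_names:
        n += 1
    return f"{base} {n}"


def append_test_case(run_config, template_index=0):
    """Append a new case that copies the template's inputs; return the new index."""
    cases = run_config["test_cases"]
    template = cases[template_index]
    new_case = {
        "name": unique_case_name([c.get("name") for c in cases]),
        # Deep copy so editing the new case never changes the template
        "inputs": copy.deepcopy(template.get("inputs", {})),
        "log_level": template.get("log_level", "INFO"),
        "assertions": [],
    }
    cases.append(new_case)
    return len(cases) - 1
```

Inside the modal, the handler would presumably save the current UI state first, call a helper like this, refresh the dropdown values, and then load the UI for the returned index.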
    

2.2: Load and Switch Between Test Cases
The modal needs logic to populate and switch the UI based on the dropdown selection.

Generated python
# In RunAgentModal class

def load_run_config(self):
    """Loads the entire test suite and populates the dropdown."""
    self.run_config = self.agent_data.get("run_config", {
        "active_test_index": 0,
        "test_cases": [{
            "name": "Default Run Case", "inputs": {}, "log_level": "INFO", "assertions": []
        }]
    })
    
    # Populate the dropdown menu
    test_case_names = [tc.get("name", f"Test Case {i+1}") for i, tc in enumerate(self.run_config["test_cases"])]
    self.test_case_menu.configure(values=test_case_names)
    
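    # Select the active case, falling back to the first one if the stored
    # index is out of range (e.g. after cases were removed from config.json)
    active_index = self.run_config.get("active_test_index", 0)
    if not 0 <= active_index < len(test_case_names):
        active_index = 0
    self.test_case_var.set(test_case_names[active_index])
    self.load_ui_for_case(active_index)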

def on_test_case_selected(self, selected_name):
    """Triggered when the user selects a new test case from the dropdown."""
    # First, save the state of the UI for the case we are leaving
    self.save_current_ui_state()
    
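    # Find the index of the newly selected case, using the same fallback
    # names that load_run_config() put in the dropdown
    test_case_names = [tc.get("name", f"Test Case {i+1}") for i, tc in enumerate(self.run_config["test_cases"])]
    new_index = test_case_names.index(selected_name)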
    
    # Load the UI for the new case
    self.load_ui_for_case(new_index)

def load_ui_for_case(self, index):
    """Populates the entire UI (inputs, logs, tests) for a given test case index."""
    self.current_test_index = index
    self.run_config["active_test_index"] = index
    
    case_data = self.run_config["test_cases"][index]
    
    # Populate inputs (clear existing, then add new)
    # ... logic to clear and populate input_entries ...
    
    # Populate log level
    self.log_level_var.set(case_data.get("log_level", "INFO"))
    
    # Populate assertions in the "Unit Tests" tab
    # ... logic to clear and populate test_widgets ...
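
The loading code above indexes run_config["test_cases"] directly, so an agent saved with an older or partial run_config could raise a KeyError. One way to guard against that is to normalize the dict before using it. The function below is a sketch under that assumption; normalize_run_config is a hypothetical helper, and the defaults mirror the ones used in load_run_config.

```python
def normalize_run_config(run_config):
    """Return a run_config with a non-empty test_cases list, every case key
    present, and an in-range active_test_index. The input is not mutated."""
    run_config = dict(run_config or {})
    # Shallow-copy each case so setdefault() below leaves the caller's dicts alone
    cases = [dict(case) for case in run_config.get("test_cases") or []]
    if not cases:
        cases = [{"name": "Default Run Case"}]
    for i, case in enumerate(cases):
        case.setdefault("name", f"Test Case {i + 1}")
        case.setdefault("inputs", {})
        case.setdefault("log_level", "INFO")
        case.setdefault("assertions", [])
    index = run_config.get("active_test_index", 0)
    if not isinstance(index, int) or not 0 <= index < len(cases):
        index = 0
    return {**run_config, "test_cases": cases, "active_test_index": index}
```

With this in place, load_run_config could call normalize_run_config(self.agent_data.get("run_config")) and treat every key as present afterwards.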
    

2.3: Separate Execution Logic
The old execute_run is split into two entry points, execute_single_run and execute_all_tests, which share a _run_agent helper.

Generated python
# In RunAgentModal class

def execute_single_run(self):
    """Runs ONLY the currently selected test case and updates the output."""
    self.save_current_ui_state() # Save any UI changes first
    case_to_run = self.run_config["test_cases"][self.current_test_index]
    
    # ... perform execution with case_to_run['inputs'] ...
    final_result, logs = self._run_agent(case_to_run['inputs'], case_to_run['log_level'])

    # ... display final_result and logs in the textboxes ...
    
    # Run assertions for this single case
    self.run_assertions(final_result, case_to_run['assertions'])

    # Only switch to the tests tab if there are actually tests defined
    if case_to_run['assertions']:
        self.tab_view.set("Unit Tests")
    else:
        self.tab_view.set("Final Output")

def execute_all_tests(self):
    """Runs all test cases sequentially and updates only the test tab."""
    self.save_current_ui_state() # Save any changes before running
    all_results = []
    
    for i, case in enumerate(self.run_config["test_cases"]):
        # Fall back to the same defaults the UI uses when a key is missing
        final_result, _ = self._run_agent(case.get('inputs', {}), case.get('log_level', 'INFO'))
        pass_fail_list = self.run_assertions(final_result, case.get('assertions', []), update_ui=False)
        all_results.append({
            'name': case.get('name', f"Test Case {i+1}"),
            'results': pass_fail_list
        })
        
    # After all tests are run, update the UI of the Test Tab with a summary
    # ... logic to render the results for all_results into the test tab ...
    
    self.tab_view.set("Unit Tests")

def _run_agent(self, inputs, log_level):
    """Helper function that contains the core execution logic."""
    # This contains the code from the old execute_run:
    # 1. Set global log_text_limit
    # 2. Promote to workflow if needed
    # 3. Set up log capturing
    # 4. Call exec_workflow
    # 5. Return (final_result, captured_logs)
    pass
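
Step 3 of _run_agent, "Set up log capturing", could be sketched as follows. This assumes the agent runtime logs through Python's standard logging module, which the guide does not confirm. The names capture_logs and run_with_logs are hypothetical, and run_callable stands in for the real exec_workflow call, whose signature is not shown here.

```python
import io
import logging
from contextlib import contextmanager


@contextmanager
def capture_logs(level_name="INFO", logger_name=None):
    """Temporarily attach a handler that collects log output in a string buffer."""
    logger = logging.getLogger(logger_name)  # None selects the root logger
    level = getattr(logging, level_name)
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield buffer
    finally:
        # Always restore the logger, even if the agent raised
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


def run_with_logs(run_callable, inputs, log_level):
    """Call run_callable(inputs) and return (final_result, captured_logs)."""
    with capture_logs(log_level) as buffer:
        final_result = run_callable(inputs)
    return final_result, buffer.getvalue()
```

Restoring the handler and level in a finally block keeps one test case's log settings from leaking into the next when execute_all_tests runs them in sequence.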
    

Step 3: Update save_current_run_config

This function now needs to save the state of the current UI back into the correct index in the test_cases array before saving the whole object to the main app's data.

File: editor_app.py (Inside RunAgentModal)

Generated python
# In RunAgentModal class

def save_current_ui_state(self):
    """Saves the current state of the UI back to the in-memory run_config object."""
    # Nothing to save until a case has been loaded into the UI
    if not hasattr(self, 'current_test_index'):
        return

    current_case = self.run_config["test_cases"][self.current_test_index]
    
    # Save inputs
    current_case["inputs"] = {key: entry.get() for key, entry in self.input_entries.items()}
    
    # Save log level
    current_case["log_level"] = self.log_level_var.get()

    # Save assertions
    current_case["assertions"] = [
        # ... logic to read from test_widgets and create assertion objects ...
    ]

def save_current_run_config(self):
    """Saves the entire run_config object back to the main editor frame."""
    self.save_current_ui_state() # Ensure the last active case is saved
    self.app.editor_frame_instance.data['run_config'] = self.run_config
    

This guide outlines a path to implementing the test-suite feature. Separating the "run" and "validate" actions and managing a structured array of test cases yields a system that is intuitive for interactive debugging and suitable for automated regression testing.
