Monday, July 15, 2024

Rapidly progressing on the Dynamic Agent Workflow tool!

Hey everyone! I'm thrilled to share that my dynamic agent workflow project is progressing incredibly well! I'm still actively adding new features and refining the configuration files, so I'm holding off on releasing an open-source version for now. However, I can't wait to share it with you all once everything is stable.

Currently, the tool can connect to various services and send and receive messages. It talks to my local Ollama and Stable Diffusion APIs, lets me select elements in the JSON response that contain large blocks of encoded text, and pipes that data into image files from the command line. I can select API features, build and send agent prompts, and get results back from these services.
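To make the "select an element, then pipe it into an image file" step concrete, here is a minimal sketch in Python. It is not the tool's actual code: the dotted path syntax, the function names `select` and `save_encoded_image`, and the `{"images": [...]}` response layout are all assumptions for illustration.

```python
import base64
import json
from pathlib import Path


def select(data, path):
    """Walk a parsed JSON value using a dotted path such as 'images.0'."""
    for key in path.split("."):
        # Segments applied to a list are treated as indexes, otherwise as dict keys.
        data = data[int(key)] if isinstance(data, list) else data[key]
    return data


def save_encoded_image(response_text, path, out_file):
    """Decode the base64 text found at `path` and write the bytes to `out_file`."""
    encoded = select(json.loads(response_text), path)
    raw = base64.b64decode(encoded)
    Path(out_file).write_bytes(raw)
    return len(raw)  # number of bytes written
```

With a response shaped like the assumed one, `save_encoded_image(response_text, "images.0", "out.png")` would write the first encoded image to disk.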

I've also added support for connecting to JSON, XML, HTML, and text services. While I haven't tested it yet, the HTML service support should allow for basic web scraping and returning specific blocks of HTML. A --dryrun option bypasses the network call and spoofs the reply, handing the response handler a message it treats as valid, so the result is easy to spot in the next prompt in the chain. This speeds up debugging substantially.
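A dry-run switch like this can be sketched in a few lines of Python. Only the `--dryrun` flag name comes from the post. The URL, the reply fields (`status`, `response`), and the function names are hypothetical placeholders, so treat this as one possible shape rather than how the tool does it.

```python
import argparse
import json
import urllib.request

# Hypothetical local endpoint, used only when --dryrun is NOT given.
SERVICE_URL = "http://localhost:8000/generate"


def send(url, payload, dry_run=False):
    """POST `payload` as JSON, or return a canned reply when dry_run is set."""
    if dry_run:
        # No network call: echo the prompt inside a reply the handler accepts.
        return {"status": 200, "dry_run": True,
                "response": f"[dry run] {payload['prompt']}"}
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return {"status": reply.status, "dry_run": False,
                **json.loads(reply.read())}


def handle(reply):
    """Response handler: real and spoofed replies take the same path."""
    if reply["status"] != 200:
        raise RuntimeError(f"request failed with status {reply['status']}")
    return reply["response"]


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dryrun", action="store_true")
    parser.add_argument("prompt")
    args = parser.parse_args(argv)
    return handle(send(SERVICE_URL, {"prompt": args.prompt}, args.dryrun))
```

Calling `main(["--dryrun", "a red fox"])` returns the marked-up prompt without touching the network, which is what makes the spoofed message easy to recognize further down a chain.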

As I continued refining the tool, an epiphany struck me: agents and workflows share a fundamental structure, with inputs, outputs, and parameters. This realization led me to rename the existing agents as "endpoint types" and transform workflows into another type of agent. So now every agent is structured identically in the config file. The behavior is the same as before, but the new capability is in place and waiting to be used.
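As an illustration of what "identically structured" could mean, here is a small Python sketch in which endpoints and workflows are loaded into the same record type. The field names mirror the inputs, outputs, and parameters mentioned above. Everything else (the `kind` labels, the agent names, the `steps` parameter) is invented for the example and is not the real config format.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One config entry; endpoints and workflows share the same fields."""
    name: str
    kind: str  # assumed labels: "endpoint" or "workflow"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    params: dict = field(default_factory=dict)


# Hypothetical config: the workflow entry has the same shape as the endpoints
# and simply names other agents in its params.
CONFIG = {
    "ollama_chat": {"kind": "endpoint", "inputs": ["prompt"],
                    "outputs": ["text"], "params": {"format": "json"}},
    "sd_image": {"kind": "endpoint", "inputs": ["text"],
                 "outputs": ["image"], "params": {"format": "json"}},
    "prompt_to_image": {"kind": "workflow", "inputs": ["prompt"],
                        "outputs": ["image"],
                        "params": {"steps": ["ollama_chat", "sd_image"]}},
}


def load_agents(config):
    """Build one Agent per config entry, whatever its kind."""
    return {name: Agent(name=name, **entry) for name, entry in config.items()}
```

Because a workflow is just another entry, nothing in the loader needs to know the difference. Only the code that runs an agent has to look at `kind`.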

Moving forward, I'm working on adding processing agents to the workflows and enabling the nesting of agents within workflows, and even workflows within other workflows. A set of base64 widgets will let me save image files directly to disk from the Stable Diffusion API.

Remarkably, these new features are organically emerging without increasing code complexity, which makes me believe I'm onto something really cool here. The tool can not only handle its initial task of connecting to endpoints and fetching images and text but could potentially perform a wide range of processing tasks.

I can envision a future where my current hard-coded API agents become processing agents, each with an embedded workflow that creates the headers, formats the message, builds the URL, and calls the HTTP client agent, which makes the connection and returns the requested object for further processing. This would allow for fully customizable processing, and if these agents can be embedded in a larger workflow, the entire program flow could become scriptable, with no hard-coded elements left in the code.
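One way to picture that decomposition is as a list of small steps that pass a shared context along. The Python sketch below is purely speculative: the step names, the context keys, and the stubbed HTTP client (which makes no real connection) are all assumptions made to show the idea.

```python
def build_headers(ctx):
    ctx["headers"] = {"Content-Type": "application/json"}
    return ctx


def format_message(ctx):
    ctx["body"] = {"prompt": ctx["prompt"]}
    return ctx


def build_url(ctx):
    ctx["url"] = f"{ctx['base_url']}/{ctx['route']}"
    return ctx


def http_client(ctx):
    # Stub standing in for the real connection: it echoes the request back.
    ctx["response"] = {"url": ctx["url"], "echo": ctx["body"]["prompt"]}
    return ctx


def select_output(ctx):
    ctx["result"] = ctx["response"]["echo"]
    return ctx


def run_workflow(steps, ctx):
    """Run each step in order, threading the context through."""
    for step in steps:
        ctx = step(ctx)
    return ctx


def as_step(steps):
    """Wrap a whole workflow so it can be used as a single step elsewhere."""
    return lambda ctx: run_workflow(steps, ctx)


# The formerly hard-coded API agent, expressed as an embedded workflow.
API_AGENT = [build_headers, format_message, build_url, http_client, select_output]
```

Since `as_step(API_AGENT)` has the same signature as any single step, the whole agent can be dropped into a larger workflow, which is the nesting described above.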

It's truly exhilarating when the creative process unfolds like this, with one idea leading to another like dominoes.

I can't wait to see where this journey takes me next!
