Saturday, August 31, 2024

Grothendieckian math gives you the ability to map a sphere onto a cube, so you could calculate its area without ever using pi if you didn't want to.

Using Grothendieckian concepts, it should be possible to map data patterns directly onto the patterns inside a transformer by defining a functor between the two spaces.

In Grothendieckian mathematics, a functor is a mapping between categories that preserves the structure and relationships within the categories. In this case, we could define categories for the input data and the neural network, with objects representing the data points or neurons and morphisms representing the relationships or connections between them.

The functor would then map between these categories in a way that preserves the structure and relationships within them. For example, the functor might map similar data points in the input category to similar neurons in the neural network category, or preserve the connectivity patterns between neurons when mapping between categories.

By choosing an appropriate functor for the transformation process, we could potentially ensure that the neural network is initialized in a way that is well-suited to the structure and patterns within the input data, leading to faster and more efficient training.

Furthermore, by analyzing the properties of the functor and its effects on the input data and neural network categories, we could potentially gain new insights into the training process and develop new optimization techniques that further improve performance.
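As a toy illustration of what "structure preserving" means here (all of the objects, relationships, and the mapping below are made up, not tied to any real model), each category can be represented as objects plus relationships, and the check is simply that the functor carries every relationship in the data category onto one in the network category:

# Toy illustration of a structure-preserving (functor-like) map between two
# "categories": a graph of related data points and a graph of connected neurons.
# All names and data here are hypothetical; this only sketches the idea above.

data_objects = ["x1", "x2", "x3"]
data_morphisms = {("x1", "x2"), ("x2", "x3")}      # "x1 is similar to x2", etc.

neuron_objects = ["n1", "n2", "n3"]
neuron_morphisms = {("n1", "n2"), ("n2", "n3")}    # connections between neurons

# The functor: a mapping on objects (data points -> neurons).
F = {"x1": "n1", "x2": "n2", "x3": "n3"}

def preserves_structure(obj_map, src_morphisms, dst_morphisms):
    """True if every relationship in the source maps to a relationship in the target."""
    return all((obj_map[a], obj_map[b]) in dst_morphisms for a, b in src_morphisms)

print(preserves_structure(F, data_morphisms, neuron_morphisms))  # True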

Saturday, August 24, 2024

Adding RAG to my Dynamic AI Agent Workflows

I modified the script to properly create paragraph-sized chunks to be returned by semantic search, and I modified the script that uses the data to properly return RAG enhancements to a prompt.

Here's a Python script to create a RAG database from text and PDF documents. This script will:

  1. Process text and PDF files
  2. Create embeddings for the content
  3. Build a FAISS index for efficient retrieval

Here's the script:

import os
import re

import fitz  # PyMuPDF
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# readme
# add these libraries
#
# pip install PyMuPDF faiss-cpu numpy sentence-transformers
#

def read_text_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def read_pdf_file(file_path):
    doc = fitz.open(file_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text

def split_into_paragraphs(text):
    # Split on newlines that follow sentence-ending punctuation
    pattern = r'(?<=[.!?])\s*\n+'
    paragraphs = re.split(pattern, text)
    return [para.strip() for para in paragraphs if para.strip()]  # keep non-empty paragraphs

def process_documents(directory):
    documents = []
    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        if filename.endswith('.txt'):
            text = read_text_file(file_path)
        elif filename.endswith('.pdf'):
            text = read_pdf_file(file_path)
        else:
            continue
        documents.extend(split_into_paragraphs(text))
    return documents

def create_rag_database(directory):
    # Process documents
    documents = process_documents(directory)
    # Create embeddings
    embedder = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = embedder.encode(documents)
    # Create FAISS index
    dimension = embeddings.shape[1]
    index = faiss.IndexFlatL2(dimension)
    index.add(embeddings.astype('float32'))
    return index, documents

# Usage
directory = "docs/"
index, documents = create_rag_database(directory)

# Save the index and documents for later use
faiss.write_index(index, "rag_index.faiss")
np.save("document_chunks.npy", documents)

print(f"RAG database created with {len(documents)} document chunks.")

To use this script:

  1. Install required libraries:
    pip install PyMuPDF faiss-cpu numpy sentence-transformers
  2. Replace "docs/" with the actual path to the directory containing your text and PDF files.
  3. Run the script. It will create two files:
    • rag_index.faiss: The FAISS index for efficient similarity search
    • document_chunks.npy: A NumPy array containing the document chunks

Now you can load these files in your RAG-enhanced workflow:


import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from typing import Dict

def rag_annotate(prompt: str, path: str, output: list, k: int = 5) -> tuple[str, Dict]:
    # Check if data is already loaded
    k = int(k)
    if not hasattr(rag_annotate, 'index'):
        # Load data for the first time and cache it on the function object
        rag_annotate.index = faiss.read_index(f"{path}/rag_index.faiss")
        rag_annotate.document_chunks = np.load(f"{path}/document_chunks.npy", allow_pickle=True)
        rag_annotate.embedder = SentenceTransformer('all-MiniLM-L6-v2')

    # Use the loaded data
    prompt_embedding = rag_annotate.embedder.encode([prompt])
    _, I = rag_annotate.index.search(prompt_embedding, k)
    retrieved_chunks = [rag_annotate.document_chunks[i] for i in I[0]]
    context = "\n".join(retrieved_chunks)
    return (f"Context:\n{context}\n\nQuestion: {prompt}\nAnswer:",
            {"status": {"value": 0, "reason": "Success"}})

This setup allows you to create a RAG database from your documents and then use it in your workflow to enhance prompts before sending them to an LLM. The database creation is done separately, so you only need to run it when you want to update your document set.


Tuesday, August 20, 2024

Fixed up an old web server I was writing 12 years ago.

It used epoll, sendfile, and threads. I stopped working on it because I didn't know how to make mutexes work for the threads.  But the big AI models know how to do that and fixed me right up. Once I got that working I uploaded it to GitHub as the Simple Advanced Web Server.

The webserver.c program is built around an epoll event loop, worker threads coordinated with mutexes, and sendfile() for serving files; the full source is in the GitHub repo.
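
As a rough sketch of that shape, not the actual webserver.c (the port, thread count, fixed-size queue, and file name below are all made up, and a real server would parse the request and write response headers before sendfile()):

/* Rough sketch of the pattern: epoll accept loop, worker threads, a
 * mutex-protected queue, sendfile for the response body.
 * Not the actual webserver.c -- everything specific here is made up. */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64
#define QUEUE_SIZE 256

/* Mutex-protected queue of accepted connections, shared with the workers. */
static int queue[QUEUE_SIZE];
static int q_head = 0, q_tail = 0;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;

static void enqueue(int fd) {
    pthread_mutex_lock(&q_lock);
    queue[q_tail++ % QUEUE_SIZE] = fd;        /* no overflow handling in this sketch */
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

static int dequeue(void) {
    pthread_mutex_lock(&q_lock);
    while (q_head == q_tail)
        pthread_cond_wait(&q_cond, &q_lock);
    int fd = queue[q_head++ % QUEUE_SIZE];
    pthread_mutex_unlock(&q_lock);
    return fd;
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int client = dequeue();
        int file = open("index.html", O_RDONLY);  /* real code: file comes from the parsed request */
        if (file >= 0) {
            off_t offset = 0;
            sendfile(client, file, &offset, 65536);
            close(file);
        }
        close(client);
    }
    return NULL;
}

int main(void) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8080),
                                .sin_addr.s_addr = INADDR_ANY };
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 128);

    pthread_t threads[4];
    for (int i = 0; i < 4; ++i)
        pthread_create(&threads[i], NULL, worker, NULL);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {                                /* accept loop: hand new connections to the workers */
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; ++i)
            if (events[i].data.fd == listener)
                enqueue(accept(listener, NULL, NULL));
    }
}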



But I did not stop there.  My next trick was to add Python scripting to the C program so it can load and run Python scripts.  And today I hooked the URL parameters into the inputs of the Python script.
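
A hedged sketch of the mechanism (the module name "handler", its "run" function, and passing the raw query string are hypothetical choices, not the actual hookup):

/* Sketch of embedding CPython in the C server and handing the URL parameters
 * to a script.  "handler" and "run" are hypothetical names.  A real server
 * would call Py_Initialize() once at startup, not once per request. */
#include <Python.h>

void run_script_with_params(const char *query_string) {
    Py_Initialize();

    PyObject *module = PyImport_ImportModule("handler");   /* loads handler.py from sys.path */
    if (module) {
        PyObject *func = PyObject_GetAttrString(module, "run");
        if (func && PyCallable_Check(func)) {
            /* Pass the query string; the script can split it into key=value pairs. */
            PyObject *result = PyObject_CallFunction(func, "s", query_string);
            Py_XDECREF(result);
        }
        Py_XDECREF(func);
        Py_DECREF(module);
    } else {
        PyErr_Print();                                      /* import failed */
    }

    Py_Finalize();
}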



I am exploring a way to let the request threads just queue up what needs to be done and move on to the next request, in order to speed up processing.  This is needed to properly handle SSL in the application anyway.

The server will create an array of pointers to request queues, one for each request thread.  There will always be at least one node on each list; a thread creates nodes and adds them to the end of its list, keeping a pointer to the current end of its own request queue.  This prevents any two threads from accessing the same data at the same time.  The last node is always left on the list, with just its done flag set to true.

These nodes will be pulled off the queue and placed into a work list.  If a node waits more than a few milliseconds, it will be copied, the copy put onto the work list, and the done flag set so we know we processed it.  Once that node has a next element, we can advance the array pointer to the next node and free the current one.

To process the work list in server.c I want to set everything to non-blocking and keep separate lists depending on whether a connection is SSL, a plain fd served with sendfile, or just a plain fd.

The data structure identifying the connection will be a net_stream object. Each layer will add its own fields to this object.


typedef struct net_stream {
    int fd;
    SSL *ssl;
    int is_ssl;
    thread *context;            /* the server's own per-thread context type */

    int done;
    char *filename;
    char *data;
    struct net_stream *list;    /* next node in this thread's request queue */
} net_stream;


This one object carries the connection and describes the processing still to be done, depending on which fields are set. It holds enough state to handle different tasks depending on what you want to do.

At that point the request threads become very fast, since all they ever do is call a function that processes the request header and then continue on.
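
A hedged sketch of that queue protocol using the net_stream struct above (the helper names are mine, the millisecond wait check is elided, and error handling is omitted):

/* Sketch only: the request thread writes at the tail of its own queue and the
 * processing loop consumes from the head, so they never touch the same node.
 * The "copy after a few milliseconds" timing check is left out. */
#include <stdlib.h>
#include <string.h>

/* Producer: a request thread links a new node onto the end of its queue and
 * keeps the returned pointer as its new tail. */
net_stream *queue_append(net_stream *tail, int fd) {
    net_stream *node = calloc(1, sizeof(*node));
    node->fd = fd;
    tail->list = node;
    return node;
}

/* Consumer: copy the head node onto the work list and mark it done, then
 * advance past it once a newer node exists (the last node always stays). */
net_stream *queue_harvest(net_stream *head, void (*add_to_work_list)(net_stream *)) {
    if (!head->done) {
        net_stream *copy = malloc(sizeof(*copy));
        memcpy(copy, head, sizeof(*copy));
        add_to_work_list(copy);
        head->done = 1;                  /* so we know we processed it */
    }
    if (head->done && head->list) {      /* a newer node exists: advance and free */
        net_stream *next = head->list;
        free(head);
        head = next;
    }
    return head;
}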

Sunday, August 18, 2024

Efficient Dependency Resolution Algorithm for Large-Scale Package Management Systems

A brief overview of the technique (a code sketch follows the list):

  1. Initialization: Create a hash table, a working list, and a done list.
  2. Setup: Add the list of packages with dependencies to the working list.
  3. Sorting: Sort the working list by the number of dependencies.
  4. Iteration: Iterate through the working list.
  5. Dependency Check: For each package, check whether each of its dependencies is already in the hash table; if it is, remove it from that package's list of needed dependencies.
  6. Reordering: If any dependencies are missing, move the package to the back of the working list.
  7. Completion: When a package's dependency list is empty, add the package to the hash table and to the end of the done list.
  8. Repeat: Continue iterating until the working list is empty.
  9. Circular dependency detection: If you ever make two full passes through the working list without removing anything, stop iterating and dump the list to identify the circular dependencies.
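A minimal sketch of the loop in Python (the package data and the function name are hypothetical, not taken from any real package manager):

# Minimal sketch of the algorithm above.  Packages are given as
# {name: set of dependency names}; the input data here is hypothetical.
from collections import deque

def resolve(packages):
    resolved = set()                                   # the hash table
    done = []                                          # the done list
    working = deque(sorted(packages, key=lambda p: len(packages[p])))  # fewest deps first
    deps = {name: set(d) for name, d in packages.items()}
    passes_without_progress = 0

    while working:
        progressed = False
        for _ in range(len(working)):                  # one full pass over the working list
            pkg = working.popleft()
            deps[pkg] -= resolved                      # drop dependencies already resolved
            if deps[pkg]:
                working.append(pkg)                    # missing deps: move to the back
            else:
                resolved.add(pkg)                      # completion: hash table + done list
                done.append(pkg)
                progressed = True
        if progressed:
            passes_without_progress = 0
        else:
            passes_without_progress += 1
            if passes_without_progress >= 2:           # circular dependency detection
                raise RuntimeError(f"circular dependencies among: {sorted(working)}")
    return done

print(resolve({"libc": set(), "zlib": {"libc"}, "png": {"zlib", "libc"}, "app": {"png"}}))
# ['libc', 'zlib', 'png', 'app']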
Abstract:
Dependency resolution is a critical task in package management systems, and the performance of existing algorithms can be a bottleneck for large-scale installations. In this paper, we present a novel and efficient dependency resolution algorithm that leverages a hash table and efficient list management to achieve significant improvements in computation time. Our algorithm iterates through a sorted working list of packages, checking and resolving dependencies in an organized and scalable manner.

Introduction

Package management systems play a crucial role in software development and system administration by automating the installation, configuration, and maintenance of software packages. A key challenge in package management is resolving dependencies among packages, which can become computationally intensive for large-scale installations.
In this paper, we propose an efficient and scalable dependency resolution algorithm that combines the use of a hash table and effective list management techniques.
3. Proposed Algorithm
Our algorithm consists of the following steps:
3.1. Initialization
Create a hash table for quick lookups, a working list to store packages with unresolved dependencies, and a done list to store packages with resolved dependencies.
3.2. Sorting
Sort the working list by the number of dependencies to prioritize packages with fewer dependencies.
3.3. Iteration
Iterate through the working list to resolve dependencies.
3.4. Dependency Check
For each package, check if its dependencies are in the hash table. If a dependency is found, remove it from the package's dependency list.
3.5. Reordering
If any dependencies are missing, move the package to the back of the working list to revisit it after resolving other packages' dependencies.
3.6. Completion
Add the package to the hash table and the end of the done list when its dependency list is empty.
3.7. Repeat
Continue iterating until the working list is empty.

Conclusion
Our proposed dependency resolution algorithm effectively addresses the challenges of managing dependencies in large-scale package management systems.

Optional inputs to functions in my Dynamic AI Agent workflow program are DONE!!!


Today I finally finished the optional input feature in my Dynamic AI Agent workflow program. Yesterday I fought the program for hours, unable to see the optional values coming into the function.  It turned out I had two bugs in how I promote an agent to run as a workflow, which is what lets me run a single agent by itself (with optional inputs) so I can test one agent at a time.  The agent worked fine as part of a workflow, but I couldn't run it alone: there was a disconnect between getting the command line options into the program and mapping them into the singleton agent workflow.

First thing this morning I fixed both bugs.  

I had to pass in cli_args to the promote_agent_to_workflow function and add these lines:

for opt_input in agent_config.get('optional_inputs', []):
    if cli_args.get(opt_input) is not None:
        # Add to workflow inputs
        temp_workflow['inputs'].append(opt_input)
        # Add to steps params
        temp_workflow['steps'][0]['params'][opt_input] = f"${opt_input}"

In the config handling I had to move the call to promote_agent_to_workflow until after the command line arguments were calculated, and also remove entries from cli_args that didn't actually appear on the command line.

    cli_args = {k: v for k, v in vars(args).items() if v is not None}
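
Putting the two fixes together, a rough sketch of what the singleton promotion ends up doing (only agent_config, cli_args, temp_workflow, and optional_inputs come from the snippets above; the function shape and the required-input handling are my guesses, not the actual code):

# Hypothetical sketch of the singleton promotion with the fix folded in.
def promote_agent_to_workflow(agent_name, agent_config, cli_args):
    temp_workflow = {
        'name': f"run_{agent_name}",
        'inputs': list(agent_config.get('inputs', [])),
        'steps': [{'agent': agent_name, 'params': {}}],
    }
    # Map the agent's required inputs straight through to the single step.
    for required in agent_config.get('inputs', []):
        temp_workflow['steps'][0]['params'][required] = f"${required}"
    # Only add optional inputs that were actually given on the command line.
    for opt_input in agent_config.get('optional_inputs', []):
        if cli_args.get(opt_input) is not None:
            temp_workflow['inputs'].append(opt_input)
            temp_workflow['steps'][0]['params'][opt_input] = f"${opt_input}"
    return temp_workflow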

This is the flow of the data through my system: you can see how the command line becomes the cli_args, then the step parameters being passed to the function, and the output of the function coming from that input.  You can also see how the ENV_ values are replaced by the actual values from the environment variables.







Added another agent to my workflows; this one wraps the Google Custom Search API.

Added another proc agent tonight. When I pair it up with the exec_api_get agent it retrieves a Google Custom Search.  See all the optional parameters?


This is the workflow with the Google search agent and the network GET request agent.



   The "num": "5" line is an optional parameter to the function. You don't have to have it in the list, and if you don't have it, it just doesn't get added to the request url.  


And this is the first full function that takes advantage of the optional parameters to wrap an API.  I don't have any unit checking yet because I want to add units and grep strings to the config files to find valid matches for the inputs automagically.


If you notice, not a single optional parameter is mentioned by name in the body of the function. It magically builds a list of the optional parameters and only adds the ones that aren't the empty string, so you can update the list of parameters and never touch the body of the code.  This was the hardest one to write; now that I have this as an example, the rest get trivially easy.
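
A hedged sketch of that pattern (the function name, parameter names, and config keys below are hypothetical, not the actual agent code; only the Custom Search endpoint itself is real):

# Hypothetical sketch: the function never names an optional parameter; it just
# walks whatever optional inputs the config lists and adds the non-empty ones.
from urllib.parse import urlencode

def build_search_request(api_key: str, cx: str, query: str,
                         agent_config: dict, params: dict) -> str:
    url_params = {"key": api_key, "cx": cx, "q": query}
    for name in agent_config.get('optional_inputs', []):   # e.g. "num", "siteSearch", ...
        value = params.get(name, "")
        if value != "":                                     # skip empty optional values
            url_params[name] = value
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(url_params)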

I started by testing just the agent alone, where it gets automatically promoted to a workflow using the singleton workflow. It took me a few hours to realize that the reason I could not add an optional parameter was a bug in the singleton creation: it needs to compare the agent's optional parameters to the cli_args list and, for each match, add it both as an input on the workflow and as a parameter on the step, mapped to that workflow input.





Saturday, August 17, 2024

More than doubled the speed of my neural network training using cblas and SSE.

This was actually very easy and straightforward to do.




void add_biases_simd(double *output, double *biases, int output_size) {
    // Two doubles per SSE2 register (assumes output_size is even)
    for (int i = 0; i < output_size; i += 2) {
        __m128d voutput = _mm_loadu_pd(&output[i]);
        __m128d vbiases = _mm_loadu_pd(&biases[i]);
        voutput = _mm_add_pd(voutput, vbiases);
        _mm_storeu_pd(&output[i], voutput);
    }
}

void forward_layer(double *input, double *weights, double *biases, int input_size, int output_size, double *output, activation_function activate, ActivationFunction activation) {
    // Use BLAS for the matrix-vector multiplication
    cblas_dgemv(CblasRowMajor, CblasNoTrans, output_size, input_size, 1.0, weights, input_size, input, 1, 0.0, output, 1);
    // Add biases using SIMD
    add_biases_simd(output, biases, output_size);
    // Apply activation function
    if (activation == ACTIVATION_SOFTMAX) {
        softmax(output, output, output_size);
    } else if (activate) {
        for (int i = 0; i < output_size; ++i) {
            output[i] = activate(output[i]);
        }
    }
}

void forward_layer_simple(double *input, double *weights, double *biases, int input_size, int output_size, double *output, activation_function activate, ActivationFunction activation)
{
    for (int i = 0; i < output_size; ++i) {
        output[i] = 0.0;
        for (int j = 0; j < input_size; ++j) {
            output[i] += input[j] * weights[i * input_size + j];
        }
        output[i] += biases[i];
    }

    if (activation == ACTIVATION_SOFTMAX) {
        softmax(output, output, output_size);
    } else if (activate) {
        for (int i = 0; i < output_size; ++i) {
            output[i] = activate(output[i]);
        }
    }
}


This is an improvement to the backward pass.  I now need to implement my optimizations in terms of SSE and CBLAS.  I am not concerned about keeping these old functions around because I won't be running the backward pass on embedded machines.


void backward_pass(NeuralNet *net, double *input, double *expected, double *output) {
    int i, j, k;
    Layer *output_layer = &net->layers[net->num_layers - 1];

    // Calculate output layer errors
    if (output_layer->activation == ACTIVATION_SOFTMAX) {
        for (i = 0; i < output_layer->output_size; ++i) {
            output_layer->errors[i] = output[i] - expected[i];
        }
    } else {
        for (i = 0; i < output_layer->output_size; ++i) {
            output_layer->errors[i] = net->calculate_error_derivative(net, output[i], expected[i]) *
                                      output_layer->activate_derivative(output_layer->activations[i]);
        }
    }

    // Backpropagate errors and update weights
    for (i = net->num_layers - 1; i > 0; --i) {
        Layer *current_layer = &net->layers[i];
        Layer *prev_layer = &net->layers[i - 1];

        // Calculate gradients and update weights and biases
        for (j = 0; j < current_layer->output_size; ++j) {
            for (k = 0; k < current_layer->input_size; k += 2) {
                int weight_index = j * current_layer->input_size + k;
                double gradient0 = current_layer->errors[j] * prev_layer->activations[k] / net->batchsize;
                double gradient1 = current_layer->errors[j] * prev_layer->activations[k + 1] / net->batchsize;

                double jitter0 = jitter(net);
                double jitter1 = jitter(net);

                // Use SSE2 to update weights in pairs
                __m128d grad = _mm_set_pd(gradient1 - jitter1, gradient0 - jitter0);
                __m128d weight = _mm_loadu_pd(&current_layer->weights[weight_index]);
                __m128d update = _mm_sub_pd(weight, _mm_mul_pd(grad, _mm_set1_pd(net->learningrate)));

                _mm_storeu_pd(&current_layer->weights[weight_index], update);
            }
            // Update biases with jitter
            double bias_update = current_layer->errors[j] / net->batchsize - jitter(net);
            current_layer->biases[j] -= net->learningrate * bias_update;
        }

        // Calculate errors for previous layer (if not input layer)
        if (i > 1) {
            cblas_dgemv(CblasRowMajor, CblasTrans, current_layer->output_size, current_layer->input_size, 1.0,
                        current_layer->weights, current_layer->input_size, current_layer->errors, 1, 0.0,
                        prev_layer->errors, 1);

            for (j = 0; j < prev_layer->output_size; ++j) {
                prev_layer->errors[j] *= prev_layer->activate_derivative(prev_layer->activations[j]);
            }
        }
    }
}


Now, I did lose the optimized updates to the weights and biases; they will have to be rewritten in terms of SSE and CBLAS.
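
For the weight and bias updates specifically, one way they could be expressed with CBLAS (a sketch under my own assumptions, ignoring the jitter term; not the final code) is a rank-1 update plus a scaled vector add:

/* Sketch only: W -= (lr/batch) * errors (outer product) prev_activations is a
 * rank-1 update, which is what cblas_dger computes; the bias update is a
 * scaled vector add, which is cblas_daxpy.  The jitter term is left out. */
#include <cblas.h>

void update_layer_blas(double *weights, double *biases,
                       const double *errors, const double *prev_activations,
                       int output_size, int input_size,
                       double learningrate, double batchsize) {
    double scale = -learningrate / batchsize;

    /* weights[j][k] += scale * errors[j] * prev_activations[k]  (rank-1 update) */
    cblas_dger(CblasRowMajor, output_size, input_size, scale,
               errors, 1, prev_activations, 1, weights, input_size);

    /* biases[j] += scale * errors[j] */
    cblas_daxpy(output_size, scale, errors, 1, biases, 1);
}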