Creating an API-Free Machine Learning Workflow with MLE-Agent and Ollama
In this tutorial, we’ll show how to integrate MLE-Agent with Ollama to create a fully local machine learning workflow that operates without any external APIs. Using Google Colab, we’ll set up a reproducible environment, generate a synthetic dataset, and prompt MLE-Agent to draft a training script. Along the way, we’ll sanitize the generated code, ensure correct imports, and keep a robust fallback script in reserve, so the workflow stays automated without sacrificing reliability.
Setting Up Our Environment
To get started, we first establish our working environment in Google Colab. The initial imports and a small helper function let us run shell commands from Python and watch their output as each step executes.
```python
import os, re, time, textwrap, subprocess, sys
from pathlib import Path

def sh(cmd, check=True, env=None, cwd=None):
    """Run a shell command, print its combined output, and fail loudly on error."""
    print(f"$ {cmd}")
    p = subprocess.run(cmd, shell=True,
                       env={**os.environ, **(env or {})},  # merge extra vars into the inherited environment
                       cwd=cwd, stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT, text=True)
    print(p.stdout)
    if check and p.returncode != 0:
        raise RuntimeError(p.stdout)
    return p.stdout
```
This `sh()` helper executes shell commands and prints their combined output, raising an error whenever a command fails so problems surface immediately.
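For example, a quick sanity check confirms the helper behaves as expected before we rely on it for the rest of the workflow:

```python
out = sh("python --version")  # prints the command, its output, and returns the output as a string
assert "Python" in out
```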
Defining Our Workspace
Next, we define our workspace directories and file paths. This setup includes paths for our dataset, model files, and training scripts. Additionally, we will install the necessary Python packages.
```python
WORK = Path("/content/mle_colab_demo")
WORK.mkdir(parents=True, exist_ok=True)
PROJ = WORK / "proj"
PROJ.mkdir(exist_ok=True)
DATA = WORK / "data.csv"
MODEL = WORK / "model.joblib"
PREDS = WORK / "preds.csv"
SAFE = WORK / "train_safe.py"
RAW = WORK / "agent_train_raw.py"
FINAL = WORK / "train.py"
MODEL_NAME = os.environ.get("OLLAMA_MODEL", "llama3.2:1b")

sh("pip -q install --upgrade pip")
sh("pip -q install mle-agent==0.4.* scikit-learn pandas numpy joblib")
sh("curl -fsSL https://ollama.com/install.sh | sh")
sv = subprocess.Popen("ollama serve", shell=True)  # start the Ollama server in the background
time.sleep(4)  # give the server a moment to bind to its port
sh(f"ollama pull {MODEL_NAME}")
```
This section of our script ensures that we have all the necessary Python dependencies installed and prepares our local Ollama environment. We initiate the server to allow local model processing without needing external API keys.
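The fixed `time.sleep(4)` usually suffices in Colab, but a readiness poll is more robust. Below is a minimal sketch (the `wait_for_ollama` helper is our own addition, not part of MLE-Agent) that polls Ollama's `/api/tags` endpoint until the server answers:

```python
import json, urllib.request

def wait_for_ollama(url="http://127.0.0.1:11434/api/tags", timeout=60):
    """Poll the local Ollama HTTP endpoint until it responds or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as r:
                json.load(r)   # server is up and returning JSON
                return
        except Exception:
            time.sleep(1)      # not ready yet; retry
    raise RuntimeError("Ollama server did not become ready in time")

wait_for_ollama()
```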
Generating the Synthetic Dataset
To train our model effectively, we’ll first need to create a synthetic dataset:
```python
import numpy as np, pandas as pd

np.random.seed(0)
n = 500
X = np.random.rand(n, 4)             # four random features in [0, 1)
w = np.array([0.4, -0.2, 0.1, 0.5])  # true linear weights
y = (X @ w + 0.15 * np.random.randn(n) > 0.55).astype(int)  # noisy linear score, thresholded to binary labels
pd.DataFrame(np.c_[X, y], columns=["f1", "f2", "f3", "f4", "target"]).to_csv(DATA, index=False)
```
Here, we generate 500 samples with four features and a binary target derived from a noisy linear combination of the features, thresholded at 0.55, giving us a dataset ready for training.
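Before handing the CSV to the agent, it is worth a quick look at the shape and class balance (a simple sanity check, not part of the original pipeline):

```python
df_check = pd.read_csv(DATA)
print(df_check.shape)                                   # (500, 5): four features plus the target
print(df_check["target"].value_counts(normalize=True))  # rough class proportions
```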
Configuring MLE-Agent with Ollama
Next, we set environment variables and construct a strict prompt instructing MLE-Agent to generate the `train.py` script.
```python
env = {
    "OPENAI_API_KEY": "",
    "ANTHROPIC_API_KEY": "",
    "GEMINI_API_KEY": "",
    "OLLAMA_HOST": "http://127.0.0.1:11434",
    "MLE_LLM_ENGINE": "ollama",
    "MLE_MODEL": MODEL_NAME,
}

prompt = f"""
Return ONE fenced python code block only. Write train.py that reads {DATA}; 80/20 split (random_state=42, stratify);
Pipeline: SimpleImputer + StandardScaler + LogisticRegression(class_weight="balanced", max_iter=1000, random_state=42);
Print ROC-AUC & F1; print sorted coefficient magnitudes; save model to {MODEL} and preds to {PREDS};
Use only sklearn, pandas, numpy, joblib; no extra text.
"""

def extract(txt: str) -> str | None:
    """Pull a Python source block out of an LLM reply."""
    txt = re.sub(r"\x1B\[[0-?]*[ -/]*[@-~]", "", txt)                # strip ANSI escape codes
    m = re.search(r"`{3}(?:python)?\s*([\s\S]*?)`{3}", txt, re.I)    # fenced code block
    if m:
        return m.group(1).strip()
    if txt.strip().lower().startswith("python"):
        return txt.strip()[6:].strip()
    # last resort: take everything from the first import/from line onward
    m = re.search(r"(?:^|\n)(from\s+[^\n]+|import\s+[^\n]+)([\s\S]*)", txt)
    return (m.group(1) + m.group(2)).strip() if m else None

out = sh(f'printf %s "{prompt}" | mle chat', check=False, cwd=str(PROJ), env=env)
code = extract(out)
if not code:  # fall back to querying Ollama directly
    out = sh(f'printf %s "{prompt}" | ollama run {MODEL_NAME}', check=False, env=env)
    code = extract(out) or ""
RAW.write_text(code, encoding="utf-8")
```
This segment drives the code-generation step, first through MLE-Agent and then, if no usable code block comes back, through Ollama directly. Whatever the model returns is saved to disk for sanitization.
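Since everything downstream depends on `extract()`, a tiny self-test on a hand-written reply (the sample string below is hypothetical) helps catch regressions in the regexes:

```python
fence = "`" * 3  # build the markdown fence without embedding literal backtick runs
sample = f"Here you go:\n{fence}python\nimport pandas as pd\nprint('ok')\n{fence}\nDone."
assert extract(sample).startswith("import pandas")  # the fenced body is recovered
```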
Sanitizing the Generated Script
Next up, we sanitize the generated script to ensure it adheres to coding standards and avoids common pitfalls:
```python
def sanitize(src: str) -> str:
    """Auto-fix common import mistakes in generated sklearn code."""
    if not src:
        return ""
    s = src
    fixes = {
        r"from\s+sklearn\.pipeline\s+import\s+SimpleImputer": "from sklearn.impute import SimpleImputer",
        r"from\s+sklearn\.preprocessing\s+import\s+SimpleImputer": "from sklearn.impute import SimpleImputer",
        # Add other common import fixes here
    }
    for pat, rep in fixes.items():
        s = re.sub(pat, rep, s)
    if "SimpleImputer" in s and "from sklearn.impute import SimpleImputer" not in s:
        s = "from sklearn.impute import SimpleImputer\n" + s
    return s

san = sanitize(code)

safe = textwrap.dedent(f"""
import pandas as pd
import joblib
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score

DATA = Path("{DATA}")
MODEL = Path("{MODEL}")
PREDS = Path("{PREDS}")

df = pd.read_csv(DATA)
X = df.drop(columns=["target"])
y = df["target"].astype(int)

# Deterministic pipeline matching the prompt spec
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000, random_state=42)),
])
pipe.fit(X_tr, y_tr)
proba = pipe.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
print("ROC-AUC:", roc_auc_score(y_te, proba))
print("F1:", f1_score(y_te, pred))
coefs = pd.Series(pipe.named_steps["clf"].coef_[0], index=X.columns)
print(coefs.abs().sort_values(ascending=False))
joblib.dump(pipe, MODEL)
pd.DataFrame({{"y_true": y_te.values, "y_pred": pred, "proba": proba}}).to_csv(PREDS, index=False)
""").strip()
```
The `sanitize()` function scans for and auto-fixes common import mistakes in the generated script so it can run cleanly. We also prepare a complete, deterministic fallback script that implements exactly the pipeline the prompt asked for, in case the generated code is unusable.
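To see the sanitizer in action, we can feed it a snippet with the classic wrong import (the snippet itself is made up for illustration):

```python
bad = "from sklearn.pipeline import SimpleImputer\nprint('hello')"
print(sanitize(bad).splitlines()[0])  # -> from sklearn.impute import SimpleImputer
```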
Finalizing and Running the Training Script
Finally, we decide whether to run the sanitized or safe script based on the quality of the generated code:
```python
chosen = san if ("import " in san and "sklearn" in san and "read_csv" in san) else safe
SAFE.write_text(safe, encoding="utf-8")
FINAL.write_text(chosen, encoding="utf-8")
print("Using train.py (first 800 chars):\n", chosen[:800])
sh(f"python {FINAL}")
print("Artifacts:", [str(p) for p in WORK.glob("*")])
```
By executing this code, we train and evaluate the model, print ROC-AUC, F1, and the sorted coefficient magnitudes, and save the fitted model and predictions to disk.
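As a final check, we can reload the persisted pipeline and score a few fresh rows (a quick smoke test, assuming training completed and wrote the model file):

```python
import joblib
pipe = joblib.load(MODEL)
new_rows = pd.DataFrame(np.random.rand(3, 4), columns=["f1", "f2", "f3", "f4"])
print(pipe.predict(new_rows))        # predicted class labels for the new samples
print(pipe.predict_proba(new_rows))  # class probabilities
```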
Through this structured approach, we see how pairing an agent framework like MLE-Agent with a local LLM served by Ollama, backed by a deterministic fallback, gives us reliability and full control over the entire machine learning process while eliminating external API calls.