Model rerouting

Route requests to the cheapest model that meets your reliability threshold. When risk is high, reroute to a stronger model.

Overview

Problem

A single-model policy either wastes money (always using the best model) or ships errors (always using the cheapest model).

Reality Signal

Convert the base model's raw score into a calibrated probability (prob_est), an uncertainty estimate (uncertainty), and a boolean decision.

Policy

Try a fast/cheap model first. If prob_est is below threshold or uncertainty is high, reroute to a stronger model and re-run the task.
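The policy above reduces to a small pure function. The sketch below is one way to express it, not the service's own logic. The name should_reroute and the parameters prob_threshold and max_uncertainty are illustrative, and the 0.7 probability threshold is an assumed value you would tune for your own reliability target (0.15 matches the uncertainty cutoff used in the reference implementation).

```python
def should_reroute(
    prob_est: float,
    uncertainty: float,
    prob_threshold: float = 0.7,    # assumed value, tune per use case
    max_uncertainty: float = 0.15,  # cutoff used in the reference implementation
) -> bool:
    """Return True when the cheap model's answer should not be trusted."""
    # Reroute if the calibrated probability is below the threshold...
    if prob_est < prob_threshold:
        return True
    # ...or if the estimate itself is too uncertain to rely on.
    return uncertainty > max_uncertainty
```

Keeping the rule separate from the network call makes it easy to unit-test and to adjust thresholds without touching the request code.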

Architecture

1. The base model produces a score for the decision you care about.
2. Send the score to /decide.
3. Use prob_est + uncertainty to route safely.
4. When the true outcome is known, call /feedback to improve calibration.

Routing actions: automate, reroute, reprompt, or escalate.

1) Decide whether to reroute

Call /decide with your model’s score. Route based on calibrated probability and uncertainty.

Request

bash

curl -X POST https://onprem-api-sowl.jollysand-1b9ed42e.swedencentral.azurecontainerapps.io/decide \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "features": { "score": 0.82 }
  }'

Response

json

{
  "decision_id": 123,
  "prob_est": 0.62,
  "uncertainty": 0.08,
  "decision": false
}

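Use prob_est + uncertainty as routing signals. In this example the raw score of 0.82 calibrates to a prob_est of 0.62 and decision is false, so the request should be rerouted.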
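If you prefer typed access to the response fields, a small wrapper can help. This is a minimal sketch that assumes the response always contains the four keys shown in the example above. The Decision class and parse_decision helper are hypothetical names, not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Typed view of a /decide response (field names taken from the example)."""
    decision_id: int
    prob_est: float
    uncertainty: float
    decision: bool


def parse_decision(payload: dict) -> Decision:
    """Build a Decision from a response body; raises KeyError on a missing field."""
    return Decision(
        decision_id=int(payload["decision_id"]),
        prob_est=float(payload["prob_est"]),
        uncertainty=float(payload["uncertainty"]),
        decision=bool(payload["decision"]),
    )


example = parse_decision(
    {"decision_id": 123, "prob_est": 0.62, "uncertainty": 0.08, "decision": False}
)
```

Failing loudly on a missing key is a deliberate choice here: a silently defaulted prob_est could route a risky request to the cheap model.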

2) Feedback from final outcome

When the ground truth is known, call /feedback with the decision_id returned by /decide so that calibration improves over time.

Feedback (cURL)

bash

curl -X POST https://onprem-api-sowl.jollysand-1b9ed42e.swedencentral.azurecontainerapps.io/feedback \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "decision_id": 123,
    "feedback": 1,
    "force_retrain": false
  }'
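The same feedback call can be made from Python. The sketch below takes the HTTP function as a parameter (pass requests.post in production) so it can be tested without network access. The helper names build_feedback and send_feedback are hypothetical, and the mapping of a true outcome to "feedback": 1 and a false outcome to 0 is an assumption based on the example payload above.

```python
from typing import Any, Callable


def build_feedback(decision_id: int, outcome: bool, force_retrain: bool = False) -> dict:
    """Build the /feedback request body."""
    # Assumption: "feedback" is 1 for a positive outcome and 0 otherwise.
    return {
        "decision_id": decision_id,
        "feedback": 1 if outcome else 0,
        "force_retrain": force_retrain,
    }


def send_feedback(
    post: Callable[..., Any],  # a requests.post-style callable
    api_url: str,
    api_key: str,
    decision_id: int,
    outcome: bool,
) -> Any:
    """POST the observed outcome to /feedback and return the response object."""
    response = post(
        f"{api_url}/feedback",
        json=build_feedback(decision_id, outcome),
        headers={"x-api-key": api_key},
        timeout=10,
    )
    response.raise_for_status()
    return response
```

A typical call would look like send_feedback(requests.post, API_URL, "YOUR_API_KEY", 123, True), using the decision_id from the earlier /decide response.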

Reference implementation

Python

python

import requests

API_URL = "https://onprem-api-sowl.jollysand-1b9ed42e.swedencentral.azurecontainerapps.io"
HEADERS = {"x-api-key": "YOUR_API_KEY"}

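def rc_decide(score: float) -> dict:
    """Send the base model's score to /decide and return the parsed JSON response."""
    r = requests.post(
        f"{API_URL}/decide",
        json={"features": {"score": score}},
        headers=HEADERS,
        timeout=10,
    )
    r.raise_for_status()
    return r.json()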

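# cheap_model, strong_model, and prompt are placeholders for your own code:
# cheap_model returns (answer, score); strong_model returns the answer.
answer, score = cheap_model(prompt)
rc = rc_decide(score)

# Keep the cheap answer only when the decision is positive and uncertainty is low.
if rc["decision"] and rc["uncertainty"] <= 0.15:
    final = answer
else:
    final = strong_model(prompt)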

If you also reprompt, treat reprompt as an intermediate action before rerouting.
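One way to slot reprompting in before rerouting is sketched below. It is an illustration under stated assumptions rather than a prescribed flow: the function answer_with_reprompt, the reprompt callable (which rewrites the prompt), and the single-retry limit are all hypothetical, and decide stands in for a function such as rc_decide that returns the /decide response as a dict.

```python
from typing import Callable, Tuple


def answer_with_reprompt(
    prompt: str,
    cheap_model: Callable[[str], Tuple[str, float]],  # returns (answer, score)
    strong_model: Callable[[str], str],
    decide: Callable[[float], dict],                   # e.g. rc_decide
    reprompt: Callable[[str], str],                    # hypothetical prompt rewriter
    max_uncertainty: float = 0.15,
) -> Tuple[str, str]:
    """Return (final_answer, action); action is 'automate', 'reprompt', or 'reroute'."""

    def accepted(rc: dict) -> bool:
        return bool(rc["decision"]) and rc["uncertainty"] <= max_uncertainty

    # First attempt: cheap model on the original prompt.
    answer, score = cheap_model(prompt)
    if accepted(decide(score)):
        return answer, "automate"

    # Intermediate step: retry the cheap model once with a rewritten prompt.
    answer, score = cheap_model(reprompt(prompt))
    if accepted(decide(score)):
        return answer, "reprompt"

    # Last resort: send the original prompt to the stronger model.
    return strong_model(prompt), "reroute"
```

Returning the action alongside the answer makes it straightforward to log how often each path is taken and to judge whether reprompting saves enough strong-model calls to justify the extra cheap-model call.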