Backed by Nightly Metrics
We run the benchmark every night and publish the scorecard so you can show stakeholders exactly where quality stands.
- Every escalation justification ties back to the conversation transcript.
- Guardrail signals catch hallucinated escalation reasons during testing.
- Faithfulness scoring is deliberately conservative, so you catch drift before go-live.
What’s Inside
Production Prompt Template
Editable Markdown prompt that captures sentiment, priority, and escalation rationale. Plugs into OpenAI, Anthropic, or Bedrock.
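For illustration, here is a minimal sketch of wiring the template into OpenAI's chat API. The `{transcript}` placeholder and the model choice are assumptions for the sketch, not the shipped template's contract:

```python
# Hypothetical wiring sketch: the {transcript} placeholder is an
# assumption about the template, not a documented field.
from pathlib import Path
from openai import OpenAI

template = Path("prompts/templates/support/escalation.md").read_text()
prompt = template.replace(
    "{transcript}",
    "Customer: I was charged twice for my renewal and need this fixed today.",
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # sentiment, priority, rationale
```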
Gold Dataset
Sample transcripts with expected escalation outcomes so you can A/B test models and prompt tweaks quickly.
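The exact schema ships with the dataset; as a sketch, iterating the JSONL file with hypothetical field names might look like this:

```python
# Sketch only: "transcript" and "expected_escalation" are hypothetical
# field names standing in for the dataset's real schema.
import json
from pathlib import Path

with Path("examples/evals/escalation.jsonl").open() as f:
    for line in f:
        record = json.loads(line)
        print(record["transcript"][:60], "->", record["expected_escalation"])
```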
Automated Scoring
A CLI and a GitHub Actions workflow output JSON scorecards (`*.metrics.json`) for your analytics stack.
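One way to consume a scorecard is to gate releases on it. A sketch, assuming a hypothetical `faithfulness` key, threshold, and file path:

```python
# Sketch: block a release when faithfulness dips below a chosen bar.
# The metric key, threshold, and path are illustrative assumptions.
import json
import sys
from pathlib import Path

scorecard = json.loads(Path("reports/escalation/latest.metrics.json").read_text())
if scorecard.get("faithfulness", 0.0) < 0.85:
    sys.exit("faithfulness below 0.85; blocking release")
print("scorecard passed:", scorecard)
```

A check like this slots naturally into a nightly workflow as the step after scoring.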
Launch Playbook
Docs that cover integration, API key management, and monetization angles for agencies and freelancers.
Choose Your Package
Starter — $149
Prompt, dataset, runbook, and ready-to-run evaluation scripts.
Professional — $299
Starter + nightly GitHub workflow template + quarterly prompt refreshes.
Enterprise — $699
Professional + onboarding workshop, SLA-backed regression analysis, and white-label rights.
How It Works
- Clone the repository or download the release zip.
- Copy `.env.example` to `.env`, set `OPENAI_API_KEY`, and run a dry evaluation.
- Score with Ragas to generate the `.metrics.json` scorecard.
- Ship to your clients and reuse the harness for every future workflow.
```bash
# Dry-run the prompt against the gold dataset.
python scripts/evaluate_prompt.py prompts/templates/support/escalation.md \
  --dataset examples/evals/escalation.jsonl --dry-run

# Score the resulting report with Ragas to emit the scorecard.
python scripts/evaluators/ragas_runner.py reports/escalation/<timestamp>.json \
  --dataset examples/evals/escalation.jsonl
```
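Under the hood, the Ragas step is roughly equivalent to the sketch below. This is not the shipped `ragas_runner.py`; the rows are invented and the columns follow Ragas's classic evaluation schema:

```python
# Rough equivalent of the scoring step (needs OPENAI_API_KEY for the
# LLM judge). Rows here are invented examples, not the gold dataset.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

rows = {
    "question": ["Customer reports a double charge after a failed renewal."],
    "answer": ["Escalate: billing failure combined with churn risk."],
    "contexts": [["Agent transcript: customer was billed twice on 2024-03-01."]],
    "ground_truth": ["Escalate to billing tier 2."],
}

report = evaluate(Dataset.from_dict(rows), metrics=[faithfulness])
print(report)  # per-metric scores that feed the .metrics.json scorecard
```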
Trusted by High-Velocity Support Teams
Early buyers get lifetime updates; ask us about partner pricing.