
LLM-as-Judge Rubric Generator for Support-Bot QA

Techniques
#evaluation #llm-judge #support

Creates a weighted scoring rubric and judge prompt to evaluate customer-support chatbot replies at scale.

ROLE: You are an evaluation engineer who designs reliable LLM-as-judge rubrics for support automation.

CONTEXT: We must grade replies from a support bot handling [TOPIC_DOMAIN] for product [PRODUCT_NAME]. Each graded item includes the user question, the bot answer, and optional retrieved [KNOWLEDGE_SNIPPETS].

TASK:
1. Define 5 scoring dimensions (for example accuracy, grounding, completeness, tone, safety), each scored 1 to 5, with explicit anchor descriptions for scores 1, 3, and 5.
2. Assign weights that sum to 100 and justify each in one line.
3. Write the full judge prompt that ingests a reply and emits per-dimension scores plus a single overall score.
4. Add 2 calibration examples: one strong, one failing, each with expected scores.

CONSTRAINTS: The judge must cite the snippet used or mark 'ungrounded'. Scores must be integers. No hidden criteria beyond the stated rubric. The judge must penalize confident wrong answers more than an honest 'I don't know'.

OUTPUT FORMAT: RUBRIC table, WEIGHTS list, JUDGE PROMPT code block, CALIBRATION examples in JSON.
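The prompt asks for weights that sum to 100, integer per-dimension scores, and a single overall score, but it does not say how the overall score is aggregated. The sketch below shows one possible way to check a judge verdict and combine its scores. It assumes a weighted mean rounded half up to an integer. The weights, the field names `scores` and `citation`, and the two sample verdicts are hypothetical and chosen only for illustration.

```python
# Hypothetical weights; the prompt asks the model to choose and justify its own.
WEIGHTS = {"accuracy": 35, "grounding": 25, "completeness": 15, "tone": 10, "safety": 15}


def validate_verdict(verdict: dict, weights: dict) -> None:
    """Raise ValueError if the verdict breaks the constraints stated in the prompt."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    scores = verdict["scores"]
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be an integer from 1 to 5")
    # Assumed convention: 'citation' holds a snippet id or the literal 'ungrounded'.
    if not verdict.get("citation"):
        raise ValueError("judge must cite a snippet or mark 'ungrounded'")


def overall_score(verdict: dict, weights: dict = WEIGHTS) -> int:
    """Weighted mean of the dimension scores, rounded half up to an integer 1..5."""
    validate_verdict(verdict, weights)
    weighted = sum(weights[d] * verdict["scores"][d] for d in weights)  # range 100..500
    return (weighted + 50) // 100


# Hypothetical calibration-style verdicts: one strong, one failing.
strong = {
    "scores": {"accuracy": 5, "grounding": 5, "completeness": 4, "tone": 5, "safety": 5},
    "citation": "snippet-2",
}
failing = {
    "scores": {"accuracy": 1, "grounding": 1, "completeness": 2, "tone": 4, "safety": 3},
    "citation": "ungrounded",
}

print(overall_score(strong))   # 5
print(overall_score(failing))  # 2
```

Keeping the aggregation in code rather than asking the judge model to do the arithmetic is a design choice, not something the prompt requires. It makes the overall score reproducible and lets malformed verdicts (non-integer scores, missing citation) fail loudly.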