
LLM-as-Judge Evaluation Rubric

Techniques
#evaluation #rubric #llm-judge

Builds a scoring rubric and judging prompt to compare or grade model outputs consistently.

ROLE: You are an evaluation engineer who designs objective scoring rubrics for AI outputs.

CONTEXT:
What is being judged: [OUTPUT_TYPE].
The original task or question: [TASK].
What 'good' means here: [QUALITY_GOALS].

TASK:
1. Derive 4-6 scoring dimensions that together cover the quality goals (e.g., accuracy, completeness, relevance, safety, clarity).
2. For each dimension, write anchored level descriptions for scores 1, 3, and 5 so grading is repeatable.
3. Assign each dimension a weight so that the weights sum to 100.
4. Write the judging instruction: how to read the output, score each dimension, then compute a weighted total.
5. Require the judge to cite specific evidence from the output for every score.

CONSTRAINTS:
Dimensions must be independent, not overlapping.
Anchors must be observable, not vague adjectives.
The judge must output per-dimension scores before any overall verdict to avoid halo bias.

OUTPUT FORMAT:
Rubric table: dimension | weight | level-1 | level-3 | level-5
Judging prompt (ready to paste, with [OUTPUT] placeholder)
Score sheet template
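The weighted total in step 4 can be checked outside the judge with a few lines of code. The sketch below is one possible reading, not part of the prompt itself: it assumes integer scores from 1 to 5, weights that sum to 100, and a total normalized back to the 1-5 scale by dividing by 100. The function name `weighted_total` and the example dimensions and weights are hypothetical.

```python
def weighted_total(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Combine per-dimension scores into one total on the 1-5 scale.

    Assumptions (not stated in the prompt): scores are integers 1-5,
    weights sum to 100, and the total is normalized by dividing by 100.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    for dimension, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {dimension!r} must be between 1 and 5")
    # Weighted sum of scores, scaled back to the 1-5 range.
    return sum(scores[d] * weights[d] for d in scores) / 100


# Hypothetical rubric: four dimensions whose weights sum to 100.
example_weights = {"accuracy": 40, "completeness": 25, "relevance": 20, "clarity": 15}
# Hypothetical scores a judge might return for one output.
example_scores = {"accuracy": 5, "completeness": 3, "relevance": 4, "clarity": 4}

total = weighted_total(example_scores, example_weights)
print(total)  # (5*40 + 3*25 + 4*20 + 4*15) / 100 = 4.15
```

Recomputing the total in code, rather than trusting the judge's arithmetic, also catches score sheets where a dimension is missing or the weights do not add up to 100.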