Tests whether a model follows explicit formatting and constraint instructions. 12 cases covering format, length, exclusion, and style constraints.
Evaluate the model's ability to follow explicit instructions. For each test case, the model will receive an instruction with constraints.

Score based on:
- Did it follow the format exactly? (0-3)
- Did it respect length constraints? (0-2)
- Did it avoid excluded content? (0-2)
- Did it satisfy the style requirement? (0-3)

**Scoring rubric:**
- 9-10: Perfect instruction following
- 7-8: Minor violations (e.g. slightly over word limit)
- 4-6: Partial following (followed some but not all constraints)
- 0-3: Largely ignored the instructions
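
The four criteria have maximum points of 3, 2, 2, and 3, which add up to 10, the top of the rubric scale. A minimal sketch of how a grader might combine them, assuming the total is the plain sum of the four sub-scores (the text above does not state this explicitly). The names `MAX_POINTS`, `BANDS`, `total_score`, and `band` are hypothetical and used only for illustration.

```python
# Maximum points per criterion, taken from the scoring questions above.
# The dictionary keys are illustrative names, not part of the original spec.
MAX_POINTS = {"format": 3, "length": 2, "exclusion": 2, "style": 3}

# Rubric bands as (lowest total in band, label), ordered from highest to lowest.
BANDS = [
    (9, "Perfect instruction following"),
    (7, "Minor violations"),
    (4, "Partial following"),
    (0, "Largely ignored the instructions"),
]


def total_score(scores: dict[str, int]) -> int:
    """Validate the four sub-scores and return their sum (assumed aggregation)."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError(f"expected exactly these criteria: {sorted(MAX_POINTS)}")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


def band(total: int) -> str:
    """Map a 0-10 total onto the matching rubric band label."""
    if not 0 <= total <= 10:
        raise ValueError("total must be between 0 and 10")
    for floor, label in BANDS:
        if total >= floor:
            return label
    raise AssertionError("unreachable: the lowest band starts at 0")


# Example: correct format, slightly over the word limit, no excluded content,
# style mostly satisfied.
example = {"format": 3, "length": 1, "exclusion": 2, "style": 2}
print(total_score(example), band(total_score(example)))
```

Validating each sub-score against its own maximum catches a grader that, for example, awards 3 points for length, which would otherwise push a total into the wrong band unnoticed.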