CDT Submits Comments on NIST’s Draft Guidance for Automated Benchmark Evaluations of Language Models

The Center for Democracy & Technology (CDT) submitted comments in response to a request for comment from the Center for AI Standards and Innovation (CAISI) at the National Institute of Standards and Technology (NIST) on its draft guidance on Practices for ...
CDT's comments highlight several areas where CAISI could strengthen the guidance in future iterations. These include framing evaluation development as an iterative process, integrating evaluation documentation into existing artifacts such as model cards and system cards, addressing subjective evaluations, and providing more detailed guidance on managing the limitations of LLM-as-a-judge methods. The analysis suggests that by focusing on these areas, CAISI can further promote the design of assessmen...