AI Policy & Governance, CDT AI Governance Lab
CDT Submits Comments on NIST’s Draft Guidance for Automated Benchmark Evaluations of Language Models
The Center for Democracy & Technology (CDT) submitted comments in response to a request for comment from the Center for AI Standards and Innovation (CAISI) at the National Institute of Standards and Technology (NIST) on its draft guidance on Practices for ...
CDT's comments highlight several areas where CAISI could strengthen the guidance in future iterations: framing evaluation development as an iterative process, integrating evaluation documentation into existing artifacts like model cards and system cards, addressing subjective evaluations, and providing more detailed guidance on managing the limitations of LLM-as-a-judge methods. The analysis suggests that by focusing on these areas, CAISI can further promote the design of assessmen...
