Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
arXiv:2502.06666v1 Announce Type: new
Abstract: Current Large Language Model (LLM) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the need for human labor. Close-ended measurements evaluate the factuality of responses but lack expressiveness. Open-ended cap…
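
For context, close-ended QA evaluation typically reduces to exact-match scoring over a fixed option set, which is what makes it cheap to automate but limited in expressiveness. The sketch below is a minimal, hypothetical Python illustration of such a scorer; it is not taken from the paper, and the function name and example items are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's method): a close-ended evaluation
# loop that scores multiple-choice answers by exact match on option labels.

def close_ended_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Return the fraction of predictions that exactly match the gold label."""
    assert len(predictions) == len(gold), "prediction/gold lists must align"
    if not gold:
        return 0.0
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)

# Hypothetical example: model-selected options for three MCQ items.
model_choices = ["B", "c", "A"]
gold_choices = ["B", "C", "D"]
print(f"close-ended accuracy: {close_ended_accuracy(model_choices, gold_choices):.2f}")
```

Such a scorer captures factual correctness on fixed-choice items but says nothing about the quality of free-text responses, which is the gap open-ended evaluation aims to address.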