Rain

Modeling Human Beliefs about AI Behavior for Scalable Oversight

Contemporary work in AI alignment often relies on human feedback to teach AI systems human preferences and values. Yet as AI systems grow more capable, human feedback becomes increasingly unreliable. This raises the problem of scalable oversight: How can we supervise AI systems that exceed human cap…
