I believe everyone should run their own evals on their own tasks or use cases.

Shameless plug, but I made a simple app for anyone to create their own evals locally:

https://eval.16x.engineer/
