Discussion about this post

Keller Scholl

You're far too harsh:

1. On a fixed budget, I think a higher number of tasks, even with a lower number of participant-completions per task, was the right move. Yes, they've got very few completions, but even 1 would be informative, and 3 is in fact something. And I'd much rather have 3 people doing 100 tasks than 100 people doing 3 tasks.

2. 50% reliability seems reasonable if your plan is "set one of my 5-8 simultaneously running agents off to do something, check back with it in a few hours". And that seems to be the workflow of many AI users I know. So long as the task is checkable...

3. One of the benefits of recruiting from people in your social network, rather than randoms you're paying, is that they're not going to deliberately slow their work, and you have some evidence of competence.

Anatol Wegner, PhD

If anyone is interested, I did a critical analysis of the METR paper from a statistical point of view a while ago: https://aichats.substack.com/p/are-ai-time-horizon-doubling-every?r=4tn68o
