But they do make sure to spend a page and a half talking about how they vewwy carefuwwy tested to make sure that it doesn't have "emergent properties" that would let it "create and act on long-term plans" (sec 2.9).
I also lol'ed at "GPT-4 was evaluated on a variety of exams originally designed for humans": they seem to think this is a point of pride, but it's actually a scientific failure. No one has established the construct validity of these "exams" vis-à-vis language models.
For more on missing construct validity and how it undermines claims of 'general' 'AI' capabilities, see:
https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/084b6fbb10729ed4da8c3d3f5a3ae7c9-Abstract-round2.html