@simon That is a superb use of LLMs. I've seen that a lot of text-classification tasks (that previously required expensive model training) can now be done rather cheaply using LLMs + engineered prompts. Cost and development velocity have both improved quite a bit with this new LLM-as-rater approach compared to the previous approach of custom model training.
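
To make the "LLM + engineered prompt" idea concrete, here is a rough sketch of what such a classifier tends to look like. The label set, the prompt wording, and the `call_llm` callable are all hypothetical placeholders (plug in whatever client you actually use); this is just one way to do it, not a reference implementation.

```python
from typing import Callable, Optional

# Hypothetical label set, purely for illustration.
LABELS = ("positive", "negative", "neutral")

# Hypothetical prompt; real prompts usually need iteration against examples.
PROMPT_TEMPLATE = (
    "Classify the text into exactly one of these labels: {labels}.\n"
    "Reply with the label only.\n\n"
    "Text: {text}\n"
    "Label:"
)


def build_prompt(text: str) -> str:
    """Fill the template with the allowed labels and the input text."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text)


def parse_label(reply: str) -> Optional[str]:
    """Normalize the model's reply; return None if it is not a known label."""
    cleaned = reply.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None


def classify(text: str, call_llm: Callable[[str], str]) -> Optional[str]:
    """call_llm is an assumed function: it takes a prompt and returns the reply text."""
    return parse_label(call_llm(build_prompt(text)))
```

Replies that don't match a known label come back as `None` rather than being guessed at, so they can be set aside for a person to look at.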

The next bottleneck is human evals, but I guess we can't completely remove them until LLMs stop making mistakes.