This is an unusually long post, but in my humble opinion also quite an interesting one.
Given the extraordinary success of large language models (LLMs), such as ChatGPT, how should their capabilities be evaluated? Will LLMs eventually replace humans in their jobs, or will they primarily serve as productivity tools, similar to