I've been a computer scientist in the field of AI for almost 15 years. Much of my time has been devoted to classical AI; things like planning, reasoning, clustering, induction, logic, etc. This...
I've been a computer scientist in the field of AI for almost 15 years. Much of my time has been devoted to classical AI; things like planning, reasoning, clustering, induction, logic, etc. This has included (but had rarely been my focus) machine learning tasks (lots of Case-Based Reasoning). For whatever reason though, the deep learning trend never really interested me until recently. It really just felt like they were claiming huge AI advancements when all they really found was an impressive way to store learned data (I know this is an understatement).
Over time my opinion on that has changed slightly, and I have been blown away with the boom that is happening with transformers (GPTs specifically) and large language models. Open source projects are creating models comparable to OpenAIs behemoths with far less training and parameters which is making me take another look into GPTs.
What I find surprising though is that they seem to have only experimented with language. As far as I understand the inputs/outputs, the language is tokenized into bytes before prediction anyway. Why does it seem like (or rather the community act like) the technology can only be used for LLMs?
For example, what about a planning domain? You can specify actions in a domain in such a manner that tokenization would be trivial, and have far fewer tokens then raw text. Similarly you could generate a near infinite amount of training data if you wanted via other planning algorithms or simulations. Is there some obvious flaw I'm not seeing? Other examples might include behavior and/or state prediction.
I'm not saying that out of the box a standard GPT architecture is a guaranteed success for plan learning/planning... But it seems like it should be viable and no one is trying?