Differences Between Probing a Probabilistic Model for Knowledge (ChatGPT) vs Querying a Curated Knowledge Graph
Stephen Wolfram has written a wonderful essay about the differences between probing a probabilistic model for knowledge (ChatGPT) and querying a curated knowledge graph (Wolfram Alpha).
If ChatGPT could be trained to generate Wolfram Alpha queries for the relevant instructions, the two systems could work in a complementary manner.
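A minimal sketch of what that complementary pairing could look like: the language model decides when a question calls for exact computation and, if so, emits a query for the curated engine; otherwise it answers free-form. Every function here is a hypothetical stand-in, not a real API.

```python
def needs_computation(question: str) -> bool:
    """Crude stand-in for the model's routing decision (a real system
    would let the language model itself make this call)."""
    computational_cues = ("integrate", "solve", "derivative", "distance", "population")
    return any(cue in question.lower() for cue in computational_cues)


def generate_engine_query(question: str) -> str:
    """Stand-in for the model translating a question into an engine query."""
    return question.strip().rstrip("?")


def answer(question: str) -> str:
    if needs_computation(question):
        query = generate_engine_query(question)
        # In a real pipeline this is where a call to Wolfram Alpha would go.
        return f"[curated engine] {query}"
    # Otherwise, fall back to free-form language-model generation.
    return f"[language model] {question}"


print(answer("Integrate x^2 from 0 to 1?"))
print(answer("Write a limerick about geese."))
```

The keyword router is of course a toy; the point is only the shape of the hand-off between the probabilistic and the curated component.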
AI researchers love to train tasks in an end-to-end manner: there is an input, there is a desired output, and the model learns to produce the right output as it sees more examples. The model's intelligence becomes spread across the billions of parameters that learn this task. One may argue that this is more human-like.
But we humans have learnt abstractions, formulae, chords, and legal precedents so that we can break a problem down into parts and solve it piece by piece. We picture our solution as a sum of the parts, and when someone asks us to explain, we can break it down for them.
On the other hand, an AI language model like GPT-3 or Flan-T5 is first trained on all the text on the internet, and then further trained end-to-end on a couple of thousand datasets to produce a meaningful output given an input (instruction). I believe breaking these instructions down into a pipeline of conceptual tasks, with each task relying on a model, an information lookup from a dataset, API code, or self-generated code, would be a more modular way of training, and it could let models with fewer parameters perform as intelligently as GPT-3.
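The modular idea above can be sketched as a tiny task pipeline: each named sub-task is handled by a small component (here a dataset lookup and a computation standing in for self-generated code), and an instruction is executed piece by piece rather than by one end-to-end network. All components are illustrative toys.

```python
from typing import Callable


def lookup_capital(country: str) -> str:
    # Stand-in for "information lookup from a dataset".
    table = {"France": "Paris", "Japan": "Tokyo"}
    return table[country]


def compute_length(text: str) -> int:
    # Stand-in for "self-generated code" executing a computation.
    return len(text)


# Registry mapping conceptual task names to their handlers.
PIPELINE: dict[str, Callable] = {
    "lookup": lookup_capital,
    "compute": compute_length,
}

def run(steps: list[tuple[str, str]]) -> list:
    """Run a sequence of (task, argument) steps, one component at a time."""
    return [PIPELINE[task](arg) for task, arg in steps]


print(run([("lookup", "France"), ("compute", "Paris")]))  # → ['Paris', 5]
```

Because each component is small and inspectable, the system can "break it down for them" when asked to explain, much like the human problem-solving described earlier.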
In conclusion, this is similar to training a model to create new music simply by sampling audio, as opposed to teaching a model about notes, chords, and chord progressions. Wouldn't it be more efficient to teach the theory first and then let it generate from that knowledge? This is essentially what Bayesian learning is: the model starts with priors based on a model of the world and uses the data to update those priors. Hopefully someday soon we will see hybrid models with an emphasis on concept learning.
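The prior-then-data idea has a classic worked example: estimating a coin's bias with a Beta prior. Because the Beta distribution is conjugate to the Binomial, updating the prior with observed flips is just counting, which makes the "start with a world model, refine with data" loop concrete.

```python
def update_beta(a: float, b: float, heads: int, tails: int) -> tuple[float, float]:
    """Posterior Beta(a', b') after observing the given flips:
    the conjugate update simply adds the counts to the prior parameters."""
    return a + heads, b + tails


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution, our point estimate of the bias."""
    return a / (a + b)


# Prior belief: the coin is probably fair -> Beta(10, 10), mean 0.5.
a, b = 10.0, 10.0
# Observe 18 heads and 2 tails; the data pulls the estimate off the prior.
a, b = update_beta(a, b, heads=18, tails=2)
print(beta_mean(a, b))  # (10 + 18) / (10 + 18 + 10 + 2) = 0.7
```

The prior keeps the estimate from swinging to the raw frequency of 0.9 after only 20 flips; with more data, the data term dominates, which is exactly the learning dynamic the paragraph describes.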