Saturday, August 10, 2024

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens into the sequence that its training data suggests is most likely to follow from the query.
https://techxplore.com/news/2024-08-inbred-gibberish-mad-ai.html
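
To make the token idea a little more concrete, here is a minimal Python sketch. It is nothing like the real thing: ChatGPT uses subword tokenisers and billions of learned parameters, whereas this toy uses whitespace splitting and simple bigram counts, and the names in it (training_text, complete and so on) are invented purely for illustration. It does, however, show the same basic move of breaking text into tokens and then extending a sequence with whatever the training data says is most likely to come next.

from collections import Counter, defaultdict

# Toy "training data", standing in for the scraped text a real model learns from.
training_text = "the cat sat on the mat and the cat slept on the mat"

# Tokenise: real LLMs use subword tokenisers (e.g. byte-pair encoding),
# but whitespace splitting is enough to illustrate the idea of tokens.
tokens = training_text.split()

# Count which token most often follows each token (a bigram model), a
# drastically simplified stand-in for the statistics an LLM learns in training.
follows = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def most_likely_next(token):
    # Return whichever token the counts say is most likely to come next.
    candidates = follows.get(token)
    return candidates.most_common(1)[0][0] if candidates else None

def complete(prompt_token, length=5):
    # Assemble a reply by repeatedly appending the most likely next token,
    # loosely mirroring how a model extends a sequence one token at a time.
    out = [prompt_token]
    for _ in range(length):
        nxt = most_likely_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(complete("the"))  # prints "the cat sat on the cat"

Run on that toy training text it prints "the cat sat on the cat", which also hints at how quickly purely statistical continuation can wander into nonsense.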

This sounds remarkably like the behaviour of some quiz masters, who type an enquiry into a search engine and look no further than the listed results, which contain the terms used in the query but don't necessarily confirm the answer.

Another example is the "lions at Battle Abbey" query, quoted in https://www.academia.edu/116629864/Artificial_Intelligence_the_Way_Forward

And I am sure there are thousands, if not millions, more.