Recently, major breakthroughs have been made in natural language processing by using larger and deeper neural models built on a new, computationally efficient architecture: the transformer, with its attention mechanism. It replaced the previous, highly computation-intensive recurrent network types. The task of these networks is to predict the next token ("syllable") of a text input of fixed maximum length; iterating this step generates text of any length. The task itself amounts to a clever autocomplete, but it turned out that a sufficiently large and deep network, trained on this task with enough data and iterations, builds its own internal logic for solving it at a human level: a model of the relationships between tokens, the words they form, and the world they describe! In this way, these models have gained human-level understanding and communication skills.
Language model inputs
As we have already seen, for neural networks we always need to translate the problem into numbers, for a fixed maximum number of input neurons. In the present case, we first decompose the text into tokens: mostly short, frequently occurring words, and where necessary word fragments. The point is that tokens should be small enough to be individually descriptive, yet still capable of building up any text. They are designed to preserve the structure and content of the text while improving separation, comprehension and resource requirements.
Each token is then assigned a series of numbers (a vector) to embed the input text: the text is tokenised, then vectorised, and fed to the neural network as input. Many libraries are available to perform these steps, especially in Python. Different models and input problems require different tokenisation and embedding, but fortunately specifications and open-source conversion tools are available for almost all of them!
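As a toy illustration of these two steps (real systems use trained BPE tokenisers such as tiktoken or the Hugging Face tokenizers library; the tiny vocabulary and the random vectors below are made up purely for demonstration):

```python
import random

# A tiny, made-up vocabulary: frequent words plus word fragments as fallback.
vocab = ["the", "trans", "form", "er", "model", "s", " ", "read"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def tokenize(text: str) -> list[int]:
    """Greedy longest-match decomposition of text into vocabulary token IDs."""
    ids, i = [], 0
    while i < len(text):
        for tok in sorted(vocab, key=len, reverse=True):
            if text.startswith(tok, i):
                ids.append(token_to_id[tok])
                i += len(tok)
                break
        else:
            raise ValueError(f"untokenisable character: {text[i]!r}")
    return ids

# Each token ID indexes a row of a learned embedding matrix; random here.
random.seed(0)
dim = 4  # real models use hundreds or thousands of dimensions
embedding = [[random.random() for _ in range(dim)] for _ in vocab]

ids = tokenize("the transformer models read")  # note "transformer" = trans+form+er
vectors = [embedding[i] for i in ids]          # this is what the network sees
```

Note how "transformer" is not in the vocabulary, yet it can still be built from the fragments "trans", "form" and "er" — exactly the property that lets a fixed vocabulary cover any text.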
State of the art models
The best current models are closed source and available only from large companies. They cost millions of dollars in electricity alone to train, and they are not cheap to run either, as they require a supercomputer. Unfortunately, if you want the best AI available, you have to pay a per-use fee.
The most advanced AI model currently available is GPT-4. No information has been published about its architecture; we can only communicate with it through an API, essentially a "black box". GPT-3.5, which is free to use from the web browser as ChatGPT, has similar (albeit weaker) capabilities, but it is also paid when used through the API (to integrate with other applications). In exchange, the API gives access to many more developer features: on the one hand, you can customise the initial system prompt describing the "behaviour" and "personality" of the model, and on the other, you can change the generation parameters, e.g. how random the output may be (temperature) and how far sampling may reach beyond the most likely continuations to be "creative" (top_p, nucleus sampling).
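A sketch of what such an API request looks like, shaped after OpenAI's Chat Completions API (actually sending it would require an API key and an HTTP client or the official `openai` package, omitted here; the prompt contents are illustrative):

```python
import json

request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        # The system prompt sets the model's "behaviour" and "personality".
        {"role": "system", "content": "You are a terse technical assistant."},
        {"role": "user", "content": "Explain tokenisation in one sentence."},
    ],
    "temperature": 0.2,  # 0 = near-deterministic, higher = more random
    "top_p": 0.9,        # nucleus sampling: sample only from the smallest
                         # set of tokens whose probabilities sum to 0.9
}

body = json.dumps(request)  # the JSON body that goes over the wire
```

Lowering temperature and top_p makes the answers more repeatable and factual in tone; raising them makes the model more varied and "creative".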
Limitations and possible solutions
In practical use, the biggest problem with LLM-based AIs is that their responses are unstructured and can rely only on the memory of the base model, learned from openly available data, which, like human memory, is at times quite unreliable; the model then tends to "hallucinate" untruths. For safety reasons, they are mostly not taught recently created data, only material from before the AI "revolution". This both makes deliberate misinformation harder and avoids the feedback loop of training on AI-generated content. In addition, their active memory (maximum input token length) is limited, so they may not be able to take in all the source material (e.g. full books, documentation or source code) at once, and they may generate potentially offensive or dangerous content, which can be problematic for a company.
Reinforcement Learning from Human Feedback (RLHF)
Since LLMs are initially trained simply to continue free-form text, they would at first just continue the prompt as if it were an excerpt from a book or article, and tended to drift off topic. The knowledge, logic and intelligence were already there, since the continued text was coherent and meaningful, but this behaviour was not practical for real use. The model had to be taught to answer questions in a chatbot style and not to produce content perceived as harmful or offensive.
To solve this, researchers wrote questions and had humans answer them in a chatbot style. They then fine-tuned the network on these human responses until all its answers were written in that style. Almost all openly available LLMs have undergone this "conditioning".
However, it is worth bearing in mind that this is not a guaranteed solution! Just as the neural network itself is a "black box" with unknown internal logic, the effects of RLHF can only be verified by practical tests. Passing those tests does not mean the network really works according to the logic and worldview of the human example responses; only that it responds according to their rules! Fine-tuning mostly affects the last layers of the network, not the logic and conceptual relationships in the deeper layers. Following the path of least resistance, the model's real internal functioning may therefore not be "tuned" to the behaviour embodied by the RLHF examples: it may simply be easier to behave and respond in a semi-intentional, "pretended" way than to reorganise its entire functioning. There are indications of this, such as the success of various constraint-circumventing prompt "hacks", and the fact that overdone RLHF impairs the cognitive ability of the model (because it contradicts the model's "world picture" and internal logic).
Structured responses with function calls
As we have already seen, with RLHF we can make the network generate responses in a specific style! OpenAI has trained a GPT variant that writes its response in JSON, the standard data format of programming, with parameters that we predefine. We ask it a natural-language question, and it answers in JSON, which conventional programs can parse easily, filling in the predefined fields according to the question. For example, a structured query of a movie's data:
Create a function and design a response structure:

    {
      "description": "Get detailed movie data",
      "parameters": {
        "type": "object",
        "properties": {
          "release_date": {
            "type": "string",
            "description": "Release date of the movie in Day.Month.Year format"
          },
          "director": {
            "type": "string",
            "description": "Director of the movie"
          },
          "summary": {
            "type": "string",
            "description": "Provide a brief summary of the movie!"
          }
        },
        "required": ["release_date", "director", "summary"]
      }
    }

The user prompt:

    "Give me information about a movie called: Star Wars"

ChatGPT's response in JSON:

    {
      "release_date": "25.05.1977",
      "director": "George Lucas",
      "summary": "Star Wars is a space opera film series created by George Lucas. The first film, later subtitled Episode IV – A New Hope, was released on May 25, 1977. It became a worldwide pop culture phenomenon and has since been followed by multiple sequels, prequels, spin-offs, and animated films."
    }
From the resulting data, conventional code can then be run: at the prompt, the AI can decide whether a function needs to be called, and that function remains entirely in our hands (e.g. an internal database search); we then return its results to the model, which formulates them into a coherent text response. This not only eliminates the ambiguity of the answer, but also increases accuracy, since real data is inserted into the AI-generated answer!
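The round trip can be sketched as follows. The model's reply is mocked here; in real use it would come from the chat API, and `get_movie_data` (an illustrative name) would query our own database:

```python
import json

def get_movie_data(title: str) -> dict:
    """Our own code, e.g. an internal database lookup -- never leaves our hands."""
    db = {"Star Wars": {"release_date": "25.05.1977", "director": "George Lucas"}}
    return db.get(title, {})

# 1. Given the user prompt and the function schema, the model decides to
#    call the function, returning the arguments as JSON (mocked here):
model_reply = {"function": "get_movie_data",
               "arguments": json.dumps({"title": "Star Wars"})}

# 2. We run the requested function ourselves:
if model_reply["function"] == "get_movie_data":
    args = json.loads(model_reply["arguments"])
    result = get_movie_data(**args)

# 3. The result is sent back to the model as a new message, and the model
#    formulates it into a coherent text answer for the user.
follow_up = {"role": "function", "name": "get_movie_data",
             "content": json.dumps(result)}
```

The key point is step 2: the AI only requests the call; the actual code, data access and permissions stay under our control.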
Learning from your own data with fine tuning
At OpenAI we have the option to upload our own documents and generate a custom fine-tuned GPT model with specific knowledge! Training happens relatively quickly on their supercomputers, and the resulting model has a reasonably good knowledge of the documents. The custom GPT can be called through the API in the same way as the base GPT, and can also be used for function calls! The technique is relatively costly, but it can process almost any amount of data, with no need to fit within the maximum prompt size. A disadvantage is that, unfortunately, the fine-tuned model tends to hallucinate in just the same way.
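A sketch of preparing such training data, in the JSONL chat format used by OpenAI's fine-tuning API (one training conversation per line; the example content is made up, and the upload/job-creation API calls are omitted):

```python
import json

# Each training example is a complete conversation showing the desired answer.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer from the company handbook."},
        {"role": "user", "content": "How many vacation days do we get?"},
        {"role": "assistant", "content": "Employees get 25 vacation days per year."},
    ]},
    {"messages": [
        {"role": "system", "content": "You answer from the company handbook."},
        {"role": "user", "content": "Who approves travel expenses?"},
        {"role": "assistant", "content": "Travel expenses are approved by your team lead."},
    ]},
]

# JSONL: one JSON object per line -- this is what gets uploaded for training.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

After upload, a fine-tuning job is started against a base model, and the resulting custom model is then addressed by name in the same chat API calls as before.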
An alternative approach uses the general intelligence of the model in a zero-shot way to get around the maximum input token limit without fine-tuning! The LangChain procedure splits the text into snippets of at most the maximum prompt length, then runs through them one by one, extracting the gist relevant to the original user prompt and generating a meaningful response from it.
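The split-and-scan idea can be sketched in plain Python (this illustrates the principle rather than LangChain's actual API; `ask_llm` is a placeholder stand-in for a real model call):

```python
def split_into_chunks(text: str, max_len: int) -> list[str]:
    """Split text into pieces that fit within the model's prompt limit."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return prompt[:50]  # stand-in so the sketch runs without a model

def answer_over_long_text(question: str, document: str, max_len: int) -> str:
    # Map step: extract the gist relevant to the question from each chunk...
    notes = [ask_llm(f"Q: {question}\nExtract the relevant facts:\n{chunk}")
             for chunk in split_into_chunks(document, max_len)]
    # Reduce step: ...then synthesise one answer from the collected notes.
    return ask_llm(f"Q: {question}\nAnswer from these notes:\n" + "\n".join(notes))
```

Real implementations split on sentence or paragraph boundaries rather than raw character counts, but the shape — chunk, extract, combine — is the same.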
The advantage is that no training is required, it is less prone to hallucination since it works from "fresh" data, and the underlying neural model can easily be swapped out. The disadvantage is that if the information you are looking for is fragmented, or depends heavily on context, it may not be found. In addition, the model needs to be run (inferenced) many more times to see all of the original text, and this must be repeated at every user prompt; whether it pays off in the long run is an optimisation trade-off between fine-tuning and this approach.
Of course, a fine-tuned LLM can also be combined with the LangChain technique for maximum efficiency! The future is likely to go in this direction: the AI's memory and the correctness of its responses can be checked against traditional databases and search engines. The AI could even be made to reconsider its position based on its own answer and the search results, and so on, until it is confident in its correctness. This is one of the main drivers of the search giants' interest in AI! It is no longer about finding the information; it is about extracting the knowledge!
Locally executable, open source developments
As we have seen, the field of LLM-based AI is still very much under research! Interestingly, most of the developments come not from closed companies but from open-source work. Almost all research results are in the public domain; because of the resource requirements, the biggest challenge is not the mechanism of operation but the practical training and data acquisition. Closed developments are left behind relatively quickly because they are not accessible to researchers from the rest of the world, which forces all but the biggest players to "play with an open hand". As a result, there are many open-source LLMs today that can be run and trained on your own machine and used for any purpose! The best known of these is Meta's LLaMA 2, whose largest version contains 70 billion parameters and the smallest 7 billion; for reference, the model behind ChatGPT has 175 billion. They are surprisingly close in capability; the difference shows more in factual knowledge. Even the 7B model has usable intellectual capabilities, and the 13B model can already compete with humans in some areas!
Recent research trends show that increasing the size of a model improves accuracy with diminishing returns, and OpenAI has decided not to increase the size further (it would be too expensive to run), but to get the most out of existing models with new training methods and better data quality. This is great news for us, as it means locally run AI models remain a valid option for the future: capability scales quasi-logarithmically with size, so small models are not much "dumber" than big ones. They can be used to create special expert models, and can be fine-tuned for other tasks!
Practical use of large language models
We should see them as interactive knowledge bases! Programming can essentially be divided into three major parts: getting, processing and displaying data. For processing, LLMs are guaranteed to be usable: they can translate, infer, answer questions based on text, and follow instructions. They can synthesise new data based on a template, or understand user preferences and make smart offers. They can simplify communication with users, help personalise content, or take part in the intelligent control of machines and processes, driven by narration or description! They can write simple programs, or create and review documentation. In every area where we work with data, they can be used for something: AI can build on existing prior knowledge and be involved in almost any text- or number-based processing, even in health data analysis, e.g. diagnosing ECG signals!
They can even be used for forecasting based on numerical (time series) values.
The figure below shows a GPT-generated zero-shot (no task-specific training) time series forecast. The green curve is the history given to the model and the red is the extrapolated part. You can clearly see how the AI took into account the increasing trend and the structure of the periods!
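The trick behind such forecasts is that the numeric series is serialised as plain text, the model continues the string, and the continuation is parsed back into numbers. A minimal sketch of that encoding (the exact formatting varies between published approaches, and the "completion" below is illustrative, not a real model output):

```python
def series_to_prompt(values: list[float]) -> str:
    """Serialise a time series as fixed-precision, comma-separated text,
    which keeps the token structure regular for the model."""
    return ", ".join(f"{v:.1f}" for v in values)

def parse_forecast(completion: str) -> list[float]:
    """Parse the model's textual continuation back into numbers."""
    return [float(x) for x in completion.split(",") if x.strip()]

history = [1.0, 2.1, 2.9, 4.2, 5.0]
prompt = series_to_prompt(history)   # "1.0, 2.1, 2.9, 4.2, 5.0"

# The model is asked to continue the string; a hypothetical completion:
completion = "6.1, 6.9"
forecast = parse_forecast(completion)
```

The model never "sees" numbers as such, only their token patterns, yet it can pick up trends and periodicity from the textual structure.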
Large language models (LLMs) have revolutionised the field of natural language processing, allowing us to create interactive knowledge bases and many other applications. They can decompose and process text, generate responses, and even produce structured responses through function calls. Reinforcement learning from human feedback and fine-tuning allow us to get more accurate and sophisticated responses from them.
However, it is important to note that these models also have limitations. Their knowledge is derived only from their training data, and in some cases they are prone to "hallucinating" untruths. For safety reasons they are not taught current, up-to-date information, and their active memory is limited. Companies that deploy these AIs in industry need to ensure that the responses are reliable and safe.
Open source developments and locally executable models give the community the opportunity to further develop and tailor LLMs to their own needs. In the future, the focus will be on improving data quality and learning methods rather than further increasing the size of models, which means that locally executable AI models will remain viable solutions.
We need to treat LLMs as interactive knowledge bases that help us to work with data, communicate with machines and use them in many other ways. The development and application of AI is in its early stages and could bring exciting opportunities in the future.
Your Arteries colleagues will be happy to consult with you on how you can apply large language models in a unique way to your business, product or service. The possibilities are almost limitless.
If you need a specialist, we’re here.