TechTalk: LLMs and ChatGPT Overview


Agenda:

  • LLMs
  • ChatGPT
  • OpenAI API
  • Example use case

LLMs:

What are LLMs?

LLMs, or Large Language Models, are advanced AI systems capable of processing and understanding human language on a vast scale.

  • These models utilize complex algorithms and neural networks to analyze and generate natural language.
  • Trained on massive amounts of text data.
  • Used for a wide range of applications, including language translation, chatbots, content generation, and much more.

Famous LLMs: GPT-3, BERT, XLNet, RoBERTa, T5


Interesting algorithms used in LLMs

Word embedding:
Word embedding is a type of algorithm used in natural language processing (NLP) that maps words or phrases to vectors of numerical values. The idea is to create a dense representation of words in a high-dimensional space, where each dimension captures some aspect of the word's meaning or context.

The most commonly used word embedding algorithm is called Word2Vec, which uses a neural network to learn word embeddings from large amounts of text data. The resulting embeddings can be used for a variety of NLP tasks, such as text classification, sentiment analysis, and machine translation.

Word embeddings have become a popular technique in NLP because they can capture complex relationships between words and enable machines to process natural language more effectively. By representing words as vectors, we can perform operations on them such as finding nearest neighbors, computing similarities, and even performing analogies (e.g. "king" - "man" + "woman" = "queen").
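As a toy illustration, here is a minimal sketch of training Word2Vec with the gensim library (the corpus, vector size, and epoch count are purely illustrative; the famous analogy only resolves reliably when training on large real-world corpora):

```python
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# Train a tiny Word2Vec model; words used in similar contexts
# end up with similar vectors.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

# Every word is now a dense 50-dimensional vector.
king_vec = model.wv["king"]
print(king_vec.shape)  # (50,)

# Vector arithmetic: king - man + woman -> nearest neighbor.
# On a real corpus this tends to return "queen".
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```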

Attention:

The attention mechanism is a technique used in neural network architectures, particularly in LLMs, to improve the model's ability to process long input sequences. The idea behind attention is to allow the model to focus on the most relevant parts of the input sequence at each step of the computation.

In LLMs, the attention mechanism works by associating a weight with each input token, indicating how much attention the model should pay to that token when computing the output at a particular step. These weights are typically learned by the model during training, based on the relevance of each token to the current context.

During the computation, the attention mechanism combines the input tokens with their corresponding weights to produce a weighted sum, which is used to inform the model's output at that step. This allows the model to give more weight to the parts of the input sequence that are most relevant to the current context, and to ignore the parts that are less relevant.
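To make this concrete, here is a minimal numpy sketch of scaled dot-product attention, the form of attention used in Transformer-based LLMs (the random token matrix below stands in for learned query/key/value projections):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every token to every other token
    weights = softmax(scores, axis=-1)   # attention weights; each row sums to 1
    return weights @ V, weights          # weighted sum of the input tokens

# 4 input tokens, each an 8-dimensional embedding.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Self-attention: queries, keys, and values all come from the same sequence.
output, weights = attention(tokens, tokens, tokens)
print(weights.round(2))  # each row shows how much one token attends to the others
```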

Transformers:

The key innovation of Transformers is their ability to process input sequences in parallel, rather than sequentially. This is achieved using a self-attention mechanism, which allows the model to attend to different parts of the input sequence at different times, depending on their relevance to the current context.

The self-attention mechanism works by computing a set of attention scores between each pair of input tokens. These scores are then used to compute a weighted sum of the input tokens, which is used to inform the model's output at each step. This allows the model to capture long-range dependencies and focus on the most relevant parts of the input sequence at each step.

In addition to their parallel processing capabilities, Transformers are also known for their ability to handle variable-length input sequences, which makes them well-suited for a variety of NLP tasks. Transformers have achieved state-of-the-art results on a variety of benchmark datasets, including language translation, question answering, and text classification.

There are 2 main parts to a transformer:

  1. Encoder: its purpose is to encode the input into a representation for the decoder to consume.
  2. Decoder: its purpose is to generate the output text sequence. At each step it takes the whole encoded input plus all the output it has already produced.
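Here is a minimal sketch of this encoder/decoder split using PyTorch's built-in nn.Transformer module (the layer counts, dimensions, and random tensors are purely illustrative):

```python
import torch
import torch.nn as nn

# A small encoder-decoder Transformer.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

src = torch.rand(10, 1, 64)  # encoder input: 10 source tokens (already embedded)
tgt = torch.rand(7, 1, 64)   # decoder input: the 7 tokens generated so far

# The encoder encodes the whole input; the decoder attends to that encoding
# and to its own previous outputs to produce the next-step representations.
out = model(src, tgt)
print(out.shape)  # torch.Size([7, 1, 64]): one vector per target position
```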


A simple illustration of how a transformer works (source: YouTube).

ChatGPT

  1. Generative pre-training is (at a high level) the process of mapping the training data into a vector space. ChatGPT was trained on ~45TB of text - roughly the whole text of the "visible" internet.
  2. Supervised fine-tuning: OpenAI labelers played both sides of a conversation (the user and the chatbot) and trained the model on what responses should look like.
  3. Reinforcement learning from human feedback: OpenAI engineers set a target function for the model and ranked its outputs. The model's goal is to maximize the target function, so it "learns" to prefer the style of highly ranked responses.

Strengths

  • Answering questions
  • Generating text
  • Chatting
  • Providing personalized recommendations
  • Generating ideas

Weaknesses

Lack of real-world knowledge
ChatGPT is based on large amounts of text data, which means that it may not have access to the same real-world knowledge and experiences as a human conversational partner.

Arithmetic Operations
ChatGPT generates human-like text based on complex algorithms and large amounts of training data. Math is a field that requires understanding and application of mathematical concepts. While a language model may generate text that sounds like it understands math, it cannot actually do math, just as a chef skilled with a knife is not qualified to perform surgery.

Wrong Answers/difficulty handling specific or niche topics
ChatGPT may struggle to generate accurate or relevant responses to highly specialized or niche topics, especially if those topics are not well-represented in the model's training data. This can limit the usefulness of the model in certain scenarios.

Risk of biased or offensive responses
Because ChatGPT is trained on large amounts of text data, there is a risk that it may generate biased or offensive responses based on the biases and prejudices that exist in that data.

At the time of writing this blog post, OpenAI has added more restrictions on the model's output. Examples can still be found around the internet.

Tendency to generate generic responses
Because ChatGPT is trained on a large corpus of text data, it may sometimes generate generic or clichéd responses that do not feel authentic or natural to the conversation. This can make it difficult for the model to engage in deeper or more nuanced discussions.

Inability to reason or understand complex logic
While ChatGPT is capable of generating coherent and meaningful text, it does not have the same level of reasoning or logical understanding as a human conversational partner. This can make it difficult for the model to engage in complex or nuanced discussions that require logical reasoning or critical thinking.

OpenAI API

Prompts and completions:

To get better results, you can ask ChatGPT to take on an expert persona, such as a social media specialist, a software developer, or a technical writer; this tunes the output for your needs. You can also ask it to include data sources, links, or resources, or to write in a specific style, such as serious or humorous.
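For example, here is a minimal sketch using the openai Python package's chat endpoint (0.x-style API; it assumes the OPENAI_API_KEY environment variable is set, and the persona and question are just examples):

```python
import openai  # pip install openai; reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model behind ChatGPT
    messages=[
        # The system message sets the expert persona and style.
        {"role": "system",
         "content": "You are a technical writer. Answer in a serious, "
                    "concise style and mention relevant resources."},
        {"role": "user", "content": "Explain what tokens are in LLMs."},
    ],
)

print(response.choices[0].message.content)
```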

Tokens:

“Our models understand and process text by breaking it down into tokens. Tokens can be words or just chunks of characters. For example, the word “hamburger” gets broken up into the tokens “ham”, “bur” and “ger”, while a short and common word like “pear” is a single token. Many tokens start with a whitespace, for example “ hello” and “ bye”.”

Check out the OpenAI tokenizer tool
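You can also inspect tokenization locally with OpenAI's tiktoken library. A small sketch (the choice of the cl100k_base encoding is an assumption for illustration; exact splits differ between encodings):

```python
import tiktoken  # pip install tiktoken

# Load one of OpenAI's tokenizer encodings.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["hamburger", "pear", " hello"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]  # decode each token separately
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```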

Why Tokens and not words?

  1. Space/search optimization.
    Words can share tokens, so on a large enough dataset this representation saves space. In addition, you can build a prefix tree over the tokens to search for/build words faster.
  2. Understanding new words.
    "Your code is over-pythonized" - tokenization can help the model understand new words/concepts by composing them from familiar sub-word tokens.
  3. Ignoring typos.
    Tokenization helps the model handle typos, since a misspelled word breaks into an unusual token sequence with low probability.

API Parameters:

model: This parameter specifies which pre-trained model to use for generating the text. There are several models available, including davinci (the most capable and versatile), curie (faster and less expensive than Davinci), babbage (faster still, suited to straightforward tasks), and ada (the fastest and least expensive, for simple tasks like parsing and classification). The choice of model will depend on the specific use case and the balance between speed, cost, and accuracy.

prompt: This is the starting text that the AI should use to generate new text. It can be a sentence or a longer piece of text that provides context for the generated output.

temperature: This parameter controls the "creativity" of the AI's output. A higher temperature leads to more diverse and unpredictable output, while a lower temperature leads to more focused and predictable output. The default value is 1, and it can be adjusted between 0 and 2.

max_tokens: This sets the maximum number of tokens (words or sub-word chunks) that the AI should generate in response to the prompt. The default value is 16, and the prompt plus max_tokens cannot exceed the model's context length (4096 tokens for the newest models, 2048 for older ones).

stop: This parameter allows you to specify one or more stop sequences that cause the AI to stop generating text. For example, with stop=["Thank you", "Best regards"] generation halts as soon as one of those phrases would appear (the stop sequence itself is not included in the output).

n: This parameter controls the number of completions the AI should generate for each prompt. The default value is 1; higher values return multiple alternatives and consume proportionally more tokens.

presence_penalty: This parameter can be used to encourage the AI to move on to new topics by penalizing tokens that have already appeared in the generated text. The default value is 0, and it can be adjusted between -2.0 and 2.0.

frequency_penalty: This parameter can be used to discourage the AI from repeating itself by penalizing tokens in proportion to how often they have already appeared. The default value is 0, and it can be adjusted between -2.0 and 2.0.
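Putting several of these parameters together - a sketch using the 0.x-style completion endpoint (the model name and parameter values are illustrative, not recommendations):

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

response = openai.Completion.create(
    model="text-davinci-003",  # a capable general-purpose model
    prompt="Write a one-line tagline for a website about cat breeds:",
    temperature=0.8,           # higher -> more varied taglines
    max_tokens=30,             # cap the length of each completion
    n=3,                       # ask for three alternatives
    stop=["\n"],               # stop at the end of the line
    presence_penalty=0.5,      # nudge toward new words
    frequency_penalty=0.5,     # discourage repetition
)

for i, choice in enumerate(response.choices, 1):
    print(f"{i}. {choice.text.strip()}")
```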

You can explore the different parameters and configurations in the OpenAI Playground

Example Use Case:

You can see a full walkthrough of this example in my blog post here -> HTML page generator with python and ChatGPT.

CatBreeds

CatBreeds is a website dedicated to the appreciation of cats and their many breeds. We strive to provide the latest information on all the different breeds of cats and their characteristics. We also provide advice on how to care for cats and how to choose the right breed for you. We are passionate about cats and the joy they bring to our lives. We hope you will find our website to be a great source of information on all the different breeds of cats and the wonderful companions they can be.