Speak AI
A Simple Glossary for Understanding Multilingual AI and Language Tools
Artificial Intelligence (AI)
Computer systems designed to perform tasks that typically require human intelligence, such as understanding language, recognising patterns, and making decisions.
Attention
A mechanism that helps a model decide which words in a sentence are most relevant to each other.
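As a rough illustration of the idea, the sketch below scores how relevant each word is to a chosen word using dot products followed by a softmax, which is the core of scaled dot-product attention. The two-number word vectors and the names `toy_vectors` and `attention_weights` are made up for this example; real models learn much larger vectors and use separate query, key, and value projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score one word (query) against every word (keys) and normalise."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical 2-dimensional vectors, invented for illustration only.
toy_vectors = {"the": [0.1, 0.0], "cat": [0.9, 0.3], "sat": [0.7, 0.6]}

# How much should "cat" attend to each word in the sentence?
weights = attention_weights(toy_vectors["cat"], list(toy_vectors.values()))
for word, weight in zip(toy_vectors, weights):
    print(f"{word}: {weight:.2f}")
```

With these invented numbers, "cat" gives the lowest weight to "the" and higher weights to "cat" and "sat".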
Bias in AI
Systematic errors or unfairness in AI outputs, often reflecting prejudices present in the training data.
Decoder
The part of a model that generates output text by predicting the next token step by step.
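The step-by-step prediction can be sketched with a toy lookup table standing in for a trained model. Everything here is hypothetical: `TOY_NEXT_TOKEN_PROBS` and its probabilities are invented, and this toy looks only at the last token, whereas a real decoder conditions on all the tokens generated so far. The strategy shown, always picking the most likely next token, is usually called greedy decoding and is only one of several options.

```python
# Hypothetical next-token probabilities, invented for illustration only.
TOY_NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "sat": {"<end>": 1.0},
}

def greedy_decode(start="<start>", max_steps=10):
    """Generate tokens one at a time, always taking the most likely next one."""
    output = []
    current = start
    for _ in range(max_steps):
        options = TOY_NEXT_TOKEN_PROBS.get(current)
        if not options:
            break  # the toy table has nothing to predict from here
        current = max(options, key=options.get)
        if current == "<end>":
            break  # the model signalled that the text is finished
        output.append(current)
    return output

print(greedy_decode())  # ['the', 'cat', 'sat']
```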
Embedding
A vector (a list of numbers across many dimensions) that represents a word and captures its meaning and its relationships with other words.
Encoder
The part of a model that processes input text and captures its context and relationships.
Fairness
The principle that AI systems should treat all users and languages equitably without discrimination.
Generative AI (GenAI)
A type of AI that creates new content such as text, images, or audio based on input prompts.
Large Language Model (LLM)
An AI model trained on massive amounts of text data to understand and generate human-like language. Examples include GPT-4 and BLOOM.
Low-Resource Languages
Languages with limited digital text data, which makes it harder for AI to learn them and perform well on them.
Machine Translation
The automatic translation of text or speech from one language to another by computers.
Multilingual AI
AI systems capable of understanding and generating text in multiple languages.
Multilingual Prompting
Writing prompts that combine two or more languages to test or utilize an AI’s multilingual capabilities.
Natural Language Processing (NLP)
A field of AI focused on enabling computers to understand, interpret, and generate human language.
Prompt Engineering
The craft of designing and refining input instructions (prompts) to guide AI models to produce better or more accurate outputs.
Tokens
Small pieces of text (words or word parts) that AI models process. The number of tokens affects how much computation is required and how much it costs.
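To make the link between tokens and cost concrete, here is a minimal sketch. The splitting rule in `toy_tokenize` (cut every word into pieces of at most four characters) is a deliberately crude stand-in, not how real tokenizers work, and the price passed to `estimate_cost` is an invented figure rather than any provider's actual rate.

```python
def toy_tokenize(text, max_piece_len=4):
    """Split text on whitespace, then cut long words into fixed-size pieces.

    This is a hypothetical rule for illustration; real tokenizers learn
    their word pieces from data.
    """
    tokens = []
    for word in text.split():
        for i in range(0, len(word), max_piece_len):
            tokens.append(word[i:i + max_piece_len])
    return tokens

def estimate_cost(num_tokens, price_per_1000_tokens):
    """Estimate cost when pricing is quoted per 1,000 tokens (assumed scheme)."""
    return num_tokens / 1000 * price_per_1000_tokens

tokens = toy_tokenize("multilingual models")
print(tokens)       # ['mult', 'ilin', 'gual', 'mode', 'ls']
print(len(tokens))  # 5
```

Two words became five tokens here, which is why longer or rarer words, and languages the tokenizer handles poorly, tend to cost more to process.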
Transformer
A neural network architecture that uses attention to process and generate sequences of text efficiently.
Vector
A list of numbers used by AI models to represent words, where the distance between two vectors reflects how similar their meanings are.
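One common way to compare two vectors is cosine similarity, which measures how closely they point in the same direction. The sketch below uses three-number vectors in `toy_embeddings` that were invented for this example; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return a value near 1 for similar directions and near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word vectors, invented for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.0, 0.95],
}

royal = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
fruit = cosine_similarity(toy_embeddings["king"], toy_embeddings["banana"])
print(f"king vs queen:  {royal:.2f}")
print(f"king vs banana: {fruit:.2f}")
```

With these invented numbers, "king" scores much closer to "queen" than to "banana".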