AI vocabulary has become its own ecosystem. One week you’re hearing “LLM” everywhere, the next it’s “agentic AI,” and suddenly everyone is debating hallucinations, alignment, and evaluation like they’re old friends. The speed of adoption is part of the story: AI tools are moving from research labs into products, workplaces, and consumer apps faster than most people can build a mental model for what’s happening under the hood. But there’s another reason the jargon sticks—many terms are used in slightly different ways depending on who’s speaking: engineers, product teams, investors, regulators, and journalists all bring their own emphasis.
So instead of treating AI terms as trivia, it helps to treat them as a map. Each word points to a specific capability, risk, or engineering choice. And when you understand what each term is really describing, the hype becomes easier to spot—and the real progress becomes easier to recognize.
Let’s start with the broadest umbrella, because it explains why everything else feels so confusing.
Artificial Intelligence (AI): the umbrella that hides the details
AI is the catch-all phrase for systems that perform tasks associated with human intelligence—language understanding, image recognition, prediction, decision-making, and more. The problem is that “AI” doesn’t tell you how the system works or what guarantees it has. A rule-based system that filters spam can be called AI. A deep learning model trained on millions of examples can also be called AI. So can a hybrid system that combines machine learning with explicit logic.
When someone says “AI,” the useful follow-up question is: what kind of AI? That’s where the next term matters.
Machine Learning (ML): learning patterns instead of writing rules
Machine learning is a subset of AI where the system learns from data rather than being explicitly programmed for every scenario. In traditional software, developers write if-then rules. In ML, developers provide training data and a learning algorithm that adjusts internal parameters until the model performs well on the task.
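To make the contrast concrete, here's a minimal sketch in Python. The keyword list, the toy training data, and the scikit-learn classifier are illustrative assumptions, not a real spam filter.

```python
# Rule-based: developers hand-write the conditions.
def rule_based_spam_filter(subject: str) -> bool:
    banned = {"free", "winner", "act now"}          # illustrative keyword list
    return any(phrase in subject.lower() for phrase in banned)

# Machine learning: the same task, but the "rules" are learned from labeled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

subjects = ["free prize winner", "meeting at 3pm", "act now limited offer", "quarterly report attached"]
labels = [1, 0, 1, 0]                               # 1 = spam, 0 = not spam (toy data)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(subjects, labels)                         # internal parameters are adjusted to fit the data

print(rule_based_spam_filter("Free winner!"))       # True, because a keyword matched
print(model.predict(["you are a winner"]))          # learned prediction, e.g. [1]
```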
This shift changes what “accuracy” means. ML systems don’t simply follow instructions; they generalize from examples. That’s powerful, but it also means performance depends on the quality, coverage, and representativeness of the training data. If the world changes—or if the input looks different from what the model saw during training—the model may struggle.
Generative AI: creating new outputs, not just labeling
Generative AI is a category of models designed to produce new content. Instead of classifying an image as “cat” or “dog,” generative systems can write text, generate images, synthesize audio, or produce code. The key idea is that these models learn statistical patterns from training data and then generate plausible continuations or structures.
This is why generative AI feels different to users. It doesn’t just respond with a label; it produces something that looks like it could have been written by a person. That “looks like” is important. Generative models are optimized to produce outputs that match learned patterns, not necessarily outputs that are factually correct in every case.
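A toy version of that idea makes it easier to see. The sketch below "trains" a bigram model by counting which word follows which, then samples a plausible continuation; real generative models learn far richer patterns, but the generate-the-next-piece loop is the same in spirit.

```python
import random
from collections import defaultdict

# Toy "generative model": learn which word tends to follow which (bigram counts),
# then sample plausible continuations from those learned patterns.
corpus = "the cat sat on the mat the dog sat on the rug".split()

next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)           # "training": collect observed continuations

def generate(start: str, length: int = 6) -> str:
    word, output = start, [start]
    for _ in range(length):
        candidates = next_words.get(word)
        if not candidates:                           # no learned continuation: stop
            break
        word = random.choice(candidates)             # plausible, not guaranteed-true, next word
        output.append(word)
    return " ".join(output)

print(generate("the"))                               # e.g. "the cat sat on the rug"
```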
Large Language Models (LLMs): the engine behind many text experiences
LLMs are a specific type of generative AI trained on large amounts of text. They learn relationships between words and phrases and can generate coherent responses to prompts. LLMs are often used for tasks like summarization, question answering, drafting emails, translating languages, and assisting with coding.
But LLMs are not search engines. They don’t inherently “retrieve” facts from the internet at the moment you ask a question. Unless they’re connected to external tools, they generate answers based on patterns learned during training. That distinction is one of the reasons hallucinations show up so often in early deployments—and why modern systems increasingly combine LLMs with retrieval, verification, or tool use.
Prompt: the steering wheel, not the whole car
A prompt is the input you give the AI—your question, instruction, constraints, or context. People sometimes treat prompting like magic, but it’s better to think of it as a control interface. The prompt shapes what the model tries to produce, but it doesn’t guarantee correctness.
A strong prompt usually does three things:
It clarifies the task (what you want).
It provides context (what the model should consider).
It sets constraints (format, tone, boundaries, what to avoid).
In practice, prompting is also a way to manage uncertainty. If you ask for a confident answer without asking for sources or uncertainty handling, you’re more likely to get fluent but unreliable output. If you ask the model to explain assumptions, cite retrieved information, or separate facts from guesses, you often get a more usable response.
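Here's a minimal sketch of how that structure might look in code. The call_llm function is a hypothetical placeholder for whatever model client you actually use, and the ticket text is invented for illustration.

```python
# Hypothetical helper: stands in for whatever LLM client or API you actually use.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model of choice")

task = "Summarize the customer ticket below in three bullet points."
context = "Ticket: The export button fails with a 500 error when the report has more than 10,000 rows."
constraints = (
    "Only use facts stated in the ticket. "
    "If something is unknown, say 'not stated' instead of guessing."
)

prompt = f"{task}\n\n{context}\n\n{constraints}"
print(prompt)                                        # inspect the assembled prompt
# response = call_llm(prompt)                        # constraints steer the model toward grounded output
```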
Hallucination: when confidence outruns truth
Hallucination is one of the most discussed terms in AI because it captures a real failure mode: the model generates information that is inaccurate, misleading, or not grounded in reality. The word “hallucination” can sound mystical, but the underlying issue is straightforward. The model is trained to produce likely text sequences. When it can’t reliably determine the correct answer—because the question is ambiguous, the information isn’t in its training distribution, or the model lacks access to current data—it may still generate something that sounds right.
The danger isn’t only that the output is wrong. It’s that the output can be persuasive. LLMs are optimized for coherence, not truth. Coherence can create a false sense of certainty.
Modern systems try to reduce hallucinations through several approaches:
Retrieval-augmented generation (RAG), where the model uses external documents as grounding.
Tool use, where the model calls functions to fetch data, run calculations, or check constraints.
Verification steps, where another model or a rule-based checker evaluates claims.
Better prompting, where the system is instructed to admit uncertainty or request missing information.
Even with these improvements, hallucinations don’t disappear entirely. They become less frequent, more detectable, or easier to mitigate depending on the application.
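As one illustration, here's a stripped-down sketch of the retrieval-augmented pattern. The keyword-overlap "retrieval" and the call_llm helper are deliberate simplifications; production RAG systems typically use embeddings and a vector store.

```python
def call_llm(prompt: str) -> str:                    # hypothetical stand-in, as in the prompt sketch above
    raise NotImplementedError

# Minimal retrieval-augmented generation (RAG) sketch: fetch a relevant document,
# then ask the model to answer using only that grounding.
documents = [
    "The refund window is 30 days from the date of purchase.",
    "Enterprise plans include priority support and a 99.9% uptime SLA.",
]

def retrieve(question: str, docs: list[str]) -> str:
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))   # naive keyword overlap

def answer(question: str) -> str:
    context = retrieve(question, documents)
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)                          # the grounding lives in the prompt

# answer("How long do customers have to request a refund?")
```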
Bias: when training data becomes destiny
Bias in AI refers to systematic unfairness in outputs caused by skewed patterns in training data or in the way the model is optimized. Bias can show up as uneven performance across groups, stereotypes in generated language, or discriminatory decisions in automated systems.
Bias is not always obvious. Sometimes it appears as a measurable accuracy gap. Other times it shows up as subtle differences in how the model responds to different prompts. For example, a model might produce more detailed or more dismissive answers depending on demographic cues embedded in the input.
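A simple way to see what a "measurable accuracy gap" means is to compute per-group accuracy directly. The numbers below are made up purely to show the calculation.

```python
# Toy illustration of a per-group accuracy gap; the data is invented.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
labels      = [1, 0, 0, 1, 0, 0, 0, 1]
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]

def accuracy_for(group: str) -> float:
    rows = [(p, y) for p, y, g in zip(predictions, labels, groups) if g == group]
    return sum(p == y for p, y in rows) / len(rows)

gap = abs(accuracy_for("a") - accuracy_for("b"))
print(f"group a: {accuracy_for('a'):.2f}, group b: {accuracy_for('b'):.2f}, gap: {gap:.2f}")
# group a: 0.75, group b: 0.50, gap: 0.25 -> the model works noticeably better for one group
```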
Bias mitigation is complex because it involves both technical and social choices:
What fairness definition should be used?
Which metrics matter?
How do you handle trade-offs between accuracy and fairness?
How do you test across real-world scenarios?
The important takeaway is that bias isn’t just a “bad behavior” problem. It’s often a data and objective function problem. If the training data reflects historical inequities, the model can learn those patterns unless corrected.
Safety and Alignment: making capabilities usable and controllable
Safety and alignment are terms that often get lumped together, but they point to different concerns.
Safety generally refers to efforts to make AI systems reliable and less harmful in real-world use. That includes preventing dangerous outputs, reducing misuse, and ensuring the system behaves within acceptable boundaries.
Alignment is broader: it’s about aligning the model’s behavior with human intent and values. In other words, the system should do what we mean—not just what we say. This becomes especially important as models become more capable and more autonomous.
A practical way to understand alignment is to think about incentives. If a model is optimized to satisfy a user request, but the request is vague or conflicting, the model may choose an interpretation that maximizes “helpfulness” in a narrow sense while violating the spirit of the goal. Alignment work tries to reduce these mismatches through training techniques, policy constraints, and evaluation.
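A toy example makes the incentive problem easier to see. If "helpfulness" is proxied by something naive like answer length, the optimization happily picks an answer that violates the spirit of the request; the proxy here is deliberately crude, but real objective mismatches follow the same shape.

```python
# Toy misaligned objective: the user wants a direct yes/no, but the system
# scores "helpfulness" by length, so the verbose answer wins.
candidates = [
    "Yes.",
    "Yes, although there are many considerations, caveats, and contexts one could discuss at length...",
]

def proxy_helpfulness(answer: str) -> int:
    return len(answer)                               # a deliberately naive proxy for "helpfulness"

best = max(candidates, key=proxy_helpfulness)
print(best)                                          # the long answer wins, against the spirit of the request
```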
Evaluation: the discipline behind “it works”
Evaluation is the process of testing AI performance before deployment. It’s where many teams either earn trust or lose it. Evaluation isn’t just about whether the model can produce a good answer once. It’s about measuring performance across a range of inputs, edge cases, and conditions.
Good evaluation typically includes:
Accuracy or task success metrics (did it do the job?)
Reliability (does it behave consistently?)
Robustness (how does it handle unusual inputs?)
Safety checks (does it produce harmful or disallowed content?)
Quality measures (is the output useful, clear, and appropriately formatted?)
Evaluation also matters because it forces clarity. If a team can’t define what “good” means, it’s hard to improve. And if they can’t measure failure modes, they can’t reduce them.
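In code, even a bare-bones evaluation is just a loop over a fixed test set with an explicit notion of success. The test cases and the substring check below are placeholder assumptions; real evaluations layer on robustness, safety, and quality measures.

```python
# Minimal evaluation loop: run the system over a fixed test set and score each output.
test_cases = [
    {"prompt": "What is 2 + 2?", "expect": "4"},
    {"prompt": "Name the capital of France.", "expect": "Paris"},
]

def evaluate(call_model) -> float:
    passed = 0
    for case in test_cases:
        output = call_model(case["prompt"])
        if case["expect"].lower() in output.lower():  # crude task-success check
            passed += 1
    return passed / len(test_cases)

# success_rate = evaluate(call_llm)                   # call_llm is a hypothetical model client
# print(f"task success: {success_rate:.0%}")
```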
Agentic AI: from answering to acting
Agentic AI is one of the newest buzz terms, and it describes a shift in how AI systems are used. Traditional chatbots respond to prompts. Agentic systems are designed to take actions toward goals. Instead of only generating text, they can plan steps, call tools, and execute workflows.
This is where the conversation moves from “what does the model say?” to “what does the system do?”
An agent might:
Break down a goal into sub-tasks.
Decide which tools to use (search, database queries, code execution, scheduling).
Execute steps in sequence.
Check results and adjust if something fails.
The promise is huge: agents can reduce manual work and coordinate complex processes. But the risks also change. When an AI can act, errors can become costly. A wrong action isn’t just a wrong sentence—it can be a wrong transaction, a corrupted file, or a misconfigured system.
That’s why agentic AI pushes safety and evaluation even further. Systems need guardrails, permissioning, audit logs, and mechanisms to prevent runaway behavior. They also need clear boundaries: what the agent is allowed to do, what it must ask before doing, and how it should recover from uncertainty.
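Here's a minimal sketch of what those boundaries can look like in code: an allow-list of tools, a confirmation step for risky actions, and a log of every step. The tool names and the hard-coded plan are illustrative assumptions, not a real agent framework.

```python
# Minimal agent loop with guardrails: permissioning, human confirmation, and an audit trail.
ALLOWED_TOOLS = {"search", "calculator"}             # what the agent may call on its own
REQUIRES_CONFIRMATION = {"send_email"}               # actions that must ask before executing

def run_tool(name: str, argument: str) -> str:
    if name not in ALLOWED_TOOLS | REQUIRES_CONFIRMATION:
        return f"blocked: {name} is not permitted"
    if name in REQUIRES_CONFIRMATION:
        return f"paused: {name} needs human approval"
    return f"ran {name}({argument})"                 # stand-in for the real tool call

plan = [("search", "refund policy"), ("send_email", "draft to customer"), ("delete_db", "all rows")]

for step, (tool, argument) in enumerate(plan, start=1):
    result = run_tool(tool, argument)
    print(f"step {step}: {result}")                  # audit log: every action is recorded
    if result.startswith(("blocked", "paused")):
        break                                        # stop and hand control back to a human
```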
A unique way to see the whole landscape: AI terms describe different layers
If you zoom out, these terms form a layered stack. AI is the umbrella; machine learning is how systems learn from data; generative AI and LLMs are the engines built on that learning; prompts are how we steer them; hallucination and bias are the ways they fail; safety, alignment, and evaluation are how we keep them trustworthy; and agentic AI is where they stop just answering and start acting.
