Machine learning development is an empirical process. It’s hard to know in advance the result of a hyperparameter choice, dataset, or prompt to a large language model (LLM). You just have to try it, get a result, and decide on the next step. Still, understanding how the underlying technology works is very helpful for picking a promising direction. For example, when prompting an LLM, which of the following is more effective?
Prompt 1: [Problem/question description] State the answer and then explain your reasoning.
Prompt 2: [Problem/question description] Explain your reasoning and then state the answer.
These two prompts are nearly identical, and the former matches the wording of many university exams. But the second prompt is much more likely to get an LLM to give you a good answer. Here’s why: An LLM generates output by repeatedly guessing the most likely next word (or token). So if you ask it to start by stating the answer, as in the first prompt, it will take a stab at guessing the answer and then try to justify what might be an incorrect guess. In contrast, prompt 2 directs it to think things through before it reaches a conclusion. This principle also explains the effectiveness of widely discussed prompts such as “Let’s think step by step.”
The image above illustrates this difference using a question with one right answer. But similar considerations apply when asking an LLM to make judgment calls when there is no single right answer; for example, how to phrase an email, what to say to someone who is upset, or the proper department to route a customer email to.
That’s why it’s helpful to understand, in depth, how an algorithm works. And that means more than memorizing specific words to include in prompts or studying API calls. These algorithms are complex, and it’s hard to know all the details. Fortunately, there’s no need to. After all, you don’t need to be an expert in GPU compute allocation algorithms to use LLMs. But digging one or two layers deeper than the API documentation to understand how key pieces of the technology work will shape your insights. For example, in the past week, knowing how long-context transformer networks process input prompts and how tokenizers turn an input into tokens shaped how I used them.
A deep understanding of technology is especially helpful when the technology is still maturing. Most of us can get a mature technology like GPS to perform well without knowing much about how it works. But LLMs are still an immature technology, and thus your prompts can have non-intuitive effects. Developers who understand the technology in depth are likely to build more effective applications, and build them faster and more easily, than those who don't. Technical depth also helps you to decide when you can’t tell what’s likely to work in advance, and when the best approach is to try a handful of promising prompts, get a result, and keep iterating.
P.S. Our short course on fine-tuning LLMs is now available! As I wrote last week, many developers are not only prompting LLMs but also fine-tuning them — that is, taking a pretrained model and training it further on their own data. Fine-tuning can deliver superior results, and it can be done relatively inexpensively. In this course, Sharon Zhou, CEO and co-founder of Lamini (disclosure: I’m a minor shareholder) shows you how to recognize when fine-tuning can be helpful and how to do it with an open-source LLM. Learn to fine-tune your own models here.