Paper Summary: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Please note: This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (paper front page). Source

WHAT

Chain-of-Thought (CoT) prompting is a technique whereby the model first generates intermediate reasoning steps that augment and more thoroughly explain the input, and only then produces the final answer.

Chain-of-Thought prompting example. Source
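
To make this concrete, here is the arithmetic exemplar from the paper's Figure 1, written out as strings. The variable names are mine; the question and rationale text are from the paper.

```python
# One CoT exemplar from the paper (Figure 1): the demonstration pairs a
# question with a worked-out rationale instead of a bare answer.
cot_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

# A standard few-shot exemplar, by contrast, shows only the final answer:
standard_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11."
)
```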

WHY

Because even large LLMs struggle with tasks that require multiple steps of reasoning and/or symbolic reasoning, and simply adding more parameters doesn't seem to help much.

HOW

Using the few-shot prompting technique, one adds worked examples showing how to "augment" the input, so that the task is demonstrated more explicitly and the model has more context to work with. Then, proceed as normal.
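
A minimal sketch of what this looks like in practice. Assumptions: `generate` is a hypothetical stand-in for whatever completion API your model exposes (the paper itself used LaMDA, GPT-3, and PaLM), and only one of the exemplars is shown.

```python
# Minimal sketch of few-shot CoT prompting against a text-completion model.

COT_EXEMPLARS = [
    # The paper's experiments used 8 (question, worked rationale) pairs;
    # one is shown here (from Figure 1 of the paper).
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11. The answer is 11.",
    ),
    # ... 7 more exemplars in the actual experiments ...
]

def build_cot_prompt(question: str) -> str:
    """Prepend the CoT exemplars to the new question, few-shot style."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in COT_EXEMPLARS)
    return f"{shots}\n\nQ: {question}\nA:"

# Usage (generate() is a placeholder for your model's completion call):
# completion = generate(build_cot_prompt(
#     "The cafeteria had 23 apples. If they used 20 to make lunch and "
#     "bought 6 more, how many apples do they have?"))
# The model should now emit a reasoning chain ending in "The answer is N."
```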

CLAIMS/QUOTES

  • Types of tasks amenable to CoT: "... chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks"

  • Interpretability: CoT adds some level of interpretability, since the generated chain shows how the LLM is "thinking" as it produces an output. This is interesting.

  • CoT can be added to any model without retraining or fine-tuning: you simply include CoT exemplars in the prompt of any pre-trained LLM and use few-shot prompting.

  • Emergent behavior: CoT only works in large models: "... chain-of-thought prompting does not positively impact performance for small models, and only yields performance gains when used with models of ∼100B parameters."

    • In smaller models, the chains produced were "fluent but illogical", actually making results worse than with normal prompting.
  • Performance vs special-purpose models: vanilla LLMs with CoT in-context learning can outperform LLMs that have been fine-tuned to excel in specific domains (e.g. GPT-3 fine-tuned on math problems).

  • CoT helps more the more complex a task is: the performance gains from CoT are larger for tasks that require multi-step reasoning, such as logic and common-sense tasks.

NOTES

  • The CoT need not be shown to the user of the model; it can simply be hidden (see the sketch after this list).

  • The experiments used just 8 CoT exemplars in the context for few-shot prompting.
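
A sketch of how hiding the chain might work: since the exemplars end with a final-answer sentence, an application can surface only that part. The function and regex below are my illustration, not something from the paper.

```python
import re

def extract_final_answer(completion: str) -> str:
    """Return only the final answer from a CoT completion, hiding the chain.

    Relies on the exemplars' convention of ending with "The answer is X.";
    if the model deviates from that format, fall back to the full text.
    """
    match = re.search(r"The answer is\s*(.+?)\.?\s*$", completion.strip())
    return match.group(1) if match else completion

# extract_final_answer("Roger started with 5 balls. ... The answer is 11.")
# -> '11'
```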

MY 2¢

  • It's important to realize that CoT is an inference-time technique. It does not change the training-time setup of a model at all!

    • "No language models were finetuned in the process of writing this paper."
  • This wasn't discussed in the paper, but there are definitely latency tradeoffs when adding CoT to a model, since it has to generate the whole reasoning chain before the final answer.

