DSPy7 Image

DSPy: Programming with Foundation Models

This notebook introduces the DSPy framework for Programming with Foundation Models, i.e., language models (LMs) and retrieval models (RMs).

DSPy emphasizes programming over prompting. It unifies techniques for prompting and fine-tuning LMs as well as improving them with reasoning and tool/retrieval augmentation, all expressed through a minimalistic set of Pythonic operations that compose and learn.

DSPy provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax. On top of that, DSPy introduces an automatic compiler that teaches LMs how to conduct the declarative steps in your program. The DSPy compiler will internally trace your program and then craft high-quality prompts for large LMs (or train automatic finetunes for small LMs) to teach them the steps of your task.

0] Setting Up

As we’ll start to see below, DSPy can routinely teach powerful models like GPT-3.5 and local models like T5-base or Llama2-13b to be much more reliable at complex tasks. DSPy will compile the same program into different few-shot prompts and/or finetunes for each LM.

Let’s begin by setting things up. The snippet below will also install DSPy if it’s not there already.

%load_ext autoreload
%autoreload 2

import sys
import os

try: # When on google Colab, let's clone the notebook so we download the cache.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path
except:
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

import pkg_resources # Install the package if it's not installed
if not "dspy-ai" in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install -U pip
    !pip install dspy-ai
    !pip install openai~=0.28.1
    # !pip install -e $repo_path

import dspy

1] Getting Started

We’ll start by setting up the language model (LM) and retrieval model (RM). DSPy supports multiple API and local models. In this notebook, we’ll work with GPT-3.5 (gpt-3.5-turbo) and the retriever ColBERTv2.

To make things easy, we’ve set up a ColBERTv2 server hosting a Wikipedia 2017 “abstracts” search index (i.e., containing first paragraph of each article from this 2017 dump), so you don’t need to worry about setting one up! It’s free.

Note: If you want to run this notebook without changing the examples, you don’t need an API key. All examples are already cached internally so you can inspect them!

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

In the last line above, we configure DSPy to use the turbo LM and the ColBERTv2 retriever (over Wikipedia 2017 abstracts) by default. This will be easy to overwrite for local parts of our programs if needed.

A word on the workflow

You can build your own DSPy programs for various tasks, e.g., question answering, information extraction, or text-to-SQL.

Whatever the task, the general workflow is:

  1. Collect a little bit of data. Define examples of the inputs and outputs of your program (e.g., questions and their answers). This could just be a handful of quick examples you wrote down. If large datasets exist, the more the merrier!
  2. Write your program. Define the modules (i.e., sub-tasks) of your program and the way they should interact together to solve your task.
  3. Define some validation logic. What makes for a good run of your program? Maybe the answers need to have a certain length or stick to a particular format? Specify the logic that checks that.
  4. Compile! Ask DSPy to compile your program using your data. The compiler will use your data and validation logic to optimize your program (e.g., prompts and modules) so it’s efficient and effective!
  5. Iterate. Repeat the process by improving your data, program, validation, or by using more advanced features of the DSPy compiler.

Let’s now see this in action.

2] Task Examples

DSPy accomodates a wide variety of applications and tasks. In this intro notebook, we will work on the example task of multi-hop question answering (QA).

Other notebooks and tutorials will present different tasks. Now, let us load a tiny sample from the HotPotQA multi-hop dataset.

from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)
(20, 50)

We just loaded trainset (20 examples) and devset (50 examples). Each example in our training set contains just a question and its (human-annotated) answer.

DSPy typically requires very minimal labeling. Whereas your pipeline may involve six or seven complex steps, you only need labels for the initial question and the final answer. DSPy will bootstrap any intermediate labels needed to support your pipeline. If you change your pipeline in any way, the data bootstrapped will change accordingly!

Now, let’s look at some data examples.

train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Answer: {train_example.answer}")
Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

Examples in the dev set contain a third field, namely, titles of relevant Wikipedia articles. This is not essential but, for the sake of this intro, it’ll help us get a sense of how well our programs are doing.

dev_example = devset[18]
print(f"Question: {dev_example.question}")
print(f"Answer: {dev_example.answer}")
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")
Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: English
Relevant Wikipedia Titles: {'Robert Irvine', 'Restaurant: Impossible'}

After loading the raw data, we’d applied x.with_inputs('question') to each example to tell DSPy that our input field in each example will be just question. Any other fields are labels or metadata that are not given to the system.

print(f"For this dataset, training examples have input keys {train_example.inputs().keys()} and label keys {train_example.labels().keys()}")
print(f"For this dataset, dev examples have input keys {dev_example.inputs().keys()} and label keys {dev_example.labels().keys()}")
For this dataset, training examples have input keys ['question'] and label keys ['answer']
For this dataset, dev examples have input keys ['question'] and label keys ['answer', 'gold_titles']

Note that there’s nothing special about the HotPotQA dataset: it’s just a list of examples.

You can define your own examples as below. A future notebook will guide you through creating your own data in unusual or data-scarce settings, which is a context where DSPy excels.

dspy.Example(field1=value, field2=value2, ...)

3] Building Blocks

In DSPy, we will maintain a clean separation between defining your modules in a declarative way and calling them in a pipeline to solve the task.

This allows you to focus on the information flow of your pipeline. DSPy will then take your program and automatically optimize how to prompt (or finetune) LMs for your particular pipeline so it works well.

If you have experience with PyTorch, you can think of DSPy as the PyTorch of the foundation model space. Before we see this in action, let’s first understand some key pieces.

Using the Language Model: Signatures & Predictors

Every call to the LM in a DSPy program needs to have a Signature.

A signature consists of three simple elements:

  • A minimal description of the sub-task the LM is supposed to solve.
  • A description of one or more input fields (e.g., input question) that will we will give to the LM.
  • A description of one or more output fields (e.g., the question’s answer) that we will expect from the LM.

Let’s define a simple signature for basic question answering.

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In BasicQA, the docstring describes the sub-task here (i.e., answering questions). Each InputField or OutputField can optionally contain a description desc too. When it’s not given, it’s inferred from the field’s name (e.g., question).

Notice that there isn’t anything special about this signature in DSPy. We can just as easily define a signature that takes a long snippet from a PDF and outputs structured information, for instance.

Anyway, now that we have a signature, let’s define and use a Predictor. A predictor is a module that knows how to use the LM to implement a signature. Importantly, predictors can learn to fit their behavior to the task!

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")
Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Predicted Answer: American

In the example above, we asked the predictor about the the chef featured in “Restaurant: Impossible”. The model outputs an answer (“American”).

For visibility, we can inspect how this extremely basic predictor implemented our signature. Let’s inspect the history of our LM (turbo).

turbo.inspect_history(n=1)




Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Answer: often between 1 and 5 words

---

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: American


It happens that this chef is both British and American, but we have no way of knowing if the model just guessed “American” because it’s a common answer. In general, adding retrieval and learning will help the LM be more factual, and we’ll explore this in a minute!

But before we do that, how about we just change the predictor? It would be nice to allow the model to elicit a chain of thought along with the prediction.

# Define the predictor. Notice we're just changing the class. The signature BasicQA is unchanged.
generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)

# Call the predictor on the same input.
pred = generate_answer_with_chain_of_thought(question=dev_example.question)

# Print the input, the chain of thought, and the prediction.
print(f"Question: {dev_example.question}")
print(f"Thought: {pred.rationale.split('.', 1)[1].strip()}")
print(f"Predicted Answer: {pred.answer}")
Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Thought: We know that the chef and restaurateur featured in Restaurant: Impossible is Robert Irvine.
Predicted Answer: British

This is indeed a better answer: the model figures out that the chef in question is Robert Irvine and correctly identifies that he’s British.

These predictors (dspy.Predict and dspy.ChainOfThought) can be applied to any signature. As we’ll see below, they can also be optimized to learn from your data and validation logic.

Using the Retrieval Model

Using the retriever is pretty simple. A module dspy.Retrieve(k) will search for the top-k passages that match a given query.

By default, this will use the retriever we configured at the top of this notebook, namely, ColBERTv2 over a Wikipedia index.

retrieve = dspy.Retrieve(k=3)
topK_passages = retrieve(dev_example.question).passages

print(f"Top {retrieve.k} passages for question: {dev_example.question} \n", '-' * 30, '\n')

for idx, passage in enumerate(topK_passages):
    print(f'{idx+1}]', passage, '\n')
Top 3 passages for question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible? 
 ------------------------------ 

1] Restaurant: Impossible | Restaurant: Impossible is an American reality television series, featuring chef and restaurateur Robert Irvine, that aired on Food Network from 2011 to 2016. 

2] Jean Joho | Jean Joho is a French-American chef and restaurateur. He is chef/proprietor of Everest in Chicago (founded in 1986), Paris Club Bistro & Bar and Studio Paris in Chicago, The Eiffel Tower Restaurant in Las Vegas, and Brasserie JO in Boston. 

3] List of Restaurant: Impossible episodes | This is the list of the episodes for the American cooking and reality television series "Restaurant Impossible", produced by Food Network. The premise of the series is that within two days and on a budget of $10,000, celebrity chef Robert Irvine renovates a failing American restaurant with the goal of helping to restore it to profitability and prominence. Irvine is assisted by a designer (usually Taniya Nayak, Cheryl Torrenueva, or Lynn Keagan, but sometimes Vanessa De Leon, Krista Watterworth, Yvette Irene, or Nicole Faccuito), along with general contractor Tom Bury, who sometimes does double duty as both general contractor and designer. After assessing the problems with the restaurant, Robert Irvine typically creates a plan for the new decor, oversees the cleaning of the restaurant, reduces the size of the menu and improves the food, develops a promotional activity, educates the restaurant's owners, or trains the staff, as needed by each restaurant. 

Feel free to any other queries you like.

retrieve("When was the first FIFA World Cup held?").passages[0]
'History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules Rimet decided to stage an international football tournament. The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited by the organization. Since then, the World Cup has experienced successive expansions and format remodeling to its current 32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around the world.'

4] Program 1: Basic Retrieval-Augmented Generation (“RAG”)

Let’s define our first complete program for this task. We’ll build a retrieval-augmented pipeline for answer generation.

Given a question, we’ll search for the top-3 passages in Wikipedia and then feed them as context for answer generation.

Let’s start by defining this signature: context, question --> answer.

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Great. Now let’s define the actual program. This is a class that inherits from dspy.Module.

It needs two methods:

  • The __init__ method will simply declare the sub-modules it needs: dspy.Retrieve and dspy.ChainOfThought. The latter is defined to implement our GenerateAnswer signature.
  • The forward method will describe the control flow of answering the question using the modules we have.
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
Compiling the RAG program

Having defined this program, let’s now compile it. Compiling a program will update the parameters stored in each module. In our setting, this is primarily in the form of collecting and selecting good demonstrations for inclusion in your prompt(s).

Compiling depends on three things:

  1. A training set. We’ll just use our 20 question–answer examples from trainset above.
  2. A metric for validation. We’ll define a quick validate_context_and_answer that checks that the predicted answer is correct. It’ll also check that the retrieved context does actually contain that answer.
  3. A specific teleprompter. The DSPy compiler includes a number of teleprompters that can optimize your programs.

Teleprompters: Teleprompters are powerful optimizers that can take any program and learn to bootstrap and select effective prompts for its modules. Hence the name, which means “prompting at a distance”.

Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. We will use a simple default BootstrapFewShot in this notebook.

If you’re into analogies, you could think of this as your training data, your loss function, and your optimizer in a standard DNN supervised learning setup. Whereas SGD is a basic optimizer, there are more sophisticated (and more expensive!) ones like Adam or RMSProp.

from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
 50%|█████     | 10/20 [00:00<00:00, 121.99it/s]
Bootstrapped 4 full traces after 11 examples in round 0.

Now that we’ve compiled our RAG program, let’s try it out.

# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
Question: What castle did David Gregory inherit?
Predicted Answer: Kinnairdy Castle
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']

Excellent. How about we inspect the last prompt for the LM?

turbo.inspect_history(n=1)




Answer questions with short factoid answers.

---

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

Question: "Everything Has Changed" is a song from an album released under which record label ?
Answer: Big Machine Records

Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Answer: 1950

Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?
Answer: Aleem Sarwar Dar

Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Answer: "Outfield of Dreams"

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?
Answer: Aleksandr Danilovich Aleksandrov

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?
Answer: 2010

Question: Tombstone stared an actor born May 17, 1955 known as who?
Answer: Bill Paxton

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield
Answer: 1874

Question: which American actor was Candace Kita guest starred with
Answer: Bill Murray

Question: Which is taller, the Empire State Building or the Bank of America Tower?
Answer: The Empire State Building

Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?
Answer: Buena Vista Distribution

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»
[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine's Hall of Fame and "Tae Kwon Do Times" magazine's Hall of Fame.»

Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?

Reasoning: Let's think step by step in order to produce the answer. We know from the context that "Tae Kwon Do Times" is a magazine that covers taekwondo and other Korean martial arts. It has published articles by authors like Scott Shaw. On the other hand, there is no information about Southwest Art magazine in the context.

Answer: Tae Kwon Do Times

---

Context:
[1] «Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an American actress, producer, singer, comic book writer, and political activist. She made her film debut in the 1995 teen drama "Kids". Her subsequent film roles include "He Got Game", "Men in Black II", "25th Hour", "Rent", "Sin City", "Death Proof", "Seven Pounds", "", and "Top Five". Dawson has also provided voice-over work for Disney and DC.»
[2] «Sarai Gonzalez | Sarai Isaura Gonzalez (born 2005) is an American Latina child actress who made her professional debut at the age of 11 on the Spanish-language ""Soy Yo"" ("That's Me") music video by Bomba Estéreo. Cast as a "nerdy" tween with a "sassy" and "confident" attitude, her performance turned her into a "Latina icon" for "female empowerment, identity and self-worth". She subsequently appeared in two get out the vote videos for Latinos in advance of the 2016 United States elections.»
[3] «Gabriela (2001 film) | Gabriela is a 2001 American romance film, starring Seidy Lopez in the title role alongside Jaime Gomez as her admirer Mike. The film has been cited as an inspiration behind the Premiere Weekend Club, which supports Latino film-making.»

Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?

Reasoning: Let's think step by step in order to produce the answer. We know that the actress made her film debut in 1995 and co-founded Voto Latino.

Answer: Rosario Dawson

---

Context:
[1] «Battle of Kursk | The Battle of Kursk was a Second World War engagement between German and Soviet forces on the Eastern Front near Kursk (450 km south-west of Moscow) in the Soviet Union during July and August 1943. The battle began with the launch of the German offensive, Operation Citadel (German: "Unternehmen Zitadelle" ), on 5 July, which had the objective of pinching off the Kursk salient with attacks on the base of the salient from north and south simultaneously. After the German offensive stalled on the northern side of the salient, on 12 July the Soviets commenced their Kursk Strategic Offensive Operation with the launch of Operation Kutuzov (Russian: Кутузов ) against the rear of the German forces in the northern side. On the southern side, the Soviets also launched powerful counterattacks the same day, one of which led to a large armoured clash, the Battle of Prokhorovka. On 3 August, the Soviets began the second phase of the Kursk Strategic Offensive Operation with the launch of Operation Polkovodets Rumyantsev (Russian: Полководец Румянцев ) against the German forces in the southern side of the Kursk salient.»
[2] «Operation Mars | Operation Mars, also known as the Second Rzhev-Sychevka Offensive Operation (Russian: Вторая Ржевско-Сычёвская наступательная операция), was the codename for an offensive launched by Soviet forces against German forces during World War II. It took place between 25 November and 20 December 1942 around the Rzhev salient in the vicinity of Moscow.»
[3] «Kholm Pocket | The Kholm Pocket (German: "Kessel von Cholm" ; Russian: Холмский котёл ) was the name given for the encirclement of German troops by the Red Army around Kholm south of Leningrad, during World War II on the Eastern Front, from 23 January 1942 until 5 May 1942. A much larger pocket was simultaneously surrounded in Demyansk, about 100 km to the northeast. These were the results of German retreat following their defeat during the Battle of Moscow.»

Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?

Reasoning: Let's think step by step in order to produce the answer. We know that the German offensive that started the Battle of Kursk was called Operation Citadel.

Answer: Operation Citadel

---

Context:
[1] «Kerry Condon | Kerry Condon (born 4 January 1983) is an Irish television and film actress, best known for her role as Octavia of the Julii in the HBO/BBC series "Rome," as Stacey Ehrmantraut in AMC's "Better Call Saul" and as the voice of F.R.I.D.A.Y. in various films in the Marvel Cinematic Universe. She is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet."»
[2] «Corona Riccardo | Corona Riccardo (c. 1878October 15, 1917) was an Italian born American actress who had a brief Broadway stage career before leaving to become a wife and mother. Born in Naples she came to acting in 1894 playing a Mexican girl in a play at the Empire Theatre. Wilson Barrett engaged her for a role in his play "The Sign of the Cross" which he took on tour of the United States. Riccardo played the role of Ancaria and later played Berenice in the same play. Robert B. Mantell in 1898 who struck by her beauty also cast her in two Shakespeare plays, "Romeo and Juliet" and "Othello". Author Lewis Strang writing in 1899 said Riccardo was the most promising actress in America at the time. Towards the end of 1898 Mantell chose her for another Shakespeare part, Ophelia im Hamlet. Afterwards she was due to join Augustin Daly's Theatre Company but Daly died in 1899. In 1899 she gained her biggest fame by playing Iras in the first stage production of Ben-Hur.»
[3] «Judi Dench | Dame Judith Olivia "Judi" Dench, {'1': ", '2': ", '3': ", '4': "} (born 9 December 1934) is an English actress and author. Dench made her professional debut in 1957 with the Old Vic Company. Over the following few years, she performed in several of Shakespeare's plays in such roles as Ophelia in "Hamlet", Juliet in "Romeo and Juliet", and Lady Macbeth in "Macbeth". Although most of her work during this period was in theatre, she also branched into film work and won a BAFTA Award as Most Promising Newcomer. She drew strong reviews for her leading role in the musical "Cabaret" in 1968.»

Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?

Reasoning: Let's think step by step in order to produce the answer. We know that the actress we are looking for is the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." We also know that she acted in the short film The Shore.

Answer: Kerry Condon

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University of Edinburgh, Savilian Professor of Astronomy at the University of Oxford, and a commentator on Isaac Newton's "Principia".»

Question: What castle did David Gregory inherit?

Reasoning: Let's think step by step in order to produce the answer. We know that David Gregory inherited a castle. The name of the castle is Kinnairdy Castle.

Answer: Kinnairdy Castle


Even though we haven’t written any of this detailed demonstrations, we see that DSPy was able to bootstrap this 3,000 token prompt for 3-shot retrieval augmented generation with hard negative passages and chain of thought from our extremely simple program.

This illustrates the power of composition and learning. Of course, this was just generated by a particular teleprompter, which may or may not be perfect in each setting. As you’ll see in DSPy, there is a large but systematic space of options you have to optimize and validate the quality and cost of your programs.

If you’re so inclined, you can easily inspect the learned objects themselves.

for name, parameter in compiled_rag.named_predictors():
    print(name)
    print(parameter.demos[0])
    print()
generate_answer
Example({'augmented': True, 'context': ['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.', "Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.", 'Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine\'s Hall of Fame and "Tae Kwon Do Times" magazine\'s Hall of Fame.'], 'question': 'Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?', 'rationale': 'produce the answer. We know from the context that "Tae Kwon Do Times" is a magazine that covers taekwondo and other Korean martial arts. It has published articles by authors like Scott Shaw. On the other hand, there is no information about Southwest Art magazine in the context.', 'answer': 'Tae Kwon Do Times'}) (input_keys=None)
Evaluating the Answers

We can now evaluate our compiled_rag program on the dev set. Of course, this tiny set is not meant to be a reliable benchmark, but it’ll be instructive to use it for illustration.

For a start, let’s evaluate the accuracy (exact match) of the predicted answer.

from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function. We'll use this many times below.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)
Average Metric: 22 / 50  (44.0): 100%|██████████| 50/50 [00:00<00:00, 116.45it/s]
Average Metric: 22 / 50  (44.0%)
  question example_answer gold_titles context pred_answer answer_exact_match
0 Are both Cangzhou and Qionghai in the Hebei province of China? no {'Cangzhou', 'Qionghai'} ['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... No ✔️ [True]
1 Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? National Hockey League {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} ['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was... National Hockey League ✔️ [True]
2 The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... Steve Yzerman {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} ['Steve Yzerman | Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager... Steve Yzerman ✔️ [True]
3 What river is near the Crichton Collegiate Church? the River Tyne {'Crichton Castle', 'Crichton Collegiate Church'} ["Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... River Tyne ✔️ [True]
4 In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? King Alfred the Great {'Æthelweard (son of Alfred)', 'Ealhswith'} ["Æthelweard of East Anglia | Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties... King Alfred the Great ✔️ [True]
... 45 more rows not displayed ...
44.0
Evaluating the Retrieval

It may also be instructive to look at the accuracy of retrieval. There are multiple ways to do this. Often, we can just check whether the rertieved passages contain the answer.

That said, since our dev set includes the gold titles that should be retrieved, we can just use these here.

def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)
Average Metric: 13 / 50  (26.0): 100%|██████████| 50/50 [00:00<00:00, 671.76it/s]
Average Metric: 13 / 50  (26.0%)
  question example_answer gold_titles context pred_answer gold_passages_retrieved
0 Are both Cangzhou and Qionghai in the Hebei province of China? no {'Cangzhou', 'Qionghai'} ['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... No ❌ [False]
1 Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? National Hockey League {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} ['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was... National Hockey League ✔️ [True]
2 The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... Steve Yzerman {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} ['Steve Yzerman | Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager... Steve Yzerman ✔️ [True]
3 What river is near the Crichton Collegiate Church? the River Tyne {'Crichton Castle', 'Crichton Collegiate Church'} ["Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... River Tyne ✔️ [True]
4 In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? King Alfred the Great {'Æthelweard (son of Alfred)', 'Ealhswith'} ["Æthelweard of East Anglia | Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties... King Alfred the Great ❌ [False]
... 45 more rows not displayed ...

Although this simple compiled_rag program is able to answer a decent fraction of the questions correctly (on this tiny set, over 40%), the quality of retrieval is much lower.

This potentially suggests that the LM is often relying on the knowledge it memorized during training to answer questions. To address this weak retrieval, let’s explore a second program that involves more advanced search behavior.

5] Program 2: Multi-Hop Search (“Baleen”)

From exploring the harder questions in the training/dev sets, it becomes clear that a single search query is often not enough for this task. For instance, this can be seen when a question ask about, say, the birth city of the writer of “Right Back At It Again”. A search query identifies the author correctly as “Jeremy McKinnon”, but it wouldn’t figure out when he was born.

The standard approach for this challenge in the retrieval-augmented NLP literature is to build multi-hop search systems, like GoldEn (Qi et al., 2019) and Baleen (Khattab et al., 2021). These systems read the retrieved results and then generate additional queries to gather additional information if necessary. Using DSPy, we can easily simulate such systems in a few lines of code.

We’ll still use the GenerateAnswer signature from the RAG implementation above. All we need now is a signature for the “hop” behavior: taking some partial context and a question, generate a search query to find missing information.

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

Note: We could have written context = GenerateAnswer.signature.context to avoid duplicating the description of the context field.

Now, let’s define the program itself SimplifiedBaleen. There are many possible ways to implement this, but we’ll keep this version down to the key elements for simplicity.

from dsp.utils import deduplicate

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)

As we can see, the __init__ method defines a few key sub-modules:

  • generate_query: For each hop, we will have one dspy.ChainOfThought predictor with the GenerateSearchQuery signature.
  • retrieve: This module will do the actual search, using the generated queries.
  • generate_answer: This dspy.Predict module will be used after all the search steps. It has a GenerateAnswer, to actually produce an answer.

The forward method uses these sub-modules in simple control flow.

  1. First, we’ll loop up to self.max_hops times.
  2. In each iteration, we’ll generate a search query using the predictor at self.generate_query[hop].
  3. We’ll retrieve the top-k passages using that query.
  4. We’ll add the (deduplicated) passages to our accumulator of context.
  5. After the loop, we’ll use self.generate_answer to produce an answer.
  6. We’ll return a prediction with the retrieved context and predicted answer.
Inspect the zero-shot version of the Baleen program

We will also compile this program shortly. But, before that, we can try it out in a “zero-shot” setting (i.e., without any compilation).

Using a program in zero-shot (uncompiled) setting doesn’t mean that quality will be bad. It just means that we’re bottlenecked directly by the reliability of the underlying LM to understand our sub-tasks from minimal instructions.

This is often just fine when using the most expensive/powerful models (e.g., GPT-4) on the easiest and most standard tasks (e.g., answering simple questions about popular entities).

However, a zero-shot approach quickly falls short for more specialized tasks, for novel domains/settings, and for more efficient (or open) models. DSPy can help you in all of these settings.

# Ask any question you like to this simple RAG program.
my_question = "How many storeys are in the castle that David Gregory inherited?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()  # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
Question: How many storeys are in the castle that David Gregory inherited?
Predicted Answer: five
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," an...', 'Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daugh...', 'Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy....', 'Kinnaird Head | Kinnaird Head (Scottish Gaelic: "An Ceann Àrd" , "high headland") is a headland projecting into the North Sea, within the town of Fraserburgh, Aberdeenshire on the east coast of Scotla...', 'Kinnaird Castle, Brechin | Kinnaird Castle is a 15th-century castle in Angus, Scotland. The castle has been home to the Carnegie family, the Earl of Southesk, for more than 600 years....']

Let’s inspect the last three calls to the LM (i.e., generating the first hop’s query, generating the second hop’s query, and generating the answer).

turbo.inspect_history(n=3)




Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context: N/A

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to find the answer to this question. First, we need to find information about David Gregory and the castle he inherited. Then, we can search for details about the castle's architecture or any historical records that mention the number of storeys.

Query: "David Gregory castle inheritance"







Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," and one of the additions to her six-part series on the Tudor royals. (The other titles in the series are "The Constant Princess," "The Queen's Fool," "The Virgin's Lover,and The Other Queen.") * The novel is told through the first-person narratives of – Anne of Cleves, Katherine Howard, and Jane Boleyn, who was mentioned in "The Other Boleyn Girl." It covers a period from 1539 until 1542 and chronicles the fourth and fifth marriages of King Henry VIII of England.»
[3] «Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daughters. Gregory rapidly depleted the "publicum" (public land) of the Duchy of Gaeta by doling it out to family members as grants. Gregory disappears from the records in 964 and was succeeded by his younger brother Marinus of Fondi over the heads of his three sons. It is possible that there was an internal power struggle between factions of the Docibilan family and that Gregory was forced out. On the other hand, perhaps he died and his sons fought a losing battle for their inheritance to Gaeta.»

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to produce the query. We know that David Gregory inherited Kinnairdy Castle, so we need to find information about the castle and its characteristics.

Query: "Kinnairdy Castle number of storeys"







Answer questions with short factoid answers.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," and one of the additions to her six-part series on the Tudor royals. (The other titles in the series are "The Constant Princess," "The Queen's Fool," "The Virgin's Lover,and The Other Queen.") * The novel is told through the first-person narratives of – Anne of Cleves, Katherine Howard, and Jane Boleyn, who was mentioned in "The Other Boleyn Girl." It covers a period from 1539 until 1542 and chronicles the fourth and fifth marriages of King Henry VIII of England.»
[3] «Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daughters. Gregory rapidly depleted the "publicum" (public land) of the Duchy of Gaeta by doling it out to family members as grants. Gregory disappears from the records in 964 and was succeeded by his younger brother Marinus of Fondi over the heads of his three sons. It is possible that there was an internal power struggle between factions of the Docibilan family and that Gregory was forced out. On the other hand, perhaps he died and his sons fought a losing battle for their inheritance to Gaeta.»
[4] «Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy.»
[5] «Kinnaird Head | Kinnaird Head (Scottish Gaelic: "An Ceann Àrd" , "high headland") is a headland projecting into the North Sea, within the town of Fraserburgh, Aberdeenshire on the east coast of Scotland. The 16th-century Kinnaird Castle was converted in 1787 for use as the Kinnaird Head Lighthouse, the first lighthouse in Scotland to be lit by the Commissioners of Northern Lights. Kinnaird Castle and the nearby Winetower were described by W. Douglas Simpson as two of the nine castles of the Knuckle, referring to the rocky headland of north-east Aberdeenshire. Both buildings are category A listed buildings.»
[6] «Kinnaird Castle, Brechin | Kinnaird Castle is a 15th-century castle in Angus, Scotland. The castle has been home to the Carnegie family, the Earl of Southesk, for more than 600 years.»

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to produce the answer. We know from the context that David Gregory inherited Kinnairdy Castle. 

Answer: five


Compiling the Baleen program

Now is the time to compile our multi-hop (SimplifiedBaleen) program.

We will first define our validation logic, which will simply require that:

  • The predicted answer matches the gold answer.
  • The retrieved context contains the gold answer.
  • None of the generated queries is rambling (i.e., none exceeds 100 characters in length).
  • None of the generated queries is roughly repeated (i.e., none is within 0.8 or higher F1 score of earlier queries).
def validate_context_and_answer_and_hops(example, pred, trace=None):
    if not dspy.evaluate.answer_exact_match(example, pred): return False
    if not dspy.evaluate.answer_passage_match(example, pred): return False

    hops = [example.question] + [outputs.query for *_, outputs in trace if 'query' in outputs]

    if max([len(h) for h in hops]) > 100: return False
    if any(dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))): return False

    return True

Like we did for RAG, we’ll use one of the most basic teleprompters in DSPy, namely, BootstrapFewShot.

teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)
compiled_baleen = teleprompter.compile(SimplifiedBaleen(), teacher=SimplifiedBaleen(passages_per_hop=2), trainset=trainset)
100%|██████████| 20/20 [00:00<00:00, 64.11it/s]
Evaluating the Retrieval

Earlier, it appeared like our simple RAG program was not very effective at finding all evidence required for answering each question. Is this resolved by the adding some extra steps in the forward function of SimplifiedBaleen? What about compiling, does it help for that?

The answer for these questions is not always going to be obvious. However, DSPy makes it extremely easy to try many diverse approaches with minimal effort.

Let’s evaluate the quality of retrieval of our compiled and uncompiled Baleen pipelines!

uncompiled_baleen_retrieval_score = evaluate_on_hotpotqa(uncompiled_baleen, metric=gold_passages_retrieved, display=False)
compiled_baleen_retrieval_score = evaluate_on_hotpotqa(compiled_baleen, metric=gold_passages_retrieved)
Average Metric: 30 / 50  (60.0): 100%|██████████| 50/50 [00:00<00:00, 54.98it/s]
Average Metric: 30 / 50  (60.0%)
  question example_answer gold_titles context pred_answer gold_passages_retrieved
0 Are both Cangzhou and Qionghai in the Hebei province of China? no {'Cangzhou', 'Qionghai'} ['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... No ✔️ [True]
1 Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? National Hockey League {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} ["2017 NHL Expansion Draft | The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to... National Hockey League (NHL) ❌ [False]
2 The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... Steve Yzerman {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} ['List of Tampa Bay Lightning general managers | The Tampa Bay Lightning are an American professional ice hockey team based in Tampa, Florida. They play... Steve Yzerman ❌ [False]
3 What river is near the Crichton Collegiate Church? the River Tyne {'Crichton Castle', 'Crichton Collegiate Church'} ["Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... River Tyne ✔️ [True]
4 In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? King Alfred the Great {'Æthelweard (son of Alfred)', 'Ealhswith'} ['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.', 'Æthelred the Unready |... King Alfred the Great ❌ [False]
... 45 more rows not displayed ...
print(f"## Retrieval Score for RAG: {compiled_rag_retrieval_score}")  # note that for RAG, compilation has no effect on the retrieval step
print(f"## Retrieval Score for uncompiled Baleen: {uncompiled_baleen_retrieval_score}")
print(f"## Retrieval Score for compiled Baleen: {compiled_baleen_retrieval_score}")
## Retrieval Score for RAG: 26.0
## Retrieval Score for uncompiled Baleen: 36.0
## Retrieval Score for compiled Baleen: 60.0

Excellent! There might be something to this compiled, multi-hop program then. But this is far from all you can do: DSPy gives you a clean space of composable operators to deal with any shortcomings you see.

We can inspect a few concrete examples. If we see failure causes, we can:

  1. Expand our pipeline by using additional sub-modules (e.g., maybe summarize after retrieval?)
  2. Modify our pipeline by using more complex logic (e.g., maybe we need to break out of the multi-hop loop if we found all information we need?)
  3. Refine our validation logic (e.g., maybe use a metric that use a second DSPy program to do the answer evaluation, instead of relying on strict string matching)
  4. Use a different teleprompter to optimize your pipeline more aggressively.
  5. Add more or better training examples!

Or, if you really want, we can tweak the descriptions in the Signatures we use in your program to make them more precisely suited for their sub-tasks. This is akin to prompt engineering and should be a final resort, given the other powerful options that DSPy gives us!

compiled_baleen("How many storeys are in the castle that David Gregory inherited?")
turbo.inspect_history(n=3)




Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context: N/A

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield

Reasoning: Let's think step by step in order to produce the query. We know that the FA Charity Shield is an annual football match played in England between the winners of the previous season's Premier League and FA Cup. In this case, we are looking for the year when Manchester City played against a specific club in the 1972 FA Charity Shield. To find this information, we can search for the history of the FA Charity Shield and the teams that participated in the 1972 edition.

Query: "History of FA Charity Shield 1972"

---

Context: N/A

Question: Which is taller, the Empire State Building or the Bank of America Tower?

Reasoning: Let's think step by step in order to produce the query. We need to find the heights of both buildings and compare them.

Query: "height of Empire State Building" OR "height of Bank of America Tower"

---

Context: N/A

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?

Reasoning: Let's think step by step in order to produce the query. We can search for the birth dates of both Aleksandr Danilovich Aleksandrov and Anatoly Fomenko and compare them to determine who is older.

Query: "Birth date Aleksandr Danilovich Aleksandrov" "Birth date Anatoly Fomenko"

---

Context: N/A

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to produce the query. We need to find information about the castle that David Gregory inherited and determine the number of storeys it has.

Query: "Castle inherited by David Gregory number of storeys"







Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»
[2] «1971 FA Charity Shield | The 1971 FA Charity Shield was a football match between Leicester City and Liverpool at Filbert Street on Saturday 7 August 1971.»

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield

Reasoning: Let's think step by step in order to produce the query. We know that the 1972 FA Charity Shield was contested between Manchester City and Aston Villa. To find out the year the club founded that played against Manchester City, we need to search for the founding year of Aston Villa.

Query: "Aston Villa founding year"

---

Context:
[1] «Empire State Building | The Empire State Building is a 102-story skyscraper located on Fifth Avenue between West 33rd and 34th Streets in Midtown, Manhattan, New York City. It has a roof height of 1,250 feet (381 m), and with its antenna included, it stands a total of 1454 ft tall. Its name is derived from the nickname for New York, the Empire State.»
[2] «Bank of America Plaza (Atlanta) | Bank of America Plaza is a skyscraper located in between Midtown Atlanta and Downtown Atlanta. At 312 m , the tower is the 96th-tallest building in the world. It is the 14th tallest building in the U.S., the tallest building in Georgia and the tallest building in any U.S. state capital, overtaking the 250 m (820 ft), 50 story One Atlantic Center in height, which previously held the record as Georgia's tallest building. It has 55 stories of office space and was completed in 1992, when it was called NationsBank Plaza. Originally intended to be the headquarters for Citizens & Southern National Bank (which merged with Sovran Bank during construction), it became NationsBank's property following its formation in the 1991 hostile takeover of C&S/Sovran by NCNB.»

Question: Which is taller, the Empire State Building or the Bank of America Tower?

Reasoning: Let's think step by step in order to answer the question. We know that the Empire State Building has a roof height of 1,250 feet and a total height of 1,454 feet including its antenna. On the other hand, the Bank of America Plaza in Atlanta is 312 meters tall. To compare the heights of the two buildings, we need to convert the height of the Bank of America Plaza from meters to feet.

Query: "Convert 312 meters to feet"

---

Context:
[1] «Aleksandr Danilovich Aleksandrov | Aleksandr Danilovich Aleksandrov (Russian: Алекса́ндр Дани́лович Алекса́ндров , alternative transliterations: "Alexandr" or "Alexander" (first name), and "Alexandrov" (last name)) (August 4, 1912 – July 27, 1999), was a Soviet/Russian mathematician, physicist, philosopher and mountaineer.»
[2] «Aleksandr Pavlovich Aleksandrov | Aleksandr Pavlovich Aleksandrov (Russian: Александр Павлович Александров ; born February 20, 1943) is a former Soviet cosmonaut and twice Hero of the Soviet Union (November 23, 1983 and December 29, 1987).»

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?

Reasoning: Let's think step by step in order to produce the query. We know that Aleksandr Danilovich Aleksandrov was born on August 4, 1912, but we don't have information about Anatoly Fomenko's birthdate. To find out who is older, we need to compare their birthdates.

Query: "Anatoly Fomenko birthdate"

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «Gregory Parsloe-Parsloe | Sir Gregory Parsloe-Parsloe, 7th Baronet is a fictional character from the Blandings stories of P. G. Wodehouse. The seventh Baronet, who resides at Matchingham Hall, he is the son of the Very Reverend Dean Parsloe-Parsloe and is the rival and enemy of Lord Emsworth, master of Blandings Castle.»

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to produce the query. We know that David Gregory inherited Kinnairdy Castle and that he had three of his twenty-nine children become mathematics professors. To find out how many storeys are in the castle, we need to search for information about Kinnairdy Castle.

Query: "Kinnairdy Castle number of storeys"







Answer questions with short factoid answers.

---

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?
Answer: 2010

Question: On the coast of what ocean is the birthplace of Diogal Sakho?
Answer: Atlantic

Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?
Answer: Buena Vista Distribution

Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?
Answer: Tae Kwon Do Times

Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?
Answer: Rosario Dawson

Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?
Answer: Kerry Condon

Question: which American actor was Candace Kita guest starred with
Answer: Bill Murray

Question: "Everything Has Changed" is a song from an album released under which record label ?
Answer: Big Machine Records

Question: Tombstone stared an actor born May 17, 1955 known as who?
Answer: Bill Paxton

Question: Which of these publications was most recently published, Who Put the Bomp or Self?
Answer: Self

Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?
Answer: Operation Citadel

Question: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?
Answer: space

Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Answer: "Outfield of Dreams"

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»
[2] «1971 FA Charity Shield | The 1971 FA Charity Shield was a football match between Leicester City and Liverpool at Filbert Street on Saturday 7 August 1971.»
[3] «1896–97 Aston Villa F.C. season | The 1896-87 season was Aston Villa's ninth season in the Football League since being one of its 12 founding members in 1888.»
[4] «List of Aston Villa F.C. seasons | This is a list of seasons played by Aston Villa Football Club in English and European football, from 1879 (the year of the club's first FA Cup entry) to the most recent completed season. Aston Villa football club was founded in March, 1874, by members of the Villa Cross Wesleyan Chapel in Aston. Throughout the 1870s Aston Villa played a small amount of games. At least one game, against Aston Brook St Mary's was played with one half under Rugby rules and the other under football rules. In the 1880s the game became more formalised and in 1888, William McGregor formed the Football League with 11 other clubs.»

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield

Reasoning: Let's think step by step in order to produce the answer. We know that the 1972 FA Charity Shield was contested between Manchester City and Aston Villa. According to the context, Aston Villa Football Club was founded in March, 1874.

Answer: 1874

---

Context:
[1] «Empire State Building | The Empire State Building is a 102-story skyscraper located on Fifth Avenue between West 33rd and 34th Streets in Midtown, Manhattan, New York City. It has a roof height of 1,250 feet (381 m), and with its antenna included, it stands a total of 1454 ft tall. Its name is derived from the nickname for New York, the Empire State.»
[2] «Bank of America Plaza (Atlanta) | Bank of America Plaza is a skyscraper located in between Midtown Atlanta and Downtown Atlanta. At 312 m , the tower is the 96th-tallest building in the world. It is the 14th tallest building in the U.S., the tallest building in Georgia and the tallest building in any U.S. state capital, overtaking the 250 m (820 ft), 50 story One Atlantic Center in height, which previously held the record as Georgia's tallest building. It has 55 stories of office space and was completed in 1992, when it was called NationsBank Plaza. Originally intended to be the headquarters for Citizens & Southern National Bank (which merged with Sovran Bank during construction), it became NationsBank's property following its formation in the 1991 hostile takeover of C&S/Sovran by NCNB.»
[3] «Foot (unit) | The foot ( feet; abbreviation: ft; symbol: ′, the prime symbol) is a unit of length in the imperial and US customary systems of measurement. Since 1959, both units have been defined by international agreement as equivalent to 0.3048 meters exactly. In both systems, the foot comprises 12 inches and three feet compose a yard.»
[4] «Pentameter | Pentameter (from Greek: πεντάμετρος - 'measuring five (feet)') is a poetic meter. А poem is said to be written in a particular pentameter when the lines of the poem have the length of five feet, where 'foot' is a combination of a particular number (1 or 2) of unstressed (or weak) syllables and a stressed (or strong) syllable. Depending on the pattern of feet, pentameter can be iambic (one of three two-syllable meters alongside trochaic and spondaic) or dactylic (one of two three-syllable meters alongside anapestic) (see links below).»

Question: Which is taller, the Empire State Building or the Bank of America Tower?

Reasoning: Let's think step by step in order to produce the answer. We know that the Empire State Building has a roof height of 1,250 feet and with its antenna included, it stands a total of 1,454 feet tall. On the other hand, the Bank of America Plaza in Atlanta is 312 meters tall, which is approximately 1,024 feet. Therefore, the Empire State Building is taller than the Bank of America Plaza.

Answer: Empire State Building

---

Context:
[1] «Aleksandr Danilovich Aleksandrov | Aleksandr Danilovich Aleksandrov (Russian: Алекса́ндр Дани́лович Алекса́ндров , alternative transliterations: "Alexandr" or "Alexander" (first name), and "Alexandrov" (last name)) (August 4, 1912 – July 27, 1999), was a Soviet/Russian mathematician, physicist, philosopher and mountaineer.»
[2] «Aleksandr Pavlovich Aleksandrov | Aleksandr Pavlovich Aleksandrov (Russian: Александр Павлович Александров ; born February 20, 1943) is a former Soviet cosmonaut and twice Hero of the Soviet Union (November 23, 1983 and December 29, 1987).»
[3] «Anatoly Fomenko | Anatoly Timofeevich Fomenko (Russian: Анато́лий Тимофе́евич Фоме́нко ) (born 13 March 1945 in Stalino, USSR) is a Soviet and Russian mathematician, professor at Moscow State University, well known as a topologist, and a member of the Russian Academy of Sciences. He is author of a pseudoscientific theory known as New Chronology. He is also a member of the Russian Academy of Natural Sciences (1991).»
[4] «Semyon Fomin | Semyon Anatolyevich Fomin (Russian: Семён Анатольевич Фомин ; born 10 January 1989) is a Russian professional footballer. He plays as a midfielder for FC Luch-Energiya Vladivostok.»

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?

Reasoning: Let's think step by step in order to produce the answer. We know that Aleksandr Danilovich Aleksandrov was born on August 4, 1912, and Anatoly Fomenko was born on March 13, 1945. Therefore, Aleksandr Danilovich Aleksandrov is older.

Answer: Aleksandr Danilovich Aleksandrov

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «Gregory Parsloe-Parsloe | Sir Gregory Parsloe-Parsloe, 7th Baronet is a fictional character from the Blandings stories of P. G. Wodehouse. The seventh Baronet, who resides at Matchingham Hall, he is the son of the Very Reverend Dean Parsloe-Parsloe and is the rival and enemy of Lord Emsworth, master of Blandings Castle.»
[4] «Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy.»
[5] «Kinnaird Head | Kinnaird Head (Scottish Gaelic: "An Ceann Àrd" , "high headland") is a headland projecting into the North Sea, within the town of Fraserburgh, Aberdeenshire on the east coast of Scotland. The 16th-century Kinnaird Castle was converted in 1787 for use as the Kinnaird Head Lighthouse, the first lighthouse in Scotland to be lit by the Commissioners of Northern Lights. Kinnaird Castle and the nearby Winetower were described by W. Douglas Simpson as two of the nine castles of the Knuckle, referring to the rocky headland of north-east Aberdeenshire. Both buildings are category A listed buildings.»
[6] «Kinnaird Castle, Brechin | Kinnaird Castle is a 15th-century castle in Angus, Scotland. The castle has been home to the Carnegie family, the Earl of Southesk, for more than 600 years.»

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to produce the answer. We know that David Gregory inherited Kinnairdy Castle. According to the context, Kinnairdy Castle is a tower house with five storeys and a garret.

Answer: Five