DSPy: Tutorial @ SkyCamp

This notebook contains the DSPy tutorial for SkyCamp 2023.

Let’s begin by setting things up. The snippet below will also install DSPy if it’s not there already.

%load_ext autoreload
%autoreload 2

import sys
import os

try:  # On Google Colab, clone the repo so we can download the cache.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path
except ImportError:
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

import pkg_resources  # Install the package if it's not installed
if "dspy-ai" not in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install -U pip
    # !pip install dspy-ai
    !pip install -e $repo_path

!pip install transformers

import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune

1) Configure the default LM and retriever

We’ll start by setting up the language model (LM) and retrieval model (RM). DSPy supports multiple API and local models.

In this notebook, we will use Llama2-13b-chat, served via the HuggingFace Text Generation Inference (TGI) infrastructure. In principle, you could run this on your own local GPUs, but for this tutorial all examples are pre-cached, so you don’t need to worry about cost.

We will use ColBERTv2 as our retriever. To make things easy, we’ve set up a ColBERTv2 server hosting a Wikipedia 2017 “abstracts” search index (i.e., containing the first paragraph of each article from a 2017 dump), so you don’t need to worry about setting one up! It’s free.

Note: If you run this notebook as instructed, you don’t need an API key. All examples are already cached internally so you can inspect them!

llama = dspy.HFClientTGI(model="meta-llama/Llama-2-13b-chat-hf", port=[7140, 7141, 7142, 7143], max_tokens=150)
colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# # NOTE: After you finish this notebook, you can use GPT-3.5 like this if you like.
# turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct')
# # In that case, make sure to configure lm=turbo below if you choose to do that.

dspy.settings.configure(rm=colbertv2, lm=llama)

2) Create a few question–answer pairs for our task

train = [('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', 'Kevin Greutert'),
         ('The heir to the Du Pont family fortune sponsored what wrestling team?', 'Foxcatcher'),
         ('In what year was the star of To Hell and Back born?', '1925'),
         ('Which award did the first book of Gary Zukav receive?', 'U.S. National Book Award'),
         ('What documentary about the Gilgo Beach Killer debuted on A&E?', 'The Killing Season'),
         ('Which author is English: John Braine or Studs Terkel?', 'John Braine'),
         ('Who produced the album that included a re-recording of "Lithium"?', 'Butch Vig')]

train = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]
dev = [('Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?', 'E. L. Doctorow'),
       ('Right Back At It Again contains lyrics co-written by the singer born in what city?', 'Gainesville, Florida'),
       ('What year was the party of the winner of the 1971 San Francisco mayoral election founded?', '1828'),
       ('Anthony Dirrell is the brother of which super middleweight title holder?', 'Andre Dirrell'),
       ('The sports nutrition business established by Oliver Cookson is based in which county in the UK?', 'Cheshire'),
       ('Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.', 'February 13, 1980'),
       ('Kyle Moran was born in the town on what river?', 'Castletown River'),
       ("The actress who played the niece in the Priest film was born in what city, country?", 'Surrey, England'),
       ('Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.', 'Portrait of a Marriage'),
       ('What year was the father of the Princes in the Tower born?', '1442'),
       ('What river is near the Crichton Collegiate Church?', 'the River Tyne'),
       ('Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?', 'Renault'),
       ('André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization?', 'the Wehrmacht')]

dev = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]

3) Key Concepts: Signatures & Modules

# Define a dspy.Predict module with the signature `question -> answer` (i.e., takes a question and outputs an answer).
predict = dspy.Predict('question -> answer')

# Use the module!
predict(question="What is the capital of Germany?")

In the example above, we used the dspy.Predict module zero-shot, i.e. without compiling it on any examples.
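The string 'question -> answer' is DSPy’s signature shorthand: input fields on the left of the arrow, output fields on the right, comma-separated when there are several. Roughly, such a string splits into fields like this (a sketch for intuition only, not DSPy’s actual parser):

```python
def split_signature(signature: str):
    """Split an 'inputs -> outputs' shorthand into two lists of field names (illustrative only)."""
    inputs, outputs = signature.split('->')
    return ([field.strip() for field in inputs.split(',')],
            [field.strip() for field in outputs.split(',')])

split_signature('context, question -> answer')
# (['context', 'question'], ['answer'])
```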

Let’s now build a slightly more advanced program. Our program will use the dspy.ChainOfThought module, which asks the LM to think step by step.

We will call this program CoT.

class CoT(dspy.Module):  # let's define a new module
    def __init__(self):
        super().__init__()

        # here we declare the chain of thought sub-module, so we can later compile it (e.g., teach it a prompt)
        self.generate_answer = dspy.ChainOfThought('question -> answer')
    
    def forward(self, question):
        return self.generate_answer(question=question)  # here we use the module

Now let’s compile this using our seven train examples. We will use the very simple BootstrapFewShot teleprompter in DSPy.

metric_EM = dspy.evaluate.answer_exact_match

teleprompter = BootstrapFewShot(metric=metric_EM, max_bootstrapped_demos=2)
cot_compiled = teleprompter.compile(CoT(), trainset=train)
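The answer_exact_match metric compares the predicted and gold answers after SQuAD-style normalization (lowercasing, removing punctuation and articles, collapsing whitespace). A simplified sketch of that idea, not DSPy’s exact implementation:

```python
import re
import string

def normalize_text(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize_text(prediction) == normalize_text(gold)

exact_match('The U.S. National Book Award!', 'U.S. National Book Award')  # True
```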

Let’s ask a question to this new program.

cot_compiled("What is the capital of Germany?")

You might be curious what’s happening under the hood. Let’s inspect the last call to our Llama LM to see the prompt and the output.

llama.inspect_history(n=1)

Notice how the prompt ends with the question we asked (“What is the capital of Germany?”), but before that it includes few-shot examples.

The final example in the prompt contains a rationale (step-by-step reasoning) that the LM self-generated for use as a demonstration, for the training question “Which author is English: John Braine or Studs Terkel?”.

Now, let’s evaluate on our development set.

NUM_THREADS = 32
evaluate_hotpot = Evaluate(devset=dev, metric=metric_EM, num_threads=NUM_THREADS, display_progress=True, display_table=15)
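Conceptually, Evaluate runs the program on each dev example, scores the output with the metric in parallel across the worker threads, and reports the percentage correct. A rough sketch of that loop with a toy program and metric (not DSPy’s implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_sketch(program, devset, metric, num_threads=32):
    """Score `program` on every example and return the percentage that pass `metric`."""
    def score(example):
        prediction = program(example['question'])
        return bool(metric(prediction, example['answer']))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))
    return 100.0 * sum(scores) / len(scores)

# Toy usage: a fake "program" that gets one of the two questions right.
toy_dev = [{'question': 'q1', 'answer': 'a1'}, {'question': 'q2', 'answer': 'a2'}]
toy_program = lambda q: {'q1': 'a1', 'q2': 'wrong'}[q]
evaluate_sketch(toy_program, toy_dev, metric=lambda p, g: p == g)  # 50.0
```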

First, let’s evaluate the compiled CoT program with Llama. Feel free to replace cot_compiled below with CoT() (notice the parentheses) to test the zero-shot version of CoT.

evaluate_hotpot(cot_compiled)

4) Bonus 1: RAG with query generation

As a bonus, let’s define a more sophisticated program called RAG. This program will:

  • Use the LM to generate a search query based on the input question
  • Retrieve three passages using our retriever
  • Use the LM to generate a final answer using these passages

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        # declare three modules: the retriever, a query generator, and an answer generator
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        # generate a search query from the question, and use it to retrieve passages
        search_query = self.generate_query(question=question).search_query
        passages = self.retrieve(search_query).passages

        # generate an answer from the passages and the question
        return self.generate_answer(context=passages, question=question)

Out of curiosity, we can evaluate the uncompiled (or zero-shot) version of this program.

evaluate_hotpot(RAG(), display_table=0)

Let’s now compile this RAG program. We’ll use a slightly more advanced teleprompter (automatic prompt optimizer) this time, one that relies on random search.

teleprompter2 = BootstrapFewShotWithRandomSearch(metric=metric_EM, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)
rag_compiled = teleprompter2.compile(RAG(), trainset=train, valset=dev)

Let’s now evaluate this compiled version of RAG.

evaluate_hotpot(rag_compiled)

Let’s inspect one of the LM calls for this. Focus in particular on the structure of the last few input/output examples in the prompt.

rag_compiled("What year was the party of the winner of the 1971 San Francisco mayoral election founded?")
llama.inspect_history(n=1)

5) Bonus 2: Multi-Hop Retrieval and Reasoning

Let’s now build a simple multi-hop program, which will interleave multiple calls to the LM and the retriever.

Please follow the TODO instructions below to implement this.

from dsp.utils.utils import deduplicate

class MultiHop(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")

        # TODO: Define a dspy.ChainOfThought module with the signature 'context, question -> search_query'.
        self.generate_query_from_context = None

        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        passages = []
        
        search_query = self.generate_query(question=question).search_query
        passages += self.retrieve(search_query).passages

        # TODO: Replace `None` with a call to self.generate_query_from_context to generate a search query.
        # Note: In DSPy, always pass keyword arguments (e.g., context=..., question=...) to the modules to avoid ambiguity.
        # Note 2: Don't forget to access the field .search_query to extract that from the output of the module.
        search_query2 = None

        # TODO: Replace `None` with a call to self.retrieve to retrieve passages. Append them to the list `passages`.
        passages += None

        return self.generate_answer(context=deduplicate(passages), question=question)
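The imported deduplicate helper removes repeated passages while preserving their original order, so the same passage retrieved in both hops appears only once in the context. A minimal order-preserving equivalent, shown for reference (the notebook itself uses the imported helper):

```python
def deduplicate_sketch(passages):
    """Drop repeated items while keeping first-occurrence order."""
    seen = set()
    unique = []
    for passage in passages:
        if passage not in seen:
            seen.add(passage)
            unique.append(passage)
    return unique

deduplicate_sketch(['p1', 'p2', 'p1', 'p3'])  # ['p1', 'p2', 'p3']
```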

multihop_compiled = teleprompter2.compile(MultiHop(), trainset=train, valset=dev)
evaluate_hotpot(multihop_compiled, devset=dev)

Let’s now inspect the prompt for the second-hop search query for one of the questions.

multihop_compiled(question="Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?")
llama.inspect_history(n=1, skip=2)