As the capabilities of large language models (LLMs) continue to expand, developing robust AI systems that leverage their potential has become increasingly complex. Conventional approaches often involve intricate prompting techniques, data generation for fine-tuning, and manual guidance to ensure adherence to domain-specific constraints. However, this process can be tedious, error-prone, and heavily reliant on human intervention.
Enter DSPy, a framework designed to streamline the development of AI systems powered by LLMs. DSPy introduces a systematic approach to optimizing LM prompts and weights, so developers can build sophisticated applications with far less manual prompt engineering.
In this comprehensive guide, we’ll explore the core principles of DSPy, its modular architecture, and the array of powerful features it offers. We’ll also dive into practical examples, demonstrating how DSPy can transform the way you develop AI systems with LLMs.
What is DSPy, and Why Do You Need It?
DSPy is a framework that separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. This separation allows for the systematic optimization of LM prompts and weights, enabling you to build complex AI systems with greater reliability, predictability, and adherence to domain-specific constraints.
Traditionally, developing AI systems with LLMs involved a laborious process of breaking down the problem into steps, crafting intricate prompts for each step, generating synthetic examples for fine-tuning, and manually guiding the LMs to adhere to specific constraints. This approach was not only time-consuming but also prone to errors, as even minor changes to the pipeline, LM, or data could necessitate extensive rework of prompts and fine-tuning steps.
DSPy addresses these challenges by introducing a new paradigm: optimizers. These LM-driven algorithms can tune the prompts and weights of your LM calls, given a metric you want to maximize. By automating the optimization process, DSPy empowers developers to build robust AI systems with minimal manual intervention, enhancing the reliability and predictability of LM outputs.
DSPy’s Modular Architecture
At the heart of DSPy lies a modular architecture that facilitates the composition of complex AI systems. The framework provides a set of built-in modules that abstract various prompting techniques, such as dspy.ChainOfThought and dspy.ReAct. These modules can be combined and composed into larger programs, allowing developers to build intricate pipelines tailored to their specific requirements.
Each module encapsulates learnable parameters, including the instructions, few-shot examples, and LM weights. When a program is compiled, DSPy’s optimizers tune these parameters to maximize the chosen metric, steering the LM’s outputs toward the specified constraints and requirements.
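To make the idea of learnable parameters concrete, here is a minimal plain-Python sketch. It does not use DSPy: `PromptParams` and `render_prompt` are hypothetical names, and the prompt layout is an assumption chosen for readability rather than DSPy's actual format. The point is only that swapping the parameters changes the prompt while the program flow stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParams:
    """Hypothetical container for the text parameters of one step."""
    instructions: str
    demos: list = field(default_factory=list)  # few-shot examples as dicts


def render_prompt(params: PromptParams, question: str) -> str:
    """Build a prompt from instructions, demos, and the new question."""
    blocks = [params.instructions]
    for demo in params.demos:
        blocks.append(f"Question: {demo['question']}\nAnswer: {demo['answer']}")
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Same program flow, two different parameter settings.
zero_shot = PromptParams(instructions="Answer with a short factoid.")
few_shot = PromptParams(
    instructions="Answer with a short factoid.",
    demos=[{"question": "What is the capital of France?", "answer": "Paris"}],
)

zero_shot_prompt = render_prompt(zero_shot, "What is the capital of Japan?")
few_shot_prompt = render_prompt(few_shot, "What is the capital of Japan?")
print(few_shot_prompt)
```

An optimizer, in this picture, is whatever chooses the `instructions` and `demos` values for you.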
Optimizing with DSPy
DSPy introduces a range of powerful optimizers designed to enhance the performance and reliability of your AI systems. These optimizers leverage LM-driven algorithms to tune the prompts and weights of your LM calls, maximizing the specified metric while adhering to domain-specific constraints.
Some of the key optimizers available in DSPy include:
- BootstrapFewShot: This optimizer extends the signature by automatically generating and including optimized examples within the prompt sent to the model, implementing few-shot learning.
- BootstrapFewShotWithRandomSearch: Applies BootstrapFewShot several times with random search over generated demonstrations, selecting the best program found during optimization.
- MIPRO: Generates instructions and few-shot examples in each step, with the instruction generation being data-aware and demonstration-aware. It uses Bayesian Optimization to effectively search over the space of instructions and demonstrations across your modules.
- BootstrapFinetune: Distills a prompt-based DSPy program into weight updates for smaller LMs, allowing you to fine-tune the underlying LLM(s) for enhanced efficiency.
By leveraging these optimizers, developers can systematically optimize their AI systems, ensuring high-quality outputs while adhering to domain-specific constraints and requirements.
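The bootstrapping idea behind the first two optimizers can be sketched in plain Python. This is a simplified illustration, not DSPy's implementation: `bootstrap_demos`, `toy_teacher`, and `exact_match` are hypothetical names, and the "teacher" is a stub standing in for an LM call. The sketch runs the teacher over the training set and keeps only the outputs that pass the metric, so they can be reused as few-shot demonstrations.

```python
def bootstrap_demos(trainset, teacher, metric, max_demos=4):
    """Collect teacher outputs that pass the metric, for reuse as demos."""
    demos = []
    for example in trainset:
        prediction = teacher(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos


def toy_teacher(question):
    """Hypothetical stand-in for an LM: adds two numbers, but is
    deliberately wrong whenever the first number is 3."""
    a, b = (int(token) for token in question.split(" + "))
    return str(a + b) if a != 3 else "0"


def exact_match(example, prediction):
    return example["answer"] == prediction


trainset = [
    {"question": "1 + 1", "answer": "2"},
    {"question": "3 + 4", "answer": "7"},  # the toy teacher fails this one
    {"question": "2 + 5", "answer": "7"},
]

demos = bootstrap_demos(trainset, toy_teacher, exact_match)
print(demos)
```

The failed example is filtered out, which is why the quality of the metric matters so much for the demonstrations that end up in the prompt.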
Getting Started with DSPy
To illustrate the power of DSPy, let’s walk through a practical example of building a retrieval-augmented generation (RAG) system for question-answering.
Step 1: Setting up the Language Model and Retrieval Model
The first step involves configuring the language model (LM) and retrieval model (RM) within DSPy.
To install DSPy, run:
pip install dspy-ai
DSPy supports multiple LM and RM APIs, as well as local model hosting, making it easy to integrate your preferred models.
import dspy

# Configure the LM and RM
turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)
Step 2: Loading the Dataset
Next, we’ll load the HotPotQA dataset, which contains a collection of complex question-answer pairs typically answered in a multi-hop fashion.
from dspy.datasets import HotPotQA

# Load the dataset
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Specify the 'question' field as the input
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]
Step 3: Building Signatures
DSPy uses signatures to define the behavior of modules. In this example, we’ll define a signature for the answer generation task, specifying the input fields (context and question) and the output field (answer).
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
Step 4: Building the Pipeline
We’ll build our RAG pipeline as a DSPy module, which consists of an initialization method (__init__) to declare the sub-modules (dspy.Retrieve and dspy.ChainOfThought) and a forward method (forward) to describe the control flow of answering the question using these modules.
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        # Sub-modules: a retriever and a chain-of-thought answer generator
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # Retrieve passages, then answer the question using them as context
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
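The retrieve-then-generate control flow can also be traced without any model or server. The following is a toy sketch in plain Python, assuming a word-overlap retriever and a stub generator; `toy_retrieve`, `toy_generate`, and `toy_rag` are hypothetical names and do not reflect how dspy.Retrieve or ColBERTv2 rank passages.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(question, corpus, k=3):
    """Rank passages by word overlap with the question and keep the top k."""
    question_words = tokenize(question)
    ranked = sorted(
        corpus,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def toy_generate(context, question):
    """Hypothetical stand-in for the LM: echo the first word of the best passage."""
    return context[0].split()[0]


def toy_rag(question, corpus, num_passages=2):
    context = toy_retrieve(question, corpus, k=num_passages)  # step 1: retrieve
    answer = toy_generate(context, question)                  # step 2: generate
    return {"context": context, "answer": answer}


corpus = [
    "The Nile is a river in Africa.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
]

result = toy_rag("What is the capital of Japan?", corpus)
print(result["answer"])
print(result["context"])
```

As in the RAG module, the returned object carries both the retrieved context and the answer, which is what lets a metric check the two together.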
Step 5: Optimizing the Pipeline
With the pipeline defined, we can now optimize it using DSPy’s optimizers. In this example, we’ll use the BootstrapFewShot optimizer, which generates and selects effective prompts for our modules based on a training set and a metric for validation.
from dspy.teleprompt import BootstrapFewShot

# Validation metric: the answer must match exactly and appear in the retrieved context
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up the optimizer
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile the program
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
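For intuition about what such a metric checks, here is a plain-Python sketch of an exact-match plus passage-match test. The normalization rules (lowercasing, dropping punctuation and English articles) are an assumption of this sketch, and `normalize_text`, `toy_exact_match`, `toy_passage_match`, and `toy_validate` are hypothetical names rather than DSPy functions.

```python
import re
import string


def normalize_text(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def toy_exact_match(gold, predicted):
    """True when the normalized answers are identical."""
    return normalize_text(gold) == normalize_text(predicted)


def toy_passage_match(gold, passages):
    """True when the normalized gold answer occurs in some passage."""
    target = normalize_text(gold)
    return any(target in normalize_text(passage) for passage in passages)


def toy_validate(gold, predicted, passages):
    """Require both a correct answer and supporting retrieved context."""
    return toy_exact_match(gold, predicted) and toy_passage_match(gold, passages)


supported = toy_validate(
    "Eiffel Tower", "The Eiffel Tower!", ["Paris is home to the Eiffel Tower."]
)
unsupported = toy_validate(
    "Eiffel Tower", "The Eiffel Tower!", ["Rome has the Colosseum."]
)
print(supported, unsupported)
```

Requiring both conditions means a correct answer is only accepted when the retrieved passages actually support it, which keeps unsupported guesses out of the bootstrapped demonstrations.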
Step 6: Evaluating the Pipeline
After compiling the program, it is essential to evaluate its performance on a development set to ensure it meets the desired accuracy and reliability.
from dspy.evaluate import Evaluate

# Set up the evaluator
evaluate = Evaluate(devset=devset, metric=validate_context_and_answer, num_threads=4, display_progress=True, display_table=0)

# Evaluate the compiled RAG program
evaluation_result = evaluate(compiled_rag)
print(f"Evaluation Result: {evaluation_result}")
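Conceptually, an evaluator of this kind runs the program on every development example, applies the metric, and aggregates the results. The sketch below shows that loop in plain Python with a thread pool; `toy_evaluate` and the lookup-table "program" are hypothetical, and reporting the score as a percentage is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_evaluate(program, devset, metric, num_threads=4):
    """Score a program on a dev set and return the percentage of passes."""

    def score_one(example):
        prediction = program(example["question"])
        return 1 if metric(example, prediction) else 0

    # Examples are independent, so they can be scored concurrently.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score_one, devset))
    return 100.0 * sum(scores) / len(scores)


# Hypothetical program: a lookup table with one wrong entry.
known_answers = {
    "capital of France": "Paris",
    "capital of Japan": "Kyoto",  # deliberately wrong
    "capital of Italy": "Rome",
    "capital of Peru": "Lima",
}

devset = [
    {"question": "capital of France", "answer": "Paris"},
    {"question": "capital of Japan", "answer": "Tokyo"},
    {"question": "capital of Italy", "answer": "Rome"},
    {"question": "capital of Peru", "answer": "Lima"},
]

score = toy_evaluate(
    program=lambda question: known_answers.get(question, ""),
    devset=devset,
    metric=lambda example, prediction: example["answer"] == prediction,
)
print(score)
```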
Step 7: Inspecting Model History
For a deeper understanding of the model’s interactions, you can review the most recent generations by inspecting the model’s history.
# Inspect the model's history
turbo.inspect_history(n=1)
Step 8: Making Predictions
With the pipeline optimized and evaluated, you can now use it to make predictions on new questions.
# Example question
question = "Which award did Gary Zukav's first book receive?"

# Make a prediction using the compiled RAG program
prediction = compiled_rag(question)
print(f"Question: {question}")
print(f"Answer: {prediction.answer}")
print(f"Retrieved Contexts: {prediction.context}")
Minimal Working Example with DSPy
Now, let’s walk through a second, minimal working example that uses the GSM8K dataset of grade-school math problems and OpenAI’s gpt-3.5-turbo-instruct model.
Setup
First, ensure your environment is properly configured:
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

# Set up the LM
turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]
print(gsm8k_trainset)
The gsm8k_trainset and gsm8k_devset variables are lists of examples, each with a question field and an answer field.
Define the Module
Next, define a custom program utilizing the ChainOfThought module for step-by-step reasoning:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        # String signature: one input field (question), one output field (answer)
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)
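The string "question -> answer" is a compact signature: field names to the left of the arrow are inputs and those to the right are outputs. As a rough illustration of how such a string can be read, here is a small plain-Python parser; `parse_signature` is a hypothetical helper, not part of DSPy, and it ignores anything beyond comma-separated field names.

```python
def parse_signature(signature):
    """Split 'a, b -> c' into (['a', 'b'], ['c'])."""
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    left, right = signature.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    return inputs, outputs


print(parse_signature("question -> answer"))
print(parse_signature("context, question -> answer"))
```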
Compile and Evaluate the Model
Now compile it with the BootstrapFewShot teleprompter:
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

# Set up the optimizer
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize using the gsm8k_metric
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)

# Set up the evaluator
evaluate = Evaluate(devset=gsm8k_devset, metric=gsm8k_metric, num_threads=4, display_progress=True, display_table=0)
evaluate(optimized_cot)

# Inspect the model's history
turbo.inspect_history(n=1)
This example demonstrates how to set up your environment, define a custom module, compile it with a teleprompter, and evaluate the result on a held-out development split. With only 10 examples in each split, treat the score as a smoke test rather than a reliable benchmark.
Data Management in DSPy
DSPy operates with training, development, and test sets. For each example in your data, you typically have three types of values: inputs, intermediate labels, and final labels. While intermediate or final labels are optional, having a few example inputs is essential.
Creating Example Objects
Example objects in DSPy are similar to Python dictionaries but come with useful utilities:
qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")

print(qa_pair)
print(qa_pair.question)
print(qa_pair.answer)
Output:
Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys=None)
This is a question?
This is an answer.
Specifying Input Keys
In DSPy, Example objects have a with_inputs() method to mark specific fields as inputs:
print(qa_pair.with_inputs("question"))
print(qa_pair.with_inputs("question", "answer"))
Values can be accessed using the dot operator, and methods like inputs() and labels() return new Example objects containing only input or non-input keys, respectively.
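The split between input and label fields can be illustrated with a small stand-in class. `ToyExample` below is hypothetical and mimics only the handful of behaviours described above; it is not DSPy's Example class, and details such as treating unmarked examples as having no inputs in `labels()` are choices of this sketch.

```python
class ToyExample:
    """Hypothetical stand-in that mimics a small part of the Example interface."""

    def __init__(self, fields, input_keys=None):
        self._fields = dict(fields)
        self._input_keys = None if input_keys is None else set(input_keys)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: fall back to fields.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def keys(self):
        return list(self._fields)

    def with_inputs(self, *keys):
        """Return a copy with the given fields marked as inputs."""
        return ToyExample(self._fields, input_keys=keys)

    def inputs(self):
        """Return a new example holding only the input fields."""
        if self._input_keys is None:
            raise ValueError("no input keys set; call with_inputs() first")
        kept = {k: v for k, v in self._fields.items() if k in self._input_keys}
        return ToyExample(kept, input_keys=self._input_keys)

    def labels(self):
        """Return a new example holding only the non-input fields."""
        input_keys = self._input_keys or set()
        kept = {k: v for k, v in self._fields.items() if k not in input_keys}
        return ToyExample(kept)


qa_pair = ToyExample({"question": "This is a question?", "answer": "This is an answer."})
marked = qa_pair.with_inputs("question")

print(marked.inputs().keys())  # fields marked as inputs
print(marked.labels().keys())  # every other field
```

Marking inputs explicitly is what allows an optimizer or evaluator to pass only the input fields to the program and keep the labels for the metric.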
Optimizers in DSPy
A DSPy optimizer tunes the parameters of a DSPy program (i.e., prompts and/or LM weights) to maximize specified metrics. DSPy offers various built-in optimizers, each employing different strategies.
Available Optimizers
- BootstrapFewShot: Generates few-shot examples using provided labeled input and output data points.
- BootstrapFewShotWithRandomSearch: Applies BootstrapFewShot multiple times with random search over generated demonstrations.
- COPRO: Generates and refines new instructions for each step, optimizing them with coordinate ascent.
- MIPRO: Optimizes instructions and few-shot examples using Bayesian Optimization.
Choosing an Optimizer
If you’re unsure where to start, use BootstrapFewShotWithRandomSearch. As a rough guide based on how much data you have:
- For very little data (around 10 examples), use BootstrapFewShot.
- For slightly more data (around 50 examples), use BootstrapFewShotWithRandomSearch.
- For larger datasets (300+ examples), use MIPRO.
Here’s how to use BootstrapFewShotWithRandomSearch:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

config = dict(max_bootstrapped_demos=4, max_labeled_demos=4, num_candidate_programs=10, num_threads=4)

teleprompter = BootstrapFewShotWithRandomSearch(metric=YOUR_METRIC_HERE, **config)
optimized_program = teleprompter.compile(YOUR_PROGRAM_HERE, trainset=YOUR_TRAINSET_HERE)
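The random-search part can be pictured as follows: draw several random subsets of candidate demonstrations, score the program built from each subset, and keep the best. The sketch below is a simplified plain-Python illustration, not DSPy's implementation. `random_search` and `toy_score` are hypothetical, and the made-up `usefulness` number on each demo stands in for what would really be a metric score measured on a validation set.

```python
import random


def random_search(demo_pool, score_program, num_candidate_programs=10,
                  demos_per_program=2, seed=0):
    """Try random demo subsets and return the best one with its score."""
    rng = random.Random(seed)
    best_demos, best_score = None, float("-inf")
    for _ in range(num_candidate_programs):
        size = min(demos_per_program, len(demo_pool))
        candidate = rng.sample(demo_pool, k=size)
        candidate_score = score_program(candidate)
        if candidate_score > best_score:
            best_demos, best_score = candidate, candidate_score
    return best_demos, best_score


# Hypothetical pool: "usefulness" is invented for illustration only.
demo_pool = [
    {"id": "d1", "usefulness": 1},
    {"id": "d2", "usefulness": 2},
    {"id": "d3", "usefulness": 3},
    {"id": "d4", "usefulness": 4},
    {"id": "d5", "usefulness": 5},
]


def toy_score(demos):
    """Stand-in for evaluating a program that uses these demos."""
    return sum(demo["usefulness"] for demo in demos)


best_demos, best_score = random_search(demo_pool, toy_score)
print([demo["id"] for demo in best_demos], best_score)
```

Raising `num_candidate_programs` explores more subsets at the cost of more evaluations, which is the same trade-off the real setting of that name controls.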
Saving and Loading Optimized Programs
After running a program through an optimizer, save it for future use:
optimized_program.save(YOUR_SAVE_PATH)
Load a saved program:
loaded_program = YOUR_PROGRAM_CLASS()
loaded_program.load(path=YOUR_SAVE_PATH)
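The save-then-load pattern works because an optimized program's learned state (instructions and demonstrations) is plain data that can be written to disk and restored into a fresh instance of the same class. A minimal sketch of that round trip, assuming a JSON file and a hypothetical `ToyProgram` class whose state layout is invented for this example:

```python
import json
import os
import tempfile


class ToyProgram:
    """Hypothetical program whose learned state is instructions plus demos."""

    def __init__(self):
        self.instructions = "Answer the question."
        self.demos = []

    def save(self, path):
        state = {"instructions": self.instructions, "demos": self.demos}
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)

    def load(self, path):
        with open(path, encoding="utf-8") as handle:
            state = json.load(handle)
        self.instructions = state["instructions"]
        self.demos = state["demos"]


# Pretend an optimizer attached one demonstration.
optimized = ToyProgram()
optimized.demos = [{"question": "1 + 1", "answer": "2"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "toy_program.json")
    optimized.save(save_path)

    # Loading needs a fresh instance of the same class first.
    restored = ToyProgram()
    restored.load(save_path)

print(restored.demos)
```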
Advanced Features: DSPy Assertions
DSPy Assertions automate the enforcement of computational constraints on LMs, enhancing the reliability, predictability, and correctness of LM outputs.
Using Assertions
Define validation functions and declare assertions following the respective model generation. For example:
dspy.Suggest(
    len(query) <= 100,
    "Query should be short and less than 100 characters",
)

dspy.Suggest(
    validate_query_distinction_local(prev_queries, query),
    "Query should be distinct from: "
    + "; ".join(f"{i+1}) {q}" for i, q in enumerate(prev_queries)),
)
Transforming Programs with Assertions
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

baleen_with_assertions = assert_transform_module(SimplifiedBaleenAssertions(), backtrack_handler)
Alternatively, activate assertions directly on the program:
baleen_with_assertions = SimplifiedBaleenAssertions().activate_assertions()
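The backtracking idea can be sketched without DSPy: when a constraint fails, the generation step is retried with the failure message supplied as feedback. The code below is a simplified plain-Python illustration with hypothetical names (`generate_with_backtracking`, `toy_query_writer`). The split it makes between a soft constraint that returns the last attempt and a hard one that raises is an assumption of this sketch.

```python
def generate_with_backtracking(generate, constraint, message,
                               max_retries=2, hard=False):
    """Retry generation with feedback until the constraint holds."""
    feedback = None
    output = None
    for _ in range(max_retries + 1):
        output = generate(feedback)
        if constraint(output):
            return output
        feedback = message  # pass the failure message to the next attempt
    if hard:
        raise ValueError(f"constraint still failing after retries: {message}")
    return output  # soft constraint: keep the last attempt and move on


def toy_query_writer(feedback):
    """Hypothetical stand-in for an LM: verbose at first, concise after feedback."""
    if feedback is None:
        return "x" * 150
    return "short query about DSPy"


query = generate_with_backtracking(
    toy_query_writer,
    constraint=lambda q: len(q) <= 100,
    message="Query should be short and less than 100 characters",
)
print(query)
```

The first attempt violates the length constraint, so the second call receives the message as feedback and produces an output that passes.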
Assertion-Driven Optimizations
DSPy Assertions work with DSPy optimizations, particularly with BootstrapFewShotWithRandomSearch, including settings like:
- Compilation with Assertions
- Compilation + Inference with Assertions
Conclusion
DSPy offers a powerful and systematic approach to optimizing language models and their prompts. By following the steps outlined in these examples, you can build, optimize, and evaluate complex AI systems with ease. DSPy’s modular design and advanced optimizers allow for efficient and effective integration of various language models, making it a valuable tool for anyone working in the field of NLP and AI.
Whether you’re building a simple question-answering system or a more complex pipeline, DSPy provides the flexibility and robustness needed to achieve high performance and reliability.