Introduction to guidance
This notebook is a terse tutorial walkthrough of the syntax of guidance.
Models
At the core of any guidance program are the immutable model objects. You can create an initial model object using any of the constructors under guidance.models:
[1]:
from guidance import models
# For LlamaCpp, you need to provide the path on disk to a .gguf model
# A sample model can be downloaded from
# https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf
mistral = models.LlamaCpp("/home/scottlundberg_google_com/models/mistral-7b-instruct-v0.2.Q8_0.gguf", n_gpu_layers=-1, n_ctx=4096)
#llama2 = models.Transformers("meta-llama/Llama-2-7b-hf")
#gpt3 = models.OpenAI("text-davinci-003")
#palm2 = models.VertexAI("text-bison@001")
Simple generation
Once you have an initial model object you can append text to it with the addition operator. This creates a new model object that has the same context (prompt) as the original model, but with the text appended at the end (just as if you had concatenated two strings).
[2]:
lm = mistral + "Who won the last Kentucky derby and by how much?"
Who won the last Kentucky derby and by how much?
Once you have added some text to the model you can ask it to generate unconstrained text using the gen guidance function. Guidance functions are executable components that can be appended to a model; when you append a guidance function to a model, the model extends its state by executing the function.
[3]:
from guidance import gen
lm + gen(max_tokens=10)
[3]:
Who won the last Kentucky derby and by how much? The last Kentucky Derby was held on
Note that while the lm and mistral objects are semantically separate, for performance purposes they share the same model weights and KV cache, so incrementally creating new lm objects is very cheap and reuses all the computation from prior objects.
We can add the text and the gen function in one statement to follow the traditional prompt-then-generate pattern:
[4]:
mistral + '''\
Q: Who won the last Kentucky derby and by how much?
A:''' + gen(stop="Q:")
[4]:
Q: Who won the last Kentucky derby and by how much? A: The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change.
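The stop="Q:" argument above ends generation as soon as the model emits the stop string, which keeps the model from continuing into a new "Q:" turn. A minimal sketch of that truncation semantics (an analogy, not guidance's internals, which stop at the token level during decoding):

```python
# Sketch of stop-string semantics: cut the generated text at the first
# occurrence of the stop string, keeping only the text before it.
def apply_stop(text: str, stop: str) -> str:
    i = text.find(stop)
    return text if i == -1 else text[:i]

print(apply_stop("The winner was Medina Spirit. Q: next question", "Q:"))
# only the answer portion survives
```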
Simple templates
You can define a template in guidance
(v0.1+) using f-strings. You can interpolate both standard variables and also guidance functions. Note that in Python 3.12 you can put anything into f-string slots, but in python 3.11 and below there are a few disallowed characters (like backslash).
[5]:
query = "Who won the last Kentucky derby and by how much?"
mistral + f'''\
Q: {query}
A: {gen(stop="Q:")}'''
[5]:
Q: Who won the last Kentucky derby and by how much? A: The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change.
Capturing variables
Often when you are building a guidance program you will want to capture specific portions of the output generated by the model. You can do this by giving a name to the element you wish to capture.
[6]:
query = "Who won the last Kentucky derby and by how much?"
lm = mistral + f'''\
Q: {query}
A: {gen(name="answer", stop="Q:")}'''
Q: Who won the last Kentucky derby and by how much? A: The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change.
Then we can access the variable by indexing into the final model object.
[7]:
lm["answer"]
[7]:
"The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change."
Function encapsulation
When you have a set of model operations you want to group together, you can place them into a custom guidance function. To do this you define a decorated Python function that takes a model as the first positional argument and returns a new updated model. You can add this guidance function to a model to execute it, just like with built-in guidance functions such as gen.
[8]:
import guidance

@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    A: {gen(name="answer", stop="Q:")}'''
    return lm

query = "Who won the last Kentucky derby and by how much?"
mistral + qa_bot(query) # note we don't pass the `lm` arg here (that will get passed during execution when it gets added to the model)
[8]:
Q: Who won the last Kentucky derby and by how much? A: The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change.
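The reason we can call qa_bot(query) without an lm argument is a deferred-call pattern: calling the decorated function returns a stub, and the model is supplied later, at the moment the stub is added. Here is a toy sketch of that pattern (not guidance's actual decorator, which also handles grammars and streaming):

```python
# Toy sketch of the deferred-call pattern: the decorated function, when
# called WITHOUT a model, returns a stub that waits for the model to arrive.
def toy_guidance(fn):
    def wrapper(**kwargs):
        # lm is intentionally missing here; it is bound at "addition" time
        return lambda lm: fn(lm, **kwargs)
    return wrapper

@toy_guidance
def qa(lm, query):
    return lm + f"Q: {query}\nA:"

stub = qa(query="hi")      # no lm passed; nothing executes yet
result = stub("PROMPT\n")  # the "model" supplies itself when added
```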
Note that one atypical feature of guidance functions is that multi-line string literals defined inside a guidance function respect the Python indentation structure. This means the whitespace before "Q:" and "A:" in the prompt above is stripped (but if they were indented 6 spaces instead of 4, only the first 4 spaces would be stripped, since that is the current Python indentation level). This lets us define multi-line templates inside guidance functions while retaining indentation readability (if you ever want to disable this behavior you can use @guidance(dedent=False)).
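This dedenting is similar in spirit to the standard library's textwrap.dedent (an analogy for the behavior, not guidance's exact implementation, which uses the function's indentation level rather than the literal's common prefix):

```python
import textwrap

# The template as it appears inside an indented function body:
template = '''\
    Q: {query}
    A:'''

# textwrap.dedent strips the common leading whitespace, which is what you
# want the model to actually see as the prompt.
prompt = textwrap.dedent(template)
print(prompt)
```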
Selecting among alternatives
Guidance has lots of ways to constrain model generation, but the most basic building block is the select function, which forces the model to choose among a set of options (either strings or full grammars).
[9]:
from guidance import select
mistral + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}'''
[9]:
Q: Who won the last Kentucky derby and by how much?
Now I will choose to either SEARCH the web or RESPOND.
Choice: SEARCH
Note that since guidance is smart about when tokens are forced by the program (and so do not need to be predicted by the model), only one token was generated in the program above (the beginning of "SEARCH", which is highlighted in green).
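The forcing logic can be sketched in pure Python: once the generated prefix matches exactly one of the remaining options, the rest of that option is determined and no further model calls are needed for it. (This is an illustration of the idea; guidance does this at the token level against a grammar.)

```python
# Sketch: given the options and the text generated so far, return the
# remainder that is forced (needs no model prediction), or None if the
# choice is still ambiguous.
def remaining_after(options, generated):
    matches = [o for o in options if o.startswith(generated)]
    if len(matches) == 1:
        return matches[0][len(generated):]  # forced completion
    return None  # more than one option still possible

# After the model emits just "SE", "SEARCH" is the only consistent option,
# so "ARCH" is forced:
print(remaining_after(["SEARCH", "RESPOND"], "SE"))
```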
Interleaved generation and control
Because guidance is pure Python code you can interleave (constrained) generation commands with traditional python control statements. In the example below we first ask the model to decide if it should search the web or respond directly, then act accordingly.
[10]:
@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    Now I will choose to either SEARCH the web or RESPOND.
    Choice: {select(["SEARCH", "RESPOND"], name="choice")}
    '''
    if lm["choice"] == "SEARCH":
        lm += "A: I don't know, Google it!"
    else:
        lm += f'A: {gen(stop="Q:", name="answer")}'
    return lm

mistral + qa_bot(query)
[10]:
Q: Who won the last Kentucky derby and by how much?
Now I will choose to either SEARCH the web or RESPOND.
Choice: SEARCH
A: I don't know, Google it!
Generating lists
Whenever you want to generate a list of items you can use the list_append parameter, which causes the captured value to be appended to a list instead of overwriting previous values.
[11]:
lm = mistral + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}
'''
if lm["choice"] == "SEARCH":
    lm += "Here are 3 search queries:\n"
    for i in range(3):
        lm += f'''{i+1}. "{gen(stop='"', name="queries", temperature=1.0, list_append=True)}"\n'''
Q: Who won the last Kentucky derby and by how much? Now I will choose to either SEARCH the web or RESPOND. Choice: SEARCH Here are 3 search queries: 1. "last Kentucky derby winner" 2. "latest Kentucky derby results" 3. "Kentucky derby winner 2021"
[12]:
lm["queries"]
[12]:
['last Kentucky derby winner',
'latest Kentucky derby results',
'Kentucky derby winner 2021']
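The difference between a plain named capture and list_append can be sketched as a small dictionary protocol: without list_append a later capture overwrites the earlier one, while with it the values accumulate. (A toy illustration of the semantics, not guidance's capture machinery.)

```python
# Sketch of capture semantics: overwrite by default, accumulate with
# list_append=True.
captures = {}

def capture(name, value, list_append=False):
    if list_append:
        captures.setdefault(name, []).append(value)
    else:
        captures[name] = value  # overwrites any previous value

capture("queries", "first query", list_append=True)
capture("queries", "second query", list_append=True)
capture("choice", "SEARCH")
capture("choice", "RESPOND")  # overwrites "SEARCH"
```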
Chat
You can control chat models using special with context blocks that wrap whatever is inside them in the special formats needed for the chat model you are using. This allows you to express chat programs without tying yourself to a single model backend.
[13]:
# to use role based chat tags you need a chat model, here we use gpt-3.5-turbo but you can use 'gpt-4' as well
gpt35 = models.OpenAI("gpt-3.5-turbo")
[14]:
from guidance import system, user, assistant
with system():
    lm = gpt35 + "You are a helpful assistant."

with user():
    lm += "What is the meaning of life?"

with assistant():
    lm += gen("response")
system: You are a helpful assistant.
user: What is the meaning of life?
assistant: The meaning of life is a philosophical question that has been debated for centuries. Different people and cultures have different beliefs and interpretations. Some believe that the meaning of life is to seek happiness and fulfillment, while others find meaning in religious or spiritual beliefs. Ultimately, the meaning of life may be subjective and can vary from person to person. It is up to each individual to explore and find their own sense of purpose and meaning in life.
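Under the hood, each backend turns the role blocks into its own chat format (special tags, message dicts, etc.). A toy sketch of that rendering step, using a made-up tag format purely for illustration (real backends each have their own templates):

```python
# Toy sketch: render (role, content) pairs into a hypothetical tag format.
# The <|role|>...<|end|> tags are invented for illustration; each real chat
# backend substitutes its own template here.
def format_chat(messages):
    return "".join(f"<|{role}|>{content}<|end|>" for role, content in messages)

rendered = format_chat([
    ("system", "You are a helpful assistant."),
    ("user", "What is the meaning of life?"),
])
print(rendered)
```

Because the role blocks are abstract, the same guidance program runs against any backend that knows how to do this rendering.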
Multistep
[15]:
# you can create and guide multi-turn conversations by using a series of role tags
@guidance
def experts(lm, query):
    with system():
        lm += "You are a helpful assistant."

    with user():
        lm += f"""\
        I want a response to the following question:
        {query}
        Who are 3 world-class experts (past or present) who would be great at answering this?
        Please don't answer the question or comment on it yet."""

    with assistant():
        lm += gen(name='experts', max_tokens=300)

    with user():
        lm += f"""\
        Great, now please answer the question as if these experts had collaborated in writing a joint anonymous answer.
        In other words, their identity is not revealed, nor is the fact that there is a panel of experts answering the question.
        If the experts would disagree, just present their different positions as alternatives in the answer itself (e.g. 'some might argue... others might argue...').
        Please start your answer with ANSWER:"""

    with assistant():
        lm += gen(name='answer', max_tokens=500)

    return lm

gpt35 + experts(query='What is the meaning of life?')
[15]:
system: You are a helpful assistant.
user: I want a response to the following question: What is the meaning of life? Who are 3 world-class experts (past or present) who would be great at answering this? Please don't answer the question or comment on it yet.
assistant: Sure, here are three world-class experts who have explored the question of the meaning of life: 1. Viktor Frankl: Viktor Frankl was an Austrian psychiatrist and Holocaust survivor. He is best known for his book "Man's Search for Meaning," in which he explores the importance of finding meaning in life, even in the face of extreme suffering. 2. Albert Camus: Albert Camus was a French philosopher and writer. He delved into existentialism and absurdism, questioning the meaning of life in a world that appears to lack inherent purpose. His works, such as "The Myth of Sisyphus," offer thought-provoking insights on the subject. 3. Dalai Lama: The Dalai Lama is the spiritual leader of Tibetan Buddhism. His teachings emphasize compassion, mindfulness, and the pursuit of happiness. He often discusses the importance of finding purpose and meaning in life through inner peace and altruistic actions. These experts have provided valuable perspectives on the meaning of life, each from their unique backgrounds and philosophies.
user: Great, now please answer the question as if these experts had collaborated in writing a joint anonymous answer. In other words, their identity is not revealed, nor is the fact that there is a panel of experts answering the question. If the experts would disagree, just present their different positions as alternatives in the answer itself (e.g. 'some might argue... others might argue...'). Please start your answer with ANSWER:
assistant: ANSWER: The meaning of life is a profound and complex question that has captivated thinkers throughout history. While there is no definitive answer, a collaboration of world-class experts offers various perspectives to ponder. Some might argue that the meaning of life lies in finding purpose and significance in our experiences. Viktor Frankl, drawing from his experiences in the Holocaust, suggests that meaning can be discovered through our ability to choose our attitudes and actions, even in the face of adversity. He emphasizes the importance of finding meaning in our relationships, work, and contributions to society. On the other hand, Albert Camus presents a different viewpoint. He explores the concept of absurdism, suggesting that life is inherently devoid of meaning. According to Camus, the universe is indifferent, and our existence is marked by a fundamental tension between our longing for meaning and the inherent meaninglessness of the world. In this perspective, individuals are challenged to create their own meaning and embrace the absurdity of existence. Another perspective comes from the Dalai Lama, who emphasizes the cultivation of compassion, mindfulness, and inner peace. He suggests that the meaning of life can be found in our ability to alleviate suffering, both within ourselves and others. By nurturing positive qualities and engaging in altruistic actions, we can discover a deeper sense of purpose and fulfillment. While these experts offer distinct viewpoints, their collective wisdom suggests that the meaning of life is a deeply personal and subjective journey. It may involve finding purpose in our relationships, embracing the absurdity of existence, or cultivating compassion and mindfulness. Ultimately, the search for meaning is an ongoing exploration that varies from person to person, influenced by individual experiences, beliefs, and values.
Streaming
Often you want to get the results of a generation as they happen so you can update an interface. You can do this programmatically using the .stream() method of model objects. This creates a ModelStream object that accumulates updates. These updates don't get executed until you iterate over the ModelStream object; as you iterate, you get a series of partially completed model objects as the guidance program executes.
[16]:
for part in mistral.stream() + qa_bot(query):
    part # do something with the partially executed lm
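The lazy-iteration pattern behind ModelStream can be sketched with a plain Python generator: building the object does no work, and each iteration step yields a partially completed state. (A toy analogy, not guidance's streaming machinery.)

```python
# Toy sketch of the ModelStream pattern: a generator that yields the
# accumulated state after each chunk; nothing runs until you iterate.
def toy_stream(chunks):
    text = ""
    for chunk in chunks:
        text += chunk
        yield text  # a "partially completed" state

gen_obj = toy_stream(["The ", "answer ", "is 42."])  # no work done yet
parts = list(gen_obj)  # execution happens here, during iteration
for part in parts:
    print(part)  # e.g. update a UI with each partial state
```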
Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!