{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to `guidance`\n", "\n", "This notebook is a terse tutorial walkthrough of the syntax of `guidance`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Models\n", "\n", "At the core of any guidance program are model objects that handle the actual text generation. You can create a model object using any of the constructors under `guidance.models`. Models can be run on local hardware or accessed through cloud APIs. \n", "\n", "Support for Guidance's constrained generation features varies by provider. The table below summarizes current model support across different providers. Refer to other notebooks for detailed documentation on using specific models with Guidance.\n", "\n", "### Model Support Table\n", "\n", "| Provider | Type | Constrained Generation Support |\n", "|----------|------|-------------------------------|\n", "| OpenAI | Remote | JSON generation |\n", "| Azure OpenAI | Remote | JSON generation |\n", "| LlamaCpp | Local | Full support |\n", "| HuggingFace Transformers | Local | Full support (may be slower and more memory-intensive than LlamaCpp) |\n", "\n", "Support for more remote providers is a work in progress.\n", "\n", "### Suggested Local Models\n", "\n", "For the smoothest experience, use one of the models below with the Transformers library. 
These models are known to work well with Guidance.\n", "\n", "| Model | Constructor | Notes |\n", "|-------|-------------|-------|\n", "| GPT-2 | `models.Transformers(\"openai-community/gpt2\")` | Runs on any hardware but does not generate very coherent text |\n", "| Qwen 2.5 Family | `models.Transformers(\"Qwen/Qwen2.5-1.5B\")` | Modest hardware requirements; alternative sizes available at [Qwen 2.5 collection](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) |\n", "| Mistral 7B v0.2 | `models.Transformers(\"mistralai/Mistral-7B-Instruct-v0.2\")` | 16 GB of VRAM recommended |\n", "| Phi 4 mini | `models.Transformers(\"microsoft/Phi-4-mini-instruct\")` | Solid all-around model; 16 GB of VRAM recommended |\n", "\n", "### Using LlamaCpp\n", "\n", "LlamaCpp offers better performance but may require more troubleshooting. For LlamaCpp, you need to provide the path on disk to a `.gguf` model file. One option used internally by the Guidance team is the [bartowski/Llama-3.2-3B-Instruct-GGUF](https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF) repository, specifically the `Llama-3.2-3B-Instruct-Q6_K_L.gguf` file. See the **LlamaCpp Guidance documentation** for detailed instructions about LlamaCpp usage.\n", "\n", "## Example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.\n", "Non-Nvidia GPU monitoring is not supported in this version. 
NVML Shared Library Not Found\n" ] } ], "source": [ "from guidance import models\n", "\n", "model = models.Transformers(\"Qwen/Qwen2.5-1.5B\")\n", "\n", "# If you want to use the suggested LlamaCpp model\n", "# from huggingface_hub import hf_hub_download\n", "# model = models.LlamaCpp(\n", "# hf_hub_download(\n", "# repo_id=\"bartowski/Llama-3.2-3B-Instruct-GGUF\",\n", "# filename=\"Llama-3.2-3B-Instruct-Q6_K_L.gguf\",\n", "# ),\n", "# verbose=True,\n", "# n_ctx=4096,\n", "# )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Simple generation\n", "\n", "Once you have an initial model object you can append text to it with the addition operator. This creates a new model object that has the same context (prompt) as the original model, but with the text appended at the end (just like what would happen if you add two strings together)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "18c8754c2ffd439aa4ccc170193b685d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "StitchWidget(initial_height='auto', initial_width='100%', srcdoc='\\n\\n
\\n …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import guidance\n", "lm = model + \"Who won the last Kentucky derby and by how much?\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have added some text to the model you can then ask the model to generate unconstrained text using the `gen` guidance function. Guidance functions represent executable components that can be appended to a model. When you append a guidance function to a model the model extends its state by executing the guidance function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that while the `lm` and `model` objects are semantically separate, for performance purposes they share the same model weights and KV cache, so the incremental creation of new lm objects is very cheap and reuses all the computation from prior objects.\n", "\n", "We can add the text and the `gen` function in one statement to follow the traditional prompt-then-generate pattern:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e45c4c42a2284ff4bad5693fa7710316", "version_major": 2, "version_minor": 0 }, "text/plain": [ "StitchWidget(initial_height='auto', initial_width='100%', srcdoc='\\n\\n\\n …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "