Prompt Engineering: Structured Output from LLMs

Function calling and JSON mode for reliable AI integrations

Large language models excel at generating human-like text, but production systems require predictable, machine-readable outputs. When you ask an LLM to extract entities from text or generate structured data, you need JSON with specific fields—not prose with the answer buried in markdown code blocks. This fundamental mismatch between conversational AI and programmatic interfaces creates integration challenges that naive prompting cannot solve.

The Structured Output Problem

Traditional prompt engineering relies on instructions embedded in natural language. You might write: "Extract the customer name, email, and order total from this text and return it as JSON." The model often complies, but the output format varies unpredictably. Sometimes you get valid JSON. Other times you receive markdown-wrapped code blocks, explanatory text before the JSON, or fields with slightly different names than specified.

This inconsistency breaks downstream systems. Your parser expects customer_email but receives email or customerEmail. The model adds a confidence_score field you didn't request. Or it returns a string when you need an array. Each variation requires defensive coding, error handling, and retry logic that compounds system complexity.
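
To make the cost of that defensive coding concrete, here is a minimal sketch of what a parser for naively prompted output tends to look like. The helper names `extractJsonObject` and `normalizeKeys` and the alias table are hypothetical, chosen only for illustration; the sketch assumes the reply contains at most one JSON object.

```typescript
// Build the fence marker programmatically so this example can sit inside a fenced block.
const FENCE = '`'.repeat(3);
const FENCED_JSON = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);

// Hypothetical helper: pull the first JSON object out of a free-form model reply.
function extractJsonObject(reply: string): unknown {
  // Prefer the contents of a markdown code fence when one is present.
  const fenced = reply.match(FENCED_JSON);
  const candidate = fenced?.[1] ?? reply;

  // Use the outermost braces to skip explanatory text around the JSON.
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }
  return JSON.parse(candidate.slice(start, end + 1));
}

// Hypothetical alias table for field names the model might substitute.
const FIELD_ALIASES: Record<string, string> = {
  email: 'customer_email',
  customerEmail: 'customer_email',
};

// Rename known aliases to the field names the downstream parser expects.
function normalizeKeys(data: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(data)) {
    out[FIELD_ALIASES[key] ?? key] = value;
  }
  return out;
}
```

Every new variation in the model's output means another branch in code like this, which is the maintenance burden the mechanisms described next are meant to remove.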

The root cause isn't model capability—modern LLMs understand JSON perfectly. The issue is that conversational interfaces prioritize helpfulness over strict adherence to schemas. Models are trained to be flexible, verbose, and explanatory. These traits help human users but sabotage programmatic consumers.

Modern Solutions: Function Calling and JSON Mode

Leading LLM providers now offer two mechanisms specifically designed for structured output: function calling (also called tool use) and JSON mode. Both constrain model outputs to machine-readable form, but only function calling ties the output to a predefined schema, so the two suit different use cases.

Function Calling: Schema-First Integration

Function calling treats LLM interactions as remote procedure calls. You define functions with typed parameters using JSON Schema, and the model returns structured arguments shaped by that specification. The model doesn't execute functions; it generates the arguments you'd need to call them.

Here's a practical example using OpenAI's API with TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'extract_customer_order',
      description: 'Extract structured customer order information from text',
      parameters: {
        type: 'object',
        properties: {
          customer_name: {
            type: 'string',
            description: 'Full name of the customer',
          },
          email: {
            type: 'string',
            format: 'email',
            description: 'Customer email address',
          },
          order_items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                product: { type: 'string' },
                quantity: { type: 'integer' },
                price: { type: 'number' },
              },
              required: ['product', 'quantity', 'price'],
            },
          },
          total_amount: {
            type: 'number',
            description: 'Total order amount in USD',
          },
        },
        required: ['customer_name', 'email', 'order_items', 'total_amount'],
      },
    },
  },
];

async function extractOrderData(emailText: string) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'user',
        content: `Extract order information from this email: ${emailText}`,
      },
    ],
    tools,
    tool_choice: { type: 'function', function: { name: 'extract_customer_order' } },
  });

  const toolCall = response.choices[0].message.tool_calls?.[0];
  if (toolCall?.function.name === 'extract_customer_order') {
    return JSON.parse(toolCall.function.arguments);
  }

  throw new Error('Model did not return expected function call');
}

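The tool_choice parameter forces the model to call your function, eliminating conversational responses. The arguments usually follow your JSON Schema, but without a provider-side strict mode they are not guaranteed to conform: a required field can be missing, a type can be off, and keywords such as format are treated as hints. Validate the parsed arguments before using them.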
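
Where stronger guarantees are needed, OpenAI documents a strict flag on function definitions that constrains generation to the schema, provided every object sets additionalProperties to false and lists all of its properties as required. The sketch below prepares a schema for that mode. The helper `toStrictSchema` is hypothetical, and the example assumes your provider and model support strict mode; the shortened schema is for illustration only.

```typescript
type JsonSchema = { [key: string]: unknown };

// Hypothetical helper: recursively close every object schema so that it lists
// all properties as required and forbids additional ones.
function toStrictSchema(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...schema };

  if (out.type === 'object' && typeof out.properties === 'object' && out.properties !== null) {
    const strictProps: Record<string, JsonSchema> = {};
    for (const [key, child] of Object.entries(out.properties as Record<string, JsonSchema>)) {
      strictProps[key] = toStrictSchema(child);
    }
    out.properties = strictProps;
    out.required = Object.keys(strictProps);
    out.additionalProperties = false;
  }

  if (out.type === 'array' && typeof out.items === 'object' && out.items !== null) {
    out.items = toStrictSchema(out.items as JsonSchema);
  }

  return out;
}

const strictTool = {
  type: 'function' as const,
  function: {
    name: 'extract_customer_order',
    description: 'Extract structured customer order information from text',
    strict: true, // assumption: the provider and model support strict mode
    parameters: toStrictSchema({
      type: 'object',
      properties: {
        customer_name: { type: 'string' },
        order_items: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              product: { type: 'string' },
              quantity: { type: 'integer' },
            },
          },
        },
      },
    }),
  },
};
```

Note the trade-off: because every property becomes required, a field that may be absent has to be modeled as nullable rather than optional.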

JSON Mode: Flexible Structured Output

JSON mode guarantees the model returns valid JSON without requiring function definitions. This approach works well when you need structured data but your schema varies by request or you want the model to determine the structure.

async function analyzeWithJsonMode(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a data extraction assistant. Always respond with valid JSON.',
      },
      {
        role: 'user',
        content: prompt,
      },
    ],
    response_format: { type: 'json_object' },
  });

  return JSON.parse(response.choices[0].message.content || '{}');
}

// Usage
const result = await analyzeWithJsonMode(`
  Analyze this customer feedback and return JSON with sentiment, key_issues (array), 
  and priority (high/medium/low):

  "The product arrived damaged and customer service was unhelpful. 
  I've been waiting 3 days for a response."
`);

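JSON mode requires you to explicitly request JSON in your prompt; OpenAI's API rejects a json_object request whose messages never mention JSON. The model has freedom in structuring the response, and the guarantee covers syntax only. A response cut off by the token limit can still be incomplete, so check the finish reason before parsing. This flexibility helps with exploratory tasks where rigid schemas are premature.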
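
Because JSON mode leaves the structure to the model, the parsed value still needs a shape check before other code relies on it. A minimal sketch for the feedback example, assuming the three fields named in the prompt; the `FeedbackAnalysis` interface and `isFeedbackAnalysis` guard are hypothetical names introduced here.

```typescript
type Priority = 'high' | 'medium' | 'low';

// Hypothetical shape matching the fields requested in the feedback prompt.
interface FeedbackAnalysis {
  sentiment: string;
  key_issues: string[];
  priority: Priority;
}

// Type guard: returns true only when every requested field has the expected type.
function isFeedbackAnalysis(value: unknown): value is FeedbackAnalysis {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sentiment === 'string' &&
    Array.isArray(v.key_issues) &&
    v.key_issues.every((issue) => typeof issue === 'string') &&
    (v.priority === 'high' || v.priority === 'medium' || v.priority === 'low')
  );
}
```

A caller would run the guard on the parsed result and retry or reject the response when it returns false.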

Choosing Between Approaches

Use function calling when:

You need guaranteed schema compliance for downstream systems
Type safety matters (integers vs strings, required vs optional fields)
You're building APIs or data pipelines with strict contracts
You want to validate outputs against JSON Schema automatically

Use JSON mode when:

Schema requirements are flexible or vary by request
You're prototyping and iterating on data structures
The model should determine appropriate fields based on input
You need valid JSON but not strict type enforcement

Implementation Patterns for Production

Schema Validation and Type Safety

Even with function calling, validate outputs before using them. Models occasionally generate values that satisfy JSON Schema but violate business logic:

import Ajv from 'ajv';
import addFormats from 'ajv-formats';

const ajv = new Ajv();
addFormats(ajv);

interface OrderData {
  customer_name: string;
  email: string;
  order_items: Array<{
    product: string;
    quantity: number;
    price: number;
  }>;
  total_amount: number;
}

// Compile the schema once at module load; compiling inside the function
// would rebuild the validator on every call.
const orderSchema = {
  type: 'object',
  properties: {
    customer_name: { type: 'string', minLength: 1 },
    email: { type: 'string', format: 'email' },
    order_items: {
      type: 'array',
      minItems: 1,
      items: {
        type: 'object',
        properties: {
          product: { type: 'string', minLength: 1 },
          quantity: { type: 'integer', minimum: 1 },
          price: { type: 'number', minimum: 0 },
        },
        required: ['product', 'quantity', 'price'],
      },
    },
    total_amount: { type: 'number', minimum: 0 },
  },
  required: ['customer_name', 'email', 'order_items', 'total_amount'],
};

const validateOrder = ajv.compile(orderSchema);

function validateAndParse(rawData: unknown): OrderData {
  if (!validateOrder(rawData)) {
    throw new Error(`Validation failed: ${JSON.stringify(validateOrder.errors)}`);
  }

  // Safe to cast: the validator has confirmed the shape at runtime.
  return rawData as OrderData;
}
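
Schema validation catches type and range errors, but a cross-field business rule needs ordinary code. The sketch below checks that line items add up to the stated total. It assumes the total carries no tax, shipping, or discount, which will not hold for every order format; `totalsMatch` and its tolerance parameter are hypothetical.

```typescript
interface OrderTotals {
  order_items: Array<{ quantity: number; price: number }>;
  total_amount: number;
}

// Compare in integer cents to avoid floating-point drift when summing prices.
function totalsMatch(order: OrderTotals, toleranceCents = 1): boolean {
  const itemsCents = order.order_items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.quantity,
    0
  );
  const totalCents = Math.round(order.total_amount * 100);
  return Math.abs(itemsCents - totalCents) <= toleranceCents;
}
```

An order that passes the schema but fails this check is a good candidate for a retry or for manual review.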

Retry Logic with Exponential Backoff

LLM APIs occasionally fail or return malformed data. Implement retries with validation:

async function extractWithRetry<T>(
  extractFn: () => Promise<unknown>,
  validateFn: (data: unknown) => T,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const rawData = await extractFn();
      return validateFn(rawData);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
    }
  }
  throw new Error('Max retries exceeded');
}
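
With the default of three attempts, the function above waits 1 second after the first failure and 2 seconds after the second, then rethrows. For larger retry budgets the delay grows quickly, so a cap is often added, and randomizing the delay keeps many clients from retrying in lockstep. A minimal sketch of such a delay calculation; `backoffDelayMs` and its parameters are hypothetical, and the random source is injected so the behavior stays testable.

```typescript
// Delay before the retry that follows a failed attempt (attempt is zero-based).
// random() should return a value in [0, 1]; the default of 1 disables jitter.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  random: () => number = () => 1
): number {
  const exponential = baseMs * Math.pow(2, attempt);
  const capped = Math.min(exponential, maxMs);
  return Math.round(capped * random());
}
```

Passing Math.random as the last argument spreads each delay uniformly between zero and the capped value.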

Prompt Engineering for Better Structured Output

Even with function calling, prompt quality affects results. Provide clear context and examples:

const systemPrompt = `You are a precise data extraction system. 
Extract information exactly as it appears in the source text.
If a field is not present, omit it rather than guessing.
For numerical values, extract only the number without currency symbols or units.`;

const userPrompt = `Extract order data from this email:

"Hi John Smith (john@example.com),

Your order is confirmed:
- 2x Wireless Mouse ($29.99 each)
- 1x Keyboard ($79.99)

Total: $139.97

Thanks!"`;
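
These prompts slot into the request as separate system and user messages. A minimal sketch of assembling them; the `ChatMessage` type and `buildExtractionMessages` helper are hypothetical and cover only the two roles used here.

```typescript
type ChatMessage = { role: 'system' | 'user'; content: string };

// Keep the standing instructions in the system message and the per-request
// source text in the user message, quoted so its boundaries are unambiguous.
function buildExtractionMessages(systemPrompt: string, emailText: string): ChatMessage[] {
  return [
    { role: 'system', content: systemPrompt.trim() },
    { role: 'user', content: `Extract order data from this email:\n\n"${emailText.trim()}"` },
  ];
}
```

The resulting array can be passed as the messages field of the request alongside tools and tool_choice.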

Common Pitfalls and Solutions

Hallucinated Fields: Models sometimes invent data to fill required fields. Mark uncertain fields as optional in your schema and validate against source text.

Type Coercion Issues: A model might return "42" when you need 42. Use strict JSON Schema types and validate with libraries that enforce type correctness.

Nested Structure Complexity: Deep nesting confuses models. Flatten schemas when possible or break extraction into multiple calls.

Token Limits: Large schemas consume tokens. For complex structures, extract incrementally or use smaller, focused functions.

Inconsistent Enum Values: When using enums, provide examples in descriptions: "status": { "enum": ["pending", "shipped", "delivered"], "description": "Order status. Example: 'shipped'" }
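
For the hallucinated-fields pitfall, validating against source text can start as a simple substring check. The sketch below is a heuristic, not a complete grounding method: it assumes values are copied verbatim from the source and would wrongly flag a reformatted value, such as 140 extracted from "140.00". The function name `findUngroundedFields` is hypothetical.

```typescript
// Return the names of top-level string or number fields whose value
// does not appear verbatim (ignoring case) in the source text.
function findUngroundedFields(
  extracted: Record<string, unknown>,
  sourceText: string
): string[] {
  const haystack = sourceText.toLowerCase();
  const ungrounded: string[] = [];

  for (const [field, value] of Object.entries(extracted)) {
    // Arrays and nested objects are skipped in this sketch.
    if (typeof value !== 'string' && typeof value !== 'number') continue;
    if (!haystack.includes(String(value).toLowerCase())) {
      ungrounded.push(field);
    }
  }

  return ungrounded;
}
```

Fields returned by the check can be dropped, flagged for review, or used to trigger a retry.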

Best Practices Checklist

Define explicit JSON Schemas with descriptions for all fields
Use TypeScript interfaces that mirror your schemas for type safety
Validate all LLM outputs before using them in production code
Implement retry logic with exponential backoff for transient failures
Log raw LLM responses for debugging and monitoring
Use tool_choice to force function calling when schema compliance is critical
Test edge cases: missing data, malformed input, unexpected formats
Monitor schema validation failure rates to detect model drift
Version your schemas and maintain backward compatibility
Cache results for identical inputs to reduce costs and latency

Frequently Asked Questions

What's the difference between function calling and JSON mode?

Function calling ties the output to a JSON Schema definition: the model returns arguments for a function you declare, and provider strict modes, where available, enforce the schema during generation. JSON mode only guarantees syntactically valid JSON; the structure is determined by your prompt instructions rather than a formal schema.

Can I use multiple functions in a single request?

Yes. Define multiple tools and let the model choose which to call, or use parallel function calling (supported by GPT-4 and Claude 3+) to extract different data types simultaneously. This reduces round trips for complex extractions.

How do I handle optional fields in function calling?

Omit fields from the required array in your JSON Schema. The model will include them when data is available and omit them otherwise. Always check for field presence in your code before accessing values.

What happens if the model can't extract requested data?

With function calling, the model attempts to call the function anyway, potentially with null values or empty strings. Design schemas to make critical fields required and validate that extracted data is meaningful, not just schema-compliant.

Should I use GPT-4, Claude, or Gemini for structured output?

All three support function calling and structured output. GPT-4o offers the most mature implementation with parallel function calling. Claude 3.5 Sonnet excels at complex reasoning within structured tasks. Gemini 1.5 Pro provides the largest context window for processing extensive documents. Choose based on your specific latency, cost, and accuracy requirements.

How do I debug when structured output fails?

Log the complete API response including raw function arguments or JSON content. Check if the model attempted the correct function. Validate your schema syntax using a JSON Schema validator. Simplify your schema to isolate which fields cause issues. Review your prompt for ambiguity or conflicting instructions.

Can I use structured output with streaming responses?

Yes, with care. Providers such as OpenAI stream function-call arguments as incremental string fragments, but the arguments only become valid JSON once the final fragment arrives, so buffer them and parse at the end. Streaming JSON mode works the same way: you receive tokens incrementally and must handle incomplete JSON carefully. For production systems, non-streaming requests are simpler to make reliable.
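
Where a provider does stream tool-call arguments, the fragments have to be concatenated per call before parsing. A minimal sketch of that buffering step; the `ToolCallDelta` type is a simplified assumption modeled on the shape of streamed tool-call deltas (an index plus optional name and argument fragments), not a complete provider type, and `accumulateToolCalls` is a hypothetical helper.

```typescript
// Simplified, assumed shape of one streamed tool-call fragment.
interface ToolCallDelta {
  index: number;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  name: string;
  arguments: string;
}

// Concatenate fragments by tool-call index; parse arguments only after the stream ends.
function accumulateToolCalls(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];

  for (const delta of deltas) {
    let call = calls[delta.index];
    if (!call) {
      call = { name: '', arguments: '' };
      calls[delta.index] = call;
    }
    if (delta.function?.name) call.name += delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }

  return calls;
}
```

Once the stream has finished, each assembled arguments string can go through JSON.parse and the same validation used for non-streaming responses.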
