
ADR-0011: Auto-Generated Pydantic Models from OpenAPI

Status

Accepted (Revised)

Date: 2025-10-30 (original); updated 2025-12-08

Context

The generated attrs models from OpenAPI represent API request/response structures with Unset sentinel values, nested complexity, and mixed concerns. While suitable for API transport, they are suboptimal for:

  1. ETL and Data Processing: Unset sentinels complicate data export and transformation
  2. Business Logic: Methods for display formatting, search, validation belong on domain models
  3. Type Safety: Unset sentinels require constant checking (if not isinstance(x, Unset))
  4. Immutability: No built-in immutability guarantees for safer data handling
  5. JSON Schema Generation: attrs has no built-in JSON Schema generation for documentation/validation
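
To illustrate the sentinel problem described in points 1 and 3, here is a minimal sketch using only the standard library. The Unset class, TransportProduct, DomainProduct, and both export helpers are hypothetical stand-ins written for this example; they are not the generated classes themselves.

```python
from dataclasses import dataclass
from typing import Optional, Union


class Unset:
    """Hypothetical stand-in for the generated client's Unset sentinel type."""

    def __bool__(self) -> bool:
        return False


UNSET = Unset()


@dataclass
class TransportProduct:
    """Transport-style model: a field may hold a value, None, or UNSET."""

    name: str
    sku: Union[str, None, Unset] = UNSET


@dataclass(frozen=True)
class DomainProduct:
    """Domain-style model: a missing value is simply None."""

    name: str
    sku: Optional[str] = None


def export_row_transport(product: TransportProduct) -> dict:
    # Every optional field needs an explicit sentinel check before export.
    sku = None if isinstance(product.sku, Unset) else product.sku
    return {"name": product.name, "sku": sku}


def export_row_domain(product: DomainProduct) -> dict:
    # No sentinel handling: the value is already a str or None.
    return {"name": product.name, "sku": product.sku}
```

With many optional fields per model, the isinstance check in the transport-style helper has to be repeated for each one, which is the overhead the domain layer is meant to remove.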

Users need clean, business-focused models that represent "the thing itself" rather than "how to transport the thing".

Decision

We will auto-generate Pydantic v2 models from the same OpenAPI spec using datamodel-code-generator, providing a parallel model layer:

  1. Auto-Generation: Use datamodel-code-generator to generate Pydantic models from docs/katana-openapi.yaml
  2. Domain Grouping: Split generated models into domain-grouped files using AST parsing (base, common, inventory, stock, sales_orders, purchase_orders, manufacturing, contacts, webhooks, errors)
  3. Registry: Maintain bidirectional mapping between attrs and Pydantic classes for easy conversion
  4. Pydantic v2 Config: Models use frozen=True, extra="forbid", validate_assignment=True for safety
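
The effect of the three config flags in point 4 can be sketched as follows. ExampleBase and ExampleProduct are hypothetical names used only for illustration; the real base class is KatanaPydanticBase, whose exact configuration may differ from this sketch.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ExampleBase(BaseModel):
    # Hypothetical stand-in for KatanaPydanticBase, using the flags named above.
    model_config = ConfigDict(
        frozen=True,
        extra="forbid",
        validate_assignment=True,
    )


class ExampleProduct(ExampleBase):
    id: int
    name: str


product = ExampleProduct(id=1, name="Widget")


def try_mutate() -> bool:
    """Return True if in-place mutation succeeds (it should not when frozen)."""
    try:
        product.name = "Gadget"
    except ValidationError:
        return False
    return True


def try_extra_field() -> bool:
    """Return True if an undeclared field is accepted (it should not be)."""
    try:
        ExampleProduct(id=2, name="Bolt", colour="red")
    except ValidationError:
        return False
    return True


# Changes to a frozen model go through a copy instead of mutation.
updated = product.model_copy(update={"name": "Gadget"})
```

Because frozen models reject assignment, callers that need a modified value create a copy with model_copy(update=...) and leave the original untouched.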

Architecture

katana_public_api_client/
├── models/                    # attrs models (generated by openapi-python-client)
├── models_pydantic/
│   ├── __init__.py           # Public exports
│   ├── _base.py              # KatanaPydanticBase with conversion methods
│   ├── _registry.py          # attrs↔pydantic class mappings
│   ├── _auto_registry.py     # Auto-generated registry entries
│   └── _generated/           # Auto-generated domain files
│       ├── __init__.py       # Re-exports all models
│       ├── base.py           # BaseEntity hierarchy
│       ├── common.py         # Shared types, enums, utilities
│       ├── inventory.py      # Products, Materials, Variants
│       ├── stock.py          # Batches, Stock levels
│       ├── sales_orders.py
│       ├── purchase_orders.py
│       ├── manufacturing.py
│       ├── contacts.py       # Customers, Suppliers
│       ├── webhooks.py
│       └── errors.py

Generation Process

The generation script (scripts/generate_pydantic_models.py):

  1. Runs datamodel-codegen with config from pyproject.toml
  2. Parses generated code using Python AST
  3. Groups classes by domain using pattern matching
  4. Fixes datamodel-codegen issues:
       • MRO (Method Resolution Order) conflicts
       • String enum defaults (→ enum member references)
       • Invalid union_mode without discriminators
  5. Writes domain-grouped module files
  6. Generates registry mappings for attrs↔pydantic conversion
  7. Runs ruff format/fix
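
The AST parsing and domain grouping steps might look roughly like the sketch below. It is not the actual script: the prefix rules, the sample source, and the function names are assumptions made for illustration, and the real pattern matching in scripts/generate_pydantic_models.py may work differently.

```python
import ast
from collections import defaultdict

# Stand-in for the single module emitted by the code generator.
GENERATED_SOURCE = '''
class Product:
    pass

class SalesOrder:
    pass

class SalesOrderRow:
    pass

class Webhook:
    pass

class Currency:
    pass
'''

# Hypothetical prefix rules; the first matching prefix decides the domain.
DOMAIN_PREFIXES = [
    ("SalesOrder", "sales_orders"),
    ("Product", "inventory"),
    ("Webhook", "webhooks"),
]


def domain_for(class_name: str) -> str:
    """Map a class name to a domain module, falling back to 'common'."""
    for prefix, domain in DOMAIN_PREFIXES:
        if class_name.startswith(prefix):
            return domain
    return "common"


def group_classes(source: str) -> dict[str, list[str]]:
    """Parse the source and collect top-level class names per domain."""
    tree = ast.parse(source)
    groups: dict[str, list[str]] = defaultdict(list)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            groups[domain_for(node.name)].append(node.name)
    return dict(groups)


groups = group_classes(GENERATED_SOURCE)
```

A full implementation would also carry each class's source text and imports into its target module; this sketch only shows the classification step.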

Conversion Methods

from katana_public_api_client.models import Product as AttrsProduct
from katana_public_api_client.models_pydantic import Product as PydanticProduct

# attrs_product is assumed to be an existing AttrsProduct instance,
# for example one parsed from an API response.

# Convert attrs → pydantic
pydantic_product = PydanticProduct.from_attrs(attrs_product)

# Convert pydantic → attrs
attrs_product = pydantic_product.to_attrs()

Consequences

Positive Consequences

  • Full Coverage: All 287+ models generated automatically
  • Type Safety: Pydantic validation, clean Optional[T] types
  • Immutability: Frozen by default, prevents accidental mutations
  • JSON Schema: Automatic generation for documentation
  • MCP Integration: Clean, validated data for LLM contexts
  • Bidirectional: Easy conversion between attrs and Pydantic layers

Negative Consequences

  • Two Model Layers: Developers must understand when to use attrs (transport) vs Pydantic (domain)
  • Conversion Overhead: Small performance cost for conversion
  • Generation Complexity: Custom script to fix datamodel-codegen issues
  • Naming Conflicts: Some generated names are awkward (Status7, CustomField3)

Neutral Consequences

  • Generated Code Unchanged: attrs models remain unmodified
  • Incremental Adoption: Use Pydantic layer where beneficial
  • Regeneration Required: When OpenAPI spec changes, regenerate both layers

Implementation Notes

Required Dependencies

dependencies = [
  "pydantic>=2,<3",
  "email-validator>=2.0.0",  # Required for EmailStr fields
]

datamodel-codegen Configuration (pyproject.toml)

[tool.datamodel-codegen]
input = "docs/katana-openapi.yaml"
input-file-type = "openapi"
output-model-type = "pydantic_v2.BaseModel"
use-annotated = true
use-standard-collections = true
field-constraints = true
use-union-operator = true
target-python-version = "3.11"
base-class = "katana_public_api_client.models_pydantic._base.KatanaPydanticBase"

Alternatives Considered

Alternative 1: Hand-Crafted Domain Models

  • Description: Manually write Pydantic classes for key entities only
  • Pros: Clean names, selective coverage, business methods
  • Cons: High maintenance, incomplete coverage, manual sync required
  • Why Rejected: Auto-generation provides complete, consistent coverage

Alternative 2: Regenerate with Pydantic Generator Only

  • Description: Replace openapi-python-client with a Pydantic-based generator
  • Pros: Single model layer
  • Cons: Would require rewriting all existing code, loss of httpx patterns
  • Why Rejected: Too disruptive, attrs layer works well for API transport

Alternative 3: Runtime Conversion Only

  • Description: Convert attrs→dict→Pydantic at runtime
  • Pros: No generation needed
  • Cons: No static typing, runtime overhead, no IDE support
  • Why Rejected: Static types and IDE support are essential

References