ADR-0011: Auto-Generated Pydantic Models from OpenAPI

Status

Accepted (Revised)

Date: 2025-10-30 (original); updated 2025-12-08

Context

The attrs models generated from the OpenAPI spec mirror the API's request and response structures: they use Unset sentinel values, nest deeply, and mix transport concerns with domain data. That suits API transport, but it is suboptimal for:

ETL and Data Processing: Unset sentinels complicate data export and transformation
Business Logic: Methods for display formatting, search, and validation belong on domain models, not on transport models
Type Safety: Unset sentinels require constant checking (if not isinstance(x, Unset))
Immutability: The generated models carry no immutability guarantees for safer data handling
JSON Schema Generation: attrs does not produce JSON Schema for documentation or validation

Users need clean, business-focused models that represent "the thing itself" rather than "how to transport the thing".

Decision

We will auto-generate Pydantic v2 models from the same OpenAPI spec using datamodel-code-generator, providing a parallel model layer:

Auto-Generation: Use datamodel-code-generator to generate Pydantic models from docs/katana-openapi.yaml
Domain Grouping: Split generated models into domain-grouped files using AST parsing (base, common, inventory, stock, sales_orders, purchase_orders, manufacturing, contacts, webhooks, errors)
Registry: Maintain bidirectional mapping between attrs and Pydantic classes for easy conversion
Pydantic v2 Config: Models use frozen=True, extra="forbid", validate_assignment=True for safety

Architecture

katana_public_api_client/
├── models/                    # attrs models (generated by openapi-python-client)
├── models_pydantic/
│   ├── __init__.py           # Public exports
│   ├── _base.py              # KatanaPydanticBase with conversion methods
│   ├── _registry.py          # attrs↔pydantic class mappings
│   ├── _auto_registry.py     # Auto-generated registry entries
│   └── _generated/           # Auto-generated domain files
│       ├── __init__.py       # Re-exports all models
│       ├── base.py           # BaseEntity hierarchy
│       ├── common.py         # Shared types, enums, utilities
│       ├── inventory.py      # Products, Materials, Variants
│       ├── stock.py          # Batches, Stock levels
│       ├── sales_orders.py
│       ├── purchase_orders.py
│       ├── manufacturing.py
│       ├── contacts.py       # Customers, Suppliers
│       ├── webhooks.py
│       └── errors.py

Generation Process

The generation script (scripts/generate_pydantic_models.py):

Runs datamodel-codegen with config from pyproject.toml
Parses generated code using Python AST
Groups classes by domain using pattern matching
Fixes datamodel-codegen issues: MRO (Method Resolution Order) conflicts, string enum defaults (rewritten as enum member references), and invalid union_mode on unions without discriminators
Writes domain-grouped module files
Generates registry mappings for attrs↔pydantic conversion
Runs ruff format/fix
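The parsing and grouping steps above can be sketched with the standard ast module. The patterns in DOMAIN_PATTERNS and the fallback domain are hypothetical, since the ADR does not list the real matching rules, and a real script would also have to carry over imports and resolve references between the resulting modules.

```python
import ast
import fnmatch
from collections import defaultdict

# Hypothetical name patterns; the real script's rules are not shown in this ADR.
DOMAIN_PATTERNS = {
    "sales_orders": ["SalesOrder*"],
    "inventory": ["Product*", "Material*", "Variant*"],
    "webhooks": ["Webhook*"],
}
FALLBACK_DOMAIN = "common"


def domain_for(class_name: str) -> str:
    """Pick the first domain whose patterns match the class name."""
    for domain, patterns in DOMAIN_PATTERNS.items():
        if any(fnmatch.fnmatchcase(class_name, p) for p in patterns):
            return domain
    return FALLBACK_DOMAIN


def group_classes(source: str) -> dict[str, list[str]]:
    """Return {domain: [class source, ...]} for every top-level class."""
    tree = ast.parse(source)
    groups: dict[str, list[str]] = defaultdict(list)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            groups[domain_for(node.name)].append(ast.unparse(node))
    return dict(groups)


generated = '''
class Product(BaseModel):
    id: int

class SalesOrderRow(BaseModel):
    quantity: float

class Status7(Enum):
    OPEN = "OPEN"
'''

groups = group_classes(generated)
```

Working on the AST rather than on text lines keeps each class body intact regardless of how the generator formats it.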

Conversion Methods

from katana_public_api_client.models_pydantic import Product as PydanticProduct
from katana_public_api_client.models import Product as AttrsProduct

# attrs_product is an existing AttrsProduct instance, e.g. parsed from an API response

# Convert attrs → pydantic
pydantic_product = PydanticProduct.from_attrs(attrs_product)

# Convert pydantic → attrs
attrs_product = pydantic_product.to_attrs()

Consequences

Positive Consequences

Full Coverage: All 287+ models generated automatically
Type Safety: Pydantic validation and plain T | None types instead of Unset sentinels
Immutability: Frozen by default, prevents accidental mutations
JSON Schema: Automatic generation for documentation
MCP Integration: Clean, validated data for LLM contexts
Bidirectional: Easy conversion between attrs and Pydantic layers

Negative Consequences

Two Model Layers: Developers must understand when to use attrs (transport) and when to use Pydantic (domain)
Conversion Overhead: Converting between the layers adds a small runtime cost
Generation Complexity: A custom script is needed to fix datamodel-codegen issues
Naming Conflicts: Some generated names are awkward (Status7, CustomField3)

Neutral Consequences

Generated Code Unchanged: attrs models remain unmodified
Incremental Adoption: Use Pydantic layer where beneficial
Regeneration Required: When the OpenAPI spec changes, both layers must be regenerated

Implementation Notes

Required Dependencies

dependencies = [
  "pydantic>=2,<3",
  "email-validator>=2.0.0",  # Required for EmailStr fields
]

datamodel-codegen Configuration (pyproject.toml)

[tool.datamodel-codegen]
input = "docs/katana-openapi.yaml"
input-file-type = "openapi"
output-model-type = "pydantic_v2.BaseModel"
use-annotated = true
use-standard-collections = true
field-constraints = true
use-union-operator = true
target-python-version = "3.11"
base-class = "katana_public_api_client.models_pydantic._base.KatanaPydanticBase"
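Since the configuration above has no output key, the generation script presumably supplies the output path itself. A minimal sketch of that invocation follows, assuming datamodel-codegen picks up [tool.datamodel-codegen] from the pyproject.toml in its working directory; build_codegen_command and run_codegen are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_codegen_command(output: Path) -> list[str]:
    """Build the CLI call; all other options are left to pyproject.toml."""
    return ["datamodel-codegen", "--output", str(output)]


def run_codegen(output: Path, project_root: Path) -> None:
    """Run the generator from the project root so pyproject.toml is found."""
    subprocess.run(build_codegen_command(output), cwd=project_root, check=True)
```

Keeping the options in pyproject.toml means the script and a manual command-line run produce the same output.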

Alternatives Considered

Alternative 1: Hand-Crafted Domain Models

Description: Manually write Pydantic classes for key entities only
Pros: Clean names, selective coverage, business methods
Cons: High maintenance, incomplete coverage, manual sync required
Why Rejected: Auto-generation provides complete, consistent coverage

Alternative 2: Regenerate with Pydantic Generator Only

Description: Replace openapi-python-client with a Pydantic-based generator
Pros: Single model layer
Cons: Requires rewriting all existing code and loses the established httpx client patterns
Why Rejected: Too disruptive, attrs layer works well for API transport

Alternative 3: Runtime Conversion Only

Description: Convert attrs→dict→Pydantic at runtime
Pros: No generation needed
Cons: No static typing, runtime overhead, no IDE support
Why Rejected: Static types and IDE support are essential