Draft: This documentation is currently a work in progress and subject to change.
Quick Navigation:
- For complete endpoint implementations, see API Specifications
- For code examples and patterns, see Data Ownership
- For authentication and security, see Auth
Table of Contents
- API Response Contract
- Field Design Rules
- Implementation Reference
- Naming Conventions
- Action Endpoints
- Bulk Operations
- Error Handling
- Audit Logging
- Multi-Tenancy Patterns
API Response Contract
All API responses use a consistent discriminated union envelope structure with a success boolean discriminator.
Type Definition
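The envelope can be sketched as a TypeScript discriminated union. This page confirms the meta fields requestId and timestamp. The error body shape (code, message, details) and the describeResult helper are illustrative assumptions, not the production definitions.

```typescript
// Sketch of the response envelope as a discriminated union on `success`.
// Fields other than requestId/timestamp are illustrative assumptions.
interface BaseMeta {
  requestId: string; // correlates logs across services
  timestamp: string; // ISO 8601, set when the response is built
}

interface ApiErrorBody {
  code: string; // machine-readable, e.g. "NOT_FOUND" (hypothetical value)
  message: string; // human-readable summary
  details?: Record<string, unknown>;
}

type ApiSuccess<T, M extends BaseMeta = BaseMeta> = {
  success: true;
  data: T;
  meta: M;
};

type ApiFailure = {
  success: false;
  error: ApiErrorBody;
  meta: BaseMeta;
};

type ApiResponse<T, M extends BaseMeta = BaseMeta> = ApiSuccess<T, M> | ApiFailure;

// Checking `success` narrows the union, so `data` and `error` are only
// reachable on the branch where they exist.
function describeResult(res: ApiResponse<{ id: string }>): string {
  if (res.success) {
    return `ok:${res.data.id}`;
  }
  return `error:${res.error.code}`;
}
```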
Base Meta Object
The meta object is always present and contains request tracking information:
Design Decision:
requestId and timestamp are always included because they add negligible overhead while being critical for distributed debugging and log correlation across services.
Extension Meta Types
Success responses can extend meta with additional context:
Error Response Structure
Response Examples
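Below is a minimal sketch of helpers that build both envelope variants, including a success response whose meta is extended with extra context. The helper names successResponse and errorResponse, the pagination extension, and the NOT_FOUND code are hypothetical. They illustrate the structure rather than reproduce the real examples.

```typescript
// Hypothetical helpers that build the two envelope variants.
interface BaseMeta {
  requestId: string;
  timestamp: string;
}

// Example extension: list endpoints could attach pagination info to meta.
interface PaginationMeta extends BaseMeta {
  pagination: { nextCursor: string | null; hasMore: boolean };
}

function baseMeta(requestId: string, now: Date = new Date()): BaseMeta {
  return { requestId, timestamp: now.toISOString() };
}

function successResponse<T, E extends object>(data: T, requestId: string, extra: E) {
  // Extension fields are merged into meta next to the base fields.
  return { success: true as const, data, meta: { ...baseMeta(requestId), ...extra } };
}

function errorResponse(
  requestId: string,
  code: string,
  message: string,
  details?: Record<string, unknown>,
) {
  return {
    success: false as const,
    error: { code, message, ...(details ? { details } : {}) },
    meta: baseMeta(requestId),
  };
}

// Response examples built with the helpers.
const listExample = successResponse(
  [{ id: "srv_3KpQm9WnXccFjH2Ls8DkT6VzRqYU" }],
  "req_abc",
  { pagination: { nextCursor: null, hasMore: false } },
);
const listMeta: PaginationMeta = listExample.meta; // the extension is preserved in the type

const notFoundExample = errorResponse("req_def", "NOT_FOUND", "Server not found", {
  serverId: "srv_3KpQm9WnXccFjH2Ls8DkT6VzRqYU",
});
```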
Meta Object Design Decisions
| Field | Decision | Rationale |
|---|---|---|
| requestId | Always include | Essential for log correlation across services |
| timestamp | Always include | Negligible overhead; critical for distributed debugging |
| apiVersion | Omit | URL path (/v1/) is the version; redundant in response |
| rateLimit | Add later | When rate limiting is implemented |
Implementation Reference
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 16+ (App Router) | Server components, API routes |
| API Layer | Hono | Lightweight, type-safe API routes |
| Validation | Zod | Runtime validation, schema definitions |
| Database | Drizzle ORM | Type-safe queries, migrations |
| Auth | BetterAuth | Session management, OAuth |
| Workflows | Trigger.dev | Durable task execution |
Zod Schema Definitions
Define all request/response schemas using Zod for validation and type inference:
Pattern: Workflows by Default. All operations that interact with infrastructure (BMC, K8s) are executed as Trigger.dev workflows and return immediately with a workflow run ID. Clients poll workflow status via the Workflows API or receive webhooks.
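The client side of this pattern can be sketched as a polling loop with capped exponential backoff. The run states (queued, running, completed, failed), the pollRun helper, and the injected fetchRun function are hypothetical. The real Workflows API may use different states and endpoints.

```typescript
// Client-side polling for an async (202 Accepted) operation.
// Run states and helper names are assumptions for illustration.
type RunState = "queued" | "running" | "completed" | "failed";

interface WorkflowRun {
  id: string; // e.g. "run_7NhTq6WlAbbKmF5Rt3MxU8DzSqWJ"
  state: RunState;
}

const TERMINAL: ReadonlySet<RunState> = new Set<RunState>(["completed", "failed"]);

// Exponential backoff between polls, capped so long-running operations
// are still checked at a steady rate.
function pollDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function pollRun(
  runId: string,
  fetchRun: (id: string) => Promise<WorkflowRun>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts = 100, // with the defaults, outlasts the 30-minute upper bound
): Promise<WorkflowRun> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchRun(runId);
    if (TERMINAL.has(run.state)) return run;
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`Run ${runId} did not finish after ${maxAttempts} polls`);
}
```

Webhooks avoid this loop entirely. Polling is the fallback for clients that cannot receive callbacks.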
Field Design Rules
Design fields for extension from day one. Refactoring primitive fields into objects later is costly and often requires breaking changes.
Principle: Objects Over Primitives. Always wrap values that might grow into structured objects. It’s better to have nested objects early than to break APIs later when you need to add context.
Use Objects Over Primitives
Wrap values that might grow into objects from the start.
Never Use Booleans for State
States often grow beyond two values.
Use IDs with Optional Expansion
Don’t embed full objects. Use IDs and let clients request expansion:
GET /v1/compute/servers/srv_123
GET /v1/compute/servers/srv_123?expand=cluster
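The three rules above can be sketched as TypeScript types. The status values available and provisioning come from the filtering examples on this page. The remaining fields (powerDraw, the extra status values) and the expandServer helper are hypothetical, chosen only to illustrate the rules.

```typescript
// Objects over primitives. Avoid: powerDraw: number (no room for a unit
// or measurement time later). An object can gain fields without breaking clients.
interface PowerDraw {
  watts: number;
  measuredAt: string;
}

// States, not booleans: `isOnline: boolean` cannot express "provisioning".
type ServerStatus = "available" | "provisioning" | "maintenance" | "error";

interface ClusterSummary {
  id: string;
  name: string;
}

// IDs with optional expansion: clusterId is always present; the full
// cluster object appears only when the client asks for ?expand=cluster.
interface Server {
  id: string;
  status: ServerStatus;
  powerDraw: PowerDraw;
  clusterId: string;
  cluster?: ClusterSummary;
}

function expandServer(
  server: Server,
  expand: string[],
  lookupCluster: (id: string) => ClusterSummary,
): Server {
  if (!expand.includes("cluster")) return server;
  return { ...server, cluster: lookupCluster(server.clusterId) };
}
```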
API Versioning
All API endpoints include the version in the URL path: /v1/...
Domain-based routing separates Atlas and Arc APIs:
Version Format
Version Policy
- Major versions (v1, v2) for breaking changes
- No minor versions in URL: use feature flags and deprecation warnings instead
- Deprecation timeline: 6 months' notice before removing deprecated endpoints
- Version in URL, not response: the apiVersion field is omitted from responses because the URL path is the source of truth
When deprecating fields or endpoints, include warnings in the meta.warnings array with sunset dates and migration guides.
Deprecation Example
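Below is a minimal sketch of appending a deprecation warning to meta.warnings. It assumes the sunset date is the announcement date plus the 6-month notice period. The warning fields (code, message, sunsetDate, migrationGuide), the DEPRECATED code, and the guide URL are hypothetical.

```typescript
interface DeprecationWarning {
  code: "DEPRECATED"; // hypothetical warning code
  message: string;
  sunsetDate: string; // YYYY-MM-DD after which the field/endpoint may be removed
  migrationGuide: string;
}

interface MetaWithWarnings {
  requestId: string;
  timestamp: string;
  warnings?: DeprecationWarning[];
}

// Sunset = announcement + notice period (6 months per the policy).
// Month-end dates roll over in JavaScript (e.g. Aug 31 + 6 months -> early March).
function sunsetDate(announced: Date, noticeMonths = 6): string {
  const d = new Date(announced.getTime());
  d.setUTCMonth(d.getUTCMonth() + noticeMonths);
  return d.toISOString().slice(0, 10);
}

function withDeprecation(
  meta: MetaWithWarnings,
  what: string,
  announced: Date,
  guideUrl: string,
): MetaWithWarnings {
  const warning: DeprecationWarning = {
    code: "DEPRECATED",
    message: `${what} is deprecated`,
    sunsetDate: sunsetDate(announced),
    migrationGuide: guideUrl,
  };
  // Append rather than overwrite, so several deprecations can be reported at once.
  return { ...meta, warnings: [...(meta.warnings ?? []), warning] };
}
```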
Naming Conventions
Consistent naming across URLs, fields, and resources improves developer experience and reduces confusion.
URL Paths
Hono uses colon-prefixed route parameters.
| Rule | Example |
|---|---|
| Route params with colon | /v1/clusters/:clusterId |
| Lowercase, hyphenated | /v1/ai-services |
| Plural nouns for collections | /servers, /clusters |
| Singular for singletons | /me, /health |
NOTE: The lowercase, hyphenated rule applies only to URL paths. Database columns, request and response bodies, and other contexts follow their own conventions (API fields use camelCase, as described under Field Names).
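As a sketch, most of the URL rules can be checked mechanically, for example in a route-registration test. The isValidRoutePath helper is hypothetical, and it assumes every route sits under a /v{major} prefix. It cannot tell plural collections from singular singletons, so that rule still needs review by hand.

```typescript
// Checks a Hono-style route path against the URL naming rules:
// a /v{major} prefix, lowercase hyphenated literal segments,
// and colon-prefixed camelCase parameters.
const LITERAL_SEGMENT = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const PARAM_SEGMENT = /^:[a-z][a-zA-Z0-9]*$/;

function isValidRoutePath(path: string): boolean {
  const segments = path.split("/");
  // A leading "/" produces an empty first segment; the second must be the version.
  if (segments[0] !== "" || !/^v[0-9]+$/.test(segments[1] ?? "")) return false;
  return segments
    .slice(2)
    .every((seg) => LITERAL_SEGMENT.test(seg) || PARAM_SEGMENT.test(seg));
}
```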
Resource IDs
Resources (clusters, servers, organizations, etc.) get globally unique, opaque IDs that do NOT contain region information. This decouples resource identity from physical location. Format: {prefix}_{base62}
| Resource | Prefix | Example |
|---|---|---|
| Organization | org_ | org_8TcVx2WkZddNmK3Pt9JwX7BzWrLM |
| Server | srv_ | srv_3KpQm9WnXccFjH2Ls8DkT6VzRqYU |
| Cluster | cls_ | cls_6NZtkvWLBbbmHfPi7L6oz7KZpqET |
| Stack | stk_ | stk_5MfRp4WjYbbHmG8Nt2LvS9CxPqZK |
| Workflow Run | run_ | run_7NhTq6WlAbbKmF5Rt3MxU8DzSqWJ |
| Pool | poo_ | poo_2LgPn8WmXccGjE7Mt4KwV9BySrTL |
| Allocation | all_ | all_9QjSr3WnZddMmH6Pt5LxW2CzUrYK |
| API Key | key_ | key_4KfQm7WkYccJmG3Nt8MvX9BzSqWL |
| Event | evt_ | evt_6MgRp2WlXbbKmF9Rt5NxU3DzTqZJ |
org_system is reserved for platform-level admin operations. (TBD: this reservation may no longer be needed; it was originally introduced for a different purpose.)
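Below is a sketch of generating and parsing {prefix}_{base62} IDs with Node's crypto.randomBytes. The 28-character body length is inferred from the example IDs in the table. Rejection sampling keeps every base62 character equally likely. Treat both as assumptions rather than the production generator. Reserved IDs such as org_system would need a special case in parseId.

```typescript
import { randomBytes } from "node:crypto";

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const BODY_LENGTH = 28; // inferred from the example IDs above
const PREFIXES = ["org", "srv", "cls", "stk", "run", "poo", "all", "key", "evt"] as const;
type IdPrefix = (typeof PREFIXES)[number];

function generateId(prefix: IdPrefix): string {
  let body = "";
  while (body.length < BODY_LENGTH) {
    for (const byte of randomBytes(BODY_LENGTH)) {
      // 248 = 62 * 4: discarding bytes >= 248 makes `byte % 62` unbiased.
      if (byte < 248 && body.length < BODY_LENGTH) body += BASE62.charAt(byte % 62);
    }
  }
  return `${prefix}_${body}`;
}

// Returns the resource type for a well-formed ID, or null otherwise.
function parseId(id: string): IdPrefix | null {
  const match = /^([a-z]+)_([0-9A-Za-z]+)$/.exec(id);
  if (!match || match[2].length !== BODY_LENGTH) return null;
  const prefix = match[1] as IdPrefix;
  return PREFIXES.includes(prefix) ? prefix : null;
}
```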
Field Names
| Rule | Example |
|---|---|
| camelCase | createdAt, nodeCount |
| Suffix IDs with Id | clusterId, organizationId |
| Use past tense for timestamps | createdAt, updatedAt, deletedAt |
Pagination
Use cursor-based pagination for real-time data and offset/limit for stable datasets.
Query Parameters
Response
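Here is a minimal in-memory sketch of cursor pagination. It assumes items are ordered by ID and the cursor is the last ID returned. The limit and cursor parameter names and the nextCursor/hasMore response fields are assumptions. A real implementation would push the comparison into the query (WHERE id > cursor ORDER BY id LIMIT n + 1).

```typescript
interface Page<T> {
  data: T[];
  nextCursor: string | null; // pass back as ?cursor= to fetch the next page
  hasMore: boolean;
}

function paginateByCursor<T extends { id: string }>(
  items: T[], // assumed sorted ascending by id
  limit: number,
  cursor?: string,
): Page<T> {
  const start = cursor === undefined ? items : items.filter((item) => item.id > cursor);
  // Take one extra row to learn whether another page exists.
  const fetched = start.slice(0, limit + 1);
  const hasMore = fetched.length > limit;
  const data = fetched.slice(0, limit);
  const last = data[data.length - 1];
  return { data, nextCursor: hasMore && last ? last.id : null, hasMore };
}
```

Because the cursor is a position in the ordering rather than a row count, rows inserted before it do not shift later pages. This is why the guide prefers cursors for real-time data.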
Filtering and Sorting
Query String Format
Use consistent query parameter patterns for filtering and sorting:
| Parameter | Format | Example |
|---|---|---|
| Filter | field=value | status=available |
| Multiple values | field=val1,val2 | status=available,provisioning |
| Sort ascending | sort=field | sort=name |
| Sort descending | sort=-field | sort=-createdAt |
| Multiple sorts | sort=field1,-field2 | sort=status,-createdAt |
Implementation with Zod
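The sketch below is a dependency-free stand-in for the parsing a Zod schema with a transform would perform. It splits comma-separated values, maps a leading - to descending order, and rejects sort fields outside an allowlist. The function names and the allowlist are hypothetical, as is treating multiple filter values as OR.

```typescript
type SortDirection = "asc" | "desc";
interface SortKey {
  field: string;
  direction: SortDirection;
}

// "status,-createdAt" -> [{ status, asc }, { createdAt, desc }]
function parseSort(raw: string, allowed: readonly string[]): SortKey[] {
  return raw
    .split(",")
    .filter((part) => part.length > 0)
    .map((part): SortKey => {
      const desc = part.startsWith("-");
      const field = desc ? part.slice(1) : part;
      // An allowlist keeps clients from sorting on unindexed or private columns.
      if (!allowed.includes(field)) {
        throw new Error(`Unsupported sort field: ${field}`);
      }
      return { field, direction: desc ? "desc" : "asc" };
    });
}

// "available,provisioning" -> ["available", "provisioning"]
function parseMultiValue(raw: string): string[] {
  return raw
    .split(",")
    .map((v) => v.trim())
    .filter((v) => v.length > 0);
}
```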
Action Endpoints
For operations beyond CRUD, use a unified action endpoint with the POST method. Actions represent commands that change resource state asynchronously.
Design Principles
- Unified Endpoint: A single /actions endpoint handles all action types (power, provision, deprovision, inspect, maintenance)
- Type-Safe Parameters: Each action type has its own request schema with action-specific options
- Async by Default: Actions return 202 Accepted with workflow/operation IDs for tracking
- Audit Trail: Logs show “POST /actions with type=power action=off” for clear tracking
- Granular Permissions: Easy to scope permissions like servers:lifecycle vs servers:update
Endpoint Pattern
Action Request Schema
Implementation Example
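Below is a sketch of the type-safe action request as a discriminated union on type, plus a handler that hands the action to a workflow and returns 202 Accepted. Only power and provision are modeled. The off and reboot values and the force and imageUrl options appear elsewhere on this page; on is assumed. parseAction, handleAction, and the injected startWorkflow function are hypothetical stand-ins for the Zod schema and the workflow trigger.

```typescript
type PowerAction = { type: "power"; action: "on" | "off" | "reboot"; force?: boolean };
type ProvisionAction = { type: "provision"; imageUrl: string };
type ActionRequest = PowerAction | ProvisionAction;

// Hand-rolled validation standing in for a Zod discriminated union.
function parseAction(body: unknown): ActionRequest {
  const { type, action, force, imageUrl } = (body ?? {}) as Record<string, unknown>;
  if (type === "power" && (action === "on" || action === "off" || action === "reboot")) {
    if (force !== undefined && typeof force !== "boolean") {
      throw new Error("force must be a boolean");
    }
    return { type: "power", action, force: typeof force === "boolean" ? force : undefined };
  }
  if (type === "provision" && typeof imageUrl === "string") {
    return { type: "provision", imageUrl };
  }
  throw new Error("Invalid action request");
}

// Returns 202 immediately; the workflow performs the actual state change.
function handleAction(
  serverId: string,
  body: unknown,
  startWorkflow: (serverId: string, action: ActionRequest) => string, // returns a run ID
): { status: 202; body: { success: true; data: { runId: string; action: ActionRequest } } } {
  const action = parseAction(body);
  const runId = startWorkflow(serverId, action);
  return { status: 202, body: { success: true, data: { runId, action } } };
}
```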
Bulk Operations
Bulk operations allow applying actions to multiple resources simultaneously. All bulk actions use partial success semantics: individual resource failures do not fail the entire bulk operation.
Design Principles
- Partial Success: Individual failures don’t abort the entire bulk operation
- Explicit IDs: Use explicit ID lists for predictability and safety
- Per-Resource Results: Response includes success/failure status for each resource
- 207 Multi-Status: Always return 207 to indicate that mixed results are possible
- Dedicated Endpoints: Use the /bulk pattern for consistency
Endpoint Pattern
Request Schema
Response Structure
Implementation Example
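Below is a minimal sketch of partial-success execution with per-resource results and a dry-run switch. The result shape (id, success, error), the summary counts, and the runBulk name are assumptions. The per-resource operation is injected so the sketch stays independent of any real action.

```typescript
type BulkItemResult =
  | { id: string; success: true }
  | { id: string; success: false; error: string };

interface BulkResponseBody {
  dryRun: boolean;
  results: BulkItemResult[];
  summary: { total: number; succeeded: number; failed: number };
}

function runBulk(
  ids: string[],
  operation: (id: string) => void, // throws on failure
  dryRun = false,
): { status: 207; body: BulkResponseBody } {
  const results = ids.map((id): BulkItemResult => {
    // Dry run: report the resource as affected without executing anything.
    // A real preview would also check that the resource exists and is eligible.
    if (dryRun) return { id, success: true };
    try {
      operation(id);
      return { id, success: true };
    } catch (err) {
      // One failure is recorded and the loop continues with the next ID.
      return { id, success: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
  const succeeded = results.filter((r) => r.success).length;
  return {
    status: 207, // always 207, even when every item succeeded
    body: { dryRun, results, summary: { total: ids.length, succeeded, failed: ids.length - succeeded } },
  };
}
```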
Safety Features
Dry-Run Mode
Preview which resources would be affected without executing:
Rate Limiting
Bulk operations are throttled to prevent resource overload. Default: 10 requests/min.
Implementation Reference
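The default limit can be sketched as a per-key sliding-window counter (10 requests per 60 seconds). The choice of key (for example an organization ID) and the in-memory storage are assumptions. A multi-instance deployment would need shared storage such as Redis so that every instance sees the same counts.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly limit = 10,
    private readonly windowMs = 60_000,
  ) {}

  // Returns true (and records the request) if allowed; false if throttled.
  tryAcquire(key: string, now: number = Date.now()): boolean {
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // the caller would respond 429 Too Many Requests
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```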
Error Handling
Typed Error Classes
Define semantic error types for consistent error responses:
lib/errors.ts
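Below is a sketch of what lib/errors.ts could contain. A base class carries the HTTP status and a machine-readable code, and each semantic error gets its own subclass. The class names, status mapping, and code strings are hypothetical.

```typescript
export class AppError extends Error {
  constructor(
    public readonly status: number,
    public readonly code: string,
    message: string,
    public readonly details?: Record<string, unknown>,
  ) {
    super(message);
    this.name = new.target.name; // e.g. "NotFoundError" in logs
  }
}

export class ValidationError extends AppError {
  constructor(message: string, details?: Record<string, unknown>) {
    super(400, "VALIDATION_ERROR", message, details);
  }
}

export class ForbiddenError extends AppError {
  constructor(message = "Insufficient permissions") {
    super(403, "FORBIDDEN", message);
  }
}

export class NotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(404, "NOT_FOUND", `${resource} ${id} not found`, { id });
  }
}

export class ConflictError extends AppError {
  // e.g. an invalid state transition such as powering on a deprovisioned server
  constructor(message: string) {
    super(409, "CONFLICT", message);
  }
}
```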
Global Error Handler
middleware/error-handler.ts
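Below is a framework-agnostic sketch of the global handler. Known errors map to their own status and code; anything else becomes a generic 500, so internal messages are not leaked. In Hono this function would typically be called from app.onError. The HttpError stand-in class, the INTERNAL_ERROR code, and the toErrorResponse name are assumptions.

```typescript
// Minimal stand-in for the typed errors in lib/errors.ts.
class HttpError extends Error {
  constructor(
    public readonly status: number,
    public readonly code: string,
    message: string,
  ) {
    super(message);
  }
}

interface ErrorEnvelope {
  success: false;
  error: { code: string; message: string };
  meta: { requestId: string; timestamp: string };
}

function toErrorResponse(err: unknown, requestId: string): { status: number; body: ErrorEnvelope } {
  const meta = { requestId, timestamp: new Date().toISOString() };
  if (err instanceof HttpError) {
    return { status: err.status, body: { success: false, error: { code: err.code, message: err.message }, meta } };
  }
  // Unexpected failure: log details server-side, return a generic message.
  console.error(`[${requestId}]`, err);
  return {
    status: 500,
    body: { success: false, error: { code: "INTERNAL_ERROR", message: "An unexpected error occurred" }, meta },
  };
}
```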
Usage in Routes
Audit Logging
SOC 2 compliant audit logging for all API requests. Every significant action must be traceable to a user and timestamp.
What to Log
| Event Type | Log? | Rationale |
|---|---|---|
| All mutations (POST/PUT/PATCH/DELETE) | ✅ Always | Core audit trail |
| Failed authentication (401) | ✅ Always | Security monitoring |
| Failed authorization (403) | ✅ Always | Access control audit |
| Server errors (5xx) | ✅ Always | Incident response |
| Reads on sensitive resources | ✅ Always | Compliance (see below) |
| General reads (GET) | ⚠️ Optional | High volume; enable for debugging |
| Health/metrics endpoints | ❌ Never | Noise |
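The table above can be expressed as a predicate that logging middleware consults per request. The health and metrics paths, the sensitive flag, and the logGeneralReads option are assumptions about how routes would be classified. The sketch lets the "Never" row for health and metrics win over the status-code rules.

```typescript
interface AuditInput {
  method: string;
  path: string;
  status: number;
  sensitive: boolean; // route touches an entity from the sensitive list below
}

const MUTATIONS = new Set(["POST", "PUT", "PATCH", "DELETE"]);
const NOISE_PATHS = ["/health", "/metrics"]; // hypothetical paths

function shouldAudit(req: AuditInput, logGeneralReads = false): boolean {
  // Health/metrics are excluded first, per the "Never" row.
  if (NOISE_PATHS.some((p) => req.path === p || req.path.endsWith(p))) return false;
  if (MUTATIONS.has(req.method)) return true;
  if (req.status === 401 || req.status === 403 || req.status >= 500) return true;
  if (req.sensitive) return true;
  return logGeneralReads; // general GETs are opt-in because of volume
}
```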
For multi-tenant security architecture and authorization patterns, see Auth Architecture.
Sensitive Entities Requiring Audit Logs
These entities require audit logging on all operations, including reads:
| Entity | Why Sensitive | Example Events |
|---|---|---|
| API Keys | Credential access | api_key.created, api_key.viewed, api_key.revoked |
| BMC Credentials | Infrastructure access | bmc_credential.created, bmc_credential.accessed |
| Cluster Credentials | Kubeconfig access | cluster_credential.downloaded |
| SSH Keys | Server access | ssh_key.created, ssh_key.deleted |
| Secrets | User-managed secrets | secret.created, secret.accessed, secret.deleted |
| Organization Members | Access control | member.invited, member.role_changed, member.removed |
| Billing/Payment | Financial data | payment_method.added, invoice.viewed |
Audit Event Schema
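One possible shape for an audit event is sketched below: an interface plus a builder that enforces the dot-namespaced action format used in the sensitive-entities table (for example api_key.created). Every field name is an assumption; evt_ is the event prefix from the Resource IDs table. The regex checks only the resource.verb shape, not that the verb is actually past tense.

```typescript
interface AuditEvent {
  id: string; // evt_ prefixed
  action: string; // e.g. "api_key.created"
  actor: { userId: string; organizationId: string };
  resource: { type: string; id: string };
  outcome: "success" | "failure";
  requestId: string; // joins the audit trail to API logs via meta.requestId
  ip?: string;
  occurredAt: string; // ISO 8601
}

// snake_case resource, dot, snake_case verb: "member.role_changed"
const ACTION_FORMAT = /^[a-z]+(_[a-z]+)*\.[a-z]+(_[a-z]+)*$/;

function buildAuditEvent(
  fields: Omit<AuditEvent, "occurredAt">,
  now: Date = new Date(),
): AuditEvent {
  if (!ACTION_FORMAT.test(fields.action)) {
    throw new Error(`Audit action must look like "resource.past_tense_verb": ${fields.action}`);
  }
  return { ...fields, occurredAt: now.toISOString() };
}
```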
Audit Event Naming Convention
Use past-tense, dot-namespaced actions:
Multi-Tenancy Patterns
Row-Level Security (RLS)
Use PostgreSQL RLS for defense-in-depth isolation:
Setting Context Per Request
middleware/rls-context.ts
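Below is a sketch of middleware/rls-context.ts under stated assumptions. It assumes the RLS policies read the tenant from a custom setting named app.current_org_id (hypothetical) and that queries run through a transaction-capable executor. PostgreSQL's set_config(name, value, true) makes the setting transaction-local, which is why it must be issued inside the same transaction as the queries. The Executor interface is a stand-in for Drizzle's transaction object, not its real API.

```typescript
// Stand-in for a database/transaction handle (hypothetical interface).
interface Executor {
  execute(sql: string, params: unknown[]): Promise<unknown>;
  transaction<T>(fn: (tx: Executor) => Promise<T>): Promise<T>;
}

// A matching policy would filter rows on current_setting('app.current_org_id').
const SET_ORG_SQL = "SELECT set_config('app.current_org_id', $1, true)";

// Runs `work` in a transaction whose first statement sets the tenant context.
// is_local = true discards the setting at COMMIT/ROLLBACK, so a pooled
// connection never carries one tenant's context into another request.
async function withOrgContext<T>(
  db: Executor,
  organizationId: string,
  work: (tx: Executor) => Promise<T>,
): Promise<T> {
  return db.transaction(async (tx) => {
    await tx.execute(SET_ORG_SQL, [organizationId]);
    return work(tx);
  });
}
```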
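Critical: RLS context is set per transaction. With connection pooling, set the context at the start of every request, inside the same transaction as that request's queries. Running the request's work inside Drizzle's transaction() helper keeps the context and the queries on one connection.
Decision Log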
Response Envelope Pattern
| Decision | Rationale | Trade-off |
|---|---|---|
| Discriminated union with success: boolean over separate success/error types | • TypeScript discriminated unions provide excellent type narrowing • Client code: if (response.success) gets correct types • Consistent structure across all endpoints • Easier to generate TypeScript clients | Slightly more verbose than HTTP-only error signaling. Type safety worth it. |
Resource ID Format
| Decision | Rationale | Trade-off |
|---|---|---|
| Prefixed nanoid (srv_abc123, cls_xyz789) over UUIDs or numeric IDs | • Human-readable in logs • Immediately identify resource type • URL-safe • Short enough for display • Low collision probability | Slightly longer than pure nanoid. Worth it for debugging and log correlation. |
Action Endpoints
| Decision | Rationale | Trade-off |
|---|---|---|
| Dedicated POST endpoints (/power, /provision) over overloading PATCH | • Semantic clarity: POST /power action=reboot is clearer than PATCH { online: true } • Action-specific parameters (e.g., force, imageUrl) • Better audit trail: “POST /power action=off” vs “PATCH with field changes” • Granular permissions: servers:lifecycle vs servers:update | Slightly more endpoints. Worth it for clarity and permissions. |
Bulk Operation Responses
| Decision | Rationale | Trade-off |
|---|---|---|
| Always 207 Multi-Status with per-resource results (not 200 OK with mixed results or fail-entire-operation) | • Partial success is common in bulk operations • Client needs to know which specific resources succeeded/failed • Failing entire operation for one resource is poor UX • 207 status code semantically correct for mixed outcomes | None significant. Standard practice for bulk operations. |
Async Operation Default
| Decision | Rationale | Trade-off |
|---|---|---|
| Return 202 immediately (not synchronous with long timeouts) | • Infrastructure operations take 30s to 30min • Prevents HTTP timeouts and connection issues • Allows UI to show progress • Supports horizontal scaling (request and execution on different instances) • Better observability via workflow tracking | Requires more client code. Mitigated by SDKs and clear polling patterns. |
Testing
Test Structure
All API endpoints should have integration tests covering:
- Happy path: Successful requests with expected responses
- Validation: Invalid inputs return appropriate errors
- Authorization: Unauthorized users receive 403
- State transitions: Invalid state transitions are rejected
- Edge cases: Empty lists, missing resources, etc.
Test Helpers
tests/helpers/api.ts
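Below is a sketch of assertion helpers that tests/helpers/api.ts could export. They narrow the response envelope and fail with a readable message when a test lands on the wrong branch. The helper names and the envelope type are assumptions. With Vitest or Jest these would typically wrap expect calls instead of throwing directly.

```typescript
type Envelope<T> =
  | { success: true; data: T; meta: { requestId: string } }
  | { success: false; error: { code: string; message: string }; meta: { requestId: string } };

// Asserts a success envelope and returns its data with the narrowed type.
export function expectSuccess<T>(res: Envelope<T>): T {
  if (!res.success) {
    throw new Error(`Expected success, got ${res.error.code} (request ${res.meta.requestId})`);
  }
  return res.data;
}

// Asserts an error envelope with a specific code, e.g. in 403 authorization tests.
export function expectErrorCode<T>(res: Envelope<T>, code: string): void {
  if (res.success) {
    throw new Error(`Expected error ${code}, got success (request ${res.meta.requestId})`);
  }
  if (res.error.code !== code) {
    throw new Error(`Expected error ${code}, got ${res.error.code}`);
  }
}
```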
Example Test
Related Documentation
- Specification - Complete API specification with detailed endpoint examples
- Auth Architecture - Authentication, authorization, and multi-tenant security patterns
- Data Ownership - Implementation patterns, workflow orchestration, and development guidance
- Data Model - Database schema and relationships