How to Design a Good SDK

What an SDK is for

An SDK — software development kit, but really just "client library" — wraps an API in idiomatic code for a specific language. It exists because using a raw HTTP client has too much friction for most developers: they don't want to construct URLs, encode bodies, parse responses, handle retries, or manage authentication tokens. They want to write client.orders.cancel(orderId) and have it do the right thing.

The implicit contract: an SDK should make the easy cases easy and the hard cases possible. The two failure modes — making easy cases hard (over-abstracted, ceremonious) or making hard cases impossible (everything is hidden behind too-clever defaults) — are equally common.

Auto-generated vs hand-written

Two camps:

Auto-generated from an OpenAPI spec. Pros: stays in sync with the API definition automatically; cheap to maintain across many languages; consistent. Cons: rarely idiomatic in any specific language; reflects the API's wire shape rather than what's natural in the target language; tends to produce ceremonious code.
Hand-written. Pros: feels native to the language; can hide complexity that the wire format exposes; can shape ergonomics around what users actually do. Cons: expensive to maintain; drifts out of sync with the API unless tests catch it; one SDK per language is a real engineering investment.

The hybrid that has emerged: auto-generated low-level transport code (HTTP plumbing, schema validation), hand-written ergonomic surface on top. The generated layer keeps wire compatibility; the hand-written layer keeps it pleasant to use.
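The layering can be pictured with a minimal sketch. Everything here is hypothetical: GeneratedOrdersApi stands in for generator output, Orders for the hand-written surface, and the /v1/orders/{id}/cancel path and response fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical transport signature: (method, path, json_body) -> decoded JSON.
Transport = Callable[[str, str, Optional[dict]], dict]


class GeneratedOrdersApi:
    """Stand-in for generated code: mirrors the wire shape one-to-one."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def post_orders_id_cancel(self, id: str, body: dict) -> dict:
        # Assumed endpoint, for illustration only.
        return self._transport("POST", f"/v1/orders/{id}/cancel", body)


@dataclass(frozen=True)
class Order:
    id: str
    status: str


class Orders:
    """Hand-written surface: idiomatic names and typed results."""

    def __init__(self, api: GeneratedOrdersApi) -> None:
        self._api = api

    def cancel(self, order_id: str, reason: Optional[str] = None) -> Order:
        body: dict[str, Any] = {"reason": reason} if reason is not None else {}
        raw = self._api.post_orders_id_cancel(order_id, body)
        return Order(id=raw["id"], status=raw["status"])
```

When the API changes, only the generated class is regenerated; callers keep using Orders.cancel unless the hand-written layer chooses to expose the change.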

Design principles that age well

Be idiomatic in the host language

The same SDK in Python, Go, and TypeScript should not look the same. Python uses snake_case methods, exceptions for errors, context managers for resource cleanup. Go uses MixedCaps names (capitalized when exported), multi-return values for errors, defer for cleanup. TypeScript uses camelCase methods, Promises for async, discriminated unions for results. An SDK that translates literally from one language idiom to another feels like a translation, not native code. The bar: a developer reading code that calls the SDK should not be able to tell that it wraps an HTTP API.

Hide the wire format

Users should not see HTTP status codes, response envelopes, or pagination cursors unless they explicitly opt in. The SDK takes a logical operation and returns a logical result; the wire-level details are an implementation concern. For users who genuinely need those details, provide an "advanced" or "raw" mode that exposes the underlying request and response.

Return the right thing

For a single resource lookup: return the resource, or raise/return a not-found error. Don't return a wrapper object that the user has to unwrap. For a list: return the list, with helpers for iteration that handle pagination internally. For a write that creates: return the created resource. The user shouldn't have to know that creating returned a 201 with a Location header.
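A rough sketch of those return shapes, under stated assumptions: the transport callable, the /orders paths, the sku and quantity parameters, and the Order fields are all hypothetical, and handling of other status codes is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status_code, decoded_json).
Transport = Callable[[str, str, Optional[dict]], Tuple[int, dict]]


class NotFoundError(Exception):
    """Raised when the requested resource does not exist."""


@dataclass(frozen=True)
class Order:
    id: str
    status: str


class Orders:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get(self, order_id: str) -> Order:
        status, body = self._transport("GET", f"/orders/{order_id}", None)
        if status == 404:
            # A missing resource becomes a typed error, not a wrapper or None.
            raise NotFoundError(f"order {order_id!r} not found")
        return Order(id=body["id"], status=body["status"])

    def create(self, *, sku: str, quantity: int) -> Order:
        # The 201 status (and any Location header) stays inside the SDK;
        # the caller just receives the created resource.
        _status, body = self._transport(
            "POST", "/orders", {"sku": sku, "quantity": quantity}
        )
        return Order(id=body["id"], status=body["status"])
```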

Make pagination invisible by default

The naive SDK returns a single page and forces users to figure out the next call. The thoughtful SDK returns an iterator (or async iterator) that fetches pages on demand:

for order in client.orders.list(status="open"):
    process(order)

The iteration handles pagination, retries, and rate limiting under the hood. Users who need explicit page control can opt in via a different method or parameter, but the default should be the obvious shape.
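One way such an iterator might be built is a generator over a page-fetching callable. This is a sketch only: the fetch_page signature and the "items" / "next_cursor" response keys are assumptions, and retries and rate limiting are omitted.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns {"items": [...], "next_cursor": <str or None>}.
FetchPage = Callable[[Optional[str]], dict]


def iter_items(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield items one by one, fetching the next page only when needed."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if cursor is None:
            return  # last page reached
```

Because the generator is lazy, a caller that breaks out of the loop early never pays for the pages it did not read.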

Async by default in async languages

If the language has first-class async (Python's async def, JavaScript's Promises, Kotlin's coroutines), the SDK should be async natively. Adding a sync wrapper on top is straightforward; trying to add async on top of a sync SDK is painful and usually leaks the underlying thread model.
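As an illustration of the sync-on-top-of-async direction, here is a minimal Python sketch. AsyncClient, Client, and get_order are hypothetical names, and asyncio.sleep(0) stands in for real non-blocking I/O.

```python
import asyncio


class AsyncClient:
    """The native surface: every call is a coroutine."""

    async def get_order(self, order_id: str) -> dict:
        await asyncio.sleep(0)  # stand-in for a real non-blocking HTTP call
        return {"id": order_id, "status": "open"}


class Client:
    """Blocking facade: each call drives the async client to completion."""

    def __init__(self) -> None:
        self._async = AsyncClient()

    def get_order(self, order_id: str) -> dict:
        # asyncio.run raises RuntimeError when called from a running event
        # loop, so this facade is only for synchronous callers.
        return asyncio.run(self._async.get_order(order_id))
```

asyncio.run creates a fresh event loop per call, which is fine for a sketch; an SDK that pools connections would probably need a longer-lived loop behind the facade.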

Authentication ergonomics

The user shouldn't think about tokens. Patterns that work:

Construct the client with credentials once. All subsequent calls use them; the user never attaches a token manually.
Read from the environment by default. If the API_KEY environment variable is set, use it, and let an explicit argument at construction override it. This makes the "hello world" trivial: client = Client() and you're done.
Refresh transparently. If the SDK uses tokens with expiry, refresh them in the background. The user shouldn't write retry-on-401 logic.
Make the credential type swappable. Let users construct the client with an API key, an OAuth token, or a custom credential provider. Locking to one mechanism makes the SDK painful for users who need a different one.
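The patterns above might combine as follows. This is a sketch under assumptions: the env parameter exists only to make the example testable, the Bearer header scheme is a guess at the wire format, and MissingCredentialsError and CredentialProvider are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional, Union

# A credential provider returns the current token. The SDK calls it on every
# request, so a provider can refresh or rotate tokens behind the scenes.
CredentialProvider = Callable[[], str]


class MissingCredentialsError(Exception):
    """No credentials were passed and none were found in the environment."""


class Client:
    def __init__(
        self,
        credentials: Union[str, CredentialProvider, None] = None,
        *,
        env: Optional[Mapping[str, str]] = None,  # injectable for tests
    ) -> None:
        env = os.environ if env is None else env
        if credentials is None:
            credentials = env.get("API_KEY")  # environment default
        if credentials is None:
            raise MissingCredentialsError("pass credentials or set API_KEY")
        if callable(credentials):
            self._provider: CredentialProvider = credentials
        else:
            static_key = credentials
            self._provider = lambda: static_key

    def _auth_headers(self) -> dict:
        # Assumed header scheme; attached internally on every request.
        return {"Authorization": f"Bearer {self._provider()}"}
```

An explicit argument always wins over the environment, and a refreshing OAuth flow can be plugged in as a provider without changing any call sites.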

Error handling

Errors are where SDKs distinguish themselves. Three rules:

Errors are typed, not strings

Each kind of error gets its own type (in languages with error hierarchies) or its own discriminated value (in languages with sum types). Users branch on the type, not on string matching. The minimum useful taxonomy:

Client errors (the user did something wrong): bad arguments, validation failures, not found, conflict.
Auth errors: missing credentials, invalid credentials, insufficient permissions.
Rate-limit errors: with the retry-after time exposed as a field.
Server errors (the API failed): typically wrapped with the original status and message.
Network errors (the call didn't reach the API): timeouts, DNS failures, connection resets.
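In Python, that taxonomy could be expressed as an exception hierarchy. The class names and the status-code mapping below are illustrative assumptions, not a prescribed layout.

```python
from typing import Optional


class SdkError(Exception):
    """Base class so callers can catch everything the SDK raises."""


class ClientError(SdkError):
    """The caller did something wrong (bad arguments, validation)."""


class NotFoundError(ClientError):
    pass


class ConflictError(ClientError):
    pass


class AuthError(SdkError):
    """Missing or invalid credentials, or insufficient permissions."""


class RateLimitError(SdkError):
    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds to wait, if the API said so


class ServerError(SdkError):
    """The API itself failed."""


class NetworkError(SdkError):
    """The call never reached the API (timeout, DNS, connection reset)."""


def error_for_status(
    status: int, message: str, retry_after: Optional[float] = None
) -> SdkError:
    """Map an HTTP status to a typed error (assumed mapping)."""
    if status in (401, 403):
        return AuthError(message)
    if status == 404:
        return NotFoundError(message)
    if status == 409:
        return ConflictError(message)
    if status == 429:
        return RateLimitError(message, retry_after=retry_after)
    if 400 <= status < 500:
        return ClientError(message)
    # Anything else is treated as a server-side failure in this sketch.
    return ServerError(message)
```

Callers can then write except RateLimitError as exc and read exc.retry_after, or catch SdkError to handle every SDK failure in one place.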

Errors carry diagnostic information

Each error should carry the request ID (from the API response), the underlying status code, the original error message, and any structured error details the API returned. The user should be able to log the error and have everything they need to file a support ticket — without modifying their code.
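A sketch of an error that carries those fields and prints them in one line. The X-Request-Id header name and the {"error": {"message", "details"}} body envelope are assumptions about the API, not a known format.

```python
from typing import Any, Mapping, Optional


class ApiError(Exception):
    def __init__(
        self,
        message: str,
        *,
        status_code: int,
        request_id: Optional[str],
        details: Any = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.request_id = request_id
        self.details = details  # structured details, passed through untouched

    def __str__(self) -> str:
        # Everything needed for a support ticket appears in a plain log line.
        return (
            f"{self.message} "
            f"(status={self.status_code}, request_id={self.request_id})"
        )

    @classmethod
    def from_response(
        cls, status_code: int, headers: Mapping[str, str], body: Mapping[str, Any]
    ) -> "ApiError":
        error = body.get("error", {})  # assumed envelope
        return cls(
            error.get("message", "unknown error"),
            status_code=status_code,
            request_id=headers.get("X-Request-Id"),  # assumed header name
            details=error.get("details"),
        )
```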

Don't swallow errors

The cardinal SDK sin: catching exceptions and returning null, or logging the error and returning a default value. The user has no way to know something went wrong. If the SDK can't recover from an error, it must propagate it.
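Propagating does not have to mean leaking low-level exceptions. A small sketch of translating instead of swallowing, where fetch_order, the send callable, and NetworkError are hypothetical names:

```python
from typing import Callable


class NetworkError(Exception):
    """The call did not reach the API."""


def fetch_order(send: Callable[[str], dict], order_id: str) -> dict:
    try:
        return send(order_id)
    except TimeoutError as exc:
        # Translate, don't swallow: the caller still sees a failure, and the
        # original exception is preserved as __cause__ for debugging.
        raise NetworkError(f"timed out fetching order {order_id}") from exc
```

The anti-pattern would be an except branch that logs and returns None, which makes a timeout indistinguishable from a missing order.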

Retries and resilience

The SDK should handle the cases that don't require user judgment: retrying on transient network errors and 5xx responses, respecting Retry-After on 429s, applying exponential backoff with jitter. The defaults should match what most users would write themselves; advanced users override them.

What the SDK should not do automatically:

Retry POST without an idempotency key. Doing so silently duplicates side effects. Either require the user to provide one (preferred), or generate one automatically and document the behaviour clearly.
Retry forever. Cap retries at some sensible number; surface the failure if exceeded.
Hide a circuit-breaker decision inside the SDK. If the SDK is going to stop calling the API for a while, the user needs to know.
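The rules in this section might be sketched as a single retry helper. The names, default values, and the TransientError type are assumptions; the jitter variant used is "full jitter" (a uniform random delay up to the exponential ceiling), one of several reasonable choices.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (network errors, 5xx, 429)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # from a Retry-After header, if present


def call_with_retries(
    operation: Callable[[], T],
    *,
    idempotent: bool,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Without idempotency (e.g. a POST with no idempotency key), never retry.
    attempts = max_attempts if idempotent else 1
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == attempts:
                raise  # retries exhausted: surface the failure
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server's hint wins
            else:
                ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = rng() * ceiling  # full jitter
            sleep(delay)
    raise AssertionError("unreachable")
```

Making sleep and rng injectable keeps the helper testable without real waiting; a caller marks an operation idempotent only when it is a safe method or carries an idempotency key.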

The patterns behind these decisions — backoff, idempotency, circuit breakers — are covered in the integration guide and the deeper articles on idempotency and rate limiting.

Logging and observability

The SDK shouldn't log anything by default — surprise log output in someone else's application is hostile. But it should make logging easy for users who want it: a hook to pass in a logger, configurable verbosity, and structured log records with the request method, URL, status, and latency.
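In Python, the standard-library logging module already supports this shape: a library attaches a NullHandler to its own logger and stays silent until the application configures logging. The logger name example_sdk, the log_request helper, and the field names are hypothetical.

```python
import logging
import time

# Silent by default: records are dropped unless the application adds handlers.
logger = logging.getLogger("example_sdk")
logger.addHandler(logging.NullHandler())


def log_request(
    method: str,
    url: str,
    status: int,
    started_at: float,
    log: logging.Logger = logger,  # hook: callers may pass their own logger
) -> None:
    """Emit one structured record per request at DEBUG level."""
    latency_ms = (time.monotonic() - started_at) * 1000
    log.debug(
        "request completed",
        # Fields become attributes on the LogRecord for structured handlers.
        extra={
            "http_method": method,
            "url": url,
            "status": status,
            "latency_ms": round(latency_ms, 1),
        },
    )
```

An application opts in with ordinary logging configuration, for example by setting the example_sdk logger to DEBUG and attaching a handler; the URL should be scrubbed of credentials before it is logged.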

For metrics and tracing, the SDK should integrate with the language's standard observability tools (OpenTelemetry, Prometheus client libraries) or expose enough hooks for the user to instrument the calls themselves.

Backwards compatibility

The SDK is a contract too. Once a method signature is published, removing or changing it breaks every user. The discipline:

Add fields and methods freely. New optional parameters, new methods, new fields on response types — all non-breaking if the language treats them that way.
Deprecate before removing. Mark old methods deprecated for at least one major version before removing. Most languages have a way to emit warnings on use.
Use semantic versioning seriously. Patch for bug fixes, minor for additive changes, major for breaking changes. Users decide their tolerance for upgrades based on the version number; lying about the version makes them stop upgrading.
Don't follow the API's internal versioning. The SDK can pin to a specific API version internally and present a stable surface. Users want to upgrade the SDK without thinking about API versions.
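For the deprecate-before-removing step, a Python SDK could emit DeprecationWarning through a small decorator. The decorator and the Orders methods below are illustrative names, not part of any particular SDK.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(replacement: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Mark a function as deprecated and point users at its replacement."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class Orders:
    def cancel(self, order_id: str) -> str:
        return f"cancelled {order_id}"

    @deprecated("Orders.cancel")
    def cancel_order(self, order_id: str) -> str:
        # Old name kept working for a deprecation period before removal.
        return self.cancel(order_id)
```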

Documentation that pulls its weight

Three layers of documentation, each catching different users:

A getting-started guide. Five minutes from install to first successful call. No assumed context. This is the page most new users read first; if it's bad, they don't come back.
API reference. One entry per public method, with parameters, return type, errors, and a code example. Generate it from the source: docs that live next to the code stay in sync, while docs maintained separately drift.
Cookbook / patterns. Common multi-step tasks with worked examples. "How do I paginate a search query and store the results in a database?" — that's a cookbook entry, not an API reference question.

What gets skipped at the greatest cost: a clear "errors" section. Users hit errors on day one; if the documentation doesn't help them interpret what went wrong, they file support tickets you don't want to handle.

Common mistakes

Translating the wire format too literally. The API's response shape is for the API, not for the user. The SDK gets to choose what the user sees.
One blocking call hiding a retry storm. A "single SDK call" that internally retries five times with exponential backoff can block for 30 seconds or more. Users need to know this; expose it as a parameter or document it loudly.
Auto-retrying writes without idempotency. Silent duplicate operations are the worst kind of SDK bug — the user's code looks correct, the data is wrong.
Heavy global state. Singletons, module-level configuration, hidden HTTP clients. They make the SDK hard to test, hard to use in multi-tenant code, and surprising in concurrent code.
One giant client class. Group related operations into namespaces (client.orders.cancel, client.payments.create) rather than dumping every method on the top-level client.
Forgetting connection reuse. Each SDK instance should reuse a connection pool. Constructing a new HTTP client per request is the most common SDK performance bug.
Logging credentials. Sometimes via the URL, sometimes via headers, sometimes via a debug mode. Audit what gets logged at every verbosity level.
No tests against the real API. Mock-based tests verify that the SDK matches its own assumptions, not that the API behaves as documented.
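To make the retry-storm item concrete, the worst case for a single call can be estimated with simple arithmetic. The helper below is a sketch that assumes plain doubling backoff with no jitter or cap; the parameter names are hypothetical.

```python
def worst_case_seconds(
    retries: int, base_delay: float, request_timeout: float
) -> float:
    """Upper bound for one call: every attempt times out and every
    backoff sleep is taken in full (doubling each time, no jitter)."""
    attempts = retries + 1
    backoff = sum(base_delay * 2 ** i for i in range(retries))
    return attempts * request_timeout + backoff
```

With five retries and a one-second base delay, the sleeps alone add up to 1 + 2 + 4 + 8 + 16 = 31 seconds; adding a ten-second timeout on each of the six attempts brings the bound to 91 seconds.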

Where to go next

For consumer-side patterns the SDK supports, see the integration guide. For the underlying mechanisms — idempotency, rate limiting, pagination, error handling — see the deep-dives. For the broader API design context that constrains what an SDK can do, see API design best practices.