
# How We Built a Multi-Tenant Database Agent Without Leaking Your Data
When you let an AI agent run SQL against a user's database, you're one bad abstraction away from a security incident. We spent a lot of time thinking about this while building Ryuk, and some of the decisions we made are worth writing down so other people building similar things can skip the mistakes we nearly made.
## The threat model
Here's what can go wrong when an LLM has access to a database connection:
- Cross-tenant data access. User A's query somehow hits User B's database. This is the worst case.
- Data modification. The model generates a `DROP TABLE` or `UPDATE`, either because it misunderstood the prompt or because someone crafted a prompt to trick it into doing so.
- Credential exposure. Connection strings, passwords, or hostnames leak through the API, through logs, or in the model's own responses.
- Resource exhaustion. A query runs for 45 minutes, eats all the memory, and takes down the connection pool for everyone.
Each of these needs a different kind of defense. Most of them can't be solved at the application layer alone.
## Connection isolation
Every query in Ryuk runs on its own connection using the credentials that the specific user provided. For community databases, the credentials are ones the admin configured and the user never sees. There's no shared connection pool serving multiple tenants. No connection string switching based on a user ID parameter.
This is expensive. You're opening and closing connections more often than you would with a pool. But the alternative is a shared pool that could serve the wrong tenant's data if you mess up the routing. We'll take the performance hit.
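The per-query pattern can be sketched as a small context manager. This is an illustration, not Ryuk's code: in production the `connect` argument would be `psycopg2.connect` with the tenant's stored credentials, and sqlite3 stands in here only so the sketch is runnable.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def tenant_connection(connect, credentials):
    """Open a dedicated connection with this tenant's own credentials,
    and always close it when the query finishes -- no shared pool."""
    conn = connect(**credentials)
    try:
        yield conn
    finally:
        conn.close()

# Production would pass psycopg2.connect and the user's decrypted
# credentials; sqlite3 is a runnable stand-in.
with tenant_connection(sqlite3.connect, {"database": ":memory:"}) as conn:
    row = conn.execute("SELECT 1").fetchone()
```

No routing logic exists to get wrong: the connection is created from one tenant's credentials and destroyed before anyone else's query runs.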
For community databases, the credentials are encrypted on our side and never returned through any API endpoint. Not to users, not to admins, not even to the person who created the connection. It's write-only. If you need to change the password, you delete the database entry and make a new one.
## Read-only enforcement
We enforce read-only access at the PostgreSQL session level:

```sql
SET default_transaction_read_only = on;
```

This is a session-level setting. For the entire lifetime of that connection, PostgreSQL rejects any statement that tries to modify data. If the model generates an `INSERT`, Postgres throws an error before anything happens.
We could have done this at the application level by parsing SQL and rejecting anything that isn't a SELECT. But SQL parsing is hard to get right. What about `SELECT ... INTO`? CTEs that contain `INSERT`? Database-specific syntax? By handing this off to the database engine, we get correct enforcement for free, including edge cases we haven't considered yet.
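The same engine-level idea can be demonstrated without a Postgres instance: SQLite's `PRAGMA query_only` plays the role of `SET default_transaction_read_only = on` in this runnable stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")

# Session-level read-only, analogous to Postgres's
# `SET default_transaction_read_only = on`:
conn.execute("PRAGMA query_only = ON")

try:
    conn.execute("INSERT INTO users VALUES (1)")
    blocked = False
except sqlite3.OperationalError:
    blocked = True  # the engine rejects the write before anything happens

# Reads still work normally on the same connection.
rows = conn.execute("SELECT * FROM users").fetchall()
```

The application never inspects the SQL; the engine enforces the policy for every statement on the connection.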
## Why we don't filter SQL with regex
Some projects try to make queries safe by running them through a regex or keyword blocklist:
```python
if re.search(r'\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE)\b', sql, re.IGNORECASE):
    raise Exception("Unsafe query")
```

This breaks in a bunch of ways:
- `SELECT * FROM users WHERE action = 'DELETE'` gets blocked because the string literal contains "DELETE"
- `WITH deleted AS (SELECT ...) SELECT * FROM deleted` trips on the CTE alias
- `SELECT "DROP" FROM some_table` fails on quoted identifiers
- Comments: `SELECT /* DROP TABLE users */ * FROM users`
You'd need a full SQL parser to handle this correctly, and even then you'd be reimplementing what the database already does. Just use `SET default_transaction_read_only = on`.
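The first failure mode takes three lines to reproduce. This is the blocklist from above, not Ryuk's code, flagging a perfectly safe read-only query:

```python
import re

BLOCKLIST = re.compile(r'\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE)\b',
                       re.IGNORECASE)

# A harmless read against a string literal...
safe_query = "SELECT * FROM users WHERE action = 'DELETE'"

# ...which the keyword blocklist rejects anyway:
false_positive = bool(BLOCKLIST.search(safe_query))
```

The match is inside a quoted string the regex can't see, which is exactly the kind of context a real parser tracks and a blocklist never will.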
## Schema analysis
When a user connects a database, we run a background job that pulls the schema: tables, columns, types, foreign keys, indexes. This gets cached and fed into the agent's system prompt so it can write accurate SQL without guessing column names.
The hard part is freshness. Schemas change. Someone adds a column, renames a table, creates an index. We re-analyze periodically and after certain events, but there's always a gap where the cache is stale. When that happens, the model writes a query that references a column that doesn't exist anymore. Postgres returns a clear error, the model sees it, and it usually fixes itself on the next try.
That's one of the benefits of giving the model real query execution instead of just SQL generation. It can recover from its own mistakes.
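A minimal sketch of the schema pull. Against Postgres this would query `information_schema`; SQLite's `PRAGMA table_info` stands in here so the sketch runs, and `describe_table` is a hypothetical name, not Ryuk's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, user_id INTEGER)"
)

def describe_table(conn, table):
    """Render one table's columns and types as text for the system prompt."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [f"  {name} {ctype}" for _, name, ctype, *_ in cols]
    return f"TABLE {table}\n" + "\n".join(lines)

schema_summary = describe_table(conn, "orders")
```

The rendered summary is what gets cached and injected into the prompt, so the model can name real columns instead of guessing.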
## The credential problem
Database credentials are the most sensitive thing we store. Here's what we do:
Encryption at rest. Passwords are encrypted with Fernet symmetric encryption before they're stored. The key lives in an environment variable, not in the codebase.
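The encrypt-on-write pattern looks roughly like this, assuming the `cryptography` package is available. In production the key comes from an environment variable; it's generated inline here only so the sketch runs.

```python
from cryptography.fernet import Fernet

# Production reads this key from the environment; generated here
# only to make the example self-contained.
key = Fernet.generate_key()
fernet = Fernet(key)

token = fernet.encrypt(b"s3cret-db-password")  # what gets stored
plaintext = fernet.decrypt(token)              # done only at connect time
```

The plaintext exists in memory only for the moment a connection is opened; everything at rest and over the wire is the token.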
Structural exclusion from APIs. Our API response schemas don't have fields for host, port, username, or password. We're not filtering them out before sending the response. The Pydantic model that defines the response shape just doesn't include those fields. You can't accidentally leak what isn't there.
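The same idea in stdlib terms (Ryuk uses Pydantic, but the principle is identical, and the names here are illustrative): the response type simply has no credential fields, so serialization cannot emit them.

```python
from dataclasses import dataclass, asdict

@dataclass
class DatabaseResponse:
    """Public response shape -- host, port, username, password
    are not fields, so they cannot appear in output."""
    name: str
    description: str
    status: str

# The full internal record, credentials included:
stored = {"name": "analytics", "description": "Sales DB", "status": "ok",
          "host": "db.internal", "password": "hunter2"}

# Building the response from explicit fields means extras can't leak through.
response = asdict(DatabaseResponse(
    name=stored["name"],
    description=stored["description"],
    status=stored["status"],
))
```

There is no filtering step to forget; a field that isn't in the type can't be serialized by accident.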
No read-back. Even the admin who created a community database can't pull the credentials back out. The admin panel shows database name, description, tags, status. That's it. There's a "test connection" button that connects and reports success or failure without showing anything sensitive.
This is inconvenient on purpose. If you lose a password, you set a new one at the source and recreate the database entry. That inconvenience removes an entire category of bugs where credentials end up in logs, API responses, or a model's context window.
## What the agent can do
The agent has three tools:
- `execute_sql` runs a read-only query and returns results
- `get_table_details` returns schema info for specific tables (columns, types, constraints)
- `generate_chart` creates a visualization from query results
That's the full list. No tool for modifying data. No tool for managing connections. No tool for touching other users' databases. Small surface area on purpose.
The agent decides which tools to use based on the conversation and the schema in its prompt. It can chain queries: run one to understand the data shape, then another to answer the question. It can also decide a chart would be more useful than a table and make one without being asked.
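A sketch of how small that dispatch surface is. The tool names match the list above; the bodies are stubs, not Ryuk's implementations.

```python
def execute_sql(query: str) -> str:
    return f"rows for: {query}"       # stub: would run the read-only query

def get_table_details(table: str) -> str:
    return f"schema for: {table}"     # stub: would return cached schema info

def generate_chart(data: str) -> str:
    return f"chart of: {data}"        # stub: would render a visualization

TOOLS = {
    "execute_sql": execute_sql,
    "get_table_details": get_table_details,
    "generate_chart": generate_chart,
}

def dispatch(name: str, arg: str) -> str:
    # Anything outside the registry is rejected. There is no
    # "modify_data" tool for the model to call, so it can't.
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](arg)
```

Whatever the model asks for, the registry is the whole universe of actions; safety lives in what the dict doesn't contain.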
## Mistakes we nearly made
Shared connection pools. Our first prototype used one pool with tenant routing. Faster, but if we got routing wrong we'd serve the wrong tenant's data. We tore it out.
Application-level SQL filtering. We started with a keyword blocklist. Found edge cases within the first week. Moved to session-level read-only and that was the end of it.
Returning credentials in admin responses. The first version of the admin API sent back the full connection object, encrypted password included. Nobody was going to decrypt it on the frontend, but it was still going over the network, getting captured by middleware, sitting in browser dev tools. We changed the API to never send credentials back.
Telling the model to be safe. Early on we thought about adding "never generate destructive queries" to the system prompt. That's not a security measure. Prompt instructions are suggestions. If someone crafts the right input, the model will ignore them. Real safety has to be enforced by infrastructure, not by asking nicely.
If you're building something like this: don't rely on the application layer for security, don't rely on the model for safety, and don't use regex for SQL parsing. Push constraints as deep as you can. Into the database engine, into the API schema types, into the encryption layer. The closer the enforcement sits to the data, the harder it is to get around.