System Design · PostgreSQL · FastAPI · Scalability · Infrastructure

Proactive Scaling: Designing Signale's Architecture for 1,000 Concurrent Users

Scaling backend APIs for high concurrency: A deep dive into connection pooling, sidecar patterns, and the nuance of prepared statements in Python.

Most engineering teams discover connection pooling the hard way: at 3 AM, when their database crashes under load.

When designing Signale (a WhatsApp memory assistant built for a client), we knew we were building for a viral distribution channel. Adoption on WhatsApp doesn't grow linearly; it spreads exponentially through group chats. A single mention in a group can drive hundreds of users to the bot within seconds.

We chose FastAPI for its high-concurrency async capabilities, but we also identified a critical mismatch in the ecosystem: The Async/Sync Impedance Mismatch.

This post details the architectural decisions that allowed Signale to handle over 1,000 concurrent users during its public beta without a single database timeout.

The Architectural Challenge

The core problem is simple physics:

  1. FastAPI is designed to be lightweight. A single container can easily handle thousands of concurrent async requests.
  2. PostgreSQL is process-heavy. For every connection, it forks a new OS process, consuming ~10MB of RAM.

If you connect FastAPI directly to Postgres, you create a DDoS cannon aimed at your own database. When traffic spikes, FastAPI accepts 500 requests instantly, tries to open 500 DB connections, and Postgres collapses under the context switching load.
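To make the failure mode concrete, here is what that naive direct-to-Postgres pattern looks like in code. The endpoint, table, and DSN are hypothetical, not Signale's actual handlers:

# The anti-pattern: one raw database connection per request.
# A burst of 500 concurrent requests forks 500 Postgres backend processes.
import asyncpg
from fastapi import FastAPI

app = FastAPI()
DATABASE_URL = "postgresql://app:secret@postgres:5432/signale"  # placeholder DSN

@app.get("/memories/{user_id}")
async def list_memories(user_id: int):
    conn = await asyncpg.connect(DATABASE_URL)  # new Postgres backend per request
    try:
        rows = await conn.fetch(
            "SELECT body FROM memories WHERE user_id = $1", user_id
        )
        return [row["body"] for row in rows]
    finally:
        await conn.close()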

We didn't wait for this to happen. We designed a Constraint-Driven Architecture to prevent it.

The Solution: The Sidecar Pattern

We decoupled "Application Concurrency" from "Database Load" by introducing PgBouncer as a mandatory infrastructure component from Day 1.

We utilized the Docker Sidecar Pattern, placing a lightweight PgBouncer container in front of the database. This acts as a funnel:

[FastAPI 1] --(50)--> \
                       \
[FastAPI 2] --(50)--> [ PgBouncer ] --(20)--> [(PostgreSQL)]
                       /
[FastAPI 3] --(50)--> /

Instead of 150 connections hitting Postgres, only 20 do. The remaining application connections wait in a lightweight queue managed by PgBouncer, which runs a single libevent-based event loop and spends only about 2 kB of memory per waiting connection.
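Here is a minimal sketch of that sidecar wiring in Compose form; the image name, credentials, and file paths are placeholders rather than Signale's actual deployment:

# docker-compose.yml (illustrative sketch; image, credentials, and paths are placeholders)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: signale
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret

  pgbouncer:
    image: edoburu/pgbouncer            # any maintained PgBouncer image works
    command: ["pgbouncer", "/etc/pgbouncer/pgbouncer.ini"]
    volumes:
      - ./pgbouncer.ini:/etc/pgbouncer/pgbouncer.ini:ro
      - ./userlist.txt:/etc/pgbouncer/userlist.txt:ro
    ports:
      - "6432:6432"
    depends_on:
      - postgres

  api:
    build: .
    environment:
      # The application only ever talks to the bouncer, never to Postgres directly
      DATABASE_URL: postgresql+asyncpg://app:secret@pgbouncer:6432/signale
    depends_on:
      - pgbouncer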

Technical Deep Dive: The asyncpg Nuance

However, this introduces a new layer of complexity. You can't just drop PgBouncer in and expect it to work with modern Python async drivers.

We use Transaction Mode Pooling (pool_mode=transaction). This is the most aggressive pooling mode that still supports multi-statement transactions: a server connection is released back to the pool as soon as the transaction ends (commit or rollback), not when the client disconnects. This allows 1,000 clients to be served by ~20 actual connections.
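The relevant part of the PgBouncer configuration looks roughly like this; hostnames, sizes, and auth settings are illustrative, not Signale's production values:

; pgbouncer.ini (illustrative sketch)
[databases]
; route every client connection to the real Postgres host
signale = host=postgres port=5432 dbname=signale

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; release the server connection as soon as the transaction ends
pool_mode = transaction

; ~20 real Postgres backends serve all application traffic
default_pool_size = 20

; thousands of waiting clients cost only a couple of kB each
max_client_conn = 2000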

But there's a catch.

The asyncpg driver (the de facto async PostgreSQL driver in the FastAPI ecosystem) relies heavily on Prepared Statements for performance. It prepares a statement on a specific server connection and expects it to still be there for the next query. In Transaction Mode, the "next query" might run on a completely different connection where that statement was never created.

Result: ERROR: prepared statement "stmt_1" does not exist.
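A stripped-down illustration of the failure mode, assuming a PgBouncer in transaction mode on port 6432; the DSN and table name are hypothetical:

# failure_demo.py (illustrative; assumes PgBouncer in transaction mode on :6432)
import asyncio
import asyncpg

async def main() -> None:
    # Connect through the bouncer, not through Postgres directly
    conn = await asyncpg.connect("postgresql://app:secret@pgbouncer:6432/signale")

    # asyncpg prepares this query on whichever server connection PgBouncer
    # lends it for the current transaction, then caches the statement name.
    await conn.fetchval("SELECT count(*) FROM memories")

    # The next call reuses the cached name, but PgBouncer may hand the session
    # a different server connection where that statement was never prepared:
    #   ERROR: prepared statement "..." does not exist
    await conn.fetchval("SELECT count(*) FROM memories")

    await conn.close()

asyncio.run(main())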

The Fix

We configured the SQLAlchemy engine to disable statement caching at the driver level: each query is prepared, executed, and discarded within a single transaction, so it no longer matters which server connection PgBouncer hands out next.

# app/database/session.py

from sqlalchemy.ext.asyncio import create_async_engine

from app.core.config import settings  # project settings module (path illustrative)

engine = create_async_engine(
    settings.DATABASE_URL,
    pool_size=20,
    max_overflow=10,
    connect_args={
        # CRITICAL: disable prepared statement caching for PgBouncer compatibility
        "prepared_statement_cache_size": 0,  # SQLAlchemy dialect-level cache
        "statement_cache_size": 0,           # asyncpg's own cache
    },
)

This is a specific trade-off: we sacrifice a few microseconds of query parsing time to gain massive horizontal scalability.
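For completeness, here is a minimal sketch of how such an engine is typically consumed in FastAPI (assuming SQLAlchemy 2.x; the module and dependency names are illustrative, not Signale's actual code):

# app/database/deps.py (illustrative sketch)
from collections.abc import AsyncIterator

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

from app.database.session import engine

SessionLocal = async_sessionmaker(engine, expire_on_commit=False)

async def get_db() -> AsyncIterator[AsyncSession]:
    # One short-lived session per request: each transaction maps to a single
    # PgBouncer checkout, which is exactly what transaction pooling expects.
    async with SessionLocal() as session:
        yield session

Handlers then declare db: AsyncSession = Depends(get_db) and keep transactions short, so no single coroutine hogs one of the 20 real connections.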

The Result: Boredom at Scale

When we opened the public beta, we saw the traffic spike we anticipated.

  • Concurrent Users: ~1,200
  • FastAPI Latency: Stable at ~120ms
  • Database CPU: < 15%
  • Active DB Connections: Flatline at 20

Scale isn't just about handling traffic; it's about handling it quietly. By identifying constraints early - specifically the mismatch between request concurrency and database process limits - we ensured that Signale's public beta test was a technical non-event.

Pedro

Founder & Principal Engineer