
Fly Proxy routes requests to individual Machines in your apps using a combination of concurrency settings specified on your app, current load, and closeness. This page describes the details of load balancing traffic to your apps on Fly.io.

Load balancing strategy

The basic load balancing strategy is to send traffic to the least-loaded, closest Machine.

Load

Fly Proxy determines load using the concurrency settings configured for an app and the current traffic relative to those settings.

The table below describes how traffic may or may not be routed to a Machine based on configured soft_limit and hard_limit values.

| Machine load | What happens |
| --- | --- |
| Above hard_limit | No new traffic will be sent to the Machine. |
| At or above soft_limit, below hard_limit | Traffic will only be sent to this Machine if all other Machines are also above their soft_limit. |
| Below soft_limit | Traffic will be sent to the Machine when it is the closest Machine that is under soft_limit. |

Cross-region routing only happens when all Machines in the local region are unhealthy or at their hard_limit.
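The rules in the table and the cross-region rule can be sketched as a small selection function. This is a hypothetical model, not Fly Proxy's implementation: the `Machine` fields, the function names, and the choice to treat a Machine whose load equals its hard_limit as full are all assumptions made for illustration. Ties go to the first Machine listed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    # Hypothetical view of a Machine as the proxy might see it.
    name: str
    region: str
    load: int          # current concurrent requests (or connections)
    soft_limit: int
    hard_limit: int
    rtt_ms: float      # RTT from the receiving edge to this Machine's worker
    healthy: bool = True


def accepts_traffic(m: Machine) -> bool:
    # Assumption: a Machine at its hard_limit is treated as full.
    return m.healthy and m.load < m.hard_limit


def pick_in(candidates: List[Machine]) -> Optional[Machine]:
    open_machines = [m for m in candidates if accepts_traffic(m)]
    if not open_machines:
        return None
    # Prefer Machines under their soft_limit; fall back to Machines between
    # soft_limit and hard_limit only when none are under soft_limit.
    under_soft = [m for m in open_machines if m.load < m.soft_limit]
    pool = under_soft if under_soft else open_machines
    # Among the eligible Machines, choose the closest by RTT.
    return min(pool, key=lambda m: m.rtt_ms)


def pick_machine(machines: List[Machine], local_region: str) -> Optional[Machine]:
    local = [m for m in machines if m.region == local_region]
    choice = pick_in(local)
    if choice is not None:
        return choice
    # Cross-region only when every local Machine is unhealthy or full.
    remote = [m for m in machines if m.region != local_region]
    return pick_in(remote)
```

A `None` result means no Machine can take the request in this model. Note that a closer Machine above its soft_limit loses to a farther one below it, which matches the second and third rows of the table.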

Closeness

Closeness is determined by the RTT (round-trip time) between the Fly.io edge server that receives a connection or request and the worker server where your Machine runs. Even within a single region, we run servers in different datacenters, which have different RTTs. RTTs between all servers are measured continuously.
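As a rough illustration of RTT-based closeness, the sketch below keeps the latest measured RTT for each (edge server, worker server) pair and picks the closest worker. The `RttTable` class, the server names, and the RTT values are all hypothetical; they are not real Fly.io identifiers or measurements.

```python
from typing import Dict, List, Tuple


class RttTable:
    """Hypothetical store of the latest measured RTT between server pairs."""

    def __init__(self) -> None:
        self._rtt_ms: Dict[Tuple[str, str], float] = {}

    def record(self, edge: str, worker: str, rtt_ms: float) -> None:
        # Measurements arrive continuously; keep only the newest sample per pair.
        self._rtt_ms[(edge, worker)] = rtt_ms

    def closest(self, edge: str, workers: List[str]) -> str:
        # A worker with no measurement yet is treated as infinitely far away.
        return min(workers, key=lambda w: self._rtt_ms.get((edge, w), float("inf")))


rtt = RttTable()
rtt.record("edge-ams-1", "worker-ams-a", 0.8)    # same region, nearby datacenter
rtt.record("edge-ams-1", "worker-ams-b", 2.1)    # same region, farther datacenter
rtt.record("edge-ams-1", "worker-sea-a", 150.0)  # different region
print(rtt.closest("edge-ams-1", ["worker-ams-b", "worker-ams-a", "worker-sea-a"]))  # worker-ams-a
```

Because two workers in the same region can have different RTTs, closeness distinguishes between them rather than treating a region as a single point.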

You can observe live RTT values between Fly.io regions using our RTT app.

Example of load balancing for a web service

We have a hypothetical web service that we know can handle 25 concurrent requests with the configured CPU and memory settings. We set the following values in our fly.toml:

  [services.concurrency]
    type = "requests"
    hard_limit = 25
    soft_limit = 20

We set type = "requests" so Fly Proxy measures each Machine's load by its number of concurrent HTTP requests. We prefer this to type = "connections" because our web service does work for each request, and our users may make multiple requests over a single connection (for example, with HTTP/2). When using type = "requests", Fly Proxy also pools connections to a Machine for a short time (about 4 seconds) to avoid frequently opening and closing connections to your app.
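To see why the concurrency type matters, the hypothetical sketch below counts the same traffic both ways. The `Connection` model and `concurrency` function are illustrative assumptions, not Fly.io APIs. With two HTTP/2 clients carrying 22 requests between them, counting connections reports a load of 2 (far below a hard_limit of 25), while counting requests reports 22 (already above a soft_limit of 20).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    # Hypothetical model: one client connection and how many requests are
    # currently in flight on it (HTTP/2 can multiplex several).
    in_flight_requests: int


def concurrency(conns: List[Connection], kind: str) -> int:
    """Load on one Machine as counted under a given concurrency type."""
    if kind == "connections":
        return len(conns)
    if kind == "requests":
        return sum(c.in_flight_requests for c in conns)
    raise ValueError(f"unknown concurrency type: {kind!r}")


# Two HTTP/2 clients, each with several requests in flight.
conns = [Connection(12), Connection(10)]
print(concurrency(conns, "connections"))  # 2
print(concurrency(conns, "requests"))     # 22
```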

We set the soft_limit to 20, so Fly Proxy has some headroom to prefer less-loaded Machines within the same region before distributing traffic more evenly. Soft limits only affect routing within a region. They do not cause the proxy to shift traffic to other regions.

We deploy 10 Machines in four regions: ams (Amsterdam), bom (Mumbai), sea (Seattle), and sin (Singapore), with three of those in ams.

In this contrived example, all of the users are currently in Amsterdam, so the traffic is arriving at one of the Fly.io edges in Amsterdam. Here’s what happens as the number of concurrent HTTP requests from users in Amsterdam increases:
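Under the rules in the load table, the thresholds for this scenario follow from simple arithmetic: the three ams Machines can absorb 3 × 20 = 60 concurrent requests before all of them reach soft_limit, and 3 × 25 = 75 before all of them reach hard_limit. Only beyond 75 should traffic leave ams. The sketch below walks through that, assuming every request stays open (so load only grows), all three ams Machines are healthy, and they are ordered from closest to farthest. The function name is hypothetical, and it does not model which of bom, sea, or sin receives the overflow.

```python
def route_ams_scenario(concurrent: int, machines: int = 3, soft: int = 20, hard: int = 25) -> dict:
    """Hypothetical sketch: place `concurrent` open requests from Amsterdam users.

    Each request goes to the closest ams Machine under soft_limit; if none is
    under soft_limit, to the closest ams Machine under hard_limit; otherwise it
    is counted as overflow to another region.
    """
    loads = [0] * machines  # index 0 = closest ams Machine by RTT
    overflow = 0            # requests that must leave ams
    for _ in range(concurrent):
        target = next((i for i, load in enumerate(loads) if load < soft), None)
        if target is None:
            target = next((i for i, load in enumerate(loads) if load < hard), None)
        if target is None:
            overflow += 1
        else:
            loads[target] += 1
    return {"ams_loads": loads, "other_regions": overflow}


print(route_ams_scenario(45))  # {'ams_loads': [20, 20, 5], 'other_regions': 0}
print(route_ams_scenario(80))  # {'ams_loads': [25, 25, 25], 'other_regions': 5}
```

Real traffic is less tidy: requests complete and free up capacity, so loads rise and fall rather than filling Machines one after another.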

If traffic stays far above the hard_limit for a long period, Fly Proxy might start returning 503 Service Unavailable responses for requests that cannot be routed to a Machine.