
Machine suspend lets you pause a running Fly Machine and save its complete state, including memory, to persistent storage. When resumed, the machine picks up exactly where it left off, without rebooting the OS or restarting your app. Resuming this way can take just hundreds of milliseconds, where a cold start takes multiple seconds.

You can think of suspend as what a laptop does when you close the lid, except your “laptop” is a microVM running in, say, dfw or fra or syd.

How it works

Suspend uses Firecracker snapshots to capture the entire VM state: CPU registers, memory contents, open file handles. When you start a suspended machine, Fly restores from this snapshot instead of cold booting.

Typical performance:


Using Suspend

Manually

# Suspend a machine
fly machine suspend <machine-id>

# Check status (running, suspending, suspended, etc.)
fly machine status <machine-id>

# Resume from snapshot
fly machine start <machine-id>

# Force a cold start (discard snapshot)
fly machine stop <machine-id>
fly machine start <machine-id>

Automatically via Fly Proxy

Configure in fly.toml:

[http_service]
  auto_stop_machines = "suspend"  # or "stop"
  auto_start_machines = true

  [[http_service.concurrency]]
    type = "requests"
    soft_limit = 25

The proxy will automatically suspend machines during low traffic, checking for idle periods every few minutes, and resume them when requests arrive.

Machines API

# Suspend
POST /v1/apps/{app_name}/machines/{machine_id}/suspend

# Wait for suspension to complete
GET /v1/apps/{app_name}/machines/{machine_id}/wait?state=suspended

# Resume (standard start endpoint)
POST /v1/apps/{app_name}/machines/{machine_id}/start
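As a rough illustration, the suspend and wait calls above could be chained from Python using only the standard library. This is a minimal sketch, not an official client: the `https://api.machines.dev` base URL and the `Authorization: Bearer` header are assumptions about the public Machines API, and `machine_url`, `build_request`, and `suspend_and_wait` are hypothetical helper names.

```python
import urllib.request

# Assumption: public Machines API host. Adjust if your setup differs.
API_BASE = "https://api.machines.dev"


def machine_url(app_name, machine_id, action, query=""):
    """Build the URL for a machine action such as 'suspend', 'wait', or 'start'."""
    url = f"{API_BASE}/v1/apps/{app_name}/machines/{machine_id}/{action}"
    return f"{url}?{query}" if query else url


def build_request(url, token, method):
    """Create a request carrying the API token as a bearer token (assumed auth scheme)."""
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def suspend_and_wait(app_name, machine_id, token):
    """Ask for a suspend, then block on the wait endpoint until it returns.

    urllib raises urllib.error.HTTPError for non-2xx responses, so reaching
    the return statement means both calls succeeded.
    """
    suspend = build_request(machine_url(app_name, machine_id, "suspend"), token, "POST")
    with urllib.request.urlopen(suspend) as resp:
        resp.read()

    wait = build_request(
        machine_url(app_name, machine_id, "wait", "state=suspended"), token, "GET"
    )
    with urllib.request.urlopen(wait) as resp:
        return resp.status
```

Calling `suspend_and_wait("my-app", "<machine-id>", token)` with a token loaded from wherever you keep secrets would issue the two requests in order. Resuming afterwards is a `POST` to `machine_url(app, machine_id, "start")`.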

Generally, you need an API token to use the Machines API. But if a machine is just suspending itself, code running inside it can skip the token and hit the /.fly/api Unix socket directly:

curl --unix-socket /.fly/api -X POST \
  http://flaps/v1/apps/$FLY_APP_NAME/machines/$FLY_MACHINE_ID/suspend
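If your app would rather suspend itself from code than shell out to curl, the same request can be sketched in Python by pointing `http.client` at the Unix socket. This is an illustrative sketch only: `UnixSocketHTTPConnection`, `suspend_path`, and `suspend_self` are hypothetical names, and when the response arrives relative to the suspension itself is not covered here, so treat the returned status as best effort.

```python
import http.client
import os
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, host="flaps", timeout=10):
        # The host is only used for the Host header, mirroring the curl example.
        super().__init__(host, timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def suspend_path(app_name, machine_id):
    """Request path for the suspend endpoint."""
    return f"/v1/apps/{app_name}/machines/{machine_id}/suspend"


def suspend_self(socket_path="/.fly/api"):
    """POST the suspend request for the machine this code is running on."""
    path = suspend_path(os.environ["FLY_APP_NAME"], os.environ["FLY_MACHINE_ID"])
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("POST", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

`suspend_self()` reads the same `FLY_APP_NAME` and `FLY_MACHINE_ID` environment variables the curl command uses, so it only makes sense when run inside the machine.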

Requirements

A machine can use suspend if it has:

If you have an older machine, or you’re not sure when it was last updated, you can bring it up to date with:

fly machine update <machine-id> --yes 

This updates the machine in place to the latest supported configuration for suspend, without changing your app code or image.


Limitations and considerations

Always design for both resume and cold start paths.


Snapshot behavior with suspend

Snapshots are tied to the exact code and state of the machine they were taken from. If you deploy new code, the old snapshot can’t be resumed safely and will be discarded.

Snapshots aren’t guaranteed to persist. Cold starts may happen if:


Handling Network Connections After Resume

On resume, the machine still thinks its network connections are live, but external systems (databases, APIs) may have timed them out or closed them while it was suspended.

Common symptoms:

Fix: Reconnect on failure.

Example (Python + DB):

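# OperationalError comes from your database driver; db.reconnect() stands in
# for however your client opens a fresh connection.
try:
    result = db.execute(query)
except (ConnectionError, OperationalError):
    # The connection went stale while the machine was suspended:
    # reconnect, then retry the query once.
    db.reconnect()
    result = db.execute(query)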
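The reconnect-and-retry pattern above can be factored into a small helper so every query goes through the same path. This is a minimal sketch under stated assumptions: `execute_with_reconnect` is a hypothetical name, and `db` is any object exposing `execute(query)` and `reconnect()`, which real database clients may spell differently.

```python
def execute_with_reconnect(db, query, retryable=(ConnectionError,), max_attempts=2):
    """Run a query, reconnecting and retrying if the connection looks stale.

    `retryable` is the tuple of exception types treated as a dead connection;
    add your driver's error class (for example its OperationalError) to it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return db.execute(query)
        except retryable:
            if attempt == max_attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # Likely a connection that died while the machine was suspended.
            db.reconnect()
```

With the default `max_attempts=2` this behaves like the inline example: one failure triggers a reconnect and a single retry, and a second failure is raised to the caller. Only retry queries that are safe to run twice.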

Tips:


Billing

Suspended machines cost the same as stopped machines: storage only. There are no CPU/RAM charges.


Monitoring & Debugging

fly machine status <machine-id>

States:

If machines cold start unexpectedly:

Test cold start:

fly machine stop <machine-id>
fly machine start <machine-id>

Availability

Suspend works in all Fly.io regions as of July 2024.

