
You can get an ollama image running on a GPU in a few minutes. Get started by adapting the following fly.toml file:

app = '<your-app>'
primary_region = 'ams'

[build]
  image = 'ollama/ollama'

[[mounts]]
  source = 'models'
  destination = '/root/.ollama'
  initial_size = '10gb'

[http_service]
  internal_port = 11434
  force_https = false
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

[[vm]]
  size = 'a100-80gb'

Modify the app name and region to suit your preferences and save the file as fly.ollama.toml. Then you can launch the app:

fly launch -c fly.ollama.toml --flycast

There are a couple of things to note here:

Finally, this only starts the ollama server; at this point you cannot interact with any models yet. To do so, you will have to pull in a model with this one easy, short, intuitive command:

fly m run -e OLLAMA_HOST=http://<your-app>.flycast --shell --command "ollama pull llama3.1" ollama/ollama

This command pulls the llama3.1 model; change the model name to suit your needs. The model is now available on the internal network of the organization it is deployed in. You can reach it over Flycast at this URL: http://<your-app>.flycast.
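To confirm that the pull worked, you can ask the server which models it holds. The following is a minimal sketch, not an official recipe: it assumes Ollama's GET /api/tags endpoint and a response of the form {"models": [{"name": ...}]}. The helper names model_names, has_model and fetch_tags are hypothetical, and fetch_tags only works from a Machine inside the same private network.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` matches a listed model, with or without a tag suffix."""
    for name in model_names(tags_payload):
        # Models are listed as "name:tag", for example "llama3.1:latest".
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(host: str, timeout: float = 60.0) -> dict:
    """Fetch the model list from a running Ollama server (requires network access)."""
    url = f"{host.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, has_model(fetch_tags("http://<your-app>.flycast"), "llama3.1") should be true once the pull has finished. The generous timeout is deliberate: with auto_stop_machines enabled, the first request may have to wait for the Machine to start.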

Now that we have a functioning ollama with a model, we have to expose the ollama host to our app. One way to do this is to set the host as a secret:

fly secrets set OLLAMA_HOST=http://<your-app>.flycast

To interact with our new AI friend, we will have to install the ollama package in our app's environment, for example with pip install ollama.

Now we can initialize the client:

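import os
from ollama import AsyncClient

# OLLAMA_HOST is the secret set above, e.g. http://<your-app>.flycast
OLLAMA_HOST = os.getenv('OLLAMA_HOST')
# One shared async client for the whole app
ollama_client = AsyncClient(host=OLLAMA_HOST)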
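If the secret is missing or mistyped, the failure only surfaces at the first request. One way to catch that earlier is to validate the value at startup. This is a sketch under assumptions: resolve_ollama_host is a hypothetical helper, the fallback to http://localhost:11434 (Ollama's default port) is a convenience for local development, and insisting on an explicit http(s) scheme is a choice made here rather than a requirement of the client.

```python
import os
from urllib.parse import urlparse

# Assumed fallback for local development; not part of the original setup.
DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(env=None) -> str:
    """Return a validated Ollama base URL taken from the environment."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST") or DEFAULT_OLLAMA_HOST
    parsed = urlparse(host)
    # Require an explicit scheme and hostname so typos fail loudly at startup.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"OLLAMA_HOST must be an http(s) URL, got {host!r}")
    return host.rstrip("/")
```

The result can then be passed to the client in place of the raw environment value, as in AsyncClient(host=resolve_ollama_host()).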

From here we can start integrating it into our app:

@app.get("/")
async def read_root():
    resp = await ollama_client.generate(
        model="llama3.1",
        prompt="Why is the sky not green?",
    )
    return resp["response"]
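When debugging connectivity, it can help to see roughly what the client sends over the wire. The sketch below builds the equivalent request by hand using only the standard library. It assumes Ollama's POST /api/generate endpoint with "stream": false returns a single JSON object with a "response" field; build_generate_request and generate are hypothetical names, and this is an illustration rather than a replacement for the ollama package.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires network access)."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Unlike the endpoint above, generate is blocking, so it is better suited to a quick check from a shell than to an async request handler.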

When you redeploy your app and open its root URL, you should see llama’s answer.

You can check out this gist for the complete example app.