rembrembdocs

Get started with monitoring Redis Software

Collect and visualize Redis Software metrics.

You can use Prometheus and compatible integrations to collect and visualize your Redis Software metrics.

Metrics are exposed at the cluster, node, database, shard, and proxy levels.

You can use Prometheus integrations to:

Prometheus integrations

You can integrate Redis Software with Prometheus and one of the following tools to collect and visualize your deployment's metrics:

Best practices for monitoring

Follow these best practices when monitoring your Redis Software cluster using the metrics stream engine.

Monitor host-level metrics

For cluster health, resources, and node stability, monitor these metrics:

Group

Metric

Why monitor

Unit

CPU utilization

node_cpu_user,
node_cpu_system

Detect CPU saturation from Redis or the OS that results in higher latency and queueing.

Seconds (counter)

Memory (freeable)

node_memory_MemTotal_bytes,
node_memory_MemFree_bytes,
node_memory_Buffers_bytes,
node_memory_Cached_bytes

Detect memory pressure early. Low free memory or cache can precede swapping or out-of-memory errors.

Bytes (gauge)

Swap usage

node_ephemeral_storage_free

Monitor memory and disk pressure in your setup. Sustained pressure leads to latency spikes.

Bytes (gauge)

Network traffic

node_ingress_bytes,
node_egress_bytes

Ensure the network interface is not saturated. Protects replication and client responsiveness.

Bytes (counter)

Disk space

node_filesystem_avail_bytes,
node_filesystem_size_bytes

Prevent persistence and logging outages from low disk space.

Bytes (gauge)

Cluster state

has_quorum{…}

Monitor whether quorum is maintained (1) or lost (0).

Boolean

node_metrics_up

Monitor whether the node is connected and reporting to the cluster.

Gauge

Licensing

license_shards_limit

Track shard capacity limits by type (RAM or flash).

Count

Certificates

node_cert_expires_in_seconds

Avoid downtime from expired node certificates.

Seconds (gauge)

Services – CPU

namedprocess_namegroup_cpu_seconds_total

Identify abnormal CPU usage by platform services that can starve Redis, such as alert_mgr, redis_mgr, dmc_proxy.

Seconds (counter)

Services – memory

namedprocess_namegroup_memory_bytes

Detect memory leaks or outliers in platform services, such as alert_mgr, redis_mgr, dmc_proxy.

Bytes (gauge)

Monitor database-level metrics

For database performance, availability, and efficiency, monitor the following metrics:

Group

Metric

Why monitor

Unit

Memory

redis_server_used_memory

Track actual data memory to prevent out-of-memory errors and evictions.

Bytes

Memory

redis_server_allocator_allocated

Monitor bytes allocated by allocator (includes internal fragmentation).

Bytes

Memory

redis_server_allocator_active

Monitor bytes in active pages (includes external fragmentation). Use delta/ratio versus allocated to infer defraggable memory.

Bytes

Memory

redis_server_active_defrag_running

Monitor if defragmentation is active and the intended CPU %. High values can affect performance.

% (gauge)

Latency

endpoint_read_requests_latency_histogram,
endpoint_write_requests_latency_histogram,
endpoint_other_requests_latency_histogram

Monitor server-side command latency.

Microseconds

High availability

redis_server_master_repl_offset

Compute replica throughput and lag using deltas over time.

Bytes (counter)

High availability

redis_server_master_link_status

Monitor replica link status (up or down) for early warning of high availability risk.

Status

Active-Active

database_syncer_dst_lag,
database_syncer_lag_ms

Detect cross-region synchronization delays that impact consistency and SLAs.

Milliseconds (gauge)

Active-Active

database_syncer_state

Monitor operational state for troubleshooting synchronization issues.

Gauge

Traffic – requests

endpoint_read_requests,
endpoint_write_requests,
endpoint_other_requests

Monitor workload mix and spikes that drive capacity and latency. Total equals the sum of all three.

Counter

Traffic – responses

endpoint_read_responses,
endpoint_write_responses,
endpoint_other_responses

Validate service responsiveness and symmetry with requests.

Counter

Traffic – bytes

endpoint_ingress,
endpoint_egress

Monitor size trends and watch for sudden growth that impacts egress costs or bandwidth.

Bytes (counter)

Egress queue

endpoint_egress_pending,
endpoint_egress_pending_discarded

Monitor back-pressure and drops that indicate network or client issues.

Bytes (counter)

Connections

endpoint_client_connection

Monitor accepted connections over time and match against client rollouts or spikes.

Counter

Connections

endpoint_client_connection_expired

Monitor connections closed due to TTL expiry, which can indicate idle policy or client issues.

Counter

Connections

endpoint_longest_pipeline_histogram

Monitor long pipelines that can amplify latency bursts and detect misbehaving clients.

Histogram (count)

Connections

endpoint_client_connections,
endpoint_client_disconnections,
endpoint_proxy_disconnections

Monitor connection churn and identify who closed the socket (client versus proxy). Current connections ≈ connections − disconnections.

Counter

Cache efficiency

redis_server_db_keys,
redis_server_db_avg_ttl

Monitor key inventory and TTL coverage to inform eviction strategy.

Counter

Cache efficiency

redis_server_evicted_keys ,
redis_server_expired_keys

Monitor eviction and expiry rates. Frequent evictions indicate memory pressure or poor sizing.

Counter

Cache efficiency

cache_hits,
cache_hit_rate

Monitor hit rate, which drives read latency and cost. Cache hit rate equals cache_hits/(cache_hits+cache_misses).

Count / Ratio (%)

Cache efficiency

endpoint_client_tracking_on_requests,
endpoint_client_tracking_off_requests,
endpoint_disposed_commands_after_client_caching

Track client-side caching usage and misuse.

Counter

Big / complex keys

redis_server_<data_type>_<size_or_items>_<bucket>

Monitor oversized keys and cardinality that cause fragmentation, slow replication, and CPU spikes. Track to prevent incidents. Examples:
strings_sizes_over_512M,
zsets_items_over_8M

Gauge

Security – clients

endpoint_client_expiration_refresh,
endpoint_client_establishment_failures

Monitor unstable clients or problems with authentication or setup.

Counter

Security – LDAP

endpoint_successful_ldap_authentication,
endpoint_failed_ldap_authentication,
endpoint_disconnected_ldap_client

Monitor authentication health and detect brute-force attacks or misconfigurations.

Counter

Security – cert-based

endpoint_successful_cba_authentication,
endpoint_failed_cba_authentication,
endpoint_disconnected_cba_client

Monitor certificate authentication status and failures.

Counter

Security – password

endpoint_disconnected_user_password_client

Monitor password-authentication client disconnects and correlate with policy changes.

Counter

Security – ACL

redis_server_acl_access_denied_auth,
redis_server_acl_access_denied_cmd,
redis_server_acl_access_denied_key,
redis_server_acl_access_denied_channel

Monitor unauthorized access attempts and incorrectly scoped ACLs.

Counter

Configuration

db_config

This is an information metric that holds database configuration within labels such as: db_name, db_version, db_port, tls_mode.

counter

Prometheus quick start

To get started with Prometheus:

  1. Create a directory called prometheus on your local machine.

  2. Within that directory, create a configuration file called prometheus.yml.

  3. Add the following contents to the configuration file and replace <cluster_name> with your Redis Software cluster's FQDN:

    v2 (metrics stream engine) v1

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    # Attach these labels to any time series or alerts when communicating with
    # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: "prometheus-stack-monitor"
    
    # Load and evaluate rules in this file every 'evaluation_interval' seconds.
    #rule_files:
    # - "first.rules"
    # - "second.rules"
    
    scrape_configs:
    # scrape Prometheus itself
      - job_name: prometheus
        scrape_interval: 10s
        scrape_timeout: 5s
        static_configs:
          - targets: ["localhost:9090"]
    
    # scrape Redis Software
      - job_name: redis-enterprise
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /v2
        scheme: https
        tls_config:
          insecure_skip_verify: true
        static_configs:
          - targets: ["<cluster_name>:8070"]
    
  4. Set up your Prometheus server.

    Note:

    We recommend running Prometheus in Docker only for development and testing.

    To set up Prometheus on Docker:

    1. Create a docker-compose.yml file:

      version: '3'
      services:
          prometheus-server:
              image: prom/prometheus
              ports:
                  - 9090:9090
              volumes:
                  - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      
    2. To start the containers, run:

      $ docker compose up -d
      
    3. To check that all of the containers are up, run:

      docker ps
      
    4. In your browser, sign in to Prometheus at http://localhost:9090 to make sure the server is running.

    5. Select Status and then Targets to check that Prometheus is collecting data from your Redis Software cluster.

      The Redis Software target showing that Prometheus is connected to the Redis Software Cluster.

      If Prometheus is connected to the cluster, you can type node_up in the Expression field on the Prometheus home page to see the cluster metrics.

  5. Integrate Redis Software and your Prometheus server with one of the compatible tools. For help, see the integration guide and official documentation for your chosen tool.

  6. Add dashboards for cluster, database, node, and shard metrics.

On this page