Testing the fix in queries in deployment

pull/21890/head
Yasmin Lorin Kaygalak 1 month ago committed by Melisa Griffin
parent b1d730ad1c
commit 5d8438944b

@@ -19,138 +19,113 @@ The Consul dataplane dashboard provides the following information about service
**Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(envoy_server_live{app=~"$service"})
```
</CodeBlockConfig>
### Total request success rate
**Description:** Tracks the percentage of successful requests across the service mesh. It excludes 4xx and 5xx response codes to focus on operational success. Use it to monitor the overall reliability of your services.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(irate(envoy_cluster_upstream_rq_xx{...}[10m]))
```
</CodeBlockConfig>
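The selector in the query above is elided. As a rough sketch of how a success ratio like this can be computed, the following divides the request rate that excludes the 4xx and 5xx classes by the total request rate. The `envoy_response_code_class` label and the `app=~"$service"` selector are assumptions for illustration and may differ from what the dashboard actually uses.

```promql
# Sketch only: the label names below are assumptions, not the dashboard's selector.
# Numerator: request rate for every response class except 4xx and 5xx.
# Denominator: request rate for all response classes.
  sum(irate(envoy_cluster_upstream_rq_xx{app=~"$service", envoy_response_code_class!~"4|5"}[10m]))
/
  sum(irate(envoy_cluster_upstream_rq_xx{app=~"$service"}[10m]))
```

The result is a fraction between 0 and 1. Multiply it by 100, or set the Grafana panel unit to a percent unit, to display it as a percentage.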
### Total failed requests
**Description:** This pie chart shows the total number of failed requests within the service mesh, categorized by service. It provides a visual breakdown of where failures are occurring, allowing operators to focus on problematic services.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(increase(envoy_cluster_upstream_rq_xx{...}[10m]))
```
</CodeBlockConfig>
### Requests per second
**Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps operators understand the current load on services and how much traffic they are processing.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(rate(envoy_http_downstream_rq_total{...}[5m]))
```
</CodeBlockConfig>
### Unhealthy clusters
**Description:** This metric tracks the number of unhealthy clusters in the mesh, helping operators identify services that are experiencing issues and need attention to ensure operational health.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
(sum(envoy_cluster_membership_healthy{...}) - sum(envoy_cluster_membership_total{...}))
```
</CodeBlockConfig>
### Heap size
**Description:** This metric displays the total memory heap size of the Envoy proxies. Monitoring heap size is essential to detect memory issues and ensure that services are operating efficiently.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(envoy_server_memory_heap_size{app=~"$service"})
```
</CodeBlockConfig>
### Allocated memory
**Description:** This metric shows the amount of memory allocated by the Envoy proxies. It helps operators monitor the resource usage of services to prevent memory overuse and optimize performance.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(envoy_server_memory_allocated{app=~"$service"})
```
</CodeBlockConfig>
### Avg uptime per node
**Description:** This metric calculates the average uptime of Envoy proxies across all nodes. It helps operators monitor the stability of services and detect potential issues with service restarts or crashes.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
avg(envoy_server_uptime{app=~"$service"})
```
</CodeBlockConfig>
### Cluster state
**Description:** This metric indicates whether all clusters are healthy. It provides a quick overview of the cluster state to ensure that there are no issues affecting service performance.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
(sum(envoy_cluster_membership_total{...}) - sum(envoy_cluster_membership_healthy{...})) == bool 0
```
</CodeBlockConfig>
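The `bool` modifier changes how the comparison in the query above behaves. Without it, `== 0` filters the result and returns no data when the two sums differ. With it, the expression returns 1 when the comparison is true and 0 when it is false, provided the underlying series exist. A minimal illustration using the same two metrics, with a hypothetical `app=~"$service"` selector standing in for the elided one:

```promql
# Sketch only: the selector is an assumption, not the dashboard's actual one.
# Returns 1 when every cluster member is healthy, 0 otherwise.
(
  sum(envoy_cluster_membership_total{app=~"$service"})
  - sum(envoy_cluster_membership_healthy{app=~"$service"})
) == bool 0

# Without `bool`, the same comparison acts as a filter:
# it returns the value 0 when all members are healthy and no data otherwise.
```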
### CPU throttled seconds by namespace
**Description:** This metric tracks the rate at which containers in the selected namespaces accumulate CPU-throttled time, averaged over a five-minute window. Monitoring CPU throttling helps operators identify when services are exceeding their allocated CPU limits and may need optimization.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
rate(container_cpu_cfs_throttled_seconds_total{namespace=~"$namespace"}[5m])
```
</CodeBlockConfig>
### Memory usage by pod limits
**Description:** This metric shows memory usage as a percentage of the memory limit set for each pod. It helps operators ensure that services are staying within their allocated memory limits to avoid performance degradation.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
100 * max(container_memory_working_set_bytes{namespace=~"$namespace"} / kube_pod_container_resource_limits{resource="memory"})
```
</CodeBlockConfig>
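A division between two metrics only produces results for series whose label sets match. Container usage metrics and pod resource limit metrics typically carry different labels, so a query like the one above generally needs explicit vector matching. A hedged sketch, assuming both metrics expose `namespace`, `pod`, and `container` labels and that each container has a single series on either side:

```promql
# Sketch under assumptions: both metrics carry namespace, pod, and container labels.
# container!="" skips pod-level aggregate series, which have no matching limit.
100 * max by (namespace, pod) (
    container_memory_working_set_bytes{namespace=~"$namespace", container!=""}
  / on (namespace, pod, container)
    kube_pod_container_resource_limits{namespace=~"$namespace", resource="memory"}
)
```

Grouping with `max by (namespace, pod)` reports the container closest to its limit in each pod. Containers without a memory limit have no matching series and drop out of the result.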
### CPU usage by pod limits
**Description:** This metric displays CPU usage as a percentage of the CPU limit set for each pod. Monitoring CPU usage helps operators optimize service performance and prevent CPU exhaustion.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
100 * max(container_cpu_usage{namespace=~"$namespace"} / kube_pod_container_resource_limits{resource="cpu"})
```
</CodeBlockConfig>
### Total active upstream connections
**Description:** This metric tracks the total number of active upstream connections to other services in the mesh. It provides insight into service dependencies and network load.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(envoy_cluster_upstream_cx_active{app=~"$service"})
```
</CodeBlockConfig>
### Total active downstream connections
**Description:** This metric tracks the total number of active downstream connections from clients to services. It helps operators monitor service load and ensure that services are able to handle the traffic effectively.
<CodeBlockConfig heading="Grafana query" language="promql">
```promql
sum(envoy_http_downstream_cx_active{app=~"$service"})
```
</CodeBlockConfig>
