
Commit 76f650e
committed Oct 18, 2019

Refactored original troubleshoot article

1 parent 8f1f90c commit 76f650e
13 files changed: +354 -19 lines changed
 

articles/azure-cache-for-redis/TOC.yml

Lines changed: 6 additions & 2 deletions
@@ -97,8 +97,12 @@
       href: cache-how-to-monitor.md#operations-and-alerts
   - name: Diagnose and troubleshoot
     items:
-    - name: Troubleshoot cache issues
-      href: cache-how-to-troubleshoot.md
+    - name: Troubleshoot Redis server
+      href: cache-howto-troubleshoot-server.md
+    - name: Troubleshoot Redis client
+      href: cache-howto-troubleshoot-client.md
+    - name: Troubleshoot timeouts
+      href: cache-howto-troubleshoot-timeouts.md
     - name: Troubleshoot data loss
       href: cache-howto-troubleshoot-data-loss.md
   - name: Scale
articles/azure-cache-for-redis/cache-howto-troubleshoot-client.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
title: Troubleshoot Azure Cache for Redis client | Microsoft Docs
description: Learn how to resolve common client-side issues with Azure Cache for Redis
services: cache
documentationcenter: ''
author: yegu-ms
manager: maiye
editor: ''

ms.assetid:
ms.service: cache
ms.workload: tbd
ms.tgt_pltfrm: cache
ms.devlang: na
ms.topic: article
ms.date: 10/18/2019
ms.author: yegu

---
# Troubleshoot Azure Cache for Redis client-side issues

This section discusses how to troubleshoot issues caused by a condition on the Redis client that your application uses.

- [Memory pressure on Redis client](#memory-pressure-on-redis-client)
- [Traffic burst](#traffic-burst)
- [High client CPU usage](#high-client-cpu-usage)
- [Client-side bandwidth limitation](#client-side-bandwidth-limitation)
- [Large request or response size](#large-request-or-response-size)

## Memory pressure on Redis client

Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of responses from the cache. When memory pressure hits, the system may page data to disk. This _page faulting_ causes the system to slow down significantly.

To detect memory pressure on the client:

- Monitor memory usage on the machine to make sure that it doesn't exceed available memory.
- Monitor the client's `Page Faults/Sec` performance counter. During normal operation, most systems have some page faults. Spikes in page faults that correspond with request timeouts can indicate memory pressure (see the monitoring sketch that follows this list).
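
As a rough illustration, here's a minimal sketch that samples the system-wide `Page Faults/sec` counter from a .NET client. It assumes a Windows machine where the `System.Diagnostics.PerformanceCounter` API is available, and the five-second sampling interval is an arbitrary placeholder:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class PageFaultMonitor
{
    static void Main()
    {
        // The "Memory" category is system-wide, so no instance name is needed.
        using var pageFaults = new PerformanceCounter("Memory", "Page Faults/sec");

        // Rate counters need two samples; the first NextValue() call returns 0.
        pageFaults.NextValue();

        while (true)
        {
            Thread.Sleep(TimeSpan.FromSeconds(5));
            float faultsPerSecond = pageFaults.NextValue();

            // Log each sample and correlate spikes here with cache request timeouts.
            Console.WriteLine($"{DateTime.UtcNow:o} Page Faults/sec: {faultsPerSecond:F0}");
        }
    }
}
```
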
High memory pressure on the client can be mitigated in several ways:

- Dig into your memory usage patterns to reduce memory consumption on the client.
- Upgrade your client VM to a larger size with more memory.

## Traffic burst

Bursts of traffic combined with poor `ThreadPool` settings can result in delays in processing data that the Redis server has already sent but that hasn't yet been consumed on the client side.

Monitor how your `ThreadPool` statistics change over time using [an example `ThreadPoolLogger`](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can use `TimeoutException` messages from StackExchange.Redis like the following to investigate further:

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)

The preceding exception shows several points of interest:

- Notice that in the `IOCP` section and the `WORKER` section, the `Busy` value is greater than the `Min` value. This difference means your `ThreadPool` settings need adjusting.
- You can also see `in: 64221`. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it to you.

You can [configure your `ThreadPool` settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your thread pool scales up quickly under burst scenarios.
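
For example, a minimal sketch that raises the `ThreadPool` minimums at application startup so the pool doesn't throttle thread creation during a burst; the value 200 is an arbitrary placeholder, so choose a number based on your own load testing:

```csharp
using System;
using System.Threading;

static class ThreadPoolConfig
{
    // Call once at application startup, before heavy Redis traffic begins.
    public static void RaiseMinThreads(int minWorker = 200, int minIocp = 200)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);

        // Only raise the minimums; never lower values that are already configured.
        bool applied = ThreadPool.SetMinThreads(
            Math.Max(minWorker, currentWorker),
            Math.Max(minIocp, currentIocp));

        if (!applied)
        {
            Console.WriteLine("ThreadPool.SetMinThreads rejected the requested values.");
        }
    }
}
```

The linked gist covers additional options, such as configuring the minimums through configuration files for ASP.NET applications instead of code.
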
## High client CPU usage

High client CPU usage indicates that the system can't keep up with the work it's been asked to do. Even though the cache sent the response quickly, the client may fail to process the response in a timely fashion.

Monitor the client's system-wide CPU usage using metrics available in the Azure portal or through performance counters on the machine. Be careful not to monitor *process* CPU, because a single process can have low CPU usage while the system-wide CPU is high. Watch for spikes in CPU usage that correspond with timeouts. High CPU may also cause high `in: XXX` values in `TimeoutException` error messages, as described in the [Traffic burst](#traffic-burst) section.

> [!NOTE]
> StackExchange.Redis 1.1.603 and later includes the `local-cpu` metric in `TimeoutException` error messages. Ensure you're using the latest version of the [StackExchange.Redis NuGet package](https://www.nuget.org/packages/StackExchange.Redis/). Bugs are constantly being fixed in the code to make it more robust to timeouts, so having the latest version is important.
>

To mitigate a client's high CPU usage:

- Investigate what is causing CPU spikes.
- Upgrade your client to a larger VM size with more CPU capacity.

## Client-side bandwidth limitation

Depending on their architecture, client machines may have limited network bandwidth available. If the client exceeds the available bandwidth by overloading network capacity, data isn't processed on the client side as quickly as the server is sending it. This situation can lead to timeouts.

Monitor how your bandwidth usage changes over time using [an example `BandwidthLogger`](https://github.com/JonCole/SampleCode/blob/master/BandWidthMonitor/BandwidthLogger.cs). This code may not run successfully in some environments with restricted permissions (like Azure websites).

To mitigate, reduce network bandwidth consumption or increase the client VM size to one with more network capacity.

## Large request or response size

A large request or response can cause timeouts. As an example, suppose the timeout value configured on your client is 1 second. Your application requests two keys (for example, 'A' and 'B') at the same time (using the same physical network connection). Most clients support request "pipelining", where both requests 'A' and 'B' are sent one after the other without waiting for their responses. The server sends the responses back in the same order. If response 'A' is large, it can eat up most of the timeout for later requests.

In the following example, requests 'A' and 'B' are sent quickly to the server. The server starts sending responses 'A' and 'B' quickly. Because of data transfer times, response 'B' must wait behind response 'A' and times out even though the server responded quickly.

    |-------- 1 Second Timeout (A)----------|
    |-Request A-|
         |-------- 1 Second Timeout (B) ----------|
         |-Request B-|
                |- Read Response A --------|
                                            |- Read Response B-| (**TIMEOUT**)

This request/response is a difficult one to measure. You could instrument your client code to track large requests and responses.

Resolutions for large response sizes are varied but include:

1. Optimize your application for a large number of small values, rather than a few large values.
    - The preferred solution is to break up your data into related smaller values.
    - See the post [What is the ideal value size range for redis? Is 100 KB too large?](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details on why smaller values are recommended.
1. Increase the size of your VM to get higher bandwidth capabilities.
    - More bandwidth on your client or server VM may reduce data transfer times for larger responses.
    - Compare your current network usage on both machines to the limits of your current VM size. More bandwidth on only the server or only on the client may not be enough.
1. Increase the number of connection objects your application uses.
    - Use a round-robin approach to make requests over different connection objects, as shown in the sketch after this list.
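
The following is a minimal sketch of such a pool using StackExchange.Redis; the pool size of 4 and the connection string are placeholders to adapt to your own workload:

```csharp
using System;
using System.Linq;
using System.Threading;
using StackExchange.Redis;

public static class RedisConnectionPool
{
    private const int PoolSize = 4;
    private static int _counter;

    // Each ConnectionMultiplexer is relatively expensive, so the pool is created
    // once and shared for the lifetime of the application.
    private static readonly Lazy<ConnectionMultiplexer>[] _pool =
        Enumerable.Range(0, PoolSize)
            .Select(_ => new Lazy<ConnectionMultiplexer>(() =>
                ConnectionMultiplexer.Connect(
                    "cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...")))
            .ToArray();

    // Round-robin across the pool so one large, slow response doesn't block
    // every other request behind the same TCP connection.
    public static IDatabase GetDatabase()
    {
        uint index = (uint)Interlocked.Increment(ref _counter) % (uint)PoolSize;
        return _pool[index].Value.GetDatabase();
    }
}
```

A caller then uses `RedisConnectionPool.GetDatabase().StringGet(key)` as usual, and a single slow response only delays the requests that happen to share that one connection.
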
## Additional information

- [What Azure Cache for Redis offering and size should I use?](cache-faq.md#what-azure-cache-for-redis-offering-and-size-should-i-use)
- [How can I benchmark and test the performance of my cache?](cache-faq.md#how-can-i-benchmark-and-test-the-performance-of-my-cache)

articles/azure-cache-for-redis/cache-howto-troubleshoot-data-loss.md

Lines changed: 27 additions & 17 deletions
@@ -1,6 +1,6 @@
 ---
 title: Troubleshoot Azure Cache for Redis | Microsoft Docs
-description: Learn how to resolve common issues with Azure Cache for Redis.
+description: Learn how to resolve data loss issues with Azure Cache for Redis
 services: cache
 documentationcenter: ''
 author: yegu-ms
@@ -20,6 +20,15 @@ ms.author: yegu
 
 # Troubleshoot data loss
 
+This section discusses how to diagnose actual or perceived data losses that may occur in Azure Cache for Redis.
+
+- [Partial loss of keys](#partial-loss-of-keys)
+- [Major or complete loss of keys](#major-or-complete-loss-of-keys)
+
+> [!NOTE]
+> Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the [Additional information](#additional-information) section.
+>
+
 ## Partial loss of keys
 
 Redis doesn't randomly delete keys once they have been stored in memory. It will remove keys, however, in response to expiration or eviction policies as well as explicit key deletion commands. In addition, keys that have been written to the master node in a Premium or Standard Azure Cache for Redis may not be available on a replica right away. Data are replicated from the master to the replica in an asynchronous and non-blocking manner.
@@ -28,10 +37,10 @@ If you find that keys have disappeared from your cache, you can check the follow
 
 | Cause | Description |
 |---|---|
-| [Key expiration](cache-howto-troubleshoot-data-loss.md#key-expiration) | Keys are removed due to timeouts set on them |
-| [Key eviction](cache-howto-troubleshoot-data-loss.md#key-eviction) | Keys are removed under memory pressure |
-| [Key deletion](cache-howto-troubleshoot-data-loss.md#key-deletion) | Keys are removed by explicit delete commands |
-| [Async replication](cache-howto-troubleshoot-data-loss.md#async-replication) | Keys are not available on a replica due to data replication delays |
+| [Key expiration](#key-expiration) | Keys are removed due to timeouts set on them |
+| [Key eviction](#key-eviction) | Keys are removed under memory pressure |
+| [Key deletion](#key-deletion) | Keys are removed by explicit delete commands |
+| [Async replication](#async-replication) | Keys are not available on a replica due to data replication delays |
 
 ### Key expiration
 
@@ -42,25 +51,25 @@ You can use the [INFO](http://redis.io/commands/info) command to get stats on ho
 ```
 # Stats
 
-expired\_keys:46583
+expired_keys:46583
 
 # Keyspace
 
-db0:keys=3450,expires=2,avg\_ttl=91861015336
+db0:keys=3450,expires=2,avg_ttl=91861015336
 ```
 
 Furthermore, you can look at diagnostic metrics for your cache to see if there is a correlation between when the key went missing and a spike in expired keys. See the [Appendix](https://gist.github.com/JonCole/4a249477142be839b904f7426ccccf82#appendix) for information on using Keyspace Notifications or MONITOR to debug these types of issues.
 
 ### Key eviction
 
-Redis requires memory space to store data. It will purge keys to free up available memory when necessary. When the **used\memory** or **used\memory\rss** values in the [INFO](http://redis.io/commands/info) command approach the configured **maxmemory** setting, Redis will start evicting keys from memory based on [cache policy](http://redis.io/topics/lru-cache).
+Redis requires memory space to store data. It will purge keys to free up available memory when necessary. When the **used_memory** or **used_memory_rss** values in the [INFO](http://redis.io/commands/info) command approach the configured **maxmemory** setting, Redis will start evicting keys from memory based on [cache policy](http://redis.io/topics/lru-cache).
 
 You can monitor the number of keys evicted using the [INFO](http://redis.io/commands/info) command.
 
 ```
 # Stats
 
-evicted\_keys:13224
+evicted_keys:13224
 ```
 
 Furthermore, you can look at diagnostic metrics for your cache to see if there is a correlation between when the key went missing and a spike in evicted keys. See the [Appendix](https://gist.github.com/JonCole/4a249477142be839b904f7426ccccf82#appendix) for information on using Keyspace Notifications or MONITOR to debug these types of issues.
@@ -72,9 +81,9 @@ Redis clients can issue the [DEL](http://redis.io/commands/del) or [HDEL](http:/
 ```
 # Commandstats
 
-cmdstat\_del:calls=2,usec=90,usec\_per\_call=45.00
+cmdstat_del:calls=2,usec=90,usec_per_call=45.00
 
-cmdstat\_hdel:calls=1,usec=47,usec\_per\_call=47.00
+cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00
 ```
 
 ### Async replication
@@ -87,9 +96,9 @@ If you find that most of or all keys have disappeared from your cache, you can c
 
 | Cause | Description |
 |---|---|
-| [Key flushing](cache-howto-troubleshoot-data-loss.md#key-flushing) | Keys have been manually purged |
-| [Incorrect database selection](cache-howto-troubleshoot-data-loss.md#incorrect-database-selection) | Redis is set to use a non-default database |
-| [Redis instance failure](cache-howto-troubleshoot-data-loss.md#redis-instance-failure) | Keys are removed by explicit delete commands |
+| [Key flushing](#key-flushing) | Keys have been manually purged |
+| [Incorrect database selection](#incorrect-database-selection) | Redis is set to use a non-default database |
+| [Redis instance failure](#redis-instance-failure) | Keys are removed by explicit delete commands |
 
 ### Key flushing
 
@@ -98,9 +107,9 @@ Clients can call the [FLUSHDB](http://redis.io/commands/flushdb) command to remo
 ```
 # Commandstats
 
-cmdstat\_flushall:calls=2,usec=112,usec\_per\_call=56.00
+cmdstat_flushall:calls=2,usec=112,usec_per_call=56.00
 
-cmdstat\_flushdb:calls=1,usec=110,usec\_per\_call=52.00
+cmdstat_flushdb:calls=1,usec=110,usec_per_call=52.00
 ```
 
 ### Incorrect database selection
@@ -111,9 +120,10 @@ Azure Cache for Redis uses the **db0** database by default. If you switch to ano
 
 Redis is an in memory data store. Data are kept on the physical or virtual machines that host Redis. An Azure Cache for Redis instance in the Basic tier runs on only a single virtual machine (VM). When that VM is down, all data that you've stored in the cache is lost. Caches in the Standard and Premium tiers offer much higher resiliency against data loss by using two VMs in a replicated configuration. When the master node in such a cache fails, the replica node will take over to serve data automatically. These VMs are located on separate fault and update domains to minimize the chance of both becoming unavailable simultaneously. In the event of a major datacenter outage, however, the VMs can still go down together. Your data will be lost in these rare cases.
 
-You should consider using [Redis data persistence](http://redis.io/topics/persistence) and[geo-replication](https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-geo-replication) to improve protect your data against these infrastructure failures.
+You should consider using [Redis data persistence](http://redis.io/topics/persistence) and [geo-replication](https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-geo-replication) to improve protection of your data against these infrastructure failures.
 
 ## Additional information
 
+- [What Azure Cache for Redis offering and size should I use?](cache-faq.md#what-azure-cache-for-redis-offering-and-size-should-i-use)
 - [How to monitor Azure Cache for Redis](cache-how-to-monitor.md)
 - [How can I run Redis commands?](cache-faq.md#how-can-i-run-redis-commands)
articles/azure-cache-for-redis/cache-howto-troubleshoot-server.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
---
title: Troubleshoot Azure Cache for Redis server | Microsoft Docs
description: Learn how to resolve common server-side issues with Azure Cache for Redis
services: cache
documentationcenter: ''
author: yegu-ms
manager: maiye
editor: ''

ms.assetid:
ms.service: cache
ms.workload: tbd
ms.tgt_pltfrm: cache
ms.devlang: na
ms.topic: article
ms.date: 10/18/2019
ms.author: yegu

---
# Troubleshoot Azure Cache for Redis server-side issues

This section discusses how to troubleshoot issues caused by a condition on an Azure Cache for Redis instance or the virtual machines hosting it.

- [Memory pressure on Redis server](#memory-pressure-on-redis-server)
- [High CPU usage or server load](#high-cpu-usage-or-server-load)
- [Server-side bandwidth limitation](#server-side-bandwidth-limitation)

> [!NOTE]
> Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the [Additional information](#additional-information) section.
>

## Memory pressure on Redis server

Memory pressure on the server side leads to all kinds of performance problems that can delay processing of requests. When memory pressure hits, the system may page data to disk. This _page faulting_ causes the system to slow down significantly. There are several possible causes of this memory pressure:

- The cache is filled with data near its maximum capacity.
- Redis is seeing high memory fragmentation. This fragmentation is most often caused by storing large objects, because Redis is optimized for small objects.

Redis exposes two stats through the [INFO](https://redis.io/commands/info) command that can help you identify this issue: `used_memory` and `used_memory_rss`. You can [view these metrics](cache-how-to-monitor.md#view-metrics-with-azure-monitor) using the portal.
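
If you prefer to check these values from code rather than the portal, the following is a minimal StackExchange.Redis sketch that reads them from the `INFO memory` section. The connection string and endpoint are placeholders, and if your client rejects server-level commands you may need to connect with `allowAdmin=true`:

```csharp
using System;
using System.Linq;
using StackExchange.Redis;

class MemoryStats
{
    static void Main()
    {
        var connection = ConnectionMultiplexer.Connect(
            "cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");

        // GetServer targets the specific endpoint whose INFO output you want to inspect.
        IServer server = connection.GetServer("cachename.redis.cache.windows.net", 6380);

        // The "memory" section of INFO contains used_memory and used_memory_rss.
        var memorySection = server.Info("memory").First();

        foreach (var stat in memorySection)
        {
            if (stat.Key == "used_memory" || stat.Key == "used_memory_rss")
            {
                Console.WriteLine($"{stat.Key} = {stat.Value}");
            }
        }
    }
}
```
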
There are several possible changes you can make to help keep memory usage healthy:

- [Configure a memory policy](cache-configure.md#maxmemory-policy-and-maxmemory-reserved) and set expiration times on your keys. This policy may not be sufficient if you have fragmentation.
- [Configure a maxmemory-reserved value](cache-configure.md#maxmemory-policy-and-maxmemory-reserved) that is large enough to compensate for memory fragmentation. For more information, see the note about memory reservations that follows this list.
- Break up your large cached objects into smaller related objects.
- [Create alerts](cache-how-to-monitor.md#alerts) on metrics like used memory to be notified early about potential impacts.
- [Scale](cache-how-to-scale.md) to a larger cache size with more memory capacity.

> [!NOTE]
> Updating memory reservation values, like maxmemory-reserved, can affect cache performance. Suppose you have a 53-GB cache that is filled with 49 GB of data. Changing the reservation value to 8 GB drops the system's max available memory to 45 GB. If _used_memory_ or _used_memory_rss_ values are higher than 45 GB, the system may evict data until both _used_memory_ and _used_memory_rss_ are below 45 GB. Eviction can increase server load and memory fragmentation.
>

## High CPU usage or server load

A high server load or CPU usage means the server can't process requests in a timely fashion. The server may be slow to respond and unable to keep up with request rates.

[Monitor metrics](cache-how-to-monitor.md#view-metrics-with-azure-monitor) such as CPU or server load. Watch for spikes in CPU usage that correspond with timeouts.

There are several changes you can make to mitigate high server load:

- Investigate what is causing CPU spikes, such as running expensive commands (see the following note) or page faulting because of high memory pressure.
- [Create alerts](cache-how-to-monitor.md#alerts) on metrics like CPU or server load to be notified early about potential impacts.
- [Scale](cache-how-to-scale.md) to a larger cache size with more CPU capacity.

> [!NOTE]
> Some Redis commands are more expensive to execute than others. The [Redis commands documentation](https://redis.io/commands) shows the time complexity of each command. It's recommended that you review the commands you're running on your cache to understand their performance impact. For instance, the [KEYS](https://redis.io/commands/keys) command is often used without knowing that it's an O(N) operation. You can avoid KEYS by using [SCAN](https://redis.io/commands/scan) to reduce CPU spikes.
>
> Using the [SLOWLOG](https://redis.io/commands/slowlog) command, you can measure expensive commands being executed against the server.
>
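
As an illustration of the KEYS-versus-SCAN point, the following sketch enumerates keys with StackExchange.Redis's `IServer.Keys`, which pages through the keyspace with `SCAN` on servers that support it rather than issuing a single blocking `KEYS` call. The connection string and key pattern are placeholders:

```csharp
using System;
using StackExchange.Redis;

class ScanExample
{
    static void Main()
    {
        var connection = ConnectionMultiplexer.Connect(
            "cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");

        IServer server = connection.GetServer("cachename.redis.cache.windows.net", 6380);

        // Keys() walks the keyspace cursor by cursor (pageSize keys per round trip)
        // instead of blocking the server with a single O(N) KEYS call.
        foreach (RedisKey key in server.Keys(database: 0, pattern: "user:*", pageSize: 250))
        {
            Console.WriteLine((string)key);
        }
    }
}
```
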
## Server-side bandwidth limitation

Different cache sizes have different network bandwidth capacities. If the server exceeds the available bandwidth, data won't be sent to the client as quickly. Client requests could time out because the server can't push data to the client fast enough.

The "Cache Read" and "Cache Write" metrics can be used to see how much server-side bandwidth is being used. You can [view these metrics](cache-how-to-monitor.md#view-metrics-with-azure-monitor) in the portal.

To mitigate situations where network bandwidth usage is close to maximum capacity:

- Change client call behavior to reduce network demand.
- [Create alerts](cache-how-to-monitor.md#alerts) on metrics like cache read or cache write to be notified early about potential impacts.
- [Scale](cache-how-to-scale.md) to a larger cache size with more network bandwidth capacity.

## Additional information

- [What Azure Cache for Redis offering and size should I use?](cache-faq.md#what-azure-cache-for-redis-offering-and-size-should-i-use)
- [How can I benchmark and test the performance of my cache?](cache-faq.md#how-can-i-benchmark-and-test-the-performance-of-my-cache)
- [How to monitor Azure Cache for Redis](cache-how-to-monitor.md)
- [How can I run Redis commands?](cache-faq.md#how-can-i-run-redis-commands)
articles/azure-cache-for-redis/cache-howto-troubleshoot-timeouts.md

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
---
title: Troubleshoot Azure Cache for Redis timeouts | Microsoft Docs
description: Learn how to resolve common timeout issues with Azure Cache for Redis
services: cache
documentationcenter: ''
author: yegu-ms
manager: maiye
editor: ''

ms.assetid:
ms.service: cache
ms.workload: tbd
ms.tgt_pltfrm: cache
ms.devlang: na
ms.topic: article
ms.date: 10/18/2019
ms.author: yegu

---
# Troubleshoot Azure Cache for Redis timeouts

This section discusses how to troubleshoot timeout issues that occur when connecting to Azure Cache for Redis.

- [Redis server patching](#redis-server-patching)
- [StackExchange.Redis timeout exceptions](#stackexchangeredis-timeout-exceptions)

> [!NOTE]
> Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the [Additional information](#additional-information) section.
>

## Redis server patching

## StackExchange.Redis timeout exceptions

StackExchange.Redis uses a configuration setting named `synctimeout` for synchronous operations with a default value of 1000 ms. If a synchronous call doesn't complete in this time, the StackExchange.Redis client throws a timeout error similar to the following example:

    System.TimeoutException: Timeout performing MGET 2728cc84-58ae-406b-8ec8-3f962419f641, inst: 1,mgr: Inactive, queue: 73, qu=6, qs=67, qc=0, wr=1/1, in=0/0 IOCP: (Busy=6, Free=999, Min=2,Max=1000), WORKER (Busy=7,Free=8184,Min=2,Max=8191)

This error message contains metrics that can help point you to the cause and possible resolution of the issue. The following table contains details about the error message metrics.

| Error message metric | Details |
| --- | --- |
| inst | In the last time slice: 0 commands have been issued |
| mgr | The socket manager is doing `socket.select`, which means it's asking the OS to indicate a socket that has something to do. The reader isn't actively reading from the network because it doesn't think there's anything to do |
| queue | There are 73 total in-progress operations |
| qu | 6 of the in-progress operations are in the unsent queue and haven't yet been written to the outbound network |
| qs | 67 of the in-progress operations have been sent to the server but a response isn't yet available. The response could be `Not yet sent by the server` or `sent by the server but not yet processed by the client.` |
| qc | 0 of the in-progress operations have seen replies but haven't yet been marked as complete because they're waiting on the completion loop |
| wr | There's an active writer (meaning the 6 unsent requests aren't being ignored) bytes/activewriters |
| in | There are no active readers and zero bytes are available to be read on the NIC bytes/activereaders |

You can use the following steps to investigate possible root causes.

1. As a best practice, make sure you're using the following pattern to connect when using the StackExchange.Redis client.

    ```csharp
    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
    {
        return ConnectionMultiplexer.Connect("cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");
    });

    public static ConnectionMultiplexer Connection
    {
        get
        {
            return lazyConnection.Value;
        }
    }
    ```

    For more information, see [Connect to the cache using StackExchange.Redis](cache-dotnet-how-to-use-azure-redis-cache.md#connect-to-the-cache).

1. Ensure that your server and the client application are in the same region in Azure. For example, you might be getting timeouts when your cache is in East US but the client is in West US and the request doesn't complete within the `synctimeout` interval, or you might be getting timeouts when you're debugging from your local development machine.

    It's highly recommended to have the cache and the client in the same Azure region. If you have a scenario that includes cross-region calls, you should set the `synctimeout` interval to a value higher than the default 1000-ms interval by including a `synctimeout` property in the connection string. The following example shows a snippet of a connection string for StackExchange.Redis provided by Azure Cache for Redis with a `synctimeout` of 2000 ms.

        synctimeout=2000,cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...
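
    If you build the configuration in code rather than as a raw connection string, the same setting is exposed as `ConfigurationOptions.SyncTimeout`. A minimal sketch, with the cache name and password as placeholders:

    ```csharp
    var options = ConfigurationOptions.Parse(
        "cachename.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");

    // SyncTimeout is in milliseconds; raise it above the 1000-ms default for cross-region calls.
    options.SyncTimeout = 2000;

    var connection = ConnectionMultiplexer.Connect(options);
    ```
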
1. Ensure you're using the latest version of the [StackExchange.Redis NuGet package](https://www.nuget.org/packages/StackExchange.Redis/). Bugs are constantly being fixed in the code to make it more robust to timeouts, so having the latest version is important.

1. If your requests are bound by bandwidth limitations on the server or client, it takes longer for them to complete and can cause timeouts. To see if your timeout is because of network bandwidth on the server, see [Server-side bandwidth limitation](cache-howto-troubleshoot-server.md#server-side-bandwidth-limitation). To see if your timeout is because of client network bandwidth, see [Client-side bandwidth limitation](cache-howto-troubleshoot-client.md#client-side-bandwidth-limitation).

1. Are you getting CPU bound on the server or on the client?

    - Check if you're getting bound by CPU on your client. High CPU could cause the request to not be processed within the `synctimeout` interval and cause a request to time out. Moving to a larger client size or distributing the load can help to control this problem.
    - Check if you're getting CPU bound on the server by monitoring the `CPU` [cache performance metric](cache-how-to-monitor.md#available-metrics-and-reporting-intervals). Requests coming in while Redis is CPU bound can cause those requests to time out. To address this condition, you can distribute the load across multiple shards in a premium cache, or upgrade to a larger size or pricing tier. For more information, see [High CPU usage or server load](cache-howto-troubleshoot-server.md#high-cpu-usage-or-server-load).

1. Are there commands taking a long time to process on the server? Long-running commands on the redis-server can cause timeouts. For more information about long-running commands, see the note about expensive commands in [High CPU usage or server load](cache-howto-troubleshoot-server.md#high-cpu-usage-or-server-load). You can connect to your Azure Cache for Redis instance using the redis-cli client or the [Redis Console](cache-configure.md#redis-console). Then, run the [SLOWLOG](https://redis.io/commands/slowlog) command to see if there are requests slower than expected. Redis Server and StackExchange.Redis are optimized for many small requests rather than fewer large requests. Splitting your data into smaller chunks may improve things here.

    For information on connecting to your cache's SSL endpoint using redis-cli and stunnel, see the blog post [Announcing ASP.NET Session State Provider for Redis Preview Release](https://blogs.msdn.com/b/webdev/archive/2014/05/12/announcing-asp-net-session-state-provider-for-redis-preview-release.aspx).

1. High Redis server load can cause timeouts. You can monitor the server load by monitoring the `Redis Server Load` [cache performance metric](cache-how-to-monitor.md#available-metrics-and-reporting-intervals). A server load of 100 (maximum value) signifies that the Redis server has been busy, with no idle time, processing requests. To see if certain requests are taking up all of the server capability, run the SLOWLOG command, as described in the previous step. For more information, see [High CPU usage or server load](cache-howto-troubleshoot-server.md#high-cpu-usage-or-server-load).

1. Was there any other event on the client side that could have caused a network blip? Common events include: scaling the number of client instances up or down, deploying a new version of the client, or autoscale enabled. In our testing, we have found that autoscale or scaling up/down can cause outbound network connectivity to be lost for several seconds. StackExchange.Redis code is resilient to such events and reconnects. While reconnecting, any requests in the queue can time out.

1. Was there a large request preceding several small requests to the cache that timed out? The parameter `qs` in the error message tells you how many requests were sent from the client to the server but haven't processed a response. This value can keep growing because StackExchange.Redis uses a single TCP connection and can only read one response at a time. Even though the first operation timed out, it doesn't stop more data from being sent to or from the server. Other requests will be blocked until the large request is finished and can cause timeouts. One solution is to minimize the chance of timeouts by ensuring that your cache is large enough for your workload and splitting large values into smaller chunks. Another possible solution is to use a pool of `ConnectionMultiplexer` objects in your client, and choose the least loaded `ConnectionMultiplexer` when sending a new request. Spreading the load across multiple connection objects should prevent a single timeout from causing other requests to also time out.

1. If you're using `RedisSessionStateProvider`, ensure you have set the retry timeout correctly. `retryTimeoutInMilliseconds` should be higher than `operationTimeoutInMilliseconds`; otherwise, no retries occur. In the following example, `retryTimeoutInMilliseconds` is set to 3000. For more information, see [ASP.NET Session State Provider for Azure Cache for Redis](cache-aspnet-session-state-provider.md) and [How to use the configuration parameters of Session State Provider and Output Cache Provider](https://github.com/Azure/aspnet-redis-providers/wiki/Configuration).

    ```xml
    <add
      name="AFRedisCacheSessionStateProvider"
      type="Microsoft.Web.Redis.RedisSessionStateProvider"
      host="enbwcache.redis.cache.windows.net"
      port="6380"
      accessKey=""
      ssl="true"
      databaseId="0"
      applicationName="AFRedisCacheSessionState"
      connectionTimeoutInMilliseconds="5000"
      operationTimeoutInMilliseconds="1000"
      retryTimeoutInMilliseconds="3000" />
    ```

1. Check memory usage on the Azure Cache for Redis server by [monitoring](cache-how-to-monitor.md#available-metrics-and-reporting-intervals) `Used Memory RSS` and `Used Memory`. If an eviction policy is in place, Redis starts evicting keys when `Used_Memory` reaches the cache size. Ideally, `Used Memory RSS` should be only slightly higher than `Used Memory`. A large difference means there's memory fragmentation (internal or external). When `Used Memory RSS` is less than `Used Memory`, it means part of the cache memory has been swapped by the operating system. If this swapping occurs, you can expect some significant latencies. Because Redis doesn't have control over how its allocations are mapped to memory pages, high `Used Memory RSS` is often the result of a spike in memory usage. When the Redis server frees memory, the allocator takes the memory but may or may not give it back to the system. There may be a discrepancy between the `Used Memory` value and memory consumption as reported by the operating system. Memory may have been used and released by Redis but not given back to the system. To help mitigate memory issues, you can take the following steps:

    - Upgrade the cache to a larger size so that you aren't running against memory limitations on the system.
    - Set expiration times on the keys so that older values are evicted proactively (see the sketch after this list).
    - Monitor the `used_memory_rss` cache metric. When this value approaches the size of your cache, you're likely to start seeing performance issues. Distribute the data across multiple shards if you're using a premium cache, or upgrade to a larger cache size.
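
    For example, a minimal sketch of writing a value with an expiration so that Redis can reclaim the memory proactively; it reuses the `Connection` property from the connection pattern in the first step, and the key name, payload, and one-hour TTL are placeholders:

    ```csharp
    IDatabase cache = Connection.GetDatabase();
    string profileJson = "{ \"name\": \"example\" }"; // placeholder payload

    // Write the value with a one-hour time to live; Redis removes the key automatically when it expires.
    cache.StringSet("user:1234:profile", profileJson, TimeSpan.FromHours(1));

    // Or add a TTL to a key that already exists without one.
    cache.KeyExpire("user:1234:profile", TimeSpan.FromHours(1));
    ```
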
    For more information, see [Memory pressure on Redis server](cache-howto-troubleshoot-server.md#memory-pressure-on-redis-server).

## Additional information

- [What Azure Cache for Redis offering and size should I use?](cache-faq.md#what-azure-cache-for-redis-offering-and-size-should-i-use)
- [How can I benchmark and test the performance of my cache?](cache-faq.md#how-can-i-benchmark-and-test-the-performance-of-my-cache)
- [How can I run Redis commands?](cache-faq.md#how-can-i-run-redis-commands)
- [How to monitor Azure Cache for Redis](cache-how-to-monitor.md)