UPDATE: Maintenance completed with great success. Thanks!
The site will be down for maintenance today (Friday, May 28, 2010) starting at 22:00 PST for a (long overdue) Redis upgrade. We expect the outage to last only a few minutes and plan to roll back if downtime exceeds 15 minutes. During this time, the site and all repository access via git/ssh will be unavailable.


2 days ago
Purely out of personal interest, did you spend much time investigating whether or not this upgrade could be achieved with zero downtime? (I'm fascinated by zero downtime upgrades at the moment.) Maybe something like running the new Redis version in parallel with the old one and sending writes to both, or using replication to sync the new version with the old one.
Not meant as a criticism at all, I'm just interested in hearing your thoughts on this kind of thing. It's a fascinating problem.
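A minimal sketch of the dual-write idea mentioned above, written against in-memory stand-ins rather than real Redis clients. The names DualWriteStore and DictStore are hypothetical and exist only for illustration; any client object with set and get methods could take their place.

```python
class DualWriteStore:
    """Send every write to both stores; serve reads from the old one.

    `old` and `new` are hypothetical stand-ins for two Redis clients.
    They only need `set` and `get` methods.
    """

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def set(self, key, value):
        # Write to the old store first so it stays the source of truth,
        # then mirror the same write to the new store.
        self.old.set(key, value)
        self.new.set(key, value)

    def get(self, key):
        # Reads stay on the old store until the cutover.
        return self.old.get(key)


class DictStore:
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


old, new = DictStore(), DictStore()
store = DualWriteStore(old, new)
store.set("repo:1", "git")
```

Dual writes only cover keys written after the wrapper goes live, so existing data would still need to be copied across by some other means, which is where replication comes in.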
2 days ago
Agreed. We talked about it a bit. In theory, we should be able to do something like:
1. Old redis on redis:6379.
2. New redis on redis:6380 with a different dump file location.
3. SLAVEOF redis 6379 against redis:6380. Wait for first SYNC to complete.
4. SHUTDOWN on redis:6379.
5. SLAVEOF no one on redis:6380.
But you either need some kind of redispatching proxy in front of the two redis instances (should be fairly simple with haproxy), or failover built into the redis clients. I'd probably go the haproxy route if the latency hit isn't too bad.
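The replication steps above can be sketched as a small script. This is only an illustration under a few assumptions: `send` is a hypothetical helper that delivers one command to an instance and returns the reply text, the command is spelled SLAVEOF as Redis names it, and the `master_link_status` field is what later Redis versions report in INFO (whether 0.900 does is not something the thread confirms).

```python
import time


def encode_command(*args):
    """Encode a command in the Redis multi-bulk wire format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def replica_synced(info_text):
    """Return True once INFO output reports the link to the master as up."""
    for line in info_text.splitlines():
        if line.strip() == "master_link_status:up":
            return True
    return False


def switch_over(send, old=("redis", 6379), new=("redis", 6380), poll=1.0):
    """Run the switchover in the order listed in the comment.

    `send(addr, *args)` is a hypothetical helper: it sends one command
    to `addr` and returns the reply as text. It has to tolerate
    SHUTDOWN closing the connection without a reply.
    """
    send(new, "SLAVEOF", old[0], old[1])   # start replicating from old
    while not replica_synced(send(new, "INFO")):
        time.sleep(poll)                   # wait for the first SYNC
    send(old, "SHUTDOWN")                  # stop the old instance
    send(new, "SLAVEOF", "NO", "ONE")      # promote the new instance
```

Passing `send` in as a parameter keeps the sequencing separate from the transport, so the order of operations can be rehearsed against a fake before pointing it at real instances.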
I hadn't considered a dual-write approach. That could be interesting too, although the built-in replication stuff gets you pretty damn close without it.
What would be great is if redis were to bake in some of this goodness for in-place upgrades. I've been dying to apply that approach since reading about it.
Anyway, we don't have the proxy in place and I'd like to avoid failing over clients if possible. We're currently running on 0.900, so I'm not even sure all of those commands are supported. The background save in 0.900 is hitting the machine pretty hard, plus we're seeing sporadic timeouts, so the number one priority is getting a recent-ish version in place. I'd definitely like to get something a little more resilient and flexible in place, though, so we'll be experimenting in the future. I'll write up what we find if there isn't an established approach by then.
2 days ago
You know, it would be trivial to insert a proxy without any real disruption: light up haproxy on redis:6378 configured with redis:6379 as a primary and redis:6380 as the secondary, then point all the clients at redis:6378 (we can restart everything that runs a redis client gracefully). Once everything is off of redis:6379 and going through redis:6378, it should be safe to run through the procedure outlined in the previous comment.
Maybe we'll give that a shot in staging tomorrow and see how it goes. We'd need to benchmark the proxy vs. the non-proxy config and get everything into puppet and just beat on it for a while, so we wouldn't be able to ship today. If we can get something working in our staging environment, we'll write up the technique and get it in place for the next upgrade.
2 days ago
Sounds like I need to learn more about HAProxy - I'd never considered using it for anything other than HTTP (I currently do load balancing on my projects with nginx).
2 days ago
Working the HAProxy shininess into your plan, Ryan, it should look something like:
1. Old redis on redis:6379.
2. New redis on redis:6380 with a different dump file location.
3. SLAVEOF redis 6379 against redis:6380. Wait for first SYNC to complete.
4. echo "enable server redis/redis-6380" | socat stdio unix-connect:/var/run/haproxy/admin.sock
5. echo "disable server redis/redis-6379" | socat stdio unix-connect:/var/run/haproxy/admin.sock
6. SLAVEOF no one on redis:6380.
7. SHUTDOWN on redis:6379.
Assuming that you've got a HAProxy config looking something like
You could then fire up another instance of the new redis on 6379 and repeat the process so that you're back to a clean state.
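The two socat steps in this plan could also be driven from a script. Here is a rough sketch that talks to the same admin socket directly; the function names are hypothetical, the backend and server names are the ones from the commands above, and it assumes the socket is configured with admin-level access and closes the connection after each command (HAProxy's non-interactive behaviour).

```python
import socket


def admin_command(action, backend, server):
    """Build an HAProxy admin-socket command such as
    'enable server redis/redis-6380'."""
    if action not in ("enable", "disable"):
        raise ValueError("action must be 'enable' or 'disable'")
    return "%s server %s/%s\n" % (action, backend, server)


def send_admin_command(command, path="/var/run/haproxy/admin.sock"):
    """Send one command to the admin socket and return HAProxy's reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HAProxy closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


def cut_over(send=send_admin_command):
    # Same order as the plan: bring the new server in first,
    # then take the old one out.
    send(admin_command("enable", "redis", "redis-6380"))
    send(admin_command("disable", "redis", "redis-6379"))
```

Calling `cut_over()` with no arguments would hit the real socket path, so in staging it may be safer to pass a logging function as `send` first and check the commands it would issue.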
2 days ago
@simonw We use HAProxy in TCP mode for several different services. Primarily we use it to load balance between two server processes so that we can do zero downtime restarts of those services. As long as the servers support a graceful restart, the redispatch feature of HAProxy will seamlessly replay a failed connection on one backend to the other backend and the end user will not notice any interruption. All you need to do then is restart one backend, wait for it to come back up, and then restart the other backend. We use this approach quite often.
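To make the redispatch idea concrete, here is a toy version of the behaviour in plain Python: try the first backend, and if the connection attempt fails, quietly try the next one. This is not how HAProxy is implemented, only a sketch of the effect; connect_with_redispatch is a hypothetical name and the `connect` parameter exists so the logic can be exercised without real servers.

```python
import socket


def connect_with_redispatch(backends, connect=socket.create_connection, timeout=1.0):
    """Try each (host, port) backend in order and return the first
    connection that succeeds, mimicking a proxy that redispatches a
    failed connection attempt to another server."""
    last_error = None
    for address in backends:
        try:
            return connect(address, timeout)
        except OSError as exc:
            last_error = exc  # this backend is down; try the next one
    raise ConnectionError("no backend reachable: %r" % (last_error,))
```

For example, `connect_with_redispatch([("redis", 6379), ("redis", 6380)])` would return a socket to whichever instance accepts the connection first, which is why restarting one backend at a time goes unnoticed by callers.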
2 days ago
The easiest way to have no-downtime upgrades is to have an architecture that can tolerate some subset of its processes being down at any time. De-SPOF and this gets easier (not that de-SPOFing is always trivial).
2 days ago
I've done exactly this w/ haproxy and redis - the trick is connections, which will be terminated when you make the haproxy switch. As long as all your client-side redis code is robust against connection loss, you should be fine.
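One way a client could be made robust against that kind of connection loss is to drop the dead connection, reconnect, and retry. The sketch below assumes a hypothetical `connect` factory that returns an object with an `execute` method; it is not the API of any particular redis client library.

```python
import time


class ReconnectingClient:
    """Retry a command on a fresh connection when the old one is lost."""

    def __init__(self, connect, retries=3, delay=0.1):
        self.connect = connect   # hypothetical factory returning a connection
        self.retries = retries
        self.delay = delay
        self.conn = None

    def execute(self, *args):
        for attempt in range(self.retries + 1):
            try:
                if self.conn is None:
                    self.conn = self.connect()
                return self.conn.execute(*args)
            except ConnectionError:
                self.conn = None  # drop the dead connection
                if attempt == self.retries:
                    raise         # out of retries: surface the error
                time.sleep(self.delay)
```

Blind retries are only safe for idempotent commands. A command like INCR could be applied twice if the first attempt reached the server before the connection dropped, so callers may want to opt in per command.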
2 days ago
Another option, which probably performs much better than haproxy, is to use virtual IPs. This is a common way to implement failover in MySQL master-master installations.
Some tools for inspiration ;-):