• Scheduled Maintenance Today @ 22:00 PST

    rtomayko 27 May 2010

    UPDATE: Maintenance completed with great success. Thanks!

    The site will be down for maintenance today (Friday, May 28, 2010) starting at 22:00 PST for a (long-overdue) Redis upgrade. We expect the outage to last only a few minutes and plan to roll back if downtime exceeds 15 minutes. During this time, the site and all repository access via git/ssh will be unavailable.

  • 9 Comments

    simonw commented

    2 days ago

    Purely out of personal interest, did you spend much time investigating whether or not this upgrade could be achieved with zero downtime? (I'm fascinated by zero downtime upgrades at the moment.) Maybe something like running the new Redis version in parallel with the old one and sending writes to both, or using replication to sync the new version with the old one.

    Not meant as a criticism at all, I'm just interested in hearing your thoughts on this kind of thing. It's a fascinating problem.

    rtomayko commented github staff

    2 days ago

    Agreed. We talked about it a bit. In theory, we should be able to do something like:

    1. Leave the old redis version running on, say, redis:6379.
    2. Install and start a new redis on redis:6380 with a different dump file location.
    3. Execute SLAVEOF redis 6379 against redis:6380. Wait for first SYNC to complete.
    4. Execute SHUTDOWN on redis:6379.
    5. Execute SLAVEOF NO ONE on redis:6380.
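
    Steps 3 to 5 could be scripted roughly as follows. This is only a sketch under several assumptions: both servers accept inline (space-separated) commands, the command is spelled `SLAVEOF`, the new server's `INFO` reply includes `role`, `master_link_status` and `master_sync_in_progress` fields, and the whole `INFO` reply fits in a single `recv`. The helper names (`inline_command`, `first_sync_done`, `cut_over`) are made up for illustration.

```python
import socket
import time


def inline_command(*args: str) -> bytes:
    """Encode a command in the inline (space-separated) request format."""
    return (" ".join(args) + "\r\n").encode()


def parse_info(text: str) -> dict:
    """Collect the key:value lines of an INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info


def first_sync_done(info: dict) -> bool:
    """True once the replica reports a live link and no sync in progress."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress", "0") == "0"
    )


def send(host: str, port: int, *args: str) -> str:
    """Send one command and return the raw reply (single recv, sketch only)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(inline_command(*args))
        return conn.recv(65536).decode(errors="replace")


def cut_over(host: str = "redis", old: int = 6379, new: int = 6380) -> None:
    # Step 3: make the new instance replicate from the old one, then wait.
    send(host, new, "SLAVEOF", host, str(old))
    while not first_sync_done(parse_info(send(host, new, "INFO"))):
        time.sleep(1)
    # Step 4: the old server exits, so a dropped connection is expected here.
    try:
        send(host, old, "SHUTDOWN")
    except OSError:
        pass
    # Step 5: promote the new instance.
    send(host, new, "SLAVEOF", "NO", "ONE")
```

    Calling `cut_over()` would run the whole sequence against `redis:6379` and `redis:6380`. Writes that arrive between steps 4 and 5 still need somewhere to go, which is where the proxy or client failover comes in.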

    But you either need some kind of redispatching proxy in front of the two redis's (should be fairly simple with haproxy), or failover built into the redis clients. I'd probably go the haproxy route if the latency hit wasn't too bad.

    I hadn't considered a dual-write approach. That could be interesting too, although the built in replication stuff gets you pretty damn close without it.

    What would be great is if redis were to bake in some of this goodness for in-place upgrades. I've been dying to apply that approach since reading about it.

    Anyway, we don't have the proxy in place and I'd like to avoid failovering clients if possible. We're currently running on 0.900, so I'm not even sure all of those commands are supported. The background save in 0.900 is hitting the machine pretty hard, plus we're seeing sporadic timeouts, so the number one priority is getting a recent-ish version in place. I'd definitely like to get something a little more resilient and flexible in place, though, so we'll be experimenting in the future. I'll write up what we find if there isn't an established approach by then.

    rtomayko commented github staff

    2 days ago

    You know, it would be trivial to insert a proxy without any real disruption: light up haproxy on redis:6378 configured with redis:6379 as a primary and redis:6380 as the secondary, then point all the clients at redis:6378 (we can restart everything that runs a redis client gracefully). Once everything is off of redis:6379 and going through redis:6378, it should be safe to run through the procedure outlined in the previous comment.

    Maybe we'll give that a shot in staging tomorrow and see how it goes. We'd need to benchmark the proxy vs. the non-proxy config and get everything into puppet and just beat on it for a while, so we wouldn't be able to ship today. If we can get something working in our staging environment, we'll write up the technique and get it in place for the next upgrade.
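
    For the proxy vs. non-proxy benchmark, a rough starting point might be to time `PING` round trips against each port and compare the distributions. The sketch below assumes the server answers an inline `PING` with `+PONG`; the function names and the sample count are arbitrary choices, not anything taken from the actual setup.

```python
import socket
import statistics
import time


def ping_latencies(host: str, port: int, samples: int = 1000) -> list:
    """Time `samples` PING round trips over a single connection (seconds)."""
    latencies = []
    with socket.create_connection((host, port), timeout=5) as conn:
        reader = conn.makefile("rb")
        for _ in range(samples):
            start = time.perf_counter()
            conn.sendall(b"PING\r\n")
            reply = reader.readline()
            if reply.strip() != b"+PONG":
                raise RuntimeError(f"unexpected reply: {reply!r}")
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: list) -> dict:
    """Reduce raw timings to median, approximate p99 and max, in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }
```

    Comparing `summarize(ping_latencies("redis", 6379))` (direct) with `summarize(ping_latencies("redis", 6378))` (through haproxy) would give a first impression of the latency hit. A real benchmark would also want production-like commands and concurrency.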

    simonw commented

    2 days ago

    Sounds like I need to learn more about HAProxy - I'd never considered using it for anything other than HTTP (I currently do load balancing on my projects with nginx).

    rodjek commented

    2 days ago

    Including the HAProxy shininess into your plan Ryan, it should look something like:

    1. Leave the old redis version running on, say, redis:6379.
    2. Install and start a new redis on redis:6380 with a different dump file location.
    3. Execute SLAVEOF redis 6379 against redis:6380. Wait for first SYNC to complete.
    4. echo "enable server redis/redis-6380" | socat stdio unix-connect:/var/run/haproxy/admin.sock
    5. echo "disable server redis/redis-6379" | socat stdio unix-connect:/var/run/haproxy/admin.sock
    6. Execute SLAVEOF NO ONE on redis:6380.
    7. Execute SHUTDOWN on redis:6379.

    Assuming that you've got a HAProxy config looking something like

    listen redis <redis ip>:6378
        mode tcp
        server redis-6379 localhost:6379 check weight 256
        server redis-6380 localhost:6380 check disabled
    

    You could then fire up another instance of the new redis on 6379 and repeat the process so that you're back to a clean state.
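
    The two socat calls could also be issued from a script. The following is a minimal sketch, assuming the HAProxy stats socket lives at `/var/run/haproxy/admin.sock`, was configured with admin-level access, and closes the connection after answering one command. `admin_command`, `send_admin` and `switch_backend` are illustrative names.

```python
import socket

ADMIN_SOCKET = "/var/run/haproxy/admin.sock"  # path taken from the steps above


def admin_command(action: str, backend: str, server: str) -> bytes:
    """Build an 'enable server' or 'disable server' line for the stats socket."""
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action: {action}")
    return f"{action} server {backend}/{server}\n".encode()


def send_admin(command: bytes, path: str = ADMIN_SOCKET) -> str:
    """Send one command and read until HAProxy closes the connection."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def switch_backend(backend: str, old: str, new: str) -> None:
    """Enable the new server first, then disable the old one (steps 4 and 5)."""
    send_admin(admin_command("enable", backend, new))
    send_admin(admin_command("disable", backend, old))
```

    With the config above, `switch_backend("redis", "redis-6379", "redis-6380")` would correspond to steps 4 and 5.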

    mojombo commented github staff

    2 days ago

    @simonw We use HAProxy in TCP mode for several different services. Primarily we use it to load balance between two server processes so that we can do zero downtime restarts of those services. As long as the servers support a graceful restart, the redispatch feature of HAProxy will seamlessly replay a failed connection on one backend to the other backend and the end user will not notice any interruption. All you need to do then is restart one backend, wait for it to come back up, and then restart the other backend. We use this approach quite often.

    The easiest way to have no-downtime upgrades is to have an architecture that can tolerate some subset of its processes being down at any time. De-SPOF and this gets easier (not that de-SPOFing is always trivial).

    I've done exactly this w/ haproxy and redis - the trick is connections, which will be terminated when you make the haproxy switch. As long as all your client-side redis code is robust against connection loss, you should be fine.
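
    "Robust against connection loss" could mean something like the wrapper below: reconnect and retry when the connection drops. It is a sketch, not any particular client library's behaviour. `connect` and `operation` are hypothetical callables supplied by the caller, and blindly retrying is only safe for commands that can be repeated without harm.

```python
import time


def with_reconnect(connect, operation, attempts: int = 3, delay: float = 0.1):
    """Run operation(conn); on connection loss, reconnect and try again.

    connect   -- callable returning a fresh connection object
    operation -- callable taking that connection and returning a result
    """
    conn = None
    last_error = None
    for attempt in range(attempts):
        try:
            if conn is None:
                conn = connect()
            return operation(conn)
        except (ConnectionError, TimeoutError) as exc:
            # The proxy switch closed our connection: drop it and retry.
            last_error = exc
            conn = None
            if attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))
    raise last_error
```

    A caller would wrap each command, e.g. `with_reconnect(open_redis, lambda c: c.ping())`, where `open_redis` stands in for whatever the client uses to open a connection through the proxy.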

    amix commented

    2 days ago

    Another option, which probably performs much better than haproxy, is to use virtual IPs. This is a common way to implement failover in MySQL master-master installations.

    Some tools for inspiration ;-):
