I’m looking at setting up an HA, multi-master Redis Cluster in which each master has its own slave replica at the opposite geographical site (the two sites are joined by a low-latency, high-bandwidth link). The aim is that if we lose a node or a whole physical site, the slave replica is automatically promoted to master and keeps covering that master’s slots.
The problem is that after the failure (once the old master rejoins as a slave), there doesn’t appear to be any way to automatically fail back to the original roles, which we need so that every master continues to have one data replica on the opposite site. Manually running the CLUSTER FAILOVER command on the recovered node (now a slave) does seem to ‘fail back’ correctly without data loss, restoring the masters and slaves to their original roles.
I’d like to automate this failback.
I googled the heck out of the problem and only found a couple of Redis feature requests asking for the same thing. I also started playing with the redis-py-cluster library, which is severely lacking and flaky. Ideally I want to automate this without shelling out and parsing command output.
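A minimal sketch of one way to automate the check, under stated assumptions rather than as a tested tool: read the CLUSTER NODES reply from any node, find nodes that are supposed to be masters but are currently healthy slaves of a healthy master, and send each of them CLUSTER FAILOVER. This still parses the CLUSTER NODES text, but that is a documented server reply format rather than CLI output. The `preferred_masters` set of "host:port" strings, the function names, and the sample node IDs are hypothetical. The sketch only picks nodes whose current master is healthy because a plain CLUSTER FAILOVER (without FORCE or TAKEOVER) coordinates with the current master.

```python
from dataclasses import dataclass

# Flags in the CLUSTER NODES reply that mean a node should not take part.
UNHEALTHY = {"fail", "fail?", "handshake", "noaddr"}


@dataclass
class Node:
    node_id: str
    addr: str          # "host:port", with "@cport" and any ",hostname" stripped
    flags: set
    master_id: str     # "-" when the node is a master
    link_state: str    # "connected" or "disconnected"


def parse_cluster_nodes(text):
    """Parse the raw text reply of CLUSTER NODES into Node records."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        nodes.append(Node(
            node_id=fields[0],
            addr=fields[1].split("@")[0],
            flags=set(fields[2].split(",")),
            master_id=fields[3],
            link_state=fields[7],
        ))
    return nodes


def plan_failback(text, preferred_masters):
    """Return addresses of nodes that should be masters but are currently
    healthy slaves of a healthy master, i.e. candidates for CLUSTER FAILOVER.

    preferred_masters is a hypothetical set of "host:port" strings describing
    the intended layout (e.g. one master per slot range on each site)."""
    nodes = parse_cluster_nodes(text)
    by_id = {n.node_id: n for n in nodes}
    plan = []
    for n in nodes:
        if n.addr not in preferred_masters or "slave" not in n.flags:
            continue  # not a preferred master, or already a master
        if n.flags & UNHEALTHY or n.link_state != "connected":
            continue  # the would-be master itself is not healthy
        master = by_id.get(n.master_id)
        if master is None or master.flags & UNHEALTHY:
            continue  # a plain CLUSTER FAILOVER needs a reachable master
        plan.append(n.addr)
    return sorted(plan)
```

Each returned address could then be sent the command through any client, for example with redis-py: `redis.Redis(host=host, port=port).execute_command("CLUSTER FAILOVER")`. Run periodically from a cron job or small daemon, this would restore the intended layout some time after a failed node rejoins; you may want to delay it so the rejoined node has finished resyncing first.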
Any ideas on how to resolve this elegantly? For example, would this be a useful feature request? Is there a better topology that would avoid the issue?