We are trying to set up KeyDB as a four-node multi-master cluster spanning two datacenters for disaster recovery.
The steps we have taken:
- replaced Redis Cluster with KeyDB
- KeyDB is the latest version, running inside a Docker container
- reset the configuration files to the KeyDB defaults
- added the multi-master options to the config file (multi-master yes and active-replica yes)
- the multi-master options come before the replicaof options; a thread I read said the order of these is critical
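For reference, the relevant part of each node's config looks roughly like this (hostnames, ports, and node names are placeholders, not our real values):

```
# Enable active replication and multi-master mode.
# These lines must appear before any replicaof directives.
multi-master yes
active-replica yes

# Each node lists the other three nodes as its masters
replicaof keydb-node-b.dc1.example.com 6379
replicaof keydb-node-c.dc2.example.com 6379
replicaof keydb-node-d.dc2.example.com 6379
```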
In our tests, the second node loads the DB fine from the first (and so far only) node.
When the third node joins in, the trouble starts:
- nodes bounce up and down, with no indication of this in the logs; but clients do lose their connection to the cluster
- thousands of the following errors appear in the logs on all nodes:
== CRITICAL == This replica is sending an error to its master: 'Invalid MVCC Tstamp' after processing the command 'KEYDB.MVCCRESTORE'
Latest backlog is: '"b4-46fd-819b-a879a588c46d\r\n$153\r\n*5\r\n$17\r\nKEYDB.MVCCRESTORE\r\n$44\r\nmyvine:STAT:C271F2F6588DD73BE0530502020A5897\r\n$20\r\n18446744073709551615\r\n$13\r\n1626350182830\r\n$20\r\n\x00\b18763:10\t\x00H\xda\x1a\xd4\xcbZ\xd0\xa1\r\n\r\n$1\r\n0\r\n$19\r\n1700941248263623573\r\n\r\n$1\r\n0\r\n$19\r\n1700941309739532331\r\n"'
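One detail that stands out in that backlog fragment: the MVCC timestamp argument to KEYDB.MVCCRESTORE is 18446744073709551615, which is exactly 2^64 - 1, the maximum value of an unsigned 64-bit integer. That looks like a saturated/sentinel value rather than a real timestamp, which would plausibly be why the replica rejects it as an 'Invalid MVCC Tstamp'. A quick sanity check:

```python
# MVCC timestamp as it appears in the replication backlog
mvcc_tstamp = 18446744073709551615

# Maximum value representable in an unsigned 64-bit integer
UINT64_MAX = 2**64 - 1

# The field is saturated at UINT64_MAX, not a plausible wall-clock value
print(mvcc_tstamp == UINT64_MAX)  # True
```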
The database size is about 3 GB.
In dev tests, where the database is much smaller, the same errors appeared but stopped after a couple of hours. With this real system, however, the problem persists, and in any case we cannot afford the nodes bouncing up and down, as our infrastructure now relies heavily on Redis/KeyDB.
From Google I found this related error report: https://www.gitmemory.com/issue/EQ-Alpha/KeyDB/309/829592027
However, the solution described there did not help in our case:
client-output-buffer-limit replica 0 0 0
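For anyone checking their own setup: the general format of that directive is client-output-buffer-limit &lt;class&gt; &lt;hard-limit&gt; &lt;soft-limit&gt; &lt;soft-seconds&gt;, and setting all three values to 0 disables the limit for that class. So in our config it reads:

```
# Never disconnect replicas for exceeding the output buffer,
# even during long full-sync transfers of the ~3 GB dataset
client-output-buffer-limit replica 0 0 0
```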
I also tried increasing the thread count in the config file. That did not help either.
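Concretely, the thread setting we raised is KeyDB's server-threads option (4 is just an example value, not a recommendation):

```
# Number of worker threads KeyDB uses to serve client requests
server-threads 4
```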
Any ideas whether I am missing something in the config or otherwise doing something wrong?
Thanks for any help!