2 separate clusters cross data center replication setup question

I have 2 separate clusters with 6 nodes each (3 masters, 3 slaves). I want to replicate master-master between both clusters, which are in different data centers. Am I going about this wrong? What is the best way to set up cross-data-center replication with HA? When I try to replicate from a master in cluster 1 to a master in cluster 2, I get the following error.

127.0.0.1:6379> replicaof 10.124.192.159 6379
(error) ERR REPLICAOF not allowed in cluster mode.

Here are my 2 clusters.

$ keydb-cli -a dba123 --cluster check 10.124.192.159:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.124.192.159:6379 (554ec592...) -> 3 keys | 5461 slots | 1 slaves.
10.124.193.37:6379 (4a8d25f9...) -> 0 keys | 5462 slots | 1 slaves.
10.124.193.30:6379 (c20ad06e...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 3 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.124.192.159:6379)
M: 554ec592bdd9bba92e178e01fca86295f9c0e8d9 10.124.192.159:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 54f6533dd1f27df20a980cae4bc0847ea95db1c9 10.124.192.161:6379
   slots: (0 slots) slave
   replicates c20ad06eab66b2d1c03c7c36b3832fe04b11dd1e
M: 4a8d25f90b8e70897e62d9a4b768096e1a00a629 10.124.193.37:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 872a92a97d7d6bc4c4d4c032751175cae1c83eeb 10.124.193.31:6379
   slots: (0 slots) slave
   replicates 554ec592bdd9bba92e178e01fca86295f9c0e8d9
S: 8eb1cf952b5c2d0937887fe6424b7999e7b4861b 10.124.192.160:6379
   slots: (0 slots) slave
   replicates 4a8d25f90b8e70897e62d9a4b768096e1a00a629
M: c20ad06eab66b2d1c03c7c36b3832fe04b11dd1e 10.124.193.30:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

# keydb-cli -a dba123 --cluster check 10.124.193.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.124.193.195:6379 (c3d5ce46...) -> 0 keys | 5461 slots | 1 slaves.
10.124.194.110:6379 (2ad996e4...) -> 0 keys | 5461 slots | 1 slaves.
10.124.193.194:6379 (e9466be6...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.124.193.51:6379)
S: 5f5e90508c8d7b2104a0e0c07769f611fea9184f 10.124.193.51:6379
   slots: (0 slots) slave
   replicates 2ad996e46fb67a504f451d1c34ed89552c641319
S: 5ff6574b36719e0414fa828a81023c48d3d9190f 10.124.194.109:6379
   slots: (0 slots) slave
   replicates c3d5ce46fefc4605e8f883584d41e9e372458aa6
M: c3d5ce46fefc4605e8f883584d41e9e372458aa6 10.124.193.195:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 2ad996e46fb67a504f451d1c34ed89552c641319 10.124.194.110:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 11f0d515ed0d040d6c92c942da0ebea238c505e4 10.124.194.111:6379
   slots: (0 slots) slave
   replicates e9466be6d3bf31f179c2520f5f7a360dff8d38b6
M: e9466be6d3bf31f179c2520f5f7a360dff8d38b6 10.124.193.194:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

@kyle.stephenson, I just read the KeyDB documentation: In Depth Cluster Tutorial.
It seems that KeyDB clustering is different from simple replication: in a cluster configuration the data is sharded across the cluster nodes, while in replication each node holds a full copy of the data.

What is the main reason for using a cluster in your case? Do you need to store more data than can fit into one node's RAM?

I was looking to use a cluster for High Availability (master/slave) and then be able to replicate that cluster over to another cluster in a different datacenter. Is this not possible with KeyDB or Redis? Is the only way to replicate to another datacenter to use simple replication?

Is this not possible with KeyDB or Redis? Is the only way to replicate to another datacenter to use simple replication?

I can't confirm that for certain, but from what I read in the documentation it seems it can't be done with KeyDB's cluster capabilities. I have no hands-on experience with Redis/KeyDB clusters.

In our case, for HA we use 2 KeyDB nodes in a multi-master configuration with a load balancer in front of them.
This gives us redundancy over the entire dataset with a minimal number of nodes and without any configuration complexity.
Just make sure the nodes have enough memory: 2x the DB size.

Such a configuration can also very easily be extended to cross-datacenter replication. For more redundancy you can just add more nodes to the multi-master configuration to form a full mesh; a minimal 2-node sketch follows.
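For reference, a minimal sketch of such a 2-node pair, assuming placeholder IPs 10.0.0.1 and 10.0.0.2 (note that for exactly 2 instances the documentation recommends plain active-replication rather than multi-master, per the notes below):

# Node A (10.0.0.1) - placeholder IPs, adjust to your environment
active-replica yes
replica-read-only no
replicaof 10.0.0.2 6379

# Node B (10.0.0.2)
active-replica yes
replica-read-only no
replicaof 10.0.0.1 6379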

Some notes about KeyDB Multi-master: Using Multiple Masters

Note: This feature is still experimental, if you try it out please let us know how it works for you. Keep an eye on your perf as this feature can occasionally experience traffic storms.

Note: If you are only setting up 2 instances to be masters please use active-replication as it is more stable than multi-master and tested to handle high loads

I am using version 6.0.13 but I do not see a variable named "multi-master" in my config. Is this in a newer version than what I have?

keydb-server --version

KeyDB server v=6.0.13 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=5055ac929b8ff5fb

As I remember, it is missing from the default config and should be added per the documentation:

multi-master yes
active-replica yes
replicaof 10.0.0.3 6379
replicaof 10.0.0.4 6379
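
After restarting with those settings, a quick way to confirm they took effect (a sketch, assuming the default port; add -a if auth is enabled):

keydb-cli info replication | grep ^role

On an active replica this should print role:active-replica rather than role:slave.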

It seems that I can get a 2-node setup to replicate, but when I add a third, synchronization doesn't work. Any idea why node 51 will not sync? What does 'no cached master' mean?

node 111:
9737:9749:S 12 Aug 2020 16:57:15.334 * Partial resynchronization not possible (no cached master)
9737:9749:S 12 Aug 2020 16:57:16.328 * Replica 10.124.193.51:6379 asks for synchronization
9737:9749:S 12 Aug 2020 16:57:16.331 * Replica 10.124.193.194:6379 asks for synchronization
9737:9749:S 12 Aug 2020 16:57:16.335 * Synchronization with replica 10.124.193.51:6379 succeeded
9737:9749:S 12 Aug 2020 16:57:16.335 * Synchronization with replica 10.124.193.194:6379 succeeded
9737:9749:S 12 Aug 2020 17:06:28.342 * Replica 10.124.193.51:6379 asks for synchronization
9737:9749:S 12 Aug 2020 17:06:28.402 * Synchronization with replica 10.124.193.51:6379 succeeded

node 51:
12841:12854:S 12 Aug 2020 17:06:28.342 * Partial resynchronization not possible (no cached master)

node 194:
10493:10506:S 12 Aug 2020 16:57:15.334 * Replica 10.124.194.111:6379 asks for synchronization
10493:10506:S 12 Aug 2020 16:57:15.428 * Synchronization with replica 10.124.194.111:6379 succeeded
10493:10506:S 12 Aug 2020 16:57:16.331 * Partial resynchronization not possible (no cached master)
10493:10506:S 12 Aug 2020 17:03:19.911 * Replica 10.124.193.51:6379 asks for synchronization
10493:10506:S 12 Aug 2020 17:03:19.941 * Synchronization with replica 10.124.193.51:6379 succeeded

We didn't use a 3-node multi-master configuration for our needs.

I just did a test setup using Amazon OpsWorks and it started working without any manual intervention.

Config

# Node1

# REPLICATION
multi-master yes
active-replica yes
replica-read-only no
replicaof node2 6379
replicaof node3 6379

And accordingly on the other nodes, replication from their neighbours, spelled out below.
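That is, a sketch of the two remaining configs, assuming the same node1/node2/node3 hostnames:

# Node2

# REPLICATION
multi-master yes
active-replica yes
replica-read-only no
replicaof node1 6379
replicaof node3 6379

# Node3

# REPLICATION
multi-master yes
active-replica yes
replica-read-only no
replicaof node1 6379
replicaof node2 6379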

Replication

keydb-cli info replication

This shows on all 3 nodes that the master links are up and the slaves are connected.

Keys

# Node1 - set key
keydb-cli set node keydb-test1
OK

# Node2 - get key
keydb-cli get node
"keydb-test1"

Keyspace

# On all nodes
keydb-cli info keyspace

# Keyspace
db0:keys=1,expires=0,avg_ttl=0

Logs

# Node1
13912:13921:S 13 Aug 2020 17:47:05.588 * MASTER <-> REPLICA sync started
13912:13921:S 13 Aug 2020 17:47:05.589 * Non blocking connect for SYNC fired the event.
13912:13921:S 13 Aug 2020 17:47:05.596 * Master replied to PING, replication can continue...
13912:13921:S 13 Aug 2020 17:47:05.629 * Partial resynchronization not possible (no cached master)
13912:13921:S 13 Aug 2020 17:47:11.697 * Full resync from master: a5cbcaee808a77a4eb770c131a894be22d3df462:600
13912:13921:S 13 Aug 2020 17:47:11.697 * Discarding previously cached master state.
13912:13921:S 13 Aug 2020 17:47:11.708 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
13912:13921:S 13 Aug 2020 17:47:11.719 * MASTER <-> REPLICA sync: Loading DB in memory
13912:13921:S 13 Aug 2020 17:47:11.719 * Loading RDB produced by version 6.0.13

I also see 'no cached master' in the logs, but replication then finishes successfully.

Quote from Redis GitHub issue:

After loading the RDB, if we found the replication-id / offset, and the instance is a slave, this information is just used in order to create a “cached master” in the slave replication state.

It seems that such a message indicates that no replication information was found in the loaded RDB.

I have set up my replication like yours with 6 nodes. Below is the node info. It seems that .111 is the master, which has 5 slaves. I thought this was supposed to be multi-master. When I update on one of the slaves it does not replicate to any other nodes; when I update on the master .111 it does replicate to the slaves. How do I make all of these nodes masters? Below is what all nodes are set to in the keydb.conf file, except that replicaof is set to every IP other than the node's own.

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/srv/salt/dba
# grep "multi-master" /etc/keydb/keydb.conf | grep -v "#"
multi-master yes

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/srv/salt/dba
# grep "active-replica" /etc/keydb/keydb.conf | grep -v "#"
active-replica yes

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/srv/salt/dba
# grep "replica-read-only" /etc/keydb/keydb.conf | grep -v "#"
replica-read-only no

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/srv/salt/dba
# grep "replicaof" /etc/keydb/keydb.conf | grep -v "#"
replicaof 10.124.193.194 6379
replicaof 10.124.193.195 6379
replicaof 10.124.194.109 6379
replicaof 10.124.194.110 6379
replicaof 10.124.194.111 6379 
# hostname -i
10.124.193.51

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.111
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:27987
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:1c31f590f049d3a074d727940cfc4904b8637ca3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:44199
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:40629
repl_backlog_histlen:3571
# hostname -i
10.124.193.194

[Development] root@d-gp2-keydbkyle2-2.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.111
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:27987
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:0bdaa7d97bd694f48534123af9f51749a67c4eb2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:45238
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:41691
repl_backlog_histlen:3548
# hostname -i
10.124.193.195

[Development] root@d-gp2-keydbkyle2-3.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.111
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:27987
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:d90bd436de6835dd0f7124103428525242a6337a
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:45238
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:41691
repl_backlog_histlen:3548
# hostname -i
10.124.194.109

[Development] root@d-gp2-keydbkyle2-4.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.111
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:27987
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:4ac76cc7bda4b0cd02eae095ff6ed04993e7ba61
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:45238
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:41691
repl_backlog_histlen:3548
# hostname -i
10.124.194.110

[Development] root@d-gp2-keydbkyle2-5.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.111
master_port:6379
master_link_status:up
master_last_io_seconds_ago:8
master_sync_in_progress:0
slave_repl_offset:26436
slave_priority:100
slave_read_only:0
connected_slaves:1
slave0:ip=10.124.194.111,port=6379,state=online,offset=28045,lag=0
master_replid:28315b7a725dae32f589731e45218e2910f3423f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:28045
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:7285
repl_backlog_histlen:20761
# hostname -i
10.124.194.111

[Development] root@d-gp2-keydbkyle2-6.imovetv.com:/srv/salt/dba
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.194.110
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:26755
slave_priority:100
slave_read_only:0
connected_slaves:5
slave0:ip=10.124.194.110,port=6379,state=online,offset=27987,lag=0
slave1:ip=10.124.194.109,port=6379,state=online,offset=27987,lag=1
slave2:ip=10.124.193.194,port=6379,state=online,offset=27987,lag=1
slave3:ip=10.124.193.51,port=6379,state=online,offset=27987,lag=0
slave4:ip=10.124.193.195,port=6379,state=online,offset=27987,lag=0
master_replid:e78741ab39b4d3c52e44113a9a44fed8eaa80fe2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:27987
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:25753
repl_backlog_histlen:2235

I'm always seeing a primary master with a backup master, and the rest are slaves just pointing to the primary master. The slaves will not replicate to the master, so I'm not sure how this makes it multi-master except that it replicates between the two masters and that's it. If the primary master goes down, replication stops.

I’m using the following binary install.

keydb-server --version

KeyDB server v=6.0.13 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=5055ac929b8ff5fb

I even just set up a 3-node multi-master replication, and still the node that is only a slave will not replicate to the masters. From what I see there can only be 2 masters: a primary, and a secondary that the primary points to. After that, all nodes are just slave nodes and will not replicate to other nodes.

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/root
# hostname -i
10.124.193.51

[Development] root@d-gp2-keydbkyle2-1.imovetv.com:/root
# keydb-cli -a dba123 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:active-replica
master_global_link_status:up
Master 0: 
master_host:10.124.193.195
master_port:6379
master_link_status:up
master_last_io_seconds_ago:4
master_sync_in_progress:0
slave_repl_offset:5053
slave_priority:100
slave_read_only:0
connected_slaves:0
master_replid:bf1a26b296331d5086afdadf921096b1f59cc57d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:8444
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:262
repl_backlog_histlen:8183

Basically I'm getting the same results when turning off multi-master. At most I can only get 2 nodes where I can write to both sides and have them replicate between each other. I'm starting to wonder if there is a bug in multi-master on the Debian version that I'm using.

So when I run from the command line and pass only the values below, it works; each node has the 2 slaves connected. So some configuration in the default file is making it fail.

/usr/bin/keydb-server --bind 10.124.193.51 --multi-master yes --active-replica yes --replicaof 10.124.193.194 6379 --replicaof 10.124.193.195 6379
keydb-cli -h 10.124.193.51 -a dba123 info replication

/usr/bin/keydb-server --bind 10.124.193.194 --multi-master yes --active-replica yes --replicaof 10.124.193.51 6379 --replicaof 10.124.193.195 6379
keydb-cli -h 10.124.193.194 -a dba123 info replication

/usr/bin/keydb-server --bind 10.124.193.195 --multi-master yes --active-replica yes --replicaof 10.124.193.51 6379 --replicaof 10.124.193.194 6379
keydb-cli -h 10.124.193.195 -a dba123 info replication

I have put the same configs into the keydb.conf file, but it will only work when run from the command line as in my last post. Not sure how to get this to work using the keydb.conf config file. Any ideas?

OK, I've gone line by line and it is now working with the config. This seems to be very buggy. I'll continue to work on this and update with any new info.

So there must be some weird character or something. If I remove all of the comments and just leave the variables, it works; if I don't remove the comments, it does not work. Not sure which part is causing it though. A couple of commands that can help narrow it down are shown below.
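
A sketch of how to hunt for hidden characters in the config (the path is the one used in the posts above):

# Flag Windows line endings; cat -A renders a CRLF as ^M$
cat -A /etc/keydb/keydb.conf | grep -n '\^M'

# Show only the effective directives, stripping comments and blank lines,
# to compare against the minimal set that is known to work
grep -vE '^[[:space:]]*(#|$)' /etc/keydb/keydb.conf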

@kyle.stephenson, let's start with just 3 nodes (a verification sketch follows the configs):

  1. Install KeyDB with all defaults and modify just the parts related to replication.
  2. Make sure that the order is preserved in your configs - see "KeyDB 6.0.13 ignore multiple REPLICAOF from config in Multi-master setup" #213:
    # Node A
    multi-master yes
    active-replica yes
    replicaof NODE-B 6379
    replicaof NODE-C 6379
    
    # Node B
    multi-master yes
    active-replica yes
    replicaof NODE-A 6379
    replicaof NODE-C 6379
    
    # Node C
    multi-master yes
    active-replica yes
    replicaof NODE-A 6379
    replicaof NODE-B 6379
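
Once all three nodes are up, a quick sanity check on each of them (a sketch; add -a if auth is enabled):

keydb-cli info replication

If the mesh is healthy, each node should report role:active-replica and connected_slaves:2, with two Master N: sections showing master_link_status:up.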