I/O error reading bulk count from MASTER: No space left on device

Good day,

I am probably doing something wrong here, so any help would be appreciated. I am running KeyDB on a Kubernetes cluster as 3 headless pods in a StatefulSet, each backed by an 8 GiB volume. They do not hold much data, but I keep running out of space for some reason.

redis-master-0:
# Keyspace
db0:keys=39,expires=0,avg_ttl=0
db6:keys=493,expires=0,avg_ttl=0
db7:keys=5,expires=5,avg_ttl=333500870
db8:keys=2,expires=0,avg_ttl=0
db9:keys=1,expires=0,avg_ttl=0
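
A quick way to see what is actually eating the volume (assuming the data directory is mounted at /data; adjust the path and pod names to your setup):

  kubectl exec redis-master-0 -- df -h /data
  kubectl exec redis-master-0 -- du -sh /data/*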

My configuration is:

exec keydb-server /etc/keydb/redis.conf \
  --active-replica yes \
  --multi-master yes \
  --appendonly yes \
  --save "" \
  --bind 0.0.0.0 \
  --port "${PORT}" \
  --protected-mode no \
  --server-threads 4 \
  --repl-diskless-sync no \
  --repl-timeout 600 \
  "REPLICAS[@]}"

Errors that may help:

1:12:S 17 Feb 2021 16:46:38.373 * Full resync requested by replica 127.0.0.1:6379
1:12:S 17 Feb 2021 16:46:38.373 * Waiting for end of BGSAVE for SYNC
1:12:S 17 Feb 2021 16:46:38.453 # Background saving terminated by signal 11
1:12:S 17 Feb 2021 16:46:38.454 # Connection with replica 127.0.0.1:6379 lost.
1:12:S 17 Feb 2021 16:46:38.454 # SYNC failed. BGSAVE child returned an error
1:12:S 17 Feb 2021 16:46:38.454 # Connection with replica 127.0.0.1:6379 lost.
1:12:S 17 Feb 2021 16:46:38.454 # SYNC failed. BGSAVE child returned an error
1:12:S 17 Feb 2021 16:46:39.262 * Connecting to MASTER redis-master-1.redis-headless:6379
1:12:S 17 Feb 2021 16:46:39.264 * MASTER <-> REPLICA sync started
1:12:S 17 Feb 2021 16:46:39.264 * Non blocking connect for SYNC fired the event.
1:12:S 17 Feb 2021 16:46:39.266 * Master replied to PING, replication can continue...
1:12:S 17 Feb 2021 16:46:39.269 * Partial resynchronization not possible (no cached master)
1:12:S 17 Feb 2021 16:46:39.271 * Full resync from master: 1a96aa9c9141db4f5dee262e34e9a3a105b167f8:13342
1:12:S 17 Feb 2021 16:46:42.272 # I/O error reading bulk count from MASTER: Resource temporarily unavailable
1:12:S 17 Feb 2021 16:46:42.275 * Replica 127.0.0.1:6379 asks for synchronization
1:12:S 17 Feb 2021 16:46:42.275 * Full resync requested by replica 127.0.0.1:6379
1:12:S 17 Feb 2021 16:46:42.275 * Replication backlog created, my new replication IDs are 'ab5eb9631be42ce7daa6219d6a402165c90d24cd' and '0000000000000000000000000000000000000000'
1:12:S 17 Feb 2021 16:46:42.275 * Starting BGSAVE for SYNC with target: disk
1:12:S 17 Feb 2021 16:46:42.276 * Background saving started by pid 75087
1:12:S 17 Feb 2021 16:46:42.276 * Replica 127.0.0.1:6379 asks for synchronization
1:12:S 17 Feb 2021 16:46:42.276 * Full resync requested by replica 127.0.0.1:6379
1:12:S 17 Feb 2021 16:46:42.276 * Waiting for end of BGSAVE for SYNC
1:12:S 17 Feb 2021 16:46:42.373 # Background saving terminated by signal 11
1:12:S 17 Feb 2021 16:46:42.373 # Connection with replica 127.0.0.1:6379 lost.
1:12:S 17 Feb 2021 16:46:42.373 # SYNC failed. BGSAVE child returned an error
1:12:S 17 Feb 2021 16:46:42.373 # Connection with replica 127.0.0.1:6379 lost.
1:12:S 17 Feb 2021 16:46:42.373 # SYNC failed. BGSAVE child returned an error
1:12:S 17 Feb 2021 16:46:43.176 * Connecting to MASTER redis-master-2.redis-headless:6379
1:12:S 17 Feb 2021 16:46:43.178 * MASTER <-> REPLICA sync started
1:12:S 17 Feb 2021 16:46:43.178 * Non blocking connect for SYNC fired the event.
1:12:S 17 Feb 2021 16:46:43.180 * Master replied to PING, replication can continue...
1:12:S 17 Feb 2021 16:46:43.181 * Partial resynchronization not possible (no cached master)
1:12:S 17 Feb 2021 16:46:43.183 * Full resync from master: 31b695e92d8dc8eb7855fbe78e36cb860c1a9ce4:11456

Thanks!
Curt

I should also add that I have a lot of 'core.keydb-server.*' files that seem to be filling the drive up.
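
In case it helps anyone hitting the same thing, the dumps can be cleared out and new ones suppressed roughly like this (the /data path is an assumption based on my mount, and disabling core dumps means losing them for debugging):

  # remove the existing dumps from the data volume
  kubectl exec redis-master-0 -- sh -c 'rm -f /data/core.keydb-server.*'

  # in the container start script, before the exec keydb-server line:
  ulimit -c 0    # stop new core files from being written to the volume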

I loaded one of the core files with gdb. To do that I had to copy the core file into my Docker container and install gdb there.
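
Roughly the steps, with the core file name and container name as placeholders rather than exact values:

  kubectl cp redis-master-0:/data/core.keydb-server.<pid> ./core.keydb-server.<pid>
  docker cp core.keydb-server.<pid> <keydb-container>:/tmp/
  docker exec -it <keydb-container> apk add gdb    # the image is Alpine/musl based
  docker exec -it <keydb-container> gdb /usr/local/bin/keydb-server /tmp/core.keydb-server.<pid>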

The result is:

Reading symbols from /usr/local/bin/keydb-server...
(No debugging symbols found in /usr/local/bin/keydb-server)
[New LWP 98453]

warning: Can't read pathname for load map: No error information.
Core was generated by `keydb-server /etc/keydb/redis.conf --active-replica yes --multi-master yes --ap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055b56b0c7e04 in lzf_compress ()

Full stack trace:

Core was generated by `keydb-server /etc/keydb/redis.conf --active-replica yes --multi-master yes --sa'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055596aab2e04 in lzf_compress ()
(gdb) bt
#0  0x000055596aab2e04 in lzf_compress ()
#1  0x000055596aad951b in ?? ()
#2  0x000055596aad9895 in rdbSaveRawString(_rio*, unsigned char const*, unsigned long) ()
#3  0x000055596aadb354 in rdbSaveKeyValuePair(_rio*, redisObject*, redisObject*, expireEntry*) ()
#4  0x000055596aadb776 in saveKey(_rio*, redisDb*, int, unsigned long*, char const*, redisObject*) ()
#5  0x000055596aadbb16 in rdbSaveRio(_rio*, int*, int, rdbSaveInfo*) ()
#6  0x000055596aadc74f in rdbSaveFile(char*, rdbSaveInfo*) ()
#7  0x000055596aadc98f in rdbSave(rdbSaveInfo*) ()
#8  0x000055596aadca87 in rdbSaveBackground(rdbSaveInfo*) ()
#9  0x000055596aad4325 in startBgsaveForReplication(int) ()
#10 0x000055596aad4cb8 in syncCommand(client*) ()
#11 0x000055596aaa93d3 in call(client*, int) ()
#12 0x000055596aaad731 in processCommand(client*, int) ()
#13 0x000055596aab9820 in processCommandAndResetClient(client*, int) ()
#14 0x000055596aabe7a3 in processInputBuffer(client*, int) ()
#15 0x000055596aac1293 in processClients() ()
#16 0x000055596aaa548b in beforeSleep(aeEventLoop*) ()
#17 0x000055596aaa1d65 in aeMain ()
#18 0x000055596aaa82c4 in workerThreadMain(void*) ()
#19 0x00007f440fd287b7 in ?? () from /lib/ld-musl-x86_64.so.1
#20 0x0000000000000000 in ?? ()

I was able to work around this by setting:

--rdbcompression no
--server-threads 1
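
For what it's worth, the compression half of the workaround can also be flipped on a running instance without a restart (server-threads only takes effect at startup), e.g.:

  keydb-cli -p "${PORT}" config set rdbcompression no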

Odds are this is fixed by the recent 8 MB of memory per thread fix, which hasn't been released yet.