KeyDB and Resque

We recently swapped Redis for KeyDB in our company’s web app. We have 2 instances running, configured in active replication mode. Since the changeover, we’ve noticed duplicate jobs being created and some jobs never being queued. We get no errors when enqueuing the jobs; they just fail to persist in KeyDB.

We’ve run several experiments with a simple script that just enqueues jobs at a fast rate and then processes them.

  1. all writes going to one instance, all reads from one instance, no issues
  2. writes going to both instances, all reads from one instance, no issues
  3. writes going to both instances, reads from both instances, duplicates and dropped jobs
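A minimal sketch of how a run like this could be tallied, assuming every job carries a unique ID. The `tally` function and the `job-N` IDs are hypothetical and only illustrate the bookkeeping; they are not taken from the actual test script.

```python
from collections import Counter


def tally(enqueued_ids, processed_ids):
    """Compare what was enqueued with what the workers actually processed."""
    counts = Counter(processed_ids)
    # A job seen more than once by the workers is a duplicate.
    duplicates = sorted(job for job, n in counts.items() if n > 1)
    # A job that was enqueued but never seen by any worker was dropped.
    dropped = sorted(set(enqueued_ids) - set(counts))
    return duplicates, dropped


# Made-up example data: job-1 ran twice, job-2 never ran.
enqueued = ["job-1", "job-2", "job-3", "job-4"]
processed = ["job-1", "job-1", "job-3", "job-4"]
duplicates, dropped = tally(enqueued, processed)
print("duplicates:", duplicates)
print("dropped:", dropped)
```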

Is there something we can look into to see why the db may be duplicating and dropping jobs?

I’ve created a Docker-based project to demonstrate the issue.

I don’t think Active Replication is a good fit for Resque or Sidekiq. Both store the job queue in a list and then blindly left-pop jobs off that list. Replication latency means two workers talking to different instances can pop the same job before either pop has replicated, so some jobs get processed twice and others get popped off the list without ever having been processed.
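To make the race concrete, here is a toy Python model of two nodes. It is not KeyDB code: the `Replica` class is hypothetical, and it assumes a pop is replicated to the peer as a plain "pop the head" command rather than "remove this specific job", which is my understanding of the failure mode rather than something confirmed in this thread.

```python
from collections import deque


class Replica:
    """Toy stand-in for one node: a job list plus commands from the peer
    that have been received but not yet applied (the replication lag)."""

    def __init__(self):
        self.queue = deque()
        self.pending = deque()
        self.peer = None

    def rpush(self, job):
        self.queue.append(job)
        self.peer.pending.append(("rpush", job))

    def lpop(self):
        if not self.queue:
            return None
        job = self.queue.popleft()
        # Assumption: the peer is told "pop the head", not "remove <job>".
        self.peer.pending.append(("lpop", None))
        return job

    def apply_replication(self):
        while self.pending:
            op, job = self.pending.popleft()
            if op == "rpush":
                self.queue.append(job)
            elif self.queue:
                self.queue.popleft()


a, b = Replica(), Replica()
a.peer, b.peer = b, a

a.rpush("job-1")
a.rpush("job-2")
b.apply_replication()          # both nodes now hold job-1, job-2

processed = [a.lpop(), b.lpop()]  # two workers pop before replication catches up
a.apply_replication()          # replicated pop removes job-2 on a
b.apply_replication()          # replicated pop removes job-2 on b

print("processed:", processed)
print("left on a:", list(a.queue), "left on b:", list(b.queue))
```

In this model job-1 is handed to both workers, and job-2 is removed from both nodes by the replicated pops without ever reaching a worker, which matches the duplicate and dropped symptoms reported above.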

@mcmoyer At a higher level, yes: Active Replication does not provide full eventual consistency. That is a larger project we are tackling, but it’s quite involved.

We have received feature requests for stronger guarantees, specifically around PUB/SUB, which is something we can certainly deliver before we offer full eventual consistency.