Documentation issue with bulk loader

We ran the bulk loader following the instructions, using one Zero node and one bulk loader instance, and then copied the p directories to each of the Alpha nodes. Once we start the cluster, all mutations on the graph fail with a message like the following:

rpc error: code = Unknown desc = Uid: [195656434] cannot be greater than lease: [0]

Before that, when the cluster comes up, there is a log message about the GraphQL schema update, followed by index deletions:

I0121 14:32:30.032892       1 admin.go:709] namespace: 0. Skipping GraphQL schema update. newSchema.Version: 10010, oldSchema.Version: 0, schemaChanged: true.
I0121 14:32:30.034067       1 mutation.go:204] Max open files limit: 1048576
I0121 14:32:30.034576       1 index.go:783] Deleting indexes for
I0121 14:32:30.034672       1 index.go:783] Deleting indexes for
I0121 14:32:30.034708       1 index.go:783] Deleting indexes for

This seems to be unrecoverable. To be clear, we followed the instructions exactly and copied the p directories to the alpha servers, but we did not copy the zw directory to one of the zero servers (since that is not in the instructions).

To fix the problem, we copied the zw directory from the server on which we ran the bulk loader and Zero to one of the servers in the Zero cluster. We use Docker Swarm for orchestration, and the original Zero server (used during the bulk load) and the Zero server we copied the zw directory to have the same service name. Some forum posts hint at this (e.g. Serving bulk-loaded data (HA cluster) - #12 by EnricoMi).

I have not seen a definitive forum post on this, and the documentation should be fixed to state that the zw directory created during the bulk load must be present in the final cluster. This should be as easy as inserting a new step into the list here:

  1. Run bulk loader only on one server
  2. Copy (or use rsync) the p directory to the other servers (the servers you will be using to start the other Alpha nodes)
  3. Copy (or use rsync) the zw directory to one of the Zero servers (the servers you will be using to start the other Zero nodes); note that its host name must match that of the Zero used during the bulk load
  4. Now, start all Alpha nodes at the same time
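The steps above, including the proposed zw step, could be scripted roughly as follows. This is a dry-run sketch, not the documented procedure: the host names (alpha1..alpha3, zero1), the /dgraph data directory and the input file names are hypothetical placeholders, and it assumes a single Alpha group, so every Alpha receives the same out/0/p written by the bulk loader. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the bulk-load procedure, including the zw copy.
# All host names, paths and input file names are hypothetical placeholders.
set -euo pipefail

RUN="${RUN-echo}"                  # "echo" prints each command; export RUN= (empty) to execute
OUT_DIR="./out"                    # bulk loader output directory (default --out)
ZW_DIR="./zw"                      # Zero's WAL directory from the bulk-load run
ALPHA_HOSTS=(alpha1 alpha2 alpha3) # hypothetical Alpha hosts (single group assumed)
ZERO_HOST="zero1"                  # must have the same host name as the bulk-time Zero
DATA_DIR="/dgraph"                 # hypothetical data directory on every host

step1_bulk_load() {
  # 1. Run the bulk loader on one server, against the single Zero.
  $RUN dgraph bulk -f data.rdf.gz -s data.schema --zero=localhost:5080
}

step2_copy_p() {
  # 2. Copy the p directory from shard 0 to every Alpha host.
  local host
  for host in "${ALPHA_HOSTS[@]}"; do
    $RUN rsync -a "$OUT_DIR/0/p/" "$host:$DATA_DIR/p/"
  done
}

step3_copy_zw() {
  # 3. Copy zw to one Zero host, before that Zero is started.
  $RUN rsync -a "$ZW_DIR/" "$ZERO_HOST:$DATA_DIR/zw/"
}

step1_bulk_load
step2_copy_p
step3_copy_zw
# 4. Start all Alpha nodes at the same time (not scripted here).
```

Reviewing the printed commands before exporting an empty RUN keeps the copy targets explicit, which matters here because a Zero without the bulk-time zw state produces the lease error shown above.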

We are using Dgraph v21.03.2, but I think prior versions are affected as well.

You do need to carry the same Zero state (the zw directory) over from the bulk loader run to the live cluster.

Alternatively, once the new Zero is running, you can bump up the UID lease by calling the /assign endpoint on Zero:

curl "localhost:6080/assign?what=uids&num=195656434"
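The num value in the command above is the UID from the error message. A small helper sketch that pulls that number out of the error text and builds the same /assign URL is shown below; the function names are hypothetical, and localhost:6080 is assumed as Zero's HTTP address, as in the command above. The UID in one error is only the one that tripped that particular mutation, so it is not guaranteed to be the highest UID the bulk loader assigned.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: extract the UID from the lease error and build the /assign URL.
set -euo pipefail

uid_from_error() {
  # Print the number inside "Uid: [...]"; prints nothing if the pattern is absent.
  printf '%s\n' "$1" | sed -n 's/.*Uid: \[\([0-9]*\)\].*/\1/p'
}

assign_url() {
  # $1 = number of UIDs to lease, $2 = Zero HTTP address (default assumed: localhost:6080).
  printf 'http://%s/assign?what=uids&num=%s\n' "${2:-localhost:6080}" "$1"
}

# Usage (calls Zero, so it is left commented out here):
# ERR='rpc error: code = Unknown desc = Uid: [195656434] cannot be greater than lease: [0]'
# curl "$(assign_url "$(uid_from_error "$ERR")")"
```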

Thanks @dmai! I would suggest updating the documentation I linked to above. Anyone who, like us, runs the bulk load on one cluster and copies the data to another cluster for production will run into this issue, and the documentation does not make it clear that the zw directory needs to be copied as well.

Your workaround of calling the /assign endpoint is good to know. How do you determine the number to use for the UID lease?