CPU spike with heavy writes (1 zero, 3 alphas, replicas set to 3)

One of our alpha nodes spikes hard when being written to (400%+ CPU usage according to docker stats). Once the writes stop, the CPU hovers between 10-20%. Each alpha server is given 16GB of LRU. I have a few questions.

  1. What is the best strategy for a heavy write situation? We currently have one zero and 3 alphas with the replication factor set to 3. Should we shard our data? Have multiple zeros? Which parts of Dgraph help with each type of situation (reads, writes, availability, etc.)?

  2. If we shard, how does a request made to one alpha find data when the predicate lives on a different alpha?

  3. Are there Badger options that help with writes? With reads? Etc.

We have over 100 million edges, just going off the logs from dgraph bulk. Many of our writes go to an indexed predicate; is there a way to do this in bulk or more quickly?

If hardware wasn’t an issue, what would be the ideal setup for dgraph?

No need.

Try 6 Alphas

Not sure what you mean.

Alphas talk to each other and Zero manages them.
Check your Zero's state at http://localhost:6080/state.
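For example:

# 6080 is Zero's default HTTP port; adjust if you have mapped it differently in Docker.
curl -s localhost:6080/state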

What really helps is good storage and RAM. Try NVMe, though.

I don't get it: are you talking about the Bulk Loader or the Live Loader? If you are using the Live Loader and your server already has the schema you want, just do not pass the schema again on the new load.
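Something roughly like this (flag names as I recall them from the v1.0.x live loader; double-check with dgraph live --help; the addresses are placeholders):

# Schema already lives on the server, so only the RDF is passed; no -s/--schema flag.
dgraph live -r newdata.rdf.gz -d alpha1:9080 -z zero:5080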

Thanks for the reply!

What really helps is good storage and RAM. Try NVMe, though.

We run the 3 alphas on one machine with 128GB of RAM and an SSD with 2.5TB of space on DigitalOcean. That machine has 24 CPU cores available to it. We are using Docker and Docker Compose right now to manage everything. We can't host the machine ourselves, but I'd assume the likes of DigitalOcean are probably using the best hardware for the job.

Now, while I understand we would be better off with each alpha on a separate machine, only one alpha had a spike in CPU and the others were hovering around 10%, so I don't believe the machine itself was under any sort of crazy stress.

Which part of dgraph helps each type of situation

What I meant to ask is: are there ways to tune Dgraph to be more suited to writes vs. reads? If so, which options (Badger options, disabling expand edges, etc.) affect which operations (reads, writes) positively or negatively?
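For example, I am guessing at knobs along these lines (I am not sure these are the right flags, so correct me if not):

# Hypothetical v1.0.x tuning sketch; check dgraph server --help for the flags in your release
# (the subcommand is dgraph server in 1.0.x, dgraph alpha in later versions).
# --lru_mb:        LRU cache given to each alpha (we currently use 16GB)
# --badger.tables: how Badger loads SSTables (ram | mmap | disk)
# --badger.vlog:   how Badger loads the value log (mmap | disk)
# --expand_edges:  set to false to skip _predicate_ upkeep if expand(_all_) is not needed
dgraph server --lru_mb=16384 --badger.tables=mmap --badger.vlog=mmap --expand_edges=false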

Try 6 Alphas

Are you saying to try 6 alphas with the replication factor set to 3, or simply to use 6 alphas and have every alpha be a replica? If you mean 6 alphas and 3 replicas, I have a question about setting that up. Currently, we set up our graph using the bulk loader, copy the p directory into each alpha, and start the servers up. This works well and is fairly simple.

However, when I set reduce_shards=2 and map_shards=4 and copy the 0/p directory from the bulk loader into the first 3 alphas and the 1/p directory into the next three, nothing works. The cluster seems to delete predicates and move things around, and my queries return no data.
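Roughly, my procedure looks like this (file names, output paths, and alpha names are placeholders; the -r/-s flag names are as I recall them from the v1.0.x bulk loader):

# Bulk load into two reduce shards; produces out/0/p and out/1/p.
dgraph bulk -r data.rdf.gz -s schema.txt --map_shards=4 --reduce_shards=2 --zero=zero:5080

# Copy shard 0 to the three alphas of the first group...
for a in alpha1 alpha2 alpha3; do cp -r out/0/p "$a/p"; done
# ...and shard 1 to the three alphas of the second group.
for a in alpha4 alpha5 alpha6; do cp -r out/1/p "$a/p"; done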

To quote the docs (Get started with Dgraph):

Reduce shards: Before running the bulk load, you need to decide how many Alpha groups will be running when the cluster starts. The number of Alpha groups will be the same number of reduce shards you set with the --reduce_shards flag. For example, if your cluster will run 3 Alphas with 3 replicas per group, then there is 1 group and --reduce_shards should be set to 1. If your cluster will run 6 Alphas with 3 replicas per group, then there are 2 groups and --reduce_shards should be set to 2. Because --reduce_shards was set to 2, there are two sets of p directories: one in the ./out/0 directory and another in the ./out/1 directory. Once the output is created, they can be copied to all the servers that will run Dgraph Alphas. Each Dgraph Alpha must have its own copy of the group's p directory output. Each replica of the first group should have its own copy of ./out/0/p, each replica of the second group should have its own copy of ./out/1/p, and so on.

That paragraph does not really explain much, at least for me. I understand that the groups will eventually be auto-balanced, but I've had issues with that before since it takes a while for all the data to copy over from one alpha to another.

I don't get it: are you talking about the Bulk Loader or the Live Loader? If you are using the Live Loader and your server already has the schema you want, just do not pass the schema again on the new load.

We use the bulk loader with a rather large (1GB+) RDF file for the initial setup. When we upgrade to new Dgraph versions that require an export and import (due to compatibility changes like 1.0.10), we just use the bulk loader with an RDF we generate via Postgres instead of exporting the data out of Dgraph and back in.

All I meant is that it seems like we have a lot of data (more than the movie data set) and I was not sure if having that much data represented a problem for writes or if write performance is supposed to remain constant no matter the size of the graph.

UPDATE:

I get the following error with 6 alphas, 3 replicas.

Schema not defined for predicate: id.

However, the state of my zero clearly shows it there.

{
	counter: "1051",
	groups: {
		1: {
			members: {
				1: {
					id: "1",
					groupId: 1,
					addr: "cluster0-node2:7082",
					leader: true,
					lastUpdate: "1541605474"
				},
				2: {
					id: "2",
					groupId: 1,
					addr: "cluster0-node0:7080"
				},
				3: {
					id: "3",
					groupId: 1,
					addr: "cluster0-node1:7081"
				}
			},
			tablets: {
				_predicate_: {
					groupId: 1,
					predicate: "_predicate_",
					space: "31"
				},
				id: {
					groupId: 1,
					predicate: "id"
				},
				matchOrder: {
					groupId: 1,
					predicate: "matchOrder",
					space: "270219578"
				}
			}
		},
		2: {
			members: {
				4: {
					id: "4",
					groupId: 2,
					addr: "cluster0-node4:7084",
					leader: true,
					lastUpdate: "1541605478"
				},
				5: {
					id: "5",
					groupId: 2,
					addr: "cluster0-node3:7083"
				},
				6: {
					id: "6",
					groupId: 2,
					addr: "cluster0-node4:7085"
				}
			},
			tablets: {
				email: {
					groupId: 2,
					predicate: "email",
					space: "249666916"
				},
				rawLocation: {
					groupId: 2,
					predicate: "rawLocation",
					space: "134326287"
				},
				title: {
					groupId: 2,
					predicate: "title",
					space: "298198008"
				}
			}
		}
	},
	zeros: {
		1: {
			id: "1",
			addr: "zero:5080",
			leader: true
		}
	},
	maxLeaseId: "10250000",
	maxTxnTs: "10000",
	maxRaftId: "6",
	cid: "d227d288-0d05-4b3d-a773-4aa3d81fee63"
}

Before I get in too deep on your reply, please take a read of this conversation: Zero rebalance_interval server write error predicate_move. I'll be back.

I read the thread. I don't believe it answers any of my questions. I understand what reduce_shards is doing and what it needs to be set to. I switched map_shards to 2 instead of 4 (even though the docs say increasing that number helps with balance and it only needs to be at least the same as reduce_shards). Doing so seems to have fixed my previous issue of Schema not defined for predicate: id; however, my query returns no data now.

EDIT: The zero decided to randomly delete one of my predicates. It has been moving others around and rebalancing, but it completely removed this one. It no longer shows up in the state of my zero.

zero_1 | I1107 16:31:52.912810 1 raft.go:272] Removing tablet for attr: [knows], gid: [2]

Not really sure what is going on here.

Some of my data does exist and I can make simple queries to return those nodes, but it seems that some predicates are missing and other predicates were removed (potentially during rebalance) but never copied over to the other group.

I can confirm I am missing at least two predicates that are normally there if I only use replica alphas and no sharding.

TL;DR:
6 alphas, 1 zero, replication factor of 3. Using dgraph bulk with reduce_shards=2 and map_shards=2. Predicates are either missing or get removed after waiting a few minutes. Some data is there and can be queried (simple node queries) while other data is just gone.

Current zero state

{
	counter: "1112",
	groups: {
		1: {
			members: {
				1: {
					id: "1",
					groupId: 1,
					addr: "cluster0-node0:7080",
					lastUpdate: "1541609896"
				},
				2: {
					id: "2",
					groupId: 1,
					addr: "cluster0-node1:7081",
					leader: true,
					lastUpdate: "1541610197"
				},
				3: {
					id: "3",
					groupId: 1,
					addr: "cluster0-node2:7082",
					lastUpdate: "1541610047"
				}
			},
			tablets: {
				_job: {
					groupId: 1,
					predicate: "_job",
					space: "10674657"
				},
				_predicate_: {
					groupId: 1,
					predicate: "_predicate_",
					space: "31"
				},
				id: {
					groupId: 1,
					predicate: "id",
					space: "302188347"
				},
				knows: {
					groupId: 1,
					predicate: "knows"
				},
				name: {
					groupId: 1,
					predicate: "name",
					space: "306195323"
				},
				owner: {
					groupId: 1,
					predicate: "owner",
					space: "658882"
				},
				potentialEmployers: {
					groupId: 1,
					predicate: "potentialEmployers",
					space: "199597363"
				}
			}
		},
		2: {
			members: {
				4: {
					id: "4",
					groupId: 2,
					addr: "cluster0-node3:7083",
					leader: true,
					lastUpdate: "1541607415"
				},
				5: {
					id: "5",
					groupId: 2,
					addr: "cluster0-node4:7084"
				},
				6: {
					id: "6",
					groupId: 2,
					addr: "cluster0-node4:7085"
				}
			},
			tablets: {
				_profile: {
					groupId: 2,
					predicate: "_profile",
					space: "199282347"
				},
				companyName: {
					groupId: 2,
					predicate: "companyName",
					space: "242811127"
				},
				group: {
					groupId: 2,
					predicate: "group",
					space: "228944626"
				},
				seniority: {
					groupId: 2,
					predicate: "seniority",
					space: "233003923"
				}
			}
		}
	},
	zeros: {
		1: {
			id: "1",
			addr: "zero:5080",
			leader: true
		}
	},
	maxLeaseId: "10250000",
	maxTxnTs: "10000",
	cid: "880d5d6c-188d-4daa-b03f-68dd01a210b7"
}

Notice that knows is back because I tried to query it, but the space field is gone (no data).

There is information there about moving the p directory and about reduce_shards. I did not intend to answer all your questions with just one thread; it was just to move the process along.

Size isn't the problem; we have a client with terabytes of data. Looking at your specifications, I do not see a bottleneck there. It is certainly something else that is not obvious.

Dgraph does not delete anything on its own; it does balancing. Deletion is only performed as the result of a mutation.

I'll come back after checking all the context and running some tests.

@emhagman what version of Dgraph are you running here?

As Michel said, every Alpha knows which other member is able to serve a particular predicate, so any Alpha can return the complete response to the client. This is also explained step by step in this doc: Get started with Dgraph
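For example, a query POSTed to any single Alpha's HTTP endpoint returns results even for predicates served by the other group (8080 is the default Alpha HTTP port and knows is just an example predicate; adjust for your port mappings):

curl -s -XPOST localhost:8080/query -d '{ q(func: has(knows)) { uid knows { uid } } }'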

I am using v1.0.10 for this setup. We are currently using v1.0.9 in production with 3 alphas and a replication factor of 3, so no sharding going on. Everything has been mostly OK except for the CPU spike during writes that I mentioned, which then causes us to have a bad time with reads from that server. I have v1.0.10 mostly working on our staging environment, except for the predicates either not showing up or being removed.

Our alphas (in production) generally hover around 16GB of RAM usage each, and the CPU is normally around 10-20% without heavy activity. We can increase that even more (we have access to a server with 196GB of RAM), but I don't know if that will truly help.

I am very concerned about the predicate being removed for apparently no reason. Any idea what could have happened? I don’t feel like I can set sharding up until I know why that happened.

Can you confirm that the predicate is present on disk? You can use the dgraph debug tool to do so. It requires that the p directory isn’t being used, so you should turn off the Alpha first, and then take a copy of the p directory from that Alpha. One from an Alpha of each group should be enough to confirm.

Here’s how you can use the debug tool:

To display all the (Badger) keys related to the knows predicate:

dgraph debug --postings=/path/to/p --pred=knows

You should see output prefixed with {d}, which signifies a data key.

I no longer have access to that folder so I can’t look it up.

I am now using v1.0.9 again, with 6 alphas and 3 replicas. reduce_shards=2, and I seem to have to set map_shards=2 because any value that does not match reduce_shards causes predicates to not show up.

Something I am noticing is that the servers I copy the 0/p and 1/p directories into do not end up in the proper groups in my zero state. Is there a way to tell an alpha which group it should be in?

That is to say, I know the server hostnames for my 0 directory and my 1 directory, and their groups do not match what they should be based on the directory I copied. I assume the 0 directory is supposed to be group 1 and the 1 directory group 2 in my zero's state?

EDIT: If I use map_shards=2, the group issue above goes away, but I can no longer see my knows predicate, even right after the servers start. I can confirm the triples are in the RDF file, and the format of this file has not changed for a long time. I am currently trying my old setup with 3 alphas and a replica of 3 to make sure nothing has changed.

EDIT2: I can confirm that, with the same schema and the same RDF file, when I use 3 alphas and a replica of 3, all my predicates (including knows) show up in the zero state. The graph functions normally, like it does in production for us. Something odd definitely seems to be happening with the bulk load / p directory copy part of my setup.

Zero state with no sharding

{
	counter: "1073",
	groups: {
		1: {
			members: {
				1: {
					id: "1",
					groupId: 1,
					addr: "cluster0-node0:7080",
					leader: true,
					lastUpdate: "1541697667"
				},
				2: {
					id: "2",
					groupId: 1,
					addr: "cluster0-node1:7081"
				},
				3: {
					id: "3",
					groupId: 1,
					addr: "cluster0-node2:7082"
				}
			},
			tablets: {
				_group: {
					groupId: 1,
					predicate: "_group",
					space: "122316009"
				},
				_job: {
					groupId: 1,
					predicate: "_job",
					space: "21349307"
				},
				_predicate_: {
					groupId: 1,
					predicate: "_predicate_",
					space: "1335217660"
				},
				_profile: {
					groupId: 1,
					predicate: "_profile",
					space: "244901195"
				},
				companyName: {
					groupId: 1,
					predicate: "companyName",
					space: "287111706"
				},
				email: {
					groupId: 1,
					predicate: "email",
					space: "295282365"
				},
				group: {
					groupId: 1,
					predicate: "group",
					space: "288492351"
				},
				id: {
					groupId: 1,
					predicate: "id",
					space: "567462587"
				},
				knows: {
					groupId: 1,
					predicate: "knows",
					space: "138238589"
				},
				location: {
					groupId: 1,
					predicate: "location",
					space: "216849805"
				},
				match: {
					groupId: 1,
					predicate: "match",
					space: "473800090"
				},
				matchOrder: {
					groupId: 1,
					predicate: "matchOrder",
					space: "387868186"
				},
				name: {
					groupId: 1,
					predicate: "name",
					space: "333813680"
				},
				name @en: {
					groupId: 1,
					predicate: "name@en",
					space: "149466290"
				},
				owner: {
					groupId: 1,
					predicate: "owner",
					space: "1115615"
				},
				potentialEmployers: {
					groupId: 1,
					predicate: "potentialEmployers",
					space: "299644820"
				},
				rawLocation: {
					groupId: 1,
					predicate: "rawLocation",
					space: "156663261"
				},
				seniority: {
					groupId: 1,
					predicate: "seniority",
					space: "280526570"
				},
				title: {
					groupId: 1,
					predicate: "title",
					space: "329736971"
				},
				title @en: {
					groupId: 1,
					predicate: "title@en",
					space: "28921895"
				}
			}
		}
	},
	zeros: {
		1: {
			id: "1",
			addr: "zero:5080",
			leader: true
		}
	},
	maxLeaseId: "10250000",
	maxTxnTs: "10000",
	cid: "1a82f854-9b05-4d10-9d2e-4dbeee665146"
}

You should first start all the alphas of the first group, and then start all the alphas of the second group. That would let Zero assign each of them to the correct group during initialization.

That’s right.

Okay, so bring up group 1, wait, then bring up group 2. I can do that; I'll report back. Do you have any ideas about map_shards? Do you think I am having issues because of the group assignment? I understand it isn't a huge deal, but my groups will start out very uneven if I have to set map_shards to the same number as reduce_shards.

Also, maybe we should add the group startup timing to the deployment docs?

You must copy the correct p directory shard to the correct Alpha group; otherwise you will see issues.

Increasing --map_shards helps distribute the data more evenly across the groups. Though once the cluster is active, Zero will decide to rebalance the predicates anyway.
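Putting that together with the startup ordering, the sequence could look roughly like this with docker-compose (the service names are placeholders for your compose file):

# Seed the p directories before starting anything: out/0/p into the group-1 alphas,
# out/1/p into the group-2 alphas, as sketched earlier in the thread.
docker-compose up -d zero
docker-compose up -d alpha1 alpha2 alpha3    # group 1, seeded from out/0/p
# Wait until Zero's /state shows all three members registered in group 1, then:
docker-compose up -d alpha4 alpha5 alpha6    # group 2, seeded from out/1/p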

I tried what you said. The groups are assigned properly, but my knows predicate is nowhere to be found in either group. I used reduce_shards=2 and map_shards=2. I'm not sure if it is a coincidence that the only predicate missing is the one with type uid.

I ran the debug statement on my alphas, one from each group. Here is the output:

# dgraph debug --postings=p --pred=knows
Opening DB: p
badger2018/11/09 21:34:09 INFO: Replaying file id: 20 at offset: 114731620
badger2018/11/09 21:34:09 INFO: Replay took: 15.377µs
prefix = 00000000  00 00 05 6b 6e 6f 77 73                           |...knows|

Found 0 keys
# dgraph debug --postings=p --pred=knows
Opening DB: p
badger2018/11/09 21:35:33 INFO: Replaying file id: 23 at offset: 24325443
badger2018/11/09 21:35:33 INFO: Replay took: 16.69µs
prefix = 00000000  00 00 05 6b 6e 6f 77 73                           |...knows|

Found 0 keys

Same exact schema and RDF file as the 3 alpha, 3 replica setup that works and has all predicates. I waited 60 seconds after starting the first group before starting up the second group. This is version v1.0.10.

Just as a test, I used the debug command on another predicate and it worked.

Opening DB: p
badger2018/11/09 21:37:40 INFO: Replaying file id: 23 at offset: 24325443
badger2018/11/09 21:37:40 INFO: Replay took: 17.578µs
prefix = 00000000  00 00 04 6e 61 6d 65                              |...name|

{d} attr: name uid: 2  item: [102, b1000] key: 0000046e616d65000000000000000002
{d} attr: name uid: 3  item: [98, b1000] key: 0000046e616d65000000000000000003
{d} attr: name uid: 4  item: [94, b1000] key: 0000046e616d65000000000000000004
{d} attr: name uid: 5  item: [102, b1000] key: 0000046e616d65000000000000000005
{d} attr: name uid: 6  item: [102, b1000] key: 0000046e616d65000000000000000006
{d} attr: name uid: 8  item: [99, b1000] key: 0000046e616d65000000000000000008
{d} attr: name uid: 9  item: [92, b1000] key: 0000046e616d65000000000000000009
{d} attr: name uid: 10  item: [102, b1000] key: 0000046e616d6500000000000000000a
{d} attr: name uid: 11  item: [111, b1000] key: 0000046e616d6500000000000000000b
{d} attr: name uid: 12  item: [113, b1000] key: 0000046e616d6500000000000000000c
..... More

Are you running with the same Zero(s) used for the bulk load?

Yes, same zero. I am using Docker and docker-compose, so I delete the containers and volumes each time I try a new setup.

@dmai Any thoughts?

Can you try with v1.0.11-rc2? I’m unable to see the issue on my end when running a two-group cluster with bulk loaded data.

I am going to go ahead and try the new version. We've been running on v1.0.9 mostly stable for a few months. Am I correct that a race condition was introduced in v1.0.11 and that I should either wait or try a v1.0.12 RC?

I think we should cut a new RC for v1.0.12. You could just jump straight to that.