Support Multi-Tenancy

Moved from GitHub dgraph/2693

Posted by prabirshrestha:

What you wanted to do

Create multiple databases/schemas on the same server, as in traditional DBs (Postgres/MySQL/SQL Server). Useful for personal VMs and Raspberry Pi, where I can replace a traditional DB with dgraph and run it on only one server.

What you actually did

Add prefix.

Why that wasn’t great, with examples

Cumbersome, because now I need to create variables for the prefix. If I accidentally drop the schema, I drop everything.

Any external references to support your case

CREATE DATABASE foodatabase or CREATE SCHEMA fooschema.
Would love to have something like localhost:8080/alter/foodatabase. If foodatabase is not provided it would default to existing behavior.
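
A minimal sketch of the fallback rule proposed above, assuming a path of the form /alter/<database>. This is not an existing Dgraph endpoint; the function name DatabaseFromAlterPath and the placeholder name "default" are hypothetical and only illustrate the request.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// defaultDatabase is a hypothetical placeholder for "existing behavior".
const defaultDatabase = "default"

// DatabaseFromAlterPath extracts the database name from a request path of the
// proposed form /alter/<database>. A bare /alter falls back to the default.
func DatabaseFromAlterPath(path string) (string, error) {
	if !strings.HasPrefix(path, "/alter") {
		return "", errors.New("not an alter path")
	}
	rest := strings.TrimPrefix(path, "/alter")
	if rest == "" || rest == "/" {
		return defaultDatabase, nil
	}
	if rest[0] != '/' {
		// Something like /alterfoo is a different route entirely.
		return "", errors.New("not an alter path")
	}
	name := strings.TrimSuffix(rest[1:], "/")
	if name == "" || strings.Contains(name, "/") {
		return "", errors.New("invalid database name")
	}
	return name, nil
}

func main() {
	for _, p := range []string{"/alter", "/alter/foodatabase"} {
		db, err := DatabaseFromAlterPath(p)
		fmt.Println(p, db, err)
	}
}
```

With this rule, existing clients that call /alter keep working unchanged, and only clients that opt in need to name a database.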

manishrjain commented :

The advantage of a graph DB is that multiple data sources can be combined together into one, and queried across. Given that benefit, having the division of a database is at best a low priority feature request.

brianbroderick commented :

There are many reasons to have multiple databases; for example, it’s typical to have a dev, test, and prod environment with their respective databases. This makes it so the test database can be recreated before each test run. Right now, the only way to accomplish this is to either have multiple instances of Dgraph running, or to add a prefix to all predicates.

If I only want to clear test environment predicates, adding a prefix complicates queries like this: &api.Operation{DropAll: true}, which I run before any tests. It would also complicate Go structs when determining the right predicate values in JSON.
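
To make the cost of the prefix workaround concrete, here is a minimal sketch, assuming predicates are namespaced as <environment>.<name>. The Namespace type and its methods are hypothetical helpers, not part of any Dgraph client, and the dot separator is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Namespace is a hypothetical per-environment predicate prefix, e.g. "test".
type Namespace string

// Pred returns the prefixed predicate name used in queries and JSON tags.
func (n Namespace) Pred(name string) string {
	return string(n) + "." + name
}

// Owned filters a list of predicate names down to those in this namespace,
// sorted for stable output. Clearing one environment means dropping each of
// these individually instead of issuing a single DropAll.
func (n Namespace) Owned(all []string) []string {
	prefix := string(n) + "."
	var out []string
	for _, p := range all {
		if strings.HasPrefix(p, prefix) {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	test := Namespace("test")
	all := []string{"dev.name", "test.name", "test.email", "prod.name"}
	fmt.Println(test.Pred("name"))
	fmt.Println(test.Owned(all))
}
```

Every query, mutation, and struct tag has to go through something like Pred, and a test reset has to walk the result of Owned one predicate at a time, which is the complication described above.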

It’s also typical to work on many microservices at a time, but these microservices should not have any chance of their data colliding; they should be completely isolated. It doesn’t seem realistic to have 10+ instances of Dgraph running at the same time on a laptop (5+ microservices, each with a dev and test environment).

Lastly, multiple-database support will give people transitioning from the RDBMS world an easier time making the switch.

liqweed commented :

A major reason multiple databases matter so much to us is multitenancy. We intend to implement multitenancy with a database schema per account. That makes data isolation a lot easier (which includes removing an account, for example) without provisioning and maintaining thousands of database servers. Implementing multitenancy in dgraph as a schema per account makes an even stronger case in my mind, since dgraph lacks any mid-level namespace to segment the data (like tables in SQL/Cassandra or collections in MongoDB). That leaves very few options for segmenting the data effectively.

I agree with @brianbroderick - we implemented a microservices approach and one of the services is currently using dgraph. We avoid using dgraph for any other service since it would involve automation complexity that we find hard to justify. Had it been any easier to work with more than a single schema, dgraph usage would certainly proliferate in our case.

romshark commented :

Support for multiple isolated databases on a single server would significantly speed up my API tests, which currently take over 226 seconds since they all need to be executed serially! If I had the guarantee of isolated databases, I could set up a database instance for each test individually, allowing API tests to run in parallel. It’d theoretically be possible to go from 226s to under 10s (which is huge!)

I could do it myself with graph namespacing, but that’d be very error-prone since there are no isolation guarantees: one test could start mutating another test’s database, leading to a big mess.

I hope this feature will be implemented soon!

aoighost commented :

This would also be extremely useful for my use case, which uses dgraph to support multiple workspaces. Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

romshark commented :

@aoighost

Also, the ability to reference nodes and create relationships across databases/workspaces would be useful as well.

It would be the opposite of useful. If you have relationships across “databases” you have a single database. The multi-database feature is about isolation such that one database is physically isolated from another yet maintained by the same process for convenience.

aoighost commented :

@romshark good point

campoy commented :

Whoa, this is a popular request!

OK, we’ll be working on this and seeing whether it can be part of our next release, v1.2, which is expected at the end of September.

AgentZombie commented :

This might be a good place for the label field in n-quads.

campoy commented :

Hi there @AgentZombie,

Could you explain what you mean by “the label field in n-quads”?

AgentZombie commented :

Sorry. I was speaking specifically about the graph label field in RDF n-quads. dgraph reads RDF n-quads as a superset of n-triples but doesn’t use the fourth value to specify a named graph.

From RDF 1.1 N-Quads

The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple and an optional blank node label or IRI labeling what graph in a dataset the triple belongs to, all are separated by whitespace and terminated by ‘.’ after each statement.

This was referenced here, #1143, and probably other places.
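
As an illustration of the optional fourth term quoted above, here is a minimal sketch that pulls the graph label out of a single N-Quads statement. It is a simplified tokenizer, not a conforming N-Quads parser and not how dgraph parses RDF; the function names and the sample IRIs are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitTerms splits one statement into whitespace-separated terms.
// Simplified: handles IRIs (<...>), blank nodes (_:x), and quoted literals
// with an optional ^^<type> or @lang suffix and backslash escapes.
func splitTerms(line string) ([]string, error) {
	s := strings.TrimSpace(line)
	if !strings.HasSuffix(s, ".") {
		return nil, errors.New("statement must end with '.'")
	}
	s = strings.TrimSpace(strings.TrimSuffix(s, "."))
	var terms []string
	for len(s) > 0 {
		var end int
		switch s[0] {
		case '<':
			end = strings.IndexByte(s, '>')
			if end < 0 {
				return nil, errors.New("unterminated IRI")
			}
			end++ // include '>'
		case '"':
			end = 1
			for end < len(s) && s[end] != '"' {
				if s[end] == '\\' {
					end++ // skip the escaped character
				}
				end++
			}
			if end >= len(s) {
				return nil, errors.New("unterminated literal")
			}
			end++ // include the closing quote
			for end < len(s) && s[end] != ' ' && s[end] != '\t' {
				end++ // datatype or language suffix
			}
		default:
			end = strings.IndexAny(s, " \t")
			if end < 0 {
				end = len(s)
			}
		}
		terms = append(terms, s[:end])
		s = strings.TrimLeft(s[end:], " \t")
	}
	return terms, nil
}

// GraphLabel returns the fourth term of a statement, if present.
func GraphLabel(line string) (label string, ok bool, err error) {
	terms, err := splitTerms(line)
	if err != nil {
		return "", false, err
	}
	switch len(terms) {
	case 3:
		return "", false, nil // plain triple, no graph label
	case 4:
		return terms[3], true, nil
	default:
		return "", false, fmt.Errorf("expected 3 or 4 terms, got %d", len(terms))
	}
}

func main() {
	fmt.Println(GraphLabel(`<s> <name> "Alice Smith" <tenantA> .`))
	fmt.Println(GraphLabel(`_:a <name> "Bob" .`))
}
```

Under this reading, a statement without a label would belong to the default graph, and the label could serve as the tenant or database name.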

campoy commented :

I wasn’t aware of that, and it does make sense to consider it as part of our support for multi-tenancy.

Thanks, @AgentZombie

Willem520 commented :

I think it will be a great feature

aoighost commented :

I’d appreciate it if this were not an enterprise feature. I’m trying to build an app that uses dgraph on the backend as a graph store, and multi-tenancy would make it a lot easier to build without having to spin up a new Docker instance for each workspace. Enterprise-only would kill the use of that feature for me. I should also note that multi-tenancy would make it a lot easier for app developers to use dgraph in general, as it would make it easier to have multiple apps on one PC using dgraph as a backend.

seanlaff commented :

We’re attempting to use one big dgraph instance to serve many discrete customers and need data isolation. This would be a great feature for us.

In the meantime, we’ve been experimenting with putting a tenant predicate on every entity. However, I worry that this might have some performance drawbacks, since every query we send to dgraph has to be a tenant = x query followed by a @filter for what the end user actually wanted.

As I understand dgraph’s query planning, this means all my queries can only be as fast as that original tenant = x lookup (which hits millions of documents), right? (Since I always need to start at the tenant predicate and then filter.)
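
The query shape described above can be sketched as follows. This is an assumption-laden illustration, not the poster's actual code: the predicate names tenant and status and the helper TenantQuery are hypothetical, and eq at the root requires an index on the tenant predicate. The tenant value is passed as a query variable rather than spliced into the text.

```go
package main

import "fmt"

// TenantQuery builds a query whose root function selects every node of one
// tenant and whose @filter then applies what the end user asked for.
// userFilter is trusted filter text supplied by the application, not by the
// end user. The returned map uses "$name" keys, the form expected when a
// query is sent together with variables.
func TenantQuery(tenant, userFilter string) (string, map[string]string) {
	q := fmt.Sprintf(
		"query q($tenant: string) { q(func: eq(tenant, $tenant)) @filter(%s) { uid } }",
		userFilter,
	)
	return q, map[string]string{"$tenant": tenant}
}

func main() {
	q, vars := TenantQuery("acme", `eq(status, "open")`)
	fmt.Println(q)
	fmt.Println(vars)
}
```

Every query built this way starts from the full set of the tenant's nodes before the filter narrows it, which is the cost being asked about.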

hubyhuby commented :

We are evaluating / prototyping further the use of dgraph.
For us, the minimal setup to evaluate Dgraph requires 3 environments, as per the regular dev pipeline: Development / Staging / Production.

Furthermore, the GDPR constrains my company in Europe to partition the data.
We need security by design at the organization level.
A database without a multi-tenancy feature is a no-go for most companies in Europe.

Even an academic project in Europe cannot use the community edition if it handles any kind of personal data (as you cannot tell precisely or easily who can access the data).

As a core DB feature, I believe it should be part of the community edition.

cosmotek commented :

Another upvote for this feature :smiley:

ChStark commented :

Another upvote for this feature. It could also help Dgraph Labs launch their own Dgraph as a Service more easily.

dvaldivia commented :

This feature was marked for the 1.2 milestone, but I don’t see it in the changelog of the 1.2 release. Did this feature not make it?