Schema query filtering

Moved from GitHub dgraph/3518

Posted by barum:

Experience Report

Note: Feature requests are judged based on user experience and modeled on Go Experience Reports. These reports should focus on the problems: they should not focus on and need not propose solutions.

What you wanted to do

There should be a way to query the schema and have it return specific graphs and their predicates.

For example, if I create the following:

W.X.Y.Z.Name: string .
W.X.Y.Z.phone: string .
W.X.Y.Z.email: string .

W.X.Y.A.Name: string .
W.X.Y.A.Address1: string .
W.X.Y.A.zipcode: string .

I should be able to say "give me all the predicates associated with …":

{
  getPred(schema: has(W.X.Y.A.*)) {
    uid
    expand(_all_) {
      expand(_all_)
    }
  }
}

or

{
  getPred(schema: has(W.X.Y.Z.*)) {
    uid
    expand(_all_) {
      expand(_all_)
    }
  }
}

Here, if we think of an enterprise:

W = group
X = application
Y = database
Z = table
predicate = column

In an org, that's how things can be separated and identified without having to install 10000 instances of Dgraph.

danielmai commented :

This is a feature request to run a schema query with filtered results. This type of functionality is possible today with the filter as a post-processing step from the query response.

In the current version of Dgraph you can already accomplish this functionality by filtering the query response of a schema {} query from your app. e.g.,

  1. Run schema {} query
  2. Filter the results by prefix.

From the command line we can filter the query response using jq and get only the predicates that match a particular prefix. Similarly, this can be done with our programming language clients.

$ curl http://localhost:8080/query -d '
schema {
  predicate
}' | jq '.data.schema | .[] | select(.predicate| startswith("W.X.Y.A"))'
{
  "predicate": "W.X.Y.A.Address1"
}
{
  "predicate": "W.X.Y.A.Name"
}
{
  "predicate": "W.X.Y.A.zipcode"
}
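
The same post-processing step can be done from application code. Below is a minimal Python sketch, assuming the schema {} response has already been fetched and parsed into a dict with the same shape the jq filter above reads (.data.schema as a list of objects with a "predicate" key). The function name filter_predicates and the sample data are illustrative only and are not part of any Dgraph client.

```python
import json


def filter_predicates(schema_response, prefix):
    """Return the sorted predicate names in a schema {} response that start with prefix."""
    entries = schema_response.get("data", {}).get("schema", [])
    return sorted(
        entry["predicate"]
        for entry in entries
        if entry.get("predicate", "").startswith(prefix)
    )


# Hypothetical response body, built from the predicates used earlier in this thread.
sample_response = json.loads("""
{
  "data": {
    "schema": [
      {"predicate": "W.X.Y.Z.Name"},
      {"predicate": "W.X.Y.Z.phone"},
      {"predicate": "W.X.Y.Z.email"},
      {"predicate": "W.X.Y.A.Name"},
      {"predicate": "W.X.Y.A.Address1"},
      {"predicate": "W.X.Y.A.zipcode"}
    ]
  }
}
""")

print(filter_predicates(sample_response, "W.X.Y.A."))
```

Passing the prefix with a trailing dot ("W.X.Y.A.") avoids also matching a sibling namespace such as "W.X.Y.AB".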

iluminae commented :

Hi guys,
After a call with @danielmai I was pointed towards bumping this ticket for my feature request.

To explain why we would need this feature, see our issue where we needed to namespace our predicates to make them far less common to achieve the required ingestion speed. So a graph that would have 50 unique predicates with 10 million nodes becomes a graph with 1 million unique predicates. That, paired with our solution for multi-tenancy, which entails namespacing everything further, means we could have 1M unique predicates per customer.

So, to achieve any inspection of available predicates, we would need to ask for all of them, then filter them to match some sort of simple regex, as @barum showed above. This would be impractical in a live-typeahead scenario. We would need to store our schema in a different database to make these types of queries, or in a different place in Dgraph. Either way, keeping them synced is a challenge.

Or, given a query of something like {q(schema: match(/blah.*/,8)){name}} the server could inspect the types for us, and show us just what the user would need.
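
For reference, the client-side alternative mentioned above (keeping a separate copy of the predicate names and querying that) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the PredicateIndex class, its complete method, and the limit parameter are hypothetical names, the index is built from an already-fetched list of predicate names, and nothing here addresses keeping that copy in sync with the server.

```python
import bisect


class PredicateIndex:
    """In-memory, sorted copy of predicate names for prefix (typeahead) lookups."""

    def __init__(self, predicates):
        # Deduplicate and sort once so every lookup can use binary search.
        self._names = sorted(set(predicates))

    def complete(self, prefix, limit=10):
        """Return up to `limit` predicate names that start with `prefix`."""
        # All names sharing a prefix are contiguous in sorted order,
        # starting at the first name >= prefix.
        i = bisect.bisect_left(self._names, prefix)
        matches = []
        while (
            i < len(self._names)
            and len(matches) < limit
            and self._names[i].startswith(prefix)
        ):
            matches.append(self._names[i])
            i += 1
        return matches


# Hypothetical data, reusing the predicate names from earlier in this thread.
index = PredicateIndex([
    "W.X.Y.Z.Name", "W.X.Y.Z.phone", "W.X.Y.Z.email",
    "W.X.Y.A.Name", "W.X.Y.A.Address1", "W.X.Y.A.zipcode",
])
print(index.complete("W.X.Y.Z.", limit=2))
```

Each lookup costs a binary search plus at most `limit` comparisons, so it stays cheap with a very large number of predicates, but the full list still has to be fetched and refreshed from the schema {} query, which is the syncing problem described above.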

manishrjain commented :

1M unique predicates is a lot. I’m not sure we have optimized the system for that many predicates (it should work, of course, but there might be bottlenecks in places).

Instead, a better thing to do here would be for us to just make the mutations concurrent even for the same predicate. The data ingestion rate should not be the decision making factor for how many predicates you need. It should be the data model – and data models should be as simple as they can.

iluminae commented :

@manishrjain - Exactly right! Obviously it would not be nearly as bad that I can only get every predicate in the system, if there are only 50 in the system. Certainly it would be nice not to highly federate the predicates when I do not want to, but we will need some form of data-isolation and concurrent mutations per predicate to make that happen.

That being said, I still think more powerful schema queries would be a good feature to get in. 🙂

lgalatin commented :

Relates to "Run mutations concurrently per predicate in Ludicrous mode" (dgraph-io/dgraph, Issue #5403).