Pagination with `offset` does not scale with var blocks

Moved from GitHub dgraph/5808

Posted by EnricoMi:

What version of Dgraph are you using?

20.03.3

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

Unrelated

Steps to reproduce the issue (command/config used to run Dgraph).

I have a predicate that is set on 40 million uids, with at most one triple per uid. When I read the first 1,000 uids that have this predicate with:

{
  pred as var(func: has(<http://www.w3.org/2000/01/rdf-schema#label>))

  result (func: uid(pred), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}

the query takes minutes. When I read them as:

{
  result (func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}

the query takes milliseconds.

My impression from the timings is that `pred as var` evaluates the entire result list (hence the consistently high query time, which depends on the size of the result set), and `result (func: uid(…))` then picks only the first 1000 results. Either `result (func: uid(…))` should push its limit down into the evaluation of `pred as var`, or `pred as var` should be evaluated in a way that lets `result (func: uid(…))` seek and iterate.
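To illustrate what I suspect is happening, here is a toy Python model. It is not Dgraph's implementation: `has_predicate`, `page_via_var_block` and `page_with_pushdown` are hypothetical names, and the counter only records how many uids each strategy has to visit to produce one page.

```python
from itertools import islice


def has_predicate(total, counter):
    """Toy stand-in for has(<predicate>): yields matching uids in order."""
    for uid in range(1, total + 1):
        counter["visited"] += 1  # count every uid the scan touches
        yield uid


def page_via_var_block(total, first, offset, counter):
    # Eager: the var block collects every matching uid,
    # and only afterwards is the page cut out of the full list.
    pred = list(has_predicate(total, counter))
    return pred[offset:offset + first]


def page_with_pushdown(total, first, offset, counter):
    # Lazy: the limit is pushed into the scan, which stops
    # as soon as offset + first uids have been seen.
    return list(islice(has_predicate(total, counter), offset, offset + first))


eager_counter = {"visited": 0}
lazy_counter = {"visited": 0}
eager_page = page_via_var_block(100_000, 1000, 0, eager_counter)
lazy_page = page_with_pushdown(100_000, 1000, 0, lazy_counter)
```

Both functions return the same page, but in this model the eager variant visits all 100,000 uids while the lazy one stops after `offset + first`. That matches the pattern I see in the timings, assuming the var block really is materialized in full.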

If I modify the first query to:

{
  pred as var(func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000)

  result (func: uid(pred), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}

I get my result in milliseconds.

With `pred as var(func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: {limit + offset})`, this query scales with the offset in the same way that `result (func: uid(…))` does (see #5807). For more details, see this forum comment.
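A minimal sketch of how a client could build this workaround query for a given page. The helper name `paged_query` and its parameters are my own and not part of any Dgraph client; it only formats the query text and does not talk to a server.

```python
def paged_query(predicate, limit, offset):
    """Build the workaround query: cap the var block at limit + offset uids."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    # The var block only needs to cover the uids up to the end of the page.
    var_first = limit + offset
    return (
        "{\n"
        f"  pred as var(func: has(<{predicate}>), first: {var_first})\n"
        "\n"
        f"  result (func: uid(pred), first: {limit}, offset: {offset}) {{\n"
        "    uid\n"
        f"    <{predicate}>\n"
        "  }\n"
        "}"
    )


query = paged_query("http://www.w3.org/2000/01/rdf-schema#label", 1000, 2000)
```

For `limit=1000` and `offset=2000`, the var block is capped at `first: 3000`, so the work still grows with the offset but no longer with the total number of uids that have the predicate.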

The reason I am using `pred as var` here is that I want to read multiple predicates this way (see this forum comment).

Expected behaviour and actual result.

Both queries should be equally fast.