Standard Deviation

Since I didn’t find any dataset samples, I decided to create one based on this wikihow. How to Calculate Standard Deviation: 12 Steps (with Pictures) See the dataset at the end of this comment.

Side comment

If you have time, read this comment Bug with multi-level (nested bocks) in Upsert Block · Issue #4779 · dgraph-io/dgraph · GitHub it is the same issue, but in a different situation.

Also, Custom Block - Discussion would help and maybe Feature: Add foreach() function. · Issue #5335 · dgraph-io/dgraph · GitHub. Cuz I think it would be much harder to fix this changing the design of the query system.

My query

{
  var(func: eq(dgraph.type, "Track")){
    a as count(uid)
    c as count(length)
  }
  f(){
    n: n as sum(val(a))
    sum : S as sum(val(c))
    mean : mean as math(S / n)
    #result : math(c - mean) #ignore this
    x : math(pow(n - mean, 2))
    variance: v as math(S/n) #is it right? this is the same as mean
    stdev: math(sqrt(v))
  }
}

Results

{
  "data": {
    "f": [
      {
        "n": 6
      },
      {
        "sum": 48
      },
      {
        "mean": 8
      },
      {
        "x": 4
      },
      {
        "variance": 8
      },
      {
        "stdev": 2.828427
      }
    ]
  }
}

About the Results

I managed to find N and Sum to find the mean. That is, the results “n”, “sum” and “mean” are correct. However, I found problems to continue. BTW, “x”, “variance” and “stdev” are wrong.

To continue this calculation, I would need to apply the “mean” against the variable “a” (which is ‘n’) in a subtraction for each value (as you can see on the wikihow - in part 2 “Finding the Variance In Your Sample” ). But it is not possible to do this in an empty block. It was not designed for this. So the idea would be to create a new block with repeating the var block query. And apply the subtraction.

e.g: (this won’t work)

{
  T as var(func: eq(dgraph.type, "Track")){
    a as count(uid)
    c as count(length)
  }
  f(){
    n: n as sum(val(a))
    sum : S as sum(val(c))
    mean : mean as math(S / n)
  }
  q(func: uid(T)){ #That was the idea
    c2 as count(length)
    test : math(c2 - mean) #but mean var is empty.
  }
}

But there is another problem. The empty/aggregation block will not pass the value of the variables according to the “map” of previous queries. That is, this is not possible to do.

e.g:

{
  T as var(func: eq(dgraph.type, "Track")){
    a as count(uid)
    c as count(length)
  }
  f(){
    n: n as sum(val(a))
    sum : S as sum(val(c))
    mean : math(S / n)
  }
  q(func: uid(T)){
    c2 as count(length)
    test : math(c2 - 8) # "8" Thats the value that should come from f() block
  }
}

This example bellow is the desirable result

{
  T as var(func: eq(dgraph.type, "Track")){
    a as count(uid)
    c as count(length)
  }
  f(){
    n: n as sum(val(a))
    sum : S as sum(val(c))
    mean : math(S / n)
  }
  q(func: uid(T)){
    c2 as count(length)
    fv : fv as  math(c2 - 8) # "8" Thats the value that should come from f() block
    SquareAll : SA as  math(pow(fv, 2))
  }
  FINAL(){
    SquareAll_sum : FA as sum(val(SA))
    VARIANCE : VA as math(FA / (n - 1))
    Standard_Deviation : math(sqrt(VA))
  }
}

Desireble Result

{
  "data": {
    "f": [
      {
        "n": 6
      },
      {
        "sum": 48
      },
      {
        "mean": 8
      }
    ],
    "q": [
      {
        "count(length)": 8,
        "fv": 0,
        "SquareAll": 0
      },
      {
        "count(length)": 10,
        "fv": 2,
        "SquareAll": 4
      },
      {
        "count(length)": 8,
        "fv": 0,
        "SquareAll": 0
      },
      {
        "count(length)": 4,
        "fv": -4,
        "SquareAll": 16
      },
      {
        "count(length)": 8,
        "fv": 0,
        "SquareAll": 0
      },
      {
        "count(length)": 10,
        "fv": 2,
        "SquareAll": 4
      }
    ],
    "FINAL": [
      {
        "SquareAll_sum": 24
      },
      {
        "VARIANCE": 4.8
      },
      {
        "Standard_Deviation": 2.19089
      }
    ]
  }
}

As you can see, the end result is perfect if we get the value right.

Dataset

{
   "set": [
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      },
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      },
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      },
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      },
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      },
      {
         "dgraph.type": "Track",
         "length": [
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            },
            {
               "dgraph.type": "any"
            }
         ]
      }
   ]
}