SQL Server Query to find CHI-SQUARE Values (Not Working)

大憨熊 submitted on 2019-11-30 21:16:44


I am trying to compute the chi-square statistic with the following SQL Server query on my sample data:

 SELECT sessionnumber, sessioncount, timespent, expected, dev,
        dev * dev / expected AS chi_square
 FROM (SELECT clusters.sessionnumber, clusters.sessioncount, clusters.timespent,
              (dim1.cnt * dim2.cnt * dim3.cnt) / (dimall.cnt * dimall.cnt) AS expected,
              clusters.cnt - (dim1.cnt * dim2.cnt * dim3.cnt) / (dimall.cnt * dimall.cnt) AS dev
       FROM clusters
       JOIN (SELECT sessionnumber, SUM(cnt) AS cnt
             FROM clusters
             GROUP BY sessionnumber) dim1
         ON clusters.sessionnumber = dim1.sessionnumber
       JOIN (SELECT sessioncount, SUM(cnt) AS cnt
             FROM clusters
             GROUP BY sessioncount) dim2
         ON clusters.sessioncount = dim2.sessioncount
       JOIN (SELECT timespent, SUM(cnt) AS cnt
             FROM clusters
             GROUP BY timespent) dim3
         ON clusters.timespent = dim3.timespent
       CROSS JOIN (SELECT SUM(cnt) AS cnt FROM clusters) dimall) a

My table has this sort of sample data:

sessionnumber   sessioncount    timespent       cnt
1                  17               28          NULL
2                  22               8           NULL
3                  1                1           NULL
4                  1                1           NULL
5                  8               111          NULL
6                  8                65          NULL
7                  11               5           NULL
8                  1                1           NULL
9                  62               64          NULL
10                 6                42          NULL

The query runs without errors, but the output is wrong, or rather empty: every computed column is NULL. The output looks like this:

sessionnumber   sessioncount    timespent       expected    dev     chi_square
1               17              28              NULL        NULL    NULL
2               22              8               NULL        NULL    NULL
3               1               1               NULL        NULL    NULL
4               1               1               NULL        NULL    NULL
5               8               111             NULL        NULL    NULL
6               8               65              NULL        NULL    NULL
7               11              5               NULL        NULL    NULL
8               1               1               NULL        NULL    NULL
9               62              64              NULL        NULL    NULL
10              6               42              NULL        NULL    NULL

How can I fix this? I have tried everything I can think of. Thanks in advance for telling me what I'm doing wrong!


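In your sample data, cnt is NULL in every row, so the results are also NULL: SUM over nothing but NULL values returns NULL, and any arithmetic involving NULL yields NULL. You can replace these NULL values with a default value (1, for example; I don't know the context) using ISNULL, like this: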

SELECT sessionnumber, SUM(ISNULL(cnt, 1)) as cnt FROM clusters GROUP BY sessionnumber
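
To see the effect in isolation, here is a minimal sketch that loads the first three rows of the sample data and compares the two aggregates. It makes some assumptions that are not in the question: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and it uses COALESCE(cnt, 1) in place of ISNULL(cnt, 1), because SQLite has no two-argument ISNULL (COALESCE is also available in SQL Server). The default of 1 is only a placeholder, as above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server,
# only so that this sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clusters ("
    "sessionnumber INT, sessioncount INT, timespent INT, cnt INT)"
)

# First three rows of the sample data from the question; cnt is NULL everywhere.
rows = [(1, 17, 28, None), (2, 22, 8, None), (3, 1, 1, None)]
conn.executemany("INSERT INTO clusters VALUES (?, ?, ?, ?)", rows)

# SUM over only-NULL values is NULL, so everything derived from it is NULL too.
raw_totals = conn.execute(
    "SELECT sessionnumber, SUM(cnt) FROM clusters "
    "GROUP BY sessionnumber ORDER BY sessionnumber"
).fetchall()

# COALESCE(cnt, 1) plays the role of ISNULL(cnt, 1) from the answer.
fixed_totals = conn.execute(
    "SELECT sessionnumber, SUM(COALESCE(cnt, 1)) FROM clusters "
    "GROUP BY sessionnumber ORDER BY sessionnumber"
).fetchall()

print(raw_totals)    # [(1, None), (2, None), (3, None)]
print(fixed_totals)  # [(1, 1), (2, 1), (3, 1)]
```

The same replacement would have to be applied wherever cnt is read in the original query (in each dim subquery, in dimall, and in clusters.cnt), otherwise one remaining NULL still turns the whole expression into NULL.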

