问题
I want to create a statistical process control chart for each Name in this dataframe and extract the rows that are out of control for each individual Name.
Below is the dataframe:
DATE <- as.Date(c('2016-06-18', '2016-06-19', '2016-06-20',
'2016-06-21', '2016-06-22', '2016-06-23',
'2016-06-24', '2016-06-25', '2016-06-26',
'2016-06-27', '2016-06-28', '2016-06-29',
'2016-06-30', '2016-06-18', '2016-06-19',
'2016-06-20', '2016-06-21', '2016-06-22',
'2016-06-23', '2016-06-24', '2016-06-25',
'2016-06-26', '2016-06-27', '2016-06-28',
'2016-06-29', '2016-06-30', '2016-06-18',
'2016-06-19', '2016-06-20', '2016-06-21',
'2016-06-22', '2016-06-23', '2016-06-24',
'2016-06-25', '2016-06-26', '2016-06-27',
'2016-06-28', '2016-06-29', '2016-06-30'))
Name <- c('A', 'A', 'A', 'A', 'A', 'A','A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B','B', 'B', 'B', 'B', 'B', 'B', 'B',
'C', 'C', 'C', 'C', 'C', 'C','C', 'C', 'C', 'C', 'C', 'C', 'C')
Revenue <- c(0.08, 0.03, 0.09, 2, 0.09, 0.29, 0.56, 0.23, 0.12, 0.76, 0.23, 0.45,0.32,
0.10, 0.14, 0.80, 0.3, 0.12, 0.75, 0.20, 0.09, 0.22, 0.11, 4, 0.30, 0.45,
0.19, 0.23, 0.56, 0.77, 0.9, 9, 0.38, 0.11, 0.98, 0.87, 0.09, 0.20, 0.65)
df<-data.frame(DATE, Name, Revenue)
dput(df)
structure(list(DATE = structure(c(16970, 16971, 16972, 16973,
16974, 16975, 16976, 16977, 16978, 16979, 16980, 16981, 16982,
16970, 16971, 16972, 16973, 16974, 16975, 16976, 16977, 16978,
16979, 16980, 16981, 16982, 16970, 16971, 16972, 16973, 16974,
16975, 16976, 16977, 16978, 16979, 16980, 16981, 16982), class = "Date"),
Name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), Revenue = c(0.08, 0.03, 0.09,
2, 0.09, 0.29, 0.56, 0.23, 0.12, 0.76, 0.23, 0.45, 0.32,
0.1, 0.14, 0.8, 0.3, 0.12, 0.75, 0.2, 0.09, 0.22, 0.11, 4,
0.3, 0.45, 0.19, 0.23, 0.56, 0.77, 0.9, 9, 0.38, 0.11, 0.98,
0.87, 0.09, 0.2, 0.65)), .Names = c("DATE", "Name", "Revenue"),
row.names = c(NA, -39L), class = "data.frame")
df
> df
DATE Name Revenue
1 2016-06-18 A 0.08
2 2016-06-19 A 0.03
3 2016-06-20 A 0.09
4 2016-06-21 A 2.00
5 2016-06-22 A 0.09
6 2016-06-23 A 0.29
7 2016-06-24 A 0.56
8 2016-06-25 A 0.23
9 2016-06-26 A 0.12
10 2016-06-27 A 0.76
11 2016-06-28 A 0.23
12 2016-06-29 A 0.45
13 2016-06-30 A 0.32
14 2016-06-18 B 0.10
15 2016-06-19 B 0.14
16 2016-06-20 B 0.80
17 2016-06-21 B 0.30
18 2016-06-22 B 0.12
19 2016-06-23 B 0.75
20 2016-06-24 B 0.20
21 2016-06-25 B 0.09
22 2016-06-26 B 0.22
23 2016-06-27 B 0.11
24 2016-06-28 B 4.00
25 2016-06-29 B 0.30
26 2016-06-30 B 0.45
27 2016-06-18 C 0.19
28 2016-06-19 C 0.23
29 2016-06-20 C 0.56
30 2016-06-21 C 0.77
31 2016-06-22 C 0.90
32 2016-06-23 C 9.00
33 2016-06-24 C 0.38
34 2016-06-25 C 0.11
35 2016-06-26 C 0.98
36 2016-06-27 C 0.87
37 2016-06-28 C 0.09
38 2016-06-29 C 0.20
39 2016-06-30 C 0.65
I want to write a script that will create a control chart using the qcc
package for each Name and then callout the rows that are beyond the limits for the control chart of each name.
So I know how to make a normal individual control chart for each name separately which would be this. Name: "A" for example:
library(dplyr)
library(qcc)
target <- c("A")
testing<-filter(df, Name %in% target)
library(qcc)
my.xmr.x <- qcc(testing$Revenue, type = "xbar.one", plot=TRUE)
beyond.limits(my.xmr.x)
Result:
Result is one data point beyond the limits being just for Name A:
6
My goal is to write a script that will extract the rows for all names(not just the row number like 6).
So here is what I want the output to look like in the end:
Out_of_Control_Rows_By_Name
DATE Name Revenue
6 2016-06-21 A 2.00
11 2016-06-28 B 4.00
6 2016-06-23 C 9.00
Any help would be great, thanks!
回答1:
require(pacman) # you don't have to use this; it's my package manager
p_load(dplyr,qcc)
tmp <- df[0,]
for(i in unique(df$Name)){
testing<-filter(df, Name == i)
my.xmr.x <- qcc(testing$Revenue, type = "xbar.one", plot=TRUE)
beyond.limits(my.xmr.x)
df2 <- df[df$Name == i,]
tmp <- rbind(tmp,df2[beyond.limits(my.xmr.x),])
}
DATE Name Revenue 4 2016-06-21 A 2 24 2016-06-28 B 4 32 2016-06-23 C 9
来源:https://stackoverflow.com/questions/38661660/spc-control-charts-by-group-in-r