Interpretation of the output of R function bs() (B-spline basis matrix)

旧巷少年郎 2021-02-03 10:52

I often use B-splines for regression. Up to now I've never needed to understand the output of bs in detail: I would just choose the model I was interested in, and

2 Answers
  • 2021-02-03 11:18

    The matrix b

    #              1         2         3
    # [1,] 0.0000000 0.0000000 0.0000000    
    # [2,] 0.8270677 0.0000000 0.0000000    
    # [3,] 0.8198433 0.1801567 0.0000000    
    # [4,] 0.0000000 0.7286085 0.2713915    
    # [5,] 0.0000000 0.0000000 1.0000000  
    

    is actually just the matrix of the values of the three basis functions at each point of x, which should have been obvious to me, since it is exactly the same interpretation as for a polynomial linear model. As a matter of fact, since the boundary knots are

    bknots <- attr(b,"Boundary.knots")
    # [1]  0.0 77.4
    

    and the internal knots are

    iknots <- attr(b,"knots")
    # 33.33333% 66.66667% 
    #  13.30000  38.83333 
    

    then the three basis functions, as shown here, are:

    knots <- c(bknots[1],iknots,bknots[2])
    y1 <- c(0,1,0,0)
    y2 <- c(0,0,1,0)
    y3 <- c(0,0,0,1)
    par(mfrow = c(1, 3))
    plot(knots, y1, type = "l", main = "basis 1: b1")
    plot(knots, y2, type = "l", main = "basis 2: b2")
    plot(knots, y3, type = "l", main = "basis 3: b3")
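
    The hat shapes drawn above can also be checked against bs() itself. This is only a sketch: the original bs() call is not shown in this thread, so `b <- bs(x, degree = 1, df = 3)` is an assumption (it is consistent with the knots and the matrix quoted here), and x is taken from the values quoted further down in this answer. The names `at_knots`, `grid` and `B` are made up for illustration.

```r
library(splines)

# x is quoted later in the answer; the bs() call itself is an assumption
x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, degree = 1, df = 3)

# Boundary knot, the two internal knots, boundary knot
knots <- unname(c(attr(b, "Boundary.knots")[1],
                  attr(b, "knots"),
                  attr(b, "Boundary.knots")[2]))

# Evaluate the same basis at the four knots:
# each column should be 1 at exactly one knot and 0 at the others
at_knots <- predict(b, knots)
print(round(matrix(as.numeric(at_knots), ncol = 3), 7))

# Evaluate on a fine grid and draw the three basis functions in one panel
grid <- seq(knots[1], knots[4], length.out = 200)
B <- predict(b, grid)
matplot(grid, B, type = "l", lty = 1, xlab = "x", ylab = "basis value")
abline(v = knots, lty = 3)  # dotted lines mark the knots
```

    If the assumption about the call holds, the rows printed for the four knots should match y1, y2 and y3 read column by column.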
    

    Now, consider b[,1]

    #              1
    # [1,] 0.0000000
    # [2,] 0.8270677
    # [3,] 0.8198433
    # [4,] 0.0000000
    # [5,] 0.0000000
    

    These must be the values of b1 at x <- c(0.0, 11.0, 17.9, 49.3, 77.4). Indeed, b1 is 0 at knots[1] = 0 and 1 at knots[2] = 13.3, so by linear interpolation its value at x[2] (11.0) is 11/13.3 = 0.8270677, as expected. Similarly, since b1 falls back to 0 at knots[3] = 38.83333, its value at x[3] (17.9) is (38.83333-17.9)/(38.83333-13.3) = 0.8198433. Since x[4], x[5] > knots[3] = 38.83333, b1 is 0 there. The other two columns can be interpreted in the same way.
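
    The same reasoning can be applied to all three columns at once by interpolating linearly between the knots. The sketch below assumes the matrix was produced by `bs(x, degree = 1, df = 3)` (the call is not shown in this thread); `Y` and `manual` are illustrative names, and approx() is base R's linear interpolation.

```r
library(splines)

# Data quoted in the answer; the bs() call is an assumption
x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, degree = 1, df = 3)

knots <- unname(c(attr(b, "Boundary.knots")[1],
                  attr(b, "knots"),
                  attr(b, "Boundary.knots")[2]))

# Heights of the three hat functions at the four knots (one column per basis function)
Y <- cbind(c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1))

# Linear interpolation between the knots, evaluated at the data points
manual <- apply(Y, 2, function(y) approx(knots, y, xout = x)$y)

print(round(manual, 7))
# Compare with the values computed by bs(), ignoring its extra attributes
print(all.equal(manual, matrix(as.numeric(b), ncol = 3)))
```

    Under that assumption, `manual` should reproduce the matrix b shown at the top of this answer.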

  • 2021-02-03 11:18

    Just a small correction to the excellent answer by @DeltaIV above (it looks like I cannot comment).

    So in b1, when he calculated b1(x[3]), it should be (38.83333-17.9)/(38.83333-13.3)=0.8198433 by linear interpolation. Everything else is perfect.

    Note b1 should look like this

    b_1(t) = \frac{t}{13.3} \, I(0 \le t < 13.3) + \frac{38.83333 - t}{38.83333 - 13.3} \, I(13.3 \le t < 38.83333)
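
    As a quick check, the formula can be written as an R function and compared with the first column of the basis matrix. This is a sketch under the assumption that the matrix came from `bs(x, degree = 1, df = 3)`, which is not shown in this thread; the knots are read from the object rather than typed in, to avoid rounding 38.83333.

```r
library(splines)

# Data quoted in the first answer; the bs() call is an assumption
x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, degree = 1, df = 3)

k <- unname(attr(b, "knots"))  # internal knots, about 13.3 and 38.83333
k1 <- k[1]
k2 <- k[2]

# Piecewise-linear b1: rises on [0, k1), falls on [k1, k2), zero elsewhere.
# The logical conditions play the role of the indicator functions I(.)
b1 <- function(t) {
  (t / k1) * (0 <= t & t < k1) +
    ((k2 - t) / (k2 - k1)) * (k1 <= t & t < k2)
}

print(round(b1(x), 7))
print(all.equal(b1(x), as.numeric(b[, 1])))
```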
