Format model display in texreg or stargazer (R) as scientific notation

感动是毒 2021-02-08 18:59

I just ran a statistical model and I want to display its results as a table using stargazer. However, the large numbers are displayed in full.

         


        
3 answers
  • 2021-02-08 19:36

    The problem is not that these packages cannot display scientific notation. The problem is rather that your independent variables are on an extremely small scale, which produces very large coefficients. You should rescale them before fitting the model by multiplying the values by some constant. For example, if you measure people's height in kilometers, you may want to rescale it to meters or centimeters. This makes the table much easier to read than displaying the results in scientific notation.

    Consider the following example:

    a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
    b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023, 
        0.00022, 0.00023, 0.00022)
    model.1 <- lm(a ~ b)
    

    Next, create your table with texreg:

    library("texreg")
    screenreg(model.1)
    

    This yields the following table:

    =========================
                 Model 1     
    -------------------------
    (Intercept)     -2.27 *  
                    (0.94)   
    b            32168.58 ***
                 (4147.00)   
    -------------------------
    R^2              0.88    
    Adj. R^2         0.87    
    Num. obs.       10       
    =========================
    *** p < 0.001, ** p < 0.01, * p < 0.05
    

    So the coefficients are pretty large. Let's try the same thing with stargazer:

    library("stargazer")
    stargazer(model.1, type = "text")
    

    The resulting table:

    ===============================================
                            Dependent variable:    
                        ---------------------------
                                     a             
    -----------------------------------------------
    b                          32,168.580***       
                                (4,146.999)        
    
    Constant                     -2.270**          
                                  (0.944)          
    
    -----------------------------------------------
    Observations                    10             
    R2                             0.883           
    Adjusted R2                    0.868           
    Residual Std. Error       0.212 (df = 8)       
    F Statistic            60.172*** (df = 1; 8)   
    ===============================================
    Note:               *p<0.1; **p<0.05; ***p<0.01
    

    Same problem: large coefficients. Now rescale your original variable b and recompute the model:

    b <- b * 10000
    model.2 <- lm(a ~ b)
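    If you would rather not overwrite b, the same rescaling can be written inside the formula with I(), which makes `*` ordinary arithmetic instead of formula syntax. This is a sketch on the example data above; the name model.2b is only for illustration. The slope should match model.2, but it is labelled I(b * 10000), so you may want to relabel it in the table (for example with stargazer's covariate.labels or texreg's custom.coef.names argument).

```r
# Example data from above (b on its original, very small scale)
a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023,
       0.00022, 0.00023, 0.00022)

# Rescale inside the formula; I() protects the arithmetic from formula parsing
model.2b <- lm(a ~ I(b * 10000))

# Slope is now on the same scale as model.2 (about 3.217)
coef(model.2b)
```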
    

    Try it again with texreg:

    screenreg(model.2)
    
    ======================
                 Model 1  
    ----------------------
    (Intercept)  -2.27 *  
                 (0.94)   
    b             3.22 ***
                 (0.41)   
    ----------------------
    R^2           0.88    
    Adj. R^2      0.87    
    Num. obs.    10       
    ======================
    *** p < 0.001, ** p < 0.01, * p < 0.05
    

    And with stargazer:

    stargazer(model.2, type = "text")
    
    ===============================================
                            Dependent variable:    
                        ---------------------------
                                     a             
    -----------------------------------------------
    b                            3.217***          
                                  (0.415)          
    
    Constant                     -2.270**          
                                  (0.944)          
    
    -----------------------------------------------
    Observations                    10             
    R2                             0.883           
    Adjusted R2                    0.868           
    Residual Std. Error       0.212 (df = 8)       
    F Statistic            60.172*** (df = 1; 8)   
    ===============================================
    Note:               *p<0.1; **p<0.05; ***p<0.01
    

    Now the coefficients look nicer and you do not need scientific notation.
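    Rescaling only changes the units of the coefficient: multiplying b by a constant k divides its estimate and standard error by k, while the t statistic, p-value and R^2 stay the same (the R^2 rows in the tables above are identical). A small sketch to check this on the example data; fit_raw, fit_scaled, b_scaled and k are names chosen here for illustration.

```r
a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023,
       0.00022, 0.00023, 0.00022)
k <- 10000                      # rescaling constant used above
b_scaled <- b * k

fit_raw    <- lm(a ~ b)
fit_scaled <- lm(a ~ b_scaled)

# Row of the coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
est_raw    <- coef(summary(fit_raw))["b", ]
est_scaled <- coef(summary(fit_scaled))["b_scaled", ]

# Estimate and standard error shrink by the factor k ...
all.equal(est_raw[["Estimate"]] / k, est_scaled[["Estimate"]])
all.equal(est_raw[["Std. Error"]] / k, est_scaled[["Std. Error"]])

# ... while the t statistic and R^2 are unchanged
all.equal(est_raw[["t value"]], est_scaled[["t value"]])
all.equal(summary(fit_raw)$r.squared, summary(fit_scaled)$r.squared)
```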

  • 2021-02-08 19:40

    To do this, you can write your own function that takes the large numbers and converts them to scientific notation.

    First, load the stargazer package:

    library(stargazer)
    

    Then, create data with large numbers for the example:

    set.seed(1)
    
    C <- data.frame("A" = rnorm(10000, 30000, 10000),
                    "B" = rnorm(10000, 7500, 2500))
    

    Fit the model and store the stargazer results table in an object:

    fit2 <- lm(A ~ B, data = C) 
    
    myResults <- stargazer(fit2, type = "text")
    

    Create a function that takes a stargazer table and converts large numbers into scientific notation. (It is not very flexible, but it can be extended with simple modifications. As written, it only handles numbers from 1,000 to 99,999, and only the first such number on each line.)

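    fixNumbers <- function(stargazer.object){
    
      so <- stargazer.object
      # Rows containing a comma-separated number such as "4,146.999"
      rows <- grep("[0-9],[0-9]", so, perl = TRUE)
      for(row in rows){
    
        # Get number and format into scientific notation.
        # The lazy "^.*?" keeps both leading digits of e.g. "32,168.580";
        # a greedy ".*" would leave only one digit for [0-9]{1,2}.
        number <- as.numeric(sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", so[row], perl = TRUE))
        formatted_num <- sprintf("%.2e", number)
        so[row] <- sub("^(.*?)[0-9]{1,2},[0-9]+\\.?[0-9]*(.*)$", paste0("\\1", formatted_num, "\\2"), so[row], perl = TRUE)
      }
    
      # Print result (one line per element) and return the edited table invisibly
      writeLines(so)
      invisible(so)
    }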
    

    Give the new function (fixNumbers) your stargazer object:

    fixNumbers(myResults)
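    One thing to watch in the number-extraction pattern: with a greedy leading `.*`, the regex engine backtracks only as far as it must, leaving a single digit for `[0-9]{1,2}`, so a five-digit number such as 32,168.580 loses its first digit. A lazy `.*?` anchored at the start keeps both digits. A quick check on a hand-typed row that mimics the stargazer coefficient line (the string is made up for illustration):

```r
row <- "b                          32,168.580***"

# Greedy prefix: ".*" gives back characters only until the rest can match
greedy <- sub(".*([0-9]{1,2}),([0-9]+\\.?[0-9]*).*", "\\1\\2", row, perl = TRUE)

# Lazy prefix: ".*?" stops at the first position where the number can start
lazy <- sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", row, perl = TRUE)

greedy  # "2168.580"
lazy    # "32168.580"
```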
    

    -- Here's all the code in one chunk: --

    library(stargazer)
    
    set.seed(1)
    
    C <- data.frame("A" = rnorm(10000, 30000, 10000),
                    "B" = rnorm(10000, 7500, 2500))
    
    fit2 <- lm(A ~ B, data = C) 
    
    myResults <- stargazer(fit2, type = "text")
    
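    fixNumbers <- function(stargazer.object){
    
      so <- stargazer.object
      # Rows containing a comma-separated number such as "4,146.999"
      rows <- grep("[0-9],[0-9]", so, perl = TRUE)
      for(row in rows){
    
        # Get number and format into scientific notation.
        # The lazy "^.*?" keeps both leading digits of e.g. "32,168.580";
        # a greedy ".*" would leave only one digit for [0-9]{1,2}.
        number <- as.numeric(sub("^.*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*$", "\\1\\2", so[row], perl = TRUE))
        formatted_num <- sprintf("%.2e", number)
        so[row] <- sub("^(.*?)[0-9]{1,2},[0-9]+\\.?[0-9]*(.*)$", paste0("\\1", formatted_num, "\\2"), so[row], perl = TRUE)
      }
    
      # Print result (one line per element) and return the edited table invisibly
      writeLines(so)
      invisible(so)
    }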
    
    fixNumbers(myResults)
    
  • 2021-02-08 19:51

    Following Adam K's idea, but with a somewhat more optimized regex (and making use of vectorisation, which is a good idea in R):

    library(stargazer)
    # df is the data shown at the end of this answer
    fit2 <- lm(CO ~ NO2, data = df)
    test <- stargazer(fit2, type = "text")
    

    It takes two lines of regex. First, find the numbers: here, any run of five or more characters made up of digits, commas and points:

    m <- gregexpr("([0-9\\.,]{5,})", test)
    

    Then apply a transformation function to each match (here: remove the commas, convert to a number, and display it in scientific notation with 2 digits after the decimal point. You could also consider formatC, which offers many more formatting options):

    f = function(x){
      sprintf("%.2e",as.numeric( gsub(",","",x)))
    }
    

    Finally, apply it to the matches with the replacement form of regmatches:

    regmatches(test, m) <- lapply(regmatches(test, m), f)
    test
    
    
     [1] ""                                                           
     [2] "========================================================"   
     [3] "                            Dependent variable:         "   
     [4] "                    ------------------------------------"   
     [5] "                                     CO                 "   
     [6] "--------------------------------------------------------"   
     [7] "NO2                              6.26e+02**              "  
     [8] "                                 (2.41e+02)              "  
     [9] "                                                        "   
    [10] "Constant              1.81e+18***  "                        
    [11] "                       (4.62e+17)    "                      
    [12] "                                                        "   
    [13] "--------------------------------------------------------"   
    [14] "Observations                         10                 "   
    [15] "R2                                 4.58e-01                "
    [16] "Adjusted R2                        3.90e-01                "
    [17] "Residual Std. Error 1.57e+17 (df = 8)"                      
    [18] "F Statistic                 6.76e+00** (df = 1; 8)         "
    [19] "========================================================"   
    [20] "Note:                        *p<0.1; **p<0.05; ***p<0.01"   
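    To see what the two regex steps do without fitting a model, here is a sketch on a few hand-typed lines standing in for stargazer text output (the lines and the helper name to_sci are made up for illustration). The five-character threshold leaves short numbers such as the 0.05 in the significance note alone, but it does catch five-character decimals such as an R2 of 0.458 (compare row [15] above). Because the replacements have a different width, columns can shift, as in the output above. writeLines() prints one line per element, without the index and quotes of the default print.

```r
lines <- c("b                          32,168.580***",
           "                            (4,146.999)",
           "Observations                    10",
           "Note:               *p<0.1; **p<0.05; ***p<0.01")

# Step 1: locate runs of five or more digits, commas or points
m <- gregexpr("[0-9.,]{5,}", lines)

# Step 2: drop the commas, convert to numeric, format in scientific notation
to_sci <- function(x) sprintf("%.2e", as.numeric(gsub(",", "", x)))
regmatches(lines, m) <- lapply(regmatches(lines, m), to_sci)

writeLines(lines)
```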
    

    To print it without the index and quotes, like the original output:

    print(as.data.frame(test),quote = F,row.names = FALSE)
    
    
    
                                                           test
    
        ========================================================
                                    Dependent variable:         
                            ------------------------------------
                                             CO                 
        --------------------------------------------------------
       NO2                              6.26e+02**              
                                        (2.41e+02)              
    
                             Constant              1.81e+18***  
                                                  (4.62e+17)    
    
        --------------------------------------------------------
        Observations                         10                 
     R2                                 4.58e-01                
     Adjusted R2                        3.90e-01                
                           Residual Std. Error 1.57e+17 (df = 8)
     F Statistic                 6.76e+00** (df = 1; 8)         
        ========================================================
        Note:                        *p<0.1; **p<0.05; ***p<0.01
    

    The data:

    df <- read.table(text  = "
    CO NO2 SM
     2.750000e+18 1.985136e+15 0.2187433
     2.980000e+18 2.144211e+15 0.1855678
     2.810000e+18 1.586491e+15 0.1764805
     3.010000e+18 1.755409e+15 0.2307153
     3.370000e+18 2.205888e+15 0.2046671
     3.140000e+18 2.084682e+15 0.1834232
     2.940000e+18 1.824735e+15 0.1837391
     3.200000e+18 2.075785e+15 0.1350665
     3.060000e+18 1.786481e+15 0.1179924
     2.750000e+18 1.645800e+15 0.2037340",header = T)
    