Question
I am fitting a fixed effects model to study the effect of support on the number of injured employees. I have a company-level dataset covering 2012-2020:
Company | Year | Average age | Total salary | Total number of employees | Segment | Industry | Risk index | Support | Total number of injured employees |
---|---|---|---|---|---|---|---|---|---|
company A ID | 2012 | 45 | 5 Million | 55 | S | IT | 1 | 0 | 1 |
company B ID | 2012 | 48 | 40M | 500 | B | Service | 3 | 0 | 20 |
Data clarification:
- Industry, Segment, Risk index and support are set as factors
- support: 0 = did not receive support in 2014, 1 = received support in 2014 (support is set to 0 for all entries with year <= 2014, so this variable will not be dropped from the fixed effects model)
- Dependent variable: total number of injured employees
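The support coding described above can be sketched in base R as follows. This is only an illustration on a toy panel: the data frame `panel`, the helper column `received_2014`, and the assumption that company A received support while company B did not are all hypothetical and not taken from the real dataset.

```r
# Toy panel (hypothetical): two companies observed from 2012 to 2020.
panel <- expand.grid(customer = c("A", "B"), year = 2012:2020,
                     stringsAsFactors = FALSE)

# Assumption for illustration: A received support in 2014, B never did.
panel$received_2014 <- as.integer(panel$customer == "A")

# support is 0 for every row with year <= 2014 and switches to 1 afterwards,
# but only for companies that received the support.
panel$support <- as.integer(panel$received_2014 == 1 & panel$year > 2014)

# Inspect the coding for the supported company.
panel[panel$customer == "A", c("year", "support")]
```

Coded this way, support varies over time within a supported company, which is why it survives the within transformation, whereas a time-constant indicator such as `received_2014` would be absorbed by the customer fixed effects.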
Since the goal is to study the impact of support on the number of injured employees after the support has been received, I have divided the data into 5 subsets:
- support impact for 2016-2017 => data from years 2012, 2013, 2014, 2016, 2017
- support impact for 2017-2018 => data from years 2012, 2013, 2014, 2017, 2018
- support impact for 2018-2019 => data from years 2012, 2013, 2014, 2018, 2019
- support impact for 2019-2020 => data from years 2012, 2013, 2014, 2019, 2020
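A minimal sketch of how the subsets listed above could be built in base R: each one keeps the years 2012-2014 plus one two-year window. The data frame `panel` is a hypothetical stand-in for the real data, and only the four windows listed above are included.

```r
# Toy panel (hypothetical stand-in for the real data).
panel <- expand.grid(customer = c("A", "B"), year = 2012:2020,
                     stringsAsFactors = FALSE)

# Years kept in every subset.
pre_years <- 2012:2014

# The two-year windows listed above.
windows <- list("2016-2017" = 2016:2017,
                "2017-2018" = 2017:2018,
                "2018-2019" = 2018:2019,
                "2019-2020" = 2019:2020)

# One data frame per window: pre-period years plus that window's years.
subsets <- lapply(windows, function(post_years) {
  panel[panel$year %in% c(pre_years, post_years), ]
})

# Check which years ended up in the first subset.
sort(unique(subsets[["2016-2017"]]$year))
```

Each element of `subsets` can then be passed as the `data` argument of the model calls.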
For each subset, I have fitted fixed effects models in two ways:
Fixed effect (within industry, with time effect)
plm(number_of_injured_employees~salary+employee_number+avg_age+segment+industry+index+support, index=c('industry','year'),model='within', effect = "twoways", data=data)
Fixed effect (Within customer, with time effect) for each risk index
plm(number_of_injured_employees~salary+employee_number+ support+avg_age, index=c('Customer','year'),model='within', effect = "twoways", data=data_index1)
plm(number_of_injured_employees~salary+employee_number+ support+avg_age, index=c('Customer','year'),model='within', effect = "twoways", data=data_index2) …
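For reference, a within model with effect = "twoways" indexed by customer and year corresponds to the usual two-way fixed effects specification. The notation below is my own and is only a sketch: $y_{it}$ is the number of injured employees of customer $i$ in year $t$, $\alpha_i$ is the customer effect, $\lambda_t$ is the year effect, and $\delta$ is the coefficient on support.

```latex
\[
y_{it} = \beta_1\,\mathrm{salary}_{it}
       + \beta_2\,\mathrm{employee\_number}_{it}
       + \beta_3\,\mathrm{avg\_age}_{it}
       + \delta\,\mathrm{support}_{it}
       + \alpha_i + \lambda_t + \varepsilon_{it}
\]
```

In the within-industry version, $\alpha_i$ would instead be an industry effect, so $\delta$ is identified from variation within each industry rather than within each customer.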
I would like to ask:
Can I interpret the result from the within-industry model as the general impact of support (since within industry, within segment and within index give the same result), and the result from the within-customer model as the impact of support at the customer level? Or what is the correct way to interpret the models?
Is it OK to divide the data into 5 subsets so that I can investigate the impact for different years after the support was received in 2014?
I got an alternative opinion: the model should include only support as an independent variable, since we are investigating cause and effect, and the dependent variable should be changed to number_of_injured_employees/total_number_of_employees:
Fixed effect (Within customer, without time effect)
plm(number_of_injured_employees/total_number_of_employees~support, index='customer', model='within', data=data)
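The dependent variable proposed in the model above is an injury rate. As a small illustration, the rate for the two example rows in the table (companies A and B in 2012) can be computed as below. The vector names are hypothetical, and two rows say nothing about the full dataset.

```r
# Values copied from the two example rows in the table.
injured   <- c(A = 1, B = 20)
employees <- c(A = 55, B = 500)

# Injured employees per employee.
rate <- injured / employees
round(rate, 3)
```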
I still think that salary, total number of employees, avg_age, etc. should be included in the model, because they are related to the dependent variable and will therefore affect the coefficient of support. I also think the dependent variable should simply be number_of_injured_employees, since I have already included employee_number as an independent variable, and dividing by employee_number could make the DV very strange: a customer with a lot of employees will end up with a very small number.
Is my thinking correct?
Source: https://stackoverflow.com/questions/65528497/what-variables-to-include-in-fixed-effect-model-panel-data