Paper Replication: Petersen (2009)

Quickstart

The goal of this exercise is to replicate some of the results in Petersen (2009) using the R programming language.

# load required packages
library(lfe)

Estimating Standard Errors with a Firm Effect: OLS and Rogers Standard Errors

In Petersen (2009), the author simulates a panel data set and then estimates the slope coefficient and its standard error. Repeating this many times yields both the true standard error (the standard deviation of the coefficient estimates across simulations) and the average estimated standard error. In the first version of the simulation, the author includes a fixed firm effect, but no time effect, in both the independent variable and the residual. Across simulations, the standard deviations of the independent variable and the residual are held constant at one and two, respectively. With a true slope of one, this produces an R2 of 20 percent (1 / (1 + 4)), which is not unusual for empirical finance regressions. What changes across simulations is the fraction of the independent variable's variance that is due to the firm effect, which ranges from zero to seventy-five percent in twenty-five percent increments. The same is done for the residual. This allows us to demonstrate how the magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual. The R function to run the simulation is provided below.

simulate <- 
  function(n.year = 10,   # number of years in the panel
           n.firms = 500, # number of firms in the panel
           n.iter = 5000, # number of iterations to run the simulation
           sd.x = 1,      # standard deviation of the independent variable
           sd.r = 2,      # standard deviation of the error term
           firm_x = 0,    # fraction (0-1) of the independent variable's variance due to the firm effect
           firm_r = 0     # fraction (0-1) of the residual variance due to the firm effect
  ) {
  
  # set the RNG seed for reproducibility
  set.seed(123)
  
  # total number of observations in the panel
  n.obs <- n.year*n.firms
  
  # run the simulations
  # store the coefficient estimate, its OLS standard error and the cluster-robust standard error
  b <- sapply(1:n.iter, function(i){
    
    # replicate firm ID for each year
    firm_id <- rep(1:n.firms, each = n.year)
    
    # standardized regressor (unit variance)
    x <- 
      rnorm(n.obs, mean = 0, sd = sqrt(1-firm_x)) +                     # idiosyncratic component
      rep( rnorm(n.firms, mean = 0, sd = sqrt(firm_x)), each = n.year ) # firm-specific component
    
    # standardized error term (unit variance)
    r <- 
      rnorm(n.obs, mean = 0, sd = sqrt(1-firm_r)) +                     # idiosyncratic component
      rep( rnorm(n.firms, mean = 0, sd = sqrt(firm_r)), each = n.year ) # firm-specific component
    
    # scale regressor by its standard deviation
    x <- sd.x * x
    
    # scale error term by its standard deviation
    r <- sd.r * r
    
    # response variable
    y <- 1*x + r
    
    # standard OLS
    m1 <- felm(y ~ x) 
    
    # OLS cluster by firm
    m2 <- felm(y ~ x | 0 | 0 | firm_id) 
    
    # store and return coefficients
    c1 <- coef(summary(m1))['x', ]
    c2 <- coef(summary(m2))['x', ]
    
    # return
    return( c(c1['Estimate'], c1['Std. Error'], c2['Cluster s.e.']) )
    
  })
  
  # store and return average coefficients
  res <- c(apply(b, 1, mean), 'Sample Std. Error' = sd(b['Estimate',]))
  
  # return
  return(res)
  
}
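The data-generating step inside the function can also be checked in isolation. The following is a minimal sketch in base R, not part of the original replication: it draws the regressor once with a larger, arbitrarily chosen number of firms and confirms that the variance is close to one, that the correlation between two observations of the same firm is close to firm_x, and that the implied R2 is 20 percent. The names x_mat, rho_hat and r2 are illustrative only.

```r
set.seed(42)

n.firms <- 2000   # larger than in the simulation, only to reduce sampling noise
n.year  <- 10
firm_x  <- 0.25   # share of the regressor's variance due to the firm effect

# same construction as in simulate(): idiosyncratic part + firm-specific part
x <- rnorm(n.firms * n.year, mean = 0, sd = sqrt(1 - firm_x)) +
  rep(rnorm(n.firms, mean = 0, sd = sqrt(firm_x)), each = n.year)

# overall variance of the standardized regressor, should be close to 1
var(x)

# observations are stored firm by firm, so filling a matrix column-wise
# gives one column per firm and one row per year
x_mat <- matrix(x, nrow = n.year)

# correlation between year 1 and year 2 of the same firm, should be close to firm_x
rho_hat <- cor(x_mat[1, ], x_mat[2, ])
rho_hat

# implied population R2 with slope 1, sd.x = 1 and sd.r = 2
r2 <- (1^2 * 1^2) / (1^2 * 1^2 + 2^2)
r2
```

Because the two components are independent, their variances add up to one, so multiplying by sd.x or sd.r afterwards sets the standard deviation without changing the firm-effect share.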

We can now easily reproduce Petersen’s results. The following example simulates the model in which 25% of the independent variable’s variance and 50% of the residual variance are due to a firm-specific effect.

simulate(firm_x = 0.25, firm_r = 0.50)

##          Estimate        Std. Error      Cluster s.e. Sample Std. Error 
##        0.99900551        0.02826734        0.04111877        0.04151040
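These numbers can be compared with a back-of-the-envelope calculation. The sketch below assumes the large-sample expression for the firm-effect case as we read it in Petersen (2009): the true variance of the slope equals the usual OLS variance multiplied by 1 + (T - 1) * rho_x * rho_r, where T is the number of years per firm and rho_x and rho_r are the within-firm correlations of the regressor and the residual. The variable names are illustrative and the result is a sanity check, not part of the simulation.

```r
n.year  <- 10
n.firms <- 500
sd.x    <- 1
sd.r    <- 2
rho_x   <- 0.25   # within-firm correlation of the regressor (firm_x)
rho_r   <- 0.50   # within-firm correlation of the residual (firm_r)

# usual OLS standard error when observations are independent
se_ols <- sd.r / (sd.x * sqrt(n.year * n.firms))

# inflation factor implied by the firm effect (assumed formula, see text)
inflation <- sqrt(1 + (n.year - 1) * rho_x * rho_r)

# approximate true standard error of the slope
se_true <- se_ols * inflation

round(c(OLS = se_ols, True = se_true), 4)
##    OLS   True 
## 0.0283 0.0412
```

Under this assumption, the OLS figure matches the average OLS standard error reported by the simulation, while the inflated figure is close to both the sample standard deviation of the estimates and the average clustered standard error.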

Running the simulation for several combinations of the firm-specific share of the independent variable’s variance and of the residual variance produces the following results, which agree with the original paper.

Estimating Standard Errors with a Firm Effect: OLS and Rogers Standard Errors. The magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual.

		Source of Independent Variable Volatility
Source of Residual Volatility	Statistic	0%	25%	50%	75%
0%	Avg(B)	1.000	1.000	1.000	1.000
	Std(B)	0.028	0.029	0.029	0.029
	Avg(SE.OLS)	0.028	0.028	0.028	0.028
	Avg(SE.R)	0.028	0.028	0.028	0.028
25%	Avg(B)	0.999	0.999	0.999	0.999
	Std(B)	0.028	0.036	0.042	0.047
	Avg(SE.OLS)	0.028	0.028	0.028	0.028
	Avg(SE.R)	0.028	0.035	0.041	0.046
50%	Avg(B)	0.999	0.999	0.999	0.999
	Std(B)	0.028	0.042	0.051	0.060
	Avg(SE.OLS)	0.028	0.028	0.028	0.028
	Avg(SE.R)	0.028	0.041	0.051	0.059
75%	Avg(B)	1.000	0.999	0.999	0.999
	Std(B)	0.029	0.047	0.060	0.070
	Avg(SE.OLS)	0.028	0.028	0.028	0.028
	Avg(SE.R)	0.028	0.046	0.059	0.069
† The table contains estimates of the coefficient and standard errors based on 5,000 simulations of a panel data set (500 firms, 10 years per firm). The true slope coefficient is 1, the standard deviation of the independent variable is 1 and the standard deviation of the error term is 2. The fraction of the residual variance that is due to a firm-specific component varies across the rows of the table, from 0% (no firm effect) to 75%. The fraction of the independent variable’s variance that is due to a firm-specific component varies across the columns, over the same range. Each cell contains four entries. The first, Avg(B), is the average slope coefficient estimated by OLS. The second, Std(B), is the standard deviation of this estimate across simulations, which is the true standard error of the estimated coefficient. The third, Avg(SE.OLS), is the average OLS estimated standard error of the coefficient. The fourth, Avg(SE.R), is the average Rogers (clustered) standard error, which accounts for possible clustering at the firm level (i.e. for possible correlation between observations of the same firm in different years).
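To make the clustered standard error less of a black box, the sketch below computes it by hand in base R for a single simulated panel, using the textbook sandwich estimator with residual-weighted regressors summed within each firm. It is an illustration under stated assumptions, not the code behind the table: the finite-sample correction G/(G-1) * (N-1)/(N-K) is one common convention and the scaling used by felm may differ slightly, and the names bread, scores, meat and adj are illustrative.

```r
set.seed(1)

n.firms <- 500
n.year  <- 10
firm_x  <- 0.25
firm_r  <- 0.50
n.obs   <- n.firms * n.year
firm_id <- rep(1:n.firms, each = n.year)

# one panel drawn as in simulate(): sd.x = 1, sd.r = 2, true slope = 1
x <- rnorm(n.obs, sd = sqrt(1 - firm_x)) +
  rep(rnorm(n.firms, sd = sqrt(firm_x)), each = n.year)
r <- 2 * (rnorm(n.obs, sd = sqrt(1 - firm_r)) +
  rep(rnorm(n.firms, sd = sqrt(firm_r)), each = n.year))
y <- 1 * x + r

fit <- lm(y ~ x)
X   <- model.matrix(fit)   # N x K design matrix (intercept and x)
u   <- residuals(fit)

# "bread": (X'X)^{-1}
bread <- solve(crossprod(X))

# "meat": sum over firms of (X_g' u_g)(X_g' u_g)'
scores <- rowsum(X * u, group = firm_id)   # one row of summed scores per firm
meat   <- crossprod(scores)

# finite-sample correction (assumed convention, see text)
G   <- n.firms
N   <- nrow(X)
K   <- ncol(X)
adj <- (G / (G - 1)) * ((N - 1) / (N - K))

V          <- adj * bread %*% meat %*% bread
se_cluster <- sqrt(V["x", "x"])
se_ols     <- coef(summary(fit))["x", "Std. Error"]

c(OLS = se_ols, Cluster = se_cluster)
```

With a firm effect in both the regressor and the residual, the clustered figure should come out noticeably larger than the OLS figure, in line with the 25% column and 50% row of the table.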