The data is finally in “response variable, explanatory variable 1, explanatory variable 2, …, explanatory variable n” format. My Java code is hard to read, but it gets the job done. There were only about 18,000 earnings surprise data points, but my other explanatory variables so far, momentum and volatility, require historical price data, so I had to process 10 years' worth of daily prices for the 500 stocks in the S&P 500.
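For illustration, here is a minimal sketch of how two price-based variables could be computed from a series of daily closes and written out in that row format. It is not the code described above. The class `PriceFeatures`, its method names, the momentum definition (simple return over a lookback window), and the volatility definition (sample standard deviation of daily log returns) are all assumptions, and the Doyle et al. paper may define these variables differently.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper class; names and definitions are assumptions for illustration.
final class PriceFeatures {

    /** Assumed momentum: simple return over the last `lookback` trading days ending at `endIndex`. */
    static double momentum(double[] closes, int endIndex, int lookback) {
        checkWindow(closes, endIndex, lookback);
        return closes[endIndex] / closes[endIndex - lookback] - 1.0;
    }

    /** Assumed volatility: sample standard deviation of the last `lookback` daily log returns. */
    static double volatility(double[] closes, int endIndex, int lookback) {
        checkWindow(closes, endIndex, lookback);
        if (lookback < 2) {
            throw new IllegalArgumentException("need at least 2 returns");
        }
        double[] returns = new double[lookback];
        double sum = 0.0;
        for (int i = 0; i < lookback; i++) {
            int day = endIndex - lookback + 1 + i;
            returns[i] = Math.log(closes[day] / closes[day - 1]);
            sum += returns[i];
        }
        double mean = sum / lookback;
        double squaredDeviations = 0.0;
        for (double r : returns) {
            squaredDeviations += (r - mean) * (r - mean);
        }
        return Math.sqrt(squaredDeviations / (lookback - 1));
    }

    /** Formats one row as "response,explanatory 1,...,explanatory n". */
    static String toCsvRow(double response, double... explanatory) {
        StringJoiner row = new StringJoiner(",");
        row.add(String.format(Locale.ROOT, "%.6f", response));
        for (double x : explanatory) {
            row.add(String.format(Locale.ROOT, "%.6f", x));
        }
        return row.toString();
    }

    // Rejects windows that would read before the start or past the end of the price series.
    private static void checkWindow(double[] closes, int endIndex, int lookback) {
        if (lookback < 1 || endIndex >= closes.length || endIndex - lookback < 0) {
            throw new IllegalArgumentException("not enough price history");
        }
    }
}
```

With these assumed definitions, a row for one earnings announcement would be built by calling `PriceFeatures.toCsvRow(response, surprise, PriceFeatures.momentum(closes, t, n), PriceFeatures.volatility(closes, t, n))`, where `t` is the index of the last trading day before the announcement and `n` is the chosen lookback. A plain comma-separated row like this is easy to load into R later.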
Analysis of the data with R is up next. I wonder how the initial results will compare with those in the Doyle et al. paper, since I only have three explanatory variables and I am only testing on a small subset of the stocks they tested.