My accruals anomaly final presentation

Troy Shu Accrual Presentation

This was my final presentation on the accrual anomaly.

At the very basics, remember from Accounting 101 that earnings = cash flow from operations + accruals (this follows from the indirect method of calculating cash flow from operations). For example, you make a sale, which goes to revenue and earnings, the left-hand side of the equation. For the right-hand side: if that sale was paid in cash, it counts as cash flow from operations (CFO); if it was made on account, it counts as an accounts receivable, an accrual. Essentially, accruals are a measure of earnings quality.
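The identity can be made concrete in a few lines of Python (a toy illustration, not from the presentation; the helper name is my own):

```python
# Rearranging the indirect-method identity:
#   earnings = CFO + accruals  =>  accruals = earnings - CFO
def accruals(earnings: float, cfo: float) -> float:
    """Accrual component of earnings (hypothetical helper)."""
    return earnings - cfo

# A $100 sale made on account: revenue (and earnings) rise by 100,
# but no cash came in, so the whole sale sits in accruals (receivables).
print(accruals(100.0, 0.0))  # 100.0
```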

Intuitively, an investor would want to invest in a company with relatively low accruals, which means the company generates lots of cash to pay expenses, to invest, etc. Countless papers have shown that a portfolio that goes long the stocks with the lowest accruals and shorts the stocks with the highest accruals is profitable, even after adjusting returns for size and value.
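A toy sketch of that hedged decile portfolio in Python (pandas; the tickers, signal values, and returns here are made up purely to illustrate the ranking mechanics):

```python
import pandas as pd

# Rank stocks into deciles by the accrual signal, then go long the
# lowest-accrual decile and short the highest-accrual decile.
df = pd.DataFrame({
    "ticker": [f"S{i}" for i in range(20)],
    "accrual_signal": [i / 100 for i in range(20)],       # fabricated signals
    "next_ret": [0.02 - 0.001 * i for i in range(20)],    # fabricated returns
})
df["decile"] = pd.qcut(df["accrual_signal"], 10, labels=False)

long_ret = df.loc[df["decile"] == 0, "next_ret"].mean()   # lowest accruals
short_ret = df.loc[df["decile"] == 9, "next_ret"].mean()  # highest accruals
hedge_ret = long_ret - short_ret                          # long-short return
```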

Rough outline of presentation:

  • What is an accrual: earnings = CFO + accrual
  • How it’s measured: usually signal = accrual/average assets
  • Three papers: trading a hedged portfolio based on accruals produces roughly 10% size- and value-adjusted annual returns.
  • Improvement: percent accruals (signal=accrual/abs(earnings)) seems to be a more profitable measure of accruals
  • My replication: despite data and time constraints, ordinality of returns by deciles is still present
  • Interesting topics for future research: the death of the accrual anomaly (or what happened to performance post 2007?), scaling quarterly accruals by earnings instead of assets
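The two signal definitions from the outline, sketched in Python (function names are my own):

```python
def accrual_signal(accruals: float, avg_total_assets: float) -> float:
    # Traditional measure: accruals scaled by average total assets.
    return accruals / avg_total_assets

def percent_accruals(accruals: float, earnings: float) -> float:
    # Percent accruals: accruals scaled by the absolute value of earnings,
    # so the sign of the signal still comes from the accruals themselves
    # even when the firm has negative earnings.
    return accruals / abs(earnings)

# A firm with accruals of 50, average assets of 1000, and earnings of -25:
print(accrual_signal(50, 1000))   # 0.05
print(percent_accruals(50, -25))  # 2.0
```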

Upcoming posts

Back at school. Upcoming posts: more on the turn-of-month effect, accruals, and the recent performance of ETFRot. What I’ve learned: Cliff Asness of AQR, and I’m sure many others, are right, “diversification is the only free lunch in finance”. Diversification across asset classes, and across time as well? More to come…

Socially generated financial data the next big thing?



A ton of articles and papers have been released on this topic recently.

Social media seems like the best way to gauge sentiment without actually asking the person. Bloomberg has already included Twitter feeds on its platform; Yahoo Finance has StockTwits feeds integrated on its stock pages. StockTwits hasn’t even released its API yet. With the rise of social media, will we see the rise of socially generated financial data?

All the above algorithms that trade Twitter sentiment are very short term. This could be due to the fleeting nature of Twitter posts: relevant financial tweets are dominated by those from short-term traders with short-term outlooks. Social media gauges short-term sentiment, but it’s hard to tell whether it can gauge what I call longer-term “behavioral biases” such as price momentum (“let’s jump on the bandwagon”) and post-earnings announcement drift (people being slow to realize and price in the better prospects suggested by a positive earnings surprise).

The first of month effect exists?

Many have noticed a predictable “first of month” effect in equities; the hypothesis is that the beginning of the month is when funds buy the shares of last month’s top companies, thus pushing up prices.


Black is SPY, red is DIA, and green is QQQQ (now apparently just QQQ?). I use those three ETFs to represent the stocks in the S&P 500, DJIA, and NASDAQ 100. There does seem to be a strong average daily return on the first of the month across these indices (day 0 is the last day of the previous month, and the return on day 1 is calculated as day1close/day0close − 1). That spike in average returns on the third-to-last day of the month in the QQQ is interesting… The average daily returns were calculated from historical data since inception for each ETF.

Of course, the chart above says relatively little: we’d need to see whether the effect persists across stocks and ETFs, and how strong or weak it is given certain characteristics (e.g. market cap, price momentum, etc.). Though even for these three large-cap index ETFs, the effect seems to persist month after month (standard deviations would tell us more…), which suggests the possibility of a high-probability, very-low-exposure trading system based on this first-of-month effect.
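The day-of-month averaging described above can be sketched in Python on synthetic closes (the original analysis was an R script; this is just an illustration of the calculation, not the actual code):

```python
import numpy as np
import pandas as pd

# Synthetic daily closes on business days (stand-in for downloaded ETF data).
idx = pd.bdate_range("2010-01-01", "2010-12-31")
close = pd.Series(100 + np.arange(len(idx)) * 0.1, index=idx)

ret = close.pct_change()  # day-t return = close_t / close_{t-1} - 1

# Trading-day position within each month: 1 = first trading day, etc.
day_in_month = close.groupby([idx.year, idx.month]).cumcount() + 1

# Average daily return by position in the month (day 1 ~ "first of month").
avg_by_day = ret.groupby(day_in_month.values).mean()
```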

On the R side, the TTR library’s getYahooData function is a lifesaver: all of the price-data downloading and processing is self-contained in a single R script.

larger earnings surprises associated with higher returns

Surprisingly, I find myself working on my PEAD research more during finals week… Below is the component-plus-residual plot for the current model:

stock's return in the next 30 days ~ last 30 days price change + last 30 days historical volatility + earnings surprise


There’s an apparent association between larger earnings surprises and higher returns. Looks like I have some transforming to do on ROC30 and/or Volatility30, and possibly some high-leverage points to cut. The normal Q-Q plot looks terrible too (not displayed here), so it also looks like I need to use robust SEs or the bootstrap to estimate a confidence interval for the Surprise variable’s regression coefficient.
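The regression with heteroskedasticity-robust standard errors can be sketched in plain numpy on simulated data (the actual analysis is in R; the variable names mirror the model above, the data and coefficients here are fabricated, and HC0 stands in for whatever robust estimator ends up being used):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
roc30 = rng.normal(0, 0.1, n)     # last 30 days' price change
vol30 = rng.uniform(0.1, 0.6, n)  # last 30 days' historical volatility
surprise = rng.normal(0, 1, n)    # earnings surprise
y = 0.01 * surprise + rng.normal(0, 0.05, n)  # simulated 30-day forward return

# OLS: Return30 ~ ROC30 + Volatility30 + Surprise (plus intercept).
X = np.column_stack([np.ones(n), roc30, vol30, surprise])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# White/HC0 sandwich estimator for the coefficient covariance matrix.
XtX_inv = np.linalg.inv(X.T @ X)
cov_hc0 = XtX_inv @ (X.T * resid**2) @ X @ XtX_inv
se_surprise = np.sqrt(cov_hc0[3, 3])  # robust SE for the Surprise coefficient
```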

400 lines of code and 1.6 million data points later


And the data is finally in “response variable, explanatory variable 1, explanatory variable 2, … explanatory variable n” format. My code (Java) is very unreadable, but it gets the job done. There were only about 18,000 earnings surprise data points, but my other explanatory variables, momentum and volatility, require historical price data, so I had to process 10 years’ worth of daily price data for the 500 stocks in the S&P 500.
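The feature construction can be sketched in Python on toy prices (the actual pipeline is Java; the column names mirror the regression variables, and the price series here is simulated):

```python
import numpy as np
import pandas as pd

# Simulated daily closes standing in for one stock's price history.
idx = pd.bdate_range("2012-01-02", periods=120)
close = pd.Series(
    100 * np.cumprod(1 + np.random.default_rng(1).normal(0, 0.01, 120)),
    index=idx,
)

ret = close.pct_change()
mom30 = close.pct_change(30)                  # 30-day price change (momentum)
vol30 = ret.rolling(30).std() * np.sqrt(252)  # annualized 30-day volatility
fwd30 = close.shift(-30) / close - 1          # response: next 30-day return

# One row per date in "response, explanatory 1, explanatory 2" format.
rows = pd.DataFrame(
    {"Return30": fwd30, "ROC30": mom30, "Volatility30": vol30}
).dropna()
```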

Analysis of the data with R is up next. I wonder how the initial results will compare to those in the Doyle et al. paper, since I only have 3 explanatory variables and am only testing on a small subset of the stocks they tested.

starting on PEAD (post earnings announcement drift) analysis


I decided to start on my PEAD project, which is to semi-replicate the analysis in the research paper “The Extreme Future Stock Returns Following I/B/E/S Earnings Surprises”. One of the things the researchers do in that paper is construct a multiple regression of 1-year, 2-year, and 3-year returns on earnings surprise %, beta, market cap, momentum, accruals, and a few other explanatory variables. They find that the coefficients for earnings surprise % and accruals are the most significant, followed by market cap, beta, and momentum, I believe.

The plan is to construct a multiple regression similar to theirs: for now, it is a regression of future intermediate-term returns (1 month? 6 months? 1 year? I haven’t decided yet) on surprise, historical volatility, and momentum. Earnings estimates will be obtained from I/B/E/S, price data from CRSP (I am grateful to be a Wharton student…). Replicating the analysis in a research paper really forces you to understand it, and actually lets you see areas for improvement…

It’s funny how the stars kind of aligned on this one. For my statistics class we have to do a final project using something we’ve learned; multiple regression is a big one. I’m running a small fund with a few friends; one of them introduced me to PEAD and wanted to learn more about it as a potential strategy. I proceeded to read the research on it and found that in one paper the researchers use a multiple regression model. So now I’m doing this PEAD exploration as a stats class final project, as preliminary research for a potential investment strategy, and as a way to learn and practice R. Talk about killing multiple birds with one stone.

This is way over my self-imposed word count…

performance of the ETFRot strategy

ETFRot is an ETF rotation strategy that I’ve been working on for a while now (almost a year). Essentially it uses a couple of momentum and volatility indicators to rank the ETFs in a basket spanning asset classes, and then trades the top-ranked ETF. When the top-ranked ETF changes, it “rotates” into the new one. Simple logic, simple to trade.

To test the influence of data-mining bias, I ran a walk-forward optimization: optimize the parameters on in-sample data up through year X, then use those optimized parameters to trade and evaluate the strategy in year X+1. Repeat, now including year X+1 in the in-sample data and testing out of sample on year X+2. If data-mining bias were rampant, we would expect the optimized parameters to perform poorly out of sample: they would have little predictive power. After doing a walk-forward optimization of ETFRot, the parameters seem to be intrinsically predictive rather than just curve-fit:
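The walk-forward procedure can be sketched as a simple loop (a generic skeleton with stub optimize/evaluate functions, not the actual ETFRot code):

```python
def walk_forward(years, optimize, evaluate, first_oos_index):
    """optimize(in_sample_years) -> params; evaluate(params, year) -> return."""
    oos_returns = {}
    for i in range(first_oos_index, len(years)):
        params = optimize(years[:i])  # re-fit on everything before year i
        oos_returns[years[i]] = evaluate(params, years[i])  # test out of sample
    return oos_returns

# Hypothetical usage: the stub optimizer always picks the same parameters,
# and the stub evaluator returns a flat 5% per out-of-sample year.
years = list(range(2003, 2012))
res = walk_forward(
    years,
    lambda in_sample: {"lookback": 60},  # placeholder parameter set
    lambda params, year: 0.05,           # placeholder annual return
    first_oos_index=2,                   # first two years are in-sample only
)
```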


It starts in 2005 because I used 2003–2005 (before 2003, some ETFs in my basket didn’t exist) as my first in-sample period. The second chart compares ETFRot performance with SPY. It seems to have done very well since 2008, which is when the market tanked. Maybe ETFRot is capturing a market regime shift that happened in 2008…