Quantitative Equity Investing

About the Data and Methodology

Data Sources and Methodology

Several data sources, all freely available on the Internet, are used to create the data series for this site. Results are calculated and updated once daily (usually between 4am and 5am Eastern Time) in a completely automated process, as follows:

  1. A list of tickers is downloaded from www.marketwatch.com. These tickers represent all stocks trading on the NYSE, NASDAQ and AMEX exchanges.
  2. Historical prices for each ticker (adjusted for stock splits and dividends) are downloaded from Yahoo Finance (technically, to save bandwidth, only the latest day's price is downloaded and appended unless there was a stock split or dividend that day).
  3. Fundamental data metrics for each ticker are downloaded from Yahoo Finance.
  4. Momentum is calculated for each ticker on a 6-, 9- and 12-month basis. Note that, unlike most of the academic literature on momentum, I do not exclude the latest month when calculating momentum. I may change this in the future, though my own research indicates that, because of short-term reversals, it makes more sense to exclude only about the latest two weeks of returns rather than a full month.
  5. A single data file is created which combines each ticker with its corresponding momentum statistics and relevant fundamental data. It is this file that, if this were a real fund, would be used to create the portfolios for the following trading day. Note that we are implicitly rebalancing the portfolios daily.
  6. The previous day's data file is updated to include the calculated one-day returns for each ticker based on the most recent day's closing prices.
  7. The previous day's data file (which now includes one-day returns) is sorted and ranked for each Value, Momentum or Quality factor or combination of factors, and separated into deciles and into portfolios of a fixed number of stocks. Finally, portfolio returns are calculated for each decile (and for each fixed-number portfolio) on both an equal-weighted and a market-capitalization-weighted basis. Note that only stocks trading at more than $5.00 with a market capitalization greater than $100 million are included. In addition, outliers are eliminated for the value and quality factors based on certain thresholds.
  8. For the portfolio results that are based on a combination of two factors (e.g. a value and a momentum factor), tickers are sorted and ranked for each of the two factors, and a simple average of the two rankings is used to create the various data series.
  9. Rinse and repeat daily (or at least every trading day).
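
The site's actual code is written in PHP and is not reproduced here. Purely as an illustration, steps 4, 7 and 8 might be sketched in Python along the following lines. The function names, the dictionary fields (price, market_cap, ret_1d), the approximation of 21 trading days per month, and the way skip_days shortens the momentum window are all assumptions made for this sketch, not details of the production code. Outlier removal is left out because the thresholds are not given.

```python
TRADING_DAYS_PER_MONTH = 21  # assumption: rough number of trading days in a month


def momentum(prices, months, skip_days=0):
    """Return over `months` months from a list of adjusted closes (oldest first).

    skip_days=0 mirrors the approach described above (latest month included);
    skip_days=10 would leave out roughly the latest two weeks of returns.
    """
    last = len(prices) - 1
    start = last - months * TRADING_DAYS_PER_MONTH
    end = last - skip_days
    if start < 0 or end <= start:
        return None  # not enough price history
    return prices[end] / prices[start] - 1.0


def eligible(stock):
    """Keep stocks above $5.00 with a market cap above $100 million."""
    return stock["price"] > 5.00 and stock["market_cap"] > 100e6


def rank_by(stocks, key, descending=True):
    """Map ticker -> rank, with rank 1 the most attractive.

    Ties are broken by input order (an arbitrary choice for this sketch).
    """
    ordered = sorted(stocks, key=lambda s: s[key], reverse=descending)
    return {s["ticker"]: i + 1 for i, s in enumerate(ordered)}


def combined_rank(stocks, key_a, key_b, desc_a=True, desc_b=True):
    """Simple average of the two single-factor ranks for each ticker."""
    rank_a = rank_by(stocks, key_a, desc_a)
    rank_b = rank_by(stocks, key_b, desc_b)
    return {t: (rank_a[t] + rank_b[t]) / 2 for t in rank_a}


def deciles(ranks):
    """Split tickers into 10 groups by rank; the first group holds the best ranks."""
    ordered = sorted(ranks, key=ranks.get)
    n = len(ordered)
    return [ordered[i * n // 10:(i + 1) * n // 10] for i in range(10)]


def portfolio_return(tickers, by_ticker, cap_weighted=False):
    """One-day portfolio return, equal weighted or market-cap weighted."""
    if not tickers:
        return None
    weights = [by_ticker[t]["market_cap"] if cap_weighted else 1.0 for t in tickers]
    weighted = sum(w * by_ticker[t]["ret_1d"] for w, t in zip(weights, tickers))
    return weighted / sum(weights)
```

In this sketch a value factor such as a P/E ratio would be ranked with descending=False (lower is better), while momentum would be ranked with descending=True, and either a single-factor or a combined ranking can be passed to deciles().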

Note that the data went "live" at the beginning of April 2010 (April 1, 2010 for momentum and April 9, 2010 for value and quality). The data prior to those dates (which begin February 1, 2006) are essentially back-tested using the same historical price data downloaded from Yahoo and using fundamental data calculated from historical financial statements downloaded from moneycentral.msn.com. The one caveat to the back-tested data is that, to my knowledge, there is no freely available data source for either stock prices or fundamental data of tickers that no longer trade. In other words, all of the back-tested data is for stocks that currently trade. This may introduce some survivorship bias into the results, though the significance and direction of this bias are unknown to me (my hunch is that the bias is not very significant). Note also that there are no back-tested results for the Forward P/E, PEG and Rev. Growth factors.

As I discussed in the About the Site section, I had previously back-tested similar strategies using the significantly higher quality CRSP/Compustat datasets. I have not felt the need to include those results on this site, since I believe that whatever value this site offers comes from the daily updated "live" portion of the data.

Caveat

The results exclude trading costs (likely significant, especially since we rebalance daily), and all returns are calculated from each stock's daily closing price. The quality of the freely available data used on this site is certainly questionable, and no attempt has been made to clean or verify the accuracy of the data, other than excluding some outliers.

The Factors

Value Factors

Quality Factors

Momentum Factors

Technical Stuff

All of the code written to download the necessary data (list of tickers, historical stock prices and fundamental data) and to sort, rank and calculate the portfolio results was created in PHP. The data is then saved in a series of CSV files. I also use a MySQL database to help keep track of all of the tickers (and to record when a ticker is added or removed from the active ticker list) and to keep a log of each day's processes. The interactive charting tool integrated with this site (which I happen to think is pretty cool) is the free version of a product by amCharts.
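
As a rough illustration of the ticker bookkeeping described above, the daily added/removed check could look something like the Python sketch below. It is not the site's PHP/MySQL code: the function names and the row layout (date, ticker, event) are hypothetical, and plain in-memory lists stand in for the database table.

```python
from datetime import date


def diff_tickers(previous, current):
    """Return (added, removed) tickers between two daily ticker lists."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def change_rows(day, previous, current):
    """Build one log row per ticker added to or removed from the active list."""
    added, removed = diff_tickers(previous, current)
    rows = [{"date": day.isoformat(), "ticker": t, "event": "added"} for t in added]
    rows += [{"date": day.isoformat(), "ticker": t, "event": "removed"} for t in removed]
    return rows
```

Rows like these could then be inserted into a database table or appended to a CSV file, whichever the daily process uses for its log.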