Monday, May 22, 2017

Big Data in Econometric Modeling

Here's a speakers' photo from last week's Penn conference, Big Data in Dynamic Predictive Econometric Modeling.  Click through to find the program, copies of papers and slides, a participant list, and a few more photos.  A good and productive time was had by all!


Monday, May 15, 2017

Statistics in the Computer Age

Efron and Hastie's Computer Age Statistical Inference (CASI) is about as good as it gets. Just read it. (Yes, I generally gush about most work in the Efron, Hastie, Tibshirani, Breiman, Friedman, et al. tradition.  But there's good reason for that.)  As with the earlier Hastie-Tibshirani Springer-published blockbusters (e.g., here), the CASI publisher (Cambridge) has allowed ungated posting of the pdf (here).  Hats off to Efron, Hastie, Tibshirani, Springer, and Cambridge.

Monday, May 8, 2017

Replicating Anomalies

I blogged a few weeks ago on "the file drawer problem".  In that vein, check out the interesting new paper below. I like their term "p-hacking". 

Random thought 1:  
Note that reverse p-hacking can also occur, when an author wants high p-values.  In the study below, for example, the deck could be stacked with all sorts of dubious/spurious "anomaly variables" that no one ever took seriously.  Then of course a very large number would wind up with high p-values.  I am not suggesting that the study below is guilty of this; rather, I simply had never thought about reverse p-hacking before, and this paper led me to think of the possibility, so I'm relaying the thought.
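
A minimal simulation sketch of the deck-stacking idea, under made-up assumptions: "real" anomalies have a positive mean return, "spurious" ones have mean zero, and each is tested with a simple t-statistic. The function names, sample size, and effect size are hypothetical choices for illustration and have nothing to do with the data or methods of the paper below. The point is only that the share of insignificant variables is driven largely by how many spurious candidates go into the library.

```python
import math
import random
import statistics


def t_stat(sample):
    """t-statistic for the null hypothesis that the sample mean is zero."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))


def share_insignificant(n_real, n_spurious, true_mean=0.5, n_obs=240,
                        cutoff=1.96, seed=0):
    """Share of candidate variables with |t| below the cutoff.

    Real anomalies have mean return `true_mean`; spurious ones have mean zero.
    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    means = [true_mean] * n_real + [0.0] * n_spurious
    n_insignificant = 0
    for mu in means:
        returns = [rng.gauss(mu, 1.0) for _ in range(n_obs)]
        if abs(t_stat(returns)) < cutoff:
            n_insignificant += 1
    return n_insignificant / len(means)


# Same 100 real anomalies, with and without 400 spurious candidates added.
honest = share_insignificant(n_real=100, n_spurious=0)
stacked = share_insignificant(n_real=100, n_spurious=400)
print(f"insignificant share, honest library:  {honest:.2f}")
print(f"insignificant share, stacked library: {stacked:.2f}")
```

In this toy setup the real anomalies are strong enough that nearly all of them pass the cutoff, so almost any insignificance in the stacked library comes from the spurious additions.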

Related random thought 2:  
It would be interesting to compare anomalies published in "top journals" and "non-top journals" to see whether the top journals are more guilty or less guilty of p-hacking.  I can think of competing factors that could tip it either way!

Replicating Anomalies
by Kewei Hou, Chen Xue, Lu Zhang - NBER Working Paper #23394
Abstract:
The anomalies literature is infested with widespread p-hacking. We replicate the entire anomalies literature in finance and accounting by compiling a largest-to-date data library that contains 447 anomaly variables. With microcaps alleviated via New York Stock Exchange breakpoints and value-weighted returns, 286 anomalies (64%) including 95 out of 102 liquidity variables (93%) are insignificant at the conventional 5% level. Imposing the cutoff t-value of three raises the number of insignificance to 380 (85%). Even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Out of the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3). In all, capital markets are more efficient than previously recognized.  


Thursday, May 4, 2017

Network Tools for Understanding High-Dimensional Dynamic Models

The slides from my "overview" IMF talk two weeks ago proved popular, so here are some different overview slides on a different topic ("Estimating and Understanding High-Dimensional Dynamic Stochastic Econometric Models"), from my talk at last week's NYU Stern Conference on Volatility and Derivatives.

Sunday, April 30, 2017

One Millionth Birthday...

 ...in event time.  It's true, yesterday No Hesitations passed 1,000,000 page views.  Totally humbling.  I am grateful for your interest and support.

Thursday, April 20, 2017

Automated Time-Series Forecasting at Google

Check out this piece on automated time-series forecasting at Google.  It's a fun and quick read. Several aspects are noteworthy.  

On the upside:

-- Forecast combination features prominently -- they combine forecasts from an ensemble of models.  

-- Uncertainty is acknowledged -- they produce interval forecasts, not just point forecasts.

On the downside:

-- There's little to their approach that wasn't well known and widely used in econometrics a quarter century ago (or more).  Might not something like Autobox, which has been around and evolving since the 1970's, do as well or better?
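
For readers who want to see the two "upside" ingredients in miniature, here is a rough sketch of equal-weight forecast combination with an interval forecast. It is not the approach described in the Google piece; the three simple models, the equal weights, the Gaussian error assumption, and the toy series are all my own illustrative assumptions.

```python
import statistics


def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def mean_forecast(history):
    """Forecast the next value as the historical mean."""
    return statistics.mean(history)


def drift(history):
    """Forecast the last value plus the average historical change."""
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)


MODELS = [naive, mean_forecast, drift]


def combined_forecast(history):
    """Equal-weight average of the individual one-step-ahead forecasts."""
    return statistics.mean([model(history) for model in MODELS])


def interval_forecast(series, min_train=5, z=1.96):
    """Combined point forecast plus a rough 95% interval.

    The half-width uses the standard deviation of past one-step-ahead
    errors of the combined forecast (expanding window), assuming the
    errors are roughly Gaussian.
    """
    errors = [series[t] - combined_forecast(series[:t])
              for t in range(min_train, len(series))]
    point = combined_forecast(series)
    half_width = z * statistics.stdev(errors)
    return point - half_width, point, point + half_width


# Toy series, invented for illustration.
series = [10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.7, 13.1, 13.4, 14.0, 14.3, 15.0]
low, point, high = interval_forecast(series)
print(f"point forecast: {point:.2f}, interval: [{low:.2f}, {high:.2f}]")
```

Equal weights are the simplest possible combination rule; weights estimated from past performance are a natural extension.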

Friday, April 14, 2017

On Pseudo Out-of-Sample Model Selection

Great to see that Hirano and Wright (HW), "Forecasting with Model Uncertainty", finally came out in Econometrica. (Ungated working paper version here.)

HW make two key contributions. First, they characterize rigorously the source of the inefficiency in forecast model selection by pseudo out-of-sample methods (expanding-sample, split-sample, ...), adding invaluable precision to more intuitive discussions like Diebold (2015). (Ungated working paper version here.) Second, and very constructively, they show that certain simulation-based estimators (including bagging) can considerably reduce, if not completely eliminate, the inefficiency.


Abstract: We consider forecasting with uncertainty about the choice of predictor variables. The researcher wants to select a model, estimate the parameters, and use the parameter estimates for forecasting. We investigate the distributional properties of a number of different schemes for model choice and parameter estimation, including: in‐sample model selection using the Akaike information criterion; out‐of‐sample model selection; and splitting the data into subsamples for model selection and parameter estimation. Using a weak‐predictor local asymptotic scheme, we provide a representation result that facilitates comparison of the distributional properties of the procedures and their associated forecast risks. This representation isolates the source of inefficiency in some of these procedures. We develop a simulation procedure that improves the accuracy of the out‐of‐sample and split‐sample methods uniformly over the local parameter space. We also examine how bootstrap aggregation (bagging) affects the local asymptotic risk of the estimators and their associated forecasts. Numerically, we find that for many values of the local parameter, the out‐of‐sample and split‐sample schemes perform poorly if implemented in the conventional way. But they perform well, if implemented in conjunction with our risk‐reduction method or bagging.
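
To make the competing selection schemes concrete, here is a small sketch comparing in-sample AIC selection with conventional split-sample selection when choosing between a mean-only model and a model with one predictor. It is not HW's procedure or their risk-reduction method; the data-generating process, the function names, and the particular "weak predictor" coefficient are assumptions made for illustration. Note that the split-sample rule estimates on only half of the data, which gives some intuition for why such schemes can be inefficient.

```python
import math
import random


def make_data(beta, n, seed):
    """Simulate y = beta * x + noise, with standard normal x and noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def fit_line(x, y):
    """OLS intercept and slope for y = a + b * x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def sse_mean(y_fit, y_eval):
    """Sum of squared errors of the mean-only model, fit on y_fit."""
    m = sum(y_fit) / len(y_fit)
    return sum((yi - m) ** 2 for yi in y_eval)


def sse_line(x_fit, y_fit, x_eval, y_eval):
    """Sum of squared errors of the one-predictor model, fit on the fit set."""
    a, b = fit_line(x_fit, y_fit)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x_eval, y_eval))


def select_by_aic(x, y):
    """In-sample selection with Gaussian AIC (up to a constant)."""
    n = len(y)
    aic_small = n * math.log(sse_mean(y, y) / n) + 2 * 1
    aic_big = n * math.log(sse_line(x, y, x, y) / n) + 2 * 2
    return "with_x" if aic_big < aic_small else "mean_only"


def select_by_split_sample(x, y):
    """Estimate on the first half, compare squared errors on the second half."""
    h = len(y) // 2
    small = sse_mean(y[:h], y[h:])
    big = sse_line(x[:h], y[:h], x[h:], y[h:])
    return "with_x" if big < small else "mean_only"


# Weak predictor: the coefficient is of order 1/sqrt(n).
n = 100
x, y = make_data(beta=1.5 / math.sqrt(n), n=n, seed=0)
print("AIC picks:         ", select_by_aic(x, y))
print("Split-sample picks:", select_by_split_sample(x, y))
```

Repeating the last few lines over many seeds, and recording forecast losses rather than just the selected model, would give a crude Monte Carlo comparison of the two schemes.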

Monday, April 10, 2017

Big Data, Machine Learning, and the Macroeconomy

Coming soon at Bank of Norway:

CALL FOR PAPERS 
Big data, machine learning and the macroeconomy 
Norges Bank, Oslo, 2-3 October 2017 

Data, in both structured and unstructured form, are becoming easily available on an ever increasing scale. To find patterns and make predictions using such big data, machine learning techniques have proven to be extremely valuable in a wide variety of fields. This conference aims to gather researchers using machine learning and big data to answer challenges relevant for central banking. 

Examples of questions, and topics, of interest are: 

Forecasting applications and methods
- Can better predictive performance of key economic aggregates (GDP, inflation, etc.) be achieved by using alternative data sources? 
- Does the machine learning tool-kit add value to already well-established forecasting frameworks used at central banks? 

Causal effects
- How can new sources of data and methods be used to learn about the causal mechanism underlying economic fluctuations? 

Text as data
- Communication is at the heart of modern central banking. How does this affect markets? 
- How can textual data be linked to economic concepts like uncertainty, news, and sentiment? 

Confirmed keynote speakers are: 
- Victor Chernozhukov (MIT) 
- Matt Taddy (Microsoft, Chicago Booth) 

The conference will feature 10-12 papers. If you would like to present a paper, please send a draft or an extended abstract to mlconference@norges-bank.no by 31 July 2017. Authors of accepted papers will be notified by 15 August. For other questions regarding this conference, please send an e-mail to mlconference@norges-bank.no. Conference organizers are Vegard H. Larsen and Leif Anders Thorsrud.

13th Annual Real-Time Conference

Great news: The Bank of Spain will sponsor the 13th annual conference on real-time data analysis, methods, and applications in macroeconomics and finance, on October 19 and 20, 2017, at its central headquarters in Madrid, c/ Alcalá, 48. 

The real-time conference has always been unique and valuable. I'm very happy to see the Bank of Spain confirming and promoting its continued vitality.

More information and call for papers here.

Topics include:

• Nowcasting, forecasting and real-time monitoring of macroeconomic and financial conditions.
• The use of real-time data in policy formulation and analysis.
• New real-time macroeconomic and financial databases.
• Real-time modeling and forecasting aspects of high-frequency financial data.
• Survey data, and its use in macro model analysis and evaluation.
• Evaluation of data revision and real-time forecasts, including point forecasts, probability forecasts, density forecasts, risk assessments and decompositions.