Machine learning in statistics
More about the book
Given the growing awareness of machine learning outside of computer science in recent years, in both academia and business, the author examines two corporate finance forecasting problems and shows how machine learning models compare to established models from the literature. The forecasting of future corporate bankruptcies serves as a representative classification problem and the forecasting of future corporate earnings as a representative regression problem; both are traditionally approached with econometric techniques such as logistic and linear regression. The author forecasts bankruptcies and earnings using a diverse set of machine learning models, ranging from subset selection models to highly flexible boosting models, and combines these models into stacked ensembles. Ensemble learning can potentially outperform individual machine learning models and, to date, has not been investigated for bankruptcy and earnings forecasts. Besides the focus on high-performing models, the author builds highly interpretable logistic and linear regression models from the variables found to be most predictive across all machine learning models. For both forecasting problems, optimally calibrated stacked ensembles show the best forecast performance, exceeding established literature models by at least 5% in terms of standard evaluation criteria. Individual machine learning models achieve an improvement of at least 4%, and the self-generated regression models at least 1%. Although established literature models in corporate finance already perform well, machine learning enriches the way researchers can transform data into forecasts.