Saturday, October 20 • 10:00am - 10:30am
Is the Best Predictor Actually the Best?

When building a predictive model, data analysts commonly try to select the best subset of predictors. In this presentation we ask whether that subset always includes the single best predictor. We show examples where the best single predictor is left out of the best subset while the worst single predictor is included. We discuss both numeric and categorical predictors, and show simple, extreme examples of how not to set up the data set before building the model, since a poor setup can produce misleading results. We use data visualization and summary measures to explain the differences, such as R-squared going from 0.05 to 0.95, or adjusted R-squared going from a negative value to close to one.
We conclude with suggestions on how to (or how not to) build models in high-dimensional space, where graphical displays may not be as helpful as desired.
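
To make the central claim concrete, here is a minimal sketch in Python (NumPy only). It is not taken from the talk: the data are synthetic, and the variable names (x1, x2, x3), noise levels, sample size, and the r_squared helper are all assumptions chosen for illustration. The response depends on the difference between two highly correlated predictors, so each of them looks weak alone, while a third, noisy predictor looks best on its own.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Synthetic data (illustrative assumption, not the speaker's data set).
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(0, 10, n)        # large shared component
s = rng.normal(0, 1, n)         # small signal that drives y
y = s + rng.normal(0, 0.1, n)   # response: signal plus a little noise

predictors = {
    "x1": w,                        # unrelated to y on its own
    "x2": w + s,                    # signal buried under w
    "x3": s + rng.normal(0, 1, n),  # noisy copy of the signal
}

# R-squared for every single predictor and every pair of predictors.
single = {name: r_squared(col, y) for name, col in predictors.items()}
pairs = {
    combo: r_squared(np.column_stack([predictors[c] for c in combo]), y)
    for combo in combinations(predictors, 2)
}

best_single = max(single, key=single.get)
best_pair = max(pairs, key=pairs.get)

for name, value in single.items():
    print(f"{name}: R^2 = {value:.3f}")
for combo, value in pairs.items():
    print(f"{combo}: R^2 = {value:.3f}")
print("best single predictor:", best_single)
print("best pair of predictors:", best_pair)
```

In this construction x2 - x1 recovers the signal almost exactly, so the pair (x1, x2) fits far better than any pair containing x3, even though x3 is the strongest predictor on its own and x1 alone explains almost nothing.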

Cesar Acosta

Professor, University of Southern California
Dr. Acosta is a data science and data analytics professional with many years of experience analyzing highly complex data and building advanced data mining models that predict market outcomes and improve decision making in marketing analytics, financial investing, and business operations...

Saturday October 20, 2018 10:00am - 10:30am
Ballroom # 403B