Data Science
Saturday, October 20


Is the Best Predictor Actually the Best?
When building a predictive model, a data analyst commonly tries to select the best subset of predictors. In this presentation we answer the question: does this subset always include the single best predictor? We show examples where the best single predictor is not in the best subset, while the worst one actually is. We discuss both numeric and categorical predictors, and show simple extreme examples of how not to set up the data set before building the model; otherwise the results may be misleading. We use data visualization and summary measures to explain the differences, such as an r-squared that goes from 0.05 to 0.95, or an adjusted r-squared that goes from negative to close to one.
We conclude with suggestions on how to (or how not to) build models in high-dimensional space, where graphical displays may not be as helpful as desired.
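A toy construction (ours, not one of the talk's examples) makes the first claim concrete. The target y and three predictors are built from three mutually orthogonal, zero-mean patterns: x3 tracks y closely on its own, x2 is a pure nuisance pattern uncorrelated with y, and x1 mixes y with that same nuisance. An exhaustive search over all one- and two-predictor models then picks {x1, x2}, dropping the best single predictor and keeping the worst one. The two-predictor R² uses the standard closed form in pairwise correlations; the data and variable names are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def r_squared(y, cols):
    """R^2 of an OLS fit with intercept on one or two predictor columns."""
    if len(cols) == 1:
        return pearson(y, cols[0]) ** 2
    a, b = cols
    r1, r2, r12 = pearson(y, a), pearson(y, b), pearson(a, b)
    # Closed form for two predictors in terms of pairwise correlations.
    return (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)

# Three mutually orthogonal, zero-mean +/-1 patterns (toy data).
h1 = [1, 1, 1, 1, -1, -1, -1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, 1, -1, 1, -1, 1, -1]

y = h1
X = {
    "x1": [s + u for s, u in zip(h1, h2)],        # signal mixed with a nuisance pattern
    "x2": h2,                                      # the nuisance alone, uncorrelated with y
    "x3": [s + 0.5 * e for s, e in zip(h1, h3)],  # signal plus a little independent noise
}

singles = {name: r_squared(y, [col]) for name, col in X.items()}
pairs = {pair: r_squared(y, [X[pair[0]], X[pair[1]]]) for pair in combinations(X, 2)}

best_single = max(singles, key=singles.get)
worst_single = min(singles, key=singles.get)
best_pair = max(pairs, key=pairs.get)
print(best_single, worst_single, best_pair)   # x3 x2 ('x1', 'x2')
```

Here x2 acts as a suppressor: useless on its own (R² of 0), it cancels the nuisance component in x1, so y = x1 - x2 exactly and the pair reaches R² of 1, while x3 alone gives 0.8 and the pair {x1, x3} only about 0.833.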

Cesar Acosta

Professor, University of Southern California
Dr. Acosta is a Data Science and Data Analytics professional with many years of experience analyzing highly complex data and building advanced data mining models that predict market outcomes to improve decision making in Marketing Analytics, Financial investing, and business operations...

Saturday October 20, 2018 10:00am - 10:30am
Ballroom # 403B


The New Media Ecosystem – Disruption or Dissolution?
We’ll explore the ways data at scale, machine learning, cryptocurrencies and distributed databases are remaking what we think of as media. We’ll do this from two perspectives:

Disruption - Are established media distribution, content creation, and audience engagement platforms doomed by design? As the media industry is remade by new entrants and new business models, the distinction between media and technology has effectively disappeared. Strategy and differentiation for new media entities are as much about flexible engagement platforms, integrated operations, and partner ecosystems as they are about new ideas and new content. As audiences fragment and migrate, new audience currencies are emerging, distinguished by speed, scale, and flexibility. We'll trace the evolution of media engagement models, place them in the context of current market trends, and explore some of the ways new entrants are using emerging technologies to enable groundbreaking customer experiences.

Dissolution - We’ll also explore the underside of new media platforms, specifically the ways personal data syndication, programmatic content, simplistic learning algorithms, and automated decisioning can be manipulated to perpetuate bias, stoke rancor, and undermine good faith. Do media and technology companies have obligations that extend beyond the confines of customer relationships? What are the implications of captive ecosystems that are becoming enmeshed in ever more areas of our lives?

Gerald Parham

Fan Experience and Omnichannel Customer Engagement Leader, IBM
Mr. Parham is a senior product & service innovation leader with 20 years’ experience in media and marketing, technology and creative strategy. Prior to joining IBM, he founded a social discovery platform (yuni vrs) and authored a patent for enabling immersive interaction of user-generated...

Saturday October 20, 2018 10:30am - 11:00am
Ballroom # 403B


(Cancelled) Why Data Governance Matters
The adage "garbage in, garbage out" could not be truer for data analytics. Though it is the least visible (and not the flashiest) factor, data governance has the greatest impact on any data-driven organization. A good data governance system lets an organization lay a strong foundation of consistent, reliable data on which to build its data strategy. This presentation will give an overview of why every organization needs data governance and present a framework for building one.

Saturday October 20, 2018 11:00am - 11:30am
Ballroom # 403B


Analytics for Industrial Cyber Control Security
Modern industrial control systems are highly automated and often controlled by computers and connected devices. In addition to physical failures, cyber attacks present an increasing threat to normal system operations. We propose an unsupervised learning approach for the automated detection of adverse cyber attacks. In particular, we devise unsupervised, learning-based approaches to anomaly detection that use only datasets collected under normal system conditions. Anomaly detection techniques are extremely useful for real-life system security management because cyber attacks rarely occur in practice, even though their impact can be disastrous, so labeled attack data are scarce. We conduct detailed studies on a water treatment plant in which all water processes are connected through computer control systems. We demonstrate that our analytics tools can effectively detect new types of cyber attacks and can also locate potential points of attack and compromised control units or devices.
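The core idea of training only on normal-condition data can be illustrated with a deliberately simple baseline; this is a swapped-in technique, not the speakers' method. It learns a per-sensor mean and standard deviation from attack-free readings and flags any new reading whose absolute z-score exceeds a threshold. Reporting which sensors crossed the threshold is a crude stand-in for locating compromised devices. The sensor names, readings, and threshold of 4 are hypothetical.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_rows):
    """Learn a (mean, standard deviation) pair per sensor from attack-free readings only.

    Assumes every sensor varies in the normal data (standard deviation > 0).
    """
    return [(mean(col), stdev(col)) for col in zip(*normal_rows)]

def z_scores(row, profile):
    """Absolute z-score of each reading relative to the normal profile."""
    return [abs(x - mu) / sd for x, (mu, sd) in zip(row, profile)]

def detect(row, profile, threshold=4.0):
    """Return (is_anomalous, indices of sensors whose readings exceed the threshold)."""
    suspects = [i for i, z in enumerate(z_scores(row, profile)) if z > threshold]
    return bool(suspects), suspects

# Hypothetical sensors: tank level (%), flow rate (m^3/min), pump speed (rpm).
SENSORS = ["tank_level", "flow_rate", "pump_speed"]
normal = [
    [50.0, 2.00, 1200.0],
    [51.0, 2.10, 1210.0],
    [49.0, 1.90, 1190.0],
    [50.5, 2.05, 1205.0],
    [49.5, 1.95, 1195.0],
]
profile = fit_normal_profile(normal)

ok, suspects = detect([50.2, 2.02, 1202.0], profile)
print(ok, [SENSORS[i] for i in suspects])            # False []
ok, suspects = detect([50.1, 3.00, 1201.0], profile)  # spoofed flow reading
print(ok, [SENSORS[i] for i in suspects])            # True ['flow_rate']
```

A per-sensor z-score ignores correlations between sensors and their behavior over time, both of which matter in a real plant; multivariate or sequence models trained on the same normal-only data can capture them, at the cost of more complexity.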

Honggang Wang

Assistant Professor, University of La Verne
Honggang Wang is an assistant professor of Analytics in the College of Business and Public Management at the University of La Verne. He received his Bachelor of Science degree in Power Engineering from Shanghai Jiao Tong University, Shanghai, China, in 1996, and his Master of Science in Manufacturing...

Saturday October 20, 2018 2:00pm - 2:30pm
Ballroom # 409


Machine Learning on Social Networks
Social networks are ubiquitous in the modern world. These networks are composed of individuals and the connections between them, where a connection represents a social interaction such as friendship, a peer or work relationship, or communication. Social network data can originate from many sources, including online social networks, mobile phones, email, financial transactions, or a variety of digital communication channels over which individuals interact. Social networks are typically very large and heterogeneous, and while constructing features from this data can be quite complex, the effort is often justified by the strong signal those features provide for machine learning prediction tasks.

In this presentation I will describe the social networks that I work with as a data scientist at Tala, how we construct these networks from our various data sources, and how we use them for machine learning tasks including credit scoring and fraud prediction. I will talk about the complexities and nuances of using this data, and the considerations and challenges that arise when building machine learning models for live scoring on top of it. I will describe local network features (based on an individual’s direct connections) and global network features (based on an individual’s position or centrality within the network), as well as approximation methods for global metrics that make them fast enough to compute in live models. I will discuss the effectiveness of models based on network features and conclude with other use cases of networks in data science.
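The difference between local and global features, and why global ones may need approximation, can be sketched on a toy graph. Local features such as degree need only a node's direct neighbors; closeness centrality needs shortest-path distances to every other node, which costs one breadth-first search per node when computed exactly. The sketch below instead estimates closeness from a random sample of pivot nodes, one standard approximation; the abstract does not say which methods Tala uses. The graph, the `flagged` set, and all function names are hypothetical.

```python
import random
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def local_features(graph, node, flagged):
    """Local features from direct connections: degree and share of flagged neighbours."""
    nbrs = graph[node]
    share = sum(v in flagged for v in nbrs) / len(nbrs) if nbrs else 0.0
    return {"degree": len(nbrs), "flagged_share": share}

def approx_closeness(graph, num_pivots, seed=0):
    """Estimate closeness (1 / mean distance to other nodes) from pivot BFS runs.

    Assumes an undirected, connected graph, so the distance from pivot p to v
    equals the distance from v to p. Runs num_pivots BFS traversals, not one per node.
    """
    pivots = random.Random(seed).sample(sorted(graph), num_pivots)
    total = {v: 0 for v in graph}
    count = {v: 0 for v in graph}
    for p in pivots:
        for v, d in bfs_distances(graph, p).items():
            if v != p:  # skip the zero distance from a pivot to itself
                total[v] += d
                count[v] += 1
    # count / total is 1 / (average sampled distance)
    return {v: count[v] / total[v] if total[v] else 0.0 for v in graph}

# Hypothetical five-person chain of contacts; person 3 is a known fraud case.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
flagged = {3}

print(local_features(graph, 2, flagged))   # {'degree': 2, 'flagged_share': 0.5}
print(approx_closeness(graph, num_pivots=5))
```

With `num_pivots` equal to the number of nodes the estimate is exact; smaller values trade accuracy for far fewer traversals, which is the kind of trade-off that can make a global feature cheap enough to refresh for live scoring.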

Peter Fennell

Data Scientist, Tala
Peter Fennell is a Senior Data Scientist at Tala, where he uses a suite of tools including EDA, statistics, machine learning and engineering to build credit and fraud models for Tala's global markets. Before joining Tala, Peter Fennell was a postdoctoral researcher in statistics, networks...

Saturday October 20, 2018 3:00pm - 3:30pm
Ballroom # 403B