Big Data
Saturday, October 20

10:00am PDT

Predictive Analysis of Financial Fraud Detection using Azure ML and Spark ML in AWS
This talk provides insights into the performance and architecture of financial fraud detection on mobile money transaction activity in Azure ML and Spark. We predicted and classified each transaction as normal or fraudulent, using a small sample in Azure ML and a massive data set in Spark ML, which represent traditional systems and Big Data respectively. I will present predictive analysis with several classification models, experimenting in Azure ML and Spark ML. In addition, the scalability of Spark ML will be presented for the models with different numbers of nodes in Spark clusters on Amazon AWS.

Jongwook Woo

Professor, California State University Los Angeles & Big Data AI Center
Dr. Jongwook Woo received his Ph.D. from USC and went to Yonsei University. He is a Professor in the CIS Department of California State University Los Angeles and serves as a Technical Advisor of Isaac Engineering, Council Member of IBM Spark Technology Center and as a president at KSEA-SC...

Saturday October 20, 2018 10:00am - 10:30am PDT
Ballroom # 408A

10:30am PDT

An Opensource Data Governance System for IoT Communities
The I3 (Intelligent Internet-of-Things Integrator) Consortium is a newly formed open consortium created to encourage the accelerated formation of community-based IoT networks. These networks are formed when independent IoT device owners work together to create data “rivers” that have more composite value than a series of individual IoT data streams. I3, an innovative IoT data management system, was first conceived at USC and evolved with encouragement from the City of Los Angeles and other entities. These partners seek an ecosystem that accelerates the deployment of IoT technology by allowing citizens and businesses to form community-managed data marketplaces. This presentation will cover the challenges that must be overcome to realize what is, in effect, an open data marketplace that allows producers and consumers to connect and exchange data on mutually acceptable terms. The presentation will also cover the motivations that have driven the program, with a focus on the implications such a system has for the larger IoT market. To close the session, a number of use cases will be presented to illustrate application of the system in a live IoT environment.

Jerry Power

Executive Director, Institute for Communications Technology Management, University of Southern California
Jerry Power is the Executive Director of The Institute for Communication Technology Management (CTM) at The University of Southern California’s Marshall School of Business. CTM is actively engaged in identifying, understanding, and leveraging emerging trends driven by the rapid...

Saturday October 20, 2018 10:30am - 11:00am PDT
Ballroom # 408A

11:00am PDT

Building a Human-Centered Tech Company
There is a growing need for tech companies to take greater responsibility for users' privacy and for the way their business models produce social impacts. Regulations like the General Data Protection Regulation in the EU and California's proposed bill AB 2182 are forcing penalties for uses of personal data that had previously been considered legally acceptable, even if they creeped some people out. How do we build companies that listen to the canaries in our data mines, the employees who are more easily creeped out, the employees who are particularly adept at imagining the unintended impacts of the technology they are building?

This talk presents a reporting structure for listening to the canaries and broad, critical thinkers among our tech staff. I discuss how to make these conversations productive, how to reward employees for coming forward, and how to assess which anxieties require further investigation and action. Inviting all employees to hone their ethical imaginations about the impact of the products they are building is a good first step. By building organizations that are capable of listening to employees and rewarding them for their critical thinking skills, tech companies will be more resilient to future regulation and more beneficial to the communities they serve and in which they operate.

Laura Noren

VP, Obsidian Security Inc
Laura Norén is VP of Privacy and Trust at Obsidian Security in Newport Beach, CA. She also holds Visiting Scholar positions at NYU's Center for Data Science and UC Berkeley's Berkeley Institute for Data Science. Her public-facing research focuses on challenges and opportunities of...

Saturday October 20, 2018 11:00am - 11:30am PDT
Ballroom # 408A

11:30am PDT

Introducing Keras-Pandas, an Open Source Package That Allows Users to Rapidly Build and Iterate on Deep Learning Models
Deep learning provides amazing opportunities to build highly predictive models. Historically, however, this modeling has required deep domain experience and keeping up with a massive body of literature. Further, applying these approaches to tabular data has been a highly specialized field.

My recently open-sourced package, keras-pandas, allows users to rapidly prototype and deploy models by providing state-of-the-art defaults and heuristics, based on the techniques used in Fortune 100 companies and by Kaggle grandmasters.

During this talk, we'll dive into keras-pandas's design, have a look at a case study, and discuss real-world approaches for handling numerical, categorical, text, and timestamp data with deep learning.

Brendan Herger

Senior Data Scientist, Metis Data Science
Brendan Herger enjoys bridging the gap between data science and engineering to build and deploy data products. Brendan brings a unique combination of machine learning, deep learning, and software engineering skills. In his previous work at Capital One and startups, he has built authorization...

Saturday October 20, 2018 11:30am - 12:00pm PDT
Ballroom # 408A

12:30pm PDT

Flexible and Fast Storage with Alluxio for Deep Learning
In the age of growing data and increased computing power, deep learning models continue to improve their performance across a variety of domains, with access to more and more data and the processing power to train larger neural networks. This rise of deep learning advances the state of the art for AI but also exposes challenges in accessing data across different storage systems, especially cloud object storage. In this talk, we will describe the storage challenges for deep learning workloads and how Alluxio can help to solve them.

Alluxio is an open-source memory-speed storage system that unifies disparate persistent storage systems and presents data to deep learning frameworks as a local folder. With Alluxio, data scientists gain easy access to a variety of storage systems (including Azure Blob Storage, AWS S3, and many others) and flexibility without compromising on performance. This talk will use TensorFlow as an example to show how Alluxio can help with data access and management for deep learning frameworks.

Bin Fan

VP of Open Source, Founding Engineer, Alluxio, Inc.
Bin Fan is a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google to build the next-generation storage infrastructure and won Google’s Technical Infrastructure award. Bin got his Ph.D. in Computer Science from...

Saturday October 20, 2018 12:30pm - 1:00pm PDT
Ballroom # 408A

1:00pm PDT

Practical Aspects of Machine Learning Models
Deep learning techniques are rapidly gaining popularity in a variety of industries. Developing accurate forecasting models can guarantee success in a corporate environment. Specifically, accurate demand prediction can empower decision making and expand vital planning capabilities of a company. These models can allow better negotiation on flexible shipping rates, as well as avoiding premium fees.
This talk will go over an industry use case that focuses on forecasting using both classical and modern machine learning techniques. The practical aspects of the model-building process will be discussed: utilizing the CRISP-DM approach and communicating results at the executive level.

Inga Maslova

Professor, University of Southern California
Inga Maslova is a data scientist specializing in predictive analytics, machine learning, big data, and applications to finance, business analytics, economics, hydrology, remote sensing, precision agriculture, and climate change problems. She is currently teaching at Marshall School...

Saturday October 20, 2018 1:00pm - 1:30pm PDT
Ballroom # 408A

1:30pm PDT

AutoML - The Future of AI
The key challenge in making AI technology more accessible to the broader community is the scarcity of AI experts. Most businesses simply don’t have the much-needed resources or skills for modeling and engineering. This is why automated machine learning and deep learning technologies (AutoML and AutoDL) are increasingly valued by academics and industries. The core of AI lies in the model design. Automated machine learning technologies lower the barrier to AI application, enabling developers with no AI expertise to independently and easily develop and deploy AI models. Automated machine learning is expected to completely overturn the AI industry in the next few years, making AI accessible to everyone.

Ning Jiang

CTO, OneClick.ai
Ning Jiang is CTO of OneClick.ai, a leading platform in automating deep learning model design and deployment. Ning has over 15 years of experience in Machine Learning across multiple industries, including web/entity search, search ads, online retail, and cyber security. His was a...

Saturday October 20, 2018 1:30pm - 2:00pm PDT
Ballroom # 408A

2:00pm PDT

Modernizing and Digitally Transforming Traditional Industrial Companies Using IoT and AI
Most companies today still rely on traditional assets -- equipment and people -- and outdated business models to compete in a highly dynamic, disruptive industry landscape. In this talk, the presenter will describe in detail a recent client case study in which he helped a $1.5B traditional B2B services company, which owns over one million "machines" and has 100K+ clients, move to an IoT- and AI-based business model. The goals were to generate $250M in new earnings (from both new sources of revenue and operational cost savings), to begin serving 20M consumers (i.e., also become a B2B2C company), and, in the process, to transform from a traditional blue-collar service company into a modern, data-driven digital company. The presentation will include detailed examples of the use of IoT, advanced data science, and AI (Machine Learning, Predictive Analytics) across broad areas of the project.

Sugath Warnakulasuriya

Managing Director, Thalamus Labs
Dr. Sugath Warnakulasuriya, Managing Director of Thalamus Labs, is a strategic advisor and entrepreneur with 25+ years of experience in innovation and growth for enterprises and startups. He has been a consultant with McKinsey & Co, and co-founded two companies: 10EQS and eLink Commerce...

Saturday October 20, 2018 2:00pm - 2:30pm PDT
Ballroom # 408A

2:30pm PDT

Big Data for the Rest of Us
It's an exciting time in the Big Data and Machine Learning space! Never before has there been such an abundance of open-source tools and projects available for companies to leverage when they build their big data solutions.

Leveraging the correct framework can significantly accelerate the development of a big data solution, making it simple for small teams to develop solutions that scale to terabyte data sets with relative ease. However, it is important to understand that each of the available frameworks is targeted at specific pain points that may or may not be relevant to your specific requirements and environment. The use of poorly suited frameworks, at best, provides little benefit to development and potentially introduces significant unnecessary complexities and downstream limitations.

In this presentation we highlight key factors to consider when developing a big data architecture, discuss the applicability of different big data frameworks to design around, and cover the benefits and pitfalls associated with many of the common frameworks.

Lawrence Spracklen

VP of Engineering, SupportLogic, Inc.
Dr. Lawrence Spracklen leads engineering at SupportLogic, where he heads a team applying AI to the enterprise technical support space. Prior to joining SupportLogic, Lawrence led engineering teams at two other ML startups: Alpine Data and Ayasdi. Before this, Lawrence spent over...

Saturday October 20, 2018 2:30pm - 3:00pm PDT
Ballroom # 408A

3:00pm PDT

Demand Forecasting with Real World Data
Seven years ago, a consulting assignment on demand forecasting for one of our clients sent us on a wild goose chase and produced some interesting, actionable insights for the client. I am planning to run a working session with real-world-like data to derive insights while guiding attendees through common pitfalls. We did our initial analysis in SPSS, but I am planning to run the session in R/Python.
1) Introduction to demand forecasting concepts - 5 mins
2) Algorithms typically used for demand forecasting - 5 mins
3) Introduction to the data and business context - 5 mins
4) Tasks, including cleaning data for outliers and null values, and modelling - 30 mins
5) Analysis of results and model selection - 5 mins
6) Analysis of residual error - 5 mins
7) Other insights - 5 mins

Sarat Tatineni

Senior Manager, Cognizant
I am a development manager, currently leading a team of data analysts for an insurance client. My areas of expertise include predictive analytics, modeling, business intelligence, big data and data warehousing. I work on platforms & languages such as Python, Scala, Spark ML, SQL...

Saturday October 20, 2018 3:00pm - 3:30pm PDT
Ballroom # 408A

3:30pm PDT

Using Big Data, Insight and Automation to Transform Industries
As Machine Learning becomes mainstream, companies and individuals are trying to figure out how to navigate the various options. There are opportunities, thanks to the availability of a massive amount of data (both personal and contextual) and information-crunching algorithms. The key is to gain insight that provides real benefits and to achieve automation that closely resembles our decision-making processes.

During my presentation, I will provide technical details and summarize how companies in various industries (healthcare, marketing, retail, distribution, automotive, agriculture, etc.) are leaping ahead of their competition by utilizing these added insights. I will also cover opportunities for consumers and how they stand to benefit from this revolution.

It's time Artificial Intelligence stops being just a buzzword.

Nadeem Moghal

Chief Data Office, AT&T
Technology and Operations Executive, leading strategic growth and multi-year roadmap. Strong background in designing large systems (by addressing the WHAT and WHY) and then working on actual implementation (the HOW). Experience at companies ranging from Fortune 100 (Paramount Pictures...

Saturday October 20, 2018 3:30pm - 4:00pm PDT
Ballroom # 408A

4:00pm PDT

Data Rescue – Guerrilla Data Capture for Extremely Large Data Sets
The US Government creates and maintains perhaps the largest datasets of any single government. But oftentimes that data is not easily available, or is not publicly available for various reasons. Hear the interesting escapades of how a group of citizen scientists harvested the single greatest treasure trove of raw scientific data in the United States, recovered from a harvest-day server crash, and got kicked off GoDaddy. What is the role of the public, the private sector, or the tech community in this important effort? Who are these people, anyway? We cover what we did, how we did it, and where the effort stands today.

Joan Saez

Chief Data Officer, CloudBIRST, Inc.
Joan Torres-Saez is the Chief Data Officer of CloudBIRST, Inc., where she manages huge datasets that provide meaningful insights on consumer behavior. She spearheaded the collection and topology of over 2,000 data points on every single city, town, county and zip code in the United...

Saturday October 20, 2018 4:00pm - 4:30pm PDT
Ballroom # 408A

4:30pm PDT

Realtime Analytics Leveraging Serverless Computing
1.) Introduction to serverless computing/architecture - Serverless computing is a cloud-computing execution model in which the cloud provider dynamically manages the allocation of machine resources. Building serverless applications means that your developers can focus on their core product instead of worrying about managing and operating servers or runtimes, either in the cloud or on-premises.
2.) How we used to generate our metrics in batch mode - Explains the nightly ETLs: collecting data throughout the day and running a nightly ETL with Spark and Scala to generate metrics (with diagram). Drawbacks of the batch approach - the lag (up to 24 hours) in reporting gave end users a poor experience, and as the client base grew, the ETL costs increased significantly. Finally, the overhead of managing and provisioning resources to run the ETLs was becoming a pain point.
3.) How to generate metrics in real time using serverless computing - We leveraged AWS Lambda; the real-time system is explained with a diagram. Advantages - pay per use, focus on the code, no servers or other resources to manage, lag reduced to 30 seconds, simplified infrastructure.
4.) Caveats of serverless computing - statelessness, limited native language support, vendor dependencies, concurrent execution limits, etc.

Saturday October 20, 2018 4:30pm - 5:00pm PDT
Ballroom # 408A