Case Studies

Big Data in Health
Juergen A. Klenk, PhD, Principal Scientist, Exponent®, Inc.
Erik Brynjolfsson of MIT’s Sloan School of Management found a remarkable correlation between an organization’s performance and its data-driven decision making (DDD) ability, demonstrating the concrete value of data. Fueled by findings like this, hospital systems, health insurance providers, pharmaceutical and biotech firms, and medical device manufacturers alike now have high expectations of generating value from Big Data. Indeed, we are nearing the “Peak of Inflated Expectations” in Gartner’s Hype Cycle, and organizations must quickly learn to develop and implement Big Data strategies. In this talk, we will review the key components of a Big Data strategy and demonstrate how it can be implemented successfully to benefit your organization, whether you are interested in improving the quality of your healthcare delivery, the safety and benefits of your drugs and devices, the effective management of your operations, or the control of costs.

Big Data in Practice - Scientific Information as a Business Asset – Driving Productivity at Merck Research Labs Through Novel Approaches to Scientific Information Management
John Erik Koch, Director, Informatics, Merck & Co., Inc.
BioPharma companies often struggle to manage scientific information – study results, analyses and historical records are lost due to poor information management practices and failure to steward information in a way that can be leveraged for future purposes. We will share examples of our strategy, execution and progress to date for improving information management through a set of innovative capabilities focused on information Search, Access and Analytics.

Big Data in Official Statistics
Cavan Capps, Big Data Lead, U. S. Census Bureau
Big Data presents both challenges and opportunities for the official statistical community. Difficult issues of privacy, statistical reliability, and methodological transparency will need to be addressed before the official statistical community can make full use of Big Data. The potential opportunities include improved statistical coverage at small geographies, new statistical measures, and more timely data, perhaps at lower cost. This talk will provide an overview of some of the research being done by the Census Bureau as it explores the use of “Big Data” for statistical agency purposes.

Conception to Deployment of Big Data Projects at a Large Financial Institution
Mike Aguiling, Head of Big Data Technology, JP Morgan Intelligent Solutions
What does it take to directly impact top-line revenue with Big Data projects? What are the typical hurdles to overcome as a project moves from research to direct integration into an existing business? We review the lifecycle, from conception to deployment, of Big Data projects at a large financial institution. The talk will cover project concept and design, guidelines for Big Data project development, a high-level overview of a ‘Big Database’, and final project deployment. The focus will be on a specific business use case, the analytic methodology used, and the measurable outcome. In addition, we will review lessons learned and typical challenges faced working as the “Big Data Team” in a large business, and briefly cover the regulatory hurdles framing Big Data solutions specific to financial institutions.

Big Data Analytics Application to Jet Engine Diagnostics
Link C. Jaw, Fellow, Intel Corporation
The “explosion of information” we have witnessed is driven not only by the large number of data sources, but also by the large amount of data originating from those sources. Together they have created the so-called “big data” environment characterized by variety, volume, and velocity. The challenge of information explosion is how to extract the right information at the right time from the big data environment in which we live. Extracting the right information from the data is an analytic process, one that uses data platforms and analytic algorithms to spot trends and patterns and derive predictive indicators. These indicators are then used to make recommendations or to take timely action. In this case study, an aircraft engine diagnostic problem is described and a solution is discussed. Specifically, the discussion will include:

  • Background of the aviation industry.
  • Characteristics of the machine diagnostic problem.
  • Data elements.
  • Algorithms.
  • Results of analytics.

This problem is analogous to a large class of machine monitoring and control problems; hence, the solution approach discussed here is applicable across industrial sectors.
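As a rough illustration of the trend-spotting step the abstract describes (not the speaker’s actual algorithm), one common pattern is to smooth a sensor residual with an exponentially weighted moving average and flag sustained deviations; the sensor, parameters, and data below are entirely hypothetical:

```python
import numpy as np

def ewma_health_indicator(signal, alpha=0.1, threshold=3.0):
    """Smooth a sensor residual with an exponentially weighted moving
    average (EWMA), then flag points where the smoothed value exceeds
    `threshold` times the standard deviation of the baseline half."""
    ewma = np.empty_like(signal, dtype=float)
    ewma[0] = signal[0]
    for t in range(1, len(signal)):
        ewma[t] = alpha * signal[t] + (1 - alpha) * ewma[t - 1]
    baseline_sd = np.std(signal[: len(signal) // 2])
    return ewma, np.abs(ewma) > threshold * baseline_sd

# Synthetic exhaust-gas-temperature residuals: healthy noise followed
# by a slow drift, as might precede a component fault.
residuals = np.concatenate([
    np.random.default_rng(1).normal(0.0, 1.0, 100),  # healthy period
    np.linspace(0.0, 8.0, 100),                      # developing drift
])
ewma, alarms = ewma_health_indicator(residuals)
```

The smoothing suppresses one-off noise spikes, so alarms fire only when the indicator drifts persistently, which is the behavior a predictive maintenance indicator needs.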

A Case Study on Grid Data Analytics
Marina Thottan, Director, Bell Laboratories, Alcatel-Lucent
Smart grid evolution involves tremendous growth in sensor deployment in both the distribution and the transmission grid, and an exponential growth in the availability of structured and unstructured data collected from sensors and from external sources such as demographic and environmental data. To realize the return on investment in smart grid technology, it is important to maximize the benefit of the substantial amount of data flowing into utility control centers. The wealth of data obtained can be leveraged to extend visibility into the grid from the traditional boundaries of the Supervisory Control and Data Acquisition (SCADA) network (i.e., substations and transformers) all the way down to the end user’s premises. Utilities must develop a data analytics strategy for timely and precise analysis of grid data in order to define novel business and operational applications that will greatly enhance grid operations. In this talk we will describe the design of an online grid data analytics system, along with lessons learned.

Improving Efficiency of Google’s Infrastructure Using Big Data Tools
Behdad Masih-Tehrani, PhD, Quantitative Analyst, Google
In this talk, we will review how Google’s big data tools are used internally to improve the efficiency of Google’s infrastructure. Every minute, hundreds of thousands of servers across Google’s fleet report performance metrics for hardware components such as CPU, disk, memory, and network bandwidth usage, and these metrics are saved in different formats, in different databases, across multiple datacenters. Nevertheless, as consumers of the data, Google’s data analysts can easily merge and aggregate these metrics and apply data mining techniques to identify and address inefficiencies in resource deployment. In the first case study, we will show how we have used Google’s application programming interfaces to merge performance metrics and forecast the required networking capacity at the server, cluster, and datacenter levels as a function of other hardware resources such as CPU and disk. In the second case study, we share our experience setting up a bandwidth monitoring dashboard used to inform flash deployment decisions.
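The capacity-forecasting idea can be sketched, purely as an illustration and not Google’s actual pipeline, as a linear regression of network bandwidth on CPU and disk utilization; all data here is synthetic:

```python
import numpy as np

# Synthetic per-server metrics: bandwidth (Gbps) generated as a linear
# function of CPU and disk utilization plus noise.
rng = np.random.default_rng(0)
cpu = rng.uniform(0.2, 0.9, 200)    # CPU utilization fraction
disk = rng.uniform(0.1, 0.8, 200)   # disk utilization fraction
bandwidth = 3.0 * cpu + 1.5 * disk + rng.normal(0.0, 0.05, 200)

# Least-squares fit: bandwidth ~ a*cpu + b*disk + c.
X = np.column_stack([cpu, disk, np.ones_like(cpu)])
coef, *_ = np.linalg.lstsq(X, bandwidth, rcond=None)

# Forecast bandwidth for a planned server profile (70% CPU, 50% disk).
forecast = coef @ np.array([0.7, 0.5, 1.0])
```

Once a model like this is fitted per cluster, capacity planners can forecast networking needs directly from planned CPU and disk deployments.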

Optimal Fusion for Predictive Road-way Traffic Speeds
Toby Tennent, HERE, a Nokia Business – Connected Drive, Core Components and Algorithms
Estimating short-term future traffic conditions (15 minutes to 12 hours ahead) can be accomplished using a multitude of methods. In this case study, we discuss our development of a principled algorithmic approach to fusing different estimates of future traffic speed conditions into an ‘optimal estimate’. Data sparseness, noise, and competing model inputs were issues to overcome. In our case, through use of a Minimum Variance Unbiased Estimator (MVUE) approach to data fusion, we were able to see a 25%+ improvement over our legacy approach. Lessons from this case study can be applied to many applications involving current and predictive state modelling.
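A minimal sketch of the MVUE idea for fusing independent unbiased estimates is inverse-variance weighting; the speeds and variances below are hypothetical examples, not HERE’s production values:

```python
import numpy as np

def fuse_mvue(estimates, variances):
    """Fuse independent unbiased estimates by inverse-variance weighting.

    For independent unbiased estimators, weighting each estimate by the
    inverse of its variance yields the minimum-variance unbiased
    combination; the fused variance is the harmonic combination
    1 / sum(1 / var_i), always below the best single input.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = (1.0 / variances) / np.sum(1.0 / variances)
    fused = float(np.sum(weights * estimates))
    fused_var = float(1.0 / np.sum(1.0 / variances))
    return fused, fused_var

# Two hypothetical speed estimates (km/h) for one road segment:
# a historical model (noisier) and a real-time probe feed.
speed, var = fuse_mvue([62.0, 55.0], [25.0, 9.0])
```

The fused estimate lands between the inputs, pulled toward the lower-variance source, and its variance is smaller than either input’s, which is what makes the fusion worthwhile even when the individual predictors disagree.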

Optimizing Media Purchasing Through Big Data
Alan Papir, Software Engineer, Analytics Media Group (AMG)
Analytics Media Group (AMG) is using lessons learned from the 2012 Obama presidential campaign to bring new, data-driven insights into the world of media buying. Using various modeling and data mining techniques in conjunction with large and rich datasets (such as billions of set top box records), AMG discovers who is most likely to "convert" to a product or candidate at the person level. AMG then takes these desirable targets and uses a trove of set top box data to produce a near-optimal solution to the problem of purchasing the most valuable placements given a limited budget (a multi-objective variation of the knapsack problem). This presentation will cover some of AMG’s techniques for identifying targets, strategies for efficiently storing and retrieving tens of billions of TV viewing records, and heuristics for finding a near-optimal media buy plan. AMG has been featured on the cover of the New York Times magazine as well as in Bloomberg, Politico, the Cook Political Report and elsewhere.
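To illustrate the knapsack framing (this is a generic greedy baseline, not AMG’s actual heuristic, and the slot names and numbers are invented):

```python
def greedy_media_buy(placements, budget):
    """Greedy heuristic for budgeted placement selection: buy placements
    in descending value-per-dollar order until the budget is exhausted.
    Not optimal for the knapsack problem, but a standard fast baseline."""
    ranked = sorted(placements, key=lambda p: p["value"] / p["cost"],
                    reverse=True)
    plan, spent = [], 0.0
    for p in ranked:
        if spent + p["cost"] <= budget:
            plan.append(p["slot"])
            spent += p["cost"]
    return plan, spent

# Hypothetical placements: value = expected impressions among targets.
slots = [
    {"slot": "late-night cable",   "cost": 500,  "value": 8000},
    {"slot": "primetime network",  "cost": 5000, "value": 30000},
    {"slot": "daytime local",      "cost": 800,  "value": 9000},
]
plan, spent = greedy_media_buy(slots, budget=1500)
```

On a tight budget this picks the two cheap, efficient slots over the expensive primetime buy; a production system would refine such a plan with exact or metaheuristic search over multiple objectives.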

Smart Machine Learning to Lend to Small Businesses
Pinar Donmez, PhD (Computer Science, CMU), Chief Data Scientist, Kabbage, Inc.
The way traditional financial institutions underwrite loans and manage risk has been changing dramatically with the help of emerging technologies. Consumer and business loans, along with other areas of credit scoring, have been disrupted by interdisciplinary approaches combining risk management with machine learning and advanced data analytics. The talk will cover how we turn data into decisions for underwriting and dynamic risk management at Kabbage, Inc.

Developing a Lean Data Science Strategy
Joel Horwitz, Director of Products and Marketing, Alpine Data Labs
When developing a data science project strategy, the scope, tools, and team are crucial. In Lean Data Science projects, these considerations require special attention before the engagement kicks off. We will present possible paths to success by focusing on topics including:

  • Setting “bite-sized” project goals
  • Using tools that make both the process and its information visible
  • Utilizing a cross-functional team

When developing a process, it is important to map out the path to a minimum viable product so that critical points of failure are evaluated early.