# Predictive Analytics

## 3 Things You Need to Know

Predictive analytics uses historical data to predict future events. Typically, historical data is used to build a mathematical model that captures important trends. That predictive model is then used on current data to predict what will happen next, or to suggest actions to take for optimal outcomes.

Predictive analytics has received a lot of attention in recent years due to advances in supporting technology, particularly in the areas of big data and machine learning.

### Rise of Big Data

Predictive analytics is often discussed in the context of big data. Engineering data, for example, comes from sensors, instruments, and connected systems out in the world. Business system data at a company might include transaction data, sales results, customer complaints, and marketing information. Increasingly, businesses make data-driven decisions based on this valuable trove of information.

### Increasing Competition

With increased competition, businesses seek an edge in bringing products and services to crowded markets. Data-driven predictive models can help companies solve long-standing problems in new ways.

Equipment manufacturers, for example, can find it hard to innovate in hardware alone. Product developers can add predictive capabilities to existing solutions to increase value to the customer. Using predictive analytics for equipment maintenance, or predictive maintenance, can anticipate equipment failures, forecast energy needs, and reduce operating costs. For example, sensors that measure vibrations in automotive parts can signal the need for maintenance before the vehicle fails on the road.

Companies also use predictive analytics to create more accurate forecasts, such as forecasting the demand for electricity on the electrical grid. These forecasts enable resource planning (for example, scheduling of various power plants) to be done more effectively.

### Cutting-Edge Technologies for Big Data and Machine Learning

To extract value from **big data**, businesses apply algorithms to large data sets using tools such as Hadoop and Spark. The data sources might consist of transactional databases, equipment log files, images, video, audio, sensor readings, or other types of data. Innovation often comes from combining data from several sources.

With all this data, tools are necessary to extract insights and trends. Machine learning techniques are used to find patterns in data and to build models that predict future outcomes. A variety of machine learning algorithms are available, including linear and nonlinear regression, neural networks, support vector machines, decision trees, and other algorithms.

### Predictive Analytics Examples

Predictive analytics helps teams in industries as diverse as finance, healthcare, pharmaceuticals, automotive, aerospace, and manufacturing.

**Automotive** – Breaking new ground with autonomous vehicles

Companies developing driver assistance technology and new autonomous vehicles use predictive analytics to analyze sensor data from connected vehicles and to build driver assistance algorithms.

**Aerospace** – Monitoring aircraft engine health

To improve aircraft up-time and reduce maintenance costs, an engine manufacturer created a real-time analytics application to predict subsystem performance for oil, fuel, liftoff, mechanical health, and controls.

**Energy Production** – Forecasting electricity price and demand

Sophisticated forecasting apps use models that monitor plant availability, historical trends, seasonality, and weather.

**Financial Services** – Developing credit risk models

Financial institutions use machine learning techniques and quantitative tools to predict credit risk.

**Industrial Automation and Machinery** – Predicting machine failures

A plastic and thin film producer saves 50,000 euros monthly using a health monitoring and predictive maintenance application that reduces downtime and minimizes waste.

**Medical Devices** – Using pattern-detection algorithms to spot asthma and COPD

An asthma management device records and analyzes patients' breathing sounds and provides instant feedback via a smartphone app to help patients manage asthma and COPD.

**Predictive analytics** is the process of using data analytics to make predictions based on data. This process uses data along with analysis, statistics, and **machine learning** techniques to create a predictive model for forecasting future events.

The term “predictive analytics” describes the application of a statistical or machine learning technique to create a quantitative prediction about the future. Frequently, supervised machine learning techniques are used to predict a future value (*How long can this machine run before requiring maintenance?*) or to estimate a probability (*How likely is this customer to default on a loan?*).

Predictive analytics starts with a business goal: to use data to reduce waste, save time, or cut costs. The process harnesses heterogeneous, often massive, data sets into models that can generate clear, actionable outcomes to support achieving that goal, such as less material waste, less stocked inventory, and manufactured product that meets specifications.

### Predictive Analytics Workflow

We are all familiar with predictive models for weather forecasting. A vital industry application of predictive models relates to energy load forecasting to predict energy demand. In this case, energy producers, grid operators, and traders need accurate forecasts of energy load to make decisions for managing loads in the electric grid. Vast amounts of data are available, and using predictive analytics, grid operators can turn this information into actionable insights.

### Step-by-Step Workflow for Predicting Energy Loads

Typically, the workflow for a predictive analytics application follows these basic steps:

**Import data from varied sources, such as web archives, databases, and spreadsheets.**

Data sources include energy load data in a CSV file and national weather data showing temperature and dew point.

**Clean the data by removing outliers and combining data sources.**

Identify data spikes, missing data, or anomalous points to remove from the data. Then aggregate different data sources together – in this case, creating a single table including energy load, temperature, and dew point.

**Develop an accurate predictive model based on the aggregated data using statistics, curve fitting tools, or machine learning.**

Energy forecasting is a complex process with many variables, so you might choose to use neural networks to build and train a predictive model. Iterate through your training data set to try different approaches. When the training is complete, you can try the model against new data to see how well it performs.

**Integrate the model into a load forecasting system in a production environment.**

Once you find a model that accurately forecasts the load, you can move it into your production system, making the analytics available to software programs or devices, including web apps, servers, or mobile devices.
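The cleaning and aggregation step of this workflow can be illustrated with a small sketch. It is written in plain Python for illustration only and is not the MATLAB tooling discussed elsewhere on this page. The function names (`remove_spikes`, `join_on_hour`), the field names, the outlier rule (median absolute deviation with a threshold of 5), and all data values are assumptions made up for the example.

```python
from statistics import median

def remove_spikes(rows, key, max_dev=5.0):
    """Drop rows whose value is missing or lies more than max_dev
    median absolute deviations (MADs) from the median."""
    present = [r for r in rows if r[key] is not None]
    values = [r[key] for r in present]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return present  # no spread to judge outliers against
    return [r for r in present if abs(r[key] - med) / mad <= max_dev]

def join_on_hour(load_rows, weather_rows):
    """Combine load and weather records that share the same hour."""
    weather_by_hour = {w["hour"]: w for w in weather_rows}
    table = []
    for row in load_rows:
        w = weather_by_hour.get(row["hour"])
        if w is not None:
            table.append({
                "hour": row["hour"],
                "load_mw": row["load_mw"],
                "temp_c": w["temp_c"],
                "dew_point_c": w["dew_point_c"],
            })
    return table

# Hypothetical hourly readings, invented for this example.
load = [
    {"hour": 0, "load_mw": 510.0},
    {"hour": 1, "load_mw": 495.0},
    {"hour": 2, "load_mw": None},    # missing reading
    {"hour": 3, "load_mw": 9999.0},  # data spike
    {"hour": 4, "load_mw": 530.0},
    {"hour": 5, "load_mw": 520.0},
]
weather = [
    {"hour": h, "temp_c": 10.0 + h, "dew_point_c": 5.0 + 0.5 * h}
    for h in range(6)
]

clean_load = remove_spikes(load, "load_mw")
table = join_on_hour(clean_load, weather)
for row in table:
    print(row)
```

The resulting single table of load, temperature, and dew point is the kind of input a model-building step would consume. In practice the outlier rule and the join key would depend on the data at hand.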

## Developing Predictive Models

Your aggregated data tells a complex story. To extract the insights it holds, you need an accurate predictive model.

**Predictive modeling** uses mathematical and computational methods to predict an event or outcome. These models forecast an outcome at some future state or time based upon changes to the model inputs. Using an iterative process, you develop the model using a training data set and then test and validate it to determine its accuracy for making predictions. You can try out different machine learning approaches to find the most effective model.

Examples include time-series regression models for predicting airline traffic volume, linear regression models that predict fuel efficiency from engine speed and load, and remaining-useful-life estimation models for prognostics.
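As a rough illustration of developing a model on a training set and then checking it against held-out data, the sketch below fits a one-variable linear regression by ordinary least squares and reports the root-mean-square error on test points. It uses plain Python rather than MATLAB. The data (engine load versus fuel efficiency) is synthetic and invented for the example, and the helper names `fit_line` and `rmse` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic data, made up for illustration: engine load (%) and fuel efficiency (km/l).
load_pct = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
km_per_l = [19.8, 19.1, 17.9, 17.2, 16.1, 14.8, 14.2, 13.1, 11.9, 11.0]

# Hold out every third point for testing; train on the rest.
pairs = list(zip(load_pct, km_per_l))
train = [p for i, p in enumerate(pairs) if i % 3 != 2]
test = [p for i, p in enumerate(pairs) if i % 3 == 2]

train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

slope, intercept = fit_line(train_x, train_y)
test_rmse = rmse(test_x, test_y, slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, test RMSE={test_rmse:.2f}")
```

Evaluating on points the model never saw during fitting is what makes the error figure a fair estimate of predictive accuracy. The same loop can be repeated with other model types to compare them.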

## Predictive Analytics vs. Prescriptive Analytics

Organizations that have successfully implemented predictive analytics see prescriptive analytics as the next frontier. Predictive analytics creates an estimate of what will happen next; *prescriptive* analytics tells you how to react in the best way possible given the prediction.

Prescriptive analytics is a branch of data analytics that uses predictive models to suggest actions to take for optimal outcomes. Prescriptive analytics relies on optimization and rules-based techniques for decision making. Forecasting the load on the electric grid over the next 24 hours is an example of *predictive analytics*, whereas deciding how to operate power plants based on this forecast represents *prescriptive analytics*.
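The distinction can be made concrete with a small sketch in plain Python. It takes a load forecast (the predictive output) as given and applies a simple rules-based decision: run the cheapest plants first until the forecast demand is covered. This "merit order" rule is a deliberately simplified stand-in for the optimization a real grid operator would use. The plant names, capacities, and costs are hypothetical.

```python
def dispatch(forecast_mw, plants):
    """Allocate forecast demand to plants in order of increasing cost.

    plants is a list of (name, capacity_mw, cost_per_mwh) tuples.
    Returns a dict mapping plant name to scheduled output in MW.
    """
    remaining = forecast_mw
    schedule = {}
    for name, capacity_mw, _cost in sorted(plants, key=lambda p: p[2]):
        output = min(capacity_mw, remaining)
        schedule[name] = output
        remaining -= output
    if remaining > 0:
        raise ValueError(f"forecast exceeds total capacity by {remaining} MW")
    return schedule

# Hypothetical plants: (name, capacity in MW, marginal cost per MWh).
plants = [
    ("gas_peaker", 200, 90.0),
    ("hydro", 150, 10.0),
    ("coal", 400, 40.0),
]

forecast_mw = 500  # assumed output of a predictive load model
schedule = dispatch(forecast_mw, plants)
print(schedule)
```

Here the forecast is the predictive part and the schedule is the prescriptive part. A production system would also account for constraints such as ramp rates and start-up costs, which this sketch ignores.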

## Interesting Predictive Analytics Examples with MATLAB

Companies are finding innovative ways to apply predictive analytics using MATLAB® to create new products and services, and to solve long-standing problems in new ways.

These examples illustrate predictive analytics in action:

### Baker Hughes Develops Predictive Maintenance Software for Gas and Oil Extraction Equipment Using Data Analytics and Machine Learning

Baker Hughes trucks are equipped with positive displacement pumps that inject a mixture of water and sand deep into drilled wells. With pumps accounting for about $100,000 of the $1.5 million total cost of the truck, Baker Hughes needed to determine when a pump was about to fail. They processed and analyzed up to a terabyte of data collected at 50,000 samples per second from sensors installed on 10 trucks operating in the field, and trained a neural network to use sensor data to predict pump failures. The software is expected to reduce maintenance costs by 30–40%—or more than $10 million.

### BuildingIQ Develops Proactive Algorithms for HVAC Energy Optimization in Large-Scale Buildings

Heating, ventilation, and air-conditioning (HVAC) systems in large-scale commercial buildings are often inefficient because they do not take into account changing weather patterns, variable energy costs, or the building’s thermal properties. BuildingIQ’s cloud-based software platform uses advanced algorithms to continuously process gigabytes of information from power meters, thermometers, and HVAC pressure sensors. Machine learning is used to segment data and determine the relative contributions of gas, electric, steam, and solar power to heating and cooling processes. Optimization is used to determine the best schedule for heating and cooling each building throughout the day. The BuildingIQ platform reduces HVAC energy consumption in large-scale commercial buildings by 10–25% during normal operation.

### Developing Detection Algorithms to Reduce False Alarms in Intensive Care Units

False alarms from electrocardiographs and other patient monitoring devices are a serious problem in intensive care units (ICUs). Noise from false alarms disturbs patients’ sleep, and frequent false alarms desensitize clinical staff to genuine warnings. Competitors in the PhysioNet/Computing in Cardiology Challenge were tasked with developing algorithms that could distinguish between true and false alarms in signals recorded by ICU monitoring devices. Czech Academy of Sciences researchers won first place in the real-time category of the challenge with MATLAB algorithms that can detect QRS complexes, distinguish between normal and ventricular heartbeats, and filter out false QRS complexes caused by cardiac pacemaker stimuli. The algorithms produced a true positive rate (TPR) and true negative rate (TNR) of 92% and 88%, respectively.
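For readers unfamiliar with the two metrics, the true positive rate is the fraction of genuine alarms correctly flagged as genuine, and the true negative rate is the fraction of false alarms correctly rejected. The short Python sketch below computes both from labeled outcomes. The function name `alarm_rates` and the sample labels are invented for illustration and have no connection to the challenge data or the winning algorithms.

```python
def alarm_rates(actual, predicted):
    """Return (TPR, TNR) for paired lists of booleans.

    True means "genuine alarm"; False means "false alarm".
    """
    tp = fn = tn = fp = 0
    for is_true, flagged_true in zip(actual, predicted):
        if is_true and flagged_true:
            tp += 1
        elif is_true and not flagged_true:
            fn += 1
        elif not is_true and not flagged_true:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn)  # sensitivity: genuine alarms kept
    tnr = tn / (tn + fp)  # specificity: false alarms suppressed
    return tpr, tnr

# Made-up example: 4 genuine alarms and 5 false alarms.
actual = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True]

tpr, tnr = alarm_rates(actual, predicted)
print(f"TPR={tpr:.2f}, TNR={tnr:.2f}")
```

Reporting both rates matters in this setting: suppressing more false alarms raises the TNR, but it is only useful if the TPR stays high enough that genuine warnings are not missed.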

To unlock the value of business and engineering data to make informed decisions, teams developing predictive analytics applications increasingly turn to MATLAB.

Using MATLAB tools and functions, you can perform predictive analytics with engineering, scientific, and field data, as well as business and transactional data. With MATLAB, you can deploy predictive applications to large-scale production systems and embedded systems.

### Why Use MATLAB for Predictive Analytics?

**MATLAB analytics work with both business and engineering data.**

MATLAB has native support for sensor, image, video, telemetry, binary, and other real-time formats. Explore this data using MATLAB tall arrays for Hadoop and Spark, and by connecting to ODBC/JDBC databases.

**MATLAB lets engineers do data science themselves.**

Enable your domain experts to do data science, with powerful tools to help them do machine learning, deep learning, statistics, optimization, signal analysis, and image processing.

**MATLAB analytics run in embedded systems.**

Develop analytics to run on embedded platforms by creating portable C and C++ code from MATLAB code.

**MATLAB analytics deploy to enterprise IT systems.**

MATLAB integrates into enterprise systems, clusters, and clouds, with a royalty-free deployable runtime.

### Your Data + MATLAB = Success with Predictive Analytics

In this simplified view, engineering data arrives from sensors, instruments, and connected systems out in the world. The data is collected and stored in a file system either in-house or in the cloud.

“No matter what industry our client is in, and no matter what data they ask us to analyze—text, audio, images, or video—MATLAB code enables us to provide clear results faster.”

Dr. G. Subrahamanya VRK Roo, Cognizant

This data is combined with data sourced from traditional business systems such as cost data, sales results, customer complaints, and marketing information.

After this, the analytics are developed by an engineer or domain expert using MATLAB. Preprocessing is almost always required to deal with missing data, outliers, or other unforeseen data quality issues. Following that, analytics methods such as statistics and machine learning are used to produce an “analytic”: a predictive model of your system.

To be useful, that predictive model is then deployed, either to a production IT environment, where it feeds a real-time transactional or IT system such as an e-commerce site, or to an embedded device: a sensor, a controller, or a smart system in the real world such as an autonomous vehicle.

Applying MATLAB and Simulink® as part of this architecture is ideal, because the tools enable easy deployment paths to embedded systems with Model-Based Design, or to IT systems with application deployment products.

“MATLAB has helped accelerate our R&D and deployment with its robust numerical algorithms, extensive visualization and analytics tools, reliable optimization routines, support for object-oriented programming, and ability to run in the cloud with our production Java applications.”

Borislav Savkovic, lead data scientist, BuildingIQ