Advanced analytics and predictive analytics
Potentials and processes of modern analysis methods
Advanced analytics is the autonomous or semi-autonomous analysis of data or content using sophisticated techniques and tools. This typically goes beyond traditional business intelligence (BI) in order to gain deeper insights, make predictions or generate recommendations. This advanced analytical process includes data mining methods, machine learning processes, neural networks and predictive analytics.
Areas of application for advanced analytics and predictive analytics
Advanced analytics describes data analyses that go beyond simple mathematical calculations such as totals and averages or filtering and sorting. These advanced analyses use mathematical and statistical formulas and algorithms to generate new information, recognize patterns and identify trends. Machine learning also plays a central role within advanced analytics. Typical areas of application for advanced analytics are:
- Segmentation (creation of groups based on similarities)
- Association (determination of the frequency of common occurrences)
- Classification (e.g. of elements not previously classified)
- Correlation analysis (identification of relationships)
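Two of these areas, association and correlation analysis, can be illustrated with a minimal sketch. The shopping baskets, the figures and the function names `pair_counts` and `pearson` are made up for illustration and are not taken from any particular product.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_counts(baskets):
    """Association: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("bread", "jam") and ("jam", "bread") are the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def pearson(xs, ys):
    """Correlation: Pearson coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample data.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "jam"],
    ["milk"],
]
counts = pair_counts(baskets)

ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [2.1, 3.9, 6.2, 7.8]
r = pearson(ad_spend, revenue)
```

A coefficient close to 1 or -1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.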
Predictive analytics focuses on identifying future events and their respective probabilities. This primarily involves using historical data to create a mathematical model and identify trends. This model is then applied to current data in order to make statements about future events. There are a large number of possible use cases for predictive analytics:
- Aerospace: Condition monitoring of engines and other important machine parts
- Energy production: Forecasting electricity demand and price
- Financial services: Prediction of credit risks
- Mechanical engineering and automation: Predicting failures
- Medicine: Pattern recognition algorithms for the detection of diseases
- Automotive industry: Development of driver assistance algorithms
Other general examples include forecasts of income, prices or sales, as well as of demand or customer benefit, for example to reduce contract cancellation rates. Predictive analytics processes also draw on big data and machine learning.
Workflow of predictive analytics
Within predictive analytics processes, mathematical models (predictive models) are created to determine current trends and then make predictions about future events. To create such models, these processes use data (including big data) in combination with analyses, statistics and machine learning methods.
Such predictions are used to optimize the use of resources, save time and reduce costs. Optimized timelines for the introduction of new products or services can also be created. The models developed in the process should help to achieve or support the goals set.
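The basic idea of building a model from historical data and applying it to a future period can be sketched with a simple linear trend. This is only an illustration under stated assumptions: the monthly sales figures are invented, the helper names `fit_linear_trend` and `forecast` are hypothetical, and real predictive models are usually more elaborate than a straight line.

```python
def fit_linear_trend(ys):
    """Fit y = intercept + slope * t by least squares for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
        / sum((t - mean_t) ** 2 for t in ts)
    )
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(model, t):
    """Apply the fitted trend to a (future) time index t."""
    intercept, slope = model
    return intercept + slope * t


# Hypothetical historical data: sales for five consecutive months.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]

model = fit_linear_trend(monthly_sales)
next_month = forecast(model, len(monthly_sales))  # index 5 is the sixth month
```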
The data basis
Step 1: Data import
First, all data relevant to the forecast is imported from various sources such as databases, web archives, spreadsheets or other types of files.
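As a rough sketch of this step, the following example reads customer records from three kinds of sources and brings them into one common structure. The sample contents, the `customers` table and the field names `customer_id` and `region` are assumptions made for illustration; in-memory stand-ins replace real files and databases so that the example is self-contained.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources (stand-ins for a spreadsheet export and a web archive).
CSV_TEXT = "customer_id,region\n1,north\n2,south\n"
JSON_TEXT = '[{"customer_id": 3, "region": "west"}]'


def load_csv(text):
    """Read rows from CSV text and convert the id to an integer."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer_id": int(r["customer_id"]), "region": r["region"]} for r in reader]


def load_json(text):
    """Read rows from a JSON array of objects."""
    return [{"customer_id": int(r["customer_id"]), "region": r["region"]} for r in json.loads(text)]


def load_sqlite(conn):
    """Read rows from a database table."""
    cursor = conn.execute("SELECT customer_id, region FROM customers")
    return [{"customer_id": cid, "region": region} for cid, region in cursor]


# Stand-in for an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.execute("INSERT INTO customers VALUES (4, 'east')")

records = load_csv(CSV_TEXT) + load_json(JSON_TEXT) + load_sqlite(conn)
conn.close()
```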
Step 2: Data preparation
To ensure that the analysis produces valuable results, the imported data is first prepared. This includes removing outliers, identifying missing data and combining different data sources.
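The three preparation tasks named above can be sketched as follows. The order and region data are invented, and the outlier rule (a modified z-score based on the median absolute deviation with a cutoff of 3.5) is one common convention chosen here as an assumption, not a method prescribed by the text.

```python
from statistics import median


def split_missing(rows, field):
    """Identify missing data: separate complete rows from rows lacking `field`."""
    complete = [r for r in rows if r.get(field) is not None]
    missing = [r for r in rows if r.get(field) is None]
    return complete, missing


def remove_outliers(rows, field, cutoff=3.5):
    """Drop rows whose value lies far from the median (modified z-score)."""
    values = [r[field] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(rows)  # no spread around the median, nothing to judge
    return [r for r in rows if 0.6745 * abs(r[field] - med) / mad <= cutoff]


def join_on(rows, lookup, key):
    """Combine sources: enrich each row with the matching lookup entry."""
    return [{**r, **lookup[r[key]]} for r in rows if r[key] in lookup]


# Hypothetical imported data.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 22.0},
    {"customer_id": 3, "amount": None},   # missing value
    {"customer_id": 4, "amount": 19.0},
    {"customer_id": 5, "amount": 21.0},
    {"customer_id": 6, "amount": 500.0},  # suspicious outlier
]
regions = {
    1: {"region": "north"},
    2: {"region": "south"},
    4: {"region": "east"},
    5: {"region": "west"},
    6: {"region": "north"},
}

complete, missing = split_missing(orders, "amount")
cleaned = remove_outliers(complete, "amount")
prepared = join_on(cleaned, regions, "customer_id")
```

Whether an extreme value is an error or a genuine observation depends on the use case, so removed rows should be reviewed rather than discarded silently.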
The model
Step 3: Development of the predictive model
Supervised machine learning methods are often used in the development of the predictive model. Supervised learning is one of the main types of machine learning. Here, an algorithm is trained on a data set in order to make predictions. This so-called training data set contains input data and the corresponding response values. The supervised learning algorithm uses these pairs to build a model that can predict the response values for a new data set. Larger training data sets often lead to models with higher predictive power that generalize well to new data.
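A minimal sketch of supervised learning is shown below, using a nearest-centroid classifier as a deliberately simple stand-in for the methods used in practice. The training data (usage hours and support tickets per customer, with a label as response value) and the names `train` and `predict` are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Build the model: one centroid (mean feature vector) per response value."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Predict the response value whose centroid is closest to the input."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (monthly usage hours, support tickets) -> response value.
training_set = [
    ((40.0, 0.0), "stays"),
    ((35.0, 1.0), "stays"),
    ((38.0, 0.0), "stays"),
    ((5.0, 4.0), "cancels"),
    ((8.0, 5.0), "cancels"),
    ((3.0, 3.0), "cancels"),
]

model = train(training_set)

# New data set without known response values.
new_customers = [(36.0, 1.0), (6.0, 4.0)]
predictions = [predict(model, c) for c in new_customers]
```

In a real project, part of the labeled data would be held back to evaluate the model before it is used on new data.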
Step 4: Integration of the model into the system
Once a suitable model has been developed using machine learning techniques, it is then implemented in the company environment or in a production system. This makes analyses available for other software programs and devices, such as server applications, mobile devices, web applications and enterprise systems.
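One simple way to hand a model over to other programs is to export its parameters in a neutral format and load them in the consuming application. The sketch below assumes a linear trend model and a JSON payload with the hypothetical keys `type`, `intercept` and `slope`; actual deployments often rely on dedicated services or model formats instead.

```python
import json


def export_model(intercept, slope):
    """Serialize the trained model's parameters for use by other systems."""
    return json.dumps({"type": "linear_trend", "intercept": intercept, "slope": slope})


def load_model(payload):
    """Rebuild a prediction function from an exported payload."""
    params = json.loads(payload)
    if params["type"] != "linear_trend":
        raise ValueError(f"unsupported model type: {params['type']}")

    def predict(t):
        return params["intercept"] + params["slope"] * t

    return predict


# On the analytics side: export the (hypothetical) trained parameters.
payload = export_model(intercept=100.0, slope=10.0)

# On the application side: load the model and request a prediction.
predict = load_model(payload)
result = predict(5)
```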
This workflow is similar to the iterative process of the CRISP-DM model, the CRoss-Industry Standard Process for Data Mining. This cross-industry model describes the underlying process behind every data analysis project in six phases. The six phases are:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment (provision)
Within the model, the phases do not run in strict succession, but alternate and often repeat themselves.
The next step: prescriptive analytics
After the successful implementation of predictive analytics, companies often aim to introduce prescriptive analytics. In addition to the prediction function of predictive models, prescriptive analytics also provides a recommendation on how best to react to certain future events.
One example of a prescriptive analysis is the determination of production and stock levels that coincide with a predicted demand. Prescriptive analytics processes make it possible to provide recommendations for action, for example how much merchandise individual sales locations should store in order to react efficiently to the corresponding forecast.
The predictive models can therefore be extended so that they not only predict events but also derive actions that lead to optimal results when these events occur.
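The step from a forecast to a recommendation can be sketched with the stock level example. The sketch assumes that the predictive model supplies a few demand scenarios with probabilities for each sales location, and it recommends the stock level with the highest expected profit. The scenarios, prices, costs and function names are invented for illustration, and unsold goods are assumed to be worthless.

```python
def expected_profit(stock, scenarios, unit_price, unit_cost):
    """Expected profit for a stock level, given (demand, probability) scenarios."""
    return sum(
        probability * (unit_price * min(stock, demand) - unit_cost * stock)
        for demand, probability in scenarios
    )


def recommend_stock(scenarios, unit_price, unit_cost):
    """Try every stock level up to the highest forecast demand and keep the best."""
    candidates = range(0, max(demand for demand, _ in scenarios) + 1)
    return max(
        candidates,
        key=lambda stock: expected_profit(stock, scenarios, unit_price, unit_cost),
    )


# Hypothetical demand forecast per sales location: (demand, probability).
demand_forecast = {
    "store_a": [(80, 0.2), (100, 0.5), (120, 0.3)],
    "store_b": [(20, 0.5), (40, 0.5)],
}

recommendations = {
    store: recommend_stock(scenarios, unit_price=10.0, unit_cost=6.0)
    for store, scenarios in demand_forecast.items()
}
```

With these assumed figures, stocking beyond the middle scenario in store_a does not pay off, because the chance of selling the extra units no longer covers their cost.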
The challenges of advanced analytics
Conventional BI reports often only depict data and therefore only visualize an actual state. If the data set is of high quality, the reports are very likely to be reliable, especially since most modern BI environments are now quite mature and their reporting methods and concepts have reached a high level of development. With advanced analytics, however, there is no guarantee that the desired results will always be delivered.
Today, a large number of standard algorithms and standard methods are available for specific use cases, such as customer classification. The search for the most suitable solution for a data set depends heavily on the skills of the user and the software used. However, it is also possible for algorithms to fail due to missing or incorrect data. If an advanced analytics process shows that no results can be found, the process should be aborted and the data reprocessed.
In addition, users of advanced analytics should have knowledge of methods for working with probabilities. While traditional BI reporting almost always delivers the right figures, specialist users have to interpret the probabilities generated by advanced analytics. For example, the quality of a sales forecast or customer classification must not only be noted and communicated for each individual analysis, but also continuously monitored and optimized.
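Continuous monitoring of forecast quality can be sketched as follows. The example uses the mean absolute percentage error (MAPE) as quality measure and flags periods that exceed a threshold; the measure, the threshold of 10 percent, the period labels and the figures are assumptions made for illustration.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error; actual values must be non-zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(errors) / len(errors)


def periods_needing_review(history, threshold=10.0):
    """Return the periods whose forecast error exceeds the threshold (in percent)."""
    return [
        period
        for period, (actuals, forecasts) in history.items()
        if mape(actuals, forecasts) > threshold
    ]


# Hypothetical sales history: period -> (actual values, forecast values).
history = {
    "Q1": ([100.0, 200.0], [95.0, 210.0]),
    "Q2": ([100.0, 200.0], [80.0, 250.0]),
}

flagged = periods_needing_review(history)
```

A flagged period is a prompt to check the input data and, if necessary, retrain or adjust the model.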