Big Data Analytics for Manufacturing - OptimalPlus

Why applying machine learning to smart manufacturing is such a powerful move

Smart manufacturing lines collect information from a myriad of sources, including equipment, sensors, and systems, often generating more data than manufacturers know what to do with. In recent years, machine learning has emerged to help make sense of it all.

Machine learning algorithms can be trained on huge data sets to draw insights from them and make predictions about data they have not yet seen. This makes them a natural fit for manufacturers, who typically generate dizzying amounts of data every day. By applying machine learning to production lines on the factory floor, companies reap major benefits. These include process optimization, higher product quality, fewer escapes (defective parts that reach customers), and lower scrap rates.

And yet, despite such advantages, most manufacturers today are not getting the most out of machine learning. One of the reasons for this is that ML is not just about the algorithms; it also depends on setting up an infrastructure that can support the full machine learning lifecycle from beginning to end.

In this blog post, we explain why machine learning is so powerful for manufacturing and walk through the machine learning lifecycle and its challenges. We’ll then introduce OptimalPlus, a solution that automates the entire machine learning lifecycle. It allows companies to roll out machine learning capabilities with agility and extend them to all aspects of production.

Machine learning in smart manufacturing: what’s all the excitement about?

Did you know that by using ML, manufacturers can:

Reduce scrap by 75%
Cut unplanned downtime by 50%
Enhance overall equipment effectiveness by 25%

Against the backdrop of Industry 4.0, IDC forecasts that by 2025 there will be 41.6 billion connected IoT devices generating 79.4 zettabytes of data. For manufacturers, somewhere in this sea of data there are valuable lessons about operations. But how do you get to them? Most enterprises struggle to extract value from the data they collect: on average, 60% to 73% of company data goes unused. This is where machine learning comes in.

Machine learning algorithms can ingest vast data sets and flag anomalies and defects of all kinds across production. Because they are trained and retrained on huge volumes of data, the algorithms can reach very high levels of precision. For instance, AI-based machines can identify faulty parts with up to 90% more accuracy than humans.

Machine learning can be applied to almost every area of manufacturing, including predictive maintenance, supply chain management, inventory, product quality, and scrap reduction. According to DataRPM, using machine learning for anomaly detection and prediction in manufacturing can reduce scrap by 75%, cut unplanned downtime in half, and improve overall equipment effectiveness by 25%. In automotive, malfunctions can be identified 75% faster, and assembly line uptime can be improved by 35%. That is an exceptional result for an industry in which unplanned downtime can cost $20,000 a minute. According to McKinsey, machine learning can improve yields by up to 30% in the semiconductor industry and reduce supply-chain forecasting errors by 50%.

Manufacturing Big Improvements

Slashing unplanned downtime by 50%
Reducing scrap production by 75%
Improving overall equipment effectiveness by 25%

Value-Driven Benefits

Optimizing spare part availability by 13%
Diagnosing potential malfunctions 75% faster
Improving assembly line uptime by 35%

Source: DataRPM, “Anomaly Detection & Prediction Decoded: 6 Industries, Copious Challenges, Extraordinary Impact”

Despite this great potential, few companies actually use machine learning on the factory floor. According to TrendForce, most manufacturing companies adopt machine learning only gradually. They spend years and make long-term investments at each adoption stage. Why is this the case?

An important thing to understand about ML is that the code behind it is actually the smallest part of the problem. For machine learning to be deployed successfully, an infrastructure is needed that supports the entire machine learning lifecycle. Let’s take a look at what this lifecycle entails and what difficulties can arise when setting it up.

The machine learning lifecycle: details & challenges

Machine learning algorithms can’t create an impact in a vacuum; they need to be supported by an infrastructure that bridges the machine learning code with business requirements and actual processes on the factory floor. To this end, machine learning can be conceptualized as a complete lifecycle with four interconnected stages: learn, act, validate, and adapt.

Learn

Every machine learning model starts by learning from existing data. To this end, data scientists must prepare the data sets on which the models will be trained. This mundane task takes a surprisingly large amount of time: obtaining, cleaning, and aligning data takes up 50% to 80% of data scientists’ time.
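
To make the alignment step concrete, here is a minimal Python sketch. It assumes two hypothetical sources, electrical test results and welding parameters, that share a unit serial number. The field names, values, and the join-then-drop-incomplete-rows policy are illustrative assumptions, not how any particular platform does it.

```python
# Hypothetical raw records from two sources on the line (names and values are illustrative).
test_results = [
    {"serial": "A1", "test_current_ma": 12.1},
    {"serial": "A2", "test_current_ma": None},  # sensor dropped the reading
    {"serial": "A3", "test_current_ma": 11.8},
]
process_params = [
    {"serial": "A1", "weld_temp_c": 351.0},
    {"serial": "A3", "weld_temp_c": 349.5},
    {"serial": "A4", "weld_temp_c": 352.2},  # no matching test record
]


def align(tests, params):
    """Join the two sources on serial number and drop incomplete rows."""
    params_by_serial = {p["serial"]: p for p in params}
    rows = []
    for t in tests:
        p = params_by_serial.get(t["serial"])
        if p is None:
            continue  # unit has no process data, so it cannot be aligned
        row = {**t, **p}
        if any(v is None for v in row.values()):
            continue  # simple cleaning rule: skip rows with missing values
        rows.append(row)
    return rows


aligned = align(test_results, process_params)
print(aligned)
```

Only units A1 and A3 survive. A2 has no process record (and a missing reading), and A4 has no test record. Real lines usually also need timestamp matching and unit conversion, which this sketch leaves out.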

Extracting features from the data is particularly complex and can be done with many different techniques. At this stage, data scientists identify which parameters in the data are important and look for efficient ways to calculate them. In semiconductor manufacturing, for instance, each die has an x and y location. One important feature to calculate would be how far each die is from the center of the wafer.
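
The die-distance feature mentioned above can be sketched in a few lines of Python. This sketch assumes die positions are given as integer grid indices with a known center die and a known die pitch in millimetres. The function name and the 2 mm pitch are hypothetical.

```python
import math


def radial_distance_mm(x, y, center_x, center_y, pitch_x_mm, pitch_y_mm):
    """Distance of a die from the wafer center in millimetres.

    x, y are die grid indices; the pitch values convert grid steps to millimetres.
    """
    dx = (x - center_x) * pitch_x_mm
    dy = (y - center_y) * pitch_y_mm
    return math.hypot(dx, dy)


# Hypothetical wafer map: die indices relative to a center die at (0, 0), 2 mm pitch.
dies = [(0, 0), (3, 4), (-6, 8)]
distances = [radial_distance_mm(x, y, 0, 0, 2.0, 2.0) for x, y in dies]
print(distances)  # [0.0, 10.0, 20.0]
```

A feature like this lets a model pick up radial effects directly. For example, it can learn that dies near the wafer edge fail more often, a pattern that raw x and y coordinates express less directly.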

Another issue to consider is that there are many different data science tools on the market. Data scientists typically have strong preferences for certain environments, frameworks, and languages, such as Jupyter Notebooks, TensorFlow, PyTorch, Python, and R. To deploy their models, data scientists need systems that work alongside these specific tools and provide both the data and the deployment framework seamlessly.

Act

Once the machine learning algorithms have been trained, data scientists can waste a lot of time working out the kinks to get their model into production. According to one study, 40% of companies reported spending over one month to deploy just one machine learning model.

Furthermore, for machine learning algorithms to create value across the entire supply chain, they must integrate directly with the factory floor, informing manufacturing execution system (MES) decisions in near real time. This integration is often complicated further because it can span systems and equipment distributed across multiple factories, or even outsourced to other suppliers.

Validate

Even when the machine learning model is up and running, a data scientist’s job is far from over. As Google researchers pointed out in the paper “Hidden Technical Debt in Machine Learning Systems,” machine learning systems can incur a large amount of technical debt.

Machine learning models can be quite fragile. Just because they were successful once doesn’t mean they will be again when conditions are slightly altered. As a result, data scientists must be extra vigilant, monitoring production models constantly. Instead of moving on to new projects, data scientists often find themselves in a race to collect more data, fix problems in the feature extraction code, detect errors in the model, reproduce results to ensure the model is still working, and manage dependencies within the model.
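
One common way to watch for changing conditions is to compare the distribution of each model input in production against its distribution at training time. The Python sketch below uses the Population Stability Index (PSI). PSI is a drift statistic chosen here for illustration; this post does not name it. The bin edges, the data, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions.

```python
import bisect
import math


def _proportions(values, edges, eps):
    """Share of values in each bin; out-of-range values are clamped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / total, eps) for c in counts]


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index of `current` relative to `reference`."""
    ref_p = _proportions(reference, edges, eps)
    cur_p = _proportions(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical input feature: values seen at training time vs. in production.
training = [0.1 * i for i in range(100)]   # roughly uniform on [0, 10)
production = [v + 3.0 for v in training]   # the process has shifted upward
edges = [0.0, 2.5, 5.0, 7.5, 10.0]

DRIFT_THRESHOLD = 0.2  # common rule of thumb, not a universal constant
print(psi(training, training, edges))                       # 0.0
print(psi(training, production, edges) > DRIFT_THRESHOLD)   # True
```

A check like this would run per input feature on each new batch. A value above the threshold would prompt a data scientist, or an automated process, to investigate before the model's decisions degrade.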

Adapt

Finally, for machine learning models to generate meaningful insights, they must constantly adapt to remain in tune with business objectives and the changing processes on the factory floor. Changes in production inevitably cause models to go stale.

One of the greatest strengths of machine learning models is that they can keep relearning from new data. In automotive, for example, if a recall occurs, the OEM could run data about the defective product through the machine learning algorithms to create a failure signature. Going forward, the model would then flag this type of problem on the production line, preventing future escapes. Still, this crucial relearning process is often manual, costing additional time and effort.
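
A failure signature can take many forms. As a deliberately simple stand-in, the Python sketch below treats a signature as the centroid of the recalled units' feature vectors plus a match radius. "Relearning" here means adding that signature to the set the line screens against. The feature values, radius, and signature name are hypothetical. A production system would more likely retrain a classifier on the newly labeled data.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def make_signature(name, failed_units, radius):
    """Hypothetical failure signature: centroid of recalled units plus a match radius."""
    return {"name": name, "center": centroid(failed_units), "radius": radius}


def screen(unit, signatures):
    """Return the names of all known failure signatures that this unit matches."""
    return [s["name"] for s in signatures
            if math.dist(s["center"], unit) <= s["radius"]]


# Hypothetical features of recalled units, e.g. (weld temperature deviation, cycle-time deviation).
recalled = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2]]

signatures = []  # the current set of known failure modes
signatures.append(make_signature("weld_recall", recalled, radius=0.5))  # the relearning step

print(screen([2.1, 0.9], signatures))  # ['weld_recall']
print(screen([0.0, 0.1], signatures))  # []
```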

The four stages presented above are interconnected, and each poses serious challenges. Without a solution that automates the entire machine learning lifecycle, adoption will be slow and limited. Teams will be confined to a small number of localized projects, with results that fall short of what is possible.

Automating the ML lifecycle with OptimalPlus

OptimalPlus is an end-to-end big-data analytics platform that also automates the entire machine learning lifecycle. At the learning stage, the solution ensures data is collected and harmonized continuously and is available at the click of a button.

In addition, OptimalPlus equips its customers with creative and state-of-the-art feature extraction. Using advanced capabilities such as geographic and parametric outlier detection, it extracts meaningful features from the raw data taken from products, machinery, inspections, and processes on the factory floor. These data are combined to calculate domain-driven engineering features such as wafer geography, welding parameters, and cycle time. Data science teams can then apply statistical methods such as normalization and principal component analysis (PCA) to this information, extracting meaningful features for the machine learning model. OptimalPlus also integrates seamlessly with multiple data science tools, allowing teams to do this work within their chosen environments.
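
Normalization and PCA are standard steps, and in practice teams would typically call a library such as scikit-learn (`StandardScaler`, `PCA`). To show what those steps actually compute, here is a dependency-free Python sketch. It z-scores each engineering feature, builds the covariance matrix, and finds the first principal component by power iteration. The two weld features and their values are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero on constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows (population normalization)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def leading_component(cov, iters=100):
    """Leading eigenvector of a symmetric matrix via power iteration.

    Assumes the matrix is non-zero and the start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def explained_variance_ratio(cov, v):
    """Share of total variance captured by unit vector v: (v^T C v) / trace(C)."""
    d = len(cov)
    along_v = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return along_v / sum(cov[i][i] for i in range(d))


# Hypothetical engineering features per weld: temperature (°C) and current (A).
rows = [[350.0, 10.0], [352.0, 10.4], [354.0, 10.8], [356.0, 11.2]]
z = zscore_columns(rows)
cov = covariance(z)
pc1 = leading_component(cov)
scores = [sum(a * b for a, b in zip(r, pc1)) for r in z]  # one derived feature per weld
explained = explained_variance_ratio(cov, pc1)
print(pc1, explained)
```

The two hypothetical features move together perfectly. As a result, the first component weights them equally (about 0.707 each) and captures essentially all of the variance, so a model could use one score instead of two correlated inputs.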

Once the model is trained, the OptimalPlus platform takes full charge of deployment and integration with systems and equipment in the production line. Through “Virtual Operation Rules” executed at the edge, actionable insights are delivered to the MES in near real-time.
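
This post does not describe how Virtual Operation Rules are written, so the following is only a hypothetical illustration of the general idea. It shows a small rule, running next to the line, that turns a model score into a disposition an MES could act on. The class name, thresholds, and PASS/HOLD/REJECT labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRule:
    """Hypothetical edge rule that maps a model score to an MES disposition."""
    name: str
    hold_threshold: float    # scores at or above this are held for review
    reject_threshold: float  # scores at or above this are rejected

    def __post_init__(self):
        if not self.hold_threshold <= self.reject_threshold:
            raise ValueError("hold_threshold must not exceed reject_threshold")

    def decide(self, unit_id: str, score: float) -> dict:
        if score >= self.reject_threshold:
            disposition = "REJECT"
        elif score >= self.hold_threshold:
            disposition = "HOLD"
        else:
            disposition = "PASS"
        # A real deployment would publish this message to the MES; here it is only returned.
        return {"rule": self.name, "unit": unit_id, "score": score, "disposition": disposition}


rule = EdgeRule("weld_anomaly_v1", hold_threshold=0.6, reject_threshold=0.9)
for unit, score in [("A1", 0.2), ("A2", 0.75), ("A3", 0.95)]:
    print(rule.decide(unit, score)["disposition"])  # PASS, then HOLD, then REJECT
```

Keeping the rule small and self-contained is what would make it practical to evaluate at the edge, close to the equipment, rather than waiting on a round trip to a central server.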

With OptimalPlus, rules can be published easily to any location, allowing the machine learning models to scale and be updated as needed. The solution enforces versioning, auditing, and tracking of models, their supporting code, and their usage. This makes it possible to track the evolution of a model over time and to accurately analyze the historical decisions it has made.
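
Versioning and auditing mostly come down to recording, with every decision, exactly which model produced it. A minimal Python sketch of such an audit trail follows. `DecisionLog`, its fields, and the choice of a SHA-256 fingerprint over JSON-serialized parameters are hypothetical, not OptimalPlus's implementation.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(params: dict) -> str:
    """Short, stable hash of model parameters, so each decision points to exact model contents."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


class DecisionLog:
    """Hypothetical append-only audit trail of model decisions."""

    def __init__(self):
        self._entries = []

    def record(self, model, version, params, unit_id, decision):
        self._entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "version": version,
            "fingerprint": fingerprint(params),
            "unit": unit_id,
            "decision": decision,
        })

    def history(self, unit_id):
        """Every decision made about one unit, with the model version that made it."""
        return [e for e in self._entries if e["unit"] == unit_id]


log = DecisionLog()
log.record("weld_anomaly", "1.0", {"threshold": 0.60}, "A1", "PASS")
log.record("weld_anomaly", "1.1", {"threshold": 0.55}, "A1", "HOLD")  # re-scored after an update
for entry in log.history("A1"):
    print(entry["version"], entry["fingerprint"], entry["decision"])
```

The fingerprint is computed from sorted JSON, so identical parameters always produce the same hash. That lets a historical decision be matched to the exact model contents even if a version label is reused by mistake.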

Through its extensive library of statistical rules, OptimalPlus takes care of validation. It monitors the key parameters of the model inputs, identifies any change in their behavior, and, if needed, acts as a safety net to prevent incorrect decisions by the model. This helps data scientists avoid serious technical debt and move on to their next project.
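
One simple example of a statistical rule acting as a safety net is a range check. If any input falls outside what the model saw during training, the model's verdict is withheld and a conservative disposition is used instead. The Python sketch below assumes hypothetical feature names, training ranges, and a toy stand-in model.

```python
def guarded_decision(features, training_ranges, model, fallback="HOLD"):
    """Apply the model only if every input lies within the range seen during training.

    Returns (decision, names_of_out_of_range_inputs). An input missing from
    training_ranges raises KeyError, on the assumption that an unknown input is a bug.
    """
    out_of_range = [
        name for name, value in features.items()
        if not training_ranges[name][0] <= value <= training_ranges[name][1]
    ]
    if out_of_range:
        return fallback, out_of_range  # safety net: do not trust the model here
    return model(features), []


def toy_model(f):
    """Stand-in for a trained model."""
    return "REJECT" if f["weld_temp_c"] > 360.0 else "PASS"


ranges = {"weld_temp_c": (330.0, 370.0), "cycle_time_s": (40.0, 55.0)}
print(guarded_decision({"weld_temp_c": 350.0, "cycle_time_s": 45.0}, ranges, toy_model))  # ('PASS', [])
print(guarded_decision({"weld_temp_c": 350.0, "cycle_time_s": 80.0}, ranges, toy_model))  # ('HOLD', ['cycle_time_s'])
```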

Finally, OptimalPlus can ensure that models remain up-to-date. Relearning, which, as mentioned above, is typically handled manually, can be partially or fully automated.

In the age of Industry 4.0, manufacturers cannot afford to dawdle when it comes to ML adoption. As long as machine learning is perceived as merely a bunch of code to be written by data scientists, it will take manufacturers years to put it into practice.

With OptimalPlus, data scientists can focus on what they do best, trusting that their models will be supported with a full-blown machine learning infrastructure. Thus, manufacturers can stop worrying about how to deploy machine learning and start reaping the benefits right away, meaning higher-quality products, lower scrap rates, automated inspection, and increased yield.

OptimalPlus is the global leader in big data manufacturing analytics solutions for the automotive, semiconductors and electronics industries, serving tier-1 suppliers and OEMs.