The Always Confusing World of Data Scaling and Normalization

July 2021 Newsletter

Many IIoT devices create ‘landfills’ of unusable data because it hasn’t been standardized, scaled or normalized.

In many areas of life, we miscommunicate by choosing words that are poorly understood or have multiple meanings. Most of the time it’s not a problem. Our minds fill in the blanks from the context of the conversation. There are words, though, that we all think we understand but constantly misuse. Topping the misused list in the English language is the word “irony”. If you investigate it, you’ll find that people in the field of linguistics snarl and rage at each other over it.

Every field has technical terms that are widely used but poorly understood. In industrial networking, HTTP and REST are two that I’ve had to confront with my team. They are used almost interchangeably, but their true definitions are unclear even to the most technical of programmers. In fact, the REST architecture is so under-defined that you can find scores of websites with programmers debating each other as to its exact meaning.

I am not going to take on the debate over irony or REST in this article. Instead, I am going to focus on two terms that bedevil the control engineer on a more daily basis: scaling and normalization. Just like HTTP and REST, the terms are related and used interchangeably, but they are actually different concepts. We’ll start with the easier term, scaling, before tackling the more sinister term, normalization.

SCALING

We all first encountered scaling in 4th or 5th grade mathematics. Scaling is simply transforming your data so that it fits a common scale. The data itself doesn’t change; only its range is modified. Whenever you change the units on a value, you are scaling the value: millimeters to inches, yen to dollars, and so on.

The most common example in industrial automation is scaling of analog voltage measurements. Sensors often scale analog voltages to different ranges. One analog temperature sensor might use a 0-10V scale while the other uses a 0-24V scale. Data captured from these sensors must be scaled before any calculations or comparisons are valid.
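As a minimal sketch of the voltage example above, the following Python maps readings from two sensors onto one common temperature scale. The function name scale_linear and the 0-100 degree span are assumptions made for illustration; a real sensor's span comes from its datasheet.

```python
def scale_linear(value, in_min, in_max, out_min, out_max):
    """Map value from the range [in_min, in_max] onto [out_min, out_max]."""
    if in_max == in_min:
        raise ValueError("input range must have nonzero width")
    # Position of the reading within its input range, from 0.0 to 1.0.
    fraction = (value - in_min) / (in_max - in_min)
    return out_min + fraction * (out_max - out_min)


# Hypothetical sensors: both are assumed to cover 0-100 degrees C
# across their full voltage span.
temp_a = scale_linear(5.0, 0.0, 10.0, 0.0, 100.0)   # 0-10V sensor reading 5V
temp_b = scale_linear(12.0, 0.0, 24.0, 0.0, 100.0)  # 0-24V sensor reading 12V

print(temp_a, temp_b)
```

Under those assumptions, the 5V and 12V readings both sit at the midpoint of their respective ranges, so they scale to the same temperature and can be compared directly.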

Scaling simply changes the range on a value as shown in the following figure:

The shape of the data doesn’t change. Only the range changes when you scale data.

The technical term for the kind of scaling shown in the previous graphic is feature scaling. There is also geometric scaling, a linear transformation that expands or compresses an object, and image scaling, which refers to the practice of enlarging or shrinking a digital image.

NORMALIZATION

Normalization is a big can of worms compared to the simplicity of scaling. It means different things to different people, which is one of the reasons that, as a control engineer, you’ll have difficulty understanding what your colleagues mean when you hear things like, “Our normalization process is suspect.” That could mean almost anything. The word has distinctive meanings in different industries, and even different organizations within one company might use it differently.

The Database Team – When database people talk of “normalization,” they are speaking about the process of organizing their database tables efficiently. Unlike most other disciplines, when database people talk normalization, it doesn’t have anything to do with data values. Instead, normalization for these folks means reducing redundancy and dependencies in their databases. For example, instead of including state names in every record of an employee database, database folks will “normalize” that data by creating a US state table and referencing that table from the employee database. For them, normalization means organizational simplicity and elegance.
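The state-table example can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows below are hypothetical and chosen only to illustrate the idea; they do not come from any particular system.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employees (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        state_id INTEGER NOT NULL REFERENCES states(id)
    );
""")

# Each state name is stored exactly once.
conn.executemany(
    "INSERT INTO states (id, name) VALUES (?, ?)",
    [(1, "Wisconsin"), (2, "Ohio")],
)
# Employee rows carry only a reference to the state, not its name.
conn.executemany(
    "INSERT INTO employees (name, state_id) VALUES (?, ?)",
    [("Ana", 1), ("Ben", 1), ("Cy", 2)],
)

# A join puts the state name back alongside each employee when needed.
rows = conn.execute("""
    SELECT e.name, s.name
    FROM employees AS e
    JOIN states AS s ON s.id = e.state_id
    ORDER BY e.id
""").fetchall()
print(rows)
```

Because each state name lives in one row, correcting a misspelled state means updating a single record rather than every employee record that mentions it.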

Your Process Engineering Team – Unlike the database team, your process people are often math geeks and might mean Z-score normalization, reaching for formulas such as z = (x - μ) / σ, where μ is the mean and σ is the standard deviation of the data set.
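One common reading of “Z normalization” is the z-score, which re-expresses each value as its distance from the mean, measured in standard deviations. A minimal sketch follows, assuming the population standard deviation is wanted; the function name z_scores and the sample readings are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return each value as its distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical process readings with mean 5 and standard deviation 2.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_scores(readings))
```

After this transformation the data has a mean of 0 and a standard deviation of 1, which lets measurements recorded in different units be compared on one footing.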

Some of your other process people might use normalization to mean rescaling a dataset so that all of its elements lie between 0 and 1. Process people have several different ways to normalize data, and the choice varies by industry. In the steel industry, normalizing isn’t about data at all: it is a kind of heat treatment. Normalizing steel relieves internal stresses and refines the grain, making it more uniform throughout the piece, which improves the steel’s strength and toughness.
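The 0-to-1 rescaling described above is often called min-max normalization. Here is a minimal sketch; the function name min_max_normalize and the sample weights are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range has zero width")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical product weights in kilograms.
weights = [10.0, 15.0, 20.0, 30.0]
print(min_max_normalize(weights))
```

Note that this is the same linear mapping used for scaling, with the output range fixed at 0 to 1 and the input range taken from the data itself, which is one reason the two terms get used interchangeably.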

The Machine Learning Group – Compared to everybody else, these people have the most radical view of normalization. For people using machine learning tools, the shape of the data must be “normalized” before many machine learning algorithms can process it. This can mean different things, but in one method, a variable like product weight is reshaped to follow a “normal” distribution. The data is transformed to approximate a bell curve where the mean, or average, is the center of the curve and about 68% of the data lies within one standard deviation of the center, as in the following figure:
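The share of a normal distribution that lies within one standard deviation of the mean can be checked with Python's standard library (statistics.NormalDist, available from Python 3.8). The helper name fraction_within is an assumption for illustration.

```python
from statistics import NormalDist


def fraction_within(k, dist=NormalDist()):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


# Share of the data within one and two standard deviations of the center.
for k in (1, 2):
    print(k, round(fraction_within(k), 4))
```

The result does not depend on the particular mean or standard deviation, so the same fractions hold for product weights, temperatures or any other variable that is normally distributed.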

Your Control Engineering Peers – Most of your colleagues probably use the word normalization when they just mean scaling. If they do, they aren’t wrong, as scaling (described at the beginning of this article) is just one of the many meanings for the word normalization.

All the new IIoT devices are making standardization, scaling and normalization important. The easy part of IIoT is moving data from its source to a database on a server or in the cloud. Making that data useful, by being able to compare datasets from disparate sources that use different scales, units and distributions, is much more difficult. This is the reason I’ve said in the past that a lot of IIoT devices are just creating data landfills full of unusable data because it hasn’t been standardized, scaled or normalized.

Understanding what people are talking about is only the first step on the path to building data sets that turn raw data into actionable information.