Close this search box.
Get an exclusive look at Ignition's latest demos, remote energy management and more.
By Tebello Masedi
16 October 2020

How-To: Aggregation Methods In Flow



Welcome again. Thanks again for joining our Friday change maker sessions. Really appreciate it that you guys are joining us here this afternoon. And obviously today, what we’re going to talk about is the third part in our series. We had a three part series about our product stack that we have here at element eight. And all about Friday changemaker sessions is where we dive a little bit more detail into the technical side of the product stack that we have. And we’re going to have some interesting topics in the future as well that we’re going to cover exploring our different products in the stack, that being ignition canary and then the flow information platform. Now, today is the third one, session three in our three part series, and we’re going to look at flow. 

And last or two weeks ago, we had a look at the canary story and how we can get data into the canary story, and by utilizing various numbers of protocols and mechanisms. And two weeks before that, we had a look at the ignition PLC. And I think we’re going full circle today here by now to use this vast amount of real time process data that our SCADA solutions and historians will store and performing some aggregation techniques on top of that by using the flow information platform. So I’m going to go and explore the different methods that’s available out of the box with flow, as well as a little bit on how you can customize and create your own potentially aggregation method on top of the raw data. 

Just to put that a little bit in context, I’m just going to talk a bit about data handling and how flow gets the data from these different systems. Now, as we saw four weeks ago, we have our PLCs, and our PLCs is what’s controlling our plant. It’s generating a lot of this time series process data inside of our plant environments. And normally what we do is we connect to our PLCs by some kind of driver or OPC server that literally is just that translation layer between our control level machines, or PLCs, and then whatever protocol they speak to, a protocol that’s a little bit more universal that other pieces of software, like historians, can, for instance, get their data in. 

So normally what we’ll do in a plant environment is we’ll connect our PLC to some OPC server and we’ll use that data now, or that OPC server as a communication hub to get data into our time series historians. Now, a time series historian like the Canary historian, is really purposefully built to get high resolution time series data into it and to store that in a lossless manner. So we lose any of our process data that we’ve got generated on our plant floor. We will use this high resolution data normally for technical analysis, fault finding, trending of these values on charts, and we’ll use that high resolution data to do that type of technical analysis. Where Flow comes in is obviously on top of this. 

Now, the historian can store data from numerous PLCs, and you’ll have a lot of data coming from all of these PLCs as they change. We like to store the data in what we call a delta storage mechanism. So that means whenever that data point changes inside of that PLC, we will go and historize it in that historian. We’ll add a timestamp to it and a quality component to it, and we’ll store it inside of that historian component. Now, what flow will do is flow sits on top of this vast sea of process data. And what flow’s job is to take this data and to clean it a bit and to add some context and aggregation to that data. For an instance, if you’ve got a raw flow transmitter on your plant, flow will take that flow transmitter, it will clean up the data. 

It will perform an aggregation for you to, at the end of the day, to actually tell you what was the total flow that’s been produced for the hour or for the day. So flow’s job is to actually perform this ETL layer on top of our process data. And flow is also purposefully built for the manufacturing environment. We understand aggregation methods when it comes to flow transmitters, like time weighted average. We understand aggregations when it comes totalizers, to do a counter retrieval on that, to see what the actual total is. So we fit very well on top of the plant historians to get this data and to add context to it. And obviously, the first context that we add to this data is the context of time being hourly buckets or daily buckets, shifty buckets, et cetera. 

Just a little bit of a pseudo architecture on the flow platform before we get cracking into the aggregation methods, is when you install flow, you’ll install what we call the information platform. And that has a whole bunch of different parts to it. Now, the first part of the flow information platform is obviously we need to be able to collect data. Now we’ve got data sources, we call them data sources, and that’s to connect to these different types of data sources or silos of information that we might have on our plant environment. And that can range from SQL databases, it can range to historians, cloud devices, IoT platforms. So we’ve got a generic web service call that we can do that as well. And then very importantly, we also still have the capability to add manual data into the platform. 

We’ve got a very powerful calculation engine, and that’s where all the magic happens. The calculation engine is now that will take these raw data points that sits in the historian, and it will actually go and apply whatever aggregation method or aggregation technique you’ve configured it to be on top of this data and store that result inside of our own little ods or operational data store that flow creates. And that is the data point that we will report on. And that is the data point that we will go and create our visualization component on. So you can go and create a whole bunch of different visualization types and dashboarding. These can be used in morning meetings and packaging halls, et cetera. But flow feeds off that already aggregated data that we’re now storing inside of our ods. 

So that’s a very important point, is that if I want to get a total for the hour, I’m not going to run that aggregation. If I launch that report now, I’m not going to go and run that raw data straight from the historian to perform that total flow has already done that hard work. We’ve already connected to the historian, we’ve already created the aggregation method on top of that data, and that is now stored. So it makes us very light connecting to our historian. It doesn’t matter how many people view that report, they won’t go and over connect and overutilize the story and layer to get that data. We also have a mobile app that we can now collaborate and send data to, and that is really a true, we like to call it almost like a WhatsApp for business. 

So you can create these different channels, and in this channel you can send specifics around specific data that’s related to certain parts on the plant. People can subscribe to these different channels and then truly collaborate to bring the human aspect of collaboration around information into the fray as well. By no means are we a closed system. We can now take the data that’s being collated inside of flow and we can go and integrate to other systems, so we can push it to ERP solutions to other SQL databases. You can use a bi tool on top of the data that’s stored in flow. We can even push it out via MQTT, potentially right back to the operator’s station so that he can see visualization into his KPIs and aggregated information, and we can even send it to another flow system. 

If you have a multi site kind of organization and you want to take all the data from a certain site to potentially a headquarters solution. We can definitely send the data to another flow instance and we like to call that our tiering capability to tier different levels of the organization with different data that sits in those different parts. So that in a nutshell is just the pseudo architecture. And what we’re going to focus on today a little bit is how does flow create these aggregation methods, how to clean the data, and what is the standard aggregation methods that’s available inside of the flow platform. Before we get to that, it’s just very important to understand what flow will do with raw data. 

As I explained, the first context that we add to data when we go through this ETL layer or process is we add the context of time to data. So let’s have a little bit of a practical example of how that will work. This is a boiler temperature that the raw process values that’s been stored potentially in a time series historian. So this is the raw high resolution data that’s now available inside of my historian. Now, depending on what time frame you want to look against this data, it could be as high resolution as every five minutes, it could be every hour, it could be every shift or every day or every week. 

What flow will do is it will create these different time buckets depending on what you want to see in this example, what we’re going to look at is a time bucket of an hour. So flow will go and create these hourly time buckets. Now for each and every one of these time buckets, I can go and say, you know what should flow do from an aggregation perspective on this data? So it’s going to take that hourly raw data out of the historian. It’s going to query the historian for hours worth of data. And in this case, if you selected to do a time weighted average on that data, flow will go and create a time weighted average information and it will do this in the background as the engine runs. 

It will at the end of every hour, query the historian again, get that raw data, apply the aggregation technique on top of that data, and this is the result that we will then store inside of our ods and this is the result that flow will report on now depending on what the data looks like. So in this next example, this is a classic kind of totalizer example, and obviously depending on what the data looks like, you have to go and apply different aggregation methods on top of it. So in this case, a time weighted average wouldn’t work. It wouldn’t give you the result that you require. But what you would need to do is to go and configure flow to perform what we call a counter aggregation on top of this raw data. 

So what we will do is we will look at the first value, we’ll look at the last value, we will perform a delta on that. But a little bit more than that is we will also go and look for rollovers or potential where the totalizer has rolled over and use that in our calculations as well and store that result inside of the ods. So very important to understand what the data looks like because that will have a great influence on what type of aggregation you’re going to apply to it as well as the use case of the data that will also have a bearing on what type of aggregation you want to apply. Right? 

So if we understand the context of time that we add to data, let’s go straight into the little bit of the examples that I’ve prepared here today for the different aggregation methods. Now in flow, these aggregation methods is something that you select. It’s something that you go and add via means of a little drop down box. And as you can see there, I’ve got a whole bunch of different standard aggregation methods that I can select. I can select the sum, the average, the min, the max, the counter, the range, the delta, the first, the last, the count, the time and state the variance and the standard deviation. And then as you can see there, I’m actually getting data from a SQL database to perform these aggregation methods on top of it. 

So I’m going to go through these guys, I’m going to show you guys what the actual results would be. Also discuss a little bit of use cases that we can use these different aggregation methods for. And then I’ll also demo potentially how to create your own one, a custom one, if one of these methods that you see in front of you will not give you the result that you actually are interested in. Okay, now I’ve prepared a little bit of a data set and I’m looking at an hour’s worth of data from 06:00 in the morning to 07:00 in the morning. And there’s my data that I’m getting from historians. So it’s all time series data. So I’ve got a timestamp and I’ve got a value for this example that I’ve got here today. 

I’ve got twelve values that sits between my time frame of six and seven. So you’ll notice that there’s some values that’s not on the dot 06:00 and also not on the dot, 07:00. And this is very typical, right? I mean, our plant data changes and we got no idea of exactly when that timestamp is going to be recorded against what the data is recorded. So you might not have a value exactly at 06:00. You also might not have a value exactly at 07:00. And that’s a very important concept to understand, is that potentially you would need to have a value exactly on the boundary of six. And we’ll explain a little bit of that. We like to call that concept the boundary value. So if I plot this on a chart, that’s typically how it would look. 

So you can see the green highlighted area. There is my boundaries six to seven, and then the twelve different points that I’ve got inside of my boundary there. So these guys are all in line with my hourly data. This is the point just before six. And then I’ve got another point just outside of that time frame. It’s of 07:00. And it’s very important to understand how many of these points will flow actually utilize when it does certain calculations and aggregation methods. Now, it works a little bit better if we plot this data as a stairstep just to highlight those two points. That’s outside of my hourly range there. So if we plot it as a stairstep, and this is also a very important concept to understand, is that flow tries very hard to use all the raw data points. 

We do not interpolate the data. So we will not interpolate values between two different points. We will use the raw data points as it’s been served by the historian. And the best way for us to visualize that is to actually plot it then as a stairstep graph like we’ve got in this example here. Now, you’ll notice that I’ve actually moved this first point that was stored at 55 or five minutes to six. We actually moved that point forward all on top of the boundary of 06:00. And that’s very important because that is what we call the boundary value. And we will use this value depending on what type of aggregation method we select on top of this raw data. So it’s very important to understand that concept of a boundary value. 

Now, a lot of historians and OpChDa sources, when you query it from six to seven, it will actually go and supply you with that value exactly at 06:00. But just be a little bit careful when you do have transactional type of data that’s potentially inside of a SQL database kind of solution. You need to supply flow with that boundary value condition. So that we can utilize that value inside of the aggregation methods. Else potentially you might get the wrong result from the aggregation method. So it’s very important to understand this concept of a boundary value. All right, so let’s look at the first aggregation method. So let’s look at a minimum. So, minimum is quite simple to understand. So what we will do with the minimum is we obviously look at what is the minimum value within my time frame. 

And it’s very important to understand that, yes, flow will utilize the boundary value as part of this evaluation. So it will take the value because the value is only changed from that initial value at 06:00 at three minute part six. So we have to use this first part in the evaluation to determine what the minimum value for our six to seven time bucket is. So when we do a minimum calculation, we will do that. We will move or include the boundary value in the aggregation method. So in this case, we’ve got two points that is equal. So you’ll notice that the boundary value is 1.6. We also have another value that’s 1.6 a little bit later, around 35 minutes past six. So the minimum of my data set is 1.6. 

We do not use the value that’s outside, obviously not, because that will be utilized in the next hours calculation to see what the minimum is. So from this data set, the minimum aggregation that we can see from the data is 1.6. And that’s exactly what flow will give us. And I’ll show you that in the results when we actually go and compute that. So, minimum is quite easy to understand. A few use cases of where we can use that is obviously ph, tank levels, temperatures, all of that. We can use the minimum kind of range from that perspective, maximum, same kind of scenario as well. Flow will also use the boundary value to use for a maximum retrieval method. So in this case, we will look at that, we will see what the value is. 

And you’ll notice there’s a value of three around 10.6. So from a maximum perspective, we will utilize that point as the maximum value. And that’s the only value that flow will then return as a maximum retrieval of the value from a maximum perspective. Again, use cases for maximums. We can use that to determine the maximum ph for the hour tank levels. Temperatures works very well for specific quality samples. That we might need to determine what was the maximum or the minimum kind of quality sample. We potentially get from a limb system and report on that, as well as pressures to see, what is the maximum value for pressures that we can retrieve by using the maximum aggregation method. Next thing that we have is the range. 

Now the range is utilizing the concept of the min and the max to determine what the range is of a value point for an hour. Again, very important, we will utilize the entire data set plus the boundary value in this calculation. And essentially what we will do is we will say, okay, what was the maximum point in this case, we know it’s three that we’ve achieved in this hour. We will then use the minimum point and as we know, the boundary value was the minimum, which is 1.6. So the range that this value or this data set had within the hour would be the maximum minus the minimum. So the answer for the range within this kind of solution would be three. -1.6 and that will give you a 1.4. 

Now this is very relevant when we look at storage levels, we look at temperature ranges. So how much did the temperature fluctuate within an hour, as well as again, quality samples. How big difference or gap was there between your quality samples within the hour? So it’s very useful to use a range for these types of use cases to look at storage levels, tank levels, as well as our temperature ranges. All right, the next one, what we’ve got is sum. So as Sam mentioned, what we will do with the sum aggregation is we will not use the boundary value for a sum aggregation method. So we will only use the twelve points that we’ve got inside of the hour, and we will use that to add all of these different data points together. 

The reason why we don’t use the boundary value in this case, because for the previous hour, from five to six, we would have utilized that point in the sum for that previous hour. So we only add the points together. That falls within our actual time frame here. So that’s why we only utilize the twelve value points and we will go and add them together to give you a sum of the values for the different values. Now, use cases for this is we see a lot of this being used a little bit more on transactional type of data. 

For example, if you’ve got a transactional record of different weights that’s been recorded or off weighed in a weighing station kind of solution, financial transactions, obviously, if you’ve got cost and you need to sum all the costs together, production transfers, to see how much product has been transferred from one point to another by means probably of a transferring vessel, et cetera, we will use the sum quite aesthetically in this type of transactional data, but it’s not something that we see being used a lot in normal time series related data. It’s more something that we use in transactional type of data. Cool. And if we obviously add all of these different points together, we’ll get to answer of 28.8. All right, then we’ve got the concept of the first value in the hour. 

So there’s an aggregation method that says, what is the first value that we can go and select? Again, we will go and look at the boundary value in this example. Definitely. So the first value that we have in the hour, we will move that boundary point closer to the hour and that we will actually record then as the first value. So we will use the boundary value when we return a first retrieval. So that will be the exact first value in our hourly block. Again, something that we can use this for is when you do maybe a stock taking report or you want to look at tank levels, you can do that. It’s not something we see a lot in time series data, but we do see it when we look at tank levels to do reporting. 

On top of that, just as we have the first value, we also can go and obviously look at the last value. Now, the last value will be the point, exactly the last point that we’ve got within the hour. So we will use that to record what is the last value in the hour. And very interesting. If, for instance, if that last value point was exactly at 07:00 so if we had a point exactly on seven, we will not use that point because that will now be the point for the next hour. So we will still utilize that previous point as the last value point that we have for the hour. And with our sample set, little bit of data that we’ve got here, that value will then be a 2.2. Perfect. 

All right, again, we can use this for tank levels, stock levels, as well as mass balancing kind of equations to see for the last part of the day or the last value of the day, how much mass is still available in potentially inventory locations, how much has been utilized, how much is left, and we can then do a little bit of mass balance calculations by using the lost retrieval method. Cool. We also have a retrieval method that we call delta. Now, a delta retrieval method will use all the values, including the boundary value within the calculation. And what it will do is it will go and compute a difference between each and every point in our data set, and then we will go and add all of those differences together. 

So I’ve extended the sample data a little bit so I’ve got a little different table here as well. And what we do is we say, okay, let’s take point number two, which is 2.3, subtract 1.6 from it, and that will give us our delta of 0.7. And we’ll continue to do that. So we will say, okay, 2.5 -2.3 is to 0.23 -2.5 is zero. Five, 2.1 minus three is -0.9 so we’ll determine the deltas of each consecutive point with the previous point. And then at the end of the day, we will sum all of these delta points together to give me what is the delta of my range. Now again, it’s very critical to understand what the data is going to look like when you apply this delta calculation. 

If this was a totalizer, a delta and a counter retrieval would have probably given you the same result. But very important that a delta will not look at rollovers. As you can see here, it will add a negative drop from a continuous point. Obviously, that’s something that we would cater for or look at when we do a counter retrieval because we need to determine rollovers. But in the delta, it will just subtract it and add all of these different points together for us and give us the sum of the total. So in this case, my delta values is 0.6 for the sample data set that I’ve got here. Again, very important for tank levels reporting or stock levels. It is a very good way to understand are you actually depleting stock out of a vessel or a silo or a tank? 

Or are you actually replenishing stock? Are you busy putting more product or raw material into that vessel or tank? So it’s a very good concept to understand. Are you adding, are you subtracting? If it’s constant, then you’re probably keeping your level by adding and subtracting product. So it’s a very good KPI or metric to understand how to manage your stock levels from a depletion or replenishing perspective. Right. Then we also have count. So literally what we will do in a count retrieval is we will just count how many data points is in the sample set. So in this case, if I look at my sample set here, I’ve got twelve sample sets within the range. We will not count the boundary value in this example. So we will not count this guy as part of a sample in the sample set. 

So in this case, I’ve got twelve samples. So. 1234-5678 910 1112 a little bit of something that you probably won’t use on floats or integer type of data. It’s probably something that you use more in a discrete kind of manner. And the reason why we would probably use it a little bit more in discrete kind of manufacturing is because we want to count stuff like how many times did a machine start today? Or how many times did a valve stroke or cycle. So we will use this very heavy when we start looking at those type of metrics, cycle counts, stroke counts, and machine start and stop counts within allotted time period. All right then, another important one when we talk about machines and stopping and starting machines and all of these types of wonderful things, is what we call time and state. 

Now, a time and state is, again, we will look at how long was a value point in a particular state for the hour, and then flow will return that total duration in milliseconds for that. Again, we will use the stair state to do that, to do the evaluation of how long it was in these different states, and depending on a specific condition or state. So in this example, I can say, let’s look at whenever the value was in state 1.6. Now you can think about it. You can use this as well to say, let’s see how long it is running or active. So you can do this on Boolean type of value points as well. An example that I’ve got here. We’ll say, okay, how long was it in the value 1.6 for now, as we know, we’ve got the boundary value here again. 

So the boundary value was 1.6. So what we will do is we will look at what was the duration from 06:00 till the next change, which is three minutes, 23 seconds past 06:00. And we’ll use that duration to determine how long it was in that 1.6 states. And you’ll also notice that was another time when it went into that state and it was in that states for another 360 seconds. And what we will do is we will actually add these together to give you a total time and state. So it will between the boundary point and zero two, as well as between zero nine and 00:10. Those two durations we will add together, and we will then determine what is the time and state in milliseconds for the raw data points that we’ve got. 

So this is a very powerful thing, especially if we want to go and look at utilization of data points or pieces of equipment. So we’ll use this to say, for the shift or for the hour, how long was my piece of equipment running so that I can do utilization type of calculations, can also use it to say, you know what, for the past shift, how long was I on a particular process step? So how long was I in a CIP process? Or how long was I actually a running or a faulted process. So it is a very powerful retrieval method that we can use to see how well are we utilizing our assets from a time perspective. Cool. Then we’ve got variance so we can go and also utilize our data set to determine the variance of the population. 

In this case, we will use the boundary value as well. And it’s very important to understand that flow will utilize the statistical population when it does a variant aggregation method. So we will use all the data points in the population between six and seven. Now, the best way that I can explain the difference between, there’s two different ways that we can do it, and you can actually do it for standard deviation as well. And that is to use a population variance or standard deviation, or just a standard statistical standard deviation. Now, the best way that I can explain it is to say, you know what, you’ve got four friends, you all had written the same test, and you want to see what is the variance between your test results. 

So you’re using all four of you guys’test results in the populations, you’re using all of the data points to determine that aggregation method. If you had a group of 100 friends who written the same test, but you’re only going to take a sample out of that population to say, let’s just choose ten of the guys to determine that. That’s when we’ll use the standard or the more normal type of standard deviation and variance calculation. But it’s important to note that flow will use every point in the population to determine that. And we’ll do exactly the same thing when we do a standard deviation aggregation method. So again, you can use this for process parameters to see how far the variance is in your specific process parameters, recipe parameters, as well as control parameters. As an example, as I mentioned, standard deviation. 

Yes, we have a method for that as well. Again, we will use the entire population of data points in the hour to determine the standard deviation. So we will do a statistical population of standard deviation to determine that. And with all the points we will see what is the mean or the square root difference of the mean of the data points. In this case, it’s zero point 42 as the standard deviation for my little sample set of data that I’ve got here. Again, you can use this for process parameters, recipe parameters, and to see how well you are controlling by potentially using standard deviation on control parameters to see how far you’re away from set points and process variables perspective. So it’s a good way to determine how far off are you from your optimal or nominal control perspective. 

I know the guys use this quite a lot for if they want to do some six sigma control to see how narrow they can get their standard deviation banned from a process perspective. All right, the counter retrieval, we do get a lot of support questions around this one. There’s two retrieval methods, well, three of them that we get quite confused with a lot. It’s sum, count and counter, and just to understand what those three different ones will do. So if you’ve got a totalizer, normally we’ve got totalizers in our PLCs or on our instruments can do totalizing as well to tell us what is the total of product that went through, potentially a flow transmitter, et cetera. Flow has a very specific retrieval method to deal with that type of data. 

So you’ll notice in my sample data set that I’ve got here is that I’ve got an incremental totalizer, but that incremental totalizer also has what we call a rollover. So sometimes these totalizers in the PLC, they program them to start afresh, potentially at the start of the day, or when just the value is so big that the PLC can’t handle it anymore and you have to restart and start counting again at zero. So what we’ve done from a flow perspective is that we’re looking for these drops and we detect these rollovers automatically, and we will utilize that to add to our counts. Now, counter and delta is very similar. So we will take the point here, minus the point there, and add them all together. 

But the difference is with the totalizer is when it goes negative, we will see that as a rollover where the delta will just subtract it and add it to the sum. We will see that as a rollover and actually add these two points together again to give me a new number from that perspective. So it’s just important to understand the difference between what a delta does and what a counter does. So if we look at our little data set there, very important to understand with totalizers is that the values must increment from the subsequent value, except if there’s a rollover. And I’ll show you guys a little bit how flow will determine that in the example that I’m going to show you very shortly as well. Now, there’s another very important concept with counters. 

And that is that if you look at the previous example that we had, the historian might have recorded a point here. The last point it recorded was 2.3, and it then rolled over in the PLC to start counting afresh. But technically, the value only rolled over at 3.5. Now, the historian didn’t have that last point because the PLC changed too quickly. The polling wasn’t set up correctly to catch that point. So we might have missed that exact rollover point inside of our historian. So the true rollover point was actually 3.5 and not three as we got from the values in the historian. Flow can you can go and configure this as a hard value. 

So if you know that your value always roll over at the value 3.5, you can go and add that in, and flow will utilize that in the equation as well to determine the actual total for the hour. So we will add that. So, in this example, if you did not specify the rollover point, the value will be 3.5. But if you know that the value rolled over at exactly 3.5, we will use that difference between the last point there and obviously the 3.5, and we will go and add that to the total as well. So if you specified a hard, limited rollover amount, then the value that you get from flow will be 3.9. So that’s very important to understand. 

If you hard code a rollover value in flow, then it will utilize that in the calculation when it does detect that a rollover has happened, and that there’s still some play or some room for production in that range as well. Right. Use cases for this obviously totalized values bottle counts what we get from filling machines, weight scales, that does values like that, totalizers, conveyor systems that potentially do totalizing on their weightometers. So we use that in that type of scenarios to get the total for an hour of production. Right, let’s get back to our original little bit of samples, and let’s look at one of our probably, I would say, 80% utilized retrieval methods. And that is the time weighted average retrieval in flow. When you do the drop down, I’ll show you guys that is that we just call it an average. 

But in essence, what we do is a time weighted average. So what we will do is we will actually look at what is the duration that a point is in a specific value. So in this case, the value was 2.3 for a full 247 seconds. The value was 2.5 for a total of 172 seconds. So not only do we look at the actual value. We also look at what is the total duration of these value points, and we will then do a weighted value. So we will utilize the duration. We will utilize the total duration, which is, in this case, an hour, because I’m looking from six to seven, and then I will do what we call a weighted value calculation. So it’s very simple. Again, we utilize the boundary value in this calculation, and it’s exactly that. 

We work out this weighted value, and we do that by looking at how long was the value in a specific duration versus the total duration of the time bucket. In this case, we’re looking at an hour. So my total duration is 3600 seconds. So we will say, you know what, 203 divided by 3600 multiplied by 1.6, and that gives us the weighted value. And again, we utilize the boundary value, but the duration is then only from 06:00 to the change to the next value that we’ve got. And we utilize all of that. And then, obviously, we will add these together to give you a total that’s produced for the hour. Now, this is used quite a lot. We use this on raw transmitter values that are not being set to be totalized or that’s being totalized inside of a PLC. 

So, flow transmitters, weight scale weightometers on conveyor systems, if they give us the raw tons per hour or the flow transmitters, gives us the raw cubes per hour that flows through it. Flow can utilize that and utilize that to determine what the actual flow or the actual tons for the hour was that went over that belt scale or through that production line. So it’s very important to understand that flow does a time weighted average and that we can actually go and do totals of our raw transmitter value. So, pretty much an integral aggregation below the line to give us the total of these things. Very important to understand is your unit of measure of your transmitter. So, if you look at an hourly time frame and your transmitter gives you instantaneous value of cubes per hour or tons per hour, that’s great. 

We can utilize that, and flow will give us the hour. If it is cubes per second or tons per second, then you have to work with the scaling factor to bring that back to an hour, else flow will not give you the correct time weighted average for that. All right, let’s go a little bit into a demo scenario here just to show you guys the little bit of the results of my data set that I’ve got. And then I will also do a little bit of a custom retrieval for you guys as well. All right, so let’s go into flow here. So this is the flow configuration environment. So you’ll notice that I’ve built out a whole bunch of aggregation methods already. So I’ve got a minimum, the maximum, the average, the first, the last, the timing state, et cetera. 

Now it’s very simple and easy to configure this. So if I open up this minimum retrieval here and I go to my retrieval section and flow, you’ll notice that depending on the retrieval method that you want to achieve on your data, it’s literally a drop down selection to determine that. So in this case, you’ll see there’s all of them, the sum, the average, the min, the max, the counter, the delta, the range. All of them are inside of my drop down here. And I can utilize these drop downs, as simple as that to determine what type of aggregation I want to perform on the raw data. I’m currently querying from a SQL database. 

So I’ve got my little query here that’s going to get my data out and I’m bringing back that boundary value that we spoke about so that I can even perform these standard aggregation methods on top of data that sits in a transactional database like a SQL database. Let’s create one from scratch. So I’m going to create a new metric here. I’m going to call it my change maker Friday metric. Let’s go to a data source here. I’ve got a little historian simulator here. So here I’ve got a flow transmitter, a pressure and a temperature. Let’s take the flow transmitter, let’s drag that across to create an hourly retrieval in flow. And obviously if I open that up, I can now go and select again stack standard the retrieval methods as they are listed. I can use that from historical data as well. 

And I can go now and say, okay, let’s do a time weighted average because I’ve got a raw flow transmitter here. Notice that it’s kilograms per second. So remember what I said, very important to understand the unit of measure of your transmitter. So in this case it’s steam production out of a boiler. So I would like to know potentially what is the total tons in the hour? Not the kilograms per second, but tons per hour. Then for you guys, you need to remember to add that as a scaling factor. Now to go to kilograms tons, you divide by 1000 to go from seconds to an hour. That’s 3600. So if you do the math, you would need to supply a scaling factor of 3.6 to convert this tons per hour. I also have the concept of a filter tag here. 

And that’s quite a unique feature where I can say, again, cleaning the data even more before I’m performing the aggregation method. So think about it. I would only like to determine this. If my pump was running and there was actually product going through that line, I would like to determine what the actual totalize is. In this case, I’ve got a boiler, so I would like to say something. You know what, if the temperature is above a certain point, then I know that the boiler is actually active, let’s say 80 degrees, and only then determine what is my total steam output that I can go and collect. So I can utilize these filter tags to do that scenario as well. Now, you’ll notice that I can only do one tag at a time as a filter condition. 

So in this case, I can only use the temperature potentially as a filter condition to filter that out. And obviously if I deploy that, then the flow system will go and create, filter the data out first and then do the time weighted average on the remaining data points to see what the total flow is when the temperature is above 80. All right, now let’s think about something else. I want to now determine how well this boiler is actually producing. Now, when I want to do that, I unfortunately need to look at all three of these parameters. So I want to know if my flow is above a certain point, if my pressure is above a certain point and my temperature is in a certain range, I want to know what is the time that this Boiler is in that particular processing condition. 

So that will tell me how well is this boiler producing. But now, if I go and do exactly that, if I take my steam and I create another metric or another KPI, and this is now my time in state retrieval. You guys remember that we do have a time and state retrieval method here. But again, I can only go and add one filter tag to this, so I can’t filter on the pressure and the temperature again. So in this case, I would need to go and create a custom aggregation or a custom expression to determine what the actual time and state is when all three of my process parameters are in certain processing conditions. So I’m going to do exactly that. 

I’m going to go and select the custom expression here in the box, and that will now give me a little bit of an expression editor where I can now go and create whatever expression I need. So let’s do exactly that again. Little bit of a tick or a trip or a trick. If you double click here at the top left corner, it will actually go and give it to you a nice and a big full screen so you’ve got a nice real estate to go and create your calculation. So in this case, I would like to go and add all three of these points. So I want to look at the flow, I want to look at the pressure, I also want to look at the temperature inside of my calculation. 

And now you can write whatever aggregation method your heart desire inside of here. And it’s fully based on c sharp notation. We do have some examples. So if I drop down here in the example box here and I go to measure, you can see I can either go and create a row based sum or a row based average. And if I select that, it will go and give me a sample expression. And in this case it just adds all three of those points together divided by three to give me an average, a little bit of a statistical average of the values. But that’s obviously not what I would like to do. I wouldn’t now like to know what is the time and state for all three of these things. 

Now it sounds like a very complex calculation that you need to do and go and do you now need to go and sift through all of the raw data points, look at all of if the flows are above a certain point, if the pressures are above a certain point, if the temperatures are above a certain point, then determine how long was it in this state for. So it sounds like quite a hairy and a complex thing, but it’s actually not. And the reason I say that is that we can actually use the built in aggregation methods that flow supplies inside of our calculations. So I can use the time and state aggregation method right here in the scripting example. 

Now I’ve already done it, so I’m just going to copy the script out here and I’m just going to paste it over this example here. And let’s just quickly run through what I’ve done here. And you guys can see how we can utilize our built in aggregation methods inside of flow. So I’m just going to neaten it up a little bit so you guys can see what’s actually happening here. So it’s very simple. Again, I’m running through all the raw data points and I’m saying flow must be greater than 4.1, the pressure must be greater than 1.4 bar and the temperature must be greater than 98. If all three of these processing conditions are achieved, then I know my boiler is in a very good operating space and I want to see how long can it achieve this in an hour. 

So if all of these three conditions are correct. So we’re doing just a questionnaire to say are all of these three in these conditions? Yes. Then return the result of one, else return a result of zero. So actually what I’m doing is I’m transposing these raw process values into a discrete signal to tell me how long was it one and how long was it zero depending of the processing conditions. And I’m just adding that result as a translation row that I can then tell flow to go and utilize our built in time and state aggregations on top of this data. And I’m passing it the points and I’m saying for all of these points that I’ve created now please go and determine if the time and state is equal to one so I can utilize these aggregation methods on our support website. 

We also have examples of how I can actually use the time weighted average aggregation method inside of these equations. So you don’t have to go and try and determine these durations yourself. You can utilize these already created time and state durations and aggregation methods inside of our scripting here. I’m going to verify my results here just to make sure I didn’t do any funny spelling mistakes. And there my calculation is complete. I can go and close this or just minimize it. Again I can say okay, and there I’ve got a custom expression now configured. I can go and deploy this out and the flow engine will do its thing from that perspective. So now I will have a KPI. 

If I expand that and I look a little bit further back in time here, I can see oh, there was a good piece of operation where I was in those processing parameters, but if I look at the raw data, it’s not too often that I’m in that processing parameters. So obviously I need to do some fault fine tuning and fault finding on these process parameters. All right. Then again, I can also see all the results of all of these other aggregation methods that I’ve done. So I’ve created a little dashboard here. Sorry, I’m just going to go and reconnect. I think I’ve got a timeout on my project here, sorry for that. Just want to quickly reset this. There we go. Just going to reset that and I can go and just reset my dashboard here. And there we go. 

Let’s go back to that 06:00 time period that I’ve created for my sample data set and there’s all the results of all the aggregation methods as we saw it. So the min, the max for my data set. So again, exact same data that we’ve pulled here. And these are the results that flow pushed out for us, determining or based on the different aggregation methods that we’ve created based on those two sample sets that I went through in the presentation as well. Now, the last thing I want to do, I know we’re running almost out of time. The last thing I want to show you guys is something that we see quite a lot in tech support as well. And let’s focus here on the counter value. So you guys remember the counter value. 

We will determine this rollover automatically and we will utilize that rollover in our counts. Now, we do get quite a lot of questions that says your Lenny. I don’t know. This counter value of mine is very high. It’s almost double of what it should be. What can be the problem when we look at this? So obviously there’s something wrong with this countervalue. Now, something that we do see in historians quite often is that data do get in a little bit late and some data points actually gets in after a previous point. And we do get the scenario where the data points are not incremental anymore, they actually are less than the previous one. And remember, with the counter retrieval flow will see that as a rollover and add an entire from zero to whatever that value is to the count. 

So that’s potentially when we see these high differences in our counter values. Now the simplest way to know if your totalizer is doing that is to just look at the raw data. So I’ve got a query here that I wrote to get the raw data out of my SQL database, little table here, and there’s the values of my counter. And you’ll see there was a rollover and then it incremented again. Now, if I plot this data on Excel and I do a very simple calculation in excel where I’m just going to minimize this quickly for us, and I do a very simple calculation in excel where I say, you know what, take this value, subtract that value and see if it’s a positive or a negative result. If you do that on that raw data, you’ll notice a few interesting things on the zero. 

Yes, that’s when a rollover occurred. So that’s the only point where I should see a negative value. But I also see a negative value there. And I also see a negative value here. So what’s happening? You’ll notice that the values are not incremental anymore. It goes from 1.7 to 1.6. So there’s a negative value there. So flow will see this as a rollover and flow will then tell you, oh, well, you had to go from 1.7 back to zero and then to 1.6. So flow will add an additional 1.6 to the value, exactly the same here. Flow will then add another 0.1 to the value. And that is why you see that my faulty value is significantly higher than the actual counter value that we do see. 

So that is something that we do see quite a lot in tech support on counter retrievals is that kind of scenario. Now, we do have in flow the capability to add what we call, let’s just do this again. I’m just going to create a counter very quickly, probably on this bottle count here. We do have the capability on this counter retrieval and flow to go and specify a dead band. So if you do get in that situation, you can’t fix the problem in your control layer or in the historian. Obviously, that’s the first place where we need to go and try and fix it. But I could have said, I want a dead band of one, so it must at least change with one complete value before I go and add that inside of the flow system. 

All right, I think that’s enough of me taking time off your Friday. I don’t know if there’s any quick questions and answers that you guys might have. Feel free to ask a few questions if you’ve got else. You guys can just email me or pop me some questions. If you have any questions around what you saw here today from the aggregation perspective.

You might also like