By Elian Zimmermann
20 November 2023

The Difference Between Historians: Data, Process, Site, Enterprise and Cloud



Ken Wyant
Director of Business Solutions


If you have spent time researching industrial databases, you have likely encountered several labels for historians. Ken speaks about the differences, important considerations and, most importantly, which one you need to get the job done! 


Good afternoon. For those who don’t remember, my name is Ken. Ken from Canary has a nice little ring to it. I’ve been tasked with explaining the difference between historians. I’m not really going to contrast Canary with some of our competitors or with the open-source options. I will touch on that a little bit, but this talk is really about where the historian resides, should reside, or can reside, and that can be a difficult question. But first I want to give you a little background about myself. I’ve been in software development for too many years, almost 30 years now. When I came out of college, I started my first job at a little mom-and-pop lawn and garden store.


We sold tractors and mowers and all that kind of fun stuff, and I was tasked to be the parts manager. So I walk in there on day one, and I’m like, okay, where’s the software, the system, the computer? They didn’t own a computer. Everything was manual. Inventory was rows and rows of boxes with parts thrown in them. So a customer came in and asked for a part, and you went and searched for it for five or ten minutes if you found it, 15 minutes if you couldn’t find it, because no one had any idea what we had and what we didn’t have. And so I was like, is it okay if I build a system here? So I was kind of thrown into a position where I wasn’t comfortable doing my job, and I was like, I’m going to build something that’s better.


This was in the mid-90s, and I didn’t really know anything about databases. I built a database, a front end, an invoicing system and all of that, more or less from scratch. It was a cool learning experience, and it led to my next job, what I’d call my first real job. I worked for a large healthcare software company. With healthcare, of course, there’s the whole clinical side, where you have to collect all the important notes that the nurse gathers from the patient, and then there’s the whole billing and reimbursement side, and talk about massive databases. Our clinical database probably had close to 800 tables. Our financial database had somewhere between 500 and 600 tables. That was my first real exposure to what a database is and what a database needed to be.


I didn’t know anything about indexes, primary keys, anything like that, but you certainly learned pretty quickly. I thought I knew how to optimize and tune a database and get the most performance out of it. I did that for about 16 years and then decided it was time for something completely different. So I ended up at Canary knowing nothing about automation. I didn’t know what a PLC was. I didn’t know what OPC was. You guys love acronyms. I didn’t know what any of those meant. I knew healthcare billing and reimbursement. But when I got to Canary and started learning about our historian, I was amazed at the speed of the database. I thought I knew how fast I could push a database in a relational, SQL world, and I was amazed at what a true time series database can do.


Let me back up and give you a little history on Canary, so I can quiz you, because Clark told you this morning how long Canary has been around. Does anyone remember our starting year? Oh, I heard an 85. Very good. Sorry, I don’t have any socks to throw out, so I’ll get you a pair later. So yes, we’ve been around since 1985, but oddly enough, while the historian is the core product we have right now, we didn’t start out as a historian company. We started out as a trending company in 1985. I was in high school then. I bought my first computer in college, maybe my senior year, when Windows 3.1 came out. I see someone squinting. They know what Windows 3.1 is.


Some people may not even know what Windows 3.1 is, but that was really the first commercial GUI presentation of a Windows operating system. Anyone know when that came out? I kind of gave it away: maybe 1992. So we were doing trending for seven years before the Windows GUI even existed. We were doing trending in DOS. Okay, that’s not fun. In 85 we started doing trending. There were a couple of specific data sources that we knew how to collect data from and present graphically on screen. But we had a couple of implementations where people complained about performance, or that the trends weren’t very reactive, which, in MS-DOS, I don’t know how reactive they thought they should be.


In any case, we realized that if we were going to have a graphical tool, we didn’t want to be the cause of the performance issues people were having. And we realized that the data sources were the culprits. That’s when we decided to build our own data source, and we called it a historian. Shortly after, in 86 or 87, we built our first historian, and the rest is history. Trending has always been at our core as well; we’ve always had a really strong trending package, but the historian is where we found our niche, and we’ve always focused on building a really strong, really performant historian. The funny thing is, our historian is not a SQL or relational database.


We are NoSQL, as Clark said this morning. And the funny thing about that is we were NoSQL before SQL even existed. Well, we beat SQL Server, anyway: I believe SQL Server came out in 1989. So, funnily enough, we were NoSQL before that term could even exist. So what is a data historian? This isn’t my definition. I took each sentence from a different website and combined them; it was the best parts of the definitions I could find. A data historian is a type of software specifically designed for capturing and storing time series data from industrial operations. Data historians are commonly used where reliability and uptime are critical. The key words there are time series. Time series. Wow, I just lost my train of thought.


Time series requires us to record the data differently than a relational database would. This is where our secret sauce comes in: how we can really push systems to perform well, to write as fast as we can and read as fast as we can. There are actually only a handful of true historian companies out there. Sure, most SCADA systems and HMIs have some sort of historian component, but that’s not their focus. They put very few resources into it and provide it just as a convenience. Honestly, that works out well for us. There are a couple of SCADA systems that we interface with in the US, and the reason we do so well is that their historian product is horrendous. So people who use those systems and want long-term storage and performance come and find us, you know.


So we do really well in certain market segments, such as oil and gas. When I started at Canary, after I’d been there for three months, I got thrown onto my first oil and gas project, and somehow I became Canary’s oil and gas expert. That’s mainly the field I work in. I work with a lot of customers, and there are a couple of SCADA systems we come up against all the time. That’s how we acquire those customers: the SCADA and HMI historian components don’t perform well. As I said, time series data is very unique, so our whole methodology is built around indexing based on time. I have customers that have been with us for 20-plus years, and I visited one of them last year, in fact.


I couldn’t believe that they had 21 years of history online inside their historian. I said, wow, why? He said, because we can. Disk space is so cheap, and it’s not a huge implementation. I said, well, how’s it performing? He said, it’s great. I said, well, show me a trend from 2005. And sure enough, he brings up our trending tool, 60 days from 2005, and it fills in just like that. And I thought, that’s pretty cool; that’s what we build for. The way we do that is by storing data in a very specialized way. We use binary storage, and we use run-length encoding, which means that from record to record we’re not writing a full timestamp every time. We’re writing deltas.
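To make the timestamp-delta idea concrete, here is a minimal Python sketch, assuming millisecond integer timestamps. It is a generic illustration of delta encoding followed by run-length encoding of the deltas, not Canary’s actual on-disk format, and the function names are hypothetical. A regularly polled tag produces many identical deltas, which is what makes the run-length step pay off.

```python
from itertools import accumulate, groupby
from typing import List, Tuple

def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first timestamp whole; store every later one as the gap from its predecessor."""
    previous = [0] + timestamps[:-1]
    return [t - p for p, t in zip(previous, timestamps)]

def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand the runs back into deltas, then sum them to recover the timestamps."""
    deltas = [value for value, count in runs for _ in range(count)]
    return list(accumulate(deltas))

# Millisecond timestamps from a tag polled roughly once per second (hypothetical data).
ts = [1_700_000_000_000, 1_700_000_001_000, 1_700_000_002_000,
      1_700_000_003_000, 1_700_000_004_500]
runs = run_length_encode(delta_encode(ts))
print(runs)  # [(1700000000000, 1), (1000, 3), (1500, 1)]
assert decode(runs) == ts
```

Five full timestamps shrink to three (value, count) pairs, and the round trip is exact, so nothing is lost.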


We have some really good compression algorithms so that we have a small footprint on disk but are still very performant at returning records to users. As Clark mentioned this morning, we have lossless compression. Some of our competitors don’t do that: they have algorithms that, if a value hasn’t changed significantly enough, ignore it and don’t write it to the database. We’ve always taken the approach that whatever we receive, we record in the historian, so that a user can retrieve that value later on. So we don’t interpolate, we don’t downsample, we don’t require you to aggregate at some point, and we don’t require you to trim, prune or purge. If you want to, you can set that up, but we don’t require it. And performance is maintained over time, regardless of how many years you have online.
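The contrast between lossless storage and a change-threshold filter can be sketched in a few lines of Python. The `deadband_filter` below is a simplified, hypothetical exception filter standing in for the kind of algorithm described; real products vary (many also keep the last value before an exceedance, for example). It shows which samples a lossy filter never writes, and what a reader gets back at those times.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)

def deadband_filter(samples: List[Sample], deadband: float) -> List[Sample]:
    """Lossy: write a sample only if it moved more than `deadband` from the last written value."""
    written: List[Sample] = []
    for ts, value in samples:
        if not written or abs(value - written[-1][1]) > deadband:
            written.append((ts, value))
    return written

def value_at(stored: List[Sample], ts: int) -> float:
    """Last stored value at or before `ts`; assumes time-sorted data with at least one such sample."""
    candidates = [v for t, v in stored if t <= ts]
    return candidates[-1]

raw = [(0, 10.0), (1, 10.2), (2, 10.4), (3, 11.0), (4, 10.9)]
lossy = deadband_filter(raw, deadband=0.5)
lossless = list(raw)  # lossless: every received sample is kept

print(lossy)                  # [(0, 10.0), (3, 11.0)]
print(value_at(lossy, 2))     # 10.0, although 10.4 was actually received
print(value_at(lossless, 2))  # 10.4
```

With the filter, three of the five received values are gone for good; the lossless store can always answer with what was actually recorded.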


Now, one thing we’re not good at, since we are really indexed on time, is when people come to us and say, I need to find every time this tag has been above a value of 100. That’s a little more difficult for us: we have to read every record to be able to tell you that. But if you want data from yesterday or data from a year ago, that’s where we shine, and that’s where, along with our trending and some of our other tools, the speed comes in. Excuse me. So when you start comparing us to other options, I already mentioned that a lot of SCADA systems have a historian component built in. In the US, one of the biggest competitors we run up against is OSIsoft PI.
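Why a time-range read is cheap while a value-threshold query is not can be shown with a small Python sketch. `TimeIndexedTag` is a hypothetical class, not a Canary API; it assumes the samples fit in memory and are indexed only by timestamp. A range read binary-searches to the window and touches only those records, while "every time above 100" has no index to use and must scan everything.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)

class TimeIndexedTag:
    """Samples kept sorted by timestamp, mimicking a store indexed only on time."""

    def __init__(self, samples: List[Sample]):
        self.samples = sorted(samples)
        self.times = [ts for ts, _ in self.samples]

    def read_range(self, start: int, end: int) -> List[Sample]:
        """Time-range read: binary search to the window, then read only that slice."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return self.samples[lo:hi]

    def times_above(self, threshold: float) -> List[int]:
        """Value query: there is no value index, so every record is scanned."""
        return [ts for ts, value in self.samples if value > threshold]

tag = TimeIndexedTag([(0, 50.0), (10, 120.0), (20, 80.0), (30, 130.0), (40, 90.0)])
print(tag.read_range(10, 30))  # [(10, 120.0), (20, 80.0), (30, 130.0)]
print(tag.times_above(100.0))  # [10, 30]
```

The range read costs O(log n) to locate plus the size of the window, no matter how many years are stored; the threshold query is O(n) over the full history.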


I don’t know how big of a presence they have here in South Africa, but they’ve been doing the same thing as us for maybe just a little bit longer. They actually beat us to market by maybe six months. We’ve been the two true historians in the US market for quite some time. They are the Rolls-Royce, and we are maybe the common car, the Volkswagen Polo over here. We get you from A to B, just as they do; we just don’t charge you an arm and a leg. Of course, there are some open-source options. We talked with Alex from your team a little bit this morning. We try not to present ourselves as just a database. We do have other modules and services that we feel provide value to the system.


That’s not to say there aren’t people who use us purely as a data store and don’t use our clients, our modeling, or any of the other components. But we do run up against people who want us to compare ourselves to InfluxDB or to TimescaleDB. Those are open-source options. They work, and in some cases they work pretty well. In some cases we’ve benchmarked against them, and, not that you always need to be running a million values per second, our performance can exceed theirs. We are a commercial product, so if you have an issue, you have someone to call; you have the Element8 team here, which has direct access to us. There are some other technical things we’ve discovered as we’ve compared ourselves to InfluxDB and TimescaleDB.


They use a write-ahead log, which gives you a little more exposure to data loss if a computer goes down, or things like that. There are minor technical things like that. We never present ourselves as a must-have option for every customer. We know there are situations where we’re not the right fit, and we’re okay telling a customer that from time to time. So we’re not going to berate you if you decide to go with InfluxDB on certain projects. We understand that budget is always a consideration and that we’re not always a good fit. Depending on the size and scale of the system, sometimes the SCADA/HMI historian component is going to be good enough for the user, and that’s okay. So let me move on to a picture that looks something like this. How many of you have heard of the Purdue model?


Other than Element8 in the back? Has anyone heard of the Purdue model? Okay. In the US, this is a very common architecture stack, and you may be using parts of it without knowing the name we use for it in the US. The Purdue model is basically your network layers, the layers of implementation: where your controls sit, where your PLCs sit, where your SCADA sits, and so forth. So when we talk about historians and where the historian should reside, lots of times we’re referring to this network stack. I know it’s really hard to read, but we’re starting at the bottom, at level zero. These are your field devices, your sensors. Then at level one, you have a controller network.


You have a bunch of controllers, which could be your PLCs, collecting the information from these sensors. Then we get to what they call level two, which is where your HMI and your SCADA are going to sit; in this case, probably your Ignition instance out at a site somewhere. Typically, we’re not at this level. We’re not going to reside at the same layer as a SCADA system, but we are going to try to get our data collector at that level. Our philosophy on data collection is that we always want to be as close to the source as possible, to eliminate any technical hiccups that could cause us to lose data.


If we didn’t sit at this same layer, say we sat a layer higher, and there was some hiccup where this machine couldn’t communicate with that machine, now we’re losing data. So when we design a system, we always try to have a data collector alongside at the same network layer, because our collectors and our sender, which is part of our store-and-forward, have buffering capabilities. If there is a disconnect between layers or a firewall hiccup, that’s okay; we’re buffering data, and no data loss is going to occur. Typically, Canary comes in at level three. This is the first place you’re going to see a historian. We’re not at the SCADA layer, but we’re still on the OT side at this point, not on a corporate IT network.
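The buffering behaviour described here follows the general store-and-forward pattern, sketched below in Python. Everything in it is hypothetical: `StoreAndForwardSender` and the `transmit` callback are illustrative names, not Canary’s sender API, and the buffer lives in memory, whereas a real collector would persist it to disk and bound its size.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)

class StoreAndForwardSender:
    """Buffers samples locally and forwards them oldest-first whenever the link allows."""

    def __init__(self, transmit: Callable[[Sample], bool]):
        self.transmit = transmit  # returns True if the upstream historian accepted the sample
        self.buffer: Deque[Sample] = deque()

    def send(self, sample: Sample) -> None:
        self.buffer.append(sample)  # always buffer first, so nothing is dropped
        self.flush()

    def flush(self) -> None:
        while self.buffer:
            if not self.transmit(self.buffer[0]):
                return  # link is down: keep the rest and retry later
            self.buffer.popleft()

# Simulated upstream historian behind an unreliable link.
received: List[Sample] = []
link_up = True

def transmit(sample: Sample) -> bool:
    if link_up:
        received.append(sample)
    return link_up

sender = StoreAndForwardSender(transmit)
sender.send((1, 10.0))
link_up = False            # firewall hiccup between layers
sender.send((2, 11.0))
sender.send((3, 12.0))
print(len(sender.buffer))  # 2
link_up = True             # connection restored
sender.flush()
print(received)            # [(1, 10.0), (2, 11.0), (3, 12.0)]
```

Because the collector sits at the same layer as the source, an outage upstream only delays delivery; the samples arrive in order once the link returns.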


We’re still in the OT world. This is where Canary made its living for probably the first 25 years of our business. We sat alongside the control room for a long while. We estimated that probably two-thirds of the computers where Canary was installed weren’t connected to the Internet, which always made licensing a manual process. Given how we do our licensing, we would love to have our machine reach out to the Internet, grab the key, auto-license, and away you go. But that’s not possible when we’re sitting down here at the control layer. Typically, these machines are disconnected from the Internet; you’re behind firewalls, and you’re protected. So, as I said, this is where we made our living for many years.


It probably wasn’t until ten years ago that we had our first really major installation at what they call level four, the enterprise zone. This would be a corporate or regional data center that brings multiple sites together into more of a corporate historian. Lots of times we may have a historian at the site that people are using, and maybe they don’t keep all the history at the site; maybe they only keep three or five years. But their corporate instance is where they store everything forever. That becomes their main reporting source, and it’s how they provide data to data analysts.
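The tiered retention described here, a few years at the site and everything at corporate, amounts to a simple cutoff rule. The Python sketch below assumes a hypothetical `apply_retention` helper and approximates a year as 365 days; it does not represent how Canary’s own retention settings are configured.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Sample = Tuple[datetime, float]  # (timestamp, value)

def apply_retention(samples: List[Sample], now: datetime,
                    keep_years: Optional[int]) -> List[Sample]:
    """Keep samples newer than `keep_years` (365-day years); None means keep everything."""
    if keep_years is None:
        return list(samples)  # corporate tier: store everything forever
    cutoff = now - timedelta(days=365 * keep_years)
    return [s for s in samples if s[0] >= cutoff]

history = [(datetime(2005, 6, 1), 1.0),
           (datetime(2019, 1, 1), 2.0),
           (datetime(2023, 11, 1), 3.0)]
now = datetime(2023, 11, 20)

site = apply_retention(history, now, keep_years=3)          # site historian
corporate = apply_retention(history, now, keep_years=None)  # enterprise historian
print(len(site), len(corporate))  # 1 3
```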


This has become a lot more prevalent as the OT/IT fight has intensified. As I said, back when we made our living down here, if someone needed data, people would just start poking holes in firewalls. Oh, you need that? Okay, we’ll open up a port for you. Oh, you need that? Okay, we’ll give you a remote desktop. Or: we have a vendor who’s going to do some analysis on our equipment. Okay, we’ll let them tunnel down and grab the data at this layer. As security breaches have become more common, that has become a no, and they really want to protect this layer. This is where your control is occurring.


We don’t want outside vendors or business analysts coming down into this layer. So lots of times we’ll have a copy of the data here, and we push it up through the DMZ and end up in an enterprise zone at level four. Level four is an overarching layer. It allows management and all those business-level people to access the same data without compromising security or risking them pressing a button that could do who knows what. We see a lot of installations at the enterprise zone. And then, of course, beyond the enterprise, up here we have the cloud. We do have cloud offerings; we do some of our own hosting for clients. But I read a pretty funny quote once.


It was a cartoon with one character talking to another, and it said: you realize the cloud is just a server in someone else’s data center. Along with the cloud can come more expense and increased cost in the long run. We had some customers that said, oh, we are cloud first. They deployed everything to the cloud, and then after three years they said, we’re cloud last, because it was way too expensive. We host in AWS and we host in Azure, and when you look at the pricing you think, oh, it’s just two and a half cents per hour, and you don’t think about how that accumulates over time.
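The “two and a half cents per hour” point is easy to check with a little arithmetic. The Python sketch below assumes a single always-on resource billed at a flat hourly rate (taken as dollars) and a 365-day year, and nothing else: storage, egress and licensing are ignored, and the 20-instance figure is purely hypothetical. `Decimal` is used so the totals print without floating-point noise.

```python
from decimal import Decimal

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

def always_on_cost(rate_per_hour: Decimal, years: int, instances: int = 1) -> Decimal:
    """Total cost of resources that run around the clock at a flat hourly rate."""
    return rate_per_hour * HOURS_PER_YEAR * years * instances

rate = Decimal("0.025")  # "two and a half cents per hour"
print(always_on_cost(rate, years=1))                # 219.000
print(always_on_cost(rate, years=3))                # 657.000
print(always_on_cost(rate, years=3, instances=20))  # 13140.000
```

A rate that looks negligible per hour becomes a recurring line item once it runs 8,760 hours a year across several resources.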


When we’re talking to customers, these are the discussions we have: how many historians, and where those historians are going to reside. It really comes down to who the consumer of that data is. Just as a company is our client, internally they have clients of their own. Do those clients want our trending or our visualizations in a control room? If so, we’re probably going to need a historian at the site, at level three. Do you have business analysts? Do you have management that needs a summary view of all your operations? You probably don’t want them bouncing into every site historian to access that. Maybe we can aggregate all the tags and all the data up to a corporate, level four historian and allow that overarching look at all their operations.


Is level four right, or are you going to go to the cloud? Lots of people are Microsoft shops, so they have Azure readily available, and we have lots of implementations running in Microsoft Azure. In some cases that’s easier, because not everyone has a large IT shop; maybe they outsource their IT. So the cloud ends up being a good offering and a good fit for them. When we’re talking to customers, it’s the same discussion you’d have as a system integrator with a project: architecture always has to be one of the first discussions. When we get on calls with customers, we’re trying to understand their needs so that we can make the appropriate recommendations on where a historian should reside.
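The placement reasoning in this talk, putting collectors next to the source and adding historian tiers based on who consumes the data, can be summarized as a small decision function. This is a hypothetical Python sketch of that rule of thumb, not a Canary sizing tool; the `DataConsumers` fields and the tier labels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataConsumers:
    control_room: bool     # operators who trend data on site
    enterprise: bool       # analysts or management needing a cross-site view
    cloud_preferred: bool  # e.g. an Azure shop with limited in-house IT

def suggest_architecture(c: DataConsumers) -> List[str]:
    # Collectors sit next to the SCADA layer so buffering covers link outages.
    plan = ["Level 2: data collector with store-and-forward"]
    if c.control_room:
        plan.append("Level 3: site historian")
    if c.enterprise:
        plan.append("Cloud: hosted historian" if c.cloud_preferred
                    else "Level 4: enterprise historian")
    return plan

on_prem = suggest_architecture(DataConsumers(control_room=True, enterprise=True, cloud_preferred=False))
print(on_prem)
```

A real recommendation would also weigh budget, data volumes and security policy, which is why the talk treats architecture as a conversation rather than a formula.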
