With the rise of IoT management platforms, organizations can easily gather massive datasets from distributed devices. While previous research has focused on methods for streaming data from IoT devices, few studies have been published on correlating this data for interoperability and action. In this article, I will explain a set of guiding principles to help normalize incoming data from devices, with the purpose of deriving both analytical and actionable meaning.
Onboarding distributed data streams has become a cost-efficient opportunity for many organizations using the cloud-based IoT platforms from GCP, AWS, and Azure. For most, this means gathering sensor data from various internet-connected devices deployed across large, and often dynamic, geographical zones. Problems arise when introducing a new data source that provides data with little in common with already established inputs. Typically, this means adding custom logic to handle the new stream, resulting in tight coupling between the two data sets. For example, a shipping organization may have devices installed on transport vehicles to monitor routes. A new device could be introduced to monitor capacity utilization. The development team could easily form linkages between drivers, routes, and capacity to determine the efficiency of delivery plans.
Consider incoming traffic data. Can we easily reroute vehicles based on outstanding delivery events? What about weather data, gas prices, tire pressure, and cargo-hold temperature? The development team could certainly expand an existing data model, but our system will soon begin to resist the adoption of new devices.
If we ignore security, the onboarding of IoT data follows a standard process flow:
- Readings from the device sensor(s) are encoded and broadcast to a gateway.
- The gateway wraps this message and sends it over an internet connection (note: some devices will have their gateway built-in, while others will use a gateway hub).
- The IoT management platform accepts the message and passes it to the backend.
- The message is routed, based on the message [device] type.
- The message is decoded and normalized to a common API.
The first challenge of onboarding IoT data is the atypical nature of the messages. For example, the data coming from a smart fridge would be vastly different from the streamlined, encoded readings sent over radio frequencies from parking sensors located in city streets. To overcome this challenge, a common wrapper is used to help route incoming messages to their proper decoder.
While a message field contains the raw sensor payload (typically JSON), a standard message info block lets us perform routing without having to unbox that payload. This is where we see the importance of keeping the sensor(s) and gateways logically separate. The sensor should focus only on reporting readings. It is up to the gateway to handle the packaging and shipment of this payload. In addition, deployed devices can be redirected to different platforms by simply changing or adding the gateways that service them. It is reasonable to expect custom gateways for specific backends.
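As a rough illustration of the wrapper idea, the sketch below routes a message to its decoder using only the info block, leaving the payload opaque until the decoder runs. The `Envelope` fields, the decoder registry, the device type names, and both payload layouts are assumptions made for this example, not a format defined by any particular platform.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical wrapper added by the gateway around a raw sensor payload."""
    device_type: str  # info block: used only for routing
    device_id: str    # info block: identifies the reporting device
    gateway_id: str   # info block: identifies the packaging gateway
    payload: bytes    # raw sensor payload, never inspected by the router


Decoder = Callable[[bytes], Dict[str, Any]]
DECODERS: Dict[str, Decoder] = {}


def decoder(device_type: str) -> Callable[[Decoder], Decoder]:
    """Register a decoder function for one device type."""
    def register(fn: Decoder) -> Decoder:
        DECODERS[device_type] = fn
        return fn
    return register


@decoder("smart_fridge")
def decode_fridge(payload: bytes) -> Dict[str, Any]:
    # Assumed layout: a JSON object with a "temp" field in Celsius.
    body = json.loads(payload)
    return {"temperature_c": float(body["temp"])}


@decoder("parking_sensor")
def decode_parking(payload: bytes) -> Dict[str, Any]:
    # Assumed layout: a single byte, 1 for occupied and 0 for vacant.
    return {"occupied": payload[0] == 1}


def route(envelope: Envelope) -> Dict[str, Any]:
    """Pick a decoder from the info block and return a normalized reading."""
    try:
        decode = DECODERS[envelope.device_type]
    except KeyError:
        raise ValueError(f"no decoder registered for {envelope.device_type!r}")
    reading = decode(envelope.payload)
    return {
        "device_id": envelope.device_id,
        "device_type": envelope.device_type,
        **reading,
    }
```

Under these assumptions, supporting a new device type means registering one more decoder; the routing code and the existing decoders stay untouched.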
After onboarding and decoding the data in messages, we can focus on the second challenge of IoT data streams: windowing. As with any distributed system, our architecture must account for duplicate, missing, or misordered messages. In a sequence of network hops, there is always the potential for delivery failure. Since our goal is to derive actionable meaning, incorrectly ordered events could be detrimental. Imagine two temperature readings arriving out of order: the end system may incorrectly interpret a spike in temperature as a recovery. The quickest solution is to add a timestamp to the message wrapper, which allows proper ordering and backfilling if a message should arrive late. This timestamp should be the one reported by the gateway when the message was received. While the timestamp should be contained in the message, we cannot assume all sensors will have synced clocks, so it should not come from the sensor itself.
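A minimal sketch of this ordering step, assuming each wrapped message carries a gateway timestamp and a unique message identifier, might look like the following. The `Stamped` type and its field names are hypothetical, and the `message_id` used for de-duplication is an assumption that goes beyond what is described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Stamped:
    """Hypothetical decoded reading carrying the gateway's receive time."""
    message_id: str    # assumed unique per message, used to drop duplicates
    device_id: str
    gateway_ts: float  # seconds since epoch, stamped by the gateway
    value: float


def order_window(messages: Iterable[Stamped]) -> List[Stamped]:
    """Drop duplicate deliveries, then sort the window by gateway timestamp."""
    seen: Set[str] = set()
    unique: List[Stamped] = []
    for message in messages:
        if message.message_id in seen:
            continue  # same message delivered more than once
        seen.add(message.message_id)
        unique.append(message)
    # sorted() is stable, so readings with equal timestamps keep arrival order.
    return sorted(unique, key=lambda m: m.gateway_ts)
```

Re-running `order_window` over a buffered window whenever a late message arrives is one simple way to backfill: the late reading slots into its correct position instead of being read as the newest event.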
Now that we have a strategy for identifying and windowing messages, we are ready to address the final requirement of normalizing IoT data: relating. Services are unable to apply generic logic or analytics to data points that have no relationship to one another. While our Smart City implementation adds an additional level of relationship based on a defined site structure, this is not required to achieve meaning. Instead, we focus on the most commonly available data point: location. Consider two examples.
First, consider 800 devices that report the temperature of manufacturing equipment throughout a factory. Assume an event where twelve devices report a dangerous increase in operational temperature. We could react by turning off the affected machines and calling for maintenance. However, if we know that these twelve devices are near one another, we can react to the potential of a more significant event, such as a fire.
Second, consider 3500 devices that detect parking events by using IR/magnetic sensors distributed throughout a city. Assume an event where 600 of these devices report a departure. We could react by updating nearby signs to reflect the current count of vacancies. However, if we know that these 600 devices are all located downtown, we can react by changing traffic lights to assist with outbound traffic.
By adding latitude, longitude, and altitude data during message wrapping, we are able to create an implied relationship between all our inbound messages. For gateway hubs servicing multiple devices, an additional lookup may be required at the time of unwrapping. In such systems, the device would need to either self-identify within the payload or be identified by the packaging gateway.
Since devices may move (i.e., our proximity relationship to other readings may shift over time), we must also include temporal data. Fortunately, we already included timestamp data for ordering purposes. By combining location and timestamp data, we have normalized all our input spatiotemporally.
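To make the implied relationship concrete, here is a small sketch that treats two readings as related when they are close in both space and time. It uses the haversine formula for surface distance and combines it with the altitude difference; the `Located` type, the field names, and the threshold parameters are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


@dataclass(frozen=True)
class Located:
    """Hypothetical normalized reading with position and gateway timestamp."""
    device_id: str
    lat: float         # degrees
    lon: float         # degrees
    alt_m: float       # meters
    gateway_ts: float  # seconds since epoch


def distance_m(a: Located, b: Located) -> float:
    """Approximate distance in meters: haversine on the surface plus altitude."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    surface = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))
    return math.hypot(surface, b.alt_m - a.alt_m)


def related(a: Located, b: Located, max_m: float, max_s: float) -> bool:
    """True when two readings fall within the same distance and time window."""
    close_in_space = distance_m(a, b) <= max_m
    close_in_time = abs(a.gateway_ts - b.gateway_ts) <= max_s
    return close_in_space and close_in_time
```

A check like `related(a, b, max_m=200.0, max_s=60.0)` could then serve as the building block for grouping the twelve overheating devices or the 600 departing vehicles from the examples above. The right thresholds depend on the deployment and are not prescribed here.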
But how do we integrate this normalized data into an event-driven architecture? I will answer that in part 2 of this article.