What is an Internet of Water Data Hub?

Lilli Watson, Science Communications Associate, the Nicholas Institute for Energy, Environment & Sustainability, Duke University
September 2022
Internet of Water (IoW) Data Hubs allow one or more users to publish a variety of water data from disparate sources in one place. IoW Data Hubs can be organized by theme or geography and follow IoW Principles [1]. They ensure that data and metadata from these disparate sources are standardized before publication so that they can be found and used together seamlessly. IoW Data Hubs, together with the data discovery tool Geoconnex, are the underlying architecture that makes an internet of water possible.
Water data producers share their data through hubs, where secondary data users can find and access them. Users then transform data into information that decision-makers can use to improve water planning, management, and stewardship. IoW Data Hubs and Geoconnex are the key elements that connect water data producers, users, and decision-makers to enable information sharing and informed decision-making.

[Figure: The Internet of Water Network]

IoW Data Hub Components

Not all data hubs are IoW Data Hubs. IoW Data Hubs are structured sources of findable, accessible, and usable water data that conform to IoW best practices and specifications, adhere to IoW Principles, and are interconnected with other IoW Data Hubs through Geoconnex. IoW Data Hubs must include four key components: data producers, data wrappers, a data store, and a metadata catalog.

Data Producers: a network of participating individuals and/or organizations that produce data.

Data Wrappers: manual and/or automated processes that transform data from various producers into a common data standard and load them into a data store.

Data Store: the physical location(s) and structure(s) that store the wrapped (standardized) data.

Metadata Catalog: a searchable collection of metadata that points to data in a data store, including APIs to describe and deliver that data.
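
The four components above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Observation` schema, the two producer formats, and the field names are assumptions made for illustration, not an IoW specification. It only shows how wrappers turn differently shaped producer records into one common standard, load them into a store, and how a catalog then points back into that store.

```python
from dataclasses import dataclass


# Hypothetical common data standard: every record is reduced to these fields.
@dataclass(frozen=True)
class Observation:
    site_id: str
    parameter: str
    value: float
    unit: str


def wrap_agency_a(row):
    """Wrapper for an assumed producer that reports temperature in Fahrenheit."""
    celsius = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0
    return Observation(row["station"], "water_temperature", round(celsius, 2), "degC")


def wrap_agency_b(row):
    """Wrapper for an assumed producer that reports Celsius under other field names."""
    return Observation(row["site"], "water_temperature", float(row["wtemp_c"]), "degC")


# Data store: an in-memory list stands in for a database.
data_store = [
    wrap_agency_a({"station": "A-1", "temp_f": "68"}),
    wrap_agency_b({"site": "B-7", "wtemp_c": "19.5"}),
]


def build_catalog(store):
    """Metadata catalog: maps each parameter name to positions in the store."""
    catalog = {}
    for position, obs in enumerate(store):
        catalog.setdefault(obs.parameter, []).append(position)
    return catalog


catalog = build_catalog(data_store)
hits = [data_store[i] for i in catalog.get("water_temperature", [])]
```

Because both wrappers emit the same `Observation` shape, a user who searches the catalog gets records from both producers that can be used together without further conversion.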

IoW Data Hub Types

The four components of IoW Data Hubs can be configured and can interact in different ways, largely depending on who (the data producers or the data hub administrators) performs the tasks associated with data wrappers. These tasks include transforming data into standardized formats and loading the standardized data into the data store.

There are four types of IoW Data Hubs, which form a spectrum. At one end (Type A: Distributed), data producers are responsible for every component except the metadata catalog, which the hub maintains. At the other end (Type D: Centralized), a central hub organization is responsible for every component except collecting the raw data. In Blended data hubs (Type B and Type C), responsibility for the components is shared between producers and hubs.
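
As a rough aid, the division of responsibility can be written down as a small lookup table. This is an interpretive sketch, not an official IoW matrix: the task names (`collect`, `wrap`, `store`, `catalog`) are labels chosen here, and the assignment of wrapping to producers in Type C is inferred from the barriers listed for that type in the descriptions that follow.

```python
PRODUCER, HUB = "producer", "hub"

# Who performs each task, per hub type (interpretation of the article's descriptions).
RESPONSIBILITY = {
    "A": {"collect": PRODUCER, "wrap": PRODUCER, "store": PRODUCER, "catalog": HUB},
    "B": {"collect": PRODUCER, "wrap": PRODUCER, "store": HUB, "catalog": HUB},
    "C": {"collect": PRODUCER, "wrap": PRODUCER, "store": HUB, "catalog": HUB},
    "D": {"collect": PRODUCER, "wrap": HUB, "store": HUB, "catalog": HUB},
}

# How data reaches the place where it is queried.
TRANSFER = {
    "A": "queried in place at the producer",
    "B": "producer pushes standardized data",
    "C": "hub pulls standardized data",
    "D": "hub pulls raw data",
}


def hub_tasks(hub_type):
    """Return the tasks the hub organization performs for a given hub type."""
    return sorted(task for task, owner in RESPONSIBILITY[hub_type].items() if owner == HUB)
```

Calling `hub_tasks("A")` and `hub_tasks("D")` shows the two ends of the spectrum: the hub's workload grows from the catalog alone to everything except collection.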

Hub Type A: Distributed

In a Distributed Data Hub, producers collect and convert data into a standardized format stored locally. The hub’s metadata catalog can search through local catalogs to pull queried data in real time.

Ideal for: Managing high volumes of similar data types in real time

Advantages: Low computation and storage requirements for the hub

Limitations: User queries are limited

Barriers: Requires significant capacity from all data producers

Examples: EPA Interoperable Watersheds Network; CUAHSI Hydroclient; WaDE 1.0
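
A minimal sketch of the distributed pattern, assuming each producer exposes some local search endpoint (modeled here as a plain Python function; the producer names and records are invented). The hub stores nothing itself and simply fans a query out to every producer at request time. The sketch also assumes that an unreachable producer contributes no results, which is one plausible reading of why the other hub types list offline availability as an advantage.

```python
def make_producer(records, online=True):
    """Return a search function standing in for one producer's local endpoint."""
    def search(parameter):
        if not online:
            raise ConnectionError("producer endpoint unreachable")
        return [r for r in records if r["parameter"] == parameter]
    return search


# Hypothetical producers, each holding its own standardized data locally.
producers = {
    "county_lab": make_producer([
        {"site_id": "CL-01", "parameter": "streamflow", "value": 12.4, "unit": "m3/s"},
        {"site_id": "CL-01", "parameter": "turbidity", "value": 3.1, "unit": "NTU"},
    ]),
    "river_keeper": make_producer([
        {"site_id": "RK-09", "parameter": "streamflow", "value": 8.7, "unit": "m3/s"},
    ]),
    "offline_gauge": make_producer([], online=False),
}


def hub_query(parameter, producers):
    """Fan the query out to every producer and merge whatever comes back."""
    results, unavailable = [], []
    for name, search in producers.items():
        try:
            results.extend(dict(r, producer=name) for r in search(parameter))
        except ConnectionError:
            unavailable.append(name)  # nothing is cached, so this data is missing
    return results, unavailable


results, unavailable = hub_query("streamflow", producers)
```

The hub's own storage and computation stay small, but the answer is only as complete as the set of producers that respond.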

[Figure: Internet of Water Hub Type A: Distributed]

Hub Type B: Blended, Producers Push Data to Hubs

In Type B Blended Data Hubs, producers collect and convert data into a standardized format before pushing them into a centralized data store managed by the hub.

Ideal for: Regulatory data collected at a daily or higher (sub-daily) frequency

Advantages: Users can make complex queries; accessible when data producers are offline; hubs can ensure data producers meet certain standards

Limitations: Large computation and storage requirements for the hub

Barriers: Requires producer capacity to wrap data and an agreement to share data

Examples: Water Quality Portal; USGS NWIS; Reclamation Water and Information System
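
The push pattern can be sketched as a hub that accepts batches from producers and rejects any batch that does not meet its standard. The class name, the required field list, and the sample records below are all assumptions for illustration; real hubs define their own schemas and submission interfaces.

```python
# Hypothetical minimum standard the hub enforces on every pushed record.
REQUIRED_FIELDS = {"site_id", "parameter", "value", "unit", "timestamp"}


class BlendedPushHub:
    """Type B sketch: producers push standardized data; the hub validates and stores it."""

    def __init__(self):
        self.store = []

    def receive(self, producer, records):
        # Validate the whole batch before storing any of it.
        for record in records:
            missing = REQUIRED_FIELDS - set(record)
            if missing:
                raise ValueError(f"{producer}: missing fields {sorted(missing)}")
        self.store.extend(dict(record, producer=producer) for record in records)
        return len(records)

    def query(self, parameter, min_value=None):
        # All data is local to the hub, so filters beyond a simple lookup are cheap.
        return [
            r for r in self.store
            if r["parameter"] == parameter
            and (min_value is None or r["value"] >= min_value)
        ]


hub = BlendedPushHub()
accepted = hub.receive("utility_a", [
    {"site_id": "UA-1", "parameter": "nitrate", "value": 4.2, "unit": "mg/L",
     "timestamp": "2022-09-01T00:00:00Z"},
    {"site_id": "UA-2", "parameter": "nitrate", "value": 6.8, "unit": "mg/L",
     "timestamp": "2022-09-01T00:00:00Z"},
])

try:
    hub.receive("utility_b", [{"site_id": "UB-1"}])
    rejected = None
except ValueError as err:
    rejected = str(err)
```

Rejecting incomplete batches at the point of submission is one way a hub can hold producers to a standard, and the central store keeps answering queries even when a producer is offline.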

[Figure: Internet of Water Hub Type B: Blended]

Hub Type C: Blended, Hubs Pull Data from Producers

Type C Blended Data Hubs pull data into a centralized data store, rather than relying on local entities to push data to the data store as in Type B. This lowers the burden on local entities that may not have the capacity to push data to hubs on a regular basis.

Ideal for: Non-regulatory data collected by multiple producers with varying capacity

Advantages: Users can make complex queries; accessible when data producers are offline

Limitations: Large computation and storage requirements for the hub; hubs have less control over data standards

Barriers: Requires producer capacity to wrap data and an agreement to share data

Examples: NOAA Integrated Ocean Observing System
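
A sketch of the pull pattern, under the assumption that each producer exposes its standardized records through something the hub can poll (modeled as a function) and that records carry a monotonically increasing `seq` number. Both are hypothetical conveniences; a real hub might track timestamps or use a harvesting protocol instead.

```python
def make_source(records):
    """Stand-in for a producer endpoint: returns records newer than `since`."""
    def fetch(since):
        return [r for r in records if r["seq"] > since]
    return fetch


class BlendedPullHub:
    """Type C sketch: the hub harvests from producers on its own schedule."""

    def __init__(self, sources):
        self.sources = sources
        self.store = []
        self.last_seen = {name: 0 for name in sources}

    def harvest(self):
        added = 0
        for name, fetch in self.sources.items():
            try:
                new_records = fetch(self.last_seen[name])
            except ConnectionError:
                continue  # producer offline: keep serving what is already stored
            for record in new_records:
                self.store.append(dict(record, producer=name))
                self.last_seen[name] = max(self.last_seen[name], record["seq"])
            added += len(new_records)
        return added


river_watch = [
    {"seq": 1, "site_id": "RW-3", "parameter": "pH", "value": 7.1},
    {"seq": 2, "site_id": "RW-3", "parameter": "pH", "value": 7.3},
]
hub = BlendedPullHub({"river_watch": make_source(river_watch)})

first = hub.harvest()    # picks up the existing records
second = hub.harvest()   # nothing new since the last pull
river_watch.append({"seq": 3, "site_id": "RW-3", "parameter": "pH", "value": 7.0})
third = hub.harvest()    # picks up only the new record
```

The producer never has to initiate a transfer; it only needs to keep its data available for the hub to collect.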

[Figure: Internet of Water Hub Type C]

Hub Type D: Centralized

Centralized Data Hubs pull raw data from producers and convert the data into a standardized format that is saved in their data store with a metadata catalog.

Ideal for: Storing a few data types across a few producers who have low capacity

Advantages: Users can make complex queries; accessible when data producers are offline

Limitations: Large computation and storage requirements for the hub; hubs must have high capacity to create and maintain wrappers

Barriers: Potential reservations by data producers to allow the hub to standardize data

Examples: National Groundwater Monitoring Network; WaDE 2.0
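
In the centralized pattern the hub owns the wrappers, so it needs one for every raw format it receives. The sketch below assumes two invented producers, one sending CSV text with depths in feet and one sending records already in metres; the column names and the common record shape are hypothetical.

```python
import csv
import io

FEET_TO_METRES = 0.3048


def wrap_town_csv(raw_text):
    """Hub-maintained wrapper for an assumed CSV export with depths in feet."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [
        {"site_id": row["Well"], "parameter": "depth_to_water",
         "value": round(float(row["Depth (ft)"]) * FEET_TO_METRES, 3), "unit": "m"}
        for row in rows
    ]


def wrap_district_records(raw_records):
    """Hub-maintained wrapper for an assumed producer that already uses metres."""
    return [
        {"site_id": r["id"], "parameter": "depth_to_water",
         "value": float(r["dtw_m"]), "unit": "m"}
        for r in raw_records
    ]


# The hub must create and maintain one wrapper per raw format it ingests.
WRAPPERS = {"town": wrap_town_csv, "district": wrap_district_records}


def ingest(store, producer, raw):
    """Standardize a producer's raw data with its wrapper and load it into the store."""
    wrapped = WRAPPERS[producer](raw)
    store.extend(dict(record, producer=producer) for record in wrapped)
    return len(wrapped)


store = []
ingest(store, "town", "Well,Depth (ft)\nW-1,10\nW-2,25.5\n")
ingest(store, "district", [{"id": "D-4", "dtw_m": "6.2"}])
```

Producers hand over data in whatever shape they have, which keeps their burden low, while the `WRAPPERS` registry shows where the hub's maintenance cost accumulates as more formats are added.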

[Figure: Internet of Water Hub Type D]

Building a Water Data Hub

States across the US are modernizing their water data infrastructure by building IoW Data Hubs. One example is the New Mexico Water Data Initiative, a collaborative effort across five state agencies to identify, share, and integrate key water data. IoW Data Hubs can also be multistate efforts organized around a central theme, such as the Western States Water Council’s (WSWC) Water Data Exchange (WaDE), which integrates water rights, allocation, supply, and use data from the WSWC’s 18 member states. There are also federal water data hubs, like the USGS’s Water Quality Portal, which integrates data from a variety of federal sources as well as from many other data producers that contribute to the USEPA Water Quality Exchange. Hubs can also operate on a local scale, like the Haw River Data Hub, which integrates data from the Haw River Assembly’s many water quality monitoring programs into one easy-to-use online platform. IoW Data Hubs can operate at any size or scale, as long as the data they provide are findable, accessible, and usable.

What’s the Right Type of Hub for My Data?

The best type of hub for any given scenario depends on the capacity of both the organization responsible for managing the hub and the data producers who will be contributing their data. Before determining what type of data hub to create, organizations should engage with data producers to understand their capacity limitations.

[Figure: Summary of Internet of Water Data Hub Types, with an attribute comparison]

Non-IoW data hubs often have the data producers and some kind of data store but lack the wrapper and/or metadata catalog. Wrappers (and associated data standards) are essential for data users to put the data to use with other data across a region. The metadata catalog is essential for finding data within and, eventually, across hubs.

The IoW Initiative at the Lincoln Institute of Land Policy’s Center for Geospatial Solutions is developing Hubkit, an open-source tool to help organizations build data hubs. Hubkit is a modular software suite that automates data ingestion, standardization, and publication. It includes components that configure the upload of CSV or Excel-based data templates to a standardized database, provide standard APIs for geospatial and observation data and metadata (the same as those being used by next-generation USGS systems), and publish web pages for individual monitoring locations through Geoconnex. Geoconnex is a family of software and practices that will enable users to find water data by theme and location across IoW hubs through a search index covering all of them.

These two tools are crucial to building the technical architecture necessary to have findable, accessible, and usable water data. They are the key technological elements that make an Internet of Water possible.

[1] Some hubs may have an organizational affiliation with the IoW Coalition, such as the Water Data Exchange (WaDE) program of the Western States Water Council, whereas others may simply be a resource for the community, such as the U.S. Army Corps National Inventory of Dams.

Thanks to:

Peter Colohan, Director of the Internet of Water Initiative, Center for Geospatial Solutions, Lincoln Institute of Land Policy

Kyle Onda, Associate Director of the Internet of Water Initiative, Center for Geospatial Solutions, Lincoln Institute of Land Policy

Lauren Patterson, Senior Policy Associate, Nicholas Institute for Energy, Environment & Sustainability, Duke University
