What is an Internet of Water Data Hub?
“IoW Data Hubs, together with the data discovery tool Geoconnex, are the underlying architecture that makes an internet of water possible.”
The Internet of Water Network
IoW Data Hub Components
Not all data hubs are IoW Data Hubs. IoW Data Hubs are structured sources of findable, accessible, and usable water data that conform to IoW best practices and specifications, adhere to IoW Principles, and are interconnected with other IoW Data Hubs through Geoconnex. IoW Data Hubs must include four key components: data producers, data wrappers, a data store, and a metadata catalog.
Data Producers: a network of participating individuals and/or organizations that produce data.
Data Wrappers: manual and/or automated processes that transform data from various producers into a common data standard and load them into a data store.
Data Store: the physical location(s) and structure(s) that store the wrapped (standardized) data.
Metadata Catalog: a searchable collection of metadata that points to data in a data store, including APIs to describe and deliver that data.
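To make the four components concrete, here is a minimal Python sketch of how they might fit together. Everything in it is an assumption for illustration: the `Observation` schema, the two producers and their CSV layouts, the in-memory list standing in for a data store, and the `build_catalog` and `search` helpers. It does not represent an IoW specification or any real hub's code.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical common data standard that every wrapper emits."""
    site_id: str
    parameter: str
    value: float
    unit: str
    producer: str


# Data producers: two hypothetical organizations with different file layouts.
CITY_CSV = "station,temp_c\nHAW-01,18.5\nHAW-02,19.1\n"
VOLUNTEER_CSV = "site;water_temp_f\nHAW-03;66.2\n"


# Data wrappers: one per producer, each converting to the common standard.
def wrap_city(text):
    rows = csv.DictReader(io.StringIO(text))
    return [
        Observation(r["station"], "water_temperature", float(r["temp_c"]), "degC", "city")
        for r in rows
    ]


def wrap_volunteer(text):
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        Observation(
            r["site"],
            "water_temperature",
            round((float(r["water_temp_f"]) - 32) * 5 / 9, 2),  # Fahrenheit to Celsius
            "degC",
            "volunteers",
        )
        for r in rows
    ]


# Data store: a plain list stands in for a real database.
data_store = []
data_store.extend(wrap_city(CITY_CSV))
data_store.extend(wrap_volunteer(VOLUNTEER_CSV))


# Metadata catalog: searchable descriptions that point to data in the store.
def build_catalog(store):
    catalog = {}
    for obs in store:
        entry = catalog.setdefault(
            obs.site_id, {"producer": obs.producer, "parameters": set(), "count": 0}
        )
        entry["parameters"].add(obs.parameter)
        entry["count"] += 1
    return catalog


def search(catalog, parameter):
    """Return the site IDs that report the given parameter."""
    return sorted(site for site, entry in catalog.items() if parameter in entry["parameters"])


catalog = build_catalog(data_store)
print(search(catalog, "water_temperature"))
```

Once the wrappers have run, a user can search the catalog by parameter without knowing that the two producers used different column names, delimiters, or units.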
IoW Data Hub Types
The four components of an IoW Data Hub can be configured to interact in different ways, depending largely on who performs the tasks associated with data wrappers: the data producers or the data hub administrators. These tasks include transforming data into standardized formats and loading the standardized data into the data store.
There are four types of IoW Data Hubs. At one end of the spectrum (Type A: Distributed), data producers are responsible for every component except the metadata catalog. At the other end (Type D: Centralized), a central hub organization is responsible for every component except collecting the raw data. In Blended Data Hubs (Type B and Type C), responsibility for the components is shared between producers and hubs.
Hub Type A: Distributed
In a Distributed Data Hub, producers collect and convert data into a standardized format stored locally. The hub’s metadata catalog can search through local catalogs to pull queried data in real time.
Advantages: Low computation and storage requirements for the hub
Limitations: Users can make only limited queries
Barriers: Requires significant capacity from all data producers
Examples: EPA Interoperable Watersheds Network; CUAHSI HydroClient; WaDE 1.0
Hub Type B: Blended, Producers Push Data to Hubs
In Type B Blended Data Hubs, producers collect and convert data into a standardized format before pushing them into a centralized data store managed by the hub.
Advantages: Users can make complex queries; accessible when data producers are offline; hubs can ensure data producers meet certain standards
Limitations: Large computation and storage requirements for the hub
Barriers: Requires producer capacity to wrap data and an agreement to share data
Examples: Water Quality Portal; USGS NWIS; Reclamation Water Information System
Hub Type C: Blended, Hubs Pull Data from Producers
Type C Blended Data Hubs pull data into a centralized data store, rather than relying on local entities to push data to the data store as in Type B. This lowers the burden on local entities that may not have the capacity to push data to hubs on a regular basis.
Advantages: Users can make complex queries; accessible when data producers are offline
Limitations: Large computation and storage requirements for the hub; hubs have less control over data standards
Barriers: Requires producer capacity to wrap data and an agreement to share data
Examples: NOAA Integrated Ocean Observing System
Hub Type D: Centralized
Centralized Data Hubs pull raw data from producers and convert the data into a standardized format that is saved in their data store with a metadata catalog.
Advantages: Users can make complex queries; accessible when data producers are offline
Limitations: Large computation and storage requirements for the hub; hubs must have high capacity to create and maintain wrappers
Barriers: Potential reservations by data producers to allow the hub to standardize data
Examples: National Groundwater Monitoring Network; WaDE 2.0
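The four hub types differ mainly in who wraps the data, who hosts the data store, and how data reach the hub. The Python sketch below records those differences as a small lookup table. The `HubType` class, its field names, and the helper function are hypothetical, and the values are a reading of the descriptions in this section, not an official IoW classification scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubType:
    name: str
    wraps_data: str   # who standardizes the data: "producer" or "hub"
    hosts_store: str  # who holds the standardized data: "producer" or "hub"
    transfer: str     # how data reach the hub or its users


# In every type, the hub is assumed to run the metadata catalog.
HUB_TYPES = {
    "A": HubType("Distributed", "producer", "producer",
                 "hub catalog queries local catalogs in real time"),
    "B": HubType("Blended (push)", "producer", "hub",
                 "producers push standardized data to the hub"),
    "C": HubType("Blended (pull)", "producer", "hub",
                 "hub pulls standardized data from producers"),
    "D": HubType("Centralized", "hub", "hub",
                 "hub pulls raw data and standardizes it"),
}


def available_when_producers_offline(hub_type):
    """Data stay accessible only if the hub holds its own copy."""
    return hub_type.hosts_store == "hub"


for key, hub_type in HUB_TYPES.items():
    print(key, hub_type.name, available_when_producers_offline(hub_type))
```

The helper reflects the advantage listed for Types B, C, and D: because the hub keeps its own copy of the data, users can still reach it when producers are offline.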
Building a Water Data Hub
States across the US are modernizing their water data infrastructure by building IoW Data Hubs. One example is the New Mexico Water Data Initiative, a collaborative effort across five state agencies to identify, share, and integrate key water data. IoW Data Hubs can also be multistate efforts organized around a central theme, such as the Western States Water Council’s (WSWC) Water Data Exchange (WaDE). WaDE integrates water rights, allocation, supply, and use data from the WSWC’s 18 member states. There are also federal water data hubs, like the USGS’s Water Quality Portal, which integrates data from a variety of federal sources as well as many other data producers that contribute to the USEPA Water Quality Exchange. Hubs can also operate on a local scale, like the Haw River Data Hub, which integrates data from the Haw River Assembly’s many water quality monitoring programs into one easy-to-use online platform. IoW Data Hubs can operate at any size or scale, as long as the data they provide are findable, accessible, and usable.
What’s the Right Type of Hub for My Data?
The best type of hub for any given scenario depends on the capacity of both the organization responsible for managing the hub and the data producers who will be contributing their data. Before determining what type of data hub to create, organizations should engage with data producers to understand their capacity limitations.
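As a rough illustration of how capacity might guide the choice, the Python sketch below maps three yes/no questions about producers onto the four hub types. The function name, its parameters, and the order of the checks are assumptions made for this example. They are loosely based on the barriers listed for each type and are no substitute for talking with producers directly.

```python
def suggest_hub_type(producers_can_wrap, producers_can_host, producers_can_push):
    """Suggest a hub type from producer capacity (hypothetical heuristic).

    producers_can_wrap: producers can convert data to the common standard
    producers_can_host: producers can store and serve standardized data locally
    producers_can_push: producers can send data to the hub on a regular basis
    """
    if not producers_can_wrap:
        # The hub must build and maintain the wrappers itself.
        return "D"
    if producers_can_host:
        # Producers carry most of the load; the hub only needs a catalog.
        return "A"
    if producers_can_push:
        return "B"
    # Producers wrap data but cannot push regularly, so the hub pulls.
    return "C"


print(suggest_hub_type(producers_can_wrap=True, producers_can_host=False, producers_can_push=False))
```

A real decision would also weigh the hub organization's own computation, storage, and staffing capacity, as well as whether producers are willing to share data or let the hub standardize it.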
The IoW Initiative at the Lincoln Institute of Land Policy’s Center for Geospatial Solutions is developing Hubkit, an open-source tool to help organizations build data hubs. Hubkit is a modular software suite that automates data ingestion, standardization, and publication. It includes components for configuring the upload of CSV or Excel-based data templates to a standardized database, standard APIs for geospatial and observation data and metadata (the same as those being used by next-generation USGS systems), and publication of websites for individual monitoring locations through Geoconnex. Geoconnex is a family of software and practices that will enable users to find water data by theme and location across IoW hubs through a search index covering all of them.
Hubkit and Geoconnex are crucial to building the technical architecture needed for findable, accessible, and usable water data. They are the key technological elements that make an Internet of Water possible.
Thanks to:
Peter Colohan, Director of the Internet of Water Initiative, Center for Geospatial Solutions, Lincoln Institute of Land Policy
Kyle Onda, Associate Director of the Internet of Water Initiative, Center for Geospatial Solutions, Lincoln Institute of Land Policy
Lauren Patterson, Senior Policy Associate, Nicholas Institute for Energy, Environment & Sustainability, Duke University
Photo Credits
Header Photo: Adam Smigielski, Unsplash
Footer Photo: Andrew Svk, Unsplash