A Report by the Committee on Earth and Environmental Sciences
1992
Program Objectives and Functions
Agencies participating in the USGCRP commit program resources, consistent with their roles and missions, to the goal of an interagency global change data and information management program that is consistent across agencies and that involves and supports the university, international, and other user communities. The program objectives are grouped into assembling, documenting, archiving, and disseminating (Figure 2).
In the area of assembling, agencies commit to
* gather, manage, and share global change data and information in a manner that conforms to the Data Management for Global Change Research Policy;
* involve actively the research and user communities in identifying the parameters that are key to global change research, in building a priority list of global change research parameters required to achieve the USGCRP research milestones, in assembling and documenting priority data sets, and in developing value-added data sets for subsequent analysis; and
* seek cooperation with international partners to share global change data and information for mutual benefit.
In the area of documenting, agencies commit to
* develop improved documentation for global change data and information, including publication of global change data and information so that users will be well informed about its quality and developers of quality data and information will receive recognition.
In the area of archiving, agencies or their designees commit to
* document and preserve long-term in situ, remotely sensed, and derived global change data and information, both digital and nondigital, so that comparative analyses over decades to centuries can be conducted using high-quality data and information; and
* ensure that appropriate agencies have responsibility for stewardship of the data and information for each parameter.
In the area of disseminating, agencies or their designees commit to
* develop and maintain tools that help users to locate global change data and information, including a comprehensive directory of data and information that describes salient features of key global change holdings, and a catalog and networking infrastructure that makes it as easy as possible to access and acquire those data and information; and
* build upon existing digital and nondigital data and information resources to improve the access to high-quality global change data and information by integrating appropriate activities of agency data centers, archives, libraries, and other information-disseminating organizations, and providing products in appropriate media, depending on the particular user's need.
In the total program, agencies commit to
* use appropriate national, international, and de facto standards to the greatest possible extent to facilitate the archiving and exchange of data and information, to describe the quality of data and information, to improve the compatibility of media as they change over time, to access the data and information by networking, to develop accurate documentation, and to help with the consistency of data and information products and procedures across agencies; and
* solicit the active involvement of the national and international research and user communities to review the effectiveness of the GCDIS in facilitating access to and use of global change data and information and to articulate the changing needs of the GCDIS users over time.
To clarify responsibilities, the commitments are further delineated according to each of four roles participants may have: data source, data repository, research community member, or user. These roles are not exclusive; individuals and organizations often act in multiple roles for different aspects of their work.
Agencies, principal investigators, and research activities that are sources for data
* help identify priority global change parameters within their mission responsibilities and assign stewardship responsibilities to an appropriate data repository;
* acquire or collect mission-related data from in situ or remotely sensed observations;
* assemble data sets from historical or archived sources for retrospective data;
* validate, calibrate, and document data collected under the USGCRP to standards agreed upon within the USGCRP; and
* deliver data and information to the appropriate data center in a timely manner and in concert with the Data Management for Global Change Research Policy.
Agencies with data stewardship responsibilities, data centers, libraries, and other data repositories
* acquire or collect data and information from the appropriate agency or research data sources;
* organize data and information in a manner consistent with the GCDIS guidelines;
* coordinate with other U.S. and international organizations to acquire or share global change data and information;
* preserve data and information for decades and centuries, with media migration strategies coordinated with source agencies and the user community through the USGCRP;
* develop cross-training opportunities for computer and information scientists to interact with global change researchers, including university programs;
* produce value-added data and information products for use by researchers, policymakers, educators, and the public;
* maintain and facilitate directory, catalog, access, and administrative mechanisms coordinated with the GCDIS so that it is timely, affordable, and easy for users to access and use global change data and information;
* use national, international, and de facto standards to the greatest possible extent; and
* provide users with timely and affordable products that are generally available in appropriate media, depending on the particular user's need.
The research community works with agencies and data centers to
* help identify priority global change parameters and align stewardship responsibilities with data repositories;
* assess the data for gaps, omissions, or errors in the coverage for priority data sets and recommend remedies;
* improve documentation of data sets;
* validate selected data and information products through processes such as peer reviews and assessments after actual use;
* assemble long-term and/or global coverage data and information sets;
* deliver selected derived data and information, with documentation, from process studies, predictive models, or assessment activities to the appropriate data center;
* publish data and information when appropriate; and
* participate in projects to assemble retrospective data sets and "data rescue" projects when needed to preserve crucial data that are at risk.
Users of global change data and information
* help identify priority global change parameters and needed products; and
* participate in workshops, communications, and other feedback mechanisms to identify problems and improvements.
The Global Change Data and Information System
Extensive collections of data and information useful for global change research are now supported in the Federal agencies. Many of these collections are managed for specific mission purposes, although the collections are available to researchers through cooperative agreements or as published products. The goal of interagency efforts is for Federal agencies to work with each other, with academia, and with the international community to make it as easy as possible for researchers and others to access and use global change data and information. Toward this end, agencies participating in the USGCRP are organizing the GCDIS, which takes advantage of the mission resources and responsibilities of each agency and links the services of these data and information resources to each other and to the users.
The GCDIS is not some large, central computer center or data repository. Rather, it is distributed among existing facilities in the agencies but with crucial standards agreed upon by the agencies. Each agency retains responsibility for mission data and information, but agrees to provide certain services in concert with other agencies. For instance, instead of data centers sending data to researchers in incompatible media and formats, GCDIS helps make global change data and information available in forms that are consistent with and responsive to user needs. Also, under the GCDIS concept, data centers coordinate procedures for data searches and for handling orders for data products, and existing agency data directories are coordinated to assure that access to any of the directories provides information about the others.
Data and Information Content
The primary sources for global change data and information are agency programs, including those focused on the USGCRP (such as the Earth Observing System [EOS] and other agency global change initiatives) and those contributing to the USGCRP from other agency programs that are not focused on global change. When appropriate, data and information targeted for GCDIS in both digital and nondigital forms include raw data from observation systems, value-added data from data assembly activities, and derived data and information from models and other investigations; long-term as well as short-term data and information sets; historical, current, and future data and information; in situ and remote observations; and references to data and information that exist outside the program, such as major international and national programs (e.g., the World Ocean Circulation Experiment, the Global Energy and Water Cycle Experiment, and the Next Generation Weather Radar [NEXRAD] program).
Clearly, the data and information content of the GCDIS must span many sources: agency data centers, libraries, State and local governments, individual agency programs, university and other researchers, and the international community. Not only will the sources of the data and information be many and varied, but also the scientific disciplines will range over all aspects of the Earth system from its interior to its surface, from its ground water to its oceans, from its atmosphere to solar influences, and human interactions and related economic issues.
To meet its focused objectives, the USGCRP uses a priority structure that includes strategic priorities, integrating priorities, and science priorities. In addition, the USGCRP places priority on interdisciplinary scientific approaches that include the study of
* sources and sinks of greenhouse gases and the processes that control them;
* clouds, water, energy and sea-level change, including radiative balance;
* ecological systems, including species and ecosystem responses;
* economics and human dimensions of global change, including resource economics, and the interactions and effects of human activities on global change; and
* global and Earth system observations and models to provide both global- and regional-scale data bases and models for predicting global change.
While the existing USGCRP priorities help focus the program's contents, there are still many thousands of data and information products of potential relevance, each needing investment of resources to become well documented and accessible. As a result, the global change data and information management program must establish a process for further prioritization. This process for data and information prioritization will be accomplished within the framework of the USGCRP research priorities and milestones as they evolve.
An early attempt to assign a relative ranking for global change research needs was documented in the 1988 report Earth System Science: A Closer View (summarized earlier in Tables 2 and 3). Tables 4 and 5 show that the agencies already manage much data and information relevant to these particular priority needs. It can also be seen that no single agency has data on all the needed variables, and that the data required for each individual variable generally reside in several agencies. Both points demonstrate the need for a coordinated interagency program for the management of global change data and information.
At first glance, the data and information holdings may seem extensive, but it is important to recognize that in total the holdings are far from adequate. For example, the non-satellite data sources in the tables are usually only local and not global, and the satellite data sources, while usually more global, span only limited periods of time. In addition, much of the data cannot be cross-compared with other data or used for purposes beyond the specific agency purposes for which they were collected. Relatively few were collected with meeting the needs of the USGCRP as the primary objective.
It should also be noted that these tables are already dated and as such are meant to be only illustrative pending the development of a data priority structure based on all the USGCRP research priorities. For example, our perspective on global change is continuing to evolve and the critical importance of research into the human dimensions of the global change problem is becoming increasingly better understood. Table 6 lists what are currently considered to be some of the most important types of additional data and information required for such research.
User Community
The primary users of the GCDIS will be global change researchers in agencies, academia, and the international community who conduct process studies, diagnostic and monitoring activities, and integrated modeling investigations; and those researchers, policymakers, and educators who assess the state of global change research and use modeling approaches to provide information for policy decisions.
The user community for the data and information management program is often described as the research community. This community certainly makes up the bulk of data suppliers and users whom the GCDIS will serve, but it is not the entire community. In addition to providing insight into basic research issues, the USGCRP will address vital questions with national and international policy implications. Such products of the GCDIS will serve users concerned more with providing policy information about an issue at a given time than with scientific certainty, or those more concerned with compiling results across disciplines to address a socioeconomic problem than examining how the underlying processes function.
The information revolution in our global society is engendering the expectation that not only researchers and policymakers, but also executives, educators, and private citizens will need access to global change data and information. It is this expanded user community that many present data centers do not serve but to whom the data and information program must be responsive. The GCDIS must serve the congressional aide as well as the chemist, the demographer as well as the oceanographer. The Federal government is already encouraging access by such users through mechanisms such as the Global Change Research Information Office and the National Research and Education Network initiative; these will be intrinsic to a successful GCDIS.
Libraries
Libraries are a pervasive resource in our society for all individuals and groups needing information, and they need to be part of the global change data and information management program. There are more than 32,000 libraries of all kinds in the U.S., including more than 2,200 Government libraries.
Certainly, libraries act as a source of much global change information already, in the natural as well as the human sciences. The collections provide current and historical data, often incorporated in publications. Individual science collections often overlap with those in other disciplines: examples are geology and oceanography, energy and space sciences, climatology and agriculture. The dimensions of science related to humans, including health, settlements, and education, must be accessible as part of the overall GCDIS. Transportation, public works, and urban/regional planning collections cross each other, as do those of the biological and physical sciences. In the research on global change such interdisciplinary access is essential, and the library science professionals have a wealth of experience to offer.
Interdisciplinary access is reflected in the mechanisms that libraries and information centers employ to assist users. Libraries have standardized the ways materials are organized and described in the collections to provide uniformity. The national and international standards employed by libraries and information centers result in the development of catalogs, directories, and related bibliographic products that can be accessed worldwide. Information services that describe international, Federal, State, local, and private sector data sets are available electronically. The descriptions of major library collections, information centers, and bibliographic information systems are available through libraries and networks. Interlibrary loan and document delivery of materials for the client are based on standardized formats.
Libraries can also help users to access the GCDIS. This will be especially critical for small, individual research efforts and for GCDIS users outside the research community. Such a function will be a natural extension of present library capabilities, which include handling data files in addition to such traditional materials as books, technical reports, tapes, films, and on-line data bases. In today's scientific and technical libraries it is commonplace to find CD-ROMs and other electronic media accessible to the end user. Most technical libraries and information centers also provide on-line, remote access for the user, and many are already linked to electronic networks. Most libraries acquire new products and services as soon as they are affordable so that their users will have access to the latest technology and formats.
University libraries often have collections of data used and/or created by their researchers. With the support and encouragement of the GCDIS, some of these university libraries could begin to serve as small data centers that archive and distribute data generated by the university. The librarians could also be called upon to build inventories and describe metadata and data collections for the research community. These efforts would improve the management and accessibility of global change data and would complement the efforts underway at Federal libraries, including those of the National Oceanic and Atmospheric Administration (NOAA) and the U.S. Department of Agriculture's (USDA) National Agricultural Library. Exchange of data and information, linkages among disparate scientific disciplines, and a broader array of users would result. National and international scientific data exchange would be enhanced for the global change research communities.
To effectively use the capability of libraries, library information groups will work with GCDIS toward the goal of having the GCDIS data and information formats, products, and network interfaces compatible with library systems. Building such foundations today will provide fully interdisciplinary and interoperable systems tomorrow. These systems will provide the ability for the user to go from one type of information system to another, whether a system contains data or bibliographic information.
Catalog System
The portion of the GCDIS that helps users to locate global change data and information is the Global Change Catalog System (GCCS). The main purpose of the GCCS is to provide users of the GCDIS sufficient information about global change data and information available throughout the world to determine rapidly what data and information sets would be useful for a selected task. The user normally enters the GCCS (Figure 3) through electronic connection to the Global Change Master Directory (GCMD), which has brief, summary-level descriptions of the content of data and information sets. Other information in the directory includes location and points of contact for ordering, as well as possible quality assessments and relevant publications. The GCMD will also be available on CD-ROM, floppy disks, and, for selected parts, in hard copy.
In addition to the GCMD, the GCCS includes Federal guides and inventories as well as links with other systems. A guide provides detailed information about the individual data or information to assist the user in determining its applicability. Examples include information on the observing instrument, resolution, processing algorithms, and known limitations. An inventory contains descriptions of the individual data elements at the smallest level in which portions of the data can be ordered. Examples include date, location, cloud cover, and sufficient information to enable ordering. There are links from the GCMD to other directories, guides, and inventories outside the Federal domain and throughout the world that enable access to directory information not contained in the GCMD.
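The three levels of description discussed above (directory, guide, and inventory) can be pictured as nested records. The following Python sketch is illustrative only: the class names, field names, and the keyword lookup are assumptions made for this example and are not taken from the GCCS design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InventoryItem:
    """Smallest orderable unit of a data set (hypothetical fields)."""
    item_id: str
    observed_on: date
    location: str
    cloud_cover_percent: Optional[float] = None


@dataclass
class Guide:
    """Detailed information used to judge applicability (hypothetical fields)."""
    instrument: str
    resolution: str
    processing_algorithms: List[str] = field(default_factory=list)
    known_limitations: List[str] = field(default_factory=list)


@dataclass
class DirectoryEntry:
    """Brief, summary-level description of one data or information set."""
    title: str
    summary: str
    holding_location: str
    order_contact: str
    guide: Optional[Guide] = None  # None models a guide held only off line
    inventory: List[InventoryItem] = field(default_factory=list)


def find_entries(directory: List[DirectoryEntry], keyword: str) -> List[DirectoryEntry]:
    """Return entries whose title or summary mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        entry for entry in directory
        if needle in entry.title.lower() or needle in entry.summary.lower()
    ]
```

In this sketch a user would first call `find_entries` on the directory, then consult the `guide` of a promising entry, and finally pick individual `InventoryItem` records to order, mirroring the progression from summary to detail described in the text.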
When the GCMD is used over networks or dial-up lines, orders may often be placed and received electronically. Practical considerations make it likely that this level of service would be available only for reasonably sized, key global change data and information. Also, for some data and information sets it is likely that guide and inventory information will be available only off line.
The GCDIS Catalog System Subgroup of the IWGDMGC is serving as the coordinating group for the development, implementation, and evolution of the GCCS. Representatives from member agencies provide liaison to groups within their agencies to assure identification and documentation of relevant data and information sets, interconnection of useful information systems, advocacy for modifying the information systems to make them easier to use in conjunction with each other, and dissemination of information about the work of the Catalog System Subgroup to agency members and the user community. Through interconnected directories, guides, and inventories of coordinating and cooperating nodes of an international directory network, the GCCS will allow users to find and obtain data and information they need from other parts of the globe.
Standards
The rapid pace of technology evolution impacts all areas of data and information management, from collection through analysis to dissemination. Computer processing power, telecommunications networks, and information-handling techniques have made it possible to build massive compilations of data and information that can be easily searched by researchers in a library setting or at their own desks. Another major trend of particular relevance to global change research is the explosive growth in geographic information systems. In these and many other technologies, progress is being paced by the emergence of robust standards.
The global change data and information management program promotes greater reliance on common standards to decrease the cost of managing and using data and information and to increase accessibility and comparability of data and information.
Included are not only formal standards promulgated by international and national standards organizations, but also de facto industry standards and interagency conventions, guidelines, and procedures. Examples of areas where an interagency consensus on standards can be immediately useful are documentation (metadata), data interchange, information search and retrieval, ordering and billing, and applications portability.
Agencies participating in the GCDIS are subject to Federal policy that is explicit in many areas relevant to data and information management (e.g., Open Systems Interconnection, Spatial Data Transfer Standard, and Applications Portability Profile). Interagency standards coordination through groups such as the IWGDMGC Standards Subgroup helps ensure that management and technical staff are aware of applicable standards, that formal waivers are sought when appropriate, and that proposals for new standards are raised appropriately. The application of standards is always tempered by practical considerations, including resource constraints, with emphasis on minimizing disruption of existing operations, but ensuring a steady evolution toward full incorporation by all agencies or their mutually agreed designees.
Networks
Networks provide researchers with access to colleagues and to unique or important experimental data and computation resources. Much of the global change research community is already using networks for electronic mail, data transfer, research libraries, bulletin boards, faxes, data distribution, and interactive sessions. Many of these networks are interconnected, but they do not all have the same capabilities, and navigating within and between the networks can be complex.
The general approach for the GCDIS is to build upon existing networks and ensure that adequate interconnections between networks exist or will be established. Several Federal agencies in the USGCRP operate or sponsor computer communications networks to support their research programs (see Appendix D). Most of these agency networks are currently connected with more than 5,500 research and commercially operated networks of similar technology around the globe, which together form the Internet. Several million people in 40 countries and more than 850,000 computers, from personal computers to multiuser mainframes and supercomputers, are connected via the Internet, and the numbers continue to grow exponentially.
Just as the extent of the networks and the capabilities of connected computers vary over large ranges, so do the data-handling capacities. As more and larger files are moved among the networks, it is probable that the new generation of networks will not be adequate for more than a few years. Thus, current research for the National Research and Education Network is focused on networks to serve users needing thousands of times more capacity than today. The IWGDMGC sponsors a networking activity to surface current problems and to define future network support requirements for the intense data and information demands that will be generated by global change research.
The Earth Observing System Data and Information System (EOSDIS)
The EOSDIS will be the NASA component of the GCDIS. The EOSDIS is NASA's Earth science data and information system, responsible for managing all of NASA's Earth science data and information. The EOSDIS will receive, process, catalog, archive, and distribute all EOS data and products and will catalog, archive, and distribute NASA Earth Probe and other past, ongoing, and future NASA Earth science space mission data and products. The EOSDIS will provide platform and instrument command and control for the EOS observatories. The EOSDIS will be interoperable with the corresponding data and information systems of the international partners of the International Earth Observing System (IEOS). As a component of the GCDIS, the EOSDIS will be interoperable with the integrating infrastructure of the GCDIS and with other IWGDMGC agency data and information systems and will adopt standards and approaches agreed to by the IWGDMGC to the maximum extent practical.
As NASA's component of the GCDIS, a primary responsibility of the EOSDIS is to provide scientific information and data products to a broad community of global change researchers, as well as to NASA-sponsored investigators. A critical task of the EOSDIS is to provide an Earth science view of all EOSDIS data and information consistent with a comprehensive view of the entire EOSDIS Earth science data base. This will allow users to search across multiple inventories at multiple locations to find combinations of data meeting the same criteria (e.g., spatial and temporal coverage) and, after iteratively narrowing the search as needed, to place a request for the required combination of data. Through the GCDIS this Earth science view will be extended to include all participating agencies.
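The idea of searching several inventories with one set of spatial and temporal criteria can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the EOSDIS implementation: the `Granule` and `Criteria` records, the archive names, and the simple latitude/longitude bounding-box test (which ignores boxes that cross the 180-degree meridian) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Granule:
    """One orderable inventory item with its coverage (hypothetical fields)."""
    granule_id: str
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date


@dataclass(frozen=True)
class Criteria:
    """Spatial and temporal coverage requested by the user."""
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date


def matches(granule: Granule, criteria: Criteria) -> bool:
    """True when the granule overlaps the requested box and time period."""
    overlaps_space = (
        granule.west <= criteria.east and granule.east >= criteria.west
        and granule.south <= criteria.north and granule.north >= criteria.south
    )
    overlaps_time = granule.start <= criteria.end and granule.end >= criteria.start
    return overlaps_space and overlaps_time


def search_inventories(
    inventories: Dict[str, List[Granule]], criteria: Criteria
) -> Dict[str, List[Granule]]:
    """Apply the same criteria to every inventory; omit archives with no hits."""
    results = {
        name: [g for g in granules if matches(g, criteria)]
        for name, granules in inventories.items()
    }
    return {name: hits for name, hits in results.items() if hits}
```

Iterative narrowing then amounts to calling `search_inventories` again with a tighter `Criteria` until the remaining granules form the combination the user wants to request.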
The EOSDIS will be a distributed but coordinated system. Product generation, information management, data archiving and distribution, and user support will be distributed to a set of Distributed Active Archive Centers (DAACs), but will be coordinated on a systemwide basis to preserve the integrity of the EOSDIS as an integrated system. This approach will take full advantage of scientific expertise and institutional heritage and experience of the DAACs, while at the same time providing a measure of control to ensure the availability of an overall consistent Earth science data base to the global change and general Earth science community. DAAC user support staff will assist users with algorithm documentation, processing histories, system configuration, calibration, navigation, housekeeping and quality control information, research bibliographies, browse products, help functions, and so on. See Figure 4 for a schematic of the EOSDIS architecture.