Issues and Alternative Approaches for the Integration and Application of Societal and Environmental Data within a GIS

Daniel G. Brown
Department of Geography
Michigan State University
East Lansing, MI 48824-1115

Copyright 1994, Daniel G. Brown and M.S.U. Board of Trustees

Working Paper No. 3
Rwanda Society-Environment Project
Department of Geography
Michigan State University
East Lansing, MI 48824-1115
April 12, 1994

Abstract

A variety of approaches to integrating social and environmental data exist. For both types of data, it is important to maintain the spatial components of the data so that geographical patterns can be used in theory development and planning. The use to which the data are to be put will determine the level of sophistication required in their integration. In this paper, I discuss many issues which arise in the integration of disparate data sets, with special attention to how social and environmental data are different. Policy makers, scientists, and managers may wish to integrate social and environmental data for description, inference, and/or modeling. These activities require increasing levels of sophistication. I suggest a typology of existing methodologies which I refer to as: parallel analysis (no integration), loose integration, and full integration. The primary difference between loose integration and full integration is that full integration is constrained to data structures which permit dynamic, interactive, and spatial modeling of social and environmental data. Loose integration is currently possible to a limited extent for trained individuals on the current, commercially available GIS software. For many applications specialized software is needed but the techniques are described in the geographic literature.
Conclusions include the following: in many cases full integration may not be desirable; for spatial data to be useful for inference and modeling in development studies, development theoreticians must suggest how spatial patterns or spatial coincidences might refute or support a theory or hypothesis; GIS is not useful for inference and modeling within some theoretical frameworks; and the linkages between societal and environmental process models, and the theoretical issues involved in linking them, need to be better developed.

Introduction

THIS PAPER ADDRESSES THE INTEGRATION OF SOCIETAL AND ENVIRONMENTAL DATA SETS for the geographical analysis of natural resource management (NRM) strategies in Rwanda, Central Africa. The paper outlines some of the issues which make social and environmental data integration (hereafter referred to as data integration) an important issue and some alternative approaches to data integration. The approaches are presented with some assessment of the limitations of each for using the data in a decision making context. A goal is to suggest a methodological framework for the analysis of social and environmental data which is flexible enough to accommodate multiple research questions. A detailed discussion of the overall goals of the Rwanda Society-Environment Project is provided by Campbell, Olson and Berry (1993).

Data integration is necessary for analyzing the processes which lead to land use and environmental change in developing countries. An underlying theoretical basis, which itself represents the integration of social and environmental processes, is required for examining land use and environmental change. A methodological framework is suggested for the use of geographic information systems (GIS) to address natural resource management issues in Africa. Its conceptual starting point is the KITE model of society-environment interactions (Campbell and Olson 1991).
The KITE is a conceptual framework for understanding environment and development issues which recognizes both social and environmental agents in the land use systems. The examples are drawn from a pilot study in Rwanda. The activities which might be facilitated using GIS can be characterized by three levels of sophistication: description, inference, and modeling. Each level of sophistication requires a commensurate level of spatial data integration. Description includes the activities described as exploratory data analysis by Tukey (1977). This involves the identification of spatial and statistical trends and patterns in data, and at its simplest involves the creation of map displays. More sophisticated description may be required for the identification of underlying statistical spatial patterns. Description may require no data integration. Inference refers to the confirmation or refutation of hypotheses which are developed out of some theoretical model. Clearly, the hypotheses, and by extension the theories, which GIS can provide inferential support for must be spatial in nature. That is, the theory must suggest a geographical pattern as a result of its basic tenets. Many theories of land use and land use change are fundamentally geographical. Inference may only require loose integration of social and environmental data, more on this point later. Finally, modeling is an activity which involves prediction. The existence of supported alternative hypotheses which are based in theory allows for the extension of knowledge to unknown locations or future times. As suggested implicitly by the KITE conceptual model, spatial models of the interactions between social and environmental systems must include feedbacks and multiple dynamic interactions to provide successful predictions (Campbell and Olson 1991). Therefore, effective modeling and prediction requires the full integration of social and environmental data. 
The present problem might be characterized as a "poorly-structured problem" (DeMan 1988). The specific questions are not known. Many possible questions exist. The answers to the variety of questions might be addressed through description, inferential analysis, modeling, and/or a combination of the three. In such an instance, flexibility of data formats and management routines is a central issue. It is important that the possibility for the most sophisticated types of analyses be maintained. Implementation of integrated social and environmental data sets ought to be constrained to existing, and possibly inexpensive, software systems and data structures given the requirement of technology transfer to developing countries. By necessity, spatial representations of societal and environmental interactions are assumed realizable within the framework of the KITE. However, the author recognizes that many relationships (e.g., power relationships, household interactions, and farmer perceptions) are not easily characterized spatially. Analyses of such relationships may not be appropriate within the GIS context, though interpretation of spatial patterns should address such issues.

Data Integration Issues

SOCIETAL AND ENVIRONMENTAL DATA ARE COLLECTED FOR DIFFERENT PURPOSES, at different scales, and with different underlying assumptions about the nature of the phenomena. The subjects of environmental data often exhibit continuous spatial variation (e.g., elevation, soils, precipitation, and temperature). Social phenomena tend to be more spatially discrete (e.g., people, farms, and political units). Exceptions to these generalizations include population density, a continuous social variable, and land cover type, a discrete environmental variable. The differences and perceived differences between social and environmental phenomena have resulted in mapped data which are sometimes incompatible. In every case, however, the variable is discretized to a finite set of spatial units.
Geographical data comprise spatial (where) and attribute (what) information. Any data integration scheme, no matter how involved, must include methods for managing each of these components. There are several standard approaches to managing spatial and attribute data. The most common spatial data structures include raster and vector (Fig. 1). Raster is a regular geometric tessellation (a grid), whereas vector abstracts geographic objects as points, lines or areas. Attribute data can be managed as layers (Fig. 2) which are stored in separate files (called the file processing approach) or through the use of a database management system (DBMS). Relational databases have been used for over a decade to manage spatial and attribute information through the use of data tables which are related through common identifiers (Aronoff 1987).

Figure 1. Raster and vector data models (Aronoff 1989; p. 164)

A fundamental difference between spatial units as defined for socio-economic data as compared with those defined for environmental data is the nature of assumed internal homogeneity. Clearly, internal homogeneity will be a function of the unit sizes; larger units will have less internal homogeneity. However, spatial units defined for environmental data--either grid cells or irregular polygons--are nearly always designed to be internally homogeneous to some degree (e.g., soil units, land cover types). Socio-economic units, on the other hand, are nearly always the product of some political process; they are administrative units. Internal homogeneity may have been one of many considerations in the establishment of unit boundaries, or not considered at all. In some cases the variable of interest varies over areas much larger than the administrative unit. In these instances the internal heterogeneity problem is greatest in transition or border areas.

Figure 2. Example of geographic attributes represented as layers (Aronoff 1989; p. 192)
Examples of successful applications of GIS are much more abundant in environmental studies than in socioeconomic studies. The nature of the data, as described above, and the available methods, not necessarily the phenomena themselves, have encouraged this situation. In addition, Dobson (1993; p. 435) argued that this situation is the result of societal priorities. "Traditionally, societies have shown greater willingness to fund research on technology, infrastructure and physical resources than on social or cultural topics. GIS development suffers from the same societal bias that causes human geographers to receive less financial support than physical scientists." He cited the successful use of GIS in economic geography applications, where funds are more plentiful, as an exception to the paucity of social GIS applications.

In many ways, the problems of data integration can be solved through data transformation: from one form to another, from one scale to another, and from one aggregation unit to another. Given inherent differences in data sets, data transformations developed for environmental and societal data require different assumptions. A variety of procedures are available for estimating statistical surfaces and/or areal estimates from a network of point samples. Thiessen polygon calculation, geostatistics (kriging and co-kriging), and aggregation of survey samples can all be used in spatial estimation from point samples. The underlying assumption of continuity in the data has given rise to a variety of techniques which permit interpolation of environmental data to a discretized surface, similar to a raster data structure. Geostatistics (Journel and Huijbregts 1978; Cressie 1991) exploit the spatial dependence in continuous data for unbiased estimation of spatial variables. The same assumption is often not appropriate for societal data, requiring the aggregation of point samples to some areal unit.
For example, household surveys which ask yes/no or open-ended questions are quite difficult to transform to a continuous surface. In some cases, however, a statistical surface is appropriate (e.g. population density). Ideally, sampling schemes should be devised with the ultimate goals of analysis in mind. However, the above listed transformation routines reduce the need for such specialized sampling schemes. Spatial sampling of survey data in Rwanda is currently designed to provide statistical significance at the level of the prefecture and agro-ecological zone.

Socio-economic analyses have often been reduced to "spreadsheet" analysis, where the spatial unit (usually a political unit) becomes the observation and a number of variables are available for each unit. This approach is not useful for incorporating spatial autocorrelation information into the analysis because information about the adjacency (i.e., topology) of spatial units is lost. Additionally, it hinders the integration with environmental data because these data may not be collected using the same spatial unit. Ultimately some common discretization may be required for data integration, especially where dynamic models are to be developed. The raster data structure (or the field view of geographic space of Goodchild et al. 1992) is conducive to spatial statistical analysis because of the regular nature of geographic elements. Distances and lengths of common borders between neighboring pixels are constant.

A primary concern with discretization is that of data resolution, i.e. degree of spatial detail. Data resolution is to some extent controlled by source data resolution and to some extent at the discretion of the analyst. In terms of the integrity of a geographic representation, the best data resolution is the finest resolution. The most appropriate resolution will depend on the application.
Practically, the finest possible resolving power of data from a map can be determined from its scale, expressed as a representative fraction. Tobler (1988) provides a simple rule, based on the assumption of a 0.5 mm minimum pen width, that the finest resolution appropriate for a map, in meters, is equal to the denominator of the scale fraction divided by 2000. For a 1:250,000 map, for example, the rule gives 250,000 / 2000 = 125 m.

Environmental and societal forces in development and environmental degradation operate at household, local, sub-national and international/continental scales. In order to develop models for understanding this system, data must be generalized and characterized at each of the scales. However, policy is often made at the national level, based on sub-national data. An important goal of data integration for policy making, then, should be the characterization of the multi-scale processes at the sub-national scale.

Inferential and predictive analyses require some characterization of error in spatial datasets. Error is propagated through GIS operations, yet it is not always clear how. For example, taking the difference between two uncertain variables will result in a higher relative uncertainty in the result than will adding the same two variables (Burrough 1986). Raising a variable to an exponent will increase the magnitude of errors. Ultimately, the quality of the data relates directly to the risk of making poor decisions based on those data. Data, and analyses based on those data, need to include variance/variability estimates so that risk can be accounted for in a decision. For example, estimates of the potentials for land degradation need to be presented with alternatives which assess the likely range of possible situations at any location. In spatial estimation, as above, geostatistics provide estimates of the variability of estimated values which can be useful for assessing how well the values have been estimated.
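The propagation behaviour described above can be explored with a small Monte Carlo sketch. It is only an illustration under stated assumptions: the two "layers" are single hypothetical values (100 and 90) with independent Gaussian errors of standard deviation 5, and the function name `simulate` is invented for this example. Under these assumptions the absolute spread of the sum and of the difference is about the same, but relative to the size of the result the difference is far more uncertain, and squaring a value roughly doubles its relative error.

```python
import random
import statistics


def simulate(op, mean_a, mean_b, sd, n=20000, seed=0):
    """Return (mean, standard deviation) of op(a, b) when a and b carry
    independent Gaussian errors with standard deviation sd."""
    rng = random.Random(seed)
    values = [op(rng.gauss(mean_a, sd), rng.gauss(mean_b, sd)) for _ in range(n)]
    return statistics.fmean(values), statistics.stdev(values)


# Hypothetical cell values from two map layers, each with the same absolute error.
MEAN_A, MEAN_B, SD = 100.0, 90.0, 5.0

sum_mean, sum_sd = simulate(lambda a, b: a + b, MEAN_A, MEAN_B, SD)
diff_mean, diff_sd = simulate(lambda a, b: a - b, MEAN_A, MEAN_B, SD)
sq_mean, sq_sd = simulate(lambda a, b: a ** 2, MEAN_A, MEAN_B, SD)  # b is unused

# Relative uncertainty: spread divided by the magnitude of the result.
rel_input = SD / MEAN_A
rel_sum = sum_sd / abs(sum_mean)
rel_diff = diff_sd / abs(diff_mean)
rel_sq = sq_sd / abs(sq_mean)

print(f"input    relative error: {rel_input:.3f}")
print(f"a + b    relative error: {rel_sum:.3f}")
print(f"a - b    relative error: {rel_diff:.3f}")
print(f"a ** 2   relative error: {rel_sq:.3f}")
```

The same pattern extends to whole layers by drawing one perturbed realization of each input grid per iteration and repeating the map operation, at a correspondingly higher computational cost.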
Lanter and Veregin (1992) present a possible approach, using error propagation functions, to account for errors in source data and the effects of map analysis on those errors. Estimates of error distributions and variability can be used in Monte Carlo simulations to assess the possibilities that incorrect maps and/or decisions have resulted (Fischer 1990). Fuzzy set theory can also be applied to uncertainty assessment (Robinson 1988). Clearly, these approaches require the integration of GIS with statistical methods to adequately represent error and uncertainty.

Two special types of data are available and commonly used, one in the social sciences (surveys) and one in the environmental sciences (remote sensing). The ways in which these data are integrated within a GIS are discussed below.

Satellite Data Integration

The spatial coverage and multi-temporal aspects of remote sensing make it appealing as a complementary data source to traditional mapped or sampled data. Satellite remote sensing involves the recording of wavelength specific light reflection and transmission from the top of the atmosphere. Image processing can be used to extract thematic information from satellite digital products. Traditionally, the information provided has been categorical representations of the land surface cover type. Interpretation is required for assigning thematic attributes, usually in conjunction with "ground truth" information. Estimation of biophysical parameters using remote sensing, like green-leaf area and biomass, has become an important activity in global-scale modeling efforts (Running and Coughlan 1988).

Satellite data are grid-based; consistent with the raster data structure. Satellite data come with a variety of inherent spatial resolutions and from a variety of platforms. The issue of scale can be addressed directly with multiple resolution satellite sensors.
Common platforms include the Landsat sensors, thematic mapper (TM) and multispectral scanner (MSS) at 30 m and 79 m resolutions, respectively, and the advanced very high resolution radiometer (AVHRR) at a resolution of 1.1 km. Different sensors provide different levels of information detail. AVHRR is increasingly looked to as a source of information for global landcover change monitoring.

GIS data layers can aid in the interpretation of satellite information. Similarly, satellite information can provide a covariate for surface estimation from point samples. Langford et al. (1991) demonstrated how remote sensing might be used to improve areally explicit population estimates, in combination with census data.

In the context of this project, satellite data can provide land cover and land use information. This information might be interpreted at a variety of levels of detail: forest/non-forest, land use, and quality of land. The land cover information, properly classified, can provide an areally and temporally explicit view of societal and environmental attributes. Land cover is a consequence of both human activity and natural propensities. For this reason, the integration of remote sensing with other environmental and societal data sets is beneficial.

Survey Data Integration

Surveys of individuals and households provide the fundamental information about individual perceptions and actions for social science research. This information can be integrated with other geographic data sets for people-environment interactions research provided it is collected and managed spatially. Survey data integration presents one of the greatest challenges in social and environmental data integration. Many existing survey data sets are collected and sampled in order to provide statistically significant results at a pre-specified spatial aggregation level.
In Rwanda, for example, surveys available through the Ministry of Agriculture (DSA) are designed to be statistically significant at the level of the Prefecture, the first sub-national spatial units, of which there are ten. The issue of confidentiality dictates that some aggregation be used. However, spatial information about the samples would permit more detailed modeling of spatial interactions. Spatial information provided with survey data would permit the overlay of survey data with other data sets within a GIS. The aggregation units then could be modified as needed.

Surveys are clearly the only way to address certain questions about the way people perceive and interact with the environment. However, there seems to be a fundamental conflict of scale in survey data. In Rwanda, the DSA surveys are statistically valid at the prefecture or agro-ecological zone. However, they represent processes which operate at the household scale. A fundamental question arises, "Are household scale phenomena and processes adequately depicted at the prefecture level?" The aggregation of survey data to some areal unit may actually reduce their usefulness to near nothing. It becomes clear that the spatial locations of the survey data points must be maintained if we are to relate the household information to other large-scale variables.

Survey data, because they tend to be geographically sparse and temporally erratic, have limited applicability for national or continental scale modeling. Their most important contribution is for inference. Regularizing the spatial and temporal sampling schemes may alleviate this limitation somewhat.

Data Integration Solutions

I SUGGEST A THREE-STAGE TYPOLOGY OF INTEGRATION APPROACHES for societal and environmental data sets within a GIS for natural resource management.
The question of how much integration is desirable should be addressed in terms of both the theoretical approach of the questioner and the practical concerns of data manipulation and analysis. The goals of analysis, be they description and inference or modeling and prediction, will dictate the degree to which data integration is desirable. "If the researcher is working with highly aggregated, spatial data and is not overly concerned with analytical modelling, then the GIS becomes an expensive tool for the production of maps" (Marble and Peuquet 1993; p. 447).

Parallel Analysis

The first approach involves the production of analysis products (e.g., maps and tables) for environmental and social analyses separately and comparing the results. This might be referred to as the "hold up two maps and squint" approach and is quite commonly used. No integration is required. The approach relies heavily on the visual interpretation of patterns and spatial coincidence.

Loose Integration

The term loose integration is used here to refer to data integration techniques which rely heavily on overlay and regional characterization techniques common to many GIS systems. Conversion between areal units or the creation of new areal units which represent the intersection of two sets of units provide mechanisms for comparing disparate data. For example, soil unit data might be summarized for each commune in Rwanda to provide information about soils. Alternatively, new units which represent the intersections of soil and commune units can be created.

Data collected and aggregated to some spatial aggregation unit (e.g. Rwanda's communes) can be converted from those units to other spatial units through a variety of approaches. Areal interpolation can be carried out using techniques outlined by Tobler (1979), Flowerdew and Green (1989), Langford et al. (1991), Martin and Bracken (1991), and Goodchild et al. (1993).
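The simplest form of areal interpolation, area weighting, can be sketched as follows. This is a minimal illustration rather than any of the cited authors' methods: it assumes that the count is spread evenly within each source unit, and it uses axis-aligned rectangles in place of real commune or soil polygons so that overlaps can be computed without a GIS library. The unit names, coordinates and counts are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def box_area(box: Box) -> float:
    """Area of a rectangle; zero if the rectangle is empty or inverted."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned rectangles."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(inter)


def areal_weighting(counts: Dict[str, float],
                    source_units: Dict[str, Box],
                    target_units: Dict[str, Box]) -> Dict[str, float]:
    """Reallocate counts from source units to target units in proportion
    to the share of each source unit's area that falls in each target."""
    estimates = {name: 0.0 for name in target_units}
    for s_name, s_box in source_units.items():
        s_area = box_area(s_box)
        for t_name, t_box in target_units.items():
            share = overlap_area(s_box, t_box) / s_area
            estimates[t_name] += counts[s_name] * share
    return estimates


# Hypothetical source units (e.g. administrative) with population counts.
source_units = {"A": (0.0, 0.0, 2.0, 2.0), "B": (2.0, 0.0, 4.0, 2.0)}
counts = {"A": 400.0, "B": 100.0}
# Hypothetical target units (e.g. landscape units) covering the same area.
target_units = {"T1": (0.0, 0.0, 3.0, 2.0), "T2": (3.0, 0.0, 4.0, 2.0)}

result = areal_weighting(counts, source_units, target_units)
print(result)  # {'T1': 450.0, 'T2': 50.0}
```

Because the target units cover the source units completely, the total count is preserved; with real polygons the overlap areas would come from a GIS overlay operation instead of the rectangle arithmetic used here.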
A common assumption for many of these methods is that the density of the count variable, or the value of a density variable, is constant throughout each of the units (Langford et al. 1991). Goodchild et al. (1993) presented a method which relaxes this assumption by employing "control units" which are more likely to satisfy the homogeneity assumption. Such transformations may mask spatial variability if the resultant units are larger or are more heterogeneous. Loose integration has the advantage of providing information for a set of spatial units which is familiar to or required by a user. Political units are commonly used by governmental decision makers for planning and policy decisions. Therefore, summarization of environmental variables by political units may be desirable. However, by converting from one areal unit to another (e.g. from soil units to political units) the analyses may be biased towards one set of data. Any data aggregation or averaging must be considered in the interpretation of an analysis. Because research problems often are not clearly defined when data collection and data management activities are begun, it is important that transformations which generalize (i.e., diminish detail in) the spatial or attribute components of geographic data be delayed as late in the process as possible. In many instances, policy makers may be most interested in seeing the data as they are without the aid of complicated dynamic models. Loose integration, the basic promise of GIS from its inception, provides an environment for exploring data and their spatial coincidences. Given the difficulties of modeling human behaviour and representing the intricate dynamics of society-environment interactions, the ability of a policy maker to explore a set of data and apply her/his own set of assumptions and understandings goes a long way toward the synthesis sought in the more formal representation of a dynamic model. 
Policy makers can factor in knowledge of daily events, for example wars and coups and refugee resettlement, which will not be reflected in a data set with the timeliness to be useful. The KITE provides one such approach to data exploration and analysis.

Full Integration

In order to produce models of interactions between people and the environment, and to incorporate feedbacks into such models, social and environmental data must be in a common format. Additionally, the data must be in a format which is suitable for geographical interaction modeling. Two possible common data models include raster (or grid cell) data and the use of some irregularly shaped administrative or landscape units. Each of these presents a distinct set of advantages and disadvantages.

The common use of a regular tessellation of space for environmental modeling may be indicative of the superiority of such data structures for modeling (Steyaert and Goodchild In Press). Although socio-economic models have been typically developed for areal units (usually political units), there may be some advantages to using regular tessellations for these types of models as well. Perhaps the most important advantage is the ability to link environmental and social models. Many effective spatial or distributed environmental models are built around data which are structured in grid cells (e.g., ANSWERS for erosion modeling; Beasley et al., 1982). Some models (e.g., FOREST-BGC for forest biogeochemistry modeling; Running, 1990) are built on irregular spatial units. However, such units are assumed to be homogenous and administrative units cannot fulfill that criterion.

Given the existence of grid-based models in geography and elsewhere, the raster data structure would provide a natural structure with which such models can run within the GIS framework. Although human activities usually cannot be modeled deterministically, stochastic approaches to spatial modeling have led to some success.
Applications of such an approach include diffusion modeling and spatial interaction modeling.

A Methodological Framework for Full Integration in Rwanda

The approach described below is based on the assumption that all data can be displayed as statistical surfaces, i.e. that the variables are continuous. Where data are not continuous, statistical surfaces cannot be generated from areal units. The outline below is suggested as one possible approach to data integration which supports modeling of societal and environmental interactions.

Data Collection

Although this paper is not intended to provide suggestions for institutions regarding the data collection infrastructure, a few comments about data collection are warranted. Data collection issues affecting data integration are both technical and theoretical in nature. Technical issues can be addressed by regularization of spatial and temporal sampling schemes, as well as coordination of information categories which are collected internationally. Sheppard (1993, p. 458) stated that "Most philosophers of science accept that theory informs data collection through its influence on how (questions of which data to collect and what basic categories to use in making observations) are answered." As an example, Sheppard cited changes in occupation categories by the U.S. Census Bureau which have facilitated Weberian studies while hindering Marxian analyses. Ultimately, then, the types of analyses that can be done will be limited by the available data.

An assessment of data needs will make its most fundamental contribution to the field of data collection. However, this paper examines techniques which can be used with a wide variety of data sets and was written with the assumption that data are available. Standards for data transfer and data quality--like those set forth for the United States by NCDCDS (1988)--are needed to improve the interoperability of systems on an international scale.
Data Management

Data management in GIS facilitates the integration of diverse data sets and determines the analyses possible with those data. Data transformation routines facilitate the conversion of data to a common spatial structure. The common structure could be a common set of areal units. As discussed above, the areal units can be limiting, depending on their size and relations to underlying heterogeneity of the surface being represented. In order to maintain the integrity of the surface, I suggest the use of a common statistical surface based on a regular geometric tessellation (e.g. the raster data structure). In this way the areal unit does not bias the analysis towards environmental or societal data. A relatively small neutral unit can represent the variability, as best as is possible, in each spatial unit, and provide a means for integrating the data.

Examples of methods for transformation to a statistical surface from the areal units are provided by Tobler (1979) and Martin and Bracken (1991). These methods assume that the variable underlying the areal aggregation scheme is a smooth surface. For computational convenience, Tobler (1979) settled on minimizing the second-derivative as his definition of smoothness. As an example, the 1991 population surface for Rwanda derived from prefecture-level data is displayed in Figure 3.

Figure 3. Pycnophylactic interpolation of 1991 population in Rwanda from prefecture-level data.

The cell size for the regular tessellation should be smaller than the smallest areal units used for data collection. One appropriate "rule-of-thumb" is to choose a grid cell that is, at largest, one-half the size of the smallest object to be resolved. In Rwanda, the two basic units we wish to combine include communes, 143 units with an average area of 166 km², and soil units, 1106 units with an average area of approximately 22 km². Each has been digitized from a map with a base scale of 1:250,000. A cell size of about 1 km² will be used initially. Working with a compact area the size of Rwanda, somewhat larger than 24,000 km², the tessellation is not significantly distorted by the curvature of the Earth's surface. However, continental and global analyses are so affected and must address the inadequacies of the square tessellation. For continental or global representations, triangular and/or hexagonal tessellations may be more appropriate, as may hierarchical tessellations (for an example, see Goodchild and Shiren 1992).

Transformation to a common statistical surface does not preclude the use of areal units in later analysis. In fact, transformation between units is a simple matter of re-aggregation (Tobler 1979). The data sets should include membership information from each level of the political and landscape hierarchies for later re-aggregation or scale integration. All data which are collected at a higher resolution than the selected tessellation (e.g. points or finer gridded data) should be maintained at the higher resolution.

Data Analysis

Spatial and spatio-temporal analyses using the raster data structure are commonly used for land suitability analysis, erosion studies, hazards planning, optimum corridor analysis, spatial pattern characterization, viewshed analysis, and many other applications. Several generic tools for raster analysis have been developed (Tomlin 1990). In most implementations these tools can be linked together, using deduction, to create cartographic models for prediction. The deductive process of cartographic modeling requires a well developed, and specific, theoretical basis. Inductive approaches can also be used in a raster environment for inference and explanation of spatial patterns (e.g., Brown, In Press). Inference, of course, must also be guided by some theoretical foundation.
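As one illustration of a generic raster tool, the re-aggregation of a gridded surface back to areal units can be written as a zonal summary: every cell carries a value and a membership code, and values are totalled per code. The sketch below is a minimal pure-Python version under stated assumptions; the 2 x 3 grids, the commune codes "A" and "B" and the function name `zonal_sum` are hypothetical and stand in for real population and membership layers.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def zonal_sum(values: Sequence[Sequence[float]],
              zones: Sequence[Sequence[Hashable]]) -> Dict[Hashable, float]:
    """Total the cell values of one grid within each zone of a second,
    identically shaped grid of zone membership codes."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            totals[zone] += value
    return dict(totals)


# Hypothetical population surface (people per cell) on a 2 x 3 grid.
population: List[List[float]] = [
    [10.0, 20.0, 5.0],
    [0.0, 15.0, 5.0],
]
# Hypothetical membership layer: the commune to which each cell belongs.
commune_id = [
    ["A", "A", "B"],
    ["A", "B", "B"],
]

per_commune = zonal_sum(population, commune_id)
print(per_commune)  # {'A': 30.0, 'B': 25.0}
```

Storing one membership grid per level of the political and landscape hierarchies lets the same function summarize a surface by commune, prefecture or soil unit without altering the underlying cell values.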
The greatest advantage of the suggested approach to full integration is in the modeling process for predicting particular spatial outcomes (Grossmann and Eberhardt 1992). Hydrology (Beasley et al. 1982), vegetation regeneration (Shugart and West 1980), global climatic circulations (Manabe and Wetherald 1975), and several other natural systems have all been modeled with varying degrees of success using the raster data structure. Examples of similar models in the social sciences are not as abundant. However, Hägerstrand's (1968) diffusion modeling efforts are good examples of grid-based modeling in the social sciences. The diffusion modeling concepts are applicable to modeling disease movements as well as populations and innovations (Cliff et al. 1981). These models are based on assumptions about the stochastic nature of the process of movement and are, therefore, much less deterministic than environmental models.

Important areas of future research, then, lie in answering the following questions: Is it possible to combine deterministic and/or stochastic models of environmental and societal processes? How can uncertainties be characterized in the models? Which processes are most crucial to spatial modeling for such integration? How should such models be integrated with non-spatial theoretical concerns (e.g., political and economic power relationships)?

Additionally, the issue of whether linking models of human activities with models of environmental processes is desirable must be addressed. Environmental process models are often deterministic, sometimes stochastic. Human behavior is not deterministic in any sense and can only be modeled through stochastic means. The amounts of variability in model results, therefore, may be somewhat larger for socioeconomic models than for environmental process models. Characterization of error in the models is a central issue.
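The contrast between stochastic and deterministic grid models drawn above can be made concrete with a minimal Python sketch. It is a deliberately simplified neighbour-contact process, not Hägerstrand's mean information field. The function name simulate_diffusion, the 11 x 11 grid, the contact probability, and the number of runs are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_diffusion(shape, seed_cell, n_steps, p_contact, rng):
    """Spread adoption over a grid; at each step every adopter independently
    passes the innovation to each of its four neighbours with probability
    p_contact."""
    adopted = np.zeros(shape, dtype=bool)
    adopted[seed_cell] = True
    for _ in range(n_steps):
        # Count adopting neighbours (rook adjacency) for every cell.
        padded = np.pad(adopted, 1).astype(int)
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                      + padded[1:-1, :-2] + padded[1:-1, 2:])
        # Probability that at least one adopting neighbour makes contact.
        p_adopt = 1.0 - (1.0 - p_contact) ** neighbours
        adopted |= rng.random(shape) < p_adopt
    return adopted

rng = np.random.default_rng(42)

# Stochastic case: repeated runs give a distribution of outcomes.
counts = np.array([
    simulate_diffusion((11, 11), (5, 5), 3, 0.3, rng).sum()
    for _ in range(200)
])

# Limiting deterministic case: contact always succeeds, so the result is
# the set of cells within three steps of the seed.
deterministic = simulate_diffusion((11, 11), (5, 5), 3, 1.0, rng).sum()

print("stochastic mean and std of adopters:", counts.mean(), counts.std())
print("deterministic adopters:", deterministic)  # 25
```

The spread of the stochastic counts across runs is one simple way to characterize the variability of such a model. The deterministic limit, by contrast, returns the same map every time.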
The time and space scales of the dynamic interactions between people and environmental processes must be addressed. If concern for natural resources is over a relatively short time scale, the human activity must also be represented at short time scales, for which data are often unavailable. Longer term natural resource problems must be examined with longer term human activities in mind. Similar statements could be made for the spatial scales of representation. If the concern is for processes which operate at the scale of the household, then the environmental variability which affects household-level decisions needs to be mapped. Unfortunately, there are limits to the temporal and spatial resolution of data. The analyst must integrate only those data sets which are compatible with respect to time and space scales. Averaging to coarser resolutions is the only transformation option available for addressing the scale problem. It is not possible to transform data from coarse to fine resolution.

Conclusions

Ultimately, many of the issues affecting our ability to properly integrate social and environmental data within a GIS are institutional. The agencies and bodies charged with collecting data are necessarily those most likely to have an impact on our ability to fully integrate social and environmental data. However, some of the problems of societal and environmental data integration cannot be addressed through institutional means.

Geographic information systems (GIS), and related technologies, provide some technical solutions to the integration questions. Surficial and areal interpolation provide the foundational methodology on which a data integration scheme might rest. The choice between irregular and regular spatial units is dependent on the application. I have distinguished here between loose and full integration on the basis of the nature of the unit used for analysis.
Loose integration, according to my typology, involves the conversion of one set of irregular, polygonal areal units to another or the creation of a whole new set of areal units based on the intersection of two previously defined sets of units. Full integration, which I propose as a necessity for complicated and dynamic spatial modeling, involves the conversion of incompatible spatial data to a common and regular spatial scheme (e.g., the raster data structure).

The theoretical issues of data integration are paramount. The theory governing the analysis of society and environment interactions will have implications for which data are needed, which analysis routines are appropriate, and whether or not GIS and spatial analyses are valid methodologies at all. A theory which is not developed to the point of suggesting possible spatial patterns or spatial coincidence does not have any direct need of GIS. I suspect that many of the theoretical approaches for understanding society and environment interactions are partially spatial. That is, some aspects of the theory would suggest spatial pattern, whereas other processes are non-spatial. Future applications of GIS to studies of society-environment interaction need theorists to define more clearly the spatial implications of their theories and how the spatial and non-spatial components interact.

The level of data integration required, then, is dependent on the analytical purposes of the activity. I have outlined one method for what I have termed full integration, defined as that level of integration which permits spatial, interactive, dynamic modeling of social and environmental processes. This is not to suggest that in every instance full integration is necessary or even desirable. In most cases, I suspect, loose integration is sufficient, which explains the popularity of GIS for natural resource management.
Standard GIS functions of overlay and boolean logical operators are sufficient for many data exploration forays. As always, the data structure and format for data management will control to some extent the level of sophistication in the data analysis.

References

Aronoff, S. 1989. Geographic Information Systems: A Management Perspective. Ottawa: WDL Publications.

Beasley, D.B., Huggins, L.F., and Monke, E.J. 1982. Modeling sediment yields for agricultural watersheds. Journal of Soil and Water Conservation, 37(2): 113-117.

Brown, D.G. In Review. Predicting vegetation types at treeline using topography and biophysical disturbance variables. Submitted to peer review.

Burrough, P.A. 1986. Principles of Geographical Information Systems for Land Resources Assessment. Monographs on Soil and Resources Survey, No. 12. Oxford: Oxford University Press.

Campbell, D.J. and Olson, J.M. 1991. Framework for environment and development: the Kite. CASID Occasional Papers, 10. East Lansing, MI: Center for the Advanced Study of International Development.

Campbell, D.J., Olson, J.M., and Berry, L. 1993. Population pressure, agricultural productivity and land degradation in Rwanda: An agenda for collaborative training, research and analysis. Rwanda Society-Environment Project Working Papers, 1. East Lansing, MI: Department of Geography, Michigan State University.

Cliff, A.D., Haggett, P., and 1981. Spatial Diffusion: An Historical Geography of Epidemics in an Island Community. Cambridge Geographic Studies, 14. Cambridge: Cambridge University Press.

Cressie, N.A.C. 1991. Statistics for Spatial Data. New York: John Wiley and Sons.

DeMan, W.H.E. 1988. Establishing a geographical information system in relation to its use: A process of strategic choices. International Journal of Geographical Information Systems, 2(3): 245-261.

Dobson, J.E. 1993. The geographic revolution: a retrospective on the age of automated geography. Professional Geographer, 45(4): 431-439.

Fischer, P.F. 1990. Simulation of error in digital elevation models. Papers and Proceedings of the Applied Geography Conferences, 13: 37-43.

Flowerdew, R. and Green, M. 1989. Statistical methods for inference between incompatible zonal systems. In M.F. Goodchild and S. Gopal, Accuracy of Spatial Databases. London: Taylor and Francis, 239-247.

Goodchild, M.F., Anselin, L., and Deichmann, U. 1993. A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25: 383-397.

Goodchild, M., Haining, R., and Wise, S. 1992. Integrating GIS and spatial data analysis: problems and possibilities. International Journal of Geographical Information Systems, 6(5): 407-424.

Goodchild, M.F. and Shiren, Y. 1992. A hierarchical spatial data structure for global geographic information systems. CVGIP: Graphical Models and Image Processing, 54(1): 31-44.

Grossmann, W.D. and Eberhardt, S. 1992. Geographical information systems and dynamic modelling. Annals of Regional Science, 26: 53-66.

Hägerstrand, T. 1968. Innovation Diffusion as a Spatial Process. Chicago: University of Chicago Press.

Journel, A.J. and Huijbregt, C.J. 1978. Mining Geostatistics. London: Academic Press.

Langford, M., Maguire, D.J., and Unwin, D.J. 1991. The areal interpolation problem: estimating population using remote sensing in a GIS framework. In I. Masser and M.B. Blakemore, eds. Handling Geographic Information. Essex: Longman Scientific and Technical, 55-77.

Langran, G. 1992. Time in Geographic Information Systems. New York: Taylor and Francis.

Lanter, D.P. and Veregin, H. 1992. A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering and Remote Sensing, 58(6): 825-833.

Manabe, S. and Wetherald, R.T. 1975. The effects of doubling the CO2 concentration on the climate of a general circulation model. Journal of Atmospheric Science, 32: 3-15.

Marble, D.F. and Peuquet, D.J. 1993. The computer and geography: ten years later. Professional Geographer, 45(4): 446-448.

Martin, D. and Bracken, I. 1991. Techniques for modelling population-related raster databases. Environment and Planning A, 23: 1069-1075.

NCDCDS. 1988. The digital cartographic data standard. The American Cartographer, 15(1): 11-141.

Robinson, V.B. 1988. Some implications of fuzzy set theory applied to geographic databases. Computers, Environment, and Urban Systems, 12: 89-97.

Running, S.W. and Coughlan, J.C. 1988. A general model of forest ecosystem processes for regional applications. I. Hydrologic balance, canopy gas exchange and primary production processes. Ecological Modelling, 42: 125-154.

Sheppard, E. 1993. Automated geography: what kind of geography for what kind of society. Professional Geographer, 45(4): 457-460.

Shugart, H.H. and West, D.C. 1980. Forest succession models. Bioscience, 30(5): 308-313.

Steyaert, L.T. and Goodchild, M.F. In Press. Integrating geographic information systems and environmental simulation models: a status review. In W.K. Michener, S. Stafford, and J. Brunt, eds. Environmental Information Management and Analysis: Ecosystem to Global Scales. Philadelphia: Taylor and Francis.

Tobler, W. 1979. Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74(367): 519-536.

Tobler, W. 1988. Resolution, resampling, and all that. In H. Mounsey and R.F. Tomlinson, eds. Building Databases for Global Science. Philadelphia: Taylor and Francis, 129-137.

Tomlin, C.D. 1990. Geographic Information Systems and Cartographic Modeling. Englewood Cliffs, NJ: Prentice Hall.

Tukey, J.W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.