Issues and Alternative Approaches for the Integration and Application of Societal and Environmental Data within a GIS


Daniel G. Brown

Department of Geography
Michigan State University
East Lansing, MI 48824-1115

Copyright 1994, Daniel G. Brown and M.S.U. Board of Trustees


Working Paper No. 3

Rwanda Society-Environment Project
Department of Geography
Michigan State University
East Lansing, MI 48824-1115


April 12, 1994


Abstract

	A variety of approaches to integrating social and 
environmental data exist.  For both types of data, it is important 
to maintain the spatial components of the data so that 
geographical patterns can be used in theory development and 
planning.  The use to which the data are to be put will determine 
the level of sophistication required in their integration.  In 
this paper, I discuss many issues which arise in the integration 
of disparate data sets, with special attention to how social and 
environmental data are different.  Policy makers, scientists, and 
managers may wish to integrate social and environmental data for 
description, inference, and/or modeling.  These activities require 
increasing levels of sophistication.  I suggest a typology of 
existing methodologies which I refer to as:  parallel analysis (no 
integration), loose integration, and full integration.  The 
primary difference between loose integration and full integration 
is that full integration is constrained to data structures which 
permit dynamic, interactive, and spatial modeling of social and 
environmental data.  Loose integration is currently possible to a 
limited extent for trained individuals on the current, 
commercially available GIS software.  For many applications 
specialized software is needed but the techniques are described in 
the geographic literature.  Conclusions include the following:  in 
many cases full integration may not be desirable; development 
theoreticians must provide suggestions about how spatial patterns 
or spatial coincidences might refute or support a theory or 
hypothesis for spatial data to be useful for inference and 
modeling in development studies; GIS is not useful for inference 
and modeling within some theoretical frameworks; and both the 
linkages between societal and environmental process models and the 
theoretical issues involved in linking them need to be better 
developed.


Introduction

	THIS PAPER ADDRESSES THE INTEGRATION OF SOCIETAL AND ENVIRONMENTAL DATA 
SETS for the geographical analysis of natural resource management 
(NRM) strategies in Rwanda, Central Africa. The paper outlines 
some of the issues which make social and environmental data 
integration (hereafter referred to as data integration) an 
important issue and some alternative approaches to data 
integration.  The approaches are presented with some assessment of 
the limitations of each for using the data in a decision making 
context.  A goal is to suggest a methodological framework for the 
analysis of social and environmental data which is flexible enough 
to accommodate multiple research questions.  A detailed discussion 
of the overall goals of the Rwanda Society-Environment Project is 
provided by Campbell, Olson and Berry (1993).
	Data integration is necessary for analyzing the processes 
which lead to land use and environmental change in developing 
countries.  An underlying theoretical basis, which itself 
represents the integration of social and environmental processes, 
is required for examining land use and environmental change.  A 
methodological framework is suggested for the use of geographic 
information systems (GIS) to address natural resource management 
issues in Africa.  Its conceptual starting point is the KITE model 
of society-environment interactions (Campbell and Olson 1991).  
The KITE is a conceptual framework for understanding environment 
and development issues which recognizes both social and 
environmental agents in the land use systems.  The examples are 
drawn from a pilot study in Rwanda. 

	The activities which might be facilitated using GIS can be 
characterized by three levels of sophistication:  description, 
inference, and modeling.  Each level of sophistication requires a 
commensurate level of spatial data integration.  Description 
includes the activities described as exploratory data analysis by 
Tukey (1977).  This involves the identification of spatial and 
statistical trends and patterns in data, and at its simplest 
involves the creation of map displays.  More sophisticated 
description may be required for the identification of underlying 
statistical spatial patterns.  Description may require no data 
integration.  Inference refers to the confirmation or refutation 
of hypotheses which are developed out of some theoretical model. 
Clearly, the hypotheses, and by extension the theories, which GIS 
can provide inferential support for must be spatial in nature. 
That is, the theory must suggest a geographical pattern as a 
result of its basic tenets.  Many theories of land use and land 
use change are fundamentally geographical.  Inference may require 
only loose integration of social and environmental data (more on 
this point later).  Finally, modeling is an activity which 
involves prediction.  The existence of supported alternative 
hypotheses which are based in theory allows for the extension of 
knowledge to unknown locations or future times.  As suggested 
implicitly by the KITE conceptual model, spatial models of the 
interactions between social and environmental systems must include 
feedbacks and multiple dynamic interactions to provide successful 
predictions (Campbell and Olson 1991).  Therefore, effective 
modeling and prediction requires the full integration of social 
and environmental data.

	The present problem might be characterized as a "poorly-
structured problem" (DeMan 1988).  The specific questions are not 
known.  Many possible questions exist.  The answers to the variety 
of questions might be addressed through description, inferential 
analysis, modeling, and/or a combination of the three.  In such an 
instance, flexibility of data formats and management routines is a 
central issue.  It is important that the possibility for the most 
sophisticated types of analyses be maintained.

	Implementation of integrated social and environmental data 
sets ought to be constrained to existing, and possibly 
inexpensive, software systems and data structures given the 
requirement of technology transfer to developing countries.  By 
necessity, spatial representations of societal and environmental 
interactions are assumed realizable within the framework of the 
KITE.  However, the author recognizes that many relationships 
(e.g., power relationships, household interactions, and farmer 
perceptions) are not easily characterized spatially.  Analyses of 
such relationships may not be appropriate within the GIS context, 
though interpretation of spatial patterns should address such 
issues.


Data Integration Issues

	SOCIETAL AND ENVIRONMENTAL DATA ARE COLLECTED FOR DIFFERENT PURPOSES, 
at different scales, and with different underlying assumptions 
about the nature of the phenomena.  The subjects of environmental 
data often exhibit continuous spatial variation (e.g., elevation, 
soils, precipitation, and temperature).  Social phenomena tend to 
be more spatially discrete (e.g., people, farms, and political 
units).  Exceptions to these generalizations include population 
density, a continuous social variable, and land cover type, a 
discrete environmental variable. The differences and perceived 
differences between social and environmental phenomena have 
resulted in mapped data which are sometimes incompatible.  In 
every case, however, the variable is discretized to a finite set 
of spatial units.
	Geographical data comprise spatial (where) and 
attribute (what) information.  Any data integration scheme, no 
matter how involved, must include methods for managing each of 
these components.  There are several standard approaches to 
managing spatial and attribute data.  The most common spatial data 
structures include raster and vector (Fig. 1).  Raster is a 
regular geometric tessellation (a grid), whereas vector abstracts 
geographic objects as points, lines or areas.  Attribute data can be managed as 
layers (Fig. 2) which are stored in separate files (called the file processing 
approach) or through the 
use of a database management system (DBMS).  Relational databases have been 
used for over a decade to manage spatial and attribute information through the 
use of data tables which are related through common identifiers (Aronoff 
1987). 
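
	The two spatial data structures, and the use of common identifiers to 
relate attribute tables, can be illustrated with a minimal sketch in Python.  
All identifiers and values below (commune numbers, names, counts, and 
coordinates) are hypothetical and are not drawn from the Rwanda data, and 
join_tables is an illustrative helper rather than part of any GIS package.

```python
# Raster: a regular grid in which a cell's position encodes its location.
# Values are hypothetical land cover codes.
land_cover_raster = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]

# Vector: each geographic object is stored as explicit coordinates.
# Here two hypothetical communes are polygons keyed by an identifier.
commune_polygons = {
    101: [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)],
    102: [(2.0, 0.0), (3.0, 0.0), (3.0, 3.0), (2.0, 3.0)],
}

# Attribute tables, related to the polygons through the same identifier.
commune_names = {101: "Commune A", 102: "Commune B"}
commune_population = {101: 52000, 102: 31000}


def join_tables(*tables):
    """Relate tables on their shared keys, as a relational join would."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {key: tuple(table[key] for table in tables) for key in sorted(shared)}


joined = join_tables(commune_names, commune_population, commune_polygons)
for identifier, (name, population, ring) in joined.items():
    print(identifier, name, population, len(ring), "vertices")
```

	Only identifiers present in every table survive the join, which is one 
reason consistent identifiers across data sets matter for integration.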


Figure 1.  Raster and vector data models (Aronoff 1989; p. 164)

	A fundamental difference between spatial units as defined for socio-
economic data as compared with those defined for environmental data is the 
nature of assumed internal homogeneity. Clearly, internal homogeneity will be 
a function of the unit sizes; larger units will have less internal 
homogeneity. However, spatial units defined for environmental data--either 
grid cells or irregular polygons--are nearly always designed to be internally 
homogeneous to some degree (e.g., soil units, land cover types).  Socio-
economic units, on the other hand, are nearly always the product of some 
political process; they are administrative units.  Internal homogeneity may 
have been one of many considerations in the establishment of unit boundaries, 
or not considered at all.  In some cases the variable of interest varies over 
areas much larger than the administrative unit.  In these instances the 
internal heterogeneity problem is greatest in transition or border areas.  


Figure 2.  Example of geographic attributes represented as layers (Aronoff 
1989; p. 192) 

	Examples of successful applications of GIS are much more abundant in 
environmental studies than socioeconomic studies. The nature of the data, as 
described above, and available methods, not necessarily the phenomena 
themselves, have encouraged this situation.  In addition, Dobson (1993; p. 
435) argued that this situation is the result of societal priorities. 
"Traditionally, societies have shown greater willingness to fund research on 
technology, infrastructure and physical resources than on social or cultural 
topics.  GIS development suffers from the same societal bias that causes human 
geographers to receive less financial support than physical scientists."  He 
cited the successful use of GIS in economic geography applications, where 
funds are more plentiful, as an exception to the paucity of social GIS 
applications.   

	In many ways, the problems of data integration can be solved through 
data transformation: from one form to another, from one scale to another, and 
from one aggregation unit to another. Given inherent differences in data sets, 
data transformations developed for environmental and societal data require 
different assumptions.  A variety of procedures are available for estimating 
statistical surfaces and/or areal estimates from a network of point samples.  
Thiessen polygon calculation, geostatistics (kriging and co-kriging), and 
aggregation of survey samples can all be used in spatial estimation from point 
samples. The underlying assumption of continuity in the data has given rise to 
a variety of techniques which permit interpolation of environmental data to a 
discretized surface, similar to a raster data structure.  Geostatistics 
(Journel and Huijbregts 1978; Cressie 1991) exploit the spatial dependence in 
continuous data for unbiased estimation of spatial variables.  The same 
assumption is often not appropriate for societal data, requiring the 
aggregation of point samples to some areal unit.  For example, household 
surveys which ask yes/no or open-ended questions are quite difficult to 
transform to a continuous surface.  In some cases, however, a statistical 
surface is appropriate (e.g. population density). 
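
	The simplest of the procedures named above, Thiessen polygon 
calculation, can be sketched on a grid: each cell takes the value of the 
nearest point sample.  The function name, the sample coordinates, and the 
sample values are illustrative assumptions; cell centers are used as the 
reference locations, with row 0 taken to lie at the smallest y coordinate.

```python
import math


def thiessen_grid(samples, n_rows, n_cols, cell_size=1.0):
    """Assign each grid cell the value of its nearest point sample.

    samples is a list of (x, y, value) tuples in the same units as cell_size.
    """
    grid = []
    for row in range(n_rows):
        y = (row + 0.5) * cell_size  # cell-center y coordinate
        grid_row = []
        for col in range(n_cols):
            x = (col + 0.5) * cell_size  # cell-center x coordinate
            nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
            grid_row.append(nearest[2])
        grid.append(grid_row)
    return grid


# Two hypothetical rainfall samples at opposite ends of a 1 x 4 strip.
samples = [(0.5, 0.5, 10.0), (3.5, 0.5, 20.0)]
print(thiessen_grid(samples, 1, 4))  # [[10.0, 10.0, 20.0, 20.0]]
```

	Unlike kriging, this assignment provides no estimate of the variability 
of the estimated values and produces abrupt steps at polygon borders.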

	Ideally, sampling schemes should be devised with the ultimate goals of 
analysis in mind.  However, the above listed transformation routines reduce 
the need for such specialized sampling schemes.  Spatial sampling of survey 
data in Rwanda is currently designed to provide statistical significance at 
the level of the prefecture and agro-ecological zone.  Socio-economic analyses 
have often been reduced to "spread sheet" analysis, where the spatial unit 
(usually a political unit) becomes the observation and a number of variables 
are available for each unit.  This approach is not useful for incorporating 
spatial autocorrelation information into the analysis because information 
about the adjacency (i.e., topology) of spatial units is lost. Additionally, 
it hinders the integration with environmental data because these data may not 
be collected using the same spatial unit.
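
	The role of adjacency information can be made concrete with a measure 
of spatial autocorrelation.  The sketch below uses Moran's I, a standard 
statistic which this paper does not otherwise name, with binary adjacency 
weights; the unit labels and values are hypothetical.  The point is that the 
statistic cannot be computed from a table of units and variables alone.

```python
def morans_i(values, neighbors):
    """Moran's I with binary adjacency weights.

    values maps each spatial unit to its observed value; neighbors maps each
    unit to the set of units adjacent to it (the topology that a plain
    spreadsheet of units and variables does not record).
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {unit: value - mean for unit, value in values.items()}
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    weight_sum = sum(len(adjacent) for adjacent in neighbors.values())
    sum_sq = sum(d * d for d in dev.values())
    return (n / weight_sum) * (cross / sum_sq)


# Four hypothetical units arranged in a line: A-B-C-D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

trend = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
alternating = {"A": 1.0, "B": 4.0, "C": 1.0, "D": 4.0}

print(morans_i(trend, adjacency))        # positive: similar values adjoin
print(morans_i(alternating, adjacency))  # negative: dissimilar values adjoin
```

	Both example data sets have the same set of values per unit type that a 
spreadsheet would hold; only the adjacency table distinguishes a smooth 
trend from an alternating pattern.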

	Ultimately some common discretization may be required for data 
integration, especially where dynamic models are to be developed.  The raster 
data structure (or the field view of geographic space of Goodchild et al. 1992) 
is conducive to spatial statistical analysis because of the regular nature of 
geographic elements.  Distances and lengths of common borders between 
neighboring pixels are constant.  A primary concern with discretization is 
that of data resolution, i.e. degree of spatial detail.  Data resolution is to 
some extent controlled by source data resolution and to some extent at the 
discretion of the analyst.  In terms of the integrity of a geographic 
representation, the best data resolution is the finest resolution.  The most 
appropriate resolution will depend on the application.  Practically, the 
finest possible resolving power of data from a map can be determined from its 
scale, expressed as a representative fraction.  Tobler (1988) provides a 
simple rule, based on the assumption of a 0.5 mm minimum pen width, that the 
finest resolution appropriate for a map, in meters, is equal to the denominator 
of the scale fraction divided by 2000.
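
	The rule attributed above to Tobler (1988) reduces to a single 
division.  The sketch below assumes only that rule; the function name is 
illustrative, and the example scale of 1:250,000 is the base scale cited 
later in this paper for the digitized Rwanda maps.

```python
def finest_resolution_m(scale_denominator):
    """Finest appropriate resolution, in meters, for a 1:scale_denominator map.

    Applies the rule quoted in the text: the denominator of the
    representative fraction divided by 2000.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    return scale_denominator / 2000.0


print(finest_resolution_m(250_000))  # 125.0
print(finest_resolution_m(50_000))   # 25.0
```

	By this rule a 1:250,000 source map supports cells no finer than 125 m 
on a side, so a 1 km cell is well within the resolving power of the source.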

	Environmental and societal forces in development and environmental 
degradation operate at household, local, sub-national and 
international/continental scales.  In order to develop models for 
understanding this system, data must be generalized and characterized at each 
of the scales.  However, policy is often made at the national level, based on 
sub-national data.  An important goal of data integration for policy making, 
then, should be the characterization of the multi-scale processes at the sub-
national scale. 

	Inferential and predictive analyses require some characterization of 
error in spatial datasets.  Error is propagated through GIS operations, yet it 
is not always clear how.  For example, taking the difference between two 
uncertain variables will result in a higher relative uncertainty in the result 
than will adding the same two variables (Burrough 1986).  Raising a variable to 
an exponent will increase the magnitude of errors. Ultimately, the quality of the 
data relates directly to the risk of making poor decisions based on those 
data.  Data, and analyses based on those data, need to include 
variance/variability estimates so that risk can be accounted for in a 
decision.  For example, estimates of the potentials for land degradation need 
to be presented with alternatives which assess the likely range of possible 
situations at any location.  In spatial estimation, as above, geostatistics 
provide estimates of the variability of estimated values which can be useful 
for assessing how well the values have been estimated.  Lanter and Veregin 
(1992) present a possible approach, using error propagation functions, to 
account for errors in source data and the effects of map analysis on those 
errors.  Estimates of error distributions and variability can be used in Monte 
Carlo simulations to assess the possibilities that incorrect maps and/or 
decisions have resulted (Fischer 1990).  Fuzzy set theory can also be applied 
to uncertainty assessment (Robinson 1988).  Clearly, these approaches require 
the integration of GIS with statistical methods to adequately represent error 
and uncertainty.
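
	The effect of simple operations on uncertainty can be sketched with 
first-order error propagation.  The formulas below are the standard 
textbook approximations under the assumption that the errors in the two 
variables are independent; they are not necessarily the formulation used by 
any of the authors cited above, and the numerical values are hypothetical.  
Under these assumptions a sum and a difference have the same absolute 
standard deviation, but the difference has the larger relative error.

```python
import math


def sd_sum_or_difference(sd_a, sd_b):
    """Standard deviation of a + b or a - b for independent errors."""
    return math.hypot(sd_a, sd_b)


def sd_power(a, sd_a, exponent):
    """First-order standard deviation of a ** exponent."""
    return abs(exponent) * abs(a) ** (exponent - 1) * sd_a


def relative_error(value, sd):
    """Standard deviation expressed as a fraction of the value."""
    return sd / abs(value)


# Two hypothetical layers at one location: 100 +/- 5 and 90 +/- 5.
a, sd_a = 100.0, 5.0
b, sd_b = 90.0, 5.0
sd = sd_sum_or_difference(sd_a, sd_b)

print(relative_error(a + b, sd))  # about 0.04 for the sum
print(relative_error(a - b, sd))  # about 0.71 for the difference
print(relative_error(a ** 2, sd_power(a, sd_a, 2)))  # 0.1, double that of a
```

	Where the independence assumption does not hold, Monte Carlo 
simulation of the kind described above is the more general alternative.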

	Two special types of data are available and commonly used, one in the 
social sciences (surveys) and one in the environmental sciences (remote 
sensing).  The ways in which these data are integrated within a GIS are 
discussed below.

	Satellite Data Integration

	The spatial coverage and multi-temporal aspects of remote sensing make it 
appealing as a complementary data source to traditional mapped or sampled 
data.  Satellite remote sensing involves the recording of wavelength specific 
light reflection and transmission from the top of the atmosphere.  Image 
processing can be used to extract thematic information from satellite digital 
products.  Traditionally, the information provided has been categorical 
representations of the land surface cover type. Interpretation is required for 
assigning thematic attributes, usually in conjunction with "ground truth" 
information. Estimation of biophysical parameters using remote sensing, like 
green-leaf area and biomass, has become an important activity in global-scale 
modeling efforts (Running and Coughlan 1988). Satellite data are grid-based 
and are therefore consistent with the raster data structure.

	Satellite data come with a variety of inherent spatial resolutions and 
from a variety of platforms.  The issue of scale can be addressed directly 
with multiple resolution satellite sensors.  Common sensors include the 
Landsat Thematic Mapper (TM) and Multispectral Scanner (MSS), at 30 m and 79 m 
resolutions respectively, and the Advanced Very High Resolution Radiometer 
(AVHRR), at a resolution of 1.1 km.  Different sensors provide 
different levels of information detail.  AVHRR is being looked to more and 
more to provide information for global landcover change monitoring. 

	GIS data layers can aid in the interpretation of satellite information.  
Similarly, satellite information can provide a covariate for surface 
estimation from point samples.  Langford et al. (1991) demonstrated how remote 
sensing might be used to improve areally explicit population estimates, in 
combination with census data.

	In the context of this project, satellite data can provide land cover 
and land use information.  This information might be interpreted at a variety 
of levels of detail:  forest/non-forest, land use, and quality of land.  The 
land cover information, properly classified, can provide an areally and 
temporally explicit view of societal and environmental attributes.  Land cover 
is a consequence of both human activity and natural propensities.  For this 
reason, the integration of remote sensing with other environmental and 
societal data sets is beneficial.

	Survey Data Integration

	Surveys of individuals and households provide the fundamental 
information about individual perceptions and actions for social science 
research.  This information can be integrated with other geographic data sets 
for people-environment interactions research provided it is collected and 
managed spatially. 

	Survey data integration presents one of the greatest challenges in 
social and environmental data integration.  Many existing survey data sets are 
collected and sampled in order to provide statistically significant results at a 
pre-specified spatial aggregation level.  In Rwanda, for example, surveys 
available through the Ministry of Agriculture (DSA) are designed to be 
statistically significant at the level of the Prefecture, the first sub-
national spatial units, of which there are ten. The issue of confidentiality 
dictates that some aggregation be used.  However, spatial information about 
the samples would permit more detailed modeling of spatial interactions.  
Spatial information provided with survey data would permit the overlay of 
survey data with other data sets within a GIS.  The aggregation units then 
could be modified as needed.
	Surveys are clearly the only way to address certain questions about the 
way people perceive and interact with the environment.  However, there seems 
to be a fundamental conflict of scale in survey data.  In Rwanda, the DSA 
surveys are statistically valid at the prefecture or agro-ecological zone. 
However, they represent processes which operate at the household scale.  A 
fundamental question arises, "Are household scale phenomena and processes 
adequately depicted at the prefecture level?"    The aggregation of survey 
data to some areal unit may actually reduce their usefulness to near nothing.  
It becomes clear that the spatial locations of the survey data points must be 
maintained if we are to relate the household information to other large-scale 
variables. 

	Survey data, because they tend to be geographically sparse and 
temporally erratic, have limited applicability for national or continental 
scale modeling.  Their most important contribution is for inference.  
Regularizing the spatial and temporal sampling schemes may alleviate this 
limitation somewhat.


Data Integration Solutions

	I SUGGEST A THREE-STAGE TYPOLOGY OF INTEGRATION APPROACHES for societal and 
environmental data sets within a GIS for natural resource management.  The 
question of how much integration is desirable should be addressed in terms of 
both the theoretical approach of the questioner and the practical concerns of 
data manipulation and analysis.  The goals of analysis, be they description 
and inference or modeling and prediction, will dictate the degree to which 
data integration is desirable.  "If the researcher is working with highly 
aggregated, spatial data and is not overly concerned with analytical 
modelling, then the GIS becomes an expensive tool for the production of maps" 
(Marble and Peuquet 1993; p. 447).

	Parallel Analysis

	The first approach involves the production of analysis products (e.g., 
maps and tables) for environmental and social analyses separately and 
comparing the results.  This might be referred to as the "hold up two maps and 
squint" approach and is quite commonly used.  No integration is required.  The 
approach relies heavily on the visual interpretation of patterns and spatial 
coincidence. 

	Loose Integration

	The term loose integration is used here to refer to data integration 
techniques which rely heavily on overlay and regional characterization 
techniques common to many GIS systems. Conversion between areal units or the 
creation of new areal units which represent the intersection of two sets of 
units provide mechanisms for comparing disparate data.  For example, soil unit 
data might be summarized for each commune in Rwanda to provide information 
about soils.  Alternatively, new units which represent the intersections of 
soil and commune units can be created.  Data collected and aggregated to some 
spatial aggregation unit (e.g. Rwanda's communes) can be converted from those 
units to other spatial units through a variety of approaches.  Areal 
interpolation can be carried out using techniques outlined by Tobler (1979), 
Flowerdew and Green (1989), Langford et al. (1991), Martin and Bracken (1991), 
and Goodchild et al. (1993).  A common assumption for many of these methods is 
that the density of the count variable, or the value of a density variable, is 
constant throughout each of the units (Langford et al. 1991).  Goodchild et 
al. (1993) presented a method which relaxes this assumption by employing 
"control units" which are more likely to satisfy the homogeneity assumption.  
Such transformations may mask spatial variability if the resultant units are 
larger or are more heterogeneous.
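
	The simplest form of areal interpolation, area weighting under the 
constant-density assumption noted above, can be sketched as follows.  The 
function name, the unit labels, and all counts and areas are hypothetical; 
the overlap areas would in practice come from a GIS overlay of the two sets 
of units.

```python
def areal_weighting(source_counts, source_areas, intersections):
    """Transfer counts from source units to target units by area weighting.

    source_counts: {source_id: count}
    source_areas:  {source_id: area}
    intersections: {(source_id, target_id): overlap area}
    Assumes density is constant within each source unit.
    """
    estimates = {}
    for (source_id, target_id), overlap in intersections.items():
        share = overlap / source_areas[source_id]
        estimates[target_id] = (
            estimates.get(target_id, 0.0) + source_counts[source_id] * share
        )
    return estimates


# Hypothetical communes C1, C2 (source) and soil units S1, S2 (target).
population = {"C1": 1000.0, "C2": 600.0}
commune_area = {"C1": 100.0, "C2": 60.0}
overlap_area = {("C1", "S1"): 25.0, ("C1", "S2"): 75.0, ("C2", "S2"): 60.0}

by_soil_unit = areal_weighting(population, commune_area, overlap_area)
print(by_soil_unit)  # {'S1': 250.0, 'S2': 1350.0}
```

	The total count is preserved so long as the overlaps cover each source 
unit completely; the estimates are only as good as the constant-density 
assumption within each source unit.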

	Loose integration has the advantage of providing information for a set 
of spatial units which is familiar to or required by a user.  Political units 
are commonly used by governmental decision makers for planning and policy 
decisions.  Therefore, summarization of environmental variables by political 
units may be desirable.  However, by converting from one areal unit to another 
(e.g. from soil units to political units) the analyses may be biased towards 
one set of data.  Any data aggregation or averaging must be considered in the 
interpretation of an analysis.  Because research problems often are not 
clearly defined when data collection and data management activities are begun, 
it is important that transformations which generalize (i.e., diminish detail 
in) the spatial or attribute components of geographic data be delayed as late 
in the process as possible. 
	In many instances, policy makers may be most interested in seeing the 
data as they are without the aid of complicated dynamic models.  Loose 
integration, the basic promise of GIS from its inception, provides an 
environment for exploring data and their spatial 
coincidences.  Given the difficulties of modeling human behaviour and 
representing the intricate dynamics of society-environment interactions, the 
ability of a policy maker to explore a set of data and apply her/his own set 
of assumptions and understandings goes a long way toward the synthesis sought 
in the more formal representation of a dynamic model.  Policy makers can 
factor in knowledge of daily events (for example, wars, coups, and refugee 
resettlement) which will not be reflected in a data set quickly enough to 
be useful.  The KITE provides one such approach to data exploration and 
analysis.

	Full Integration

	In order to produce models of interactions between people and the 
environment, and to incorporate feedbacks into such models, social and 
environmental data must be in a common format. Additionally, the data must be 
in a format which is suitable for geographical interaction modeling.  Two 
possible common data models include raster (or grid cell) data and the use of 
some irregularly shaped administrative or landscape units.  Each of these 
presents a distinct set of advantages and disadvantages. The common use of a 
regular tessellation of space for environmental modeling may be indicative of 
the superiority of such data structures for modeling (Steyaert and Goodchild 
In Press).  Although socio-economic models have been typically developed for 
areal units (usually political units), there may be some advantages to using 
regular tessellations for these types of models as well.  Perhaps the most 
important advantage is the ability to link environmental and social models.

	Many effective spatial or distributed environmental models are built 
around data which are structured in grid cells (e.g., ANSWERS for erosion 
modeling; Beasley et al., 1982).  Some models (e.g., FOREST-BGC for forest 
biogeochemistry modeling; Running, 1990) are built on irregular spatial units.  
However, such units are assumed to be homogeneous, and administrative units 
cannot fulfill that criterion.  Given the existence of grid-based models in 
geography and elsewhere, the raster data structure would provide a natural 
structure with which such models can run within the GIS framework.

	Although human activities usually cannot be modeled deterministically, 
stochastic approaches to spatial modeling have led to some success.  
Applications of such an approach include diffusion modeling and spatial 
interaction modeling.
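
	A stochastic, grid-based diffusion simulation can be sketched in a few 
lines.  The rules below are deliberately simplified assumptions and do not 
reproduce any published model: in each generation every adopter contacts 
one of its eight neighboring cells chosen with equal probability, and a 
contacted cell always adopts.  A distance-decay contact field or a 
resistance to adoption would be natural refinements.

```python
import random


def simulate_diffusion(n_rows, n_cols, initial, generations, rng):
    """Monte Carlo diffusion of an innovation over a grid of cells.

    initial is a list of (row, col) cells that have adopted at the start;
    rng is a random.Random instance, so that runs can be reproduced.
    Returns the final set of adopters and the adopter count per generation.
    """
    offsets = [
        (dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
    ]
    adopters = set(initial)
    history = [len(adopters)]
    for _ in range(generations):
        contacted = set()
        for row, col in sorted(adopters):  # sorted for reproducibility
            d_row, d_col = rng.choice(offsets)
            target = (row + d_row, col + d_col)
            # Contacts that fall outside the study area are lost.
            if 0 <= target[0] < n_rows and 0 <= target[1] < n_cols:
                contacted.add(target)
        adopters |= contacted
        history.append(len(adopters))
    return adopters, history


final, counts = simulate_diffusion(9, 9, [(4, 4)], 6, random.Random(1))
print(counts)
```

	Because the outcome differs from run to run, such a model is usually 
run many times and the results summarized as a probability surface rather 
than a single predicted map.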


A Methodological Framework for Full Integration in Rwanda

	The approach described below is based on the assumption that all data 
can be displayed as statistical surfaces, i.e. that the variables are 
continuous.  Where data are not continuous, statistical surfaces cannot be 
generated from areal units.  The outline below is suggested as one possible 
approach to data integration which supports modeling of societal and 
environmental interactions. 

	Data Collection

	Although this paper is not intended to provide suggestions for 
institutions regarding the data collection infrastructure, a few comments 
about data collection are warranted.  Data collection issues affecting data 
integration are both technical and theoretical in nature.  Technical issues 
can be addressed by regularization of spatial and temporal sampling schemes, 
as well as coordination of information categories which are collected 
internationally.  Sheppard (1993, p. 458) stated that "Most philosophers of 
science accept that theory informs data collection through its influence on 
how (questions of which data to collect and what basic categories to use in 
making observations) are answered."  As an example, Sheppard cited changes in 
occupation categories by the U.S. Census Bureau which have facilitated 
Weberian studies while hindering Marxian analyses.  Ultimately, then, the 
types of analyses that can be done will be limited by the available data.  An 
assessment of data needs will make its most fundamental contribution to the 
field of data collection.  However, this paper examines techniques which can 
be used with a wide variety of data sets and was written with the assumption 
that data are available. Standards for data transfer and data quality--like 
those set forth for the United States by NCDCDS (1988)--are needed to improve 
the interoperability of systems on an international scale. 

	Data Management

	Data management in GIS facilitates the integration of diverse data sets 
and determines the analyses possible with those data.  Data transformation 
routines facilitate the conversion of data to a common spatial structure.  The 
common structure could be a common set of areal units.  As discussed above, 
the areal units can be limiting, depending on their size and relations to 
underlying heterogeneity of the surface being represented.  In order to 
maintain the integrity of the surface, I suggest the use of a common 
statistical surface based on a regular geometric tessellation (e.g. the raster 
data structure).  In this way the areal unit does not bias the analysis 
towards environmental or societal data.  A relatively small neutral unit can 
represent the variability, as best as is possible, in each spatial unit, and 
provide a means for integrating the data. 

	Examples of methods for transformation to a statistical surface from the 
areal units are provided by Tobler (1979) and Martin and Bracken (1991).  
These methods assume that the variable underlying the areal aggregation scheme 
is a smooth surface.  For computational convenience, Tobler (1979) settled on 
minimizing the second-derivative as his definition of smoothness. As an 
example, the 1991 population surface for Rwanda derived from prefecture-level 
data is displayed in Figure 3. 
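
	The idea behind such a transformation can be sketched as an iteration 
of two steps: smooth the surface, then rescale the cells of each areal unit 
so that the unit's original total is preserved.  The code below is a 
simplified illustration only.  It substitutes a four-neighbor average for 
the second-derivative criterion, ignores boundary conditions, and is not the 
procedure used to produce Figure 3; the function name and the example zones 
and totals are hypothetical.

```python
def pycnophylactic_sketch(zones, totals, iterations=50):
    """Smooth, volume-preserving surface from areal totals (simplified).

    zones is a 2-D list giving the areal unit of each cell; totals maps each
    areal unit to its count.  Returns a 2-D list of per-cell counts.
    """
    n_rows, n_cols = len(zones), len(zones[0])
    members = {}
    for r in range(n_rows):
        for c in range(n_cols):
            members.setdefault(zones[r][c], []).append((r, c))

    # Start from a uniform density within each areal unit.
    surface = [
        [totals[zones[r][c]] / len(members[zones[r][c]]) for c in range(n_cols)]
        for r in range(n_rows)
    ]

    for _ in range(iterations):
        smoothed = [row[:] for row in surface]
        for r in range(n_rows):
            for c in range(n_cols):
                neighbours = [
                    surface[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n_rows and 0 <= cc < n_cols
                ]
                if neighbours:
                    mean = sum(neighbours) / len(neighbours)
                    smoothed[r][c] = 0.5 * surface[r][c] + 0.5 * mean
        # Restore each unit's total so that no count is gained or lost.
        for zone_id, cells in members.items():
            zone_sum = sum(smoothed[r][c] for r, c in cells)
            if zone_sum > 0:
                factor = totals[zone_id] / zone_sum
                for r, c in cells:
                    smoothed[r][c] *= factor
        surface = smoothed
    return surface


# Two hypothetical units along a strip of four cells.
strip = pycnophylactic_sketch([[1, 1, 2, 2]], {1: 10.0, 2: 30.0})
print([round(value, 2) for value in strip[0]])
```

	In the example the cell values rise steadily toward the more populous 
unit while the two unit totals remain 10 and 30.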
	The cell size for the regular tessellation should be smaller than the 
smallest areal units used for data collection.  One appropriate "rule-of-
thumb" is to choose a grid cell that is, at largest, one-half the size of the 
smallest object to be resolved. In Rwanda, the two basic units we wish to 
combine include communes, 143 units with an average area of 166 km2, and soil 
units, 1106 units with an average area of approximately 22 km2. Each has been 
digitized from a map with a base scale of 1:250,000.  A cell size of about 1 
km2 will be used initially. Working with a compact area the size of Rwanda, 
somewhat larger than 24,000 km2, the tessellation is not significantly 
distorted by the curvature of the Earth's surface.  However, continental and 
global analyses are so affected and must address the inadequacies of the 
square tessellation.  For continental or global representations, triangular 
and/or hexagonal tessellations may be more appropriate, as may hierarchical 
tessellations (for an example, see Goodchild and Shiren 1992).


Figure 3.  Pycnophylactic interpolation of 1991 population in Rwanda from 
prefecture-level data.

	Transformation to a common statistical surface does not preclude the use 
of areal units in later analysis.  In fact, transformation between units is a 
simple matter of re-aggregation (Tobler 1979).  The data sets should include 
membership information from each level of the political and landscape 
hierarchies for later re-aggregation or scale integration.  All data which are 
collected at a higher resolution than the selected tessellation (e.g. points 
or finer gridded data) should be maintained at the higher resolution.
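
Re-aggregation from the grid back to areal units is a zonal sum over the stored membership information.  The sketch below assumes that each cell carries an integer identifier for the unit it belongs to at each level of the hierarchy; the small arrays and the names commune_id and prefecture_id are hypothetical.

```python
import numpy as np

def reaggregate(surface, membership, n_units):
    """Sum a gridded count surface into areal units.

    surface    : 2-D array of per-cell counts (e.g., persons per cell)
    membership : 2-D integer array giving each cell's unit id (0..n_units-1)
    """
    return np.bincount(membership.ravel(), weights=surface.ravel(),
                       minlength=n_units)

# Hypothetical 2 x 3 grid with two levels of membership per cell.
population = np.array([[10.0, 20.0, 30.0],
                       [ 5.0, 15.0, 25.0]])
commune_id = np.array([[0, 0, 1],
                       [2, 2, 1]])
prefecture_id = np.array([[0, 0, 0],
                          [1, 1, 0]])

by_commune = reaggregate(population, commune_id, 3)        # [30., 55., 20.]
by_prefecture = reaggregate(population, prefecture_id, 2)  # [85., 20.]
```

Storing one membership layer per hierarchical level lets the same surface be summarized at any level without returning to the original polygons.
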

	Data Analysis	

	Spatial and spatio-temporal analyses using the raster data structure are 
commonly used for land suitability analysis, erosion studies, hazards 
planning, optimum corridor analysis, spatial pattern characterization, 
viewshed analysis, and many other applications.  Several generic tools for 
raster analysis have been developed (Tomlin 1990).  In most implementations 
these tools can be linked together, using deduction, to create cartographic 
models for prediction.  The deductive process of cartographic modeling 
requires a well developed, and specific, theoretical basis.  Inductive 
approaches can also be used in a raster environment for inference and 
explanation of spatial patterns (e.g., Brown, In Press).  Inference, of 
course, must also be guided by some theoretical foundation.
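
A deductive cartographic model of the kind described above can be sketched as a chain of cell-by-cell operations on co-registered layers.  The example is a minimal land suitability overlay; the layers, class codes, and thresholds are all hypothetical and would in practice come from the theoretical basis guiding the analysis.

```python
import numpy as np

def suitability(slope_pct, soil_class, pop_density,
                max_slope=10, good_soils=(1, 2), max_density=300):
    """Boolean overlay: a cell is suitable only if every criterion holds.
    All thresholds are illustrative assumptions."""
    gentle = slope_pct < max_slope
    fertile = np.isin(soil_class, good_soils)
    uncrowded = pop_density < max_density
    return gentle & fertile & uncrowded

# Three hypothetical co-registered 3 x 3 layers.
slope_pct = np.array([[2, 8, 15],
                      [4, 12, 30],
                      [1, 3, 25]])
soil_class = np.array([[1, 1, 2],
                       [3, 1, 2],
                       [1, 3, 3]])
pop_density = np.array([[120, 250, 90],
                        [200, 80, 60],
                        [400, 150, 100]])

suitable = suitability(slope_pct, soil_class, pop_density)
```

Each intermediate layer (gentle, fertile, uncrowded) is itself a map, which is what allows the tools to be linked into longer models.
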

	The greatest advantage of the suggested approach to full integration is 
in the modeling process for predicting particular spatial outcomes (Grossmann 
and Eberhardt 1992).  Hydrology (Beasley et al. 1982), vegetation regeneration 
(Shugart and West 1980), global climatic circulations (Manabe and Wetherald 
1975), and several other natural systems have all been modeled with varying 
degrees of success using the raster data structure. Examples of similar models 
in the social sciences are not as abundant.  However, Hägerstrand's (1968) 
diffusion modeling efforts are good examples of grid based modeling in the 
social sciences.  The diffusion modeling concepts are applicable to modeling 
disease movements as well as populations and innovations (Cliff et al. 1981).  
These models are based on assumptions about the stochastic nature of the 
process of movement and are, therefore, much less deterministic than 
environmental models.
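
The stochastic character of grid-based diffusion models can be illustrated with a small Monte Carlo sketch in the spirit of Hägerstrand's approach.  It is a strong simplification, not a reproduction of his model:  each cell holds a single potential adopter, every adopter makes one contact per generation, and the contact cell is drawn from a distance-decay "mean information field" whose shape and parameters are assumptions of the example.

```python
import numpy as np

def mean_information_field(radius=2, decay=2.0):
    """Contact probabilities over a (2r+1) x (2r+1) window, falling off
    with distance.  The functional form is an illustrative assumption."""
    dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    weights = 1.0 / (1.0 + np.hypot(dy, dx)) ** decay
    return weights / weights.sum()

def diffuse(adopted, n_generations, rng, mif=None):
    """Spread adoption over a boolean grid; returns (grid, adopter counts)."""
    mif = mean_information_field() if mif is None else mif
    radius = mif.shape[0] // 2
    # Offsets listed in the same row-major order as mif.ravel().
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    probs = mif.ravel()
    adopted = adopted.copy()
    rows, cols = adopted.shape
    history = [int(adopted.sum())]
    for _ in range(n_generations):
        tellers = np.argwhere(adopted)  # snapshot of current adopters
        picks = rng.choice(len(offsets), size=len(tellers), p=probs)
        for (r, c), k in zip(tellers, picks):
            rr, cc = r + offsets[k][0], c + offsets[k][1]
            # Contacts that fall outside the study area are lost.
            if 0 <= rr < rows and 0 <= cc < cols:
                adopted[rr, cc] = True
        history.append(int(adopted.sum()))
    return adopted, history

rng = np.random.default_rng(0)
start = np.zeros((15, 15), dtype=bool)
start[7, 7] = True
final, history = diffuse(start, 6, rng)
```

Repeating the run with different random seeds yields a distribution of outcomes rather than a single map, which is the sense in which such models are less deterministic than most environmental process models.
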
	Important areas of future research, then, lie in answering the 
following questions:  Is it possible to combine deterministic and/or 
stochastic models of environmental and societal processes?  How can 
uncertainties be characterized in the models?  Which processes are most 
crucial to spatial modeling for such integration?  How should such models 
be integrated with non-spatial theoretical concerns (e.g., political and 
economic power relationships)?  Additionally, the issue of whether linking 
models of human activities with models of environmental processes is desirable 
must be addressed. Environmental process models are often deterministic, 
sometimes stochastic.  Human behavior is not deterministic in any sense and 
can only be modeled through stochastic means.  The amount of variability in 
model results, therefore, may be somewhat larger for socioeconomic models than 
for environmental process models. Characterization of error in the models is a 
central issue.
	The time and space scales of the dynamic interactions between people and 
environmental processes must be addressed.  If concern for natural resources 
is over a relatively short time scale, the human activity must also be 
represented at short time scales--scales for which data are often unavailable.  
Longer term natural resource problems must be examined with longer term human 
activities in mind.  Similar statements could be made for the spatial scales 
of representation.  If the concern is for processes which operate at the scale 
of the household, then environmental variability which affects household level 
decisions needs to be mapped.  Unfortunately, there are limits to temporal and 
spatial resolution of data.  The analyst must integrate only those data sets 
which are compatible with respect to time and space scales.  Averaging to 
coarser resolutions is the only transformation option available for addressing 
the scale problem. It is not possible to transform data from coarse to fine 
resolution.
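
Averaging to a coarser resolution can be sketched as a block mean over non-overlapping groups of cells.  The example assumes a regular grid whose dimensions are exact multiples of the aggregation factor; the function name is illustrative.

```python
import numpy as np

def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = coarsen(fine, 2)   # [[2.5, 4.5], [10.5, 12.5]]
```

The mean is appropriate for densities and other intensive variables; for counts such as population per cell, the block sum would be used instead so that totals are preserved.
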


Conclusions

	Ultimately, many of the issues affecting our ability to properly 
integrate social and environmental data within a GIS are institutional.  The 
agencies and bodies charged with collecting data are necessarily those most 
likely to have impact on our ability to fully integrate social and 
environmental data. However, some of the problems of societal and 
environmental data integration cannot be addressed through institutional 
means. Geographic information systems (GIS), and related technologies, provide 
some technical solutions to the integration questions. Surficial and areal 
interpolation provides the foundational methodology on which a data 
integration scheme might rest.  The choice between irregular and regular 
spatial units is dependent on the application.  I have distinguished here 
between loose and full integration on the basis of the nature of the unit used 
for analysis.  Loose integration, according to my typology, involves the 
conversion of one set of irregular, polygonal areal units to another or the 
creation of a whole new set of areal units based on the intersection of two 
previously defined sets of units. Full integration, which I propose as a 
necessity for complicated and dynamic spatial modeling, involves the 
conversion of incompatible spatial data to a common and regular spatial scheme 
(e.g., the raster data structure).
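
As an illustration of loose integration, the conversion of values from one set of polygonal units to another can be sketched with simple areal weighting.  This is only one of several possible methods and is named here as an assumption of the example:  it presumes that the variable is uniformly distributed within each source unit and that the target units completely cover the source units.  The intersection areas would normally come from a polygon overlay in the GIS; the numbers below are hypothetical.

```python
import numpy as np

def areal_weighting(source_counts, intersect_area):
    """Reallocate counts from source units to target units.

    source_counts  : length-m array of counts for the source units
    intersect_area : m x n array; entry [i, j] is the area shared by
                     source unit i and target unit j
    """
    source_area = intersect_area.sum(axis=1)
    # Share of each source unit's area falling in each target unit.
    weights = intersect_area / source_area[:, None]
    return source_counts @ weights

# Two hypothetical source units (e.g., communes) and two target units
# (e.g., soil units), with overlay areas in km2.
counts = np.array([100.0, 50.0])
overlap = np.array([[3.0, 1.0],
                    [0.0, 2.0]])
estimates = areal_weighting(counts, overlap)   # [75., 75.]
```
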

	The theoretical issues of data integration are paramount. The theory 
governing the analysis of society and environment interactions will have 
implications for which data are needed, which analysis routines are 
appropriate, and whether or not GIS and spatial analyses are valid 
methodologies at all.  A theory which is not developed to the point of 
suggesting possible spatial patterns or spatial coincidence does not have any 
direct need of GIS.  I suspect that many of the theoretical approaches for 
understanding societal and environment interactions are partially spatial.  
That is, some aspect of the theory would suggest spatial pattern, whereas 
other processes are non-spatial. Future applications of GIS to 
societal-environmental interaction studies need theorists to define more 
clearly the spatial implications of their theories and how the spatial and 
non-spatial components interact.

	The level of data integration required, then, is dependent on the 
analytical purposes of the activity.  I have outlined one method for what I 
have termed full integration, defined as that level of integration which 
permits spatial interactive dynamic modeling of social and environmental 
processes.  This is not to suggest that in every instance full integration is 
necessary or even desirable.  In most cases, I suspect, loose integration is 
sufficient, hence the popularity of GIS for natural resource management.  
Standard GIS functions of overlay and Boolean logical operators are sufficient 
for many data exploration forays.  As always, the data structure and format 
for data management will control to some extent the level of sophistication in 
the data analysis.

References

Aronoff, S.  1989.  Geographic Information Systems:  A Management Perspective.  
Ottawa:  WDL Publications.

Beasley, D. B., Huggins, L. F. and Monke, E.J. 1982.  Modeling sediment yields 
for agricultural watersheds.  Journal of Soil and Water Conservation, 37(2): 
113-117.

Brown, D.G.  In Review.  Predicting vegetation types at treeline using 
topography and biophysical disturbance variables. Submitted to peer review.

Burrough, P.A.  1986.  Principles of Geographical Information Systems for Land 
Resources Assessment.  Monographs on Soil and Resources Survey, No. 12.  
Oxford:  Oxford University Press. 

Campbell, D.J. and Olson, J.M.  1991.  Framework for environment and 
development: the Kite.  CASID Occasional Papers, 10, East Lansing, MI:  Center 
for the Advanced Study of International Development.

Campbell, D.J., Olson, J.M., Berry, L.  1993.  Population pressure, 
agricultural productivity and land degradation in Rwanda:  An agenda for 
collaborative training, research and analysis.  Rwanda Society-Environment 
Project Working Papers, 1, East Lansing, MI:  Department of Geography, 
Michigan State University.

Cliff, A.D., Haggett, P., and   1981.  Spatial Diffusion:  An Historical 
Geography of Epidemics in an Island Community. Cambridge Geographic Studies, 
14.  Cambridge:  Cambridge University Press.

Cressie, N.A.C.  1991.  Statistics for Spatial Data.  New York: John Wiley and Sons.

DeMan, W.H.E. 1988.  Establishing a geographical information system in 
relation to its use:  A process of strategic choices. International Journal of 
Geographical Information Systems, 2(3): 245-261.

Dobson, J.E.  1993.  The geographic revolution: a retrospective on the age of 
automated geography.  Professional Geographer, 45(4):  431-439.

Fischer, P.F.  1990.  Simulation of error in digital elevation models.  Papers 
and Proceedings of the Applied Geography Conferences, 13: 37-43.

Flowerdew, R., and Green, M.  1989.  Statistical methods for inference 
between incompatible zonal systems.  In M.F. Goodchild and S. Gopal, eds.  
Accuracy of Spatial Databases.  London:  Taylor and Francis, 239-247.

Goodchild, M.F., Anselin, L., and Deichmann, U.  1993.  A framework for the 
areal interpolation of socioeconomic data. Environment and Planning A, 25:  
383-397.

Goodchild, M., Haining, R., Wise, S.  1992.  Integrating GIS and spatial data 
analysis: problems and possibilities.  International Journal of Geographical 
Information Systems, 6(5):  407-424.

Goodchild, M.F., and Shiren, Y.  1992.  A hierarchical spatial data structure 
for global geographic information systems.  CVGIP: Graphical Models and Image 
Processing, 54(1):  31-44.

Grossmann, W.D. and Eberhardt, S.  1992.  Geographical information systems and 
dynamic modelling.  Annals of Regional Science, 26:  53-66.

Hägerstrand, T.  1968.  Innovation Diffusion as a Spatial Process.  Chicago:  
University of Chicago Press.

Journel, A.J. and Huijbregts, C.J.  1978.  Mining Geostatistics.  London:  
Academic Press.

Langford, M., Maguire, D.J., and Unwin, D.J.  1991.  The areal interpolation 
problem:  estimating population using remote sensing in a GIS framework.  In 
I. Masser and M.B. Blakemore, eds.  Handling Geographic Information.  Essex:  
Longman Scientific and Technical, 55-77.

Langran, G.  1992.  Time in Geographic Information Systems.  New York:  Taylor 
and Francis.

Lanter, D.P. and Veregin, H.  1992.  A research paradigm for propagating error 
in layer-based GIS.  Photogrammetric Engineering and Remote Sensing, 58(6):  
825-833.

Manabe, S. and Wetherald, R.T.  1975.  The effects of doubling the CO2 
concentration on the climate of a general circulation model.  Journal of the 
Atmospheric Sciences, 32:  3-15.

Marble, D.F. and Peuquet, D.J.  1993.  The computer and geography:  ten years 
later.  Professional Geographer, 45(4): 446-448.

Martin, D. and Bracken, I.  1991.  Techniques for modelling population-related 
raster databases.  Environment and Planning A, 23:  1069-1075.

NCDCDS.  1988.  The digital cartographic data standard.  The American 
Cartographer, 15(1):  11-141.

Robinson, V.B.  1988.  Some implications of fuzzy set theory applied to 
geographic databases.  Computers, Environment, and Urban Systems, 12:  89-97.

Running, S.W. and Coughlan, J.C. 1988.  A general model of forest ecosystem 
processes for regional applications. I. Hydrologic balance, canopy gas 
exchange and primary production processes. Ecological Modelling, 42:  125-154.

Shugart, H.H. and West, D.C.  1980.  Forest succession models. Bioscience, 
30(5):  308-313.

Sheppard, E.  1993.  Automated geography:  what kind of geography for what 
kind of society.  Professional Geographer, 45(4): 457-460.

Steyaert, L.T. and Goodchild, M.F.  In Press.  Integrating geographic 
information systems and environmental simulation models:  a status review.  In 
W.K. Michener, S. Stafford, and J. Brunt, Eds.  Environmental Information 
Management and Analysis: Ecosystem to Global Scales.  Philadelphia:  Taylor 
and Francis.

Tobler, W.  1979.  Smooth pycnophylactic interpolation for geographical 
regions.  Journal of the American Statistical Association, 74(367):  519-536.

Tobler, W.  1988.  Resolution, resampling, and all that.  In H. Mounsey and 
R.F. Tomlinson, eds.  Building Databases for Global Science.  Philadelphia:  
Taylor and Francis, 129-137.

Tomlin, C.D. 1990.  Geographic Information Systems and Cartographic Modeling.  
Englewood Cliffs, NJ:  Prentice Hall.

Tukey, J.W.  1977.  Exploratory Data Analysis.  Reading, MA: Addison-Wesley.