PUMS Home Page



Public Use Microdata Samples (PUMS) are computer-accessible files containing records for a sample of housing units, with information on the characteristics of each housing unit and the people in it. Within the limits of sample size and geographic detail, these files allow users to prepare virtually any tabulation they require. Identifying information has been removed to protect the confidentiality of respondents. For many purposes, results from the samples can be generalized to the United States as a whole, and the data also support comparative analysis across different groups.

PUMS datafiles contain records representing 1 in 1000, 1 percent, and 5 percent samples of the housing units in the United States and the persons in them. Each PUMS file provides records for states and for selected geographic areas within them.

The 5% sample identifies every state and various subdivisions of states called Public Use Microdata Areas (PUMAs), each with at least 100,000 persons. These PUMAs were based primarily on counties, and may be whole counties, groups of counties, or places. When a county or place has more than 200,000 persons, a PUMA can represent part of that county or place. No PUMA on the 5% sample crosses state lines.

The 1% sample was based primarily on metropolitan/nonmetropolitan areas, and contains PUMAs which were made from whole central cities, whole Metropolitan Statistical Areas (MSAs) or Primary Metropolitan Statistical Areas (PMSAs), MSAs or PMSAs outside the central city, groups of MSAs or PMSAs, and groups of areas outside MSAs or PMSAs. When the areas have more than 200,000 persons, 1% PUMAs can represent parts of central cities, MSAs/PMSAs, and so forth. 1% PUMAs may cross state lines, in which case state codes are not shown.

The PUMAs on the 1% and 5% files cover different geographic areas. Also, in some states, one or more PUMAs include noncontiguous parts. Noncontiguous PUMAs arise for several reasons. On the 1% file, an effort was made to separate metropolitan areas from nonmetropolitan areas. On the 5% file, an effort was made to keep meaningful socio-economic or planning areas together. In sparsely populated areas, it may have been necessary to delineate PUMAs with noncontiguous parts to meet the minimum population criterion when adjacent counties belonged to a metropolitan area or a local planning area.

The 1%, 5%, and 1 in 1000 filetypes differ from one decennial census to the next. The 1990 and 1980 files are fairly similar. In 1970, two long forms were used, generating two 1 in 100 and two 1 in 1000 files; no 5% filetype exists for that year. The filetypes described in these documents are designed to reflect CIESIN holdings.



As each data collection period differs slightly in content and approach, each datafile is discussed in further depth in the following pages:

1940 PUMS (1% sample)
1950 PUMS (1% sample)
1960 PUMS (1 in 1000)
1970 PUMS (1 in 1000)
1980 PUMS (1% and 5% sample)
1990 PUMS (1% and 5% sample)

  • Layout:

    Each file generally contains two record types, each with different variables, rather than one longer record with all the variables. The two basic record types are the housing unit record and the person record. A serial number in each record links the persons in the housing unit to the proper housing unit record.
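
    The linkage described above can be sketched in a few lines of Python. This is only an illustration: actual PUMS files are fixed-width text whose positions are given in each census codebook, and the field names used here (serial, tenure, age) are hypothetical stand-ins rather than real variable names.

```python
from collections import defaultdict

# Hypothetical, simplified records. Real PUMS files are fixed-width text,
# and these field names are illustrative only.
housing_records = [
    {"serial": "0000001", "tenure": "owned"},
    {"serial": "0000002", "tenure": "rented"},
]
person_records = [
    {"serial": "0000001", "age": 41},
    {"serial": "0000001", "age": 39},
    {"serial": "0000002", "age": 67},
]


def link_persons_to_housing(housing, persons):
    """Attach to each housing record the person records sharing its serial number."""
    persons_by_serial = defaultdict(list)
    for person in persons:
        persons_by_serial[person["serial"]].append(person)
    # A housing unit with no matching person records gets an empty list.
    return [
        {**unit, "persons": persons_by_serial.get(unit["serial"], [])}
        for unit in housing
    ]


linked = link_persons_to_housing(housing_records, person_records)
for unit in linked:
    print(unit["serial"], unit["tenure"], len(unit["persons"]))
```

    Grouping the person records by serial number first means each file is read only once, which matters when the person file is much larger than the housing file.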

  • Universe:

    Information from the censuses was derived either from questions asked of the entire population or from questions asked of only a sample of the population. Questions asked about every person and housing unit are called 100-percent or short-form questions. The others are called sample or long-form questions.

    Those households receiving the short-form questionnaires were asked only the 100-percent questions, and those receiving the long form were asked both the 100-percent questions and the sample questions. In 1990, some 17.7 million housing units received a long form, out of an estimated total of 106 million units (about 16.7%). Sampling rates vary depending on geographic location and population size.
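
    As a quick check on the 1990 figures quoted above, the overall long-form rate follows directly from the two counts. The variable names are illustrative, and the result is a national average only, since the rate actually applied varies by location and population size.

```python
# Counts quoted in the text for the 1990 census.
long_form_units = 17.7e6   # housing units that received a long form
total_units = 106e6        # estimated total housing units

overall_rate = long_form_units / total_units
print(f"Overall long-form rate: {overall_rate:.1%}")
print(f"Roughly 1 housing unit in {1 / overall_rate:.0f}")
```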

    PUMS datafiles contain a sample of the individual long-form census records showing most population and housing characteristics with identifying information removed.

    100-PERCENT COMPONENT (Short-Form)
    Population
    Household Relationship, Sex, Race, Age, Marital Status, Hispanic Origin.
    Housing
    Number of units in structure, Number of rooms in unit, Tenure (owned or rented), Value of home (or monthly rent), Congregate housing (meals included in rent), Vacancy characteristics.

    SAMPLE COMPONENT (Long-Form)
    Population: Social Characteristics
    Education (enrollment and attainment), Place of birth, Citizenship, Year of entry to U.S., Ancestry, Language (spoken at home), Migration (residence between decennial censuses), Disability, Fertility, Veteran Status.
    Population: Economic Characteristics
    Labor force, Occupation, Industry, Class of worker, Place of work, Journey to work, Work experience, Income, Year last worked.
    Housing
    Year moved into residence, Number of bedrooms, Plumbing, Kitchen facilities, Telephone in unit, Vehicles available, Heating Fuel, Source of water, Method of sewage disposal, Year structure built, Farm residence, Shelter costs (including utilities).

  • Design and Methodology:

    The coding system varies from census to census, so it is important to have access to the codebook for each census in order to assess the meaning of a specific field in a census record and its comparability across censuses. The geographic identifiers on the earlier files are largely not comparable with one another, but housing and population characteristics are similar. Because of this similarity, microdata files from the most recent censuses are useful for analysis of trends.

    The sample questionnaires were edited for completeness and consistency, and substitutions or allocations were made for any missing data. Allocation flags appear at the end of each record to indicate when an item has been allocated. A user wishing to tabulate only actually observed values can use these flags to exclude the allocated values of an item.
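
    A minimal sketch of tabulating only observed values follows. It assumes each item has a companion allocation flag; the field names (age, age_allocated) and the True/False coding are hypothetical, and the actual flag positions and codes must be taken from the codebook for the census in question.

```python
# Hypothetical person records; field names and flag coding are illustrative.
person_records = [
    {"age": 34, "age_allocated": False},
    {"age": 52, "age_allocated": True},   # value was filled in during editing
    {"age": 19, "age_allocated": False},
]


def observed_only(records, item, flag):
    """Return the values of `item` from records whose allocation flag is not set."""
    return [record[item] for record in records if not record[flag]]


observed_ages = observed_only(person_records, "age", "age_allocated")
print(observed_ages)
print(len(person_records) - len(observed_ages), "allocated value(s) excluded")
```

    The filter is applied per item, so a record with an allocated age still contributes its observed values for other items.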

  • Variables:

    Variables are discussed in more detail in each decennial datafile description. Generally, the following topics are of interest:


    An attempt will be made to waisindex all data dictionaries from 1940 through 1990. Submitting a query against the combined index then produces a "time-series" search. For example, the query "spouse" will return every spouse-related data item across all years. For now, each decennial data dictionary may be searched individually.
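
    The idea of a cross-year keyword search can be illustrated without WAIS. The sketch below uses a plain case-insensitive substring match over in-memory dictionaries; it is a stand-in for the indexed search, not a description of how waisindex works, and every variable name and label in it is invented for the example.

```python
# Hypothetical data dictionaries keyed by census year. The variable names and
# labels are invented for illustration and are not taken from any codebook.
data_dictionaries = {
    1970: {"VAR_A": "Age of spouse", "VAR_B": "Year structure built"},
    1980: {"VAR_C": "Spouse present in household", "VAR_D": "Heating fuel"},
    1990: {"VAR_E": "Relationship to householder (spouse, child, other)"},
}


def search_dictionaries(dictionaries, term):
    """Return (year, variable, label) for every label containing `term`, by year."""
    needle = term.lower()
    hits = []
    for year in sorted(dictionaries):
        for name, label in dictionaries[year].items():
            if needle in label.lower():
                hits.append((year, name, label))
    return hits


for year, name, label in search_dictionaries(data_dictionaries, "spouse"):
    print(year, name, label)
```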


    Summary Tape Files (STF) are designed to provide statistics with greater subject detail for geographic areas than is feasible or desirable to provide in printed reports. The census data contained in printed reports are arranged in tables. Population and housing characteristics are presented for specified geographic areas; for example, a table may give the number of rented housing units in a census tract, the number of persons 65 years of age or older in a city, or the total population of a county. Census data at the small-area level, such as census tracts and smaller, contain limited subject-matter detail. The STFs, in machine-readable format, mimic this table layout.


    No publications are currently on-line.




    Census, United States, Demographics, Populations, Housing, PUMS.