7.2 Variables, Data and Descriptive Statistics

The variables needed for this analysis were obtained at the state and county level. I considered the 50 U.S. states and the District of Columbia (containing 3141 county equivalents) for the year 1990. Four counties were dropped from the analysis because their values could not be matched across the various data sources: Kalawao county, HI; Aleutians and Lake counties, AK; and Yellowstone county, MT. The primary variables are output, capital, employment and land area. The secondary variables are education, population and the number of establishments. The variables are aggregated for two types of employment, IT and non-IT; how these two types are defined is discussed next. Table 7.1 presents the variables needed for this study.

Just as there are different ways to define IT capital (see chapter 2), there are also various ways of defining information technology employment vs. non-IT employment. My measure is based on four sources: Porat (1977), Hepworth (1990), Hudson and Leung (1988) and Drennan (1989).

Table 7.1 Definitions of Variables

Variable    Definition                                                  Link to other variables
n_{e,c}     Number of employees in industry type e in county c
n_{e,s}     Total number of employees in industry type e in state s     n_s = Σ_c n_c
y_{e,c}     Output of industry type e in county c
y_{e,s}     Output of industry type e in state s                        y_s = Σ_c y_c
k_{e,c}     Capital stock of industry type e in county c
k_{e,s}     Capital stock of industry type e in state s                 k_s = Σ_c k_c
p_{e,c}     Labor productivity of industry type e in county c           p_{e,c} = y_{e,c} / n_{e,c}
p_{e,s}     Labor productivity of industry type e in state s            p_{e,s} = y_{e,s} / n_{e,s}
kn_{e,c}    Capital to labor ratio of industry type e in county c       kn_{e,c} = k_{e,c} / n_{e,c}
kn_{e,s}    Capital to labor ratio of industry type e in state s        kn_{e,s} = k_{e,s} / n_{e,s}
a_c         Land area of county c
a_s         Total land area of state s                                  a_s = Σ_c a_c
pop_c       Population of county c
ed_s        Average years of education in state s

Porat is considered a pioneer in defining the information economy. In his voluminous dissertation, he identified four “layers” of information occupations based on the general SIC industry codes. He defined IT occupations as those of employees who produce, disseminate, analyze and distribute information. His definition also served as a reference for the Organization for Economic Cooperation and Development [O.E.C.D. (1997)] in identifying IT activities. Hepworth, Drennan, and Hudson and Leung have all studied various effects of information technology in light of Porat’s definition, but with some restrictions and/or enlargements. The industries considered IT industries are usually those that deal with information as their main resource and in which the ratio of IT capital stock to total capital stock is high. Note that this definition differs from the common understanding of IT occupations as mainly limited to computer and network engineers. The data at the county level are available by SIC codes. At the state level, I use the dataset built in chapter 4, which contains information on production function variables for 52 2-digit SIC industries (Table 4.2). Based on all these considerations, I define IT employment as the number of employees working in industries that are more involved with information and knowledge than traditional industries, and that correspond roughly to the classification in the previously cited references. Tables 7.2 and 7.3 list the IT and non-IT industries, and how they relate to the 52 industries in Table 4.2.

Some adjustments had to be made in order to match the two sources of data: BEA for the state level data and the U.S. Bureau of the Census for the county data. The match yielded 50 usable industries, of which 21 are IT industries and 29 are non-IT industries. Following Drennan, another “industry” was added to the IT classification: administrative and auxiliary employment in all industries, which the Bureau of the Census reports for each 1-digit industry. Administrative and auxiliary workers manage information as their main occupation. Data on IT and non-IT employment, as defined here, were thus assembled at the county level for the year 1990 from the County Business Patterns of the U.S. Bureau of the Census.

Table 7.2 IT Industry Classifications

SIC code                     Code         IT industries
50--                         7            Wholesale trade
4800                         62           Communications
6000 + 6100                  91           Banking
6200                         92           Security brokers
6300                         93           Insurance carriers
6400                         94           Insurance agents
6700                         96           Holding and investment
7200                         102          Personal services
7300 + 8300 + 8600 + 8700    103          Business and Other Services
7800                         106          Motion pictures
8000                         108          Health services
8100                         109          Legal services
3500                         526          Industrial machinery
3600 + 3800                  527          Electronic, instrument and related equipment
3700                         528 + 529    Transportation equipment
2700                         536          Printing & publishing
2800                         537          Chemicals
4100                         612          Local & interurban passenger transit
4500                         615          Transportation by air
4700                         617          Transportation services
8200                         1010         Educational services
/-1999 +                                  Administrative and Auxiliary of all industries

Note: Code refers to the code defined in Table 4.2

Table 7.3 Non-IT Industry Classifications

SIC code    Code    Non-IT or “traditional” industries
15--        4       Construction
52--        8       Retail trade
1000        31      Metal mining
1200        32      Coal mining
1300        33      Oil & gas
1400        34      Nonmetallic minerals
4900        63      Electric, gas, & sanitary
6500        95      Real estate
7000        101     Hotels & lodging
7500        104     Auto repair & parking
7600        105     Misc. repair services
7900        107     Amusement and recreation
2400        521     Lumber & wood
2500        522     Furniture and fixtures
3200        523     Stone, clay, glass
3300        524     Primary metals
3400        525     Fabricated metals
2000        531     Food & kindred products
2100        532     Tobacco products
2200        533     Textile mill products
2300        534     Apparel & textile
2600        535     Paper products
2900        538     Petroleum products
3000        539     Rubber & plastics
3100        5310    Leather products
4200        613     Trucking and warehousing
3900        5210    Misc. manufacturing
4400        614     Water transportation
4600        616     Pipelines, ex. nat. gas

Note: Code refers to the code defined in Table 4.2
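
The classification in Tables 7.2 and 7.3 can be sketched as a simple lookup on the Code column. This is only an illustration, not the procedure actually used to assemble the data: the function names and the employment figures are hypothetical, and the sketch assumes that per-industry counts exclude administrative and auxiliary workers, which are then added to the IT total following Drennan.

```python
# Codes copied from the "Code" column of Tables 7.2 and 7.3.
# Transportation equipment spans two codes (528 and 529), so the
# 21 IT industries correspond to 22 codes here.
IT_CODES = frozenset({
    7, 62, 91, 92, 93, 94, 96, 102, 103, 106, 108, 109,
    526, 527, 528, 529, 536, 537, 612, 615, 617, 1010,
})
NON_IT_CODES = frozenset({
    4, 8, 31, 32, 33, 34, 63, 95, 101, 104, 105, 107,
    521, 522, 523, 524, 525, 531, 532, 533, 534, 535,
    538, 539, 5310, 613, 5210, 614, 616,
})


def classify(code):
    """Return 'IT' or 'non-IT' for an industry code from the tables."""
    if code in IT_CODES:
        return "IT"
    if code in NON_IT_CODES:
        return "non-IT"
    raise ValueError(f"code {code} is not listed in Table 7.2 or 7.3")


def county_employment(employment_by_code, admin_aux=0):
    """Aggregate a county's employment into IT and non-IT totals.

    employment_by_code: dict mapping industry code -> employees
        (assumed here to exclude administrative and auxiliary workers).
    admin_aux: administrative and auxiliary employees of all industries,
        counted as IT employment.
    """
    totals = {"IT": admin_aux, "non-IT": 0}
    for code, employees in employment_by_code.items():
        totals[classify(code)] += employees
    return totals


# Hypothetical county: banking (91), retail trade (8), construction (4).
example = county_employment({91: 120, 8: 300, 4: 80}, admin_aux=15)
print(example)
```

Keying the lookup on the code rather than on the SIC code avoids having to parse combined entries such as "6000 + 6100".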

Obtaining output and capital stock data at the county level is more complicated, because no such data are available. I therefore had to estimate output and capital stock series for the 3141 U.S. counties for the year 1990. To do so, I based my procedure on the methodology of Hicks and Nivin (2000), who constructed regional industry productivity measures from national figures. This method rests on the premise that labor productivity within each industrial sector is uniform across regions. I used a similar premise in chapter 4, when I estimated state capital stock series from national industry figures by holding the capital to output ratio within an industry constant across states. Here I hold labor productivity (output divided by employment) constant across counties within each of the 50 industries considered in this analysis. Such an assumption may seem unrealistic at first, but it can be justified by the narrowing of sector-specific labor quality differentials across regions that took place in the 1980s as a result of increasing competitive pressure. As Hicks and Nivin (2000) argued:

‘We suggest that the transformation of metro-regional economies during the 1980s was such that individual industries surviving and emerging within them were increasingly likely to reflect ever-rising competitiveness pressures. As more and more goods producers and services providers faced the need to retain or regain competitiveness in expanding nation-scale (and often global) markets, it is likely the net effect was to narrow intra-industry labour quality differentials, especially within the nation’s largest metro regions. It follows that to be competitive in geographically-expanding markets, then the skill-sets and productivity of workers (...) – whether in Boston or Boise – would of necessity tend to converge over time. Moreover, as new enterprise is incubated, new entrants would be increasingly likely to meet the rising competitiveness requirements for survival. Taken together, such forces likely had the effect of substantially narrowing the range of sector specific labour quality differentials across regions.’

I also assume that, just as labor productivity is held constant within industries across counties, the capital per worker ratio also remains constant at the same level. Of course, taken together, these hypotheses amount to saying that a given industry faces the same production function across counties within a given state, considering only capital and labor inputs. Although this is a somewhat strong assumption, the need to justify it becomes less crucial when the regional aggregated output is considered. Indeed, the need to estimate industry inputs and output levels across counties is only motivated by the goal of obtaining county aggregate output and capital stock levels. Thus, I obtained county output level data based on the following assumption:

\[ y_c / n_c = y_s / n_s \tag{7.14} \]

County-specific levels of capital stock are obtained using

\[ k_c / n_c = k_s / n_s \tag{7.15} \]
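
As an illustration of how equations (7.14) and (7.15) might be applied, the sketch below imputes county output and capital stock industry by industry from the state ratios and then sums over industries to obtain county aggregates. It reflects one reading of the procedure, not the exact computation: the function name, the industry labels and all figures are hypothetical.

```python
def impute_county(n_county, n_state, y_state, k_state):
    """Impute aggregate county output and capital stock.

    For each industry e, the county is assigned the state's output per
    worker (7.14) and capital per worker (7.15):
        y_c = n_c * (y_s / n_s),   k_c = n_c * (k_s / n_s)
    The per-industry values are then summed over industries.

    n_county: dict industry -> county employment
    n_state, y_state, k_state: dicts industry -> state employment,
        output and capital stock
    """
    y_total = 0.0
    k_total = 0.0
    for industry, n_c in n_county.items():
        if n_c == 0:
            continue  # no employment, so nothing to impute
        y_total += n_c * y_state[industry] / n_state[industry]
        k_total += n_c * k_state[industry] / n_state[industry]
    return y_total, k_total


# Hypothetical state figures for two industries.
n_state = {"banking": 1000, "retail": 4000}
y_state = {"banking": 50000.0, "retail": 80000.0}
k_state = {"banking": 200000.0, "retail": 120000.0}

# Hypothetical county employment in the same industries.
n_county = {"banking": 100, "retail": 200}

y_c, k_c = impute_county(n_county, n_state, y_state, k_state)
print(y_c, k_c)
```

Because each county receives a share of state output proportional to its share of state employment in the industry, summing the imputed values over all counties of a state reproduces the state totals, provided state employment equals the sum of county employment.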

Finally, other county information, such as area (in square miles), population and education levels, was obtained from the decennial census of population of the U.S. Bureau of the Census. An education level variable, ed_c, is defined at the county level and represents the percentage of the population that has graduated from high school, but not from college. At the state level, the education variable ed_s represents the average years of education. The data were taken from Ciccone and Hall (1996). Following this presentation of the three models and the variables used in this analysis, the next chapter describes and discusses the estimation results.