
Appendix A
Geographic Distribution of Job Openings Within the Cleveland-Akron, Columbus, and Toledo Metropolitan Areas:
Data Sources and Methodology 1
This paper was prepared by Neil Bania and Laura Leete under a Cooperative Agreement between HUD and The Center on Urban Poverty and Social Change at Case Western Reserve University in Cleveland, Ohio. The focus of the paper is on estimating the total number of job openings and their geographic distributions in three Ohio MSAs -- Cleveland-Akron, Columbus, and Toledo. The estimates are at the zip code level and are used as input data for a determination of job availability for specified mandated public housing residents and their competitors at the neighborhood level (see Appendix B and Section IV of this report). In addition, some of the Bania and Leete information is used directly in Sections II and IV of this report.
Bania and Leete develop five steps or "goals" necessary to produce estimates of job openings that welfare recipients and other low-income job seekers could qualify for at a level approximating the neighborhood. Essentially, the goals can be viewed as a disaggregation of industry employment data for a metropolitan area into occupational categories (rather than industry categories) and into zip codes rather than metro areas. These goals are sequential and include: estimating existing employment by 4-digit SIC codes and allocating them to zip codes within a specified MSA; converting "industry by zip code" data into "occupations by zip codes;" estimating net job openings by occupations for metropolitan areas and counties; allocating net job openings by occupational skill groups to zip codes; and computing the number of unemployed, the number discouraged, and the number of welfare recipients by Census tract and zip code. These steps require an assumption of an appropriate percentage of jobs that welfare recipients would qualify for. Here, Bania and Leete create a 4-group matrix, using education and skill data, and then assume that the first quartile of education is the minimum acceptable credential.
Bania and Leete go on to test the efficacy of alternative industry employment datasets by comparing ES202 and County Business Patterns industry employment reports. Essentially, they report that both are likely to produce similar results. The choice of data, therefore, rests on ease of use and there, the authors suggest that County Business Patterns data are both easier to obtain and somewhat easier to use.
What follows is a discussion of how Bania's and Leete's goals were accomplished for this study. The process and computer programs necessary to replicate the methodology in any MSA are fully documented and available.
The implementation of welfare reform, with an emphasis on moving people from welfare to work, has raised various questions about the ability of local labor markets to fully absorb everyone who is seeking employment. Specifically, policy makers and program administrators need to know the number and location of expected job openings which are skill and education appropriate for welfare recipients. This appendix describes in detail a method for developing estimates of job openings for low skill occupations at the zip code level within three metropolitan areas: the Cleveland-Akron CMSA, and the Columbus and Toledo MSAs. The method is general and sufficient detail is provided here to replicate this methodology for any metropolitan area in the United States. Estimates can be updated annually.
Overview of Methodology
General Description: In order to develop estimates of the number of job openings which are education and skill appropriate for welfare recipients, we use estimates provided by the Ohio Bureau of Employment Services. To group these occupations into four discrete education/skill categories, we use information on the job content for a given occupation as well as the distribution of education of those who are currently filling the occupation. To develop estimates which are geographically detailed (at the zip code level), we use employment data for industries by zip codes from the Zip code version (CD-Rom) of the County Business Patterns file. Next, employment by industry is converted to employment by occupation using an industry-occupation matrix derived from the 1990 Census (Five percent Public Use Micro Sample -- PUMS). This method assumes that every firm in a given industry uses the same set of occupations regardless of geographic location. Finally, we compute the number of unemployed, discouraged workers, and welfare recipients using census data (PUMS and Summary Tape File 3A - STF3A). These estimates are then allocated to the zip code level and to the census tract level based on the incidence among various population subgroups and the geographic distribution of those population subgroups.
This paper has three parts. First, we describe our methodology in section I. In section II, we provide sufficient detail and explication of computer programs needed to implement this methodology elsewhere. Finally, we report on our analysis of the use of County Business Patterns versus ES202 data as the basis of our estimates.
Description of Methodology
The methodology for creating zip code level estimates of low skill job openings for the Cleveland-Akron, Columbus, and Toledo metropolitan areas consists of five steps.
Employment by Industry for Zip Codes
First, we estimate employment by industry from the County Business Patterns file. The County Business Patterns data file reports total employment and the number of establishments by four digit SIC code in various employment size classifications. These data must be converted to point estimates of employment by industry for each zip code. An alternative data source is ES202, which reports actual employment for individual establishments at the address level. Each record also includes four digit SIC codes. The advantages and disadvantages of using ES202 and County Business Patterns are discussed below.
Employment by Occupation by Zip Codes
Second, we convert industry employment estimates to occupation employment estimates using an industry occupation matrix. The matrix is derived from the 1990 Five Percent PUMS file for each metro area.2 Sensitivity analysis is used to determine the validity of imposing metropolitan wide industry occupation matrices on smaller geographic units. We conclude that this method introduces only a small error and that it is a reasonable method for estimating employment by occupation.
Projected Job Openings by Occupation
Third, we use projected job openings prepared by the Ohio Bureau of Employment Services to form detailed occupational estimates for counties or groups of counties within each metro area. Projections of the expected number of annual openings by occupation for the years 1991-2000 were taken from the Ohio Bureau of Employment Services (OBES, 1993). Annual job openings come from two sources: the annual growth projections for each occupation and the expected number of net annual replacement openings. These projections are full-employment forecasts; they forecast changes in equilibrium employment, anticipating normal labor force growth.3
In the Cleveland-Akron area, we were able to develop projected job openings separately for six (Lorain, Cuyahoga, Medina, Lake, Portage, Summit) of the metropolitan area's eight counties. Ashtabula and Geauga counties could not be separated. Thus, we had different rates of projected job openings for 7 distinct geographic areas in the Cleveland-Akron metropolitan region. In the Columbus and Toledo metropolitan areas, we were able to develop separate estimates for the central counties (Franklin and Fulton respectively) and for the remainder of the metropolitan area. Less detail was available due to the smaller size of these regions.
Employment by Broad Occupation Skill/Education Levels for Zip Codes
Fourth, we used the distribution of employment by occupation (developed in step 2) as a basis for allocating the net openings within a specific geographic area. Thus, if we know that zip code 44113 (located in Cuyahoga County) currently contains 10.5% of the county's employment of stock clerks, then we would allocate 10.5% of Cuyahoga County's projected net job openings for stock clerks to zip code 44113.
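The share-based allocation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the programs used for this study; the function name, the dictionary layout, and the employment and openings figures (other than the 10.5% share) are assumptions made for the example.

```python
def allocate_openings(county_openings, zip_employment):
    """Spread county-level openings across zip codes by employment share.

    county_openings: {occupation: projected net openings in the county}
    zip_employment:  {zip_code: {occupation: estimated employment}}
    Returns {zip_code: {occupation: allocated openings}}.
    """
    # County employment in each occupation is the sum over its zip codes.
    county_employment = {}
    for by_occ in zip_employment.values():
        for occ, emp in by_occ.items():
            county_employment[occ] = county_employment.get(occ, 0.0) + emp

    allocated = {}
    for zip_code, by_occ in zip_employment.items():
        allocated[zip_code] = {}
        for occ, emp in by_occ.items():
            total = county_employment[occ]
            share = emp / total if total > 0 else 0.0
            allocated[zip_code][occ] = county_openings.get(occ, 0.0) * share
    return allocated


# Hypothetical figures: zip code 44113 holds 105 of 1,000 county stock clerks
# (10.5%), and the county is projected to have 200 net openings.
zip_employment = {
    "44113": {"stock clerks": 105.0},
    "rest of county": {"stock clerks": 895.0},
}
result = allocate_openings({"stock clerks": 200.0}, zip_employment)
```

Because every zip code receives its proportional share, the allocated openings sum back to the county total for each occupation.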
In addition, we grouped the occupational categories reported on the Census into four discrete skill/education based categories. As a starting point, we identify those occupations which could be considered to represent job opportunities for current welfare recipients. In order to reduce the list of 407 occupational classifications reported in the Census to a more manageable set, we identified three categories which represent occupations with relatively homogeneous skill and educational requirements. They are: entry-level occupations, requiring 11 or 12 years education and less than six months of job-specific training; short-term training occupations, requiring high school graduation and 6 to 12 months of additional education or training; and long-term training occupations, requiring from 1 to 3 years of post-secondary education and/or training (possibly corresponding to community college or vocational education).
We assign occupations to these categories on the basis of occupational skill content, for which we use two types of measures: First, we measure occupational requirements via the general educational development (GED) and specific vocational preparation (SVP) scores developed by the U.S. Department of Labor in The Dictionary of Occupational Titles (U.S. Department of Labor, 1977).4 These measures are an idealized version of the training and skills an employer would like to see in an employee. In order to select occupations consistent with each public policy scenario, we used three measures of occupational requirements for each of 407 occupational categories found in the 1990 Census.5 These measures are the general educational development (GED) and specific vocational preparation (SVP) required for an occupation, and the actual education of those currently employed.
GED and SVP are both measures of job content developed by the U.S. Department of Labor for the 12,000 occupations described in the fourth edition of the Dictionary of Occupational Titles (1977).6 GED captures:
"those aspects of education that contribute to the workers' reasoning development and ability to follow instructions; and the acquisition of `tool' knowledge such as languages and mathematical skills" (U.S. Department of Labor, 1956, pp.-vi).
Jobs are rated on a scale of 1 to 6 for the level of reasoning, mathematical and language development needed (see attached material for a description of each of these levels). An occupation is then assigned the highest of these three scores as its final GED level. The SVP scale indicates (in ranges of months) the total amount of training time needed in order to perform in an occupation at an average level (see attached material for the ranges). This training time might include all types of vocational schooling, on-the-job training, and/or actual job experience.
The GED and SVP scores used here were developed by the Department of Labor between 1966 and 1976. The scores were assigned by analysts following their observation of workers on the job, and interviews with company officials and human resource personnel. They were intended to reflect the skills and development needed for average performance in a given occupation. Despite some limitations, many have argued that the GED and SVP are still the richest available source of information on the job content of the U.S. economy (e.g. Spenner, 1983; Miller et al., 1980).7
Second, we measure actual worker characteristics in each occupation using data from the Five Percent Public Use Microdata Sample (PUMS) of the 1990 Census on the education levels of workers in an occupation in the Cleveland-Akron, Columbus, and Toledo metropolitan areas. To measure minimum acceptable education levels for workers in a given occupation, we compute the first quartile of education in each occupation. Far from being idealized, this is a measure of the characteristics of workers actually hired into an occupation under current conditions in the local labor market.
We look at both types of measures for each of 407 occupational categories. Using factor analysis on GED, SVP and the first quartile level of education, we construct a "skill content" index which rises with each of these variables. Occupations are ranked by this index and cut-points are selected to create each group of occupations.
Computing the Number of Persons Seeking Employment
Fifth, we use the five percent Public Use Micro Sample (PUMS) data to estimate the number of persons who meet the following criteria: unemployed at the time of the census (April 1, 1990); not unemployed but available for work at the time of the 1990 census; or received public assistance income during calendar year 1989. These estimates were produced at the Public Use Micro Area (PUMA) level. PUMAs are geographic areas which contain at least 100,000 persons but sometimes contain as many as 200,000. In Cuyahoga county, there are 11 PUMAs. Sometimes, one county is a PUMA - as in the case of Medina county. However, sometimes two or more counties form a PUMA (as in the case of Ashtabula and Geauga counties). In order to allocate these estimates to lower levels of geographic units (census tracts and zip codes), we produced our PUMA-level estimates separately for 48 distinct age, racial, and education population subgroups. Then these estimates were allocated to tracts within each PUMA according to the distribution of that population subgroup across tracts within a given PUMA. Finally, tract level estimates were aggregated up to zip codes. In the few instances where tracts cross zip code boundaries, we use the portion of land area as an allocation method to split the tract total across zip codes.
II. Methodological Details
The following section provides detailed information about the algorithm and computer programs needed to implement the methodology described above. Enough information is given so that the reader could easily implement this methodology in another metropolitan area.
Summary of Steps To Implement the Methodology:
Step 1 Estimate employment from the County Business Patterns by zip and industry.
Step 2 Convert employment estimates from industry to occupation.
Step 3 Compute net job openings by occupation for metro areas and/or counties.
Step 4 Allocate net job openings by occupation skill groups down to zip code level.
Step 5 Compute unemployed, discouraged workers, and welfare recipients by zip code.
Data Files Needed for Step 1:
- County Business Patterns data files (from the CD-ROM):
- CBPSUM.DBF
- CBPZPXSC.DBF
- REFZIP.DBF
- REFSIC.DBF
- Industry cross walk: SIC code to Census Industry Code (CIC).
Files Needed for Step 2:
- 1990 Five Percent Public Use Micro Sample (PUMS) for Ohio
- Occupation Cross walk: Census Occupation Codes to OES occupation codes
- Industry Cross walk: CIC to SIC codes
- Zip code to County cross walk (with population and land area shares)
Files Needed for Step 3:
- Net Job opening projections for State, including a list of all occupations in state
- Net Job openings projections for counties or other geographic areas within metro areas
- Occupation Cross Walk: OES to Census Codes
Files Needed for Step 4:
- 1990 Five Percent Public Use Micro Sample (PUMS) for Ohio
- Occupation cross walk: Census Occupation codes to OES
- Industry cross walk: Census Industry Classification Codes (CIC) to SIC codes
- GED/SVP data for occupations
Files Needed for Step 5:
- 1990 Census of Housing and Population, STF3A for Ohio
- 1990 Five Percent Public Use Micro Sample (PUMS) for Ohio
- Census tract to zip code cross reference file, including land area shares
Detailed Methodology and Description of Computer Programs.
Step 1: Estimate employment from the County Business Patterns by zip and industry.
Goal: County Business Patterns contains data for total employment by zip code. In addition, the file reports the number of establishments in various employment size classifications. The goal of this step is to develop estimates of employment by 4-digit SIC code for each zip code. In a final step, SIC codes are converted to Census Industry codes (CICs).
Description of the Data Set: County Business Patterns data for zip codes are available from the Census Bureau on a single CD-ROM. The file covers the entire United States and as of December 1997, the most recently available data are for the first quarter of 1994. The data are collected from the filings that business establishments make in order to comply with the requirements of the Social Security Administration.
In brief, the data file contains total employment, the total number of establishments, the number of establishments by various employment size classifications, and the total payroll for each 5 digit zip code in the United States. In addition, all but the employment variables are reported for each four digit standard industrial classification (SIC) code.
Methodology: In order to develop estimates for the number of employees by zip code and SIC code, we used information on the total employment by zip code and the number of establishments in various employment size classifications. First, we developed an estimate of the total employment by SIC by multiplying the number of establishments by the number of workers in each establishment. We assumed that employment in a given establishment was equal to the midpoint of the employment size classification in which it was reported. Thus, if there were 12 establishments in the "one to four employee" size classification, we estimated total employment for that zip code, SIC code, and employment size classification as being equal to: 12 establishments x 2.5 employees per establishment = 30 employees. For the top coded category (over 1,000 employees) we assume that all establishments had exactly 1,000 employees.
Next, we summed the employment for all employment size classifications and all SICs within a given zip code. This estimate was then compared to the actual number of workers in a given zip code. If the estimate was low (high), then we followed this procedure:
- We "scaled up (down)" the estimated number of workers in each establishment (which was just the mid-point of the employment size classification) by the appropriate percentage. Thus, if our first employment estimate was 10% too low, we would scale up the mid-point estimates by 10%.
- If we apply these scale factors exactly to our "first cut" employment estimates described above, we would match exactly the total employment in that zip code. However, doing so might possibly violate the known minimum and maximum employment totals that are possible for each employment size classification range. These ranges are known because we know the number of establishments that fall into each employment size classification. Thus, if there are 12 establishments reported in the "one to four" employee employment size classification for a given zip code and SIC code, then we know that there must be at least 12 and no more than 48 workers in that group of 12 establishments.
- To account for these constraints, we never let our estimate fall outside the range of the reported employment size classification. For example, the mid-point is 2.5 for the one to four employee size classification, and if our scale factor was 70%, then this would result in a new employment estimate of 4.25 (2.5 plus the 70% scale factor) workers per establishment, which is clearly not possible (all establishments must have between one and four employees). Thus, we force the employment estimates for a given employment size classification to be consistent with the minimum and maximum employment values possible in that employment size classification range. For the top coded category (over 1,000 employees), we would scale up but we would never scale downward.
Because the appropriate scale factor would not necessarily yield an estimated employment total which would match the actual reported employment for each zip code, we would follow an iterative procedure. We repeatedly applied the above process until our estimated employment in a given zip code exactly matched the known total for that zip code. There were a few instances in which it was not possible to choose a set of "scale factors" that would yield employment estimates that summed exactly to the reported total and were entirely consistent with the employment size classifications reported in the data set. These anomalies are obviously inconsistent with the data and are thus indicative of an error in the original data file. That is, there exists no distribution of establishment sizes which is consistent with the minimum and maximum bounds and yields the reported total employment. Fortunately, in Ohio, with a total employment of over 4 million, these cases accounted for only a total of 47 employees. Therefore, we ignore these anomalies.
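The iterative scaling procedure can be illustrated with a short Python sketch. It is one possible reading of the steps above, not the actual program: the function name, the tuple layout, the tolerance, the iteration cap, and the "five to nine" size class in the example are assumptions. Each cell stands for one zip code, SIC code, and size classification, with the top coded class given an unbounded maximum.

```python
import math


def estimate_cell_employment(cells, reported_total, tol=1e-9, max_iter=1000):
    """Scale midpoint estimates toward a reported zip code total.

    cells: list of (n_establishments, min_size, max_size); use math.inf as
           max_size for the top coded class.
    Returns (employment per cell, True if the reported total was matched).
    """
    # First cut: class midpoint; the top coded class starts at its minimum.
    per_estab = [lo if math.isinf(hi) else (lo + hi) / 2 for _, lo, hi in cells]

    def totals(sizes):
        return [n * e for (n, _, _), e in zip(cells, sizes)]

    for _ in range(max_iter):
        total = sum(totals(per_estab))
        if abs(total - reported_total) <= tol:
            return totals(per_estab), True
        if total == 0:
            break
        factor = reported_total / total
        # Apply the scale factor, but keep every estimate inside its class
        # range. The top coded class has min_size as its floor, so it can be
        # scaled up but never down.
        scaled = [min(hi, max(lo, e * factor))
                  for (_, lo, hi), e in zip(cells, per_estab)]
        if scaled == per_estab:
            break  # every cell is pinned at a bound: no consistent solution
        per_estab = scaled
    return totals(per_estab), False


# Hypothetical zip code: 12 establishments with 1-4 employees and 2 with 5-9,
# against a reported total of 60 employees. The first cut is 30 + 14 = 44.
estimates, matched = estimate_cell_employment([(12, 1, 4), (2, 5, 9)], 60)
```

In this example the larger class is capped at 9 employees per establishment, so the remaining employment is absorbed by the smaller class. When the reported total lies outside the range implied by the size classes, the function returns with the second value set to False, which corresponds to the anomalies described above.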
Step 2: Convert employment estimates from industry to occupation.
Goal: To convert the industry by zip code employment estimates created in step 1 to occupation by zip code.
Methodology: The method is straightforward except for two issues which arise due to slightly incompatible classification schemes. First, we use the 1990 5% PUMS to compute an industry occupation matrix for each metro area. On the five percent PUMS, metro areas are not directly identified. However, using collections of Public Use Micro Areas (PUMAs), it is possible to approximate the official metropolitan area definitions. For the Cleveland-Akron CMSA, the correspondence is exact. For Columbus and Toledo, we selected PUMAs which included the entire MSAs, and two additional counties in Columbus and one additional county in Toledo. Since the industry occupation matrix is a proportional allocation method, the higher totals in Columbus and Toledo do not matter.
The PUMS uses the Census Occupation Classification and the Census Industry Classification codes. These are incompatible with SIC and OES occupation codes. Therefore, we developed a cross walk to facilitate the conversion. Finally, some zip code boundaries cross county boundaries. This means that part of a zip code may lie outside of the metro area boundary. We developed a zip code to county cross walk, which included population and land area shares so that we could allocate employment within such a zip code between the two counties.
The industry occupation matrix reports the distribution of the percent of workers in a given industry across occupations for a given metro area. Thus, for a given zip code in a metro area, we assume that the employment distribution across occupations for each industry does not vary geographically: If there are 10 workers in a given industry and 50% of the workers in that industry in the relevant metro area work in occupation A, then we assume that this industry will contribute 5 workers to occupation category A. If industry staffing patterns (occupations) are not related to geography, then this is a good assumption. If, on the other hand, industries vary their occupation staffing according to their location, then this assumption is suspect. To address this question, we use this methodology to estimate employment by occupation for PUMAs located within the Cleveland CMSA. The estimated employment by occupation was then compared to the actual employment by occupation for each PUMA. We find that employment estimates vary from actual employment by about 5% (Leete and Bania, 1997).
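As an illustration of how an industry occupation matrix converts industry employment into occupation employment, consider the following Python sketch. The function names and the record layout (industry, occupation, person weight) are assumptions made for the example and do not reflect the actual PUMS file format or the programs used here.

```python
from collections import defaultdict


def build_io_shares(person_records):
    """Build metro-wide occupation shares within each industry.

    person_records: iterable of (industry, occupation, person_weight).
    Returns {industry: {occupation: share of the industry's workers}}.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for industry, occupation, weight in person_records:
        counts[industry][occupation] += weight
    shares = {}
    for industry, by_occ in counts.items():
        total = sum(by_occ.values())
        if total > 0:
            shares[industry] = {occ: w / total for occ, w in by_occ.items()}
    return shares


def occupation_employment(industry_employment, io_shares):
    """Apply the metro-wide shares to one zip code's industry employment."""
    result = defaultdict(float)
    for industry, employment in industry_employment.items():
        for occupation, share in io_shares.get(industry, {}).items():
            result[occupation] += employment * share
    return dict(result)


# Hypothetical metro sample: half of industry X works in occupation A.
shares = build_io_shares([("X", "A", 50.0), ("X", "B", 50.0)])
# A zip code with 10 workers in industry X then contributes 5 to occupation A.
by_occupation = occupation_employment({"X": 10}, shares)
```

The same set of shares is applied to every zip code in the metro area, which is exactly the assumption that staffing patterns do not vary geographically.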
Step 3: Compute net job openings by occupation for metro areas and/or counties.
Goal: Develop estimates of net job openings by occupation for metro areas and for counties (or groups of counties) within the three metro areas.
Methodology: Estimates of net job openings by occupation can be obtained from the Ohio Bureau of Employment Services (OBES). Estimates are tabulated for the State, for each metro area and for Service Delivery Areas (SDAs) which are typically counties or groups of counties. Estimates are suppressed when the total employment in a given cell (geography and occupation) is less than 100. Thus, suppression increases as the size of the geographic area decreases. To fill out the estimates and to ensure that we have a "balanced" set of data files, we imputed the missing occupational categories. The imputation method involved using the share of employment at the state level in a given occupation category. Using this share, we allocated the residual unassigned net job openings for the lower level of occupational detail in the classification scheme. Typically, the total number of net job openings imputed was less than 5% of the total for a given geographic area. Including this step is mostly a computational convenience; the imputation process does not affect the total job openings significantly.
Interpretation of Net Job Openings: These estimates are the number of new job openings which are expected to become available and be filled in a typical year between 1995 and 2005. Thus, they are estimates of labor demand and labor supply changes in a typical year in equilibrium. These estimates do not include cyclical effects and assume a steady state full-employment economy. Most important, these estimates do not represent job vacancies. If, due to welfare reform or any other sudden shock to the economy, the labor supply would suddenly and unexpectedly increase, then the economy would have to generate additional job openings to absorb this increase. See Leete and Bania for more discussion of this point. Also, see Mishel (1995) and Bloom (1997) for a discussion of possible labor market changes due to welfare reform.
Note: We define net job openings as the number of "growth openings" (which can be positive or negative) plus the number of replacement openings (which can only be positive). Unlike the OBES method, we allow growth openings to be negative instead of assigning a zero value when growth openings are less than zero. Thus, if there is a need for 10 replacement workers, but expected growth is -7, we define net openings to be 3. OBES would define this as 10, which creates an upward bias in their estimates.
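One way to read the imputation step is sketched below in Python. The exact rule used in the original programs is not spelled out here, so this is an assumption: the residual (the area total minus all published cells) is divided among the suppressed occupations in proportion to their state-level employment. The function name, the use of None to mark a suppressed cell, and all figures are hypothetical.

```python
def impute_suppressed(area_total, reported, state_employment):
    """Fill suppressed occupation cells for one geographic area.

    area_total:       published net openings for the area (all occupations)
    reported:         {occupation: openings, or None if suppressed}
    state_employment: {occupation: state-level employment}
    """
    published = sum(v for v in reported.values() if v is not None)
    residual = area_total - published
    missing = [occ for occ, v in reported.items() if v is None]
    state_sum = sum(state_employment[occ] for occ in missing)

    filled = dict(reported)
    for occ in missing:
        # Assumed rule: divide the residual by state employment shares; fall
        # back to equal shares if the state figures are all zero.
        if state_sum > 0:
            share = state_employment[occ] / state_sum
        else:
            share = 1.0 / len(missing)
        filled[occ] = residual * share
    return filled


# Hypothetical area with two published cells and two suppressed cells.
filled = impute_suppressed(
    1000.0,
    {"cashiers": 600.0, "stock clerks": 350.0, "ushers": None, "hosts": None},
    {"ushers": 100.0, "hosts": 300.0},
)
```

Here the 50 unassigned openings are split 12.5 and 37.5, so the filled cells again sum to the area total and the data files stay "balanced".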
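The difference between the two definitions can be shown with a small Python sketch. The function names are hypothetical, and the second function only encodes the OBES treatment as it is described in the note above.

```python
def net_openings(growth, replacement):
    """Definition used here: growth openings may be negative."""
    return growth + replacement


def obes_style_openings(growth, replacement):
    """OBES treatment as described above: negative growth is set to zero."""
    return max(growth, 0) + replacement


# The example from the note: 10 replacement openings, expected growth of -7.
ours = net_openings(-7, 10)          # 3
obes = obes_style_openings(-7, 10)   # 10
```

The two definitions agree whenever growth is zero or positive, so the upward bias arises only in declining occupations.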
Step 4: Allocate net job openings by occupation skill groups down to zip code level.
Goal: Develop estimates of net job openings by occupation for zip codes within the three metro areas and develop a method for classifying occupations into four broad skill categories.
Methodology: The estimated number of job openings for each county or group of counties in a metro area is allocated down to the zip code by using the distribution of employment in that occupation and county (county group) across zip codes. Occupational characteristics are created by merging the GED/SVP data with education levels (the first quartile of education) by occupation. A complete description of the GED/SVP is contained in Leete and Bania (1997). These three variables (GED, SVP, and the first quartile of education) are combined using factor analysis to create a single factor (or score). This score can then be used to create four discrete occupational groupings. These are designated as entry-level occupations, then short term training occupations, long term training occupations, and high skill occupations. Occupations which are male dominated (that is over 85% male) are excluded from the estimate of job openings. This is because the welfare population is over 90% female and it is unlikely that these jobs will offer much in the way of employment opportunities for former welfare recipients.
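The grouping step can be illustrated with the Python sketch below. It is not the factor analysis used for this study: as a simple stand-in, it scores each occupation on the first principal component of the standardized GED, SVP, and first-quartile education values. The function names, the cut-points, and the sample data are all assumptions, and the sketch assumes each variable varies across occupations.

```python
import bisect
import statistics

GROUPS = ["entry-level", "short-term training", "long-term training", "high skill"]


def standardize(values):
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumed to be nonzero
    return [(v - mean) / sd for v in values]


def skill_scores(rows, iterations=200):
    """rows: list of (ged, svp, education_q1), one tuple per occupation.

    Returns one score per occupation: the first principal component of the
    standardized variables (a stand-in for a single-factor score).
    """
    cols = [standardize(col) for col in zip(*rows)]
    k, n = len(cols), len(rows)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(k)] for i in range(k)]
    weights = [1.0] * k
    for _ in range(iterations):  # power iteration for the leading eigenvector
        weights = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = sum(w * w for w in weights) ** 0.5
        weights = [w / norm for w in weights]
    if sum(weights) < 0:  # orient the index so it rises with the inputs
        weights = [-w for w in weights]
    return [sum(weights[j] * cols[j][i] for j in range(k)) for i in range(n)]


def assign_group(score, cut_points=(-1.0, 0.0, 1.0)):
    """Map a score to one of four groups; the cut-points are hypothetical."""
    return GROUPS[bisect.bisect_right(cut_points, score)]


def eligible_occupations(male_share, threshold=0.85):
    """Drop occupations that are more than 85% male."""
    return [occ for occ, share in male_share.items() if share <= threshold]


# Hypothetical occupations ordered from least to most demanding.
rows = [(1, 1, 9), (2, 2, 10), (2, 3, 12), (3, 4, 12), (4, 6, 14), (5, 7, 16)]
scores = skill_scores(rows)
groups = [assign_group(s) for s in scores]
```

With positively correlated inputs the index rises with each of the three variables, so ranking occupations by score and choosing three cut-points yields the four groups.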
Step 5: Compute unemployed, discouraged workers, and welfare recipients by zip code.
Goal: Compute the number of unemployed persons, persons available for work (discouraged workers), and public assistance (welfare) recipients. Develop estimates of these numbers by census tract and zip code.
Methodology: We use the 1990 PUMS (5% file) to estimate the number of unemployed persons at the time of the 1990 Census. We also include the number of persons who were not unemployed but who were available for work (discouraged workers). Finally, we included any person who received public assistance income in the year prior to the Census (1989). Using this method, we hope to identify those who might be seeking employment and thus are "competing" for the limited number of job openings.
The methodology of using the PUMS to compute these numbers is straightforward. However, the lowest level of geography in the PUMS is the PUMA, an area of about 100,000 persons. Thus, we compute these estimates at the PUMA level separately for various age, education, and racial subgroups. For example, there might be 50 persons in a given PUMA who are African American, high school graduates, between the ages of 25 and 49 who also fall into the unemployed, discouraged, and welfare recipients groups. Next, we identify the census tracts which belong to a given PUMA and compute the distribution of various population subgroups across tracts in the PUMA. Thus, we might compute that tract 1234 in a given PUMA has 2.1% of the PUMA's African Americans who have high school degrees and are between the ages of 25 and 49. Thus, we would allocate 50 x 0.021 = 1.05 persons (who are unemployed, discouraged, or welfare recipients) to tract 1234. Our total estimates for each tract are built by following this allocation scheme for each of the relevant population subgroups.
Finally, the tract level estimates are aggregated to the zip code level using a tract to zip code cross walk file. When tracts cross zip code boundaries, we allocate the total using the land area shares of the tract - if 75% of a tract falls into one zip code, then that zip code receives 75% of the tract total.
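The two allocation stages in this step (PUMA to tract, then tract to zip code) can be sketched in Python as follows. The function names, the dictionary layouts, the subgroup label, the second tract, and the zip codes in the example are assumptions made for illustration only.

```python
def allocate_to_tracts(puma_counts, tract_population):
    """Allocate PUMA-level job seeker counts to tracts by subgroup shares.

    puma_counts:      {subgroup: job seekers in the PUMA}
    tract_population: {tract: {subgroup: population of that subgroup}}
    Returns {tract: estimated job seekers}.
    """
    subgroup_totals = {}
    for by_group in tract_population.values():
        for group, pop in by_group.items():
            subgroup_totals[group] = subgroup_totals.get(group, 0.0) + pop

    estimates = {}
    for tract, by_group in tract_population.items():
        estimates[tract] = sum(
            puma_counts.get(group, 0.0) * pop / subgroup_totals[group]
            for group, pop in by_group.items()
            if subgroup_totals[group] > 0
        )
    return estimates


def tracts_to_zips(tract_estimates, land_area_shares):
    """Aggregate tracts to zip codes, splitting tracts by land area share.

    land_area_shares: {tract: {zip_code: share of the tract's land area}}
    """
    zips = {}
    for tract, value in tract_estimates.items():
        for zip_code, share in land_area_shares[tract].items():
            zips[zip_code] = zips.get(zip_code, 0.0) + value * share
    return zips


# The example from the text: 50 persons in one subgroup, and tract 1234 holds
# 2.1% of the PUMA's population in that subgroup (21 of 1,000 persons here).
tracts = allocate_to_tracts(
    {"subgroup A": 50.0},
    {"1234": {"subgroup A": 21.0}, "5678": {"subgroup A": 979.0}},
)
# Hypothetical cross walk: 75% of tract 1234 lies in zip code 44113.
zips = tracts_to_zips(
    tracts,
    {"1234": {"44113": 0.75, "44102": 0.25}, "5678": {"44102": 1.0}},
)
```

Tract 1234 receives 50 x 0.021 = 1.05 persons, of which 75% is passed on to zip code 44113. Neither stage changes the PUMA total as long as the land area shares for each tract sum to one.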
III. ES202 vs. County Business Patterns as Data Source for Industry Employment
The basis of the methodology described above rests on a data file which describes employment by industry for some small geographic unit (such as a zip code or census tract). We report here on the use of the ES202 unemployment compensation files in lieu of the County Business Patterns Zip Code (CD-ROM) data base. The methodology described (steps 2 through 4) above could be implemented using any data file that describes employment by industry for zip codes or some other small geographic unit.
ES202 contains data on employment for every reporting establishment with one or more employees. Summary reports from the ES202 data indicate that the coverage is quite complete and that total employment estimates for states and counties compare favorably with other data sources on employment.
The main advantages of the ES202 data are as follows:
- Geographic Description. In principle, the exact address is included on the ES202 for each establishment location. Thus, it is possible to assign the establishment to any geographic unit, including zip codes, census tracts, census block groups, or even census blocks.
- Exact Employment is Reported. The ES202 file reports employment for each month and the file is produced and updated quarterly. Thus, it is possible to track changes in employment with great frequency.
- Data are reported in a Timely Manner. Data are available with a lag of about 6 months, so frequent updates are possible and analysis will not be outdated as quickly.
- Four Digit Industry Code is Reported. Each establishment reports a four digit industry SIC code, so it is possible to assign occupations on the basis of the most detailed industry information.
- Name of Establishment is Reported. Since the name of the establishment is reported, it is possible to verify the accuracy of the data. In addition, it is possible to contact employers for surveys or to involve them in job training or other programs.
The main disadvantages of the ES202 data file are:
- Address information is often inaccurate. Addresses reported should be the location of the work site. However, many companies use their headquarters address or even the address of a third party such as an accountant or law office which fills out the paperwork for the company. In other work, we report that nearly 25% of the ES202 data records are inaccurate at the zip code level.8 Accuracy is probably worse for smaller units of geography.
- Confidential Data. ES202 is a confidential data set requiring that the users jump significant legal hurdles to gain access to the data. In some states, access is not allowed for research purposes.
- Data are Difficult to Verify. Although the name of the establishment makes verification possible, the size of the file and the limited amount of other information on the employment levels in firms makes verification costly at best and problematic at worst.
On the other hand, County Business Patterns has some significant advantages. These include:
- Available for entire U.S. This file is available in a single standard format for the entire United States. Researchers can develop methodologies similar to ours and the results can be shared and implemented elsewhere. The data set is not confidential and is easily purchased and used, as it comes on a CD-ROM.
- Data include Four Digit SIC Code. County Business Patterns data use the standard four digit industry SIC code. This makes integration into other data sets easy.
- Already Aggregated to ZIP Code Level. The data are already aggregated to the zip code level, which is probably the most appropriate geographic unit for analyzing local labor markets. Census tracts are too small and too numerous. Counties are too large for understanding the implications of access to jobs. Zip codes represent a compromise.
The County Business Patterns data also have significant disadvantages. These include:
- Long Delay in Availability. As of September 1997, the most recently available data set was for March 1994. This represents a lag of 3½ years, which is about 3 years longer than ES202.
- Reports Interval Data for Zip Codes/Industries. This requires an elaborate imputation scheme to develop point estimates for employment by zip code. Obviously, this introduces more error in the process.
- Data cannot be directly verified. No names of companies are reported, so the data cannot be verified and employers cannot be contacted from this data set.
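To illustrate why interval data call for imputation, the sketch below converts counts of establishments by employment-size class into a single point estimate using class midpoints. This is a deliberately simplified stand-in, not the imputation scheme used in this paper. The size-class boundaries and the value assumed for the open-ended top class (`OPEN_ENDED_GUESS`) are assumptions made for illustration.

```python
# Employment-size classes as (lower, upper) bounds; None marks an open-ended class.
# These boundaries are illustrative assumptions.
SIZE_CLASSES = {
    "1-4": (1, 4),
    "5-9": (5, 9),
    "10-19": (10, 19),
    "20-49": (20, 49),
    "50-99": (50, 99),
    "100-249": (100, 249),
    "250-499": (250, 499),
    "500-999": (500, 999),
    "1000+": (1000, None),
}

# Hypothetical employment assigned to each establishment in the open-ended class.
OPEN_ENDED_GUESS = 1500


def class_midpoint(lower, upper):
    """Return the midpoint of a size class, or the assumed value if open-ended."""
    if upper is None:
        return float(OPEN_ENDED_GUESS)
    return (lower + upper) / 2


def impute_employment(establishment_counts):
    """Point estimate of employment for one zip code/industry cell.

    `establishment_counts` maps a size-class label to the number of
    establishments reported in that class.
    """
    return sum(
        count * class_midpoint(*SIZE_CLASSES[label])
        for label, count in establishment_counts.items()
    )


# Illustrative cell: 10 establishments with 1-4 employees, 4 with 5-9, 1 with 20-49.
estimate = impute_employment({"1-4": 10, "5-9": 4, "20-49": 1})
print(estimate)
```

Any such scheme replaces each establishment's unknown employment with an assumed value, which is the source of the additional error noted above.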
Criteria for Comparing ES202 and County Business Patterns
An empirical comparison of the two data sources is difficult because there is no standard against which to compare them. We do not know the true employment level by industry and by zip code, so we have no basis for judging whether one data set is more accurate than the other. Therefore, we propose the following set of criteria for judging which data set to use for the application described in this paper. Because of ease of use, non-confidentiality, and ease of replication across the United States, we would choose the County Business Patterns if the results obtained from the analysis with ES202 and County Business Patterns are substantially similar.
We define "substantially similar" in the context of this application as:
- Is the distribution of the net job openings in low skill occupations across zip codes similar for ES202 based estimates and County Business Patterns based estimates? Specifically, is the correlation coefficient at least 0.90?
- Does the list of the top zip codes ranked by total low skill job openings using the ES202 method contain the same members as the list based on the County Business Patterns method? Specifically, do the lists of the top ten zip codes ranked by low skill job openings based on the two methods contain at least 7 common members?
- Among the zip codes with the largest number of low skill job openings, are the two sets of estimates close to each other? Specifically, among the set of zip codes with the 20 largest numbers of low skill openings based on either measure, do 75% of these zip codes have estimates that fall within 30% of each other?
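The three criteria above can be sketched in code as follows. This is an illustrative sketch, not the computation used to produce the results in this paper. It assumes each set of estimates is a mapping from zip code to estimated low skill job openings, and it interprets "within 30% of each other" as an absolute difference of no more than 30% of the larger of the two estimates; the paper does not specify the base, so that interpretation is an assumption. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_k(estimates, k):
    """Set of the k zip codes with the largest estimates."""
    return set(sorted(estimates, key=estimates.get, reverse=True)[:k])


def compare_estimates(es202, cbp, top_list=10, min_common=7,
                      top_union=20, tolerance=0.30, min_share=0.75,
                      min_corr=0.90):
    """Evaluate the three 'substantially similar' criteria."""
    # Restrict both sets of estimates to zip codes present in each.
    zips = sorted(set(es202) & set(cbp))
    a = {z: es202[z] for z in zips}
    b = {z: cbp[z] for z in zips}

    corr = pearson([a[z] for z in zips], [b[z] for z in zips])
    common_top = len(top_k(a, top_list) & top_k(b, top_list))

    # Zip codes in the top group on EITHER measure, so the set can
    # contain more than `top_union` members.
    union = top_k(a, top_union) | top_k(b, top_union)
    n_close = sum(
        1 for z in union
        if abs(a[z] - b[z]) <= tolerance * max(a[z], b[z])
    )
    share_close = n_close / len(union)

    return {
        "corr": corr,
        "common_top": common_top,
        "n_union": len(union),
        "n_close": n_close,
        "share_close": share_close,
        "passes": (corr >= min_corr and common_top >= min_common
                   and share_close >= min_share),
    }


# Toy data with scaled-down thresholds, for illustration only.
toy_es202 = {"A": 100, "B": 80, "C": 60, "D": 10}
toy_cbp = {"A": 90, "B": 85, "C": 30, "D": 12}
result = compare_estimates(toy_es202, toy_cbp, top_list=2,
                           min_common=2, top_union=3)
print(result)
```

Because the third criterion uses the zip codes in the top 20 on either measure, the comparison set can exceed 20 members whenever the two rankings disagree.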
The correlation coefficients between the entry-level job openings estimates based on the two methods are 0.895, 0.963, and 0.935 for Cleveland-Akron, Columbus, and Toledo respectively. In the Cleveland-Akron metropolitan area, the top ten lists of zip codes produced by each method contain 8 common members. For the Columbus metropolitan area, there are also 8 common members, while the Toledo metropolitan area has 7 common members. Finally, among the zip codes with the largest number of entry-level openings (top 20 zip codes based on either measure), we find a significant percentage of the zip codes have job openings estimates within 30% of each other. In Cleveland-Akron, 21 of 25 zip codes (84%) are within plus or minus 30%. In Columbus, the figure is 16 of 23 zip codes (about 70%) and in Toledo it is 14 of 22 zip codes (about 64%).
The results are clear: while there are differences in the two sets of estimates, these differences are not substantial enough to justify choosing ES202 over County Business Patterns (indeed, it is not clear on what basis we would choose the ES202 based estimates over the County Business Patterns based estimates). Fortunately, the two sets of estimates are similar enough that we conclude either estimate yields substantially similar results.
Endnotes
1 Neil Bania and Laura Leete, Center on Urban Poverty and Social Change, Case Western Reserve University, Cleveland, Ohio.
2 Alternatively, one can use the distribution of industry employment across occupations which is estimated for all of Ohio by OBES. However, because data for any industry/occupation combination with fewer than 100 employees are suppressed in this dataset (for confidentiality), considerable detail is lost, and extensive industry/occupation aggregation is required. A preliminary analysis found that the OBES data were available for 4,915 detailed (slightly aggregated 3-digit codes) industry/occupation combinations, while the Census data provided information on 8,806 such combinations. For this reason we use the Census occupation/industry employment breakdown.
3 Ohio's occupation and industry employment projections are derived from the national projections prepared by the U.S. Bureau of Labor Statistics. Rosenthal (1992) finds the level of Bureau of Labor Statistics occupation projections for the period 1980-90 to be quite accurate, with actual employment in 1990 totaling 1 percent more than projected employment. Differences between actual and projected employment for the aggregate occupational groups were also generally quite small, with five out of eight major groups exhibiting projection errors of less than 6 percent. At the detailed occupational level, projections of the magnitude of occupational growth and decline exhibited a conservative bias, where the projected degree of growth or decline was smaller than that actually experienced. Less (1992) evaluates Ohio's industry employment projections for the period 1985-1990. Detailed industry employment projections during this period exhibited a weighted mean absolute projection error of 14.4 percent at the 1-digit level of disaggregation. Much of the error in these estimates resulted from failing to forecast Ohio's longer than average recovery from the 1981-82 recession and the associated structural shift from manufacturing to services that occurred in Ohio during this time period.
4 GED captures "those aspects of education that contribute to the worker's reasoning development and ability to follow instructions; and the acquisition of 'tool' knowledge such as languages and mathematical skills" (U.S. Department of Labor, 1956, p. vi). The SVP scale indicates (in ranges of months) the total amount of training time needed in order to perform in an occupation at an average level. Despite some limitations, many have argued that the GED and SVP are still the richest available source of information on the job content of the U.S. economy (e.g. Spenner, 1983, Miller et al., 1980).
5 We limit our analysis to those occupations in which at least 100 individuals were estimated to be employed in the Cleveland-Akron metropolitan area in 1990. Our occupational categories are slightly aggregated versions of 506 Census categories; this aggregation was necessary in order to make the Census occupational categories match those used by the Ohio Bureau of Employment Services. All calculations from Census data refer to individuals who report that their place of work is in the eight-county Cleveland-Akron metropolitan area, a geographic definition which will be compatible with our future work in this area. The difference between the six- and eight-county areas should not affect any calculations here.
6 These 12,000 detailed occupations are then aggregated into the 407 broader occupational categories used here.
7 Miller et al. (1980) report that the sampling methodology used to select firms for observation was ad hoc and that manufacturing firms were consistently oversampled. In addition, concerns have been raised regarding the accuracy of these scores twenty to thirty years following their creation. In the ensuing time period jobs may have been either upgraded or deskilled; empirical work finds evidence of both (e.g. Cappelli, 1993, and Keefe, 1991; see Spenner, 1983, for a review) with no clear conclusion regarding the net effect.
8 Establishment location information in these records is not always accurate, as it sometimes represents the location of company headquarters or of a personnel management or accounting firm responsible for filing the report. Comparing a random sample of 2,304 establishment records (stratified by county and firm size) to 1994 phone books, we estimate that 74 percent of all establishments are reported in the correct zip code. Since not all establishments are listed in the phone book by the exact name reported in the ES-202 data, this is a lower bound for accuracy. Error rates were not distinctly different between counties or among large or small firms. Among the 40 zip codes with the largest employment, the share of reported employment which was falsely reported in those zip codes averaged 11.4 percent; the share of reported employment which was falsely reported in other zip codes averaged 13.8 percent.
References
Cappelli, Peter, "Are Skill Requirements Rising? Evidence from Production and Clerical Jobs", Industrial and Labor Relations Review, April 1993.
Center for Regional Economic Issues, "Labor Force Development in WIRE-NET: Drawing Inferences from State and Federal Data", Case Western Reserve University, August 1994.
Keefe, Jeffrey, "Numerically Controlled Machine Tools and Worker Skills", Industrial and Labor Relations Review, April 1991.
Less, Larry, "An Evaluation of Industry Projections: A Case Study of the Ohio Economy", Economic Development Quarterly, August 1992.
Miller, Ann, Treiman, Donald, Cain, Pamela, and Patricia Roos, editors, Work, Jobs and Occupations: A Critical Review of the Dictionary of Occupational Titles, Washington DC: National Academy Press, 1980.
Ohio Bureau of Employment Services, Ohio Labor Market Information: Labor Market Projections -- Ohio Projections, 1995-2005, February 1997.
Rosenthal, Neal, "Occupational Employment", Monthly Labor Review, August 1992.
Spenner, Kenneth, "Deciphering Prometheus: Temporal Change in the Skill Level of Work", American Sociological Review, December 1983.
U.S. Department of Labor, Estimates of Worker Trait Requirements for 4,000 Jobs as Defined in the Dictionary of Occupational Titles, Washington DC: U.S. Government Printing Office, 1956.
U.S. Department of Labor, Dictionary of Occupational Titles, Fourth Edition, Washington DC: U.S. Government Printing Office, 1977.