3.2.2. Fractiles are Estimated by the Pareto Interpolation Method

The Pareto interpolation is a method for estimating the median, the deciles, or the percentiles of a population that follows a Pareto distribution.

Studying the income distributions of various countries, the Italian economist Vilfredo Pareto noticed a specific pattern in the allocation of income among individuals: whenever the income level doubles, the number of people earning at least that much falls by a constant factor. In the theoretical literature, the coefficient that summarizes this pattern is usually called the Pareto coefficient and is labeled b. Its value may vary from country to country, but the pattern remains basically the same. Buchanan (2002) comments further on the Pareto distribution:

‘Unlike a standard bell curve distribution, in which great deviations from the average are very rare, Pareto's so-called fat-tailed distribution starts very high at the low end, has no bulge in the middle at all, and falls off relatively slowly at the high end, indicating that some number of extremely wealthy people hold the lion's share of a country's riches. In the United States, for example, something like 80% of the wealth is held by only 20% of the people. But this particular 80-20 split is not really the point; in some other country, the precise numbers might be 90-20 or 95-10 or something else. The important point is that the distribution (at the wealthy end, at least) follows a strikingly simple mathematical curve illustrating that a small fraction of people always owns a large fraction of the wealth.’

The Pareto distribution is described by the probability density function (PDF) $f(y) = \alpha k^{\alpha} / y^{1+\alpha}$ for $y \ge k$, where y stands for income, α > 0 is a scalar shape parameter, and k is the minimum level of income in the distribution; Figure 3.1 plots this density. It captures the 80-20 rule well: the density f(y), which measures the relative frequency of people earning around income y, is highest at low income levels and then decreases steadily as income y increases. The lower the value of α, the more slowly the density falls off at the top, the fatter the upper tail, and the wider the gap of inequality. 21
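As an illustration, the density and the implied share of the population above a given income can be sketched in Python. The function names and the parameter values below are hypothetical and chosen only for illustration; the check shows Pareto's doubling regularity (the share above 2y is always 2^(-α) times the share above y) and that a larger α thins the upper tail.

```python
def pareto_pdf(y: float, alpha: float, k: float) -> float:
    """Pareto density f(y) = alpha * k**alpha / y**(1 + alpha) for y >= k."""
    if y < k:
        return 0.0  # no one earns less than the minimum income k
    return alpha * k**alpha / y ** (1 + alpha)


def pareto_survival(y: float, alpha: float, k: float) -> float:
    """Fraction of the population with income above y: (k / y)**alpha."""
    if y <= k:
        return 1.0
    return (k / y) ** alpha


if __name__ == "__main__":
    k = 10_000.0  # hypothetical minimum income
    for alpha in (1.5, 2.0, 3.0):  # hypothetical shape parameters
        # Share above 80k relative to share above 40k: constant 2**-alpha.
        ratio = pareto_survival(80_000.0, alpha, k) / pareto_survival(40_000.0, alpha, k)
        share_top = pareto_survival(100_000.0, alpha, k)
        print(f"alpha={alpha}: doubling ratio={ratio:.4f} "
              f"(2**-alpha={2 ** -alpha:.4f}), share above $100k={share_top:.4f}")
```

The share of people above $100,000 shrinks as α grows, which is why a lower α corresponds to greater top-end inequality.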

Figure 3.1. Pareto Probability Density Function

Establishing that an income distribution fits a Pareto distribution is the first step towards calculating dispersion indicators, such as the 17 fractiles to be estimated. This estimation can be performed with the Pareto interpolation technique, in which the curved portion of the cumulative distribution is approximated by a straight segment (on a double-logarithmic scale, the share of people above income y under a Pareto distribution plots as a straight line).

The Pareto interpolation method provides a specific formula for estimating fractiles. For the top decile threshold TI90, the formula is: 22

(0.1) $TI_{90} = k \cdot (10\%)^{-1/\alpha} = k / 0.1^{1/\alpha}$
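Equation (0.1) inverts the Pareto survival function (k/y)^α = p, the share of tax units above income y. A minimal Python sketch, with a hypothetical helper name and made-up values of k and α, checks that the resulting threshold leaves exactly the targeted share of tax units above it.

```python
def threshold_income(k: float, alpha: float, top_share: float) -> float:
    """Income above which a fraction `top_share` of tax units lies under a
    Pareto distribution with scale k and shape alpha:
    TI = k * top_share**(-1/alpha), i.e. k / top_share**(1/alpha)."""
    if not 0.0 < top_share <= 1.0:
        raise ValueError("top_share must be in (0, 1]")
    return k * top_share ** (-1.0 / alpha)


if __name__ == "__main__":
    k, alpha = 30_000.0, 2.0  # hypothetical parameters, not estimates
    ti90 = threshold_income(k, alpha, 0.10)  # top decile threshold
    ti99 = threshold_income(k, alpha, 0.01)  # top percentile threshold
    # Plugging each threshold back into (k / TI)**alpha recovers the share.
    print(round(ti90), round((k / ti90) ** alpha, 6))
    print(round(ti99), round((k / ti99) ** alpha, 6))
```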

The estimation of the parameters k and α starts with the determination of si, the lower bound of the income interval [si ; si+1) reported in the IRS tables. For instance, the IRS table for Alabama in 2003 (Table 3.3) reports, for each income bracket [si ; si+1), the number of tax units Ni and their total income Yi.

Table 3.3. All the Steps from IRS Tables to TI90
| Income bracket [si ; si+1) | Lower bound si (in $) | Ni | Ni* | Yi (in $1,000) | Yi* (in $1,000) | yi = Yi* / Ni* (in $) |
|---|---|---|---|---|---|---|
| [1 ; 30,000) | 1 | 1,088,495 | 1,883,765 | 14,335,594 | 74,842,664 | |
| [30,000 ; 50,000) | 30,000 | 332,057 | 795,270 | 12,931,107 | 60,507,070 | 76,084 |
| [50,000 ; 75,000) | 50,000 | 229,168 | 463,213 | 14,052,671 | 47,575,963 | 102,709 |
| [75,000 ; 100,000) | 75,000 | 117,246 | 234,045 | 10,063,119 | 33,523,292 | 143,234 |
| [100,000 ; 200,000) | 100,000 | 92,505 | 116,799 | 12,021,949 | 23,460,173 | 200,859 |
| 200,000 or more | 200,000 | 24,294 | 24,294 | 11,438,224 | 11,438,224 | 470,825 |
| Total | | 1,883,765 | | 74,842,665 | | |

Columns Ni* and Yi* are the cumulative sums of Ni and Yi, taken from the top bracket downward, so that they cover all tax units (and all income) at or above si. The variable $y_i = Y_i^* / N_i^*$ is therefore the average income of all tax units with income above si, not only of those lying in the [si ; si+1) interval. The Pareto coefficient $b_i = y_i / s_i$ divides this average income by the lower bound si. Under a Pareto distribution, the average income above any threshold equals α/(α − 1) times that threshold, so b is the same at every threshold and it is straightforward to derive the parameter $\alpha_i = b_i / (b_i - 1)$. Both αi and pi, the fraction of tax units earning more than si, are then used to estimate $k = s_i \, p_i^{1/\alpha_i}$.
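The bracket-by-bracket computations can be sketched in Python using the figures of Table 3.3. Two points are assumptions made for illustration only: the fraction pi is computed against the total number of returns in the table (1,883,765), whereas the reference population of tax units used in practice may differ, and the helper names are hypothetical. The bottom bracket (si = 1) is skipped, as in the table.

```python
from dataclasses import dataclass

# Table 3.3 (IRS, Alabama, 2003): (s_i in $, N_i, Y_i in $1,000), bottom up.
TABLE_3_3 = [
    (1, 1_088_495, 14_335_594),
    (30_000, 332_057, 12_931_107),
    (50_000, 229_168, 14_052_671),
    (75_000, 117_246, 10_063_119),
    (100_000, 92_505, 12_021_949),
    (200_000, 24_294, 11_438_224),
]


@dataclass
class BracketEstimate:
    s: float      # lower bound s_i (in $)
    n_cum: int    # N_i*: tax units with income at or above s_i
    y_avg: float  # y_i = Y_i* / N_i*: average income above s_i (in $)
    b: float      # Pareto coefficient b_i = y_i / s_i
    alpha: float  # alpha_i = b_i / (b_i - 1)
    p: float      # p_i: fraction of tax units above s_i
    k: float      # k = s_i * p_i**(1 / alpha_i)


def bracket_estimates(table, total_units):
    """Compute y_i, b_i, alpha_i, p_i and k for every bracket except the bottom one."""
    estimates = []
    n_cum, y_cum = 0, 0.0
    # Accumulate from the top bracket downward to obtain N_i* and Y_i*.
    for s, n, y_thousands in reversed(table):
        n_cum += n
        y_cum += y_thousands * 1_000  # convert from $1,000 to $
        if s <= 1:
            continue  # s_i = 1 gives no meaningful b_i (blank in Table 3.3)
        y_avg = y_cum / n_cum
        b = y_avg / s
        alpha = b / (b - 1)
        p = n_cum / total_units
        estimates.append(BracketEstimate(s, n_cum, y_avg, b, alpha, p, s * p ** (1 / alpha)))
    return estimates[::-1]  # back to bottom-up order


if __name__ == "__main__":
    total = sum(n for _, n, _ in TABLE_3_3)  # assumption: all returns in the table
    for e in bracket_estimates(TABLE_3_3, total):
        print(f"s={e.s:>7,.0f}  y={e.y_avg:>9,.0f}  b={e.b:.3f}  "
              f"alpha={e.alpha:.3f}  p={e.p:.4f}  k={e.k:,.0f}")
```

The rounded y values reproduce the last column of Table 3.3, which confirms that yi is an average over everyone above si.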

Next, the lower bound si used to calculate the threshold income TI90 is chosen so that the fraction pi of tax units with income above si is as close as possible to 10%, the share of tax units above TI90. This way of choosing si is one of Piketty’s (2001) contributions. Finally, equation (0.1) is applied with the k and α computed at that si: the estimated top-decile threshold is $TI_{90} = k / 0.1^{1/\alpha}$. Similarly, the top-percentile threshold is $TI_{99} = k / 0.01^{1/\alpha}$, with si chosen so that pi is as close as possible to 1%, and so on for the other threshold-income fractiles.
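Piketty's selection rule and equation (0.1) can be combined in a short Python sketch based on the cumulative columns of Table 3.3. As before, using the 1,883,765 returns of the table as the reference total is an illustrative assumption, and the function name is hypothetical, so the resulting figures should not be read as the estimates reported in this study.

```python
# Cumulative columns of Table 3.3: (s_i in $, N_i*, Y_i* in $1,000).
CUMULATIVE = [
    (30_000, 795_270, 60_507_070),
    (50_000, 463_213, 47_575_963),
    (75_000, 234_045, 33_523_292),
    (100_000, 116_799, 23_460_173),
    (200_000, 24_294, 11_438_224),
]


def pareto_threshold(cumulative, total_units, top_share):
    """Estimate the income threshold of the top `top_share` of tax units.

    The lower bound s_i is the one whose share p_i of tax units above it is
    closest to `top_share`; k and alpha are then computed at that s_i.
    """
    s, n_cum, y_cum = min(
        cumulative, key=lambda row: abs(row[1] / total_units - top_share)
    )
    p = n_cum / total_units
    b = (y_cum * 1_000 / n_cum) / s    # b_i = y_i / s_i
    alpha = b / (b - 1)                # alpha_i = b_i / (b_i - 1)
    k = s * p ** (1 / alpha)           # k = s_i * p_i**(1/alpha_i)
    return s, b, k / top_share ** (1 / alpha)  # TI = k / share**(1/alpha)


if __name__ == "__main__":
    total = 1_883_765  # assumption: total returns of Table 3.3
    for share in (0.10, 0.01):
        s, b, ti = pareto_threshold(CUMULATIVE, total, share)
        print(f"top {share:.0%}: s_i = {s:,}, b_i = {b:.3f}, threshold = {ti:,.0f}")
```

With these inputs, the top-decile threshold is anchored at the $75,000 bound (pi ≈ 12.4%) and the top-percentile threshold at the $200,000 bound (pi ≈ 1.3%).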

Obtaining average-income fractiles requires nothing more than multiplying the Pareto coefficient bi by the corresponding threshold income. Keeping the same illustration of the top decile, the average income of the top decile is:

(0.2) $AI_{90\text{-}100} = b_i \cdot TI_{90}$
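Equation (0.2) rests on a property of the Pareto distribution: the average income above any threshold t is a fixed multiple of t. A short derivation, assuming α > 1 so that the mean is finite, is:

```latex
\begin{aligned}
E[\,y \mid y > t\,]
  &= \frac{\int_t^{\infty} y \cdot \alpha k^{\alpha} y^{-(1+\alpha)}\, dy}{(k/t)^{\alpha}}
   = \frac{\alpha k^{\alpha} t^{1-\alpha} / (\alpha - 1)}{k^{\alpha} t^{-\alpha}}
   = \frac{\alpha}{\alpha - 1}\, t = b \, t .
\end{aligned}
```

Setting t = TI90 gives AI90-100 = b · TI90, and inverting b = α/(α − 1) gives α = b/(b − 1).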

Inter-fractiles are then deduced by simple subtractions, summarized in Table 3.4.
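The content of Table 3.4 is not reproduced here. As a hedged sketch of the kind of subtraction involved, assuming the usual accounting identity, the average income of a band such as AI90-95 follows from the average incomes of the top 10% and the top 5%: the band's total income is the top 10% total minus the top 5% total, divided by the band's 5% population share (income shares, by contrast, subtract directly). The function name and the input values are hypothetical.

```python
def inter_fractile_average(outer_share: float, outer_avg: float,
                           inner_share: float, inner_avg: float) -> float:
    """Average income of the band between two top groups.

    Example: outer = top 10% (AI90-100), inner = top 5% (AI95-100) gives
    AI90-95. Band income = outer total minus inner total, per band member.
    """
    if not 0.0 < inner_share < outer_share <= 1.0:
        raise ValueError("need 0 < inner_share < outer_share <= 1")
    band_total = outer_share * outer_avg - inner_share * inner_avg
    return band_total / (outer_share - inner_share)


if __name__ == "__main__":
    # Hypothetical averages (in $), not estimates from this study.
    ai90_100, ai95_100 = 150_000.0, 210_000.0
    print(round(inter_fractile_average(0.10, ai90_100, 0.05, ai95_100)))
```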

Table 3.4. Inter-Fractiles Calculation

After the fractiles have been computed for each state and each year from 1913 to 2003, further adjustments are needed. Section 3.3 describes the main corrections applied to the time series, and the appendices document the remaining adjustments not covered in the text.

Notes
21. The scalar α will be used again in the analysis of convergence (Chapter 6). In that chapter, the hypothesis that the top income fractiles are well approximated by the Pareto distribution will be extended to the functional form of the Lorenz curve, without further assumptions. The Lorenz curve will then be used to derive, for all states, the lower fractiles of the distribution (down to the bottom decile).

22. To calculate the top percentile instead, the formula becomes $TI_{99} = k \cdot (1\%)^{-1/\alpha}$.