3.2.2. Fractiles are Estimated by the Pareto Interpolation Method

Pareto interpolation is a method for estimating the median, the deciles or the percentiles of a population whose incomes follow a Pareto distribution.

The Pareto Distribution

Studying the income distributions of various countries, the Italian economist Vilfredo Pareto noticed a specific pattern in how income is allocated among individuals: whenever the level of wealth doubles, the number of people holding at least that much falls by a constant factor. This factor may vary from country to country, but the pattern remains basically the same. In the theoretical literature, the pattern is summarized by a single parameter, usually called the Pareto coefficient and labeled b. Buchanan (2002) comments further on the Pareto distribution:

‘Unlike a standard bell curve distribution, in which great deviations from the average are very rare, Pareto's so-called fat-tailed distribution starts very high at the low end, has no bulge in the middle at all, and falls off relatively slowly at the high end, indicating that some number of extremely wealthy people hold the lion's share of a country's riches. In the United States, for example, something like 80% of the wealth is held by only 20% of the people. But this particular 80-20 split is not really the point; in some other country, the precise numbers might be 90-20 or 95-10 or something else. The important point is that the distribution (at the wealthy end, at least) follows a strikingly simple mathematical curve illustrating that a small fraction of people always owns a large fraction of the wealth.’

Formally, the Pareto distribution is characterized by the probability density function (PDF) $f(y) = \alpha k^{\alpha} / y^{1+\alpha}$ for $y \geq k$, where y stands for income, α > 0 is a scalar (the shape parameter) and k is the minimum level of income in the distribution. It captures the 80-20 rule well: f(y), the density of the population earning an income of about y per person, is high at low income levels and then decreases steadily as y increases. The lower the value of α, the fatter the upper tail and the wider the gap of inequality. ²¹
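A minimal Python sketch of the density and of the share of the population above a given income, assuming y ≥ k and α > 0. The function names `pareto_pdf` and `pareto_survival` and the parameter values are illustrative, not taken from the study. The script also checks the doubling property described by Pareto: the share of people above 2y divided by the share above y equals the constant 2^(-α), whatever the value of y.

```python
def pareto_pdf(y: float, k: float, alpha: float) -> float:
    """Density f(y) = alpha * k**alpha / y**(1 + alpha) for y >= k, and 0 below k."""
    if y < k:
        return 0.0
    return alpha * k**alpha / y ** (1 + alpha)


def pareto_survival(y: float, k: float, alpha: float) -> float:
    """Share of the population with income above y: (k / y)**alpha for y >= k."""
    if y < k:
        return 1.0
    return (k / y) ** alpha


if __name__ == "__main__":
    k, alpha = 10_000.0, 2.0  # hypothetical minimum income and shape parameter
    for y in (20_000.0, 50_000.0, 100_000.0):
        # Doubling income divides the share above y by the same factor every time.
        ratio = pareto_survival(2 * y, k, alpha) / pareto_survival(y, k, alpha)
        print(f"y={y:>9,.0f}  share above 2y / share above y = {ratio:.4f}")
```

With α = 2, every line reports a ratio of 0.2500, i.e. 2^(-2): the constant factor depends only on α, not on the income level.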

Figure 3.1. Pareto Probability Density Function

Establishing that an income distribution fits the Pareto distribution is the first step toward calculating dispersion indicators, such as the 17 fractiles to be estimated. This estimation can be performed with the Pareto interpolation technique, in which the curved portion of the cumulative distribution is approximated by a straight segment (a Pareto distribution plots as a straight line on log-log axes).
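The straight-segment idea can be checked numerically. For a Pareto distribution, ln P(Y > y) = α ln k - α ln y, so the share of the population above y, plotted against y on log-log axes, is a straight line of slope -α. The sketch below illustrates only this geometric property; it is not the estimator used in this study. The helper name `loglog_slope_alpha` is hypothetical. It takes two bracket thresholds and the number of tax units above each (here the top two rows of Table 3.3, Alabama 2003) and reads off the α implied by the slope between them.

```python
import math


def loglog_slope_alpha(y_low: float, count_low: float, y_high: float, count_high: float) -> float:
    """Pareto exponent implied by two points of the cumulative distribution.

    count_low / count_high are the numbers of units with income above y_low / y_high.
    Under a Pareto law, ln(count) is linear in ln(y) with slope -alpha, so the
    population total cancels out and only the ratio of the counts matters.
    """
    return math.log(count_low / count_high) / math.log(y_high / y_low)


if __name__ == "__main__":
    # Tax units with income above $100,000 and above $200,000 (Table 3.3, Alabama 2003).
    alpha = loglog_slope_alpha(100_000, 116_799, 200_000, 24_294)
    print(f"implied alpha between $100k and $200k: {alpha:.3f}")
```

If the data were exactly Pareto, any pair of thresholds would return the same α; in real tabulations the slope varies somewhat from one segment to the next.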

The Pareto Interpolation Method

The Pareto interpolation method provides a specific formula for estimating fractiles. For the top decile threshold TI90, the formula is written as follows: ²²

(0.1) $TI_{90} = k \, (10\%)^{-1/\alpha} = k / 0.10^{1/\alpha}$
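The formula follows from the Pareto survival share (k/y)^α: setting the share of tax units above TI90 equal to 0.10 and solving for TI90 gives TI90 = k · 0.10^(-1/α). A minimal sketch, with a hypothetical helper `threshold_income` that takes the top share p (0.10 for TI90, 0.01 for TI99) and made-up parameter values:

```python
def threshold_income(k: float, alpha: float, top_share: float) -> float:
    """Income above which a fraction `top_share` of a Pareto(k, alpha) population lies.

    Solves (k / TI) ** alpha == top_share for TI, i.e. TI = k * top_share ** (-1 / alpha).
    """
    if not 0.0 < top_share <= 1.0:
        raise ValueError("top_share must lie in (0, 1]")
    return k * top_share ** (-1.0 / alpha)


if __name__ == "__main__":
    k, alpha = 20_000.0, 2.0  # hypothetical parameters
    ti90 = threshold_income(k, alpha, 0.10)
    ti99 = threshold_income(k, alpha, 0.01)
    print(f"TI90 = {ti90:,.0f}   TI99 = {ti99:,.0f}")
```

Because 0.10^(-1/α) > 1, the estimated threshold always lies above the minimum income k, as it should.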

The estimation of the parameters k and α starts with the determination of s_i, the lower bound of the income interval [s_i; s_{i+1}) displayed in the IRS tables. For instance, the IRS table for Alabama in 2003 (listed below) displays, for each income bracket [s_i; s_{i+1}), the number of tax units N_i and their total income Y_i.

Table 3.3. All the Steps from IRS Tables to TI90
Income bracket ($)	s_i, lower bound ($)	N_i	N_i*	Y_i ($1,000)	Y_i* ($1,000)	y_i = Y_i* / N_i* ($)
[1 ; 30,000)	1	1,088,495	1,883,765	14,335,594	74,842,664
[30,000 ; 50,000)	30,000	332,057	795,270	12,931,107	60,507,070	76,084
[50,000 ; 75,000)	50,000	229,168	463,213	14,052,671	47,575,963	102,709
[75,000 ; 100,000)	75,000	117,246	234,045	10,063,119	33,523,292	143,234
[100,000 ; 200,000)	100,000	92,505	116,799	12,021,949	23,460,173	200,859
200,000 or more	200,000	24,294	24,294	11,438,224	11,438,224	470,825
Total		1,883,765		74,842,665
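The starred columns of Table 3.3 are cumulated from the top bracket downward, so each row counts every tax unit at or above its lower bound. The sketch below rebuilds N_i*, Y_i* and y_i from the N_i and Y_i columns; the variable names and list layout are assumptions made for illustration only.

```python
from itertools import accumulate

# Alabama, 2003 (Table 3.3): bracket lower bound s_i in $, number of tax units N_i,
# and total income Y_i in $1,000, ordered from the lowest bracket to the highest.
LOWER_BOUNDS = [1, 30_000, 50_000, 75_000, 100_000, 200_000]
N = [1_088_495, 332_057, 229_168, 117_246, 92_505, 24_294]
Y_THOUSANDS = [14_335_594, 12_931_107, 14_052_671, 10_063_119, 12_021_949, 11_438_224]


def cumulate_from_top(values):
    """Sum each bracket with all brackets above it (a reverse cumulative sum)."""
    return list(accumulate(reversed(values)))[::-1]


N_STAR = cumulate_from_top(N)
Y_STAR = cumulate_from_top(Y_THOUSANDS)
# Average income (in $) of all tax units with income at or above s_i.
Y_AVG = [1_000 * y / n for y, n in zip(Y_STAR, N_STAR)]

if __name__ == "__main__":
    for s, n_star, y_star, y_avg in zip(LOWER_BOUNDS, N_STAR, Y_STAR, Y_AVG):
        print(f"{s:>8,}  {n_star:>10,}  {y_star:>12,}  {y_avg:>10,.0f}")
```

Rounded to the dollar, the computed averages for the brackets starting at $30,000 and above match the y_i column of the table.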


Columns N_i* and Y_i* are the cumulative sums of N_i and Y_i taken from the top bracket downward: N_i* is the number of tax units with income above s_i, and Y_i* is their total income. The variable y_i = Y_i* / N_i* is therefore the average income of all tax units with income above s_i, not only of those in the [s_i; s_{i+1}) interval. The Pareto coefficient b_i = y_i / s_i divides this average income by s_i, the lower bound of the bracket. It is then straightforward to derive the parameter $\alpha_i = b_i / (b_i - 1)$. Both α_i and p_i, the fraction of tax units earning more than s_i, are used to estimate $k = s_i \, p_i^{1/\alpha_i}$.
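The three steps (b_i, then α_i, then k) can be chained in a few lines. This is a sketch, not the study's code: the function name `pareto_parameters` is hypothetical, and p_i must be supplied by the caller as the share of all tax units with income above s_i (which population total serves as the denominator is left open here).

```python
def pareto_parameters(y_avg: float, s: float, p: float) -> tuple[float, float, float]:
    """Return (b, alpha, k) from one bracket of a tabulated income distribution.

    y_avg: average income of tax units with income above s
    s:     lower bound of the bracket
    p:     fraction of all tax units with income above s
    """
    b = y_avg / s                  # Pareto coefficient b_i = y_i / s_i
    if b <= 1.0:
        raise ValueError("average income above s must exceed s (b > 1)")
    alpha = b / (b - 1.0)          # alpha_i = b_i / (b_i - 1)
    k = s * p ** (1.0 / alpha)     # k = s_i * p_i ** (1 / alpha_i)
    return b, alpha, k
```

For example, the $100,000 row of Table 3.3 (y_i = 200,859) gives b ≈ 2.01 and α ≈ 1.99; k additionally requires p_i, which the table does not report.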

Next, the lower bound s_i used to calculate the threshold income TI90 is chosen so that the fraction p_i of tax units with income above s_i is as close as possible to 10%, the population share above TI90. This way of choosing s_i is one of Piketty’s (2001) contributions. Finally, the Pareto formula given above is applied: the estimated decile is $TI_{90} = k / 0.1^{1/\alpha}$. Similarly, the top percentile is estimated as $TI_{99} = k / 0.01^{1/\alpha}$, and so on for the other threshold-income fractiles.
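A sketch of the bracket-selection rule followed by the final step, assuming the shares p_i are already known for every bracket. The function `estimate_threshold` and its input layout are hypothetical. The bracket whose p_i is closest to the target share (0.10 for TI90, 0.01 for TI99) supplies b_i, α_i and k, which are then plugged into k / p^(1/α).

```python
def estimate_threshold(brackets, target_share):
    """Estimate the income above which `target_share` of all tax units lie.

    brackets: list of (s_i, y_i, p_i) tuples, where s_i is the bracket lower bound,
              y_i the average income above s_i and p_i the share of tax units above s_i.
    """
    # Use the bracket whose population share above s_i is closest to the target.
    s, y_avg, p = min(brackets, key=lambda row: abs(row[2] - target_share))
    b = y_avg / s
    if b <= 1.0:
        raise ValueError("selected bracket has b <= 1")
    alpha = b / (b - 1.0)
    k = s * p ** (1.0 / alpha)
    return k / target_share ** (1.0 / alpha)
```

Choosing the bracket nearest the target keeps the Pareto approximation local: the fitted curve only has to hold over a short stretch of the distribution around the fractile being estimated.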

Obtaining average-income fractiles requires nothing more than multiplying the Pareto coefficient b_i by the corresponding threshold income. Continuing with the top decile, the average income of the top 10% is:

(0.2) $AI_{90\text{-}100} = b_i \cdot TI_{90}$
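The rule rests on a property of the Pareto law: the average income of everyone above any threshold t equals b·t, with b = α/(α - 1), whatever the value of t. The sketch below checks this by numerically averaging the distribution above TI90 and comparing the result with b·TI90. The helper names and parameter values are made up for the illustration.

```python
def mean_above(t: float, alpha: float, steps: int = 100_000) -> float:
    """Numerically average a Pareto(alpha) income distribution above a threshold t.

    Uses E[Y | Y > t] = t + integral from t to infinity of (t / y) ** alpha dy.
    The substitution y = t / v turns the tail integral into
    t * integral from 0 to 1 of v ** (alpha - 2) dv, evaluated with the midpoint rule
    (accurate for alpha >= 2; the integrand is unbounded near 0 when alpha < 2).
    """
    h = 1.0 / steps
    tail = h * sum(((i + 0.5) * h) ** (alpha - 2.0) for i in range(steps))
    return t * (1.0 + tail)


def average_income_top(threshold: float, alpha: float) -> float:
    """Average income above `threshold` from the closed form b * threshold."""
    b = alpha / (alpha - 1.0)
    return b * threshold


if __name__ == "__main__":
    k, alpha = 10_000.0, 2.5  # hypothetical parameters
    ti90 = k * 0.10 ** (-1.0 / alpha)
    print(f"b * TI90 = {average_income_top(ti90, alpha):,.2f}")
    print(f"numerical mean above TI90 = {mean_above(ti90, alpha):,.2f}")
```

Note that `mean_above` does not depend on k: above any threshold, a Pareto tail is itself a Pareto distribution with the same α, which is why a single b converts every threshold into an average.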

Then, inter-fractiles are deduced from simple subtractions summarized in the table below.

Table 3.4. Inter-Fractiles Calculation
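Table 3.4 itself is not reproduced in this section. As a hedged illustration of the kind of subtraction involved, assume the inter-fractile of interest is the average income of a band such as P90-95. Because the top 10% is made up of the P90-95 band plus the top 5%, total incomes subtract: 0.10·AI90-100 - 0.05·AI95-100 is the income of the band, which is then divided by its population share, 0.05. The function `band_average` and the example figures are hypothetical.

```python
def band_average(share_wide: float, avg_wide: float, share_narrow: float, avg_narrow: float) -> float:
    """Average income of the band between two nested top groups.

    share_wide, avg_wide:     population share and average income of the wider top group (e.g. top 10%)
    share_narrow, avg_narrow: the same for the narrower top group inside it (e.g. top 5%)
    """
    if not 0.0 < share_narrow < share_wide:
        raise ValueError("the narrow group must be a strict subset of the wide group")
    band_income = share_wide * avg_wide - share_narrow * avg_narrow  # income held by the band
    return band_income / (share_wide - share_narrow)                 # divided by the band's share


if __name__ == "__main__":
    # Hypothetical averages for the top 10% and the top 5%.
    print(f"P90-95 average income: {band_average(0.10, 150_000.0, 0.05, 220_000.0):,.0f}")
```

For income shares rather than average incomes, no weighting is needed: the share of the P90-95 band is simply the top 10% share minus the top 5% share.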

Once fractiles have been computed for each state and each year from 1913 to 2003, further adjustments are needed. Section 3.3 describes the main corrections applied to the time series, and the appendices document all other adjustments not specified in the text.

Notes

21.

The scalar α will be used again in the analysis of convergence (Chapter 6). In that chapter, the hypothesis that the top income fractiles are well approximated by the Pareto distribution will be extended to the functional form of the Lorenz curve, without further assumptions. The Lorenz curve will then be used to derive, for all states, the lower fractiles of the distribution (down to the bottom decile).

22.

To calculate the top percentile threshold instead, the formula becomes $TI_{99} = k \, (1\%)^{-1/\alpha}$.