Strong, weak and false inverse power laws. (English) Zbl 1100.62013

Summary: Pareto, Zipf and numerous subsequent investigators of inverse power distributions have often represented their findings as though their data conformed to a power law form for all ranges of the variable of interest. I refer to this ideal case as a strong inverse power law (SIPL). However, many of the examples used by Pareto and Zipf, as well as others who have followed them, have been truncated data sets, and if one looks more carefully in the lower range of values that was originally excluded, the power law behavior usually breaks down at some point. This breakdown seems to fall into two broad cases, called here (1) weak and (2) false inverse power laws (WIPL and FIPL, resp.). Case 1 refers to the situation where the sample data fit a distribution that has an approximate inverse power form only in some upper range of values. Case 2 refers to the situation where a highly truncated sample from certain exponential-type (and in particular, “lognormal-like”) distributions can convincingly mimic a power law.
The main objectives of this paper are (a) to show how the discovery of Pareto–Zipf-type laws is closely associated with truncated data sets; (b) to elaborate on the categories of strong, weak and false inverse power laws; and (c) to analyze FIPLs in some detail. I conclude that many, but not all, Pareto–Zipf examples are likely to be FIPL finite mixture distributions and that there are few genuine instances of SIPLs.
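The FIPL case described in the summary can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the sample size, the lognormal shape parameter `sigma = 2.0`, and the truncation point (the 100 largest values) are all assumptions chosen for illustration. It draws a lognormal sample, keeps only the largest values, and fits a straight line to log(size) against log(rank), which is the usual Pareto-Zipf diagnostic. For contrast, it fits the same line over the lower half of the ranks, where the apparent power law should break down.

```python
import math
import random


def rank_size_fit(sorted_sizes, first_rank, last_rank):
    """Least-squares fit of log(size) on log(rank).

    sorted_sizes must be in descending order; ranks are 1-based and inclusive.
    Returns (slope, r_squared).
    """
    ranks = range(first_rank, last_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(sorted_sizes[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def truncated_lognormal_demo(n=100_000, top=100, sigma=2.0, seed=0):
    """Compare rank-size fits on the truncated upper tail and the lower half.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    sizes = sorted((rng.lognormvariate(0.0, sigma) for _ in range(n)), reverse=True)
    top_slope, top_r2 = rank_size_fit(sizes, 1, top)        # truncated sample
    low_slope, low_r2 = rank_size_fit(sizes, n // 2, n)     # originally excluded range
    return top_slope, top_r2, low_slope, low_r2


if __name__ == "__main__":
    top_slope, top_r2, low_slope, low_r2 = truncated_lognormal_demo()
    print(f"top ranks:   slope={top_slope:.3f}, R^2={top_r2:.3f}")
    print(f"lower ranks: slope={low_slope:.3f}, R^2={low_r2:.3f}")
```

Under these assumptions the truncated upper tail should give a nearly straight log-log rank-size plot with a shallow negative slope, while the lower ranks give a much steeper slope. A single power law therefore cannot describe the whole sample, which is the behavior the summary attributes to false inverse power laws.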

MSC:

62E15 Exact distribution theory in statistics
62-07 Data analysis (statistics) (MSC2010)
62E10 Characterization and structure theory of statistical distributions
