[Kumiko Tanaka-Ishii] A Quest for Constancy Measures of Natural Language

PRESTO Researcher

Kumiko Tanaka-Ishii
The University of Tokyo


In this project, I study the mathematical properties of natural language through the notion of computational constancy measures. A constancy measure characterizes a given text by taking a value that stays invariant once the text exceeds a certain size. The study of such measures has a 70-year history dating back to Yule’s K.
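As a concrete illustration of the kind of measure meant here, the following minimal sketch computes Yule's K from a list of tokens, using the standard formulation K = C (S2 - N) / N^2, where N is the number of tokens, S2 is the sum of m^2 V(m) over all frequencies m, and V(m) is the number of distinct words occurring exactly m times. The function name `yules_k`, the `scale` parameter (the conventional constant C = 10^4), and the whitespace tokenization in the usage note are assumptions made for this example, not part of the project description.

```python
from collections import Counter


def yules_k(tokens, scale=1e4):
    """Return Yule's K for a sequence of tokens (illustrative sketch).

    K = scale * (S2 - N) / N**2, where N is the token count and
    S2 = sum(m**2 * V(m)), with V(m) the number of distinct words
    that occur exactly m times.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    word_freq = Counter(tokens)             # word -> number of occurrences
    spectrum = Counter(word_freq.values())  # m -> V(m), the frequency spectrum
    s2 = sum(m * m * v for m, v in spectrum.items())
    return scale * (s2 - n) / (n * n)
```

To probe constancy empirically, one could evaluate the function on prefixes of increasing length, for example `yules_k(words[:1000])`, `yules_k(words[:10000])`, and so on, where `words` is a hypothetical token list obtained by something as simple as `text.split()`, and check whether the values level off as the prefix grows.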
Constancy measures highlight the differences between mathematical models of natural language and real data. Studying them therefore leads to a better understanding of the nature of natural language. This should in turn support better modeling of language, which is the foundation of language engineering, including such important applications as automatic translation and information retrieval.
I seek constancy measures from different perspectives and verify their mathematical and empirical properties on both random and real language data. Moreover, the measures obtained make it possible to compare the complexity of natural language with that of other kinds of real data.

