A Corpus-Based Model of the Middle Turkic Language using Weighted Averaging

Authors

  • B.I. Boborahimov, Digital Technologies and Artificial Intelligence Development Research Institute
  • D.A. Akhmedjanova, Uzbekistan State World Languages University
  • Kh.D. Sharipov, Digital Technologies and Artificial Intelligence Development Research Institute

DOI:

https://doi.org/10.71310/pcam.5_69.2025.10

Keywords:

Middle Turkic language, averaging model, corpus linguistics, artificial language, mathematical linguistics, Turkic languages

Abstract

This paper formalizes and extends the classic concept of the "Middle Turkic" language (Karimov–Mutalov, 1992) by applying tools from corpus and computational linguistics. The proposed model frames the selection of an "optimal" form for a linguistic unit as a multi-criteria optimization problem over a set of factors: frequency, prevalence, simplicity, cultural compatibility, and a matrix of interlingual mutual intelligibility. To ensure stability and interpretability, the model introduces soft (softmax) and "saturating" weights, domain and diachronic modifiers, and inter-language "fairness" constraints. The model achieves up to 89% accuracy in linguistic form selection, a 78% intelligibility level, high grammatical consistency (0.92), and a high assimilation rate (0.85). The paper concludes by discussing the model's potential applications in machine translation, its limitations, and directions for future validation.
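
The form-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the linear weighted sum, the function names (softmax, select_form), the criterion logits, and the candidate scores below are all assumptions chosen for illustration. The sketch turns criterion logits into softmax weights and picks the candidate form with the highest weighted average; the saturating weights, domain and diachronic modifiers, and fairness constraints are left out.

```python
import math


def softmax(values, temperature=1.0):
    """Convert raw logits into positive weights that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_form(candidates, criterion_logits, temperature=1.0):
    """Score each candidate form by a softmax-weighted average of its criteria.

    candidates: {form: {criterion: score in [0, 1]}}  (hypothetical layout)
    criterion_logits: {criterion: raw importance}     (hypothetical values)
    Returns the best form and the score of every form.
    """
    names = list(criterion_logits)
    weights = dict(zip(names, softmax([criterion_logits[n] for n in names],
                                      temperature)))
    scores = {
        form: sum(weights[n] * feats[n] for n in names)
        for form, feats in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative numbers only; they are not taken from the paper.
criterion_logits = {
    "frequency": 1.0,
    "prevalence": 0.5,
    "simplicity": 0.0,
    "cultural_compatibility": 0.0,
    "intelligibility": 1.0,
}
candidates = {
    "form_a": {"frequency": 0.9, "prevalence": 0.6, "simplicity": 0.5,
               "cultural_compatibility": 0.7, "intelligibility": 0.8},
    "form_b": {"frequency": 0.4, "prevalence": 0.8, "simplicity": 0.9,
               "cultural_compatibility": 0.6, "intelligibility": 0.5},
}

best, scores = select_form(candidates, criterion_logits)
print(best, scores)
```

Because the softmax weights are positive and sum to 1, every score stays within the range of the input criteria, which keeps the result easy to interpret. Lowering the temperature concentrates the weight on the highest-logit criteria, while raising it moves the weights toward a plain average.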

References

Karimov B.R., Mutalov Sh.Sh. O’rtaturk tili (“ORTATURK” – the averaged Turkic language). – Toshkent: Mehnat, 1992.

Johanson L., Csató Y.B. The Turkic Languages. – 5th ed. – London: Routledge, 1998.

Languages Special Issue: Theoretical Studies on Turkic Languages // Languages journal. – 2023.

Kornfilt J. Turkish. – London: Routledge, 1997. – 575 p.

Yace C.B. The Case of the Turkish Language Reform // Omnes Journal of Multicultural Society. – 2016. – Vol. 6(2). – P. 1-20.

Bazarbayeva Z.M. Syllable Theory and Diachronic Phonology of Turkic Languages // Journal of Language and Linguistic Studies. – 2017. – Vol. 13(2). – P. 100-115.

Göksel A., Kerslake C. Turkish: A Comprehensive Grammar. – 5th ed. – London: Routledge, 2005.

Özsoy A.S., Göksel A. Focus in Turkish // Lingua. – 2003. – Vol. 113(11). – P. 1023-1052.

Johansson L. Contact-induced change in Turkic languages // In Language Change: Contributions to the Study of Its Causes. – Berlin: Mouton de Gruyter, 2002. – P. 133-156.

Stachowski M. Dolgan and Yakut: Historical phonological perspectives. – Kraków: Jagiellonian University Press, 2013.

Starostin S., Dybo A., Mudrak O. Etymological Dictionary of the Altaic Languages. – Leiden: Brill, 2003.

Isbarov K., et al. TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages. – 2025. – https://arxiv.org/abs/2502.11020.

Yeshpanov R., et al. Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration. – 2023. – https://arxiv.org/abs/2305.15749.

Johansson L., Csató Y.A. The Turkic Languages and Peoples: An Introduction to Turkic Studies. – Wiesbaden: Harrassowitz, 2006.

Aydemir Y. Language Policy and Planning in Modern Turkey // Journal of Sociolinguistics. – 2010. – Vol. 14(2). – P. 214-235.

Johansson L. The History of Turkic // The Oxford Handbook of Iranian Languages (Ed. S. Windfuhr). – Oxford: Oxford University Press, 2013. – P. 55-78.

Published

2025-11-16

Section

Articles