TibNorm: Script to Normalise Tibetan Text

TibNorm is a utility for producing normalised versions of Tibetan texts to make them easier for contemporary users to search and read, in line with current Tibetan writing conventions.

Kyogoku, Yuki, Robbie Barnett, & Franz Xaver Erhard. (2024). TibNorm – Normaliser for Tibetan (Version v1). Zenodo. https://doi.org/10.5281/zenodo.10806456

See the code on the Diverge GitHub.

As part of the normalisation process, TibNorm:

changes Tibetan numbers into Arabic numerals
changes Tibetan brackets and quotation marks into the standard western equivalents
removes a ། if found after a ཀ, ག or ཤ, with or without a vowel
adds a ་ between ང and །
reduces two or more ་ to a single one
changes ཌ་ or ཊ་ to གས་ unless preceded by a white space, tab, or new line
changes non-standard “illegal” stacks into standard ones
deletes a ། if found at the beginning of a line
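
Several of the rules listed above can be approximated with character translation and regular expressions. The following Python sketch is illustrative only and is not taken from the TibNorm source: the function name `normalise_sketch`, the order in which the rules are applied, and the single bracket pair in `CHAR_MAP` are assumptions. It also leaves out the quotation-mark and stack-correction rules, since the characters and stacks involved are not listed here.

```python
import re

TSHEG = "\u0f0b"  # ་ syllable delimiter
SHAD = "\u0f0d"   # ། clause or sentence mark
VOWELS = "\u0f72\u0f74\u0f7a\u0f7c"  # vowel signs i, u, e, o

# Tibetan digits ༠-༩ (U+0F20..U+0F29) -> 0-9.
CHAR_MAP = {0x0F20 + i: str(i) for i in range(10)}
# Assumed example of a bracket mapping: ༼ ༽ -> ( )
CHAR_MAP.update({0x0F3C: "(", 0x0F3D: ")"})


def normalise_sketch(text: str) -> str:
    """Apply a subset of the listed rules (hypothetical ordering)."""
    text = text.translate(CHAR_MAP)
    # Delete a shad found at the beginning of a line.
    text = re.sub("^" + SHAD, "", text, flags=re.MULTILINE)
    # Remove a shad after ཀ, ག or ཤ, with or without a vowel sign.
    text = re.sub("([\u0f40\u0f42\u0f64][" + VOWELS + "]?)" + SHAD, r"\1", text)
    # Add a tsheg between ང and a directly following shad.
    text = re.sub("\u0f44(?=" + SHAD + ")", "\u0f44" + TSHEG, text)
    # Reduce two or more tsheg to a single one.
    text = re.sub(TSHEG + "{2,}", TSHEG, text)
    # Change ཌ་ or ཊ་ to གས་ unless preceded by a space, tab or newline.
    text = re.sub("(?<![ \t\n])[\u0f4c\u0f4a]" + TSHEG, "\u0f42\u0f66" + TSHEG, text)
    return text
```

For example, `normalise_sketch("༡༩༥༩")` returns `"1959"`. In this sketch a ཌ་ or ཊ་ at the very start of the input is also rewritten, because nothing precedes it; whether TibNorm treats that case the same way is not stated here.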

TibNorm also expands abbreviations so that they are shown in their full form. For abbreviations in classical Tibetan, TibNorm draws from the list of over 6,000 classical Tibetan abbreviations compiled by Bruno Lainé of the Tibetan Manuscript Project Vienna (TMPV) as part of the project’s Resources for Kanjur and Tanjur Studies. In TibNorm, the user can manually change the flag in the abbreviations table to exclude any abbreviation that they don’t want to expand.
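
The flag mechanism described above can be pictured as a table lookup in which only flagged rows take part in the expansion. The Python sketch below is a hypothetical illustration: the tab-separated layout, the column names `abbreviation`, `expansion` and `flag`, the functions `load_enabled` and `expand_abbreviations`, and the placeholder entries are all assumptions, not the actual TibNorm table format or data from the TMPV list.

```python
import csv
import io
import re

# Placeholder table (not real data): flag 1 = expand, flag 0 = leave as is.
SAMPLE_TABLE = (
    "abbreviation\texpansion\tflag\n"
    "xa\tEXPANDED-A\t1\n"
    "xb\tEXPANDED-B\t0\n"
)


def load_enabled(table_text: str) -> dict:
    """Return only the rows whose flag is set to 1."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {row["abbreviation"]: row["expansion"]
            for row in reader if row["flag"] == "1"}


def expand_abbreviations(text: str, enabled: dict) -> str:
    """Replace each enabled abbreviation with its full form in one pass."""
    if not enabled:
        return text
    # Longest abbreviations first, so a short one cannot shadow a longer one.
    keys = sorted(enabled, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: enabled[m.group(0)], text)
```

With the placeholder table, `expand_abbreviations("xa xb", load_enabled(SAMPLE_TABLE))` returns `"EXPANDED-A xb"`: the row flagged 0 is left unexpanded. Doing the replacement in a single pass avoids re-expanding text that was itself produced by an expansion.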

TibNorm was developed for the Divergent Discourses project by Yuki Kyogoku of Leipzig University.