A history of Bulgarian orthography (Letters to the LangDev mailing list)

*** A history of Bulgarian orthography (part 1) [22 December 1999]

Once upon a time there was a Tongue, and that Tongue had Vowels. And four of those were long, and four were not; and four were high, and four were not; and four were front, and four were not. Art keeping count, Reader? How many? Not four-and-twenty; no. Just eight. Twice two times twain, as the vertices of a cube. At times they would slide along the edges of the cube: {-lng} Vowels became {+lng} in derived progressive stems, and {-frt} Vowels might correspond to {+frt} ones in ancient cognate stems. But few other things happened to them.

Until things began to happen. A Law was passed, which came to be known as the Open Syllable Law; and it said, That no Syllable might have a Coda any longer. Then did some of the Codas join the Onset of the next Syllable, peacefully or otherwise; others were lost without a trace. So *_gos-tIs_ became _go-stI_, *_ed-t-_ became _e:-st-_. But the ones with higher sonority turned to the Peak of the same Syllable and were swallowed by it. The liquids were preserved, but were no longer Codas: *_gor-dUs_ became _gro:-dU_ in the South, _go-ro-dU_ in the North. The nasals and the glides were dissolved, and new Vowels came into being.

Aye, new qualities of Vowels. For by that time the ancient {+lng} and {-lng} Vowels had come to be different in quality also. The differences in quantity were still there, but were no longer the only ones, and were not to bide much longer. So now there were eleven Vowels:

Long:        a < *o:    y < *U:    u < *{}w    o~ < *{-frt}{+nas}
             & < *e:, *{-frt}j    i < *I:, *{+frt}j, occasionally *{-frt}j    e~ < *{+frt}{+nas}
Short:       o < *o     e < *e
Infrashort:  U < *U     I < *I

And then Letters were made for all the Vowels, so that every Vowel could be written exactly as it was spoken. But no sooner was the Law done than it began to be undone.
In divers ways in the various lands where the Tongue was spoken, or rather where its Dialects and later daughter and granddaughter Languages were spoken, some Vowels fell together, others stopped being pronounced at all; and the scribes were often at a loss what to do with twice as many Letters as they knew Vowels, and found it hard to follow the practice of the ancient books. There was more to come. (To be continued) [The Bogomils (an ancient sect that is one of Bulgaria's claims to glory) taught that God had wrought all the unseen things, but a fallen angel by the name of Satanael all the seen ones. The first, methinks, includes the mechanism of distinctive features, which is all symmetry, and it favours hypercubes. The second accounts for man's woefully wry articulatory and acoustic apparatus, which constantly creates asymmetry by conditional neutralisation of some features. So it befell that, as told in the parable, the Proto-Slavic system fell apart. Turkish offers an unusual case of stable symmetry, perhaps because there are other phenomena (vowel harmony) to keep it in place.] [On the subject of [j-]. That did not generally have phonemic status before {+frt} vowels, where palatalisation of consonants was also neutralised. It was not written by a separate letter; there were special letters, mostly ligatures in origin, for /j/ or /;/ plus a vowel. So there were letters for _ja_, _ju_ and _jo~_; but there were no _j+..._ variants of _o_, _U_ or _y_, because those had fallen together with _e_, _I_ and _i_, respectively. Various ways of distinguishing _e_ and _je_, and _e~_ and _je~_, were developed in a few places, but they were not a stable part of the script.] *** A history of Bulgarian ... (part 2): Russian [29 December 1999] Yes, there is a reason to talk about Russian here. 
Three reasons, in fact: (1) some netters are acquainted with it or have used it in their langdeving, and they may find the various details exposed here interesting and/or useful; (2) it illustrates many of the factors that were involved in the development of Bulgarian writing and spelling; (3) it was itself very much one of those factors. The two languages have had an interesting historical relationship. They were little more than dialects of one another when one of them, the one geographically closer to Byzantium, was written down; and this one became the language of lore for the speakers of both (and other tongues beside), so that in all contexts having to do with writing the terms `Old Slavic' and `Old Bulgarian' refer to the same thing. It arrived in Russia together with Christianity and its books, in the late 10C--early 11C, and a somewhat Russianised version of it came to be known as `Church Slav(on)ic'. Many OBg/ChSl words have been borrowed into Russian, including such as had regular Russian cognates in existence, usually leading to one of the following frequent situations: (1) B and R differ in register: (a) R is a substandard word and B the standard one, (b) R is the neutral word and B is elevated (literary); (2) B and R differ in meaning, and then usually B has the more abstract, metaphorical etc. sense. Examples (first the regular Ru word, then the cognate borrowed from OBg/ChSl, which is also the only word in current Bulgarian): (1a) _nadëzha_ `hope' (regional), _nadézhda_ dto. (standard); (1b) _górod_ `town, city' (neutral), _grad_ dto. (literary); (2) _gorozhánin_ `city-dweller', _grazhdanín_ `citizen'; _golová_ `head (of body)', _glavá_ `head (of family, of state); chapter'. * * * Basic literacy was not exactly rare in the Middle Ages, but the high art of writing was the clergy's reserved territory, and in its complexity they saw a source of prestige -- and of income. 
To what had originally been a straightforward phonetic script (though containing a few excess letters for Greek loanwords) they added a wealth of variant letters, 2 (two) aspiration marks (directly copied from Greek, though they no longer meant anything there either and had never meant anything in Ru or Bg), and 3 (three) functionally identical stress marks, also from Greek. There was also a host of ligatures and abbreviations (designed to save parchment -- increasingly at the expense of brain cells). When Peter the Great imported the printing press, he also cast out some of the now useless letters and reformed the shape of the rest after the image of the Roman script. Also he abolished the old abbreviations and all the diacritics. So if you regret the fact that Russian does not indicate the placement of stress, you now know whom to blame. The letters he abolished were the alternative variants of _e_, _o_ and _u_, zelo (originally /dz/, which had fallen together with /z/), on (Gk omega), ksi, psi, ja. The priests had their way with fita (Gk theta) and izhica (Gk ypsilon) until the year after the Russian Revolution, when the second great orthographic reform took place. * * * PSl had two {-hi,-frt} vowels, short and long, usually written as _a_ (though I wrote them as _o_ in my first tale). In OSl (OBg) *_a_ > /o/, while *_a:_ > /a/. In the contemporary predecessor of Moscow Russian *_a_ became /o/ only if stressed, /a/ otherwise. Due to the influence of OBg, however, all reflexes of {-hi,-frt,-lng} were written as _o_, and are to this day, much to the regret of schoolkids. OSl _e~_ > Ru /ja/, falling together with OSl _ja_; the letters became near-allographs in ChSl, and Peter chose a simplified form of the ex-nasal and discarded the old _ja_. OSl _(j)o~_ > Ru /(j)u/; the letters fell into disuse very early. OSl _(j)u_ > Ru /(j)u/. Peter kept the second half of the digraph _u_; the letter _ju_ was not changed. OSl _y_ > Ru /y/. 
Originally that was written as _Ui_, which must have echoed its pronunciation in OBg, but the first half was replaced by _I_ early in Russia. OSl _i_ > Ru /i/. Peter kept three letters for this: izhica stood for ypsilon in Greek loans, iota (Roman _i_) was written before another vowel, izhe (Gk eta) elsewhere. The first two were abolished in 1918. The Cyrillic _j_ (izhe with breve) was introduced after Peter's time. OSl _e_ > Ru /jo/ when stressed and not before a palatalised consonant, /e/ elsewhere. Both reflexes were written as _e_ (the fact that in ChSl the vowel is always /e/ being at least part of the reason). There were attempts to introduce a separate letter for /jo/ -- a modified _ju_, then _e_ with diaeresis -- but unfortunately they didn't catch on, so now _ë_ is only used when disambiguation is critical. Originally /e/ was always preceded by palatalisation (or by /j/ when word-initial), as is the usual way with Russian front vowels. A separate letter for hard /e/ was only deemed necessary after Peter's time, and then a reversed open _e_ was borrowed from the then current Bulgarian hand, where it was a mere allograph of _e_, and made useful. OSl _&_ was a relatively high vowel in ORu, probably an [ie] diphthong. It kept a distinct value until quite late, so the letter (jat') was also needed, but eventually it fell together with /e/, and the distinction became a purely orthographic one. In theory it was possible to tell which /e/ was which (the ones from _&_ do not alternate with /jo/), but the rule had exceptions (Ru _zvezdá_ `star' < OSl _zv&zda_, so the letter was jat', despite gen.pl. _zvëzd_), and not every word with /e/ in it has cognates where the /e/ is in /jo/-able position (let alone such that one can think of in real time), so long lists of jat'-ridden words were simply learned by rote -- until 1918. OSl _U_ and _I_ were always unstable, and became more so with time. 
(1) Where they could not fall out without rendering the word unpronounceable, they became /o/ and /e/, respectively, and began to be so written, too. This was a very early process. (2) Elsewhere they fell out. (2.1) If a soft vowel followed, the syllable break remained where it was, so a four-way distinction was created between _ta_ > /ta/, _tja_ > /t;a/, _tUja_ > /tja/ and _tIja_ > /t;ja/. As such both letters served a useful purpose, and they continue to serve it. (2.2) If another consonant followed, _U_ ceased to be written relatively early, whereas _I_ remained, but was now perceived as a mark of the only thing left of what had once been a vowel (viz., palatalisation). (Where the consonant failed to be palatalised, _I_ also ceased to be written, as in Ru _dni_ `days' from OSl _dIni_.) (2.3) Word-finally _I_ also became a palatalisation marker (except after a shibilant, where it is mute). _U_ was not a marker of anything, but continued being written until 1918, wasting large amounts of ink and timber (one author estimates that over 3% of all letters in pre-1918 Russian writing were redundant hard signs). Why Peter didn't discard it is anyone's guess; in any case, the Revolution did. A history of Bulgarian orthography (part 3) [30 December 1999] Yes, by _y_ I mean a {+hi,-frt,-rnd} vowel. This convention is common to all romanisations of Slavic. * * * In 1396 Bulgaria fell under the yoke of the Ottoman Empire, there to remain until 1878. This slowed down all literary activity in the country, which in turn (along with other factors) sped up the evolution of the spoken language away from its roots. The late 18th century, and most of the 19th, were Bulgaria's Renaissance. The national language, complete with a written form, was among the things that were to take shape at that time. But it was a situation very different from Peter's Russia, where there was a state -- a strong one -- and an established lay high speech -- that of Moscow. 
In Bulgaria there was a multitude of dialects, and no central Bulgarian-speaking authority to coordinate things. And since Bulgaria had not the resources to satisfy its need of books, lay or clerical, Russia now `returned the favour'. Bulgaria imported books, and with them many Russian and ChSl words, including many that were Bg loans in Ru itself. Some of those had synonyms or regular cognates in Bg, and then one of several situations would arise: (1) R and B differ in register: (a) B is a substandard word and R the standard one, (b) B is the neutral word and R is elevated; (2) R and B differ in meaning, and then usually R has the more abstract, metaphorical etc. sense. So _chèdo_ (B) is `child, offspring (of one's parents)', whereas _chàdo_ (from ChSl) is `child, offspring (of the Church)'. An old priest might call a much younger person the former (affectionately, referring to his age) or the latter (formally, referring to his office and rank). In Ru _odézhda_, the standard word for `clothes', is a loan from OBg/ChSl; the regular Ru development _odëzha_ is regional. Bg borrowed it back, but it means `garment', usually when talking of divine service; the regular word for `clothes' is not related. The Ru word _lev_ `lion' is likewise a ChSl loan -- the expected *_lëv_ does not exist (though a man called _Lev_ -- hi, Leo -- is usually nicknamed _Lëva_). In Bg `lion' is _løv_, the regular cognate; the Ru loan _lev_ is the name of Bulgaria's currency, which goes back to the lion being the nation's totem animal. In Ru _dvizhénie_ is `moving (nomen actionis)' as well as `movement, motion'; in Bg it's been borrowed for the latter meaning, whereas the former is the homebred cognate _dvìzhene_. And so on. But not only words were borrowed; orthographic practices were, too. * * * The various dialects of Bg have from 5 to 9 vowels.
Present-day standard Bg has six, /a e i o u @/, with no /j/ or palatalisation before front vowels and significant, but not complete, raising of unstressed {-hi} vowels. No palatalisation before a consonant or in final position either. Herein the development of the OBg vowels; the letters we shall talk about at another time: OBg _(j)a_, _o_, _(j)u_, _e_, _i_ > Bg ditto. (Though unstressed _i_ after another vowel became /j/.) OBg _y_ > Bg /i/. OBg _e~_ > Bg /e/. OBg _(j)o~_ > Bg /(j)@/. (These stayed nasal until quite late, and may still be in some dialects.) OBg _&_ > Bg /ja/ when stressed and not before a syllable with a front vowel, /e/ otherwise. There are many exceptions, however, due to analogy and other reasons. OBg _U_ and _I_ were always unstable, and became more so with time. (1) Where they could not fall out without rendering the word unpronounceable, _U_ became /@/ and _I_ became /;@/, /@/ or /e/. (I write /;@/ here, not /j@/, because a consonant always preceded it.) (2) Elsewhere they fell out; and if a syllable break followed, it was also lost, so all of _tja_, _tUja_ and _tIja_ > /t;a/. A history of Bulgarian orthography (part 4) [04 January 2000] Three questions were asked, and thrice each one was answered: (1) Must there be one speech, and one writing? a. nay and nay; b. nay and aye; c. aye and aye. (2) What must a word know? a. its roots; b. its kin; c. itself. (3) What must a tongue know? a. its roots; b. its peers; c. itself. So if one man says [maIt] and the other says [mIxt], (1a) each might spell the word in a way that reflects his own pronunciation; or (1b) they might both spell it the same, say _might_, but pronounce it in different ways; or (1c) one pronunciation, say [maIt], might be made standard and reflected in the spelling, and the others would have to cope with that. 
And a word such as [saIn] might be (2a) spelt with a _g_ in it because it comes from Latin _signum_, or (2b) spelt with a _g_ because it is cognate to ['sIgn@l], or (2c) spelt without _g_, because no [g] is pronounced. And the whole orthography might be made to assert (3a) the tongue's ancient lineage and old glory, or (3b) its kinship and liaisons with others (this one need not be a conscious priority; see below), or (3c) its individual and inimitable character (this is where national scripts and `national characters', such as Spanish _ñ_, come in). * * * The Bulgarian loremasters tended to be under the influence of Church Slavonic, which for a long time they equated with Old Slavic. Most were educated (or at least had lived) in the free Orthodox Slavic lands, Russia and Serbia (when the latter was liberated). And it showed in the way they wrote. Mostly they were confused by the sound /@/, which did not exist in Russian (and, by extension, ChSl). One writer called it an abomination (how can Bg need a sound that is absent in ChSl? can the daughter be of a different nature than the mother, can a woman give birth to a monkey, or vice versa?) and wanted it abolished and replaced by /o/, /e/ or /u/, mirroring the development of _I_, _U_ and _o~_ in ChSl/Ru. The less radical ones merely wanted a way to write it; but the letter _o~_ and the original use of _U_ (and _I_) had been forgotten. So some started to write /@/ as _a_ with a breve, as in Roumanian; in unstressed position, however, unadorned _a_ was often written, the rest being left to reduction. And most employed the letter _e~_ in its ChSl function, for /ja/, although its reflex in Bg is /e/. Some writers also used _e~_ with a breve for stressed /j@/. That's what Peter Beron did in his famous _Fish Primer_, printed in Bucharest in 1824. Those breved _a_ and _e~_, plus a special letter for /dZ/ (the same one that is used now in Serbian), were the only things that set his spelling apart from ChSl.
So he used the whole array of diacritics, variant letters and redundant letters for Greek words. All of those were abandoned, however, when the civilian Cyrillic style was introduced in the mid-19C. Aye, even theta and izhica, which lived on in Russia until 1918. Only iota stayed for a while, because a few were considering using it for /j/ or replacing the descendant of eta (the regular Cyrillic _i_) with it. * * * Eventually the loremasters formed schools, and some of those were liberal and some were conservative. A well-known representative of the latter was Najden Gerov, whose dictionary, published in six instalments in 1895--1908, is a very important reference work. He wanted the written language, including its orthography, to be based on the speech of the people. Except he wanted it to be all things to all people. And since no two OSl vowels have the same reflexes in all dialects, he ended up employing all OSl vowel letters (including _je_, though not _je~_); he also found a job for _ë_ /jo/. In all he used 16 vowel letters. (His ery was made with a big er, not a wee one as most other writers'. And yes, he used iotated versions of the regular Cyrillic _a_ and _e_ -- for him the `reversed R' was not a _ja_, but a proper civilian form of the wee jus, and was to be pronounced as /e/, /ja/ or perhaps /en/, depending on one's native dialect.) The orthography that was made official by law in 1899 may look very conservative now -- but it was quite moderate compared to Gerov's (and to some other projects). (To be continued) A history of Bulgarian orthography (part 5) [06 January 2000] The belief that Bg was a direct descendant of ChSl didn't make it into the 2nd half of the 19C, and the plain and iotated big jus (_o~_ and _jo~_) were restored in their etymological places. Some advocated using them for /@/ and /j@/ (or /;@/) everywhere.
That proposal clashed against tradition, however, as well as the fact that the common standard language wasn't done taking shape yet, and one of the most authoritative loremasters (Marin Drinov, the founder of the Academy of Sciences) pointed out that in some parts OBg _o~_ was sounded as the vowel in English _bud_ and OBg _U_ as the one in _bird_. That being so, the iotated jus was effectively restricted to the 1sg and 3pl present tense endings of verbs with a soft final stem consonant. (The same forms of verbs with a hard stem contained a plain jus.) But that was not universal: some dialects had /a/ (resp. /ja/) in those endings, stressed or not, and their speakers (among whom were some of the most successful writers, journalists and editors) wrote them so, which meant that they had no use for the iotated jus. * * * Word-final _U_ wore off early, as in all Slavic languages. Since that was almost the only way for a word to end in a hard consonant, the direction of the implication was reversed from the actual "_bogU_ `god' is consonant-final because _U_ used to be pronounced, and therefore written, at the end" to the apparent "_U_ must be written, though not pronounced, at the end of _bogU_ because it is consonant-final". So _U_ came to be written after every word-final hard consonant, including such as were borrowed from other languages after the fall of the ers and had never ended in any vowel represented by _U_. Many realised that that was just a waste of paper and ink (12% of it all, according to one estimate), but tradition was strong. Word-final _I_ also fell off. The consonant before it remained soft for a long time, but in most dialects all palatalisation was eventually lost except before a [-frt] vowel. Yet _I_ was still written at the end of certain words. There were rules that helped one determine if a consonant-final word was to be written with _U_ or _I_ at the end, but errors were common.
Stem-internally OBg _I_ had become /@/, and was now written _U_ (except for Gerov and his school, who insisted on _I_). But the fact that it had had something to do with palatalisation got it a new job: it began to be used for palatalisation (but not /j/!) before /o/ and sometimes /e/, there being no universally accepted iotated forms of those vowels. Some writers, however, preferred _j_ (_i_-breve) even after a consonant. * * * There was one context in which final _U_ and _I_ were not lost. OBg did not have a definite article, but the placement of the demonstrative pronoun _tU_ after a noun was common, and in Bg (though not in the other Slavic languages) the two were pulled into a single form. The article then lost its er, but protected the noun's, where _U_ > /@/ and _I_ > /;@/, eg _bykU tU_ `bull that' > /bi'k@t/, _konI tU_ `horse that' > /'kon;@t/ (both stress patterns are frequent with monosyllabic stems). So in the dialects that formed the standard; but just about any [-frt], resp. [+frt], vowel can be found in some part of the country. Also depending on the region, but orthogonally to both stress and the quality of the vowel, the /t/ of the article could wear off, giving such forms as /bi'k@/, /'kon;@/. Two things about this whole matter confused the loremasters. First, they didn't know which of the many forms of the article to choose as standard. (Their usual tie-break, OBg, was useless, since it didn't have an article.) One of them suggested that three forms be taken, but for different syntactic functions, both to keep everyone happy and to make up for the loss of morphological case. It was a crazy idea, but was not rejected altogether, and although it was decided that the vowel of the article had to be /@/, every standard has acknowledged two forms, a long and a short one (with and without /-t/), with artificial rules governing the choice between them. Second, there was the matter of the spelling. 
In the long form _-tU_ was added to the noun, which already ended in _-U_ or _-I_, so in that position (though not in any other) _I_ was sounded as /;@/. That was etymologically precise, although synchronically the vowel before /t/ belonged to the article, not to the noun. The short form was tougher: the article was just /@/ or /;@/, but it could not be written as _-U_/_-I_, because word-finally they had to be mute, or as _-o~_/_-jo~_, as they didn't belong there etymologically. So _-a_/_-ja_ became the preferred spelling (though the pronunciation it was derived from was not preferred!), even when stressed, and with nothing but context to indicate whether _kraka_ was /kra'ka/ `legs' or /kra'k@/ `the leg'. * * * Thus in the last quarter, and especially in the last decade, of the 19C a _de facto_ standard spelling already existed, although the search for a better one never ceased; every now and then the Acad of Sci would come out with a project, or the Min of Ed (after the Liberation) would form a committee with the purpose of making one, or a writer (or group of them) would launch a journal and try out a new spelling in it. But the climate turned out to be hostile to really radical changes. *** A history of Bulgarian orthography (part 6) [11 January 2000] In 1899 a decree of the Min of Ed made the _de facto_ orthography official. One change was made: the juses were taken out of the 1sg and 3pl verb endings and replaced by _a_ and _ja_, although the speakers of most dialects pronounced them as /@/ and /;@/, stressed or not. The change implied that the iotated jus went out of business, although it could still be seen in some editions, at the very end of the alphabet.
This orthography made use of 32 letters, which were the ones that Russian now uses minus ery and reversed /e/, but plus two others: (1) jat (called `double /e/'), which was to be written wherever OBg had had _&_ and pronounced as /e/ if it was unstressed or followed by a palatalised or palatoalveolar consonant or a syllable with a front vowel and as /ja/ otherwise (though exceptions abounded), and (2) big jus (called `wide /@/' because of the width of the letter), which was to be written stem-internally wherever OBg had had _o~_. * * * After WWI the Agrarian Union was voted into power. That was a left-of-the-centre organisation that enjoyed wide support among the population of then predominantly rural Bulgaria. The new government started carrying out one democratic reform after the other. Most of those were designed to soften the effect of the war, which BG had fought on the wrong side, and had been severely punished for it. That was hard work, so the government took their time with the orthography. But its turn came eventually, and in 1921 the Min of Ed of the Agrarian Union decreed the following: (1) The letter double /e/ was abolished, and replaced by _e_ or _ja_, depending on its pronunciation. (2) The letter big er was abolished, and replaced by wide /@/ where it was pronounced and by nothing where it was not. (3) The letter wee er was abolished, and replaced by _j_ where it was a palatalisation marker, by _ja_ where it stood for /j@/ (in the full form of the definite article), and by nothing where it was not pronounced. So there were 29 letters in the reformed alphabet. * * * The rule of the Agrarian Union was appreciated by most layers of society -- but not by all. The fat cats decided they would have no more of it, and on 9 June 1923 a fascist coup took place. The PM and leader of the AU was assaulted by a gang of assassins, stabbed with knives and hewed at with axes and then left to bleed to death. Reaction reigned throughout the country. 
Having seized the power, the extreme Right set on undoing the AU's reforms -- including their orthography. The 1899 standard was brought back and made official again, with one change: Double /e/ was now only to be written for /e/ if another form of the same word or a cognate word had /ja/, and vice versa. This made things easier, but not by much. Wide /@/ and the two ers, schwaful or mute, were restored to their former glory. * * * And so it was until 1945, a year after the Patriotic Front took the power. That was a time of great changes in all walks of life, and orthography not the last; and the reform that was decided upon, the most recent one, included the following: (1) The letter double /e/ was abolished again -- this time for good -- and replaced by _e_ or _ja_, depending on its pronunciation. (2) The letter wide /@/ was abolished, and replaced by big er; except in the form /s@/ `(they) are', which was now to be written _sa_. (The writing of word-final er was generally avoided, as people had been used not to pronounce it at all there; where there was no way around it, as in Turkish loans, it was recommended that a stress mark should be written over the er.) (3) The mute word-final big er was abolished (as in 1921). (4) The mute word-final wee er was likewise abolished, and the one in the article, read as /;@/, was replaced by _ja_ (as in 1921); but it was kept as a marker of palatalisation before /o/ (and occasionally /e/). * * * And so the tale is now told. Created and maintained by Ivan A Derzhanski. Last modified: 17 January 2015.