Gustek

<gustek@riseup.net>
PGP Key
My code

Welcome.

This page shouldn't even exist. I initially intended to publish a GPLv3 translation on this site, but I thought it would be weird to only publish a single, un-introduced file.

Anyway, I'm Gustek, a 2nd year university student and I live in France. I mostly program in Rust, C and Common Lisp but I enjoy some more functional stuff as well.

I enjoy learning about languages; I speak French, Polish, English and Spanish.

I wrote this page using a single 68 KiB Lisp expression because it's fun, I guess. You can find it here if you want (it's Beerware, so feel free to do whatever you want with it). It was written for SBCL, though, so you might need to modify it a bit, at least the shebang, to run it with your implementation.
Please note that this site imports a font from the Google API to render Arabic text. If you don't want to have anything to do with them, feel free to block Google and download the Mirza font family in order to render this page properly.


My personal romanisation for Arabic — 14.VIII.2024

As consonants, the following letters are simply rendered as: ب b, ت t, ث th, ج j, خ kh, د d, ذ dh, ر r, ز z, س s, ش sh, غ gh, ف f, ق q, ك k, ل l, م m, ن n, ه h, و w, ي y.

As vowels, ا/آ/ى is romanised as ā or â, و as ū or û, and ي as ī or î. The short vowels are rendered as follows: ـَ as a, ـِ as i, ـُ as u. The tanwīn vowels ـً, ـٍ, ـٌ are never written.

ء, whether on top of a long vowel or not, is transliterated as ʔ, ʾ, ` or '. Similarly, ع is transliterated as ʕ, ʿ or ´. If a vowel follows the ع or ء after the definite marker ال, and the ʔ/ʕ form is not used, the vowel shall be capitalised. For instance, العقيدة can be written either al-ʿAqīdah or al-ʕaqīdah.

ة is rendered as ah. Tanwīn vowels are never written after ة.

The pharyngeal(ised) consonants can each be written in two ways: ح as ḥ or ẖ, ص as ṣ or s̱, ض as ḍ or ḏ, ط as ṭ or ṯ, ظ as ẓ or ẕ.

Before shamsiyyah letters, ال is not written as al; rather, the following consonant is doubled. For instance, الدين is written ad-Dīn. Monosyllabic prepositions and proclitics are linked to the word by a hyphen. When these precede a definite word, the a of ال is dropped. For instance, في الدين is written fī-d-Dīn.

The grammatical short vowel endings are never written, thus ال is never rendered as ul or il.
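
The letter-level rules above can be sketched in a few lines of Python. This is only a rough illustration, not a complete transliterator: the names `CONSONANTS`, `SHORT_VOWELS`, `TANWIN` and `romanise` are made up for the example, only one of the allowed variants is chosen for each letter, و and ي are always treated as consonants, and long vowels, the article ال and the dropping of grammatical endings are not handled.

```python
# Sketch only: every name below is an assumption made for this example.
CONSONANTS = {
    "ب": "b", "ت": "t", "ث": "th", "ج": "j", "خ": "kh", "د": "d",
    "ذ": "dh", "ر": "r", "ز": "z", "س": "s", "ش": "sh", "غ": "gh",
    "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "و": "w", "ي": "y",
    # Pharyngeal(ised) consonants, dot-below variant.
    "ح": "ḥ", "ص": "ṣ", "ض": "ḍ", "ط": "ṭ", "ظ": "ẓ",
    # Hamza, ʿayn and tāʾ marbūṭah (one of the allowed spellings each).
    "ء": "ʾ", "ع": "ʿ", "ة": "ah",
}
# Fatḥah, kasrah, ḍammah.
SHORT_VOWELS = {"\u064e": "a", "\u0650": "i", "\u064f": "u"}
# Tanwīn signs (fatḥatān, ḍammatān, kasratān) are never written.
TANWIN = {"\u064b", "\u064c", "\u064d"}


def romanise(word):
    """Romanise a vocalised word letter by letter (no long vowels, no article)."""
    out = []
    for ch in word:
        if ch in TANWIN:
            continue  # dropped entirely
        out.append(CONSONANTS.get(ch) or SHORT_VOWELS.get(ch) or ch)
    return "".join(out)
```

With this table, a word vocalised as m-a-l-i-k comes out as `malik`, and a final tanwīn is silently dropped.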

Here are a few examples:


Languages I speak, learn and want to learn (in order) with resources — 04.I.2024

  1. French
  2. English
  3. Spanish
  4. Polish
  5. Russian
  6. Persian
  7. Classical Arabic
  8. German

Kakoune syntax highlighting and indentation suck — 31.VII.2023

Default indentation is super-rigid and it manages to get math mode wrong every single time when editing LaTeX.


Why the Kazakh alphabets suck, and how to improve them — 23.VII.2023

The Kazakh language is mainly written in two alphabets: Cyrillic (majority) and Latin (official). They are both bad, but for different reasons.

Let's start with the Cyrillic alphabet. It goes like this:

А /ɑ̝/ Ә /æ̝/ Б /b/ Г /g/ Ғ /ʁ/ Д /d/ Е /je̘/ Ж /ʑ/ З /z/ И /əj/ Й /j/
К /k/ Қ /q/ Л /l/ М /m/ Н /n/ Ң /ŋ/ О /o̞/ Ө /ɵ/ П /p/ Р /ɾ/ С /s/
Т /t/ У /ʊw/ /w/ Ұ /o̙/ Ү /ʉ/ Х /χ/ Ш /ɕ/ Ы /ə/ І /ɪ̞/ Э /e/

I removed the letters found only in loanwords because they're not important here. I put the vowels in bold typeface because they are the problem here.

First of all, as we can see, the /w/ and /ʊw/ sounds are represented by the same letter. This is problematic, for obvious reasons. The improvement idea is, as the /v/ sound is only found in loanwords, to use the В letter to represent /w/, as it is done in Kalmyk and Mongolian.

The next problem is the Ә and Ы letters. First, having a letter visually identical to the schwa not represent the schwa sound /ə/ is at best asinine. In other Turkic (and also in Mongolic) languages using the Cyrillic alphabet, ы usually represents /ɯ/, and it is illogical to have a yer ⟨ы⟩, visually containing the letter ⟨i⟩, represent a sound far apart from the /i/-ish sounds. The improvement idea is to write the /æ/ sound Ӕ (as done in Ossetian), and the /ə/ sound Ә, as simple as that.

Next is Ұ. Although it is not a major problem, it feels a bit unnatural to have a very у-looking letter representing neither a /u/-ish nor a /y/-ish sound. The improvement idea is, once again, as the /ju/ sound only exists in loanwords, to use the Ю letter to write the /o̙/ sound.

Last but not least: И and І. It simply makes no sense to write /əj/ with a letter representing /i/ everywhere else. Thus, the improvement idea is to simply write it ӘЙ. Then І is not a problem in itself, as there is precedent (notably in Ruthenian) for writing the /ɪ̞/ sound like this, although it may be simpler to write it И.

Now, the Latin alphabet (things get much worse).

A /ɑ̝/ Ä /æ/ B /b/ D /d/ E /e/ /je/ G /g/ Ğ /ʁ/ H /χ/ I /ɪ̞/ İ /j/ /əj/
J /ʑ/ K /k/ L /l/ M /m/ N /n/ Ñ /ŋ/ O /o/ Ö /ɵ/ P /p/ Q /q/
R /ɾ/ S /s/ Ş /ɕ/ T /t/ U /ʊw/ /w/ Ū /o̙/ Ü /ʉ/ Y /ə/ Z /z/

The process of latinization that the Kazakh government hopes to complete by 2025 is but an attempt to break away from, and to destroy the memory of, the Soviet period and its historical ties to Russia, while showing a desperate desire to befriend the West. This is of the uttermost stupidity and ungratefulness. But we're not here to discuss politics, thus I'll simply show why their Latin alphabet is pure trash.

First and maybe the biggest of problems, ⟨ñ⟩. It's also used by the ALA-LC (it's made by Americans, thus it's not a surprise to find such stupidities) and the Common Turkic Alphabet project, serving the same role as the latinization of Kazakh. Anyways, in its original language (Spanish) and in more than 16 languages, it represents /ɲ/. Its use for representing /ŋ/ simply makes no sense at all. Simple solution: just use the IPA symbol ⟨Ŋ⟩ or latinize the Cyrillic innovation (which is to me more visually pleasant) ⟨Ꞑ⟩.

Now, Ū. It's obviously a simple latinization of the Cyrillic letter, and it doesn't make more sense than it does in Cyrillic. Thus, the solution would be more or less the same: in this case, simply use Ō, it's visually more straightforward.

For U, the solution is identical: move the /w/ to W, and only keep one sound per letter.

Then, İ, I and Y. The desire to resemble Turkish is obvious, and it shouldn't be blamed: Kazakh is a Turkic language after all. But representing both /əj/ and /j/ with the same letter is more than simian. Similarly, representing a neighbour of /i/ with I instead of İ is actually the opposite of what is done in the Turkish language. My solution here is to represent /j/ with Y, /ə/ with I, /ɪ̞/ with İ and /əj/ with IY. As simple as that.

Next is E. It has been taken as phonetically identical to the similar-looking Cyrillic letter, and the Э has been merged into it. Once again, it does nothing but confuse the language learner, thus one should simply represent /e/ with E, and /je̘/ with YE.

Lastly, Ğ. Once again, its use arises from the desire to resemble Turkish. But it makes little sense here. Although its value can be deduced just by looking at it if you know Kazakh phonology, it can be confusing and accidentally read as in Turkish. The solution here would be to use either Ģ or Ġ, the latter already being used for representing a closely neighbouring sound in Arabic.

To summarize, here are the two alphabets, with my modifications added:

А Ӕ Б В Г Ғ Д Е Ж З И Й К Қ Л М Н Ң О Ө П Р С Т У Ү Х Ш Ә ӘЙ Э Ю
A Ä B D E Ə G Ģ H I İ J K L M N Ꞑ O Ō Ö P Q R S Ş T U Ü W Y YƏ Z
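
The Cyrillic part of the reform is mostly a letter-for-letter substitution, so it can be sketched with Python's `str.translate`. This is only an illustration under a few assumptions: `reform` and the table names are made up, code points are written as escapes to avoid look-alike letters, and У is left untouched because deciding between У (/ʊw/) and В (/w/) requires knowing the pronunciation. The substitutions must be applied simultaneously (Ы becomes Ә while the old Ә becomes Ӕ), which is exactly what a translation table does.

```python
# Sketch of the proposed Cyrillic changes; names are assumptions for this example.
_UPPER = {
    "\u04d8": "\u04d4",        # Ә -> Ӕ  (/æ/)
    "\u042b": "\u04d8",        # Ы -> Ә  (/ə/)
    "\u04b0": "\u042e",        # Ұ -> Ю  (/o̙/)
    "\u0418": "\u04d8\u0419",  # И -> ӘЙ (/əj/)
    "\u0406": "\u0418",        # І -> И  (/ɪ̞/)
}
# Add the lowercase counterparts and build a one-pass translation table.
_LOWER = {src.lower(): dst.lower() for src, dst in _UPPER.items()}
REFORM = str.maketrans({**_UPPER, **_LOWER})


def reform(text):
    """Rewrite current Kazakh Cyrillic spelling with the proposed letters."""
    return text.translate(REFORM)
```

Because the table is applied in a single pass, a letter produced by one rule is never rewritten again by another.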


About the Polish Cyrillic alphabet — 26.VI.2023

The Cyrillic orthography I drafted for the Polish language this year is exceptionally bad, for various reasons:

  1. Constructing ⟨ć⟩, ⟨l⟩, ⟨rz⟩, etc. through other means than soft consonants and the soft sign annihilates all etymology;
  2. The use of Serbian, Montenegrin and Tatar letters is simply not suited to a Western Slavic language such as Polish;
  3. Accented letters do not exist as a single glyph in most if not all encodings;
  4. Some letters used here are actually opposite to what they represent (such as ⟨љ⟩ for ⟨ł⟩).

For these reasons, I came up with another proposal for a Cyrillic writing system for Polish, available here.


Why LaTeX sucks — 26.V.2023

It sucks mainly for two reasons. The first one is the extreme difficulty that comes with trying to use a writing system other than Latin, or even some special diacritics, let alone actually mixing different writing systems. If anyone knows how to do this by only loading packages (I can accept redefining a command or two, but not writing 20 lines of code for each writing system I wanna load), please send me an email, I really need this. The next problem, sadly the biggest one and the hardest to fix, is error reporting. If you haven't seen the error fifty times yet, you will struggle to find out where it happened and why it happened. It's still the greatest typesetting tool for a lot of reasons though. Use LaTeX.


The Pillars of Lisp — העמודים הלסוף‎

תורה לסוף

— סנהדרין 72a

כָּךְ הָיָה הַקָּדוֹשׁ בָּרוּךְ הוּא מַבִּיט בַּתּוֹרָה וּבוֹרֵא אֶת הָעוֹלָם, וְהַתּוֹרָה אָמְרָה בְּרֵאשִׁית בָּרָא אֱלֹהִים. וְאֵין רֵאשִׁית אֶלָּא תּוֹרָה, הֵיאַךְ מָה דְּאַתְּ אָמַר (משלי ח, כב): ה' קָנָנִי רֵאשִׁית דַּרְכּוֹ.

— בּראשׁית רבּה, א:א

אז‎, ברא אלוהים את הע‎לם באמצעות Lisp.

source

The Lisp programming language has 4 different pillars. Lacking even a single one of them disqualifies a language from being considered a Lisp.

  1. READ, EVAL, PRINT and LOOP. Those do not need to form a REPL as we understand it nowadays. Rather, they must be present and available to the user, so that they can write a complete REPL using code of the following form:

    (LOOP
      (PRINT
       (EVAL
        (READ))))
    
    The READ and PRINT functions must also satisfy the following equality: id = read ∘ print

  2. Homoiconicity, from Greek ομος and εικων, literally “same representation”. It means that source code written in the language can be manipulated in the language itself as primitive data. Even if not a pillar by itself, the idea of macros naturally arises from this concept, as homoiconicity enables powerful and complex macro-systems to function. What also arises from the concept of representation of the code as primitive data is direct self-evaluation, most notably in Lisp through the Meta-Circular Evaluator, using only seven primitive operators.
  3. No non-atom except cons cells. There may be as many non-algebraic types (symbols, numbers, characters, Unicode characters, pointers, etc.) as wanted, but no algebraic type, meaning a type formed by combining other types, may be primitively present in the language, except the linked list, or cons cell. Thus, Clojure is entirely excluded from the category of Lisps, given that its Java-based types contain a lot of those.
  4. MCE. The language must expose the necessary primitives to implement a Meta-Circular Evaluator using the least number of operators possible.
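
The condition id = read ∘ print from the first pillar can be illustrated with a toy reader and printer. The sketch below is in Python rather than Lisp and makes several assumptions: the names `tokenize`, `parse`, `read` and `print_expr` are invented for the example, Python lists stand in for chains of cons cells, and the only atoms are integers and symbols (kept as plain strings).

```python
def tokenize(text):
    """Split an S-expression into parentheses and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and build one expression."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    try:
        return int(token)  # numeric atom
    except ValueError:
        return token       # symbol, kept as a string


def read(text):
    """Toy READ: text -> nested lists of atoms."""
    return parse(tokenize(text))


def print_expr(expr):
    """Toy PRINT: nested lists of atoms -> text."""
    if isinstance(expr, list):
        return "(" + " ".join(print_expr(item) for item in expr) + ")"
    return str(expr)
```

For any expression `x` built from such lists and atoms, `read(print_expr(x)) == x` holds, which is the equality required above; a real Lisp has to guarantee the same for every printable object.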

Arabic and Hebrew Orthographies for Lojban

Arabic — rablermorna-pa — رابلهرمۆرنا-پا

An Arabic orthography already exists for Lojban, called rablermorna, but I think it has multiple problems, making it difficult to use and largely suboptimal.
I thus designed my own Arabic orthography for Lojban: rablermorna-pa.

Latin letter رابلهرمۆرنا-پا
p پ
t ت ط
k ک ق
f ف
s س ص
c ش
b ب
d د ض
g گ
v ڤ
z ز ظ
j ج ژ
m م
l ل
n ن
r ر غ
x خ
' ح
. ء
a ا
.a أ
e ه
i ي
.i ئ
o ۆ
u و
.u ؤ
y ى

This system only uses long vowels, thus making it easy to read and write. It also allows diphthongs to be represented the same way as in the Latin orthography.
Some letters have multiple equivalents. This allows visual diversity and avoids the unlinking of the first consonant, for instance when writing words starting with a ⟨d⟩: داشرو vs. ضاشرو.

Two vowels are imported from Kurdish: [ɔ] and [ε]. I've chosen to use the “ha” to represent the latter, as a linguistic precedent exists, and the “ḥa” for ⟨'⟩, as it is phonetically close to its sound ([ħ] vs. [h]) and otherwise unused.

All consonants come from Standard Arabic except one which comes from vernacular Maghrebi Arabic, namely the “va”, and three which come from Standard Persian, namely “pe”, “ge” and “že”. The latter is not strictly necessary, but it makes it possible to mark a clear separation, for instance between the different rafsi of a lujvo.

As in rablermorna, it is possible to eliminate vowels if the word contains more than one. For instance, the gismu cmene can be written either as شمهنه or شمن.

For example, the following Lojban text

pu ze'u se djuno lei ze'u renvi gi'e se jijnu ca le temci be le nu le dzena ca jmive vau fa ledu'u le bradi be le se nei ba klama la tlunranan.
renders as
پو زهحو سه ضجونۆ لهي زهحو رهنڤي گيحه سه جيجنو شا له تهمشي به له نو له ضزهنا شا جميڤه ڤاو فا لهضوحو له برادي به له سه نهي با كلاما لا تلونرانانء
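
The table above is regular enough to be applied mechanically. The following Python sketch is one possible way to do it, under stated assumptions: `RABLERMORNA_PA` and `to_rablermorna_pa` are invented names, only the first equivalent of each letter is used, ⟨n⟩ is mapped to ن as in the worked example (renvi, رهنڤي), and optional vowel elision is not implemented.

```python
# Sketch only: first listed equivalent of each letter, no vowel elision.
RABLERMORNA_PA = {
    "p": "پ", "t": "ت", "k": "ک", "f": "ف", "s": "س", "c": "ش",
    "b": "ب", "d": "د", "g": "گ", "v": "ڤ", "z": "ز", "j": "ج",
    "m": "م", "l": "ل", "n": "ن", "r": "ر", "x": "خ", "'": "ح",
    ".": "ء", "a": "ا", "e": "ه", "i": "ي", "o": "ۆ", "u": "و",
    "y": "ى",
    # A pause directly before a, i or u uses the hamza-carrying letter.
    ".a": "أ", ".i": "ئ", ".u": "ؤ",
}


def to_rablermorna_pa(text):
    """Transliterate Latin-script Lojban, trying two-character keys first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in RABLERMORNA_PA:
            out.append(RABLERMORNA_PA[pair])
            i += 2
        else:
            # Unknown characters (spaces, digits...) are kept as they are.
            out.append(RABLERMORNA_PA.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

Calling `to_rablermorna_pa("cmene")` gives the full spelling of the gismu mentioned above; choosing among the alternative letters is left to the writer.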

Hebrew — xeblermorna — חֵבּלֵרמֹרןַ‎

This Hebrew orthography for Lojban cannot simply be derived from the Arabic orthography, most notably regarding vowels: whereas Standard Hebrew is richer than Arabic regarding consonants and vowels, the lack of diversity of languages using the Hebrew alphabet prevents the existence of non-native long vowels such as the Kurdish [ɔ] mentioned earlier. Thus, the orthography has to rely on diacritics, called nīqqūd, to represent vowels.

Latin letter חֵבּלֵרמֹרןַ‎
p פּ
t תּ‎ *ת
k כּ ק
f פ
s שׂ ס צ *ץ‎
c שׁ‎
b בּ
d דּ‎ *ד‎
g גּ‎
v ב
z ז‎
j ג‎
m מ *ם
l ל
n נ *ן‎
r ר
x ח *כ
' ה
. א ע‎

In this orthography also, some letters have multiple equivalents, although with one small change: all forms followed by a star are called sūfīt and can only be used at the end of a word.

As in rablermorna-pa, diphthongs are represented using long vowels, namely “vav” ⟨ו⟩ for u* diphthongs and “yod” ⟨י⟩ for i* diphthongs.

Vowels, except for ⟨y⟩, are directly taken from Hebrew, according to the following table (vowels are placed on an “aleph” here):

Latin vowel חֵבּלֵרמֹרןַ‎ כַּרנץ‎ַ‎ַ
a אַ
e אֵ
i אִ
o אֹ
u אֻ
y אָ

As with the Arabic orthography, it is possible to eliminate vowel signs if the word contains more than one consonant. For instance, the lujvo prigau can be written either as פּרִגּ‎ַו or פּרגּ‎ו .

For example, the following Lojban text

pu ze'u se djuno lei ze'u renvi gi'e se jijnu ca le temci be le nu le dzena ca jmive vaufa ledu'u le bradi be le se nei ba klama la tlunranan.
renders as
פֻּ‎ ז‎ֵהֻ‎ שֵׂ דּ‎ג‎ֻ‎ן‎ֹ לֵי ז‎ֵהֻ‎ רֵנבִ גּ‎ִהֵ שֵׂ ג‎ִג‎נֻ‎ שׁ‎ַ לֵ תּ‎ֵמשׁ‎ִ בֵּ לֵ נֻ‎ לֵ דּ‎ז‎ֵנַ שׁ‎ַ ג‎מִבֵ בַופַ לֵדּ‎ֻ‎הֻ‎ לֵ בּרַדּ‎ִ בֵּ לֵ שֵׂ נֵי בַּ כּלַמַ לַ תּ‎לֻ‎נרַנַנא


A proposal for a Polish Cyrillic alphabet

Here is my proposal for adapting the Cyrillic alphabet to suit the Polish language. Don't use it, it's terrible, but everybody makes mistakes, right?
I wanted to keep most of the distinctions that exist in the current alphabet, even between letters that sound alike.
Following this principle, I kept some letters that could have been omitted without impacting readability too much.
This can lead to the illogical situation of the Cyrillic text being the same length as the Latin one, the shorthands offered by automatically managing palatalisation being left unused.
I don't like digraphs, thus I used some ligatures from the Serbian Cyrillic alphabet. Due to the lack of ligatures for the [ɕ] and [ʑ] sounds, I needed to pick those letters from somewhere else; unfortunately, the letter corresponding to the [ʑ] sound is not present in any Slavic-language alphabet, thus I decided to pick it from the Tatar alphabet instead.
The overall design is influenced by the Russian, Belarusian and Serbian Cyrillic alphabets.

Polish letter(s) Cyrillic equivalent(s) Notes
A a А a
Ą ą Ѫ ѫ Stole that from Common Slavonic.
B b Б б
C c Ц ц
Ć ć Ћ ћ
Cz cz Ч ч
Ci ci Ћи ћи Note that if the „ci” is followed by a vowel, the soft version of the latter shall be used, e.g. „Ciebie”: « Ћебе ». This applies to all consonants (although the special ones mentioned in this table shall be softened using a ь or a ligature).
D d Д д
Dz dz Ѕ ѕ
Dź dź Ђ ђ
Dż dż Џ џ
Dzi dzi Ђи ђи Ligatures spare us the hideous « Дзьи ».
E e Э э
Ę ę Ѧ ѧ See ‚ą’.
F f Ф ф
G g Г г
H h / Ch ch Х х
I i И и
J j Й й
K k К к
L l Л л
Ł ł Љ љ I first wanted to use the Belarusian « ў », but it would have lost the etymology, thus I decided to use the Serbian « љ », although it was originally designed for the „lj” sound.
M m М м
N n Н н
Ń ń Њ њ
Ni ni Њи њи
O o О о
Ó ó О́ о́ For the sake of etymology. It was either this or 1) an also ugly « у » with an acute accent or 2) losing etymology.
P p П п
R r Р р
Rz rz Р̌ р̌ This sucks. I use it to preserve etymology, but it sucks.
S s С с
Ś ś Щ щ Decided to reuse the letter making the same sound in Russian.
Sz sz Ш ш
T t Т т
U u У у
W w В в
Y y Ы ы
Z z З з
Ź ź Җ җ Here comes the Tatar alphabet. It looks kind of symmetrical to the Russian Щ, thus I like it.
Ż ż Ж ж
Ja ja Я я
Ją ją Ѭ ѭ See also ‚ą’ and ‚ę’.
Je je Е е
Ję ję Ѩ ѩ See also ‚ą’, ‚ę’ and ‚ją’.
Ji ji / Jy jy І і See below.
Jo jo Ё ё
Jó jó / Ju ju Ю ю Never seen a „jó” in my life, thus I assume that it does not exist and mix it with the „ju”.

Here is the alphabet in traditional Eastern-Slavic order:

А Б В Г Д Ѕ Ђ Џ Е Ё Ж Җ З И Й К Л Љ М Н Њ О О́ П Р Р̌ С Т У Ф Х Ц Ћ Ч Ш Щ Ы Ь Э Ю Я І Ѫ Ѧ Ѭ Ѩ

This leaves us with 46 letters (including ligatures), compared to 35 letters for the Latin-ish alphabet, although the latter uses a lot of digraphs.
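
The table can be applied with a greedy longest-match scan, so that „dzi” wins over „dz” and „cz” over „c”. The Python sketch below is only an approximation of the scheme: `POLISH_TO_CYRILLIC` and `to_cyrillic` are invented names, the input is assumed to use precomposed letters (ą, ć, ż...), and the rule replacing a consonant + „i” + vowel by a soft vowel (as in „Ciebie”) is deliberately left out.

```python
# Sketch only: direct table lookups, no softening of following vowels.
POLISH_TO_CYRILLIC = {
    "a": "а", "ą": "ѫ", "b": "б", "c": "ц", "ć": "ћ", "cz": "ч",
    "ci": "ћи", "d": "д", "dz": "ѕ", "dź": "ђ", "dż": "џ", "dzi": "ђи",
    "e": "э", "ę": "ѧ", "f": "ф", "g": "г", "h": "х", "ch": "х",
    "i": "и", "j": "й", "k": "к", "l": "л", "ł": "љ", "m": "м",
    "n": "н", "ń": "њ", "ni": "њи", "o": "о", "ó": "о\u0301", "p": "п",
    "r": "р", "rz": "р\u030c", "s": "с", "ś": "щ", "sz": "ш", "t": "т",
    "u": "у", "w": "в", "y": "ы", "z": "з", "ź": "җ", "ż": "ж",
    "ja": "я", "ją": "ѭ", "je": "е", "ję": "ѩ", "ji": "і", "jy": "і",
    "jo": "ё", "jó": "ю", "ju": "ю",
}


def to_cyrillic(text):
    """Transliterate Polish text, preferring the longest table entry."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            mapped = POLISH_TO_CYRILLIC.get(chunk.lower())
            if mapped is not None:
                # Carry over an initial capital, e.g. "Sz" -> "Ш".
                if chunk[0].isupper():
                    mapped = mapped[0].upper() + mapped[1:]
                out.append(mapped)
                i += len(chunk)
                break
        else:
            out.append(text[i])  # not in the table: keep as-is
            i += 1
    return "".join(out)
```

Words without palatalised clusters, such as „rysunek” or „dżungli”, come out the same as in the sample text; anything involving a softened consonant still needs the manual rules from the Notes column.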

Below is an example of the script in action, on the beginning of Le Petit Prince by Antoine de Saint-Exupéry:

Mały Książę

Antoine de Saint-Exupéry

Мaљы Кщѭжѧ

Антљaн ды Сaнкт-Экзупэри

Kiedy miałem sześć lat, zobaczyłem pewnego razu wspaniały obrazek w książce o dżungli
zatytułowanej Historie prawdziwe. Przedstawiał węża boa połykającego Iwa. Oto kopia
tego rysunku:
[rysunek]

W książce tej napisano: „Węże boa połykają zdobycz w całości, bez przeżuwania.
Potem nie mogą się ruszyć i zapadają w półroczny sen, w czasie którego trawią”.
Odtąd sporo rozmyślałem o przygodach w dżungli, aż w końcu udało mi się nakreślić
kredką mój pierwszy rysunek. To był mój rysunek numer 1. Wyglądał tak:
[rysunek]

 

Кеды мяљэм шэщћ лат, зобачыљэм пэвнэго разу вспаниаљы образэк в кщѭжцэ о џунгли
затытуљованэй Хисторе правђивэ. Пр̌эдставяљ вѧжа боа пољыкаѭцэго Ива. Ото копя
тэго рысунку:
[рысунэк]

В кщѭжцэ тэй написано: « Вѧжэ боа пољыкаѭ здобыч в цаљощћи, бэз пр̌эжувањя.
Потэм ње могѫ щѩ рушыћ и западаѭ в по́љрочны сэн, в чаще кто́рэго травѭ. »
Одтѫд спoрo розмыщлаљэм о пр̌ыгодах в џунгли, аж в коњцу удаљо ми щѩ накрэщлић
крэдкѫ мо́й первшы рысунэк. То быљ мо́й рысунэк нумэр 1. Выглѫдаљ так :
[рысунэк]


The (Internet) Voskhod Protocol — 22.VIII.2022

The (Internet) Voskhod Protocol is a small Internet protocol designed for fast and simple document exchange on a small to medium size network.
It is roughly similar to Gopher, while trying to fix its flaws. You can find its specification here.


How I license my programs — 28.IV.2022


GPLv3 translation - 25.IV.2022

Here's my attempt to translate the GNU General Public License version 3 into French. It probably contains a lot of typos and incomprehensible parts; if you spot some, please let me know by e-mail.

By the way, a quick note on the GPL that I cannot include in the translation: please don't add the “or any later version” clause to your copyright headers.
Please don't automatically accept a license you haven't read. Even if GNU says that

Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns

those details can cause problems, just as the tivoization clause did for quite a lot of people when updating from version 2 to version 3.
So please just say “GPL-3.0-only”.
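
One common way to state this unambiguously is an SPDX license identifier at the top of each source file. The sketch below shows such a header on a Python file, together with a small helper that reads the tag back. The helper name `declared_license` and its regular expression are made up for this example and are not part of any SPDX tooling.

```python
# SPDX-License-Identifier: GPL-3.0-only
import re

# Matches the standard SPDX tag and captures the identifier that follows it.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def declared_license(source_text):
    """Return the SPDX identifier declared in a source file, or None."""
    match = SPDX_TAG.search(source_text)
    return match.group(1) if match else None
```

A file carrying the header above reports `GPL-3.0-only`, whereas one using the “or any later version” clause would report `GPL-3.0-or-later`.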