Alexander Gromnitsky's Blog

In search of a decent offline Android Polish dictionary

Dime a dozen Play Store apps

As always, none of them are any good, since nobody who wrote those apps actually uses them. Most are adware &/or junkware that crashes on random input.

One curious app is PWN-Oxford Dictionary, which seems to ship a bespoke dictionary from the legitimate Polish book publisher Wydawnictwo Naukowe PWN. The Play Store says it's "not available for your device", though. At first, I thought the app had never been updated for Android 10+, but after opening the listing on a tablet running Android 8.x, I got a message saying the app isn't available in my region (Ukraine). Great.

A generic dictionary app + a separate Polish dictionary

Due to the peculiarities of Polish orthography, such an app should support diacritic-insensitive lookups. I settled on aard2.
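
To make "diacritic-insensitive" concrete, here is a minimal Python sketch of one way such a lookup could work. It is an illustration only and not how aard2 implements it; the names fold and build_index are made up. The one Polish-specific wrinkle is that ł has no Unicode decomposition, so it has to be mapped by hand.

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Return a lowercase, diacritic-free lookup key for a Polish word."""
    # ł/Ł have no canonical decomposition, so NFD leaves them intact;
    # map them to plain l/L by hand.
    word = word.replace("ł", "l").replace("Ł", "L")
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks (acute, ogonek, dot above) left by NFD.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(headwords):
    """Group headwords by their folded key (hypothetical helper)."""
    index = defaultdict(list)
    for headword in headwords:
        index[fold(headword)].append(headword)
    return index


index = build_index(["żółć", "łaska", "laska", "kot"])
print(index[fold("zolc")])   # a query typed without diacritics
print(index[fold("laska")])  # both łaska and laska share this key
```

With such an index, typing zolc on a keyboard without a Polish layout still finds żółć, and ambiguous keys such as laska return every matching headword.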

The next step was to find a dictionary in the slob format.

sjp.pl + wiktionary

Some Good Samaritan has scraped sjp.pl (Słownik języka polskiego, a crowdsourced dictionary) & augmented its definitions with material from Wikisłownik (pl.wiktionary.org).

The dictionary comes pre-formatted for Kindle, but there is also a .txt version in a simple word\tdefinition form. I grabbed SJP_202108161949.txt.zip & fed it to pyglossary, filtering out entries without useful content:

$ unzip -p SJP_202108161949.txt.zip | sed -E 's/\t[^<]+(<h1)/\t\1/' \
| grep -v '</h1>$' \
| grep -v '</h1><p><b<p class="s">Wikipedia</p>$' > sjp.txt
$ pyglossary/main.py sjp.txt sjp.slob

$ du *slob
27820K sjp.slob
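
If the sed/grep one-liner is hard to read, the same filtering can be sketched in Python roughly as follows. The function name clean_entries and the sample lines are made up; the regular expression and the two suffixes are copied from the commands above.

```python
import re

# Same substitution as the sed call: drop any text that sits between
# the tab and the first <h1 of the definition.
LEAD_JUNK = re.compile(r"\t[^<]+(<h1)")

# Same suffixes as the two `grep -v` calls, copied verbatim from the pipeline.
EMPTY_SUFFIXES = (
    "</h1>",
    '</h1><p><b<p class="s">Wikipedia</p>',
)


def clean_entries(lines):
    """Yield tabfile lines that still carry a definition after cleanup."""
    for line in lines:
        line = line.rstrip("\n")
        # sed without the g flag replaces only the first match, hence count=1.
        line = LEAD_JUNK.sub(r"\t\1", line, count=1)
        if line.endswith(EMPTY_SUFFIXES):
            continue  # heading only, nothing useful to show
        yield line


# Made-up sample lines, not taken from the real dictionary.
sample = [
    "kot\tkot, kota<h1>kot</h1><p>zwierzę domowe</p>\n",
    "pies\t<h1>pies</h1>\n",
]
cleaned = list(clean_entries(sample))
print(cleaned)
```

The first sample line loses the stray text before its heading; the second is dropped because it ends right after the heading.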

PWN

The same company that made the geolocked app above published desktop dictionaries for Windows in the 2000s. Someone has uploaded Słownik języka polskiego PWN as an .iso to archive.org. I'm not sure it falls under the category of abandonware, so make of it what you will. Out of curiosity I tried to run it on Windows 11 & then in a Windows 7 VM, but the program's installer wouldn't even start. It did run successfully in a Windows 2000 VM.

Anyway, this desktop program contains a file named slo.win (59M). We can do a 2-step conversion: .win → .txt → slob, where .txt means the so-called tabfile format, in pyglossary's parlance.
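
Since the tabfile is the hand-off point between the two steps, a quick sanity check of it can be handy before converting. The sketch below assumes only the word\tdefinition layout mentioned earlier, one entry per line; read_tabfile is a made-up helper, not part of pyglossary, and the sample lines are invented.

```python
def read_tabfile(lines):
    """Split tabfile lines into (headword, definition) pairs.

    Returns the parsed entries plus the 1-based numbers of lines that
    lack a tab or have an empty headword or definition.
    """
    entries, malformed = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Only the first tab separates the headword from the definition.
        word, sep, definition = line.partition("\t")
        if not sep or not word or not definition:
            malformed.append(number)
            continue
        entries.append((word, definition))
    return entries, malformed


# Invented sample lines for illustration.
entries, malformed = read_tabfile([
    "kot\t<p>zwierzę domowe</p>\n",
    "no tab on this line\n",
    "pies\t\n",
])
print(len(entries), "entries; malformed lines:", malformed)
```

On a real file you would pass an open file object instead of a list; the text encoding to use there is something to verify on your own output.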

Prerequisites:

  1. # dnf install unshield bsdtar
  2. Clone pwn2dict & pyglossary repos.

Save this Makefile

i := Slownik.iso
pwn2dict := ~/Downloads/pwn2dict/pwn2dict.py
pyglossary := ~/Downloads/pyglossary/main.py
out := _out
cache := $(out)/cache

# tabfile -> slob (the default target)
$(out)/pwn_słownik.slob: $(cache)/pwn_słownik.txt
	$(pyglossary) $< $@

# slo.win -> tabfile
$(cache)/%.txt: $(cache)/Tekst/Data/slo.win
	$(pwn2dict) -t $< $@

# unpack the Tekst group from the InstallShield cabinet
$(cache)/%.win: $(cache)/setup/data1.cab $(cache)/setup/data1.hdr
	unshield -d $(cache) -g Tekst x $<

# extract the cabinet files from the .iso
$(cache)/setup/%: $(i)
	@mkdir -p $(dir $@)
	bsdtar -xf $< -C $(cache) setup/$*

in the same directory as Slownik.iso, correct the values of the pwn2dict & pyglossary variables if necessary, then run make. The result

$ du _out/*slob
9892K _out/pwn_słownik.slob

is about 6 times smaller than the original .win file.
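
Make resolves the rules in the Makefile from the final target backwards, which can be hard to follow on a first read. As an illustration only, here is the same chain written out in execution order as a Python sketch that builds the command lines without running them. The build_steps function is hypothetical; the commands and paths mirror the Makefile.

```python
from pathlib import PurePosixPath


def build_steps(iso="Slownik.iso",
                pwn2dict="~/Downloads/pwn2dict/pwn2dict.py",
                pyglossary="~/Downloads/pyglossary/main.py",
                out="_out"):
    """Return the argv lists make would run, in execution order."""
    cache = PurePosixPath(out) / "cache"
    cab = cache / "setup" / "data1.cab"
    win = cache / "Tekst" / "Data" / "slo.win"
    txt = cache / "pwn_słownik.txt"
    slob = PurePosixPath(out) / "pwn_słownik.slob"
    return [
        # 1. pull the two InstallShield cabinet files out of the .iso
        ["mkdir", "-p", str(cache / "setup")],
        ["bsdtar", "-xf", iso, "-C", str(cache), "setup/data1.cab"],
        ["bsdtar", "-xf", iso, "-C", str(cache), "setup/data1.hdr"],
        # 2. unpack the Tekst group, which holds Data/slo.win
        ["unshield", "-d", str(cache), "-g", "Tekst", "x", str(cab)],
        # 3. slo.win -> tabfile
        [pwn2dict, "-t", str(win), str(txt)],
        # 4. tabfile -> slob
        [pyglossary, str(txt), str(slob)],
    ]


steps = build_steps()
for argv in steps:
    print(" ".join(argv))
```

Nothing is executed here; the point is only to show which tool consumes which file at each stage.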

pwnsjp

The following has nothing to do with Android, but it brought a smile to my face.

The slo.win file can be reused on a regular machine with pwnsjp, a neat little ncurses viewer for various PWN dictionaries. Compile it, then run it as

$ pwnsjp -f /path/to/slo.win


Tags: ойті
Authors: ag