Alexander Gromnitsky's Blog

Making Debian or Fedora persistent live images

Air Date:
Latest update:

When you download a 'live' ISO & dd it to a USB drive, you notice that all your tweaks & installed packages vanish after a reboot. If you think about how most such 'live' ISOs work, it becomes apparent why:

$ parted -s Fedora-Xfce-Live-44-1.7.x86_64.iso print free | grep '^[PN ]'
Partition Table: gpt
Number  Start   End     Size    File system  Name       Flags
 1      32.8kB  2897MB  2897MB               ISO9660    hidden, msftdata
 2      2897MB  2929MB  31.5MB  fat16        Appended2  boot, esp
        2929MB  2929MB  512B    Free Space

ISO9660 is a read-only filesystem, & the fact it was written onto a writable medium is irrelevant: its fs driver contains no implementation for writing data blocks, & the Linux VFS layer immediately returns EROFS (code 30, Read-only file system) when it sees that a fs was mounted read-only.

A common workaround is to use OverlayFS; in the case of 'live' ISOs, to do an overlay with a chunk of RAM.

Obviously, you can do an overlay with a filesystem that supports write operations instead, like ext4, but inside the Live ISO there isn't one, & hence there is nothing to do an overlay with.

While you can always create an ext4 partition manually, how do you tell the 'live' OS to use it during boot? This distro corner has no standardisation whatsoever, & everybody is doing it in their own unique way. E.g., Debian & Ubuntu have diverged so much throughout the years that even the kernel parameters for their 'persistence' implementations differ. While it may seem logical to an impartial spectator to keep at least the user-facing interface the same between the distros, it's not how it is done in practice.

Ubuntu

  • Kernel parameter: "persistent".
  • An (empty) partition must have the label "casper-rw".

What is annoying is that it's surprisingly non-obvious to detect whether such a trick worked: if your partition is /dev/sda4, & Ubuntu does not show it as mounted, & /cow is roughly the size of /dev/sda4:

$ df -h | grep cow
/cow           9.8G  161M  9.2G   2% /

then persistence is on. If, on the other hand, you see this:

$ df -h | grep casper
/dev/disk/by-label/casper-rw  9.8G  161M  9.2G   2% /var/log

you most likely mistyped the word persistent.

The next issue is how to save grub parameters in the .iso. As it's absolutely useless to mount it to modify files, you can either extract everything from the .iso, edit what you want in grub.cfg, & recreate the image, or, alternatively, do a simple 12-byte to 12-byte swap:

$ export LANG=C
$ sed -i 's/quiet splash/persistent  /' xubuntu-26.04-desktop-amd64.iso

It's amusingly hacky, but works. If your replacement string is not equal in length to the pattern, you'll corrupt the ISO9660 filesystem, & grub will refuse to boot the kernel.
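The same-length constraint can be made explicit in code. Below is a minimal sketch, not what the sed one-liner does internally: patchSameLength is a hypothetical helper that pads the replacement with spaces & refuses a string that would not fit. It operates on an in-memory Buffer, so a real multi-gigabyte .iso would have to be processed in chunks.

```javascript
// Hypothetical helper: overwrite every occurrence of `from` with `to`
// without changing the size of the image.
function patchSameLength(image, from, to) {
  const pattern = Buffer.from(from, 'latin1')
  if (Buffer.byteLength(to, 'latin1') > pattern.length) {
    throw new RangeError(`"${to}" does not fit into "${from}"`)
  }
  // Pad with spaces so the replacement occupies exactly the same bytes.
  const replacement = Buffer.from(to.padEnd(pattern.length, ' '), 'latin1')
  let count = 0
  let offset = image.indexOf(pattern)
  while (offset !== -1) {
    replacement.copy(image, offset)
    count++
    offset = image.indexOf(pattern, offset + pattern.length)
  }
  return count // number of patched occurrences
}

// A tiny stand-in for the grub.cfg bytes inside the image.
const cfg = Buffer.from('linux /casper/vmlinuz quiet splash ---\n', 'latin1')
console.log(patchSameLength(cfg, 'quiet splash', 'persistent'), cfg.length)
```

The length printed after patching is the same as before it, which is the whole point: no byte after the pattern moves.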

(See a github sample for a script that does all this; it assumes a Linux host & injects an ext4 partition into a copy of an .iso. You can always resize the partition (& its filesystem) in real time using the Disks utility that the .iso ships with.)

Debian

  • Kernel parameter: "persistence".
  • A partition must have:
    • the label "persistence";
    • a file named persistence.conf in the root of the partition with a line akin to "/ union".

Notice that it was "persistent" for Ubuntu, but it's "persistence" for Debian. Why not.

The same mechanism of rude byte swapping in the .iso applies here too:

$ export LANG=C
$ sed -i 's/splash quiet/persistence /' debian-live-13.5.0-amd64-xfce.iso

Detecting a successful overlay is easier:

$ mount | grep sda3
/dev/sda3 on /run/live/persistence/sda3 type ext4 (rw,noatime)
overlay on / type overlay (rw,noatime,lowerdir=/run/live/rootfs/filesystem.squashfs/,upperdir=/run/live/persistence/sda3/rw,workdir=/run/live/persistence/sda3/work,redirect_dir=on)

Fedora

  • Kernel parameters: "selinux=0 rd.live.overlay=LABEL=foo:/bar".
  • A partition must have:
    • a label "foo" (choose whatever you want, but it must correspond to the value in the kernel parameter);
    • a "bar" directory (again, see the kernel parameter);
    • an "ovlwork" directory (this is a hardcoded name).

To check:

$ df -h | grep sdb1
/dev/sdb1        9.8G  134M  9.1G   2% /run/initramfs/overlayfs
$ mount | grep Live
LiveOS_rootfs on / type overlay (rw,relatime,lowerdir=/run/rootfsbase,upperdir=/run/overlayfs,workdir=/run/ovlwork)
$ file /run/overlayfs
/run/overlayfs: symbolic link to /run/initramfs/overlayfs/bar

In the case of Fedora, this is all mostly useless. Its 'linux' loader command in grub.cfg menu entries contains no space to sacrifice for a different 40-byte-long string. You, of course, can delete one menu entry completely & substitute it with your own, but this would be rather fragile: if, in the next version of Fedora, the size of grub.cfg changes, your script will corrupt the underlying ISO9660 filesystem.

If the only reliable way here is to extract the rootfs from the .iso to edit it, why bother with overlays then? This is what Fedora Live mounts during boot:

$ isoinfo -i Fedora-Xfce-Live-44-1.7.x86_64.iso -Jf | grep -i liveos/
/LiveOS/squashfs.img

Despite its name, it's a 2.6GB EROFS image file (the name is a pun on a generic EROFS error code).

The image contains everything, including the kernel & initramfs. We can just create 2 image files:

  1. a FAT32 one to hold EFI/BOOT/BOOTX64.EFI, alongside the kernel & initramfs;
  2. an ext4 one with a label, say "Fedora-Live", into which we extract the contents of squashfs.img.

The ext4 partition can be of any length, & our 'live' Fedora image will have space to hold user files without any shenanigans with overlays.

After creating these 2 images, we combine them into 1 (with a GPT layout), & dd it onto a USB drive.

grub.cfg can be as short as:

set timeout=3
menuentry "Fedora Live" {
  linux /vmlinuz rd.live.image root=LABEL=Fedora-Live rw noresume
  initrd /initramfs
}

The rd.live.image parameter is required for systemd to start the livesys service; otherwise, no liveuser will be created.

The mechanism works for any official Fedora spin.

See another github sample for a script that does all this. For a quick test in QEMU, you'll need to specify a UEFI bios:

$ sudo ./mflip 10G Fedora-Xfce-Live-44-1.7.x86_64.iso out.img

$ alias qemu3d='qemu-kvm -machine q35 \
   -bios /usr/share/OVMF/OVMF_CODE.fd -m 4G \
   -display gtk,gl=on -smp 2 \
   -device virtio-vga-gl,hostmem=2G,blob=true,venus=true'

$ qemu3d out.img

Tags: ойті
Authors: ag

Obfuscating Image Links

Air Date:
Latest update:

I noticed this recently, though it started happening about a year ago. On some websites (archive.org's bookreader), a normal <img> tag suddenly began to look like some insane MS Internet Explorer extension from 1998:

<img src="blob:https://example.com/b501e863-fe43-4b63-ae5d-dac14cac097e">

The web page that contains it renders the image fine, but when a fairly naïve user posts that link into a chat, it results in nothing: the blob referenced by <img> exists only in the memory of a specific browser instance, & the server will return 404 for any https://example.com/UUID.

Why do they do this? Every n years some people get scared of hotlinking (bastards keep stealing our traffic), AI luddites try to ruin the business of evil corporations, & fans of toy-level DRM amuse themselves with a new scheme.

If you look at the fetch-requests of such a page, you'll see resources that look like images but actually aren't:

$ url='https://ia800206.us.archive.org/🙈.jpg'

$ curl -sI "$url" | grep -e type -e length -e obfuscate
content-type: image/jpeg
content-length: 267470
x-obfuscate: 1|uoEV6/PZOWtGhOZdVM898w==

$ curl -s "$url" | head -c25 | file -
/dev/stdin: data

Such a .jpg is encrypted. Part of the key is in the X-Obfuscate header, but neither a random AI scraper nor any social network knows about this. The cipher is also not disclosed, & every website may use any scheme it wants: following any recommendations would defeat the entire purpose of obfuscation.

The image-rendering algorithm then becomes:

  1. download the encrypted file;

  2. decrypt it using the key from the corresponding header & push the result into a blob;

  3. create a link to the blob using the URL.createObjectURL function;

  4. inject into the DOM an img element whose src attribute is equal to the newly created link.
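The 4 steps can be sketched roughly as follows. This is not archive.org's code: toBlobUrl & showObfuscated are hypothetical names, & decrypt(bytes, key) stands for whatever scheme a particular site uses. Reading the header from a cross-origin response also assumes that the server exposes it.

```javascript
// Steps 2-3: decrypt the bytes & wrap them in a blob: URL.
// `decrypt(bytes, key)` is a placeholder for a site-specific scheme.
function toBlobUrl(encrypted, key, type, decrypt) {
  const plain = decrypt(encrypted, key)
  return URL.createObjectURL(new Blob([plain], { type: type ?? '' }))
}

// Steps 1 & 4 (browser only): download the file, then inject an <img>.
async function showObfuscated(url, decrypt, parent = document.body) {
  const res = await fetch(url)
  const encrypted = new Uint8Array(await res.arrayBuffer())
  const img = document.createElement('img')
  img.src = toBlobUrl(encrypted, res.headers.get('x-obfuscate'),
    res.headers.get('content-type'), decrypt)
  // The blob URL can be released once the image has loaded.
  img.addEventListener('load', () => URL.revokeObjectURL(img.src), { once: true })
  parent.append(img)
  return img
}
```

Revoking the URL after the load event frees the memory, but it also means the image can no longer be opened in a new tab, which suits the obfuscation goal.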

We can increase entropy further by writing our own custom element:

<img-blob alt="a fluffy cat" src="cat.bin"></img-blob>

that will do all of the above on its own. (I terser'ed the source code of the example for I absolutely don't want you to use it in anything serious: the entire approach is extremely user-hostile & anti-web.)

For efficiency, archive.org AES-CTR-encrypts only the first 1024 bytes of the image. Browsers know about AES but strictly require a secure context, which can be annoying during testing; hence, for mickey mouse DRM we can simply use XOR-encryption.
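A minimal sketch of such XOR 'encryption' follows. The 1024-byte limit mirrors the archive.org behaviour described above, while the function name xorHead & the use of a plain string as a repeating key are my assumptions, not part of any real scheme.

```javascript
// XOR is its own inverse: the same function encrypts & decrypts.
// Only the first `limit` bytes are touched; the rest is copied as-is.
function xorHead(bytes, key, limit = 1024) {
  const k = new TextEncoder().encode(key)
  if (k.length === 0) throw new Error('the key must not be empty')
  const out = Uint8Array.from(bytes) // work on a copy
  const n = Math.min(limit, out.length)
  for (let i = 0; i < n; i++) out[i] ^= k[i % k.length]
  return out
}

// The first bytes of a JPEG stop looking like a JPEG after 1 pass
// & come back after the 2nd.
const jpegMagic = new Uint8Array([0xff, 0xd8, 0xff, 0xe0])
const scrambled = xorHead(jpegMagic, '12345')
console.log(scrambled, xorHead(scrambled, '12345'))
```

Mangling only the header is enough to make file(1) & image decoders give up, which is all this kind of DRM aims for.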

The X-Obfuscate header itself can be obfuscated even more, e.g.:

x-obfuscate: rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt==

looks like a base64 string, but:

$ echo rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt== | base64 -d | xxd
base64: invalid input
00000000: ae55 b630 a5a6 9c9f 6157 06ed 6458 ed57  .U.0....aW..dX.W
00000010: 3832 ac55 7a54 f564 6706 7400 55bd 3e    82.UzT.dg.t.U.>

I checked if DeepSeek could figure it out: it spent 9 minutes & left 2 villages in Zhejiang province without water, was several times very close to the target, but ultimately failed.

The string had been post-processed with rot13:

$ alias rot13="tr 'A-Za-z' 'N-ZA-Mn-za-m'"
$ echo rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt== | rot13 | base64 -d
{"version": 1, "key": "12345"}
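On the client side, the same decoding takes a few lines. This is a sketch: decodeHeader is a hypothetical name, & the JSON shape is simply the one printed above.

```javascript
// rot13 touches only ASCII letters; digits, '+', '/' & '=' pass through.
function rot13(s) {
  return s.replace(/[A-Za-z]/g, (c) => {
    const base = c <= 'Z' ? 65 : 97
    return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base)
  })
}

// Undo the rot13 post-processing, then the base64 layer, then parse JSON.
function decodeHeader(value) {
  return JSON.parse(atob(rot13(value)))
}

const header = 'rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt=='
console.log(decodeHeader(header).key)
```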

As homework, you can add an equivalent of loading="lazy" to the custom element using the Intersection Observer API.


Tags: ойті
Authors: ag

Offline Math: Converting LaTeX to SVG with MathJax

Air Date:
Latest update:

Pandoc can prepare LaTeX math for MathJax via its eponymous --mathjax option. It wraps formulas in <span class="math"> elements and injects a <script> tag that points to cdn.jsdelivr.net, which means rendering won't work offline or if the 3rd-party server fails. You can mitigate this by providing your own copy of the MathJax library, but the mechanism still fails when the target device doesn't support JavaScript (e.g., many epub readers).

At the same time, practically all browsers support MathML. Use it (pandoc's --mathml option), if you care only about the information superhighway: your formulas will look good on every modern device and scale delightfully. Otherwise, SVGs are the only truly portable option.

Now, how can we transform the html produced by

$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
  pandoc -s -f markdown --mathjax

into a fully standalone document where the formula gets converted into SVG nodes?

  1. Use an html parser like Nokogiri, and replace each <span class="math"> node with an image. There are multiple ways to convert a TeX-looking string to an SVG: using MathJax itself (which provides a corresponding CLI example), or by doing it in a 'classical' fashion with pdflatex. (You can read more about this method in A practical guide to EPUB, chapters 3.4 and 4.6.)
  2. Alternatively, load the page into a headless browser, inject MathJax scripts, and serialise the modified DOM back to html.
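For the first approach, each span's text has to be stripped of its delimiters before it is handed to a TeX-to-SVG converter. With --mathjax, pandoc wraps inline math in \( \) and display math in \[ \]. A minimal sketch, where texOf is a hypothetical helper that takes the span's text content (already html-unescaped):

```javascript
// Returns { tex, display } for a pandoc math span, or null if the text
// is not wrapped in a matching \( \) or \[ \] pair.
function texOf(spanText) {
  const m = /^\s*\\([(\[])([\s\S]*)\\([)\]])\s*$/.exec(spanText)
  if (!m) return null
  const display = m[1] === '['
  if (m[3] !== (display ? ']' : ')')) return null // mismatched pair
  return { tex: m[2].trim(), display }
}

console.log(texOf('\\(I = \\frac{V}{R}\\)'))
```

The display flag matters later: a converter usually needs to know whether to typeset the formula inline or as a block.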

I tried the 2nd approach in 2016 with the now-defunct phantomjs. It worked, but debugging was far from enjoyable due to the strangest bugs in phantomjs. I can still run the old code, but it depends on an ancient version of the MathJax library that, for obvious reasons, isn't easily upgradable within the phantomjs pre-es6 environment.

Nowadays, Puppeteer would certainly do, but for this kind of task I prefer something more lightweight.

There's also jsdom. Back in 2016, I tried it as well, but it was much slower than running phantomjs. Recently, I gave jsdom another try and was pleasantly surprised. I'm not sure what exactly tipped the scales: computers, v8, or jsdom itself, but it no longer feels slow in combination with MathJax.

$ wc -l *js *conf.json
  24 loader.js
 105 mathjax-embed.js
  12 mathjax.conf.json
 141 total

Roughly 50% of the code is nodejs infrastructure junk (including command-line parsing), the rest is a MathJax config and jsdom interactions:

import { JSDOM } from 'jsdom'
import { Script } from 'node:vm'

let dom = new JSDOM(html, {
  url: `file://${base}/`,
  runScripts: /* very */ 'dangerously',
  resources: new MyResourceLoader(), // block ext. absolute urls
})

dom.window.my_exit = function() {
  cleanup(dom.window.document) // remove mathjax <script> tags
  console.log(dom.serialize())
}

dom.window.my_mathjax_conf = mathjax_conf // user-provided

let script = new Script(read(`${import.meta.dirname}/loader.js`))
let vmContext = dom.getInternalVMContext()
script.runInContext(vmContext)

The most annoying step here is setting the url property that jsdom uses to resolve relative resource paths. The my_exit() function is called by MathJax when its job is supposedly finished. The loader.js script is executed in the context of the loaded html:

window.MathJax = {
  output: { fontPath: '@mathjax/%%FONT%%-font' },
  startup: {
    ready() {
      MathJax.startup.defaultReady()
      MathJax.startup.promise.then(window.my_exit)
    }
  }
}

Object.assign(window.MathJax, window.my_mathjax_conf)

function main() {
  var script = document.createElement('script')
  script.src = 'mathjax/startup.js'
  document.head.appendChild(script)
}

document.addEventListener('DOMContentLoaded', main)
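MyResourceLoader isn't shown in the excerpt above. The decision it has to make ('block ext. absolute urls') could look like the sketch below. isLocalResource is a hypothetical name, and the file:-only rule is my guess at the policy, not the actual implementation. How such a predicate is wired into a resource loader depends on the jsdom version.

```javascript
// Resolve `url` against the document's base and allow it only if the
// result stays on the local filesystem.
function isLocalResource(url, baseUrl) {
  let target
  try {
    target = new URL(url, baseUrl)
  } catch {
    return false // unparsable urls are never loaded
  }
  return target.protocol === 'file:'
}

const base = 'file:///tmp/work/'
console.log(isLocalResource('mathjax/startup.js', base))
console.log(isLocalResource('https://cdn.jsdelivr.net/npm/mathjax', base))
```

Relative paths such as mathjax/startup.js resolve against the file:// base and pass, while anything that names a remote host is rejected.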

The full source is on Github.

Intended use is as follows:

$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
  pandoc -s -f markdown --mathjax |
  mathjax-embed > 1.html

The resulting html doesn't use JavaScript and doesn't fetch any external MathJax resources. The mathjax-embed script itself always works offline.


Tags: ойті
Authors: ag

The Size of Adobe Reader Installers Through The Years

Air Date:
Latest update:

At the time of writing, the most recent Adobe Reader 25.x.y.z 64-bit installer for Windows 11 weighs 687,230,424 bytes. After installation, the program includes 'AI' (of course), an auto-updater, ads for Acrobat online services sprinkled everywhere, and 2 GUIs: 'new' and 'old'.

For comparison, the size of SumatraPDF-3.5.2 installer is 8,246,744 bytes. It has no 'AI', no auto-updater (though it can check for new versions, which I find unnecessary, for anyone sane would install it via scoop anyway), and no ads for 'cloud storage'.

The following chart shows how the Adobe Reader installer has grown in size over the years. When possible, 64-bit versions of installers were used.

[Chart: adobe reader vs sumatrapdf]

Next Day Update:

Best comment on Hacker News: "Looks like a chart crime scene."

Alright, here's your linear graph, along with the source from which both graphs were generated. All point labels are version numbers.

[Chart: adobe reader vs sumatrapdf (linear scale)]

Tags: ойті
Authors: ag