Air Date:
Latest update:
I noticed this recently, though it started happening about a year ago.
On some websites (archive.org's bookreader), a normal <img> tag
suddenly began to look like some insane MS Internet Explorer extension
from 1998:
<img src="blob:https://example.com/b501e863-fe43-4b63-ae5d-dac14cac097e">
The web page that contains it renders the image fine, but when a
fairly naïve user posts that link into a chat, it results in
nothing: the blob referenced by <img> exists only in the memory of a
specific browser instance, & the server will return 404 for
any https://example.com/UUID.
Why do they do this? Every n years some people get scared of
hotlinking (bastards keep stealing our traffic), AI luddites try to
ruin the business of evil corporations, & fans of toy-level DRM amuse
themselves with a new scheme.
If you look at the fetch requests of such a page, you'll see
resources that look like images but actually aren't:
$ url='https://ia800206.us.archive.org/🙈.jpg'
$ curl -sI "$url" | grep -e type -e length -e obfuscate
content-type: image/jpeg
content-length: 267470
x-obfuscate: 1|uoEV6/PZOWtGhOZdVM898w==
$ curl -s "$url" | head -c25 | file -
/dev/stdin: data
Such a .jpg is encrypted. Part of the key is in the X-Obfuscate
header, but neither a random AI scraper nor any social network knows
about this. The cipher is also not disclosed, & every website may use
any scheme it wants: following any recommendations would defeat the
entire purpose of obfuscation.
The image-rendering algorithm then becomes:
1. download the encrypted file;
2. decrypt it using the key from the corresponding header & push the
   result into a blob;
3. create a link to the blob using the URL.createObjectURL function;
4. inject into the DOM an img element whose src attribute is equal
   to the newly created link.
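The steps above can be sketched roughly like this. It is an illustration only, not archive.org's code: blobUrlFor, injectImage and the decrypt callback are made-up names, and the cipher is deliberately left to the caller.

```javascript
// Hypothetical sketch: `decrypt(bytes, header)` stands for whatever scheme
// the site invented; it must return (or resolve to) a Uint8Array.
async function blobUrlFor(url, decrypt, fetchFn = fetch) {
  const res = await fetchFn(url)                        // 1. download
  const header = res.headers.get('x-obfuscate')         //    key material
  const encrypted = new Uint8Array(await res.arrayBuffer())
  const plain = await decrypt(encrypted, header)        // 2. decrypt
  const type = res.headers.get('content-type') || 'application/octet-stream'
  const blob = new Blob([plain], { type })              //    push into a blob
  return URL.createObjectURL(blob)                      // 3. blob: link
}

// 4. Browser-only part: inject an <img> whose src is the blob: link.
async function injectImage(parent, url, decrypt) {
  const img = document.createElement('img')
  const src = await blobUrlFor(url, decrypt)
  img.onload = () => URL.revokeObjectURL(src)  // release the blob once decoded
  img.src = src
  parent.appendChild(img)
  return img
}
```

The fetchFn parameter exists only so that the first three steps can be exercised without a network; in a page you would call injectImage(document.body, url, decrypt).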
We can increase entropy further by writing our own custom element:
<img-blob alt="a fluffy cat" src="cat.bin"></img-blob>
that will do all of the above on its own. (I terser'ed the source
code of the example for I absolutely don't want you to use it in
anything serious: the entire approach is extremely user-hostile &
anti-web.)
For efficiency, archive.org AES-CTR-encrypts only the first 1024 bytes
of the image. Browsers support AES via the Web Crypto API, but that API
strictly requires a secure context, which can be annoying during
testing; hence, for Mickey Mouse DRM we can simply use XOR encryption.
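A toy XOR variant might look like the sketch below. The name xorPrefix, the repeating-key scheme and the default limit of 1024 bytes are my assumptions, chosen to mirror the 'encrypt only the beginning' idea; none of it is taken from a real site.

```javascript
// Toy "cipher": XOR the first `limit` bytes with a repeating key.
// XOR is its own inverse, so the same call both scrambles and restores.
function xorPrefix(data, key, limit = 1024) {
  const out = Uint8Array.from(data)      // copy; bytes past `limit` stay as-is
  const n = Math.min(limit, out.length)
  for (let i = 0; i < n; i++) out[i] ^= key[i % key.length]
  return out
}

const key = new TextEncoder().encode('12345')             // arbitrary demo key
const jpegMagic = Uint8Array.of(0xff, 0xd8, 0xff, 0xe0)   // start of a JPEG
const scrambled = xorPrefix(jpegMagic, key)   // no longer looks like a JPEG
const restored = xorPrefix(scrambled, key)    // same bytes as jpegMagic
```

Scrambling the first bytes is enough to hide the magic number, which is why file(1) reports plain 'data' for such a download.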
The X-Obfuscate header itself can be obfuscated even more, e.g.:
x-obfuscate: rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt==
looks like a base64 string, but:
$ echo rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt== | base64 -d | xxd
base64: invalid input
00000000: ae55 b630 a5a6 9c9f 6157 06ed 6458 ed57 .U.0....aW..dX.W
00000010: 3832 ac55 7a54 f564 6706 7400 55bd 3e 82.UzT.dg.t.U.>
I checked if DeepSeek could figure it out: it spent 9 minutes & left
2 villages in Zhejiang province without water, came very close to the
target several times, but ultimately failed.
The string had been post-processed with rot13:
$ alias rot13="tr 'A-Za-z' 'N-ZA-Mn-za-m'"
$ echo rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt== | rot13 | base64 -d
{"version": 1, "key": "12345"}
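On the browser side the same two layers can be peeled off in a few lines. This is a sketch: parseObfuscateHeader is a name I made up, and it assumes the payload is JSON, as in the example above.

```javascript
// rot13 touches only ASCII letters; digits, '+', '/' and '=' pass through,
// which is why the result still looks like base64.
function rot13(s) {
  return s.replace(/[A-Za-z]/g, c => {
    const base = c <= 'Z' ? 65 : 97      // 'A' or 'a'
    return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base)
  })
}

// Undo both layers and parse the JSON payload of the header value.
function parseObfuscateHeader(value) {
  return JSON.parse(atob(rot13(value)))
}

const conf = parseObfuscateHeader('rlW2MKWmnJ9hVwbtZFjtVzgyrFV6VPVkZwZ0AFW9Pt==')
console.log(conf.key)  // 12345
```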
As homework, you can add an equivalent of loading="lazy" to the
custom element using the Intersection Observer API.
Tags: ойті
Authors: ag
Air Date:
Latest update:
Pandoc can prepare LaTeX math for MathJax via its eponymous
--mathjax option. It wraps formulas in <span class="math">
elements and injects a <script> tag that points to
cdn.jsdelivr.net, which means rendering won't work offline or if
the 3rd-party server fails. You can mitigate this by
providing your own copy of the MathJax library, but the mechanism
still fails when the target device doesn't support JavaScript (e.g.,
many epub readers).
At the same time, practically all browsers support MathML. Use it
(pandoc's --mathml option) if you care only about the information
superhighway: your formulas will look good on every modern device and
scale delightfully. Otherwise, SVGs are the only truly portable
option.
Now, how can we transform the html produced by
$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
pandoc -s -f markdown --mathjax
into a fully standalone document where the formula gets converted into
SVG nodes?
- Use an html parser like Nokogiri, and replace each
<span class="math"> node with an image. There are multiple ways to
convert a TeX-looking string to an SVG: using MathJax itself (which
provides a corresponding CLI example), or by doing it in a
'classical' fashion with pdflatex. (You can read more about this
method in A practical guide to EPUB, chapters 3.4 and 4.6.)
- Alternatively, load the page into a headless browser, inject
MathJax scripts, and serialise the modified DOM back to html.
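To make the 1st approach concrete, here is a deliberately crude sketch that swaps a regular expression in for a real html parser. It assumes that pandoc writes inline math as <span class="math inline">\(...\)</span> and display math as <span class="math display">\[...\]</span>, with the TeX html-escaped; replaceMath and the render callback are hypothetical names, and the actual TeX-to-SVG conversion is left to the caller.

```javascript
// Assumed shape of pandoc --mathjax output (see the note above).
const MATH_SPAN =
  /<span class="math (inline|display)">\\[(\[]([\s\S]*?)\\[)\]]<\/span>/g

// Undo the html escaping of the TeX source; '&amp;' goes last on purpose.
function unescapeHtml(s) {
  return s.replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"').replace(/&#39;/g, "'").replace(/&amp;/g, '&')
}

// `render(tex, isDisplay)` is a hypothetical callback returning SVG markup.
function replaceMath(html, render) {
  return html.replace(MATH_SPAN, (_, kind, tex) =>
    render(unescapeHtml(tex), kind === 'display'))
}
```

A regular expression is fine for a demonstration, but a proper parser (Nokogiri or similar) is the safer choice once the input is not under your control.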
I tried the 2nd approach in 2016 with the now-defunct phantomjs. It
worked, but debugging was far from enjoyable due to the strangest bugs
in phantomjs. I can still run the old code, but it depends on an
ancient version of the MathJax library that, for obvious reasons,
isn't easily upgradable within the phantomjs pre-es6 environment.
Nowadays, Puppeteer would certainly do, but for this kind of task
I prefer something more lightweight.
There's also jsdom. Back in 2016, I tried it as well, but it was much
slower than running phantomjs. Recently, I gave jsdom another try and
was pleasantly surprised. I'm not sure what exactly tipped the scales:
computers, v8, or jsdom itself, but it no longer feels slow in
combination with MathJax.
$ wc -l *js *conf.json
24 loader.js
105 mathjax-embed.js
12 mathjax.conf.json
141 total
Roughly 50% of the code is nodejs infrastructure junk (including
command-line parsing), the rest is a MathJax config and jsdom
interactions:
let dom = new JSDOM(html, {
  url: `file://${base}/`,
  runScripts: /* very */ 'dangerously',
  resources: new MyResourceLoader(), // block ext. absolute urls
})
dom.window.my_exit = function() {
  cleanup(dom.window.document) // remove mathjax <script> tags
  console.log(dom.serialize())
}
dom.window.my_mathjax_conf = mathjax_conf // user-provided
let script = new Script(read(`${import.meta.dirname}/loader.js`))
let vmContext = dom.getInternalVMContext()
script.runInContext(vmContext)
The most annoying step here is setting the url property, which jsdom
uses to resolve relative resource paths. The my_exit() function is
called by MathJax when its job is supposedly finished. The loader.js
script is executed in the context of the loaded html:
window.MathJax = {
  output: { fontPath: '@mathjax/%%FONT%%-font' },
  startup: {
    ready() {
      MathJax.startup.defaultReady()
      MathJax.startup.promise.then(window.my_exit)
    }
  }
}
Object.assign(window.MathJax, window.my_mathjax_conf)
function main() {
  var script = document.createElement('script')
  script.src = 'mathjax/startup.js'
  document.head.appendChild(script)
}
document.addEventListener('DOMContentLoaded', main)
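The 'block ext. absolute urls' comment in the jsdom snippet boils down to a decision about each requested url. One way to express such a policy is sketched below; isLocalResource is a hypothetical helper and the real MyResourceLoader may well work differently.

```javascript
// Hypothetical policy helper. `base` plays the role of the `url` option
// given to JSDOM, e.g. 'file:///tmp/doc/' (note the trailing slash).
function isLocalResource(ref, base) {
  let resolved
  try {
    resolved = new URL(ref, base)   // same resolution rule a browser applies
  } catch {
    return false                    // unparsable reference: refuse
  }
  // Allow only file: urls that stay inside the base directory.
  return resolved.protocol === 'file:' && resolved.href.startsWith(base)
}
```

With base set to 'file:///tmp/doc/', the relative 'mathjax/startup.js' from loader.js passes, while an absolute https url pointing at a CDN is refused.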
The full source is on GitHub.
Intended use is as follows:
$ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
pandoc -s -f markdown --mathjax |
mathjax-embed > 1.html
The resulting html doesn't use JavaScript and doesn't fetch any
external MathJax resources. The mathjax-embed script itself always
works offline.
Tags: ойті
Authors: ag
Air Date:
Latest update:
At the time of writing, the most recent Adobe Reader 25.x.y.z 64-bit
installer for Windows 11 weighs 687,230,424 bytes. After
installation, the program includes 'AI' (of course), an auto-updater,
ads for Acrobat online services sprinkled everywhere, and 2 GUIs:
'new' and 'old'.
For comparison, the SumatraPDF-3.5.2 installer is 8,246,744 bytes,
roughly 83 times smaller. It has no 'AI', no auto-updater (though it
can check for new
versions, which I find unnecessary, for anyone sane would install it
via scoop anyway), and no ads for 'cloud storage'.
The following chart shows how the Adobe Reader installer has grown in
size over the years. When possible, 64-bit versions of installers were
used.
Next Day Update:
Best comment on Hacker News: "Looks like a chart crime scene."
Alright, here's your linear graph, along with the
source from which both graphs were generated. All
point labels are version numbers.
Tags: ойті
Authors: ag
Air Date:
Latest update:
Peter Weinberger (the "w" in awk), while working at Bell Labs,
wrote an experimental
implementation of a network file system. Included with Research Unix
v8 (Feb 1985, licensed strictly for educational use), it allowed
sharing / (yes) with other machines running v8 by specifying a mapping
between a local uid/gid and the desired view from the LAN.
Weinberger described one peculiarity of his netfs as follows:
"If A mounted B's file system somewhere, and B mounted A's, then the
directory tree was infinite. That's mathematics, not a bug."
His /usr/src/netfs/TODO contained an existential question:
'why does it get out of synch?'
The connection between this netfs and Sun's NFS is murky.
Steve Johnson:
"I remember Bill Joy visiting Bell Labs and getting a very complete
demo of RFS and being very impressed. Within a year, Sun announced
NFS."
Unix System V SVR3, released by AT&T in 1987, included a different
version of netfs, which they officially began calling RFS. Appearing
18 months after Sun announced NFS, it briefly attempted to compete,
but failed on 2 fronts simultaneously: ⓐ big vendors (DEC, IBM, HP)
disliked its licensing terms, and ⓑ the protocol's brittleness
discouraged ports to non-Unix systems. NFS won, becoming widely
used--even by NeXTSTEP.
Lyndon Nerenberg:
'We ran RFS on a "cluster" of four 3B2s [AT&T microcomputers], and
while it worked, to varying degrees, the statefulness of the
protocol inevitably led to the whole thing locking up, requiring a
reboot of all four machines to recover.'
Tags: ойті
Authors: ag