Offline Math: Converting LaTeX to SVG with MathJax
Pandoc can prepare LaTeX math for MathJax via its eponymous --mathjax
option. It wraps formulas in <span class="math"> elements and injects
a <script> tag that points to cdn.jsdelivr.net, which means rendering
won't work offline or if the third-party server fails. You can
mitigate this by providing your own copy of the MathJax library, but
the mechanism still fails when the target device doesn't support
JavaScript (e.g., many epub readers).
At the same time, practically all browsers support MathML. Use it
(pandoc's --mathml option) if you care only about the information
superhighway: your formulas will look good on every modern device and
scale delightfully. Otherwise, SVGs are the only truly portable
option.
Now, how can we transform the html produced by

    $ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
      pandoc -s -f markdown --mathjax

into a fully standalone document where the formula gets converted into
SVG nodes?
- Use an html parser like Nokogiri, and replace each
  <span class="math"> node with an image. There are multiple ways to
  convert a TeX-looking string to an SVG: using MathJax itself (which
  provides a corresponding CLI example), or doing it in a
  'classical' fashion with pdflatex. (You can read more about this
  method in A practical guide to EPUB, chapters 3.4 and 4.6.)
- Alternatively, load the page into a headless browser, inject
  MathJax scripts, and serialise the modified DOM back to html.
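The first approach can be sketched without committing to a particular renderer. The snippet below is an illustration only: it assumes the span markup that pandoc --mathjax emits (TeX wrapped in \(...\) or \[...\]), it uses a regular expression where a real tool would use an html parser, and its render callback is a hypothetical stand-in for whatever turns TeX into SVG (MathJax, pdflatex, and so on).

```javascript
// Sketch: swap pandoc's math spans for whatever `render` returns.
// Assumption: spans look like <span class="math inline">\(...\)</span>
// or <span class="math display">\[...\]</span>, with no nested spans.
const MATH_SPAN =
  /<span class="math (inline|display)">\\[(\[]([\s\S]*?)\\[)\]]<\/span>/g

// Undo the few HTML escapes that can appear inside the TeX source.
// '&amp;' goes last so that '&amp;lt;' does not turn into '<'.
function unescapeHtml(s) {
  return s.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&')
}

// `render(tex, display)` is a hypothetical callback returning SVG markup.
function replaceMath(html, render) {
  return html.replace(MATH_SPAN, (_, kind, tex) =>
    render(unescapeHtml(tex), kind === 'display'))
}

// Stub renderer for demonstration only: it typesets nothing and does not
// re-escape the TeX it embeds.
const stub = (tex, display) =>
  `<svg class="${display ? 'display' : 'inline'}"><title>${tex}</title></svg>`

const page = String.raw`<p>Ohm's law: <span class="math inline">\(I = \frac{V}{R}\)</span>.</p>`
console.log(replaceMath(page, stub))
```

Swapping the stub for a real TeX-to-SVG function is where all the actual work hides, which is part of why the second approach is tempting.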
I tried the 2nd approach in 2016 with the now-defunct phantomjs. It
worked, but debugging was far from enjoyable due to the strangest bugs
in phantomjs. I can still run the old code, but it depends on an
ancient version of the MathJax library that, for obvious reasons,
isn't easily upgradable within the phantomjs pre-es6 environment.
Nowadays, Puppeteer would certainly do, but for this kind of task
I prefer something more lightweight.
There's also jsdom. Back in 2016, I tried it as well, but it was much
slower than running phantomjs. Recently, I gave jsdom another try and
was pleasantly surprised. I'm not sure what exactly tipped the scales:
computers, v8, or jsdom itself, but it no longer feels slow in
combination with MathJax.
    $ wc -l *js *conf.json
       24 loader.js
      105 mathjax-embed.js
       12 mathjax.conf.json
      141 total
Roughly 50% of the code is nodejs infrastructure junk (including
command-line parsing), the rest is a MathJax config and jsdom
interactions:
    import { JSDOM } from 'jsdom'
    import { Script } from 'node:vm'

    // html, base, mathjax_conf, read(), cleanup() and MyResourceLoader
    // are defined elsewhere in mathjax-embed.js
    let dom = new JSDOM(html, {
      url: `file://${base}/`,
      runScripts: /* very */ 'dangerously',
      resources: new MyResourceLoader(), // block ext. absolute urls
    })
    dom.window.my_exit = function() {
      cleanup(dom.window.document) // remove mathjax <script> tags
      console.log(dom.serialize())
    }
    dom.window.my_mathjax_conf = mathjax_conf // user-provided
    // run loader.js inside the page's own JavaScript context
    let script = new Script(read(`${import.meta.dirname}/loader.js`))
    let vmContext = dom.getInternalVMContext()
    script.runInContext(vmContext)
The most annoying step here is setting the url property, which jsdom
uses to resolve relative resource paths. The my_exit() function is
called by MathJax when its job is supposedly finished. The loader.js
script is executed in the context of the loaded html:
    window.MathJax = {
      output: { fontPath: '@mathjax/%%FONT%%-font' },
      startup: {
        ready() {
          MathJax.startup.defaultReady()
          MathJax.startup.promise.then(window.my_exit)
        }
      }
    }
    Object.assign(window.MathJax, window.my_mathjax_conf)
    function main() {
      var script = document.createElement('script')
      script.src = 'mathjax/startup.js'
      document.head.appendChild(script)
    }
    document.addEventListener('DOMContentLoaded', main)
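One property of the Object.assign call in loader.js is worth spelling out: it copies top-level keys only. If the user-provided config carries its own startup or output key, that object replaces the default one wholesale, including the ready() hook that ends up calling my_exit. Whether this matters depends on what the config contains. The sketch below shows the behaviour with plain objects. The defaults are simplified, and the recursive merge helper is hypothetical, not part of mathjax-embed.

```javascript
// Defaults shaped like the ones in loader.js (ready() simplified).
const defaults = () => ({
  output: { fontPath: '@mathjax/%%FONT%%-font' },
  startup: { ready() { return 'default ready' } }
})

// A user config that only wants to tweak one startup setting.
const user = { startup: { typeset: true } }

// Shallow merge: the whole `startup` object is replaced, ready() is lost.
const shallow = Object.assign(defaults(), user)
console.log(typeof shallow.startup.ready) // undefined

// Hypothetical helper: merge plain objects key by key instead.
function merge(target, source) {
  for (const [key, value] of Object.entries(source)) {
    const bothObjects = [target[key], value].every(
      v => v && typeof v === 'object' && !Array.isArray(v))
    target[key] = bothObjects ? merge(target[key], value) : value
  }
  return target
}

const deep = merge(defaults(), user)
console.log(typeof deep.startup.ready, deep.startup.typeset) // function true
```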
The full source is on GitHub.
Intended use is as follows:
    $ echo 'Ohm'\''s law: $I = \frac{V}{R}$.' |
      pandoc -s -f markdown --mathjax |
      mathjax-embed > 1.html
The resulting html doesn't use JavaScript and doesn't fetch any
external MathJax resources. The mathjax-embed script itself always
works offline.
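Working offline rests on two details of the jsdom setup: the url option, which decides where a relative path such as mathjax/startup.js ends up, and the resource loader, which refuses everything else. The sketch below uses only the standard URL class. The /tmp/doc directory, the CDN path and the isLocal helper are made up for illustration and are not taken from mathjax-embed.

```javascript
// The trailing slash on the base URL decides how relative paths resolve.
const base = 'file:///tmp/doc/' // hypothetical document directory
console.log(new URL('mathjax/startup.js', base).href)
// file:///tmp/doc/mathjax/startup.js
console.log(new URL('mathjax/startup.js', 'file:///tmp/doc').href)
// file:///tmp/mathjax/startup.js

// Hypothetical policy: allow a resource only if it resolves to a file: URL
// located under the base directory.
function isLocal(resource, base) {
  const url = new URL(resource, base)
  return url.protocol === 'file:' && url.href.startsWith(new URL(base).href)
}

console.log(isLocal('mathjax/startup.js', base))                     // true
console.log(isLocal('https://cdn.jsdelivr.net/some/script.js', base)) // false
console.log(isLocal('../../etc/passwd', base))                        // false
```

In jsdom, a check like this would presumably sit in the fetch() method of a custom ResourceLoader subclass. How MyResourceLoader actually does it is in the full source.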
Tags: ойті
Authors: ag