Superscripts & subscripts via a coprocess
How do you write 25m³? In Markdown, you'll probably end up
with
25m<sup>3</sup>
Some text editors have a special mechanism for entering characters not
present in the current keyboard layout. E.g., in Emacs, C-x 8 ^ 2
inserts a superscript 2, but this won't work for a formula like
y = x²ᵃ, as C-x 8 ^ a inserts the â character.
Unicode should have had at least a super/subscript version of the Latin
letters, but that never happened. It has only a subset of them, spread
across seemingly random blocks. For instance, the subscript letter j
sits in the Latin Extended-C block between a Finno-Ugric ∃
(that reminds me of the existential quantifier) & a
superscript capital V.
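To see how scattered these letters are, here is a minimal Ruby sketch that prints the code point and block of three subscript letters. The `BLOCKS` table and the `block_of` helper are made up for this illustration and list only the three blocks involved, not the full Unicode block list.

```ruby
# Hypothetical helper table: only the three blocks relevant here.
BLOCKS = {
  "Phonetic Extensions"         => 0x1D00..0x1D7F,
  "Superscripts and Subscripts" => 0x2070..0x209F,
  "Latin Extended-C"            => 0x2C60..0x2C7F
}

# Return the name of the block (from BLOCKS) that contains ch.
def block_of(ch)
  name, _range = BLOCKS.find { |_name, range| range.cover?(ch.ord) }
  name || "other"
end

# Subscript a, i and j, written as escapes so they survive any font.
["\u2090", "\u1D62", "\u2C7C"].each do |ch|
  printf "%s U+%04X %s\n", ch, ch.ord, block_of(ch)
end
```

Three neighbouring letters of the alphabet end up in three different blocks, which is why a lookup table is hard to avoid.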
Unicode has a Superscripts and Subscripts block, but it looks
more like an afterthought, incomplete & abandoned.
If one peruses Unicode blocks, it's possible to assemble most of the
Latin characters in super/subscript variants to write a simple script
for character substitution. Then, any decent text editor could use
such a script to replace a chunk of text with a superscript version of
it.
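For digits alone, such a substitution fits in one call to Ruby's `String#tr`. This is only a sketch: the `SUP_DIGITS` constant and the `sup_digits` method are names invented here, and letters need a lookup table instead, because their superscript forms are not contiguous.

```ruby
# Superscript 0-9. Note that 1, 2 and 3 live in Latin-1 Supplement,
# the rest in the Superscripts and Subscripts block.
SUP_DIGITS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079"

# Replace every ASCII digit with its superscript form.
def sup_digits(text)
  text.tr("0-9", SUP_DIGITS)
end

puts "25m" + sup_digits("3")
```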
I also thought of a slightly more interesting feature: selecting a
chunk of text with HTML tags (like <i>y = x<sup>2a</sup></i>) in the
text editor to transform only the <sup> or <sub> nodes.
Script 1: supsub
It reads its input from the stdin line-by-line:
$ lsb_release -d | ./supsub sub
dₑₛ𞁞ᵣᵢₚₜᵢₒₙ: fₑdₒᵣₐ ᵣₑₗₑₐₛₑ ₃₉ ₍ₜₕᵢᵣₜᵧ ₙᵢₙₑ₎
$ lsb_release -d | ./supsub sub | ./supsub invert
ᵈ𞀵ˢ𞀿ʳⁱ𞀾ᵗⁱ𞀼ⁿ: ᶠ𞀵ᵈ𞀼ʳ𞀰 ʳ𞀵ˡ𞀵𞀰ˢ𞀵 ³⁹ ⁽ᵗʰⁱʳᵗʸ ⁿⁱⁿ𞀵⁾
$ lsb_release -d | ./supsub sub | ./supsub invert | ./supsub restore
dеsсrірtіоn: fеdоrа rеlеаsе 39 (thіrty nіnе)
(Some mobile browsers have a lot of trouble rendering even Latin
superscript/subscript characters.)
$ cat supsub
#!/usr/bin/env -S ruby --disable-gems
# Each DATA entry is 3 characters: plain, superscript, subscript.
db = DATA.read.split(/\s+/).filter {|v| v}
$supa = db.map {|v| [v[0], v[1]] }.to_h
$sub = db.map {|v| [v[0], v[2]] }.to_h

mode = 'sup|sub|invert|restore'
# Group the alternation so that the anchors apply to every mode name.
abort "Usage: supsub #{mode} < file.txt" unless ARGV[0] =~ /\A(?:#{mode})\z/

def tr mode, chars
  case mode
  when 'sup'
    chars.map { |ch| $supa[ch.downcase] || ch }.join
  when 'sub'
    chars.map { |ch| $sub[ch.downcase] || ch }.join
  when 'invert'
    # Swap superscripts for subscripts & vice versa.
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map do |ch|
      if supa_v[ch]
        $sub[supa_v[ch]] || ch
      elsif sub_v[ch]
        $supa[sub_v[ch]] || ch
      else
        ch
      end
    end.join
  else # restore
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map { |ch| supa_v[ch] || sub_v[ch] || ch}.join
  end
end

while (line = STDIN.gets)
  print tr(ARGV[0], line.chars)
end
__END__
0⁰₀ 1¹₁ 2²₂ 3³₃ 4⁴₄ 5⁵₅ 6⁶₆ 7⁷₇ 8⁸₈ 9⁹₉ +⁺₊ -⁻₋ =⁼₌ (⁽₍ )⁾₎
aᵃₐ bᵇb cᶜ𞁞 dᵈd eᵉₑ fᶠf gᵍg hʰₕ iⁱᵢ jʲⱼ kᵏₖ lˡₗ mᵐₘ nⁿₙ oᵒₒ
pᵖₚ q𐞥q rʳᵣ sˢₛ tᵗₜ uᵘᵤ vᵛᵥ wʷw xˣₓ yʸᵧ zᶻz
а𞀰ₐ б𞀱𞁒 в𞀲𞁓 г𞀳𞁔 ґґ𞁧 д𞀴𞁕 е𞀵ₑ ж𞀶𞁗 з𞀷𞁘 и𞀸𞁙 іⁱ𞁨 їїї ййй
к𞀹𞁚 л𞀺𞁛 м𞀻ₘ нᵸн о𞀼ₒ п𞀽𞁝 р𞀾ₚ с𞀿𞁞 т𞁀т у𞁁𞁟 ф𞁂𞁠 х𞁃𞁡 ц𞁄𞁢
ч𞁅𞁣 ш𞁆𞁤 щщщ ьꚝь ю𞁉ю яяя
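The `restore` output above looks right but is not identical to the input: letters such as e, c, o & p come back as Cyrillic lookalikes. A likely cause is `Hash#invert`, which keeps only the last key when several keys share a value, & the table maps both a Latin and a Cyrillic letter to the same glyph. A minimal sketch with a two-entry table (not the script's full one):

```ruby
# Latin "e" (U+0065) and Cyrillic "е" (U+0435) share one subscript, U+2091.
sub = { "e" => "\u2091", "\u0435" => "\u2091" }

# Invert the table, as the invert/restore modes do, & look the glyph up.
restored = sub.invert["\u2091"]
printf "U+%04X\n", restored.ord  # the later (Cyrillic) key wins
```

If exact round-tripping matters, one option is to build the inverted tables from the Latin rows only, or to list the Latin rows last.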
Script 2: supsub-xml, version 1
$ echo '<sub>test</sub> <i>lol</i> <sup>haha</sup> but <sup>it works</sup>!' | ./supsub-xml
ₜₑₛₜ <i>lol</i> ʰᵃʰᵃ but ⁱᵗ ʷᵒʳᵏˢ!
With the help of Nokogiri (鋸), we parse the stdin as an HTML fragment & replace
<sup> or <sub> nodes with the output of the supsub script.
#!/usr/bin/env ruby
require 'nokogiri'
cmd = ARGV[0] || File.join(__dir__, 'supsub')
doc = Nokogiri::HTML.fragment STDIN.read
doc.css('sup,sub').each do |node|
  IO.popen("#{cmd} #{node.name}", 'w+') do |t|
    t.write node.text
    t.close_write
    node.replace t.gets
  end
end
print doc.to_s
The script works, & for most practical purposes one may leave it
as is, but it has one issue I find quite barbaric: it forks an external
program each time it needs to transform a string.
Script 2: supsub-xml, version 2
No, we are not going to rewrite supsub as a library; we are going to
invoke it from supsub-xml as a coprocess.
What is a coprocess? Stevens, in his APUE, described it as a program
that runs alongside a parent process & communicates with it via two
one-way pipes.
Think of it as a local microservice: you write to it using one pipe,
then read a response using another pipe.
A simple (but contrived) example of a Ruby script that drives
another program is to use the tr(1) utility for upcasing a string:
$ cat coprocess # broken, read below
#!/usr/bin/env ruby
parent_in, parent_out = IO.pipe
child_in, child_out = IO.pipe
spawn "tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out
parent_out.close
child_in.close
child_out.puts "lol"
print parent_in.gets
child_out.puts "haha"
print parent_in.gets
If we run it, though, we won't get LOL and HAHA; the script hangs
indefinitely on the first print parent_in.gets line. The reason is
that the tr utility (at least the coreutils version under Linux) writes
its output with the libc fwrite(3). fwrite(3) goes through a libc stream,
which is fully buffered when stdout is not a terminal. What happens is that
tr(1) eats the "lol" string and writes "LOL" to its stdout, but our coprocess
script doesn't see the result, for "LOL" is stuck in a buffer.
We can fix the script by executing tr under the stdbuf(1) utility:
spawn "stdbuf -i0 -o0 tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out
then bytes should start moving freely through the pipes:
$ ./coprocess
LOL
HAHA
Unfortunately, this fix won't work if the coprocess is a Ruby
script. Ruby has its own IO buffering that stdbuf(1) cannot affect.
The general solution for this kind of problem is to trick a coprocess
into thinking it is connected to a (pseudo) terminal.
Thankfully, Ruby has a nifty built-in pty extension that
abstracts away most of the communication with a pseudo terminal
device.
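The difference a pseudo terminal makes can be checked with a short sketch: it starts a child Ruby that reports `$stdout.tty?`, first over an ordinary pipe, then over the slave side of a pty. The `CHILD`, `via_pipe` and `via_pty` names are invented for this example, and it assumes a Unix-like system where `PTY.open` is available.

```ruby
require 'pty'
require 'rbconfig'

# A child Ruby that reports whether its stdout looks like a terminal.
CHILD = [RbConfig.ruby, "-e", "puts $stdout.tty?"]

# Run the child with its stdout connected to a plain pipe.
def via_pipe
  reader, writer = IO.pipe
  pid = spawn(*CHILD, out: writer)
  writer.close
  Process.wait pid
  reader.read.chomp
ensure
  reader&.close
end

# Run the child with its stdout connected to the slave side of a pty.
def via_pty
  master, slave = PTY.open
  pid = spawn(*CHILD, out: slave)
  slave.close
  # The terminal usually turns "\n" into "\r\n"; chomp strips both.
  line = master.gets.chomp
  Process.wait pid
  line
ensure
  master&.close
end

puts "pipe: #{via_pipe}"
puts "pty:  #{via_pty}"
```

A program that picks its buffering mode by checking for a terminal should therefore flush each line when it writes to the pty.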
$ cat supsub-xml
#!/usr/bin/env ruby
require 'nokogiri'
require 'pty'
cmd = ARGV[0] || File.join(__dir__, 'supsub')
class Coprocess
  def initialize cmd
    @master, slave = PTY.open
    read, @write = IO.pipe
    spawn cmd, in: read, out: slave
    read.close
    slave.close
  end

  def puts str; @write.puts str; end
  def gets; @master.gets; end
end

transforms = {
  "sup" => Coprocess.new("#{cmd} sup"),
  "sub" => Coprocess.new("#{cmd} sub")
}

doc = Nokogiri::HTML.fragment STDIN.read
doc.css('sup,sub').each do |node|
  tr = transforms[node.name]
  tr.puts node.text
  node.replace tr.gets.chomp
rescue
  warn "transforming `#{node.text}` failed: #{$!}"
end
print doc.to_s
If we add sleep to the very end of the script, then we may examine
how this arrangement works:
$ $$
bash: 42396: command not found
$ echo '<sub>test</sub> <i>lol</i> <sup>haha</sup>!' | ./supsub-xml
ₜₑₛₜ <i>lol</i> ʰᵃʰᵃ!
While the script sleeps, from another terminal:
$ pstree -p 42396 -al
bash,42396
└─ruby,100032 ./supsub-xml
├─ruby,100033 --disable-gems ... sup
└─ruby,100034 --disable-gems ... sub
$ file /proc/100032/fd/* # supsub-xml
...
$ file /proc/100033/fd/* # supsub sup
...
Tags: ойті
Authors: ag