Alexander Gromnitsky's Blog

Superscripts & subscripts via a coprocess


How do you write 25m³? In Markdown, you'll probably end up with

25m<sup>3</sup>

Some text editors have a special mechanism for entering characters not present in the current keyboard layout. E.g., in Emacs C-x 8 ^ 2 inserts a superscript 2 (²), but this won't work for writing a formula like y = x²ᵃ, as C-x 8 ^ a inserts the â character.

Unicode should've had super/subscript versions of at least all the Latin letters, but that never happened. It has only a subset of them, spread across seemingly random blocks. For instance, the subscript letter j sits in the Latin Extended-C block between a Finno-Ugric phonetic letter (one that reminds me of the existential quantifier, ∃) & a superscript capital V.
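A quick way to see how scattered these characters are is to print their code points. The sample glyphs below are my own picks, not an exhaustive list: ² lives in Latin-1 Supplement, ⁿ & ₐ in Superscripts and Subscripts, ᵢ in Phonetic Extensions, & ⱼ in Latin Extended-C.

```ruby
# A handful of super/subscript letters (an arbitrary sample) & their
# code points, to show how far apart they sit in Unicode.
glyphs = %w[² ⁿ ₐ ᵢ ⱼ]
codepoints = glyphs.map(&:ord)

glyphs.zip(codepoints) do |ch, cp|
  printf "%s U+%04X\n", ch, cp
end
```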

Unicode has the Superscripts and Subscripts section, but it appears more like an afterthought, incomplete & abandoned:

[Figure: the Unicode Superscripts and Subscripts block (U+2070 to U+209F)]

By perusing Unicode blocks, one can assemble super/subscript variants of most Latin characters & write a simple character-substitution script. Then any decent text editor could use such a script to replace a chunk of text with its superscript version.

I also thought of a slightly more interesting feature: selecting a chunk of text with HTML tags (like <i>y = x<sup>2a</sup></i>) in the text editor & transforming only the <sup> or <sub> nodes.

Script 1: supsub

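It reads stdin line by line & takes a mode as its only argument: sup, sub, invert (swap superscripts & subscripts) or restore (turn them back into plain characters):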

$ lsb_release -d | ./supsub sub
dₑₛ𞁞ᵣᵢₚₜᵢₒₙ:    fₑdₒᵣₐ ᵣₑₗₑₐₛₑ ₃₉ ₍ₜₕᵢᵣₜᵧ ₙᵢₙₑ₎
$ lsb_release -d | ./supsub sub | ./supsub invert
ᵈ𞀵ˢ𞀿ʳⁱ𞀾ᵗⁱ𞀼ⁿ:    ᶠ𞀵ᵈ𞀼ʳ𞀰 ʳ𞀵ˡ𞀵𞀰ˢ𞀵 ³⁹ ⁽ᵗʰⁱʳᵗʸ ⁿⁱⁿ𞀵⁾
$ lsb_release -d | ./supsub sub | ./supsub invert | ./supsub restore
dеsсrірtіоn:    fеdоrа rеlеаsе 39 (thіrty nіnе)

(Some mobile browsers have a lot of trouble rendering even Latin superscript/subscript characters.)
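Notice that restore doesn't give back exactly what went in: several Latin & Cyrillic letters share a glyph in the table (ₑ serves both the Latin e & the Cyrillic е, for example), & Hash#invert keeps only the last key for a repeated value. Since the Cyrillic rows come later in the table, the restore run above returns Cyrillic look-alikes such as е & с. A two-entry sketch of the collision (the tiny table is a hypothetical excerpt, not the script's full data):

```ruby
# Hypothetical excerpt of the subscript table: Latin "e" & Cyrillic "е"
# (U+0435) both map to LATIN SUBSCRIPT SMALL LETTER E (U+2091).
sub = { "e" => "\u2091", "\u0435" => "\u2091" }

# Hash#invert keeps a single key per value: the later entry wins.
restore = sub.invert
back = restore["\u2091"]

puts back.ord.to_s(16)   # the Cyrillic letter's code point, not Latin "e"
```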

$ cat supsub
#!/usr/bin/env -S ruby --disable-gems

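# each whitespace-separated entry is: a character, its superscript & its subscript
db = DATA.read.split
$supa = db.map {|v| [v[0], v[1]] }.to_h
$sub = db.map {|v| [v[0], v[2]] }.to_h

mode = 'sup|sub|invert|restore'
# the group makes both anchors apply to every alternative
abort "Usage: supsub #{mode} < file.txt" unless ARGV[0] =~ /\A(?:#{mode})\z/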

# Transform an array of characters according to `mode`; characters
# missing from the table are passed through as is.
def tr mode, chars
  case mode
  when 'sup'
    chars.map { |ch| $supa[ch.downcase] || ch }.join
  when 'sub'
    chars.map { |ch| $sub[ch.downcase] || ch }.join
  when 'invert'
    # superscripts become subscripts & vice versa
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map do |ch|
      if supa_v[ch]
        $sub[supa_v[ch]] || ch
      elsif sub_v[ch]
        $supa[sub_v[ch]] || ch
      else
        ch
      end
    end.join
  else # restore: super/subscripts become plain characters again
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map { |ch| supa_v[ch] || sub_v[ch] || ch}.join
  end
end

while (line = STDIN.gets)
  print tr(ARGV[0], line.chars)
end

__END__
0⁰₀ 1¹₁ 2²₂ 3³₃ 4⁴₄ 5⁵₅ 6⁶₆ 7⁷₇ 8⁸₈ 9⁹₉ +⁺₊ -⁻₋ =⁼₌ (⁽₍ )⁾₎
aᵃₐ bᵇb cᶜ𞁞 dᵈd eᵉₑ fᶠf gᵍg hʰₕ iⁱᵢ jʲⱼ kᵏₖ lˡₗ mᵐₘ nⁿₙ oᵒₒ
pᵖₚ q𐞥q rʳᵣ sˢₛ tᵗₜ uᵘᵤ vᵛᵥ wʷw xˣₓ yʸᵧ zᶻz
а𞀰ₐ б𞀱𞁒 в𞀲𞁓 г𞀳𞁔 ґґ𞁧 д𞀴𞁕 е𞀵ₑ ж𞀶𞁗 з𞀷𞁘 и𞀸𞁙 іⁱ𞁨 їїї ййй
к𞀹𞁚 л𞀺𞁛 м𞀻ₘ нᵸн о𞀼ₒ п𞀽𞁝 р𞀾ₚ с𞀿𞁞 т𞁀т у𞁁𞁟 ф𞁂𞁠 х𞁃𞁡 ц𞁄𞁢
ч𞁅𞁣 ш𞁆𞁤 щщщ ьꚝь ю𞁉ю яяя

Script 2: supsub-xml, version 1

$ echo '<sub>test</sub> <i>lol</i> <sup>haha</sup> but <sup>it works</sup>!' | ./supsub-xml
ₜₑₛₜ <i>lol</i> ʰᵃʰᵃ but ⁱᵗ ʷᵒʳᵏˢ!

With the help of Nokogiri (鋸, Japanese for "saw"), we parse stdin as an HTML fragment & replace every <sup> or <sub> node with the output of the supsub script.

#!/usr/bin/env ruby

require 'nokogiri'

cmd = ARGV[0] || File.join(__dir__, 'supsub')
doc = Nokogiri::HTML.fragment STDIN.read

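doc.css('sup,sub').each do |node|
  # a new `supsub sup` or `supsub sub` process for every node
  IO.popen("#{cmd} #{node.name}", 'w+') do |t|
    t.write node.text
    t.close_write        # EOF lets supsub finish & flush its output
    node.replace t.read  # all of it, even if the text spans several lines
  end
end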

print doc.to_s

The script works, & for most practical purposes one may leave it as is, but it has one issue I find quite barbaric: it starts a new supsub process (a whole Ruby interpreter) every time it needs to transform a string.

Script 2: supsub-xml, version 2

No, we are not going to rewrite supsub as a library; we are going to invoke it from supsub-xml as a coprocess.

What is a coprocess? Stevens, in APUE (Advanced Programming in the UNIX Environment), described it as a program that runs alongside a parent process & communicates with it via two one-way pipes.

Driving a coprocess

Think of it as a local microservice: you write a request to it through one pipe & read the response from another.

A simple (if contrived) example of a Ruby script driving another program is using the tr(1) utility to upcase strings:

$ cat coprocess # broken, read below
#!/usr/bin/env ruby

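# pipe 1: tr's stdout -> the script
parent_in, parent_out = IO.pipe
# pipe 2: the script -> tr's stdin
child_in, child_out = IO.pipe
spawn "tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out
# close the ends that now belong to tr
parent_out.close
child_in.close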

child_out.puts "lol"
print parent_in.gets
child_out.puts "haha"
print parent_in.gets

If we run it, though, we won't get LOL & HAHA: the script hangs indefinitely on the first print parent_in.gets line. The reason is that tr (at least the coreutils version under Linux) writes its output with fwrite(3), & libc stdio streams are buffered: when stdout is a pipe rather than a terminal, it is fully buffered. So tr reads the "lol" string & writes "LOL" to its stdout, but our coprocess script never sees the result, for "LOL" is stuck in tr's buffer until the buffer fills up or tr exits.
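To watch the stall without freezing the terminal, the reads can be guarded with IO.select & a timeout. In this sketch a tiny Ruby filter stands in for tr(1) (an assumption, so the example doesn't depend on a particular tr build); its stdout is a pipe as well, so Ruby keeps "LOL" in its own buffer until the child exits. gets_with_timeout is a hypothetical helper, not part of the post.

```ruby
require 'rbconfig'

# Stand-in for tr(1) (an assumption): a Ruby filter that upcases each
# line & leaves $stdout.sync at its default, false.
CHILD = <<~'RUBY'
  while (line = $stdin.gets)
    puts line.upcase
  end
RUBY

parent_in, parent_out = IO.pipe   # child's stdout -> parent
child_in, child_out = IO.pipe     # parent -> child's stdin
pid = spawn RbConfig.ruby, '--disable-gems', '-e', CHILD,
            in: child_in, out: parent_out
parent_out.close
child_in.close

# Hypothetical helper: the next line from io, or nil if nothing
# becomes readable within `timeout` seconds.
def gets_with_timeout io, timeout
  IO.select([io], nil, nil, timeout) ? io.gets : nil
end

child_out.puts "lol"
stalled = gets_with_timeout(parent_in, 1)  # nil: "LOL" sits in the child's buffer
child_out.close                            # EOF: the child exits & flushes
flushed = gets_with_timeout(parent_in, 5)  # the line arrives only now
Process.wait pid

p stalled, flushed
```

The output only shows up once the child sees EOF & exits, which is exactly what an interactive coprocess can't wait for.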

We can fix the script by executing tr under the stdbuf(1) utility:

spawn "stdbuf -i0 -o0 tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out

then bytes should start moving freely through the pipes:

$ ./coprocess
LOL
HAHA

Unfortunately, this fix won't work if the coprocess is a Ruby script: Ruby does its own IO buffering instead of using libc stdio, & stdbuf(1) can only adjust the latter.
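If the Ruby coprocess is a script you can edit, there is a simpler way out: turn off Ruby's output buffering in the child with $stdout.sync = true. A minimal sketch, with a throwaway Ruby filter (an assumption) playing tr's part:

```ruby
require 'rbconfig'

# Throwaway child (an assumption): upcases each line & sets
# $stdout.sync, so every `puts` goes straight to the pipe.
CHILD = <<~'RUBY'
  $stdout.sync = true
  while (line = $stdin.gets)
    puts line.upcase
  end
RUBY

parent_in, parent_out = IO.pipe   # child's stdout -> parent
child_in, child_out = IO.pipe     # parent -> child's stdin
pid = spawn RbConfig.ruby, '--disable-gems', '-e', CHILD,
            in: child_in, out: parent_out
parent_out.close
child_in.close

child_out.puts "lol"
lol = parent_in.gets    # no hang this time
child_out.puts "haha"
haha = parent_in.gets
print lol, haha

child_out.close         # EOF ends the child's loop
Process.wait pid
```

This only helps when you control the child's source; for a program you can't change, the pty trick is the general answer.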

The general solution for this kind of problem is to trick a coprocess into thinking it is connected to a (pseudo) terminal: programs typically flush their output at least at every newline when it goes to a terminal.

coprocess via pty

Thankfully, Ruby has a nifty built-in pty extension that abstracts away most of the communication with a pseudo terminal device.
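A minimal sketch of the trick: the child (again a throwaway Ruby filter, an assumption) keeps Ruby's default buffering, yet its replies arrive line by line, because its stdout is the slave side of a pty. The input still goes through an ordinary pipe. The terminal's line discipline typically turns "\n" into "\r\n" on output, hence the chomp.

```ruby
require 'pty'
require 'rbconfig'

# Throwaway child (an assumption): note there's no $stdout.sync here.
CHILD = <<~'RUBY'
  while (line = $stdin.gets)
    puts line.upcase
  end
RUBY

master, slave = PTY.open          # the child writes to the slave side
child_in, child_out = IO.pipe     # input still goes through a plain pipe
pid = spawn RbConfig.ruby, '--disable-gems', '-e', CHILD,
            in: child_in, out: slave
slave.close
child_in.close

child_out.puts "lol"
lol = master.gets       # likely "LOL\r\n" because of the pty's output processing
child_out.puts "haha"
haha = master.gets
puts lol.chomp, haha.chomp   # String#chomp strips "\r\n" as well as "\n"

child_out.close         # EOF ends the child's loop
Process.wait pid
```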

$ cat supsub-xml
#!/usr/bin/env ruby

require 'nokogiri'
require 'pty'

cmd = ARGV[0] || File.join(__dir__, 'supsub')

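class Coprocess
  def initialize cmd
    # the child's stdout is the pty slave: the child thinks it writes
    # to a terminal & doesn't hold its output back in a buffer
    @master, slave = PTY.open
    # the child's stdin stays a plain pipe
    read, @write = IO.pipe
    spawn cmd, in: read, out: slave
    # these ends belong to the child now
    read.close
    slave.close
  end

  # send one line to the child / read one line of its reply
  def puts str; @write.puts str; end
  def gets; @master.gets; end
end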

transforms = {
  "sup" => Coprocess.new("#{cmd} sup"),
  "sub" => Coprocess.new("#{cmd} sub")
}

doc = Nokogiri::HTML.fragment STDIN.read
doc.css('sup,sub').each do |node|
  tr = transforms[node.name]
  tr.puts node.text
  node.replace tr.gets.chomp # chomp also drops the "\r" the pty adds before "\n"
rescue
  warn "transforming `#{node.text}` failed: #{$!}"
end

print doc.to_s

If we add sleep at the very end of the script, then we can examine how this arrangement works:

$ $$
bash: 42396: command not found
$ echo '<sub>test</sub> <i>lol</i> <sup>haha</sup>!' | ./supsub-xml
ₜₑₛₜ <i>lol</i> ʰᵃʰᵃ!

While the script sleeps, from another terminal:

$ pstree -p 42396 -al
bash,42396
  └─ruby,100032 ./supsub-xml
      ├─ruby,100033 --disable-gems ... sup
      └─ruby,100034 --disable-gems ... sub

$ file /proc/100032/fd/* # supsub-xml
...
$ file /proc/100033/fd/* # supsub sup
...
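The same information is available from Ruby. Below is a small Linux-only sketch (fd_targets is a hypothetical helper, not part of the post) that resolves the /proc/<pid>/fd symlinks. Run with a pid from the pstree output, one would expect a supsub child to show a pipe on fd 0 & a /dev/pts/N entry on fd 1, since that's how Coprocess wires it up.

```ruby
# Hypothetical helper: map each open fd of a process to what it points
# at. Linux-only: it reads the same /proc/<pid>/fd symlinks as `file`.
def fd_targets pid
  dir = "/proc/#{pid}/fd"
  pairs = Dir.children(dir).map(&:to_i).sort.map do |fd|
    target = begin
      File.readlink("#{dir}/#{fd}")
    rescue Errno::ENOENT
      nil   # e.g. the fd Dir.children itself used, already closed
    end
    [fd, target]
  end
  pairs.to_h.compact
end

# A pid from the command line (e.g. one shown by pstree), else our own.
pid = Integer(ARGV.fetch(0, Process.pid))
fd_targets(pid).each { |fd, target| puts "#{fd} -> #{target}" }
```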

Tags: ойті
Authors: ag