<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:content="http://purl.org/rss/1.0/modules/content/">

<channel>
<pubDate>Tue, 13 Feb 2024 17:58:57 GMT</pubDate>
<title>Alexander Gromnitsky&#39;s Blog</title>
<description>Programming, IT, excerpts, quotes</description>
<link>https://sigwait.org/~alex/blog</link>
<language>en</language>
<itunes:image href="https://sigwait.org/~alex/blog/itunes.png"/>
<itunes:explicit>false</itunes:explicit>

  <itunes:category text="Drama" />


<itunes:author>ag</itunes:author>



<item>
  <title>An HTTP client in Bash</title>
  <link>https://sigwait.org/~alex/blog/2024/02/13/http-client-in-bash.html</link>
  <guid>https://sigwait.org/~alex/blog/2024/02/13/http-client-in-bash.html</guid>
  <pubDate>Tue, 13 Feb 2024 17:58:57 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>I recently saw a tweet where a guy was asking how to download curl
within a minimal Debian container that had no scripting language
installed except for Bash, and no wget or anything like that.</p>
<p>If such a container has apt-get, but you lack permission to run it,
there is a reliable way to force apt-get to <em>download</em> a .deb file with
all its dependencies under a regular user, but we won't discuss that
here.</p>
<p>I got curious about how hard it would be to write a primitive HTTP
get-only client in Bash, as Bash is typically compiled with "network"
redirection support:</p>
<pre>$ <b>exec</b> 3&lt;&gt; /dev/tcp/www.gnu.org/80
$ <b>printf</b> "%s\r\n" 'HEAD /robots.txt HTTP/1.1' &gt;&amp;3
$ <b>printf</b> "%s\r\n\r\n" 'Host: www.gnu.org' &gt;&amp;3
$ <b>cat</b> &lt;&amp;3
HTTP/1.1 200 OK
Date: Sun, 11 Feb 2024 07:02:40 GMT
Server: Apache/2.4.29
Content-Type: text/plain
Content-Language: non-html
…
</pre>

<p>This could've been useful before the days of TLS everywhere, but it
won't suffice now: to download a statically compiled curl binary from
Github, we need TLS support and proper handling of 302
redirections. Certainly, it's possible to cheat: put the binary on our
web server and serve it under plain HTTP, but that would be too easy.</p>
<p>What if we use ncat+openssl as a forward TLS proxy? ncat may serve as
an <a href="https://sigwait.org/~alex/blog/2024/01/11/rWQ3T5.html">initd-like super-server</a>, invoking "openssl s_client" on each
connection:</p>
<pre><code>$ cat proxy.sh
#!/bin/sh
read -r host
openssl s_client -quiet -no_ign_eof -verify_return_error "$host"
$ ncat -vk -l 10.10.10.10 1234 -e proxy.sh
</code></pre>
<p>The 1st thing we need in the bash-http-get client is URL parsing. It
wouldn't have been necessary if Github served files directly from
"Releases" pages, but it does so through redirects. Therefore, when we
grab the <code>Location</code> header from a response, we need to disentangle its
hostname from a pathname.</p>
<p>Ideally, it should work like the <code>URL()</code> constructor in JavaScript:</p>
<pre><code>$ node -pe 'new URL("https://q.example.com:8080/foo?q=1&amp;w=2#lol")'
URL {
  href: 'https://q.example.com:8080/foo?q=1&amp;w=2#lol',
  origin: 'https://q.example.com:8080',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'q.example.com:8080',
  hostname: 'q.example.com',
  port: '8080',
  pathname: '/foo',
  search: '?q=1&amp;w=2',
  searchParams: URLSearchParams { 'q' =&gt; '1', 'w' =&gt; '2' },
  hash: '#lol'
}
</code></pre>
<p>StackOverflow has various examples of how to achieve that using
regular expressions, but none of them were able to parse the example
above. I tried asking ChatGPT to repair the regex, but it only made it
worse. Miraculously, Google's Gemini supposedly fixed the regex on the
second try (I haven't tested it extensively).</p>
<pre><code>$ cat lib.bash
declare -A URL

url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] &amp;&amp; [ "${BASH_REMATCH[2]}" ] &amp;&amp; [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}
</code></pre>
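<p>To sanity-check the regex, here is a self-contained run of <code>url_parse</code> against the example URL from above (a sketch; it inlines the function instead of sourcing lib.bash, and it needs bash for the associative array):</p>

```shell
#!/usr/bin/env bash
# Inline copy of url_parse from lib.bash, so the check is self-contained.
declare -A URL
url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}

url_parse 'https://q.example.com:8080/foo?q=1&w=2#lol'
printf '%s\n' "${URL[proto]}" "${URL[hostname]}" "${URL[port]}" \
       "${URL[pathname]}" "${URL[search]}" "${URL[hash]}"
```

<p>It reports the same fields as the JavaScript <code>URL()</code> example below.</p>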
<p>Next, we need to separate headers from a response body. This means
looking for the 1st occurrence of <code>\r\n\r\n</code>. Sounds easy,</p>
<pre><code>grep -aobx $'\r' file | head -1
</code></pre>
<p>until you decide to port the client to a BusyBox-based system like
Alpine Linux. The latter has a grep that doesn't support the <code>-ab</code>
options. There is some advice on employing od(1), but no
examples. If we print a file using a 2-column format:</p>
<pre><code>0000000 68
0000001 20
0000002 3a
…
</code></pre>
<p>where the left column is a decimal offset, we can convert the 1st 32KB
of the response into a single line and search for the pattern using
<code>grep -o</code>:</p>
<pre><code>od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}'
</code></pre>
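<p>A quick way to convince yourself the pipeline works is to run it on a fabricated response (the headers below are made up for the demo; it assumes an od(1) that prints 7-digit decimal offsets with <code>-Ad</code>, as GNU &amp; BusyBox do):</p>

```shell
#!/bin/sh
# Fabricate a response with a known \r\n\r\n position, then run the od(1)
# pipeline from above on it.
tmp=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nbody' > "$tmp"

offset=$(od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}')
echo "$offset"   # 44: offset of the trailing \n of the 1st \r\n\r\n
rm -f "$tmp"
```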
<p>Here's the full version of the client that supports only URLs with the
<code>https</code> protocol. It saves the response in a temporary file and looks
for the <code>\r\n\r\n</code> offset. If the HTTP status code was 200, it prints
the body to stdout. If it was 302, it extracts the value of the
<code>Location</code> header and recursively calls itself with a new URL.</p>
<pre><code>#!/usr/bin/env bash

set -e -o pipefail
. "$(dirname "$(readlink -f "$0")")/lib.bash"

tmp=`mktemp fetch.XXXXXX`
trap 'rm -f $tmp' 0 1 2 15
eh() { echo "$*" 1&gt;&amp;2; exit 2; }

[ $# = 3 ] || eh Usage: fetch.bash proxy_host proxy_port url
proxy_host=$1
proxy_port=$2
url=$3

get() {
    url_parse "$1"; [ "${URL[proto]}" = https ] || return 1

    exec 3&lt;&gt; "/dev/tcp/$proxy_host/$proxy_port" || return 1
    echo "${URL[hostname]}:${URL[port]:-443}" &gt;&amp;3
    printf "GET %s HTTP/1.1\r\n" "${URL[pathname]}${URL[search]}${URL[hash]}" &gt;&amp;3
    printf '%s: %s\r\n' Host "${URL[hostname]}" Connection close &gt;&amp;3
    printf '\r\n' &gt;&amp;3
    cat &lt;&amp;3
}

get "$url" &gt; "$tmp" || eh ':('
[ -s "$tmp" ] || eh 'Empty reply, TLS error?'

offset_calc() {
    if echo 1 | grep -aobx 1 &gt;/dev/null 2&gt;&amp;1; then # gnu-like grep
        grep -aobx $'\r' "$tmp" | head -1 | tr -d '\r\n:' | \
            xargs -r expr 1 +
    else                                      # busybox?
        od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
            grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
            awk '{if (NR==1) print $7+0}'
    fi || echo -1
}
offset=`offset_calc`
headers() { head -c "$offset" "$tmp" | tr -d '\r'; }
hdr() { headers | grep -m1 -i "^$1:" | cut -d' ' -f2; }

status=`head -1 "$tmp" | cut -d' ' -f2`
case "$status" in
    200) [ "$offset" = -1 ] &amp;&amp; offset=-2 # invalid response, dump all
         tail -c+$((offset + 2)) "$tmp"
         [ "$offset" -gt 0 ] ;;
    302) headers 1&gt;&amp;2; echo 1&gt;&amp;2
         hdr location | xargs "$0" "$1" "$2" ;;
    *)   headers 1&gt;&amp;2; exit 1
esac
</code></pre>
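<p>The splitting logic is easy to check offline on a canned 302 response (a sketch; the URL is made up, and the bash-only <code>$'\r'</code> is replaced with a portable equivalent):</p>

```shell
#!/bin/sh
# Canned response; GNU-grep branch of offset_calc, plus the headers/hdr
# helpers from fetch.bash.
tmp=$(mktemp)
printf 'HTTP/1.1 302 Found\r\nLocation: https://example.com/x\r\n\r\nignored' > "$tmp"

cr=$(printf '\r')   # command substitution strips \n, not \r
offset=$(grep -aobx "$cr" "$tmp" | head -1 | tr -d '\r\n:' | xargs -r expr 1 +)
headers() { head -c "$offset" "$tmp" | tr -d '\r'; }
hdr() { headers | grep -m1 -i "^$1:" | cut -d' ' -f2; }

status=$(head -1 "$tmp" | cut -d' ' -f2)
location=$(hdr location)
echo "$status $location"   # 302 https://example.com/x
rm -f "$tmp"
```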
<p>It should work even on Alpine Linux or FreeBSD:</p>
<pre><code>$ ./fetch.bash 10.10.10.10 1234 https://github.com/stunnel/static-curl/releases/download/8.6.0/curl-linux-arm64-8.6.0.tar.xz &gt; curl.tar.xz
HTTP/1.1 302 Found
Location: https://objects.githubusercontent.com/…
…
$ file curl.tar.xz
curl.tar.xz: XZ compressed data, checksum CRC64
</code></pre>
]]></description>
  
</item>

<item>
  <title>Comparing Compression</title>
  <link>https://sigwait.org/~alex/blog/2024/01/29/PYWKXg.html</link>
  <guid>https://sigwait.org/~alex/blog/2024/01/29/PYWKXg.html</guid>
  <pubDate>Mon, 29 Jan 2024 08:10:28 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>Do you benchmark compression tools (like xz or zstd) on your own data,
or do you rely on common wisdom? The best result for an uncompressed
300MB XFS image from the <a href="https://sigwait.org/~alex/blog/2024/01/25/0rQoiM.html">previous post</a> was achieved by bzip2, which is
rarely used nowadays. How does one quickly check a chunk of data
against N popular compressors?</p>
<p>E.g., an unpacked tarball of Emacs 29.2 source code consists of 6791
files with a total size of 276MB. If you were to distribute it as a
<em>.tar.something</em> archive, which compression tool would be the optimal
choice? We can easily write a small utility that answers this
question.</p>
<pre><code>$ ./comprtest ~/opt/src/emacs/emacs-29.2 | tee table
tar: Removing leading `/' from member names
szip             0.59   56.98        126593557
gzip             9.21   72.70         80335332
compress         3.57   57.45        125217137
bzip2           17.28   78.08         64509672
rzip            17.61   79.50         60336377
lzip           113.61   81.67         53935898
lzop             0.67   57.14        126121462
xz             111.03   81.89         53295220
brotli          13.10   78.14         64336399
zstd             1.13   73.77         77179446
</code></pre>
<p><code>comprtest</code> is a 29-LOC shell script. The 2nd column here
indicates time in seconds, the 3rd column displays
<math alttext="100(1-compressed/orig)">
<mn>100</mn><mo>(</mo><mn>1</mn><mo>-</mo><mfrac><mi>compressed</mi><mi>orig</mi></mfrac><mo>)</mo>
</math>, representing space saving in % (higher % is better), &amp; the 4th column
shows the final result in bytes.</p>
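<p>The 3rd column is computed by the same awk one-liner the script uses; e.g., with a hypothetical 1000000-byte input and a 273000-byte output:</p>

```shell
# Space saving in % (higher is better); the sizes are made-up examples.
echo 1000000 273000 | awk '{print 100*(1-$2/($1==0?$2:$1))}'
# prints 72.7
```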
<p>Then we can sort the table by the 3rd column &amp; draw a bar chart:</p>
<pre><code>$ sort -nk3 table | cpp -P plot.gp | gnuplot -persist
</code></pre>
<img alt="" src="https://sigwait.org/~alex/blog/2024/01/29/emacs.svg">
<style>
@media (prefers-color-scheme: dark) {
  img[src="emacs.svg"], img[src="emacs.stackexchange.com.svg"] {
    filter: invert(0.7);
  }
}
</style>

<p>If you're wondering why the C preprocessor suddenly became part of
this, read on.</p>
<p><code>comprtest</code> expects either a file as an argument or a directory (in
which case it creates a plain <em>.tar</em> of it first). Additional optional
arguments specify which compressors to use:</p>
<pre><code>$ ./comprtest /usr/libexec/gdb gzip brotli
gzip             0.60   61.17          6054706
brotli           1.17   65.84          5325408
</code></pre>
<p>The gist of <a href="https://sigwait.org/~alex/blog/2024/01/29/comprtest">the script</a> involves looping over a list of
compressors:</p>
<pre><code>archivers='szip gzip compress bzip2 rzip lzip lzop xz brotli zstd'
…
for c in ${@:-$archivers}; do
    echo $c
    case $c in
        szip   ) args='&lt; "$input" &gt; $output' ;;
        rzip   ) args='-k -o $output "$input"' ;;
        brotli ) args='-6 -c "$input" &gt; $output' ;;
        *      ) args='-c "$input" &gt; $output'
    esac

    eval "time -p $c $args" 2&gt;&amp;1 | awk '/real/ {print $2}'
    osize=`wc -c &lt; $output`

    echo $isize $osize | awk '{print 100*(1-$2/($1==0?$2:$1))}'
    echo $osize
    rm $output
done | xargs -n4 printf "%-8s  %11.2f  %6.2f  %15d\n"
</code></pre>
<ul>
<li>Not every archive tool has gzip-compatible CLI.</li>
<li>We are using a default compression level for each tool with the
exception of <code>brotli</code>, as its default level 11 is excruciatingly
slow.</li>
<li>szip is an interface to the Snappy algorithm. Your distro probably
doesn't have it in its repos, hence run <code>cargo install szip</code>. Everything else should be available via dnf/apt.</li>
</ul>
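<p>The <code>xargs -n4 printf</code> at the end of the loop is what turns each group of 4 values into an aligned table row; e.g., feeding it the gzip row from the Emacs table:</p>

```shell
# Render one table row from 4 whitespace-separated fields (sample values
# taken from the table above).
printf '%s\n' gzip 9.21 72.70 80335332 | \
    xargs -n4 printf "%-8s  %11.2f  %6.2f  %15d\n"
```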
<p>Bar charts are generated by a gnuplot script:</p>
<pre><code>$ cat plot.gp
$data &lt;&lt;E
#include "/dev/stdin"
E
set key tmargin
set xtics rotate by -30 left
set y2tics
set ylabel "Seconds"
set y2label "%"
set style data histograms
set style fill solid
plot $data using 2 axis x1y1 title "Time", \
     "" using 3:xticlabels(1) axis x1y2 title "Space saving"
</code></pre>
<p>Here is where the C preprocessor comes in handy: without an injected
"datablock", it wouldn't be possible to draw a graph with 2 ordinates
when reading data from stdin.</p>
<p>In an attempt to demonstrate that xz is not always the best choice, I
benchmarked a bunch of XML files (314MB):</p>
<pre><code>$ ./comprtest ~/Downloads/emacs.stackexchange.com.tar
szip             0.59   63.70        119429565
gzip             7.18   77.59         73724710
compress         4.03   67.17        108015563
bzip2           21.37   83.36         54751478
rzip            17.42   85.93         46304199
lzip           119.70   85.06         49151518
lzop             0.67   63.63        119667058
xz             125.80   85.55         47559464
brotli          13.56   82.52         57509978
zstd             1.07   79.40         67766890
</code></pre>
<img alt="" src="https://sigwait.org/~alex/blog/2024/01/29/emacs.stackexchange.com.svg">
]]></description>
  
</item>

<item>
  <title>Disk images as archive file formats</title>
  <link>https://sigwait.org/~alex/blog/2024/01/25/0rQoiM.html</link>
  <guid>https://sigwait.org/~alex/blog/2024/01/25/0rQoiM.html</guid>
  <pubDate>Thu, 25 Jan 2024 17:10:04 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>As a prank, how do you create an archive in Linux that ⓐ cannot be
opened in Windows (without WSL2 or Cygwin), ⓑ <em>can</em> be opened in MacOS
or FreeBSD?</p>
<p>Creating a <em>.cpio</em> or <em>.tar.xz</em> won't cut it: file archivers such as
7-Zip are free &amp; easy to install. Furthermore, sending an ext4
image, generated as follows:</p>
<pre><code>$ truncate -s 10M file.img
$ mkfs.ext4 file.img
$ sudo mount -o loop file.img /somewhere
$ sudo cp something /somewhere
$ sudo umount /somewhere
</code></pre>
<p>doesn't help nowadays, for 7-Zip opens them too<a class="footnote" href="#OrQoiM-1"><sup>1</sup></a>. Although disk cloning utils like
FSArchiver can produce an image file from a directory, they are
exclusive to Linux.</p>
<p>It boils down to this: which filesystems can be read across
Linux/MacOS/FreeBSD that Windows file archivers don't recognise?  This
rules out fat/ntfs/udf, for they are too common, and f2fs/nilfs2, for
they are Linux-only.</p>
<p>The only viable candidate I found is XFS. Btrfs was a contender, but
I'm unsure how to mount it on Mac.</p>
<p>Below is a script to automate the creation of prank archives. It takes
any zip/tar.gz (or anything else that bsdtar is able to parse) &amp;
outputs an image file in the format specified by the output file
extension:</p>
<pre><code>sudo ./mkimg file.zip file.xfs
</code></pre>
<p>It requires sudo, for <code>mount -o loop</code> can't be done under a regular
user.</p>
<pre><code>#!/bin/sh

set -e

input=$1
output=$2
type=${2##*.}
[ -r "$input" ] &amp;&amp; [ "$output" ] &amp;&amp; [ "`id -u`" = 0 ] || {
    echo Usage: sudo mkimg file.zip file.ext2 1&gt;&amp;2
    exit 1
}
mkfs=mkfs.$type
cmd() { for c; do command -v $c &gt;/dev/null || { echo no $c; return 1; }; done; }
cmd bsdtar "$mkfs"

cleanup() {
    set +e
    umount "$mnt" 2&gt;/dev/null
    rm -rf "$mnt" "$log"
    [ "$ok" ] || rm -f "$output"
}

trap cleanup 0 1 2 15
usize=`bsdtar tvf "$input" | awk '{s += $5} END {print s}'`
mnt=`mktemp -d`
log=`mktemp`

case "$type" in
    msdos|*fat) size=$((1024*1024 + usize*2)); opt_tar=--no-same-owner ;;
    ext*|udf  ) size=$((1024*1024 + usize*2)) ;;
    f2fs      ) size=$((1024*1024*50 + usize*2)) ;;
    btrfs     ) size=$((114294784 + usize*2)) ;;
    nilfs2    ) size=$((134221824 + usize*2)) ;;
    xfs       ) size=$((1024*1024*300 + usize*2)) ;;
    jfs       ) size=$((1024*1024*16 + usize*2)); opt=-q ;;
    hfsplus   )
        size=$((1024*1024 + usize*2))
        [ $((size % 4096)) != 0 ] &amp;&amp; size=$((size + (4096-(size % 4096)))) ;;
    *) echo "$type is untested" 1&gt;&amp;2; exit 1
esac
rm -f "$output"
truncate -s $size "$output"
$mkfs $opt "$output" &gt; "$log" 2&gt;&amp;1 || { cat "$log"; exit 1; }

mount -o loop "$output" "$mnt"
bsdtar -C "$mnt" $opt_tar --chroot -xf "$input"
[ "$SUDO_UID" ] &amp;&amp; chown "$SUDO_UID:$SUDO_GID" "$output"
ok=1
</code></pre>
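<p>The only non-obvious arithmetic is in the hfsplus branch, which rounds the image size up to the next 4096-byte boundary; a quick check with a made-up size of 10000 bytes:</p>

```shell
#!/bin/sh
# Round $size up to a multiple of 4096 (the size is an arbitrary example).
size=10000
[ $((size % 4096)) != 0 ] && size=$((size + (4096-(size % 4096))))
echo $size   # 12288, i.e. 3 * 4096
```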
<p><em>.xfs</em> files start at a size of 300MB, even if you place a
0-length file in it, but bzip2 compresses such an image into 6270
bytes.</p>
<p>To mount an <em>.xfs</em> under a regular user, use
<a href="https://github.com/libyal/libfsxfs/">libfsxfs</a>.</p>
<hr>

<ol>
<li id="OrQoiM-1"><code>7z -i</code> prints all supported formats.</li>
</ol>
]]></description>
  
</item>

<item>
  <title>Home streaming &amp; inetd-style servers</title>
  <link>https://sigwait.org/~alex/blog/2024/01/11/rWQ3T5.html</link>
  <guid>https://sigwait.org/~alex/blog/2024/01/11/rWQ3T5.html</guid>
  <pubDate>Thu, 11 Jan 2024 08:49:53 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>The easiest way to stream a movie is to serve it using a static HTTP
server that supports <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests">range requests</a>. For this, even Ruby's Webrick
will do the job. Type this in a directory with your <em>The Sopranos</em>
collection:</p>
<pre><code>$ ruby -run -ehttpd . -b 127.0.0.1 -p 8000
</code></pre>
<p>&amp; point mpv or vlc to a particular episode:</p>
<pre><code>$ mpv http://127.0.0.1:8000/s01e01.mp4
</code></pre>
<p>This should work as if you're playing a local file. To play a movie
with a web browser, make sure the web server returns correct
<code>Content-Type</code> headers. A container format counts too: e.g., Chrome
doesn't like <em>mkv</em>.</p>
<p>Can we do something similar without the HTTP server? Depending on the
container format, it's possible to feed mpv with a raw TCP
stream. We'll lose seeking, but if we were creating, say, a YouTube
Shorts or Facebook Reels competitor, this wouldn't matter, for consumers
of these kinds of clips don't care much about that.</p>
<p>The most primitive solution requires only 2 utils:</p>
<ol>
<li><p><em>ncat</em>, which can listen on a socket &amp; fork an external program when
someone connects to the former:</p>
<pre><code>$ cat mickeymousetube
#!/bin/sh

export movie="${1:?Usage: ${0##*/} file.mkv [port]}"
port=${2:-61001}
type pv ncat || exit 1

__dirname=$(dirname "$(readlink -f "$0")")
ncat -vlk -e "$__dirname/pv.sh" 127.0.0.1 $port
</code></pre>
</li>
<li><p><em>pv</em>, the famous pipe monitor that can limit a transfer rate;
without the limiter, mpv eats all available bandwidth:</p>
<pre><code>$ cat pv.sh
#!/bin/sh
pv -L2M "$movie"
</code></pre>
<p>The <code>-L2M</code> option means max 2MB/s.</p>
</li>
</ol>
<p>Then run <code>mickeymousetube</code> in one terminal &amp; <code>mpv tcp://127.0.0.1:61001</code> in another to play a clip.</p>
<h3>tcplol</h3>
<p>How hard may it be to replace <em>ncat</em> with our custom script? What
<em>ncat</em> does with the <code>-e</code> option is akin to what inetd did back in the
day:</p>
<img alt="Steps performed by inetd" src="https://sigwait.org/~alex/blog/2024/01/11/inetd.svg">

<p>(The illustration is from Stevens' <em>UNIX Network Programming</em>.)</p>
<p>Instead of creating a server that manages sockets, one writes a
program that simply reads from stdin and outputs to stdout. All the
intricacies of properly handling multiple clients are managed by the
super-duper-server.</p>
<p>There is no (x)inetd package in modern distros like Fedora, as systemd
has superseded it with <a href="https://0pointer.de/blog/projects/inetd.html">socket activation</a>.</p>
<p>Suppose we have a script that asks a user for his nickname &amp; greets
him in return:</p>
<pre><code>$ cat hello.sh
#!/bin/sh
uname 1&gt;&amp;2
while [ -z "$name" ]; do
    printf "Nickname? "
    read -r name || exit 1
done
echo "Hello, $name!"
</code></pre>
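<p>Before putting it behind a server, the dialogue can be dry-run locally: feed the nickname on stdin the way a super-server would feed it from the client socket (a sketch; the script is recreated in a temp file to keep the demo self-contained):</p>

```shell
#!/bin/sh
# Recreate hello.sh in a temp file & drive it without any TCP involved.
s=$(mktemp)
cat > "$s" <<'EOF'
#!/bin/sh
uname 1>&2
while [ -z "$name" ]; do
    printf "Nickname? "
    read -r name || exit 1
done
echo "Hello, $name!"
EOF
out=$(echo Dude | sh "$s" 2>/dev/null)   # uname goes to stderr
echo "$out"   # Nickname? Hello, Dude!
rm -f "$s"
```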
<p>To expose it to a network, we can either write 2 systemd unit files &amp;
place them in <code>~/.config/systemd/user/</code>, <em>or</em> opt for a tiny 37 LOC <a href="https://sigwait.org/~alex/blog/2024/01/11/tcplol">Ruby script</a> instead:</p>
<pre><code>require 'socket'

usage = 'Usage: tcplol [-2v] [-h 127.0.0.1] -p 1234 program [args...]'
…

server = TCPServer.new opt['h'], opt['p']
loop do
  client = server.accept
  cid = client.remote_address.ip_unpack.join ':'

  warn "Client #{cid}"
  pid = fork do
    $stdin.reopen client
    $stdout.reopen client
    $stderr.reopen client if opt['2']
    client.close
    exec(*ARGV)
  end
  client.close
  Thread.new(cid, pid) do
    Process.wait pid
    warn "Client #{cid}: disconnect"
  end
end
</code></pre>
<p>This is a classic fork server that uses a thread for each fork to
watch out for zombies. The linked tcplol script performs an additional
clean-up in case the server gets hit with a SIGINT, for example.</p>
<p><em>ncat</em>, on the other hand, operates quite differently:</p>
<ol>
<li>it creates 2 pipes;</li>
<li>after each new connection, it forks itself;</li>
<li>it connects the 2 pipes to the child's stdin/stdout;</li>
<li>(in the parent process) it listens on a connected socket using
select(2) syscall and transfers data to/from the child using the 2
pipes; we'll talk about select(2) and the concept of multiplexing
later on.</li>
</ol>
<p>Anyhow, if we run our much simpler "super-server":</p>
<pre><code>$ ./tcplol -v -p 8000 ./hello.sh
</code></pre>
<p>&amp; connect to it with 2 socat clients, the process tree under Linux
would look like:</p>
<pre><code>$ pstree `pgrep -f ./tcplol` -ap
ruby,259576 ./tcplol -v -p 8000 ./hello.sh
  ├─hello.sh,259580 ./hello.sh
  ├─hello.sh,259587 ./hello.sh
  ├─{ruby},259583
  ├─{ruby},259588
  └─{ruby},259589
</code></pre>
<p>The dialog:</p>
<pre>$ socat - TCP4:127.0.0.1:8000
Nickname? <i>Dude</i>
Hello, Dude!
</pre>

<p>(Why socat? We can use <em>ncat</em> as well, but the latter <a href="https://github.com/nmap/nmap/issues/1413">doesn't close
its end of a connection</a>; it
hangs in CLOSE_WAIT until one presses <kbd>Ctrl-D</kbd>.)</p>
<p>To play a movie, run</p>
<pre><code>$ ./tcplol -v -p 8000 ./pv.sh file.mkv
</code></pre>
<p>using a modified version of pv.sh script:</p>
<pre><code>#!/bin/sh
echo Streaming "$1" 1&gt;&amp;2
pv -L2M "${1?Usage: pv.sh file}"
</code></pre>
<p>Then connect to the server with</p>
<pre><code>$ mpv tcp://127.0.0.1:8000
</code></pre>
<h3>Mickey mouse SOCKS4 server</h3>
<p>inetd-style services can perform various actions, not just humbly
write to stdout. Nothing prevents such a service from opening a
connection to a different machine and relaying bytes from it to the
tcplol clients.</p>
<p>To illustrate the perils of the low-level socket interface, let's
write a crude, allow-everyone socks4 service and test it with
curl. The objective is to retrieve <code>security.txt</code> file from Google
using a TLS connection like so:</p>
<pre><code>$ curl -L https://google.com/.well-known/security.txt --proxy socks4://127.0.0.1:8000
</code></pre>
<p>As a socks4 client, curl sends a request to <code>127.0.0.1:8000</code> with an
IP+port to which it wants our service to establish a connection
(meaning we don't have to resolve google.com domain name
ourselves). We decode this and promptly send an acknowledgment
reply. This is the 1st part of socks4.rb, which we are going to run under
tcplol:</p>
<pre><code>$stdout.sync = true

req = $stdin.read 8 + 1
ver, command, port, ip = req.unpack 'CCnN' # 8 bytes
abort 'Invalid CONNECT' unless ver == 4 &amp;&amp; command == 1

ip = ip.to_s(16).scan(/.{2}/).map(&amp;:hex) # [a,b,c,d]
res = [0, 90].pack('C*') +               # request granted
      [port].pack('n') + ip.pack('C*')
$stdout.write res
</code></pre>
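<p>For the curious, that 8-byte CONNECT request can be reproduced and decoded entirely in shell (the IP 142.250.74.46 below is just an illustrative address, not necessarily Google's):</p>

```shell
#!/bin/sh
# SOCKS4 CONNECT: ver=4, cmd=1, port 443 (0x01BB), 4 IP octets, then a
# NUL-terminated (here empty) user id. od+awk mirror the Ruby 'CCnN' unpack.
req=$(mktemp)
printf '\004\001\001\273\216\372\112\056\0' > "$req"
out=$(od -A n -t u1 -N 8 -v "$req" | \
    awk '{print $1, $2, $3*256+$4, $5"."$6"."$7"."$8}')
echo "$out"   # 4 1 443 142.250.74.46
rm -f "$req"
```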
<p>What should we do next? As soon as curl gets the value of 'res'
variable, it eagerly starts sending a TLS ClientHello message to
<code>127.0.0.1:8000</code>. At this point, we don't need to analyse exactly what
it sends--our primary concern is relaying traffic to and fro as
quickly as possible without losing bytes.</p>
<p>To temporarily test that we have correctly negotiated SOCKS4, we can
conclude the script with the <em>ncat</em> call:</p>
<pre><code>exec "ncat", "-v", ip.join('.'), port.to_s
</code></pre>
<p>It should work. However, we can also rewrite that line in pure Ruby
using the <code>Kernel.select</code> method. What we need here is to monitor 2
file descriptors in different modes to react to changes in their
state:</p>
<ol>
<li>in <em>reading</em> mode: stdin and a TCP socket to google.com;</li>
<li>in <em>writing</em> mode: the TCP socket to google.com.</li>
</ol>
<p>(We assume that stdout is always available.) This kind of
programming--being notified when an IO connection is ready (for
reading or writing, for example) on a set of file descriptors--is
called IO multiplexing. Most web programmers never encounter it
because the socket interface is many levels below the stack they are
working in, but it may be interesting sometimes to see how the sausage
is made.</p>
<p>Replace <code>exec "ncat"</code> line with:</p>
<pre><code>require 'socket'

s = TCPSocket.new ip.join('.'), port
wbuf = []
BUFSIZ = 1024 * 1024
loop do
  sockets = select [$stdin, s], [s], [], 5

  sockets[0].each do |socket|   # readable
    if socket == $stdin
      input = $stdin.readpartial BUFSIZ
      wbuf &lt;&lt; input
    else
      input = socket.readpartial BUFSIZ
      $stdout.write input
    end
  end

  sockets[1].each do |socket|   # writable
    wbuf.each { |v| socket.write v }
    wbuf = []
  end
end
</code></pre>
<p>We're establishing a connection to google.com and then initiating the
monitoring of two file descriptors in an endless loop. The <code>select</code>
method blocks until one or both of these file descriptors become
available for reading or writing. The last argument to it is a timeout
in seconds.</p>
<p>When <code>select</code> unblocks, <code>sockets[0]</code> contains an array of file
descriptors available for reading. If it's stdin, we read the data the
OS kernel thinks is obtainable &amp; save such a chunk to <code>wbuf</code> array. If
it is a socket to google.com, we read some bytes from it &amp; immediately
write them to stdout for curl to consume.</p>
<p><code>sockets[1]</code> contains an array of file descriptors available for
writing. We only have 1 google.com socket here, to which we write the
contents of <code>wbuf</code> array.</p>
<p>The script terminates when <code>$stdin.readpartial</code> raises an
<code>EOFError</code>. This indicates to curl that the other party has closed its
connection.</p>
<p>If you run <a href="https://sigwait.org/~alex/blog/2024/01/11/socks4.rb">socks4.rb</a> under tcplol:</p>
<pre><code>./tcplol -v -p 8000 ./socks4.rb
</code></pre>
<p>and observe errors tcplol prints from socks4.rb, you'll see that curl
makes 2 requests to google.com, for the first one yields 301.</p>
<pre><code>$ curl -sLI https://google.com/.well-known/security.txt --proxy socks4://127.0.0.1:8000 | grep -E '^HTTP|content-length'
HTTP/2 301
content-length: 244
HTTP/2 200
content-length: 246
</code></pre>
]]></description>
  
</item>

<item>
  <title>Shell postcards</title>
  <link>https://sigwait.org/~alex/blog/2024/01/01/FHjc1s.html</link>
  <guid>https://sigwait.org/~alex/blog/2024/01/01/FHjc1s.html</guid>
  <pubDate>Tue, 13 Feb 2024 12:19:30 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>In the very distant past, there were web services for creating
"virtual postcards": collections of cats, flowers, Santas, &c,
sometimes animated, where you would choose a picture, write a couple of
words, provide an email of a recipient, & press Submit.</p>
<p>I think today, when every desktop OS has a decent terminal emulator,
we can send little shell scripts instead. E.g., if Bob receives a file</p>
<pre><code>$ fold -w34 postcard
tail -c+37 "$0"|base64 -d|gzip -cd
#H4sIALdOkGUAA62XO47bMBCG+1xBDbNNu
sBeCnDhA+QQrlwY2c1GyCIrJ0jHQoUKFbI
...
EkmUw04ZNvxUwGDoSYCTCG+IUYwfpg/7MA
v8BGNZ3QFsTAAA=
</code></pre>
<p>he won't be able to immediately see what it is about unless he runs
it:</p>
<pre>$ chmod +x postcard
$ ./postcard
<img alt="an unencrypted postcard" src="https://sigwait.org/~alex/blog/2024/01/01/RN-BART.svg"></pre>

<p>The "payload" in the "postcard" is modified ANSI art from 1992
(<code>RN-BART.MIR</code> in <a href="http://artscene.textfiles.com/artpacks/1992/mirage01.zip">mirage01.zip</a> artpack), converted to UTF-8.</p>
<p>We can generate such a program with a simple script that expects
raw payload from stdin:</p>
<pre><code>$ cat mkpostcard
#!/bin/sh

set -e
prefix() { printf 'tail -c+37 "$0"|base64 -d|gzip -cd\n#'; }

script=`mktemp`
trap 'rm -f "$script"' 1 2 15

prefix &gt; "$script"
gzip -c | base64 -w0 &gt;&gt; "$script"
chmod +x "$script"
mv "$script" "${1:-postcard}"
</code></pre>
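<p>A round trip is easy to verify: the prefix is exactly 36 bytes, so the base64 payload indeed starts at byte 37 (a sketch; it assumes GNU base64 with <code>-w0</code>):</p>

```shell
#!/bin/sh
# Build a postcard from a 1-line payload & run it.
p=$(mktemp)
printf 'tail -c+37 "$0"|base64 -d|gzip -cd\n#' > "$p"   # 36 bytes
printf 'hello, Bob!\n' | gzip -c | base64 -w0 >> "$p"
out=$(sh "$p")
echo "$out"   # hello, Bob!
rm -f "$p"
```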
<p>Easy-peasy. We are not adding a shebang, as in this case, <a href="https://sigwait.org/~alex/blog/2016/10/29/bom-and-exec.html">it may be
omitted</a>.</p>
<p>We can also go further &amp; generate a password-protected postcard. This
presents some difficulties, though: POSIX doesn't mention any command
line utility for (en|de)cryption, &amp; openssl could be missing on a
target machine. Thus we either</p>
<ul>
<li>write <a href="https://henry-flower.dreamwidth.org/511514.html">a xor-cipher in
awk</a> or</li>
<li>embed a source code of a C implementation of ChaCha20, but again,
the target machine may not have a compiler installed or</li>
<li>embed an αcτµαlly pδrταblε εxεcµταblε that will take care of decryption.</li>
</ul>
<p>Everybody heard about the <a href="https://github.com/jart/cosmopolitan">Cosmopolitan
Libc</a> toolchain during the
Covid19 days, but personally, I haven't had any use for it.</p>
<p>It's an amazing piece of work. E.g., this minimal <a href="http://www.cypherspace.org/rsa/rc4c.html">rc4 cipher
implementation</a> (I believe
it's public domain)</p>
<pre><code>$ cat rc4.c
#define S ,t=s[i],s[i]=s[j],s[j]=t /* rc4 hexkey &lt;file */
unsigned char k[256],s[256],i,j,t;main(c,v,e)char**v;{++v;while(++i)s[
i]=i;for(c=0;*(*v)++;k[c++]=e)sscanf((*v)++-1,"%2x",&amp;e);while(j+=s[i]
+k[i%c]S,++i);for(j=0;c=~getchar();putchar(~c^s[t+=s[i]]))j+=s[++i]S;}
</code></pre>
<p>compiles into</p>
<pre><code>$ file rc4
rc4: DOS/MBR boot sector; partition 1 : ID=0x7f, active, start-CHS (0x0,0,1), end-CHS (0x3ff,255,63), startsector 0, 4294967295 sectors
$ du rc4
404K    rc4
</code></pre>
<p>that runs on Linux (including aarch64, I specifically checked that!)
and FreeBSD.</p>
<p><code>rc4.c</code> has a few deficiencies:</p>
<ul>
<li>if its <code>argv[1]</code> is null or an empty string, it coredumps;</li>
<li>it requires a hex-formatted string, not a plain-text password.</li>
</ul>
<p>At first, I fixed the bug, &amp; added a str2hex conversion. I could
continue &amp; add everything else: embed an encrypted message, read the
password from terminal, &amp;c, but why? Writing shell scripts is easier &amp;
a postcard recipient then gets a text file, not a binary<a class="footnote" href="#FHjc1s-1"><sup>1</sup></a>.  Hence, I reverted
to the "stock" <code>rc4.c</code>, &amp; decided to use Ruby's erb instead:</p>
<pre><code>$ cat message.erb
#&lt;%= rc4=File.read(rc4) %&gt;
#&lt;%= message=File.read(message) %&gt;
slice() { tail -c+$1 "$0" | head -c $2 | base64 -d | gzip -cd; }
cleanup() { rm -f "$rc4"; stty echo; }
set -e
rc4=`mktemp`
trap cleanup 0
trap 'cleanup; echo 1&gt;&amp;2; exit 1' 1 2 15
slice 2 '&lt;%= rc4.length %&gt;' &gt; "$rc4"
chmod +x "$rc4"
stty -echo
printf 'Password: ' 1&gt;&amp;2; read -r password; printf "\n" 1&gt;&amp;2
[ -n "$password" ]
hexkey=`printf '%s' "$password" | od -A n -t x1 -v | tr -d ' \n'`
slice '&lt;%= rc4.length+4 %&gt;' '&lt;%= message.length %&gt;' | "$rc4" $hexkey
</code></pre>
<p>It generates a "password-protected" shell script that</p>
<ol>
<li>extracts the αcτµαlly pδrταblε <code>rc4</code> εxεcµταblε from its 1st
comment line;</li>
<li>asks a user for a password;</li>
<li>converts the password to a hex string;</li>
<li>extracts the "message" from its 2nd comment line and decrypts it
with <code>rc4</code>.</li>
</ol>
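Step 3, the plaintext→hex conversion, is done with od(1) in the generated script; just as a sketch (not part of the script), the same transform in Ruby is a one-liner:

```ruby
# Plaintext password -> hex key string, equivalent to
# `printf '%s' "$password" | od -A n -t x1 -v | tr -d ' \n'`.
def str2hex(password)
  password.unpack1('H*')   # each byte as two lowercase hex digits
end

puts str2hex('monkey')     # -> 6d6f6e6b6579
```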
<p>A Makefile that assists in creating such a script:</p>
<pre><code>password := monkey
message := payload/hello1.txt
out := _out
CC := ~/opt/s/cosmocc/bin/cosmocc

all: $(out)/message

$(out)/%: %.c
    @mkdir -p $(dir $@)
    $(CC) --std=c89 -o $@ $&lt;

$(out)/message: message.erb $(out)/rc4
    gzip -c &lt; $(out)/rc4 | base64 -w0 &gt; rc4.text
    $(out)/rc4 $(hexkey) &lt; $(message) | gzip -c | base64 -w0 &gt;message.text
    erb rc4=rc4.text message=message.text $&lt; &gt; $@
    chmod +x $@
    rm *.text

.DELETE_ON_ERROR:
hexkey = $(shell printf '%s' $(call se,$(password)) | od -A n -t x1 -v | tr -d ' \n')
se = '$(subst ','\'',$1)'
</code></pre>
<p>You'll need to adjust the <code>CC</code> variable, which expects the compiler
from the Cosmopolitan Libc toolchain.</p>
<p>Then</p>
<pre>$ make password=12345 message=payload/TO-GZ.txt
...
$ du _out/message
304K    _out/message
$ _out/message
Password:
<img alt="a pixelated dog" src="https://sigwait.org/~alex/blog/2024/01/01/dog.svg"></pre>

<p>I should end with a reminder that rc4 is not a modern,
state-of-the-art cipher, &amp; you should absolutely not blindly run
"encrypted" postcards from unknown sources.</p>
<hr>

<ol>
<li id="FHjc1s-1">Whilst αcτµαlly pδrταblε εxεcµταblεs are also
shell scripts that cleverly pretend otherwise, they are not
plain-text files.</li>
</ol>
]]></description>
  
</item>

<item>
  <title>Superscripts &amp; subscripts via a coprocess</title>
  <link>https://sigwait.org/~alex/blog/2023/12/21/fkvF42.html</link>
  <guid>https://sigwait.org/~alex/blog/2023/12/21/fkvF42.html</guid>
  <pubDate>Thu, 21 Dec 2023 12:20:06 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>How do you write 25m<sup>3</sup>? In Markdown, you'll probably end up
with</p>
<pre><code>25m&lt;sup&gt;3&lt;/sup&gt;
</code></pre>
<p>Some text editors have a special mechanism for entering characters not
present in the current keyboard layout. E.g., in Emacs <code>C-x 8 ^ 2</code>
inserts superscript <sup>2</sup>, but this won't work for writing a
formula like <i>y = x<sup>2a</sup></i>, as <code>C-x 8 ^ a</code> injects the <em>â</em>
character.</p>
<p>Unicode should've had at least a super/subscript version of Latin
characters, but that never happened. It has only a <em>subset</em>, spread
across seemingly random sections. For instance, the subscript letter <em>j</em>
sits in the Latin Extended-C block between a Finno-Ugric <em>∃</em>
(which reminds me of the existential quantifier) &amp; a
superscript capital <em>V</em>.</p>
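The scattering is easy to demonstrate by printing a few codepoints (a quick sketch; the block names are taken from the Unicode charts):

```ruby
# Codepoints of a few super/subscript characters -- each lives in a
# different, seemingly unrelated Unicode block.
{ '²' => 'Latin-1 Supplement',
  '₂' => 'Superscripts and Subscripts',
  'ⁱ' => 'Superscripts and Subscripts',
  'ᵢ' => 'Phonetic Extensions',
  'ⱼ' => 'Latin Extended-C' }.each do |ch, block|
  printf "%s U+%04X %s\n", ch, ch.ord, block
end
```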
<p>Unicode has the Superscripts and Subscripts section, but it appears
more like an afterthought, incomplete &amp; abandoned:</p>
<p><img src="https://sigwait.org/~alex/blog/2023/12/21/unicode-Superscript-and-Subscripts.svg" alt="Superscript and Subscripts"></p>
<p>If one peruses Unicode blocks, it's possible to assemble most of the
Latin characters in super/subscript variants to write a simple script
for character substitution. Then, any decent text editor could use
such a script to replace a chunk of text with a superscript version of
it.</p>
<p>I also thought of a slightly more interesting feature: selecting a
chunk of text with HTML tags (like <code>&lt;i&gt;y = x&lt;sup&gt;2a&lt;/sup&gt;&lt;/i&gt;</code>) in the
text editor to transform only <code>&lt;sup&gt;</code> or <code>&lt;sub&gt;</code> nodes.</p>
<p><video controls="" loop="" src="https://sigwait.org/~alex/blog/2023/12/21/supsub-xml.mp4" type="video/mp4"></video></p>
<h3>Script 1: supsub</h3>
<p>It reads its input from the stdin line-by-line:</p>
<pre><code>$ lsb_release -d | ./supsub sub
dₑₛ𞁞ᵣᵢₚₜᵢₒₙ:    fₑdₒᵣₐ ᵣₑₗₑₐₛₑ ₃₉ ₍ₜₕᵢᵣₜᵧ ₙᵢₙₑ₎
$ lsb_release -d | ./supsub sub | ./supsub invert
ᵈ𞀵ˢ𞀿ʳⁱ𞀾ᵗⁱ𞀼ⁿ:    ᶠ𞀵ᵈ𞀼ʳ𞀰 ʳ𞀵ˡ𞀵𞀰ˢ𞀵 ³⁹ ⁽ᵗʰⁱʳᵗʸ ⁿⁱⁿ𞀵⁾
$ lsb_release -d | ./supsub sub | ./supsub invert | ./supsub restore
dеsсrірtіоn:    fеdоrа rеlеаsе 39 (thіrty nіnе)
</code></pre>
<p>(Some mobile browsers have a lot of trouble rendering even Latin
superscript/subscript characters.)</p>
<pre><code>$ cat supsub
#!/usr/bin/env -S ruby --disable-gems

db = DATA.read.split(/\s+/).filter {|v| v}
$supa = db.map {|v| [v[0], v[1]] }.to_h
$sub = db.map {|v| [v[0], v[2]] }.to_h

mode = 'sup|sub|invert|restore'
abort "Usage: supsub #{mode} &lt; file.txt" unless ARGV[0] =~ /^#{mode}$/

def tr mode, chars
  case mode
  when 'sup'
    chars.map { |ch| $supa[ch.downcase] || ch }.join
  when 'sub'
    chars.map { |ch| $sub[ch.downcase] || ch }.join
  when 'invert'
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map do |ch|
      if supa_v[ch]
        $sub[supa_v[ch]] || ch
      elsif sub_v[ch]
        $supa[sub_v[ch]] || ch
      else
        ch
      end
    end.join
  else # restore
    supa_v = $supa.invert
    sub_v = $sub.invert
    chars.map { |ch| supa_v[ch] || sub_v[ch] || ch}.join
  end
end

while (line = STDIN.gets)
  print tr(ARGV[0], line.chars)
end

__END__
0⁰₀ 1¹₁ 2²₂ 3³₃ 4⁴₄ 5⁵₅ 6⁶₆ 7⁷₇ 8⁸₈ 9⁹₉ +⁺₊ -⁻₋ =⁼₌ (⁽₍ )⁾₎
aᵃₐ bᵇb cᶜ𞁞 dᵈd eᵉₑ fᶠf gᵍg hʰₕ iⁱᵢ jʲⱼ kᵏₖ lˡₗ mᵐₘ nⁿₙ oᵒₒ
pᵖₚ q𐞥q rʳᵣ sˢₛ tᵗₜ uᵘᵤ vᵛᵥ wʷw xˣₓ yʸᵧ zᶻz
а𞀰ₐ б𞀱𞁒 в𞀲𞁓 г𞀳𞁔 ґґ𞁧 д𞀴𞁕 е𞀵ₑ ж𞀶𞁗 з𞀷𞁘 и𞀸𞁙 іⁱ𞁨 їїї ййй
к𞀹𞁚 л𞀺𞁛 м𞀻ₘ нᵸн о𞀼ₒ п𞀽𞁝 р𞀾ₚ с𞀿𞁞 т𞁀т у𞁁𞁟 ф𞁂𞁠 х𞁃𞁡 ц𞁄𞁢
ч𞁅𞁣 ш𞁆𞁤 щщщ ьꚝь ю𞁉ю яяя
</code></pre>
<h3>Script 2: supsub-xml, version 1</h3>
<pre><code>$ echo '&lt;sub&gt;test&lt;/sub&gt; &lt;i&gt;lol&lt;/i&gt; &lt;sup&gt;haha&lt;/sup&gt; but &lt;sup&gt;it works&lt;/sup&gt;!' | ./supsub-xml
ₜₑₛₜ &lt;i&gt;lol&lt;/i&gt; ʰᵃʰᵃ but ⁱᵗ ʷᵒʳᵏˢ!
</code></pre>
<p>With the help of 鋸, we parse the stdin as an XML fragment, &amp; replace
<code>&lt;sup&gt;</code> or <code>&lt;sub&gt;</code> nodes with the result of <code>supsub</code> script.</p>
<pre><code>#!/usr/bin/env ruby

require 'nokogiri'

cmd = ARGV[0] || File.join(__dir__, 'supsub')
doc = Nokogiri::HTML.fragment STDIN.read

doc.css('sup,sub').each do |node|
  IO.popen("#{cmd} #{node.name}", 'w+') do |t|
    t.write node.text
    t.close_write
    node.replace t.gets
  end
end

print doc.to_s
</code></pre>
<p>The script works, &amp; for the most practical purposes one may leave it
as is, but it has 1 issue I find quite barbaric: it forks the external
program each time it needs to transform a string.</p>
<h3>Script 2: supsub-xml, version 2</h3>
<p>No, we are not going to rewrite <code>supsub</code> as a library; we are going to
invoke it from <code>supsub-xml</code> as a <em>coprocess</em>.</p>
<p>What is a coprocess? Stevens, in his APUE, described it as a program
that runs alongside a parent process &amp; communicates with it via 2
one-way pipes.</p>
<p><img src="https://sigwait.org/~alex/blog/2023/12/21/coprocess.svg" alt="Driving a coprocess"></p>
<p>Think of it as a local microservice: you write to it using one pipe,
then read a response using another pipe.</p>
<p>A simple (but contrived) example of a Ruby script that <em>drives</em>
another program is to use the tr(1) utility for upcasing a string:</p>
<pre><code>$ cat coprocess # broken, read below
#!/usr/bin/env ruby

parent_in, parent_out = IO.pipe
child_in, child_out = IO.pipe
spawn "tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out
parent_out.close
child_in.close

child_out.puts "lol"
print parent_in.gets
child_out.puts "haha"
print parent_in.gets
</code></pre>
<p>If we run it, though, we won't get LOL and HAHA--the script hangs
indefinitely on the <code>print parent_in.gets</code> line. The reason
is the use of libc's fwrite(3) by the tr utility (at least in the
coreutils version under Linux). fwrite(3) writes to a libc stream,
which is fully buffered by default when attached to a pipe. What
happens is that tr(1) eats the "lol" string and prints "LOL" into its
stdout, but our <code>coprocess</code> script never sees the result,
for "LOL" is stuck in a buffer.</p>
<p>We can fix the script by executing tr under the stdbuf(1) utility:</p>
<pre><code>spawn "stdbuf -i0 -o0 tr '[a-z]' '[A-Z]'", in: child_in, out: parent_out
</code></pre>
<p>then bytes should start moving freely through the pipes:</p>
<pre><code>$ ./coprocess
LOL
HAHA
</code></pre>
<p>Unfortunately, this fix won't work if the coprocess is a Ruby
script. Ruby has its own IO mechanism that stdbuf(1) cannot affect.</p>
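(If you control the coprocess's source, though, there is an in-process counterpart to stdbuf: a Ruby coprocess can unbuffer its own stdout. A sketch, replaying the tr(1) example with a Ruby "upcaser":)

```ruby
# A Ruby "upcaser" coprocess.  Without `STDOUT.sync = true` its output
# would sit in Ruby's own IO buffer and the parent would hang on gets,
# exactly like the buffered tr(1) case.
require 'rbconfig'

parent_in, parent_out = IO.pipe
child_in, child_out = IO.pipe
upcaser = <<~'RB'
  STDOUT.sync = true           # the in-process equivalent of stdbuf -o0
  while (line = STDIN.gets)
    print line.upcase
  end
RB
pid = spawn(RbConfig.ruby, '-e', upcaser, in: child_in, out: parent_out)
parent_out.close
child_in.close

child_out.puts 'lol'
reply = parent_in.gets
print reply                    # LOL
child_out.close
Process.wait pid
```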
<p>The general solution for this kind of problem is to trick a coprocess
into thinking it is connected to a (pseudo) terminal.</p>
<img alt="coprocess via pty" src="https://sigwait.org/~alex/blog/2023/12/21/supsub-xml.svg">

<p>Thankfully, Ruby has a nifty built-in
<a href="https://docs.ruby-lang.org/en/master/PTY.html">pty</a> extension that
abstracts away most of the communication with a pseudo terminal
device.</p>
<pre><code>$ cat supsub-xml
#!/usr/bin/env ruby

require 'nokogiri'
require 'pty'

cmd = ARGV[0] || File.join(__dir__, 'supsub')

class Coprocess
  def initialize cmd
    @master, slave = PTY.open
    read, @write = IO.pipe
    spawn cmd, in: read, out: slave
    read.close
    slave.close
  end

  def puts str; @write.puts str; end
  def gets; @master.gets; end
end

transforms = {
  "sup" =&gt; Coprocess.new("#{cmd} sup"),
  "sub" =&gt; Coprocess.new("#{cmd} sub")
}

doc = Nokogiri::HTML.fragment STDIN.read
doc.css('sup,sub').each do |node|
  tr = transforms[node.name]
  tr.puts node.text
  node.replace tr.gets.chomp
rescue
  warn "transforming `#{node.text}` failed: #{$!}"
end

print doc.to_s
</code></pre>
<p>If we add <code>sleep</code> to the very end of the script, then we may examine
how this arrangement works:</p>
<pre><code>$ $$
bash: 42396: command not found
$ echo '&lt;sub&gt;test&lt;/sub&gt; &lt;i&gt;lol&lt;/i&gt; &lt;sup&gt;haha&lt;/sup&gt;!' | ./supsub-xml
ₜₑₛₜ &lt;i&gt;lol&lt;/i&gt; ʰᵃʰᵃ!
</code></pre>
<p>While the script sleeps, from another terminal:</p>
<pre><code>$ pstree -p 42396 -al
bash,42396
  └─ruby,100032 ./supsub-xml
      ├─ruby,100033 --disable-gems ... sup
      └─ruby,100034 --disable-gems ... sub

$ file /proc/100032/fd/* # supsub-xml
...
$ file /proc/100033/fd/* # supsub sup
...
</code></pre>
]]></description>
  
</item>

<item>
  <title>Glyphs in popular monospaced fonts</title>
  <link>https://sigwait.org/~alex/blog/2023/11/30/sLTpEg.html</link>
  <guid>https://sigwait.org/~alex/blog/2023/11/30/sLTpEg.html</guid>
  <pubDate>Thu, 30 Nov 2023 17:30:07 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
  <description><![CDATA[<p>Linus Torvalds famously uses an ancient <a href="https://git.kernel.org/pub/scm/editors/uemacs/uemacs.git/tree/">uemacs</a> editor that he
updated to support UTF-8 some years ago. This post is not about
Torvalds or his editor but about the file named <code>UTF-8-demo.txt</code> that
caught my attention while I was glancing through the above repo.</p>
<p>Chrome displays it almost correctly--except for the "formulas"
block. Judging from the shape of the glyphs, the default monospace
font I use in Fedora</p>
<pre><code>$ fc-match monospace
LiberationMono-Regular.ttf: "Liberation Mono" "Regular"
</code></pre>
<p>doesn't contain all the characters from <code>UTF-8-demo.txt</code>; hence,
Chrome does <em>font fallback</em>. In its devtools it reports:</p>
<pre><code>Liberation Mono     — Local file (5,438 glyphs)
DejaVu Sans         — Local file (1,179 glyphs)
Droid Sans Thai     — Local file (415 glyphs)
Droid Sans Ethiopic — Local file (320 glyphs)
Segoe UI Historic   — Local file (45 glyphs)
Noto Sans Math      — Local file (7 glyphs)
Droid Sans Fallback — Local file (5 glyphs)
Segoe UI Symbol     — Local file (1 glyph)
Noto Color Emoji    — Local file (1 glyph)
Times New Roman     — Local file (1 glyph)
</code></pre>
<p>(Your list would be, of course, completely different.)</p>
<p>Interestingly, xterm, gvim and gedit</p>
<p><img src="https://sigwait.org/~alex/blog/2023/11/30/UTF-8-demo.gedit.svg" alt="UTF-8-demo.txt in gedit"></p>
<p>... render the file better than Chrome (they don't mangle the
"formulas" block), while Emacs 29.1 not only fails the "formulas" test
but also misaligns the pseudo-graphics at the end of the file.</p>
<p>Anyhow, I became curious about how many of my installed monospace
fonts can render that file without <em>font substitution</em> (spoiler: none).</p>
<p>Is there a way to disable font fallback? We can provide a custom
fontconfig config via</p>
<pre><code>$ FONTCONFIG_FILE=~/tmp/lol.conf gedit
</code></pre>
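What would such a config look like? A sketch (the <code>&lt;selectfont&gt;</code> element is described in fonts-conf(5); the family name and paths here are assumptions) that whitelists a single family and rejects every other font file, leaving nothing to fall back to:

```xml
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <dir>/usr/share/fonts</dir>
  <cachedir>/tmp/fc-cache</cachedir>
  <selectfont>
    <acceptfont>
      <pattern>
        <patelt name="family"><string>Ubuntu Mono</string></patelt>
      </pattern>
    </acceptfont>
    <!-- blacklist everything else; whitelisted fonts survive -->
    <rejectfont><glob>*</glob></rejectfont>
  </selectfont>
</fontconfig>
```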
<p>or convert <code>UTF-8-demo.txt</code> to the pdf format using a tool that, by
default, doesn't know about font substitution:</p>
<pre><code># input output font
txt2pdf() {
  awk '{print "    " $0}' &lt; "${1:-/dev/null}" | \
    pandoc --pdf-engine=xelatex \
    -V "monofont:${3:-Roboto Mono}" -V "mainfont:${3:-Roboto Mono}" \
    -V geometry:"top=1cm,left=1cm,bottom=1.5cm,right=1cm" \
    -t pdf -o "${2:-${1%.*}.pdf}"
}
</code></pre>
<p>(Yes, we trick pandoc into thinking the input is markdown; IRL you'd
probably want to add <code>-V papersize=a4</code> and <code>-V fontsize=12pt</code> options
to it.)</p>
<p>Then we can type:</p>
<pre><code>$ txt2pdf UTF-8-demo.txt lol.pdf 'Ubuntu Mono'
</code></pre>
<p>and examine <code>lol.pdf</code> to see that a typeface, created "to complement
the Ubuntu tone of voice", not only has "a contemporary style" but
also "conveys a precise, reliable and free attitude":</p>
<p><img src="https://sigwait.org/~alex/blog/2023/11/30/UTF-8-demo.ubuntu-mono.svg" alt="UTF-8-demo.txt in Ubuntu Mono"></p>
<p>How to do the same for all installed fixed-width fonts? First, we
obtain the list of such fonts:</p>
<pre><code>$ type fc.mono
fc.mono is aliased to `fc-list :mono family | awk -F, "{print \$1}" | sort -u'
$ fc.mono
Bitstream Vera Sans Mono
Courier 10 Pitch
Courier New
Cursor
DejaVu Sans Mono
Droid Sans Mono
Inconsolata
Liberation Mono
Ligconsolata
Material Icons
Material Icons Outlined
Material Icons Round
Material Icons Sharp
Material Icons Two Tone
Nimbus Mono PS
Noto Color Emoji
Source Code Pro
Terminus
Ubuntu Mono
</code></pre>
<p>(I have Roboto Mono v2 installed as well, but fontconfig doesn't
recognise it as <a href="https://github.com/google/fonts/issues/225">a monospace font</a>.)</p>
<p>then</p>
<pre><code>$ (IFS=$'\n'; for fn in `fc.mono`; do txt2pdf UTF-8-demo.txt "$fn".pdf "$fn"; done)
$ ls *pdf -1
'Bitstream Vera Sans Mono.pdf'
'Courier New.pdf'
'DejaVu Sans Mono.pdf'
'Droid Sans Mono.pdf'
Inconsolata.pdf
'Liberation Mono.pdf'
Ligconsolata.pdf
'Nimbus Mono PS.pdf'
'Source Code Pro.pdf'
'Ubuntu Mono.pdf'
</code></pre>
<p>In my case, not every <em>.txt→.pdf</em> conversion was successful, but among
those that succeeded, 'DejaVu Sans Mono.pdf' produced the best
result, glyph-wise.</p>
]]></description>
  
</item>

<item>
  <title>Share a PulseAudio sink with an Android phone</title>
  <link>https://sigwait.org/~alex/blog/2023/11/06/U9uAPG.html</link>
  <guid>https://sigwait.org/~alex/blog/2023/11/06/U9uAPG.html</guid>
  <pubDate>Mon, 06 Nov 2023 21:20:29 GMT</pubDate>
  
    <author>alexander.gromnitsky@gmail.com (ag)</author>
  
  
    <category>ойті</category>
  
<description><![CDATA[<p>While attempting to find a pair of semi-decent USB-only speakers, I
became so frustrated that I contemplated creating a mock simulator with
"greater dynamic range", "improved bass", &amp; even "lower distortion" to
reliably replicate the advertised "quality" of these fake USB
speakers. (Why fake? Because they all (?) use USB-A for <em>power</em> &amp;
require a 3.5mm jack input for sound.) Speakers in the $0-$100 price
range often sound as if someone placed a phone in a metal bucket &amp;
started playing the Quake 2 theme music through the phone's mighty speaker.</p>
<p>Here's a high precision simulator:</p>
<ol>
<li>Load <code>module-simple-protocol-tcp</code> module into a PulseAudio server.</li>
<li>The module can use any sink you prefer, typically the default one.</li>
<li>Initiate a regular playback using the selected sink (with mpv, vlc,
a web browser, whatever).</li>
<li>Connect to the server from an Android phone using <a href="https://play.google.com/store/apps/details?id=com.kaytat.simpleprotocolplayer">Simple Protocol
Player</a>.</li>
</ol>
<p>For an additional milieu, put the phone in a bucket. $100 saved,
"greater dynamic range" achieved.</p>
<p>The scheme is compatible with PipeWire too.</p>
<p>Coincidentally, this can be used as a rustic spying technique: you can
listen to everything a machine with a running
<code>module-simple-protocol-tcp</code> module plays.</p>
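The wire format is as simple as it gets: the module just streams raw PCM frames over a TCP socket, with no handshake or framing. A sketch of a client in Ruby (assuming the module options used in the script below: port 4713, s16le, 44100 Hz, 2 channels), whose output can be piped to <code>aplay -f S16_LE -r 44100 -c 2</code>:

```ruby
# Client for module-simple-protocol-tcp: the "protocol" is nothing but
# raw PCM bytes on a TCP socket, so a client only has to shovel them.
# Usage sketch: ruby pa-client.rb somehost | aplay -f S16_LE -r 44100 -c 2
require 'socket'

def stream_pcm(host, port: 4713, out: STDOUT)
  out.binmode
  TCPSocket.open(host, port) do |sock|
    IO.copy_stream sock, out   # until the server closes the connection
  end
end

stream_pcm(ARGV[0]) if ARGV[0] # no argument -> just define the method
```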
<p>A <a href="https://sigwait.org/~alex/blog/2023/11/06/android-pa-share">script</a> (requires dialog(1) and jq(1)) that draws
a menu with available sinks and starts listening on a socket:</p>
<pre><code>#!/usr/bin/make -f

# the standard pulseaudio tcp port
port := 4713
answer := $(shell mktemp -u)

status:; pactl list | grep tcp -B1
stop:; -pactl unload-module module-simple-protocol-tcp

$(answer):
    pactl -f json list sinks \
     | jq -r '.[] | @sh "\(.index) \(.description)"' \
     | xargs dialog --keep-tite --menu "Local sink" 0 0 0 2&gt;$(answer)

start: $(answer) stop
    pactl load-module module-simple-protocol-tcp rate=44100 format=s16le channels=2 source=`cat $&lt;` record=true port=$(port)
    rm $&lt;

.DELETE_ON_ERROR:
</code></pre>
<p>Run it as</p>
<pre><code>$ android-pa-share start
</code></pre>
<p><img src="https://sigwait.org/~alex/blog/2023/11/06/android-pa-share.png" alt=""></p>
<p>Beware that the <code>module-simple-protocol-tcp</code> module doesn't support
authentication; hence, protect the port (4713 in the script above) from
the WAN.</p>
]]></description>
  
</item>


</channel>
</rss>