Alexander Gromnitsky's Blog

Visualising curl downloads

Air Date:
Latest update:

If downloading from a server is slow, how would you prove to a devops guy that you're experiencing a slowdown? There are a couple of possibilities, the worst of which would be sending a video. What if you send a speed graph using data from curl?

Every second curl prints to stderr the following:

fprintf(tool_stderr,
        "\r"
        "%-3s " /* percent downloaded */
        "%-3s " /* percent uploaded */
        "%s " /* Dled */
        "%s " /* Uled */
        "%5" CURL_FORMAT_CURL_OFF_T " " /* Xfers */
        "%5" CURL_FORMAT_CURL_OFF_T " " /* Live */
        " %s "  /* Total time */
        "%s "  /* Current time */
        "%s "  /* Time left */
        "%s "  /* Speed */
        "%5s" /* final newline */,
        …

Therefore, by replacing \r with \n we can send a download log:

$ curl http://example.com/1.zip -o 1.zip 2>&1 | tr \\r \\n

To draw a graph with gnuplot, we can use Current time and Speed columns. Gnuplot understands time as input data, but I don't know how to persuade it to interpret values like 100k or 200M, thus we need to convert them into 'bytes'. This is a cute little problem for code golf, but amusingly, it was already solved in coreutils > 12 years ago via numfmt(1).

$ echo 1M and 10M | numfmt --from iec --field 3
1M and 10485760

(macOS & FreeBSD both have a coreutils package, where the utility's executable is prefixed with 'g'.)

#!/usr/bin/env -S stdbuf -o0 bash

set -e -o pipefail
numfmt=`type -p gnumfmt numfmt;:`; test "${numfmt:?}"

cat <<E
set xdata time
set timefmt "%H:%M:%S"
set xlabel "Time, MM:SS or HH:MM:SS"
set format y "%.0s%cB"
set ylabel "Speed, Unit/second" offset -1,0
set grid
plot "-" using 1:2 with lines title ""
E

curl "$@" -fL -o /dev/null 2>&1 | tr \\r \\n | awk '
/[0-9.][kMGTP]?$/ {
  time = index($10, ":") == 0 ? $11 : $10
  if (time != "--:--:--") print time, $NF
}' | tr k K | $numfmt --from iec --field 2
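To check the awk filter without a live download, one can feed it a made-up progress line (the values below are invented, but follow the column layout of curl's progress meter):

```shell
# a made-up sample of curl's progress meter output (invented values)
line=' 23 1024M   23  240M    0     0  10.2M      0  0:01:40  0:00:23  0:01:17 10.5M'
echo "$line" | awk '
/[0-9.][kMGTP]?$/ {
  time = index($10, ":") == 0 ? $11 : $10
  if (time != "--:--:--") print time, $NF
}' | tr k K | numfmt --from iec --field 2
```

which yields the elapsed-time column paired with the speed converted to bytes, i.e. a single data point for gnuplot.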

Usage:

$ ./curlbench http://example.com/1.zip | gnuplot -p

Tags: ойті
Authors: ag

The cheapest NAS

Air Date:
Latest update:

I wanted to replace my old trusty 'router' (with an attached HDD)--which, after flashing OpenWRT onto it, had been working not as a router but as a network drive--with an SBC+HDD combo.

This new device should not only preserve all the services the old one provided (samba, git, rsyncd, a dnf repo), but also perform faster: having a potato instead of a CPU, the ex-router struggled with rsync over ssh &, being gravely RAM-limited, it choked when I did 'git push' with commits containing binaries > 15MB.

Searching for a suitable SBC led me to libre.computer, a company I had never heard of before. At first glance, they had the el cheapo AML-S805X-AC board I needed:

  • LAN port (but 100 Mb only);
  • 2 USB-A (but 2.0 only);
  • 4-core ARM Cortex-A53;
  • 1 GB RAM;
  • booting from a USB drive;
  • up-to-date Debian;
  • easy to buy without hunting it down.

100Mb may seem like a joke nowadays, but the main purpose of such a toy NAS for me is to keep a copy of a directory with ~200K small files. Having 1Gb would only marginally improve the syncing speed even if the SBC supported USB 3.0.

But this is just a board. I also needed an HDD enclosure with an external power supply (for the board provides only up to 900mA on each USB-A port), a power supply of at least 3A, & a micro-USB cable that can handle 3A.

Item                Price, €  Comment
SBC                       20
HDD enclosure             12
3A Power Supply            5
Micro-USB cable            3
4 bolts, 12 nuts           0  I think the ones I found are older than me
TTL to USB dongle          3  Optional, the board has an HDMI output
Total                     43

(I didn't include an HDD in the table, for I presume everyone has a couple of them lying around.)

When I bought the HDD enclosure, I didn't read the description carefully & thought it was going to be a small plastic container for 2.5-inch drives, but when the package arrived, it turned out to be a box for 3.5-inch ones. Hence, I decided to shove the SBC into it too.

After connecting the TTL-to-USB dongle to the board's GPIO & typing

$ sudo screen /dev/ttyUSB0 115200

one of the 1st readouts appeared as:

U-Boot 2023.07+ (Nov 03 2023 - 15:10:36 -0400) Libre Computer AML-S805X-AC

Model: Libre Computer AML-S805X-AC
SoC:   Amlogic Meson GXL (S805X) Revision 21:d (34:2)
DRAM:  512 MiB (effective 1 GiB)

What does the last line mean exactly? After I dd'ed Debian-12 onto a flash drive, free(1) said it saw 1GB. Anyway, libre.computer has an official OS image, based on stock Debian:

$ fdisk debian-12-base-arm64+arm64.img -l
Disk debian-12-base-arm64+arm64.img: 2.25 GiB, 2415919104 bytes, 4718592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x71f3f7cf

Device                          Boot  Start     End Sectors  Size Id Type
debian-12-base-arm64+arm64.img1 *      2048  524287  522240  255M ef EFI (FAT-12/16/32)
debian-12-base-arm64+arm64.img2      524288 4718591 4194304    2G 83 Linux

Yes, it has an EFI partition with the MBR layout! The 2nd partition is btrfs (supposedly it's faster & more gentle to flash storage than ext4; no idea if both claims are true). You can examine its contents via:

$ sudo mount -o loop,offset=$((524288*512)) debian-12-base-arm64+arm64.img ~/mnt/misc
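(The offset is just the partition's Start sector from the fdisk listing multiplied by the 512-byte sector size; e.g. for the EFI partition:)

```shell
# the EFI partition starts at sector 2048, sectors are 512 bytes
echo $((2048 * 512))   # → 1048576
```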

This partition gets auto-resized on the 1st boot to fill the rest of the free space available on the drive. Doing this on USB dongles proved to be a miserable experience: of the 3 I had available, one permanently got stuck on resizing, and another, despite finishing the operation, became so sluggish afterwards that a 20-year-old PC would've felt snappier.

This is what I didn't like at all: there is no repo from which the OS image gets generated. The explanation is bizarre:

"The distribution builder is a proprietary commercial offering as it involves a lot of customer IP and integrations so it cannot be public."

but with a consolatory piece of advice:

"If you want to study them [images], bootstrap and do a diff. We don't make any changes to the standard distros outside of setting a few configs since we're not distro maintainers."

Make of it what you will.

Then I connected the HDD enclosure to the board. This time, the process went much, much faster (though there were still some unexpected delays in random places). Right after logging in, I started getting uas_eh_abort_handler errors from the kernel. It turns out I got one of the worst HDD enclosure innards possible, if you believe reviews from the interwebs:

$ lsusb
Bus 001 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

The remedy is to turn UAS off by adding usb-storage.quirks=152d:0578:u to the kernel cmdline. It did help: the delays went away, although the 'benchmarks' remained hardly thrilling:

$ lsusb -t
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/2p, 480M
    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 480M
$ sync; time sh -c "dd if=/dev/urandom of=1 bs=500k count=1k && sync"; rm 1
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 15.1014 s, 34.7 MB/s

real    0m21.876s
user    0m0.001s
sys     0m7.976s

which means 524/21.876 ≈ 23.95 MB/s on an ext4 partition.
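dd's own 34.7 MB/s figure excludes the final sync; dividing the 524 MB written by the total wall-clock time gives the honest number:

```shell
# effective throughput: 524 MB over 21.876 s of wall-clock time (incl. sync)
awk 'BEGIN { printf "%.2f MB/s\n", 524/21.876 }'   # → 23.95 MB/s
```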

Would I recommend this setup? I wouldn't. One of the reasons I chose the SBC path instead of the common mini-ITX route was to save on power consumption. If you don't have similar constraints, I see 0 reasons to struggle with such a finicky Chinese device.


Tags: ойті
Authors: ag

Of flags and keyletters

Air Date:
Latest update:

Date: Fri, 1 Mar 2024 10:49:42 -0500
From: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Newsgroups: gmane.org.unix-heritage.general
Subject: Of flags and keyletters
Message-ID: <CAKH6PiV3ixuwoZ-d31JNXpQpHxAAcfpRKreUcn11msW1yjboLg@mail.gmail.com>

> why did AT&T refer to "flags" as "keyletters" in its SysV documentation?

Bureaucracies beget bureaucratese--polysyllabic obfuscation, witness
APPLICATION USAGE in place of BUGS.

One might argue that replacing "flag" by "option", thus doubling the number
of syllables, was a small step in that direction. In fact it was a
deliberate attempt to discard jargon in favor of normal English usage.

Tags: quote, ойті
Authors: ag

An HTTP client in Bash

Air Date:
Latest update:

I recently saw a tweet where a guy was asking how to download curl within a minimal Debian container that had no scripting language installed except for Bash--no wget or anything like that.

If such a container has apt-get, but you lack permission to run it, there is a reliable way to force apt-get to download a .deb file with all its dependencies under a regular user, but we won't discuss that here.

I got curious about how hard it would be to write a primitive HTTP get-only client in Bash, as Bash is typically compiled with "network" redirection support:

$ exec 3<> /dev/tcp/www.gnu.org/80
$ printf "%s\r\n" 'HEAD /robots.txt HTTP/1.1' >&3
$ printf "%s\r\n\r\n" 'Host: www.gnu.org' >&3
$ cat <&3
HTTP/1.1 200 OK
Date: Sun, 11 Feb 2024 07:02:40 GMT
Server: Apache/2.4.29
Content-Type: text/plain
Content-Language: non-html
…

This could've been useful before the days of TLS everywhere, but it won't suffice now: to download a statically compiled curl binary from Github, we need TLS support and proper handling of 302 redirections. Certainly, it's possible to cheat: put the binary on our web server and serve it under plain HTTP, but that would be too easy.

What if we use ncat+openssl as a forward TLS proxy? ncat may serve as an initd-like super-server, invoking "openssl s_client" on each connection:

$ cat proxy.sh
#!/bin/sh
read -r host
openssl s_client -quiet -no_ign_eof -verify_return_error "$host"
$ ncat -vk -l 10.10.10.10 1234 -e proxy.sh

The 1st thing we need in the bash-http-get client is URL parsing. It wouldn't have been necessary if Github served files directly from "Releases" pages, but it does so through redirects. Therefore, when we grab Location header from a response, we need to disentangle its hostname from a pathname.

Ideally, it should work like URL() constructor in JavaScript:

$ node -pe 'new URL("https://q.example.com:8080/foo?q=1&w=2#lol")'
URL {
  href: 'https://q.example.com:8080/foo?q=1&w=2#lol',
  origin: 'https://q.example.com:8080',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'q.example.com:8080',
  hostname: 'q.example.com',
  port: '8080',
  pathname: '/foo',
  search: '?q=1&w=2',
  searchParams: URLSearchParams { 'q' => '1', 'w' => '2' },
  hash: '#lol'
}

StackOverflow has various examples of how to achieve that using regular expressions, but none of them were able to parse the example above. I tried asking ChatGPT to repair the regex, but it only made it worse. Miraculously, Google's Gemini supposedly fixed the regex on the second try (I haven't tested it extensively).

$ cat lib.bash
declare -A URL

url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}
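Before wiring it into the client, the regex can be sanity-checked against the example URL from above (the group numbers simply count the pattern's opening parentheses):

```shell
# the same pattern as in lib.bash; parse the example URL from above
pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
[[ 'https://q.example.com:8080/foo?q=1&w=2#lol' =~ $pattern ]] &&
    echo "${BASH_REMATCH[2]} ${BASH_REMATCH[7]} ${BASH_REMATCH[9]}" \
         "${BASH_REMATCH[10]} ${BASH_REMATCH[12]} ${BASH_REMATCH[14]}"
# → https q.example.com 8080 /foo ?q=1&w=2 #lol
```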

Next, we need to separate headers from a response body. This means looking for the 1st occurrence of \r\n\r\n. Sounds easy,

grep -aobx $'\r' file | head -1

until you decide to port the client to a BusyBox-based system like Alpine Linux. The latter ships a grep that doesn't support the -ab options. There is some advice online about employing od(1), but no examples. If we print a file using a 2-column format:

0000000 68
0000001 20
0000002 3a
…

where the left column is a decimal offset, we can convert the 1st 32KB of the response into a single line and search for the pattern using grep -o:

od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}'
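The pipeline is easy to test on a fabricated response (resp.tmp is a throwaway file; the offset it reports is that of the final \n in \r\n\r\n, matching what the GNU grep variant computes):

```shell
# a fabricated 29-byte response; \r\n\r\n occupies bytes 21-24
printf 'HTTP/1.1 200 OK\r\nA: b\r\n\r\nbody' > resp.tmp
od -N $((32*1024)) -t x1 -Ad -w1 -v resp.tmp | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}'   # prints 24
rm resp.tmp
```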

Here's the full version of the client that supports only URLs with the https protocol. It saves the response in a temporary file and looks for the \r\n\r\n offset. If the HTTP status code was 200, it prints the body to stdout. If it was 302, it extracts the value of the Location header and recursively calls itself with a new URL.

#!/usr/bin/env bash

set -e -o pipefail
. "$(dirname "$(readlink -f "$0")")/lib.bash"

tmp=`mktemp fetch.XXXXXX`
trap 'rm -f $tmp' 0 1 2 15
eh() { echo "$*" 1>&2; exit 2; }

[ $# = 3 ] || eh Usage: fetch.bash proxy_host proxy_port url
proxy_host=$1
proxy_port=$2
url=$3

get() {
    url_parse "$1"; [ "${URL[proto]}" = https ] || return 1

    exec 3<> "/dev/tcp/$proxy_host/$proxy_port" || return 1
    echo "${URL[hostname]}:${URL[port]:-443}" >&3
    printf "GET %s HTTP/1.1\r\n" "${URL[pathname]}${URL[search]}${URL[hash]}" >&3
    printf '%s: %s\r\n' Host "${URL[hostname]}" Connection close >&3
    printf '\r\n' >&3
    cat <&3
}

get "$url" > "$tmp" || eh ':('
[ -s "$tmp" ] || eh 'Empty reply, TLS error?'

offset_calc() {
    if echo 1 | grep -aobx 1 >/dev/null 2>&1; then # gnu-like grep
        grep -aobx $'\r' "$tmp" | head -1 | tr -d '\r\n:' | \
            xargs -r expr 1 +
    else                                      # busybox?
        od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
            grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
            awk '{if (NR==1) print $7+0}'
    fi || echo -1
}
offset=`offset_calc`
headers() { head -c "$offset" "$tmp" | tr -d '\r'; }
hdr() { headers | grep -m1 -i "^$1:" | cut -d' ' -f2; }

status=`head -1 "$tmp" | cut -d' ' -f2`
case "$status" in
    200) [ "$offset" = -1 ] && offset=-2 # invalid responce, dump all
         tail -c+$((offset + 2)) "$tmp"
         [ "$offset" -gt 0 ] ;;
    302) headers 1>&2; echo 1>&2
         hdr location | xargs "$0" "$1" "$2" ;;
    *)   headers 1>&2; exit 1
esac
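(One trick in get() above is worth spelling out: printf reuses its format string for every pair of arguments, which is how a single call emits both the Host & Connection headers; www.gnu.org below stands in for the real hostname:)

```shell
printf '%s: %s\r\n' Host www.gnu.org Connection close
# Host: www.gnu.org
# Connection: close
```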

It should work even on Alpine Linux or FreeBSD:

$ ./fetch.bash 10.10.10.10 1234 https://github.com/stunnel/static-curl/releases/download/8.6.0/curl-linux-arm64-8.6.0.tar.xz > curl.tar.xz
HTTP/1.1 302 Found
Location: https://objects.githubusercontent.com/…
…
$ file curl.tar.xz
curl.tar.xz: XZ compressed data, checksum CRC64

Tags: ойті
Authors: ag