Alexander Gromnitsky's Blog

An HTTP client in Bash

I recently saw a tweet in which someone asked how to download curl inside a minimal Debian container that had no scripting language installed except Bash, and no wget or anything similar.

If such a container has apt-get, but you lack permission to run it as root, there is a reliable way to force apt-get to download a .deb file with all its dependencies as a regular user, but we won't discuss that here.

I got curious about how hard it would be to write a primitive GET-only HTTP client in Bash, as Bash is typically compiled with "network" redirection support:

$ exec 3<> /dev/tcp/www.gnu.org/80
$ printf "%s\r\n" 'HEAD /robots.txt HTTP/1.1' >&3
$ printf "%s\r\n\r\n" 'Host: www.gnu.org' >&3
$ cat <&3
HTTP/1.1 200 OK
Date: Sun, 11 Feb 2024 07:02:40 GMT
Server: Apache/2.4.29
Content-Type: text/plain
Content-Language: non-html
…

This could've been useful before the days of TLS everywhere, but it won't suffice now: to download a statically compiled curl binary from GitHub, we need TLS support and proper handling of 302 redirects. Certainly, it's possible to cheat: put the binary on our own web server and serve it over plain HTTP, but that would be too easy.

What if we use ncat+openssl as a forward TLS proxy? ncat may serve as an inetd-like super-server, invoking "openssl s_client" on each connection:

$ cat proxy.sh
#!/bin/sh
read -r host
openssl s_client -quiet -no_ign_eof -verify_return_error "$host"
$ ncat -vk -l 10.10.10.10 1234 -e proxy.sh
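
The framing between a client and proxy.sh is a single line of text: the client writes host:port first and the raw HTTP request after it. Below is a minimal offline sketch of that framing. proxy_stub is a hypothetical stand-in for proxy.sh in which tr takes the place of "openssl s_client", an assumption made only so that the example runs without a network:

```shell
#!/usr/bin/env bash
# proxy_stub is a hypothetical stand-in for proxy.sh: it reads the target
# line the same way, but instead of starting openssl s_client it reports
# the target and passes the rest of its input through.
proxy_stub() {
    read -r host
    echo "target=$host"
    tr -d '\r'          # stand-in for: openssl s_client ... "$host"
}

# What a client would write to the proxy socket: target first, request after.
{
    echo 'www.gnu.org:443'
    printf '%s\r\n' 'HEAD /robots.txt HTTP/1.1' 'Host: www.gnu.org' ''
} | proxy_stub
```

The point of the sketch is that read -r takes only the first line, so whatever follows it reaches the child process as its stdin.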

The first thing we need in the bash-http-get client is URL parsing. It wouldn't have been necessary if GitHub served files directly from "Releases" pages, but it does so through redirects. Therefore, when we grab the Location header from a response, we need to disentangle its hostname from its pathname.

Ideally, it should work like the URL() constructor in JavaScript:

$ node -pe 'new URL("https://q.example.com:8080/foo?q=1&w=2#lol")'
URL {
  href: 'https://q.example.com:8080/foo?q=1&w=2#lol',
  origin: 'https://q.example.com:8080',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'q.example.com:8080',
  hostname: 'q.example.com',
  port: '8080',
  pathname: '/foo',
  search: '?q=1&w=2',
  searchParams: URLSearchParams { 'q' => '1', 'w' => '2' },
  hash: '#lol'
}

StackOverflow has various examples of how to achieve that using regular expressions, but none of them were able to parse the example above. I tried asking ChatGPT to repair the regex, but it only made it worse. Miraculously, Google's Gemini supposedly fixed the regex on the second try (I haven't tested it extensively).

$ cat lib.bash
declare -A URL

url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}
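
A quick sanity check is to feed the parser the same URL that was given to node above. The sketch below repeats url_parse from lib.bash so that it runs on its own (it needs Bash 4 or newer for associative arrays). The loop, the key order and the two extra test URLs are mine and only for illustration:

```shell
#!/usr/bin/env bash
declare -A URL

# Same function as in lib.bash, repeated to keep the example standalone.
url_parse() {
    local pattern='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))?(\?([^#]*))?(#(.*))?'
    [[ "$1" =~ $pattern ]] && [ "${BASH_REMATCH[2]}" ] && [ "${BASH_REMATCH[4]}" ] || return 1
    URL=(
        [proto]=${BASH_REMATCH[2]}
        [host]=${BASH_REMATCH[4]}
        [hostname]=${BASH_REMATCH[7]}
        [port]=${BASH_REMATCH[9]}
        [pathname]=${BASH_REMATCH[10]:-/}
        [search]=${BASH_REMATCH[12]}
        [hash]=${BASH_REMATCH[14]}
    )
}

url_parse 'https://q.example.com:8080/foo?q=1&w=2#lol' || exit 1
for key in proto host hostname port pathname search hash; do
    printf '%s=%s\n' "$key" "${URL[$key]}"
done

# A URL without a path falls back to "/"; a reference without a scheme
# and a host is rejected with a non-zero exit status.
url_parse 'https://example.com' && echo "pathname=${URL[pathname]}"
url_parse '/just/a/path' || echo 'rejected'
```

Unlike the node output, proto has no trailing colon here, while search and hash keep their leading ? and #.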

Next, we need to separate the headers from the response body. This means looking for the first occurrence of \r\n\r\n. Sounds easy,

grep -aobx $'\r' file | head -1

until you decide to port the client to a BusyBox-based system like Alpine Linux. Its grep doesn't support the -a and -b options. There is some advice around on employing od(1), but no examples. If we print a file using a 2-column format:

0000000 68
0000001 20
0000002 3a
…

where the left column is a decimal offset and the right one is the byte in hex, we can convert the first 32 KB of the response into a single line and search for the pattern using grep -o:

od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
    awk '{if (NR==1) print $7+0}'
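
To see what the pipeline returns, here is a small offline sketch that runs it on a canned response. The response text and the temporary file are made up for the example, and it assumes an od that understands -w1 (GNU coreutils or BusyBox):

```shell
#!/usr/bin/env bash
# Build a tiny fake response in a temporary file.
tmp=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n' > "$tmp"

# Same pipeline as above: the 7th field of the first match is the
# decimal offset of the last \n in \r\n\r\n.
offset=$(od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' |
    grep -o '....... 0d ....... 0a ....... 0d ....... 0a' |
    awk '{if (NR==1) print $7+0}')
echo "offset=$offset"

# Headers are the bytes before that offset. The body starts one byte after
# it; tail -c+N counts from 1, hence the +2.
head -c "$offset" "$tmp" | tr -d '\r'
tail -c+$((offset + 2)) "$tmp"
rm -f "$tmp"
```

For this input the offset is 44: the two header lines occupy bytes 0 to 42, the second \r sits at 43 and the final \n at 44.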

Here's the full version of the client that supports only URLs with the https protocol. It saves the response in a temporary file and looks for the \r\n\r\n offset. If the HTTP status code was 200, it prints the body to stdout. If it was 302, it extracts the value of the Location header and recursively calls itself with a new URL.

#!/usr/bin/env bash

set -e -o pipefail
. "$(dirname "$(readlink -f "$0")")/lib.bash"

tmp=`mktemp fetch.XXXXXX`
trap 'rm -f $tmp' 0 1 2 15
eh() { echo "$*" 1>&2; exit 2; }

[ $# = 3 ] || eh Usage: fetch.bash proxy_host proxy_port url
proxy_host=$1
proxy_port=$2
url=$3

get() {
    url_parse "$1"; [ "${URL[proto]}" = https ] || return 1

    exec 3<> "/dev/tcp/$proxy_host/$proxy_port" || return 1
    echo "${URL[hostname]}:${URL[port]:-443}" >&3
    # the fragment (URL[hash]) is client-side only and is not part of the request
    printf "GET %s HTTP/1.1\r\n" "${URL[pathname]}${URL[search]}" >&3
    printf '%s: %s\r\n' Host "${URL[hostname]}" Connection close >&3
    printf '\r\n' >&3
    cat <&3
}

get "$url" > "$tmp" || eh ':('
[ -s "$tmp" ] || eh 'Empty reply, TLS error?'

offset_calc() {
    if echo 1 | grep -aobx 1 >/dev/null 2>&1; then # gnu-like grep
        grep -aobx $'\r' "$tmp" | head -1 | tr -d '\r\n:' | \
            xargs -r expr 1 +
    else                                      # busybox?
        od -N $((32*1024)) -t x1 -Ad -w1 -v "$tmp" | tr '\n' ' ' | \
            grep -o '....... 0d ....... 0a ....... 0d ....... 0a' | \
            awk '{if (NR==1) print $7+0}'
    fi || echo -1
}
offset=`offset_calc`
headers() { head -c "$offset" "$tmp" | tr -d '\r'; }
hdr() { headers | grep -m1 -i "^$1:" | cut -d' ' -f2; }

status=`head -1 "$tmp" | cut -d' ' -f2`
case "$status" in
    200) [ "$offset" = -1 ] && offset=-2 # invalid response, dump all
         tail -c+$((offset + 2)) "$tmp"
         [ "$offset" -gt 0 ] ;;
    302) headers 1>&2; echo 1>&2
         hdr location | xargs "$0" "$1" "$2" ;;
    *)   headers 1>&2; exit 1
esac
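
One detail in get() that may look odd: a single printf call writes both the Host and the Connection headers. It works because printf reuses its format string until all arguments are consumed. A minimal sketch, with example.com as a placeholder host:

```shell
#!/usr/bin/env bash
# One format, four arguments: '%s: %s\r\n' is applied twice,
# once per name/value pair.
headers=$(printf '%s: %s\r\n' Host example.com Connection close)

# Strip the carriage returns only for display.
printf '%s\n' "$headers" | tr -d '\r'
```

This prints two lines, "Host: example.com" and "Connection: close". Adding another header to the request is a matter of appending two more arguments.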

It should work even on Alpine Linux or FreeBSD:

$ ./fetch.bash 10.10.10.10 1234 https://github.com/stunnel/static-curl/releases/download/8.6.0/curl-linux-arm64-8.6.0.tar.xz > curl.tar.xz
HTTP/1.1 302 Found
Location: https://objects.githubusercontent.com/…
…
$ file curl.tar.xz
curl.tar.xz: XZ compressed data, checksum CRC64
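
The 302 hop visible in the output above can be reproduced offline. The sketch below pulls the status code and the Location value out of a canned response in the same way the script does. redirect.example and the token are placeholders, not the real redirect target, and for brevity the lookup scans the whole file instead of cutting the header block at the offset first:

```shell
#!/usr/bin/env bash
# A canned 302 response with made-up header values.
tmp=$(mktemp)
printf '%s\r\n' 'HTTP/1.1 302 Found' \
    'Content-Length: 0' \
    'Location: https://redirect.example/file.tar.xz?token=abc' '' > "$tmp"

# Status code: the second space-separated field of the first line.
status=$(head -1 "$tmp" | cut -d' ' -f2)

# Header lookup: case-insensitive, first match only, CRs stripped beforehand.
location=$(tr -d '\r' < "$tmp" | grep -m1 -i '^location:' | cut -d' ' -f2)

echo "status=$status"
echo "location=$location"
rm -f "$tmp"
```

In fetch.bash the extracted value is then handed to xargs, which re-runs the script with the same proxy host and port and the new URL.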

Tags: IT
Authors: ag