Alexander Gromnitsky's Blog

which(1)


After reading about the storm in a teacup regarding the which(1) utility in Debian, I decided to engage in a code golf challenge with myself. I aimed to write a few minimalist implementations of which(1) in different programming languages. Initially, I believed a shell script would be the cleanest solution, but that prediction proved to be incorrect.

The spec:

  1. The util should stop immediately after the first non-existent executable, e.g.:

     $ ./my-which ls BOGUS cat
     /usr/bin/ls
     ./my-which: BOGUS not found in PATH
    
  2. It should report the error to stderr & exit with a non-zero status.

The programs below are sorted by terseness.

GNU Make

The GNU Make manual contains a neat example of a pathsearch function that abuses the built-in wildcard function inside a macro. We can use it with a 'match-anything' pattern rule:

#!/usr/bin/make -f
f = $(firstword $(wildcard $(addsuffix /$1,$(subst :, ,$(PATH)))))
%:;@echo $(or $(call f,$@),$(error $@ not found in PATH))

It works like this:

$ ./which.mk ls BOGUS cat
/usr/bin/ls
which.mk:3: *** BOGUS not found in PATH.  Stop.
$ echo $?
2

That's it. 2 lines + a shebang. If you're unfamiliar with the Make language, I advise you to try it.
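For readers who don't speak Make, here is a rough node rendering of the same pipeline, one step per Make function. It is a sketch: the name `pathsearch` is borrowed from the manual's example, and the optional `path` parameter is an addition for testing. It also makes one limitation visible: $(wildcard) only checks that a file exists, so the Make version will happily print a directory or a non-executable file.

```javascript
const fs = require('fs')

// A node rendering of the Make macro, one step per Make function:
//   $(subst :, ,$(PATH))  -> split PATH on ':' (word splitting drops empty entries)
//   $(addsuffix /$1,...)  -> append '/name' to every directory
//   $(wildcard ...)       -> keep only the paths that exist
//   $(firstword ...)      -> take the first survivor, or nothing
// Like $(wildcard), fs.existsSync() checks existence only, not the execute bit.
const pathsearch = (name, path = process.env.PATH || '') =>
  path.split(':')
    .filter(dir => dir !== '')
    .map(dir => dir + '/' + name)
    .filter(p => fs.existsSync(p))[0]
```

On a typical Linux box pathsearch('ls') should return something like '/usr/bin/ls', and undefined when nothing matches.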

Ruby

A slightly bigger example that still fits in several lines:

#!/usr/bin/env ruby
def f e; (ENV['PATH'] || '').split(?:).map{|d| d+'/'+e}.filter{|p| File.executable?(p)}[0]; end
ARGV.each {|e| puts(f(e) || abort("#{$0}: #{e} not found in PATH")) }

We cheated here a little: there's no check whether a file is a directory. Nothing stops you from adding File.directory?(p), but that increases the length of such a toy program by 18 bytes!

sh

I thought it would be shorter:

#!/bin/sh

IFS=:
f() {
    for e in $PATH; do
        [ -x "$e/$1" ] && { echo "$e/$1"; return; }
    done
    return 1
}

for d in "$@"; do
    f "$d" || { echo "$0: $d not found in PATH" 1>&2; exit 1; }
done

If you decide to use f() in your scripts, a cut-&-paste won't do: you'll need to save & restore the value of the IFS variable & make e a local variable.

node: callbacks

Async IO doesn't always make life easier.
-- A philosopher

#!/usr/bin/env node
let fs = require('fs')

let f = (ok, error) => {
    let dirs = (process.env.PATH || '').split(':')
    return function dive(e) {
        let dir = dirs.shift() || error(e); if (!dir) return
        let file = dir+'/'+e
        fs.access(file, fs.constants.X_OK, err => err ? dive(e) : ok(file))
    }
}

let args = process.argv.slice(2)
let main = exe => exe && f( e => (console.log(e), main(args.shift())), e => {
    console.error(`${process.argv[1]}: ${e} not found in PATH`)
    process.exitCode = 1
})(exe)

main(args.shift())

Again, no checks whether a file is a directory.
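One way to add the missing check in the callback style above is to stat() the file first and only then ask fs.access() about the execute permission. A sketch, with a hypothetical helper name `isExe`; note that fs.stat() follows symlinks, as stat(2) does in the C version.

```javascript
const fs = require('fs')

// Calls cb(true) only if `file` is a regular file that the current
// process may execute; directories (which also carry the x bit) and
// missing files yield cb(false).
const isExe = (file, cb) =>
  fs.stat(file, (err, st) => {
    if (err || !st.isFile()) return cb(false)
    fs.access(file, fs.constants.X_OK, err => cb(!err))
  })
```

Inside dive(), the fs.access(...) call would then become `isExe(file, yes => yes ? ok(file) : dive(e))`.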

We could've avoided callbacks, of course: node has fs.accessSync(), but it reports a missing or non-executable file by throwing an exception. Also, just to make this slightly more challenging, I decided to avoid process.exit().
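For comparison, a minimal synchronous sketch that turns the exception into a boolean with try/catch and still avoids process.exit(). The helper names `canExec` and `which` are hypothetical, and unlike the callback version it skips empty PATH entries (which POSIX would treat as the current directory) instead of stopping at them.

```javascript
const fs = require('fs')

// fs.accessSync() returns undefined on success and throws on failure,
// so wrap it into a boolean predicate.
const canExec = file => {
  try {
    fs.accessSync(file, fs.constants.X_OK)
    return true
  } catch {
    return false
  }
}

// First PATH entry that contains an executable `e`, or undefined.
const which = (e, path = process.env.PATH || '') =>
  path.split(':').filter(d => d !== '').map(d => d + '/' + e).find(canExec)

// Same contract as the other variants: stop at the first miss, exit code 1.
for (const e of process.argv.slice(2)) {
  const file = which(e)
  if (!file) {
    console.error(`${process.argv[1]}: ${e} not found in PATH`)
    process.exitCode = 1
    break
  }
  console.log(file)
}
```

Saved as, say, which-sync.js, it would be run as `node which-sync.js ls BOGUS cat`. Like fs.access(), it still accepts directories.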

node: FP runs amok

sassa_nf didn't like the example above, mainly because of Array.prototype.shift(), & provided an enhanced version:

#!/usr/bin/env node
const fs = require('fs')
const dirs = (process.env.PATH || '').split(':')

const f = (e, cont) => dirs.map(d => d + '/' + e)
      .reduce((p, d) => g => p(f => f ? g(f):
                               fs.access(d, fs.constants.X_OK, err => g(!err && d))),
              f => f())(f => f ? (console.log(f), cont()):
                        (console.error(`${process.argv[1]}: ${e} not found in PATH`), process.exitCode = 1))

process.argv.slice(2).reduce((p, c) => g => p(_ => f(c, g)), f => f())(_ => _)

To understand how it works, you'll need to reformat the arrow function expressions. Nevertheless, I think it serves an artistic purpose as is.
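The core trick is easiest to see in the last line of the script, isolated here into a named helper. In this sketch the names `sequence` and `task` are hypothetical: reduce() folds a list into a single function that takes the final continuation, so the tasks run strictly one after another. It is also how the script stops at the first missing executable: its error branch simply never calls cont().

```javascript
// Each task has the shape (item, next) => void and calls next() when done.
// reduce() builds: run_0 = next => next()
//                  run_i = next => run_{i-1}(() => task(item_i, next))
// so calling the result with a final continuation runs item_1 .. item_n in
// order, and a task that never calls next() stops the chain right there.
const sequence = (items, task) =>
  items.reduce((run, item) => next => run(() => task(item, next)), next => next())
```

With it, the script's last line reads as `sequence(process.argv.slice(2), f)(() => {})`; the per-executable f() applies the same fold to the PATH entries, falling through to fs.access() only while nothing has been found yet.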

node: async/await

Certainly, callbacks were an unfortunate chain of events. Thankfully, we've had promises for a long time now.

#!/usr/bin/env node

let {access} = require('fs/promises')

let afilter = async (arr, predicate) => {
    return (await Promise.allSettled(arr.map(predicate)))
        .filter( v => v.status === 'fulfilled').map( v => v.value)
}

let f = e => afilter((process.env.PATH || '').split(':'), async p => {
    await access(p+'/'+e, 1) // 1 is fs.constants.X_OK
    return p+'/'+e
})

async function main() {
    let args = process.argv.slice(2).map( async p => {
        return {exe: p, location: await f(p)}
    })

    for await (let r of args) {
        if (!r.location.length) {
            console.error(`${process.argv[1]}: ${r.exe} not found in PATH`)
            process.exitCode = 1
            break
        }
        console.log(r.location[0])
    }
}

main()

This was tested with node v17.0.1.

I leave it up to you to judge which of the node variants is the most idiotic.

C

It was impossible to leave it out. It's the longest one, but I consider all the node examples much worse.

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <err.h>
#include <string.h>
#include <limits.h>
#include <stdbool.h>

/* A regular file (not a directory) that the current user may execute. */
bool is_exe(const char *name) {
  struct stat s;
  if (stat(name, &s)) return false;
  return S_ISREG(s.st_mode) && access(name, X_OK) == 0;
}

/* Callback: build dir/e in result; keep it only if it's executable. */
bool exe(const char *dir, const char *e, void *result) {
  char *r = (char*)result;
  snprintf(r, PATH_MAX, "%s/%s", dir, e);
  if (!is_exe(r)) {
    r[0] = '\0';
    return false;
  }
  return true;
}

/* Walk $PATH, calling callback for every directory until it returns true. */
void f(const char *e, bool (*callback)(const char *, const char *, void *), void *result) {
  char *path = strdup(getenv("PATH") ? getenv("PATH") : "");
  char *PATH = path;
  char *dir, *saveptr;
  while ( (dir = strtok_r(PATH, ":", &saveptr))) {
    PATH = NULL;               /* strtok_r wants NULL on subsequent calls */
    if (callback(dir, e, result)) break;
  }
  free(path);
}

int main(int argc, char **argv) {
  for (int idx = 1; idx < argc; idx++) {
    char e[PATH_MAX] = "";     /* stays empty if PATH has no entries */
    f(argv[idx], exe, e);
    /* errx() prints to stderr & exits with status 1 */
    strlen(e) ? (void)printf("%s\n", e) : errx(1, "%s not found in PATH", argv[idx]);
  }
}

Coincidentally, this version is the most correct one: it won't confuse a directory with an executable.


Tags: ойті
Authors: ag