BOM & exec
Recently, I stumbled upon a post about an accidental BOM in a shell
script file. tl;dr for those who don’t read Ukrainian:
- A guy had a typical shell script that some Windows editor corrupted
by prefixing the first line of the file (the shebang line) with the
BOM.
- The shell was trying to execute the script.
- Everybody got upset.
I got curious why bash tries to run scripts w/ BOM in the first
place. I’ve looked into the latest bash-4.3 & tcsh-6.19.00 on Fedora
24. Everywhere in the text below we draw the BOM w/ the replacement
character (codepoint U+FFFD): �.
Some findings:
- I was wrong about the bloody shebang lines for I thought that no
shell ever reads them.
- bash & tcsh don’t use libc properly & both invent their own
rigmarole instead of using the provided routine.
- bash is a mess! (Which is hardly a discovery.)
With shebang
If a file contains a valid shebang line, everything is easy: when
you pass the file name to any of the execv, execve, execlp, etc.
functions, the kernel steps in, reads the shebang line & executes the
interpreter named in the shebang, with the file in question as its
argument.
This picture falls to pieces when the file contains the petty BOM,
for the kernel fails to recognize that �#!/omg/lol should be (in our
naïve mind) equivalent to #!/omg/lol.
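To see what the kernel sees, one can dump the first two bytes of the file. The following is a minimal sketch: the helper name first_two_bytes & the temporary file names are made up for illustration, & it assumes a head(1) that accepts -c (the GNU & BSD ones do).

```shell
#!/bin/sh
# Print the first two bytes of a file as hex, e.g. "2321" for "#!".
# The helper name is hypothetical; it is not part of any shell.
first_two_bytes() {
    head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

tmp=$(mktemp -d)

# A clean script & the same script prefixed with the UTF-8 BOM (EF BB BF).
printf '#!/bin/sh\necho ok\n' > "$tmp/clean.sh"
printf '\357\273\277#!/bin/sh\necho ok\n' > "$tmp/bom.sh"

clean=$(first_two_bytes "$tmp/clean.sh")   # the "#!" pair the kernel looks for
bom=$(first_two_bytes "$tmp/bom.sh")       # the start of the BOM instead

echo "clean: $clean"
echo "bom:   $bom"
rm -rf "$tmp"
```

The clean file starts with 23 21 ("#!"), while the poisoned one starts with ef bb, so the shebang is no longer at the very beginning of the file, where it has to be.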
Both tcsh & bash have a backup plan for systems w/o shebang support
in the kernel. Besides the obvious win32 candidate, tcsh lists 2 other
systems: os390 & bs2000 (I wonder who on earth still has them). bash
uses autoconf & therefore doesn’t have a hard-coded set of build
configurations. Unfortunately, I believe the autoconf test for the
shebang line support is bogus:
$ cat ac_sys_interpreter
#! /bin/cat
exit 69
Presumably, the thinking was: if you run it on any modern system, the
kernel will run /bin/cat ac_sys_interpreter, which will just print the
file, but on prehistoric time-sharing machines a simple-minded /bin/sh
will execute it as a shell script & then you can test if the exit code
== 69. (For why it would do so, read the next section.) The trouble is
that the old system may very well have a /bin/sh that does its own
shebang processing in case the kernel doesn’t. Such a shell runs
/bin/cat as well, so the test wrongly concludes that the kernel is
doing the job, & bash gets compiled w/o its own shebang support.
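The logic of that test can be replayed by hand. What follows is a sketch of the idea, not the actual autoconf macro; the variable name shebang_honoured is invented for illustration.

```shell
#!/bin/sh
# Replay the idea of the test: run the file & look at the exit status.
tmp=$(mktemp -d)
printf '#! /bin/cat\nexit 69\n' > "$tmp/ac_sys_interpreter"
chmod +x "$tmp/ac_sys_interpreter"

# If the first line is honoured, /bin/cat prints the file & exits with 0.
# If the file is interpreted as a shell script instead, `exit 69` runs.
status=0
"$tmp/ac_sys_interpreter" > /dev/null 2>&1 || status=$?

# Note: the exit status cannot tell *who* honoured the line,
# the kernel or a shell that emulates shebang processing.
if [ "$status" -ne 69 ]; then
    shebang_honoured=yes   # hypothetical result variable
else
    shebang_honoured=no
fi
echo "shebang honoured: $shebang_honoured"
rm -rf "$tmp"
```

Any status other than 69 counts as "yes", which is exactly why a shell that processes the shebang line on its own is indistinguishable from a kernel that does.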
Without shebang
As long as the kernel flops at the invalid first line, the whole
commotion becomes the case of a file w/o the shebang.
This is how we were all taught about interpreter files back in the
day:
“the shell reads the command and tries to execlp the filename. Because
the shell script is an executable file but isn’t a machine executable,
an error is returned and execlp assumes that the file is a shell
script (which it is). Then /bin/sh is executed with the pathname of
the shell script as its argument.”
(from APUE, 3rd ed.)
E.g. suppose we have
$ cat demo2.sh
echo Діти, це їжачок!
ps -p $$ # print the shell the script is running under
If we run it, the shell
- checks if the script has executable bits (suppose it has)
- tries to exec the file
- which fails with ENOEXEC, for it’s not an ELF
- [a tcsh/bash dance]
- exec again but this time it’s /bin/sh with demo2.sh as an argument
The last item is important & may not be quite apparent, for if you
have a csh script
$ cp demo2.sh demo2.csh
you may expect that tcsh will not run it as an sh one:
$ tcsh -f
> ./demo2.csh
Діти, це їжачок!
PID TTY TIME CMD
102213 pts/21 00:00:00 sh
which is false, for tcsh follows the standards here.
Expectations vs. reality
APUE says a shell ought to use execlp, which in turn is supposed to
do all the dirty work for us. As it happens, execlp does exactly that,
at least in Linux glibc. Of course, both bash & tcsh ignore the advice
& use their own scheme.
tcsh does a plain execv, then, after failure, peeks into the first 2
bytes to see (w/ the help of iswprint(3)) if they are “printable”.
Here, if tcsh (a) finds the file “acceptable” & (b) tries to run a
script with a shebang line in it on a system w/o kernel support for
such a line, it processes that line by itself.
If we poison our script with the BOM:
$ uconv --add-signature demo2.sh > demo2.bom.sh
$ chmod +x !$
$ head -c 37 !$ | hexdump -c
0000000 357 273 277   e   c   h   o     320 224 321 226 321 202 320 270
0000010   ,     321 206 320 265     321 227 320 266 320 260 321 207 320
0000020 276 320 272   !  \n
0000025
tcsh doesn’t try to re-execv & aborts:
> ./demo2.bom.sh
./demo2.bom.sh: Exec format error. Wrong Architecture.
bash, on the other hand, tries to be more clever, failing
spectacularly. After execve it goes on a journey of figuring out why
the exec has failed. It:
- opens the file & analyses the shebang line! In the example above
we didn’t have one, but if we did & it named a missing interpreter,
bash would have produced a message:
$ cat demo3.invalid.awk
#!/usr/bin/awwwwwwwk -f
BEGIN { print "this is awk" }
$ ./demo3.invalid.awk
sh: ./demo3.invalid.awk: /usr/bin/awwwwwwwk: bad interpreter: No such file or directory
tcsh won’t do anything like that & will print
./demo3.invalid.awk: Command not found.
- checks if the file has an ELF header & tries to find out what is
wrong w/ it;
- reports the “success” of the execution if the file has a length
of 0;
- checks if the file is “binary”. I use quotes here, for this is an
example of how good intentions don’t always turn into reality.
Instead of a simple 2-byte check, like the one in tcsh, bash reads
80 bytes & calls a certain check_binary_file() function that is a
good example of why you should not blindly trust the comments in
the code:
/* Return non-zero if the characters from SAMPLE are not all valid
   characters to be found in the first line of a shell script.  We
   check up to the first newline, or SAMPLE_LEN, whichever comes first.
   All of the characters must be printable or whitespace. */

int
check_binary_file (sample, sample_len)
     char *sample;
     int sample_len;
{
  register int i;
  unsigned char c;

  for (i = 0; i < sample_len; i++)
    {
      c = sample[i];
      if (c == '\n')
        return (0);
      if (c == '\0')
        return (1);
    }

  return (0);
}
Despite the promise that all of the characters must be printable or
whitespace, the function returns 1 only when sample contains a NUL
character. Our BOM example doesn’t have one, thus the script runs,
albeit with an error that is somewhat cryptic if you have no idea
that there is a BOM in the file:
$ ./demo2.bom.sh
./demo2.bom.sh: line 1: �echo: command not found
PID TTY TIME CMD
115569 pts/26 00:00:00 sh
What if we do have a NUL character?
$ hexdump -c demo4.null.sh
0000000   e   c   h   o      \0  \n   e   c   h   o     320 224 321 226
0000010 321 202 320 270   ,     321 206 320 265     321 227 320 266 320
0000020 260 321 207 320 276 320 272   !  \n   p   s       -   p       $
0000030   $  \n
0000032
Here NUL is an argument to the echo command, which should be
totally legal, but not w/ bash!
$ ./demo4.null.sh
sh: ./demo4.null.sh: cannot execute binary file: Exec format error
Which of course wouldn’t be an issue had the file had the shebang
line.
- If bash finds the file “acceptable” on a system w/o kernel support
for the shebang line when the file indeed contains one, it does the
same thing tcsh does: tries to process it by itself.
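What check_binary_file() (quoted in the list above) really tests can be approximated from the command line. This is a rough sketch & not bash’s code: the function name first_line_has_nul is made up, the 80-byte sample size is taken from the description above, & head(1) is assumed to accept -c.

```shell
#!/bin/sh
# Succeed (exit 0) if a NUL byte occurs within the first 80 bytes of the
# file before the first newline, i.e. the only case in which
# check_binary_file() returns 1. The function name is hypothetical.
first_line_has_nul() {
    with=$(head -c 80 "$1" | head -n 1 | wc -c | tr -d ' ')
    without=$(head -c 80 "$1" | head -n 1 | tr -d '\000' | wc -c | tr -d ' ')
    [ "$with" -ne "$without" ]
}

tmp=$(mktemp -d)
printf '\357\273\277echo hi\n' > "$tmp/bom.sh"        # BOM, no NUL
printf 'echo \000\necho hi\n' > "$tmp/null.sh"        # NUL on line 1
printf 'echo hi\necho \000\n' > "$tmp/late-null.sh"   # NUL only on line 2

for f in bom.sh null.sh late-null.sh; do
    if first_line_has_nul "$tmp/$f"; then
        echo "$f: binary"
    else
        echo "$f: text"
    fi
done
rm -rf "$tmp"
```

Only null.sh is reported as “binary”: the BOM bytes are neither printable ASCII nor whitespace, yet they pass, & a NUL after the first newline is never looked at.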
Conclusion
The most popular shells are too bloated, bizarre & have many
undocumented features.
Some hints:
- The shebang line isn’t necessary if you target /bin/sh, but the
shell does less work if you provide it.
- To view BOMs, use less(1) or hexdump(1).
- To test for the BOM, use file(1).
- To remove the BOM manually, use M-x find-file-literally in Emacs.
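For scripted clean-ups, the BOM can also be detected & dropped from the shell. A minimal sketch, assuming a UTF-8 BOM (bytes EF BB BF) & a head(1) that accepts -c; the function names has_bom & strip_bom are invented here.

```shell
#!/bin/sh
# Succeed if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Write the file to stdout without the leading BOM, if there is one.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1"    # start output at byte 4, i.e. skip 3 bytes
    else
        cat "$1"
    fi
}

tmp=$(mktemp -d)
printf '\357\273\277#!/bin/sh\necho ok\n' > "$tmp/bom.sh"
strip_bom "$tmp/bom.sh" > "$tmp/clean.sh"
head -n 1 "$tmp/clean.sh"
rm -rf "$tmp"
```

strip_bom writes to stdout instead of editing in place, so the original file stays around until you are sure about the result.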
Tags: ойті
Authors: ag