'indent-region' in the Emacs batch mode
I had an old project of mine that was using an archaic tab-based
indentation w/ the assumed tab width of 4. The mere opening of its
files in editors other than Emacs was causing such pain that I decided to do what you should rarely do--reindent the whole project.
(A side note: if you're setting the tab width to a value other than 8
& are simultaneously using tabs for indentation, you're a bad person.)
The project had ~40 files. Manually using Emacs indent-region
for each file would have been too wearisome. Then I remembered that Emacs has the batch mode.
A quick googling gave me a recipe akin to:
$ emacs -Q -batch FILE --eval '(indent-region (point-min) (point-max))' \
-f save-buffer
It would have worked but it had several drawbacks as a general
solution, for it:
- modifies a file in-place;
- can't read from the stdin;
- doesn't work w/ multiple files, forcing a user to use xargs(1) or
something;
- if FILE doesn't exist, Emacs quietly creates an empty file,
whereas a user probably expects to see an error message.
Whereas an ideal little script should:
- never overwrite the source file;
- read the input from the stdin or from its arguments;
- spit out the result to the stdout;
- handle the missing input reasonably well.
Everything else should be accommodated by the shell, including the
third item (multiple files) from the 'drawbacks' list above.
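The "let the shell do the rest" part might look like the sketch below. It assumes a filter command that takes a file name & prints the reindented text to the stdout (a script like the one developed later in this post would qualify). The name reindent_all & the .reindented suffix are made up for illustration.

```shell
# reindent_all FILTER FILE...
# Runs FILTER on every FILE and stores the result next to the original
# as FILE.reindented; the source files are never overwritten.
# FILTER is any command that takes a file name and prints to the stdout.
reindent_all() {
    filter=$1
    shift
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Report the missing input instead of silently skipping it.
            echo "reindent_all: $f: no such file" >&2
            status=1
            continue
        fi
        "$filter" "$f" > "$f.reindented" || status=1
    done
    return "$status"
}
```

The loop keeps going after a bad file but returns a non-zero status at the end, so a makefile or a hook can still notice the failure.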
Using Emacs for such a task is tempting, for it supports a large number
of programming modes, giving us the ability to employ the editor as a
universal reindenting tool, for example, in makefiles or Git hooks.
Writing to the stdout
We can slightly modify the above recipe to:
$ emacs -Q -batch FILE --eval '(indent-region (point-min) (point-max))' \
--eval '(princ (buffer-string))'
Instead of -f we are using --eval twice. The buffer-string
function does exactly what it says: it returns the contents of the
current buffer as a string.
$ cat 1.html
<p>
hey, <i>
what's up</i>?
</p>
$ emacs -Q -batch 1.html --eval '(indent-region (point-min) (point-max))' \
--eval '(princ (buffer-string))'
Indenting region...
Indenting region...done
<p>
hey, <i>
what's up</i>?
</p>
The "Indenting region" lines come from the Emacs message function
(which the progress reporter uses). In the batch mode message
prints its lines to the stderr.
The solution also addresses the fourth item from the "drawbacks" list--Emacs
doesn't create an empty file on the invalid input, although it doesn't
indicate the error properly, i.e., it exits w/ the code 0.
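Since Emacs itself won't complain, the check can be done by the shell before Emacs is even started. A minimal sketch; require_readable is a hypothetical helper name, not anything Emacs provides.

```shell
# require_readable FILE
# Prints an error to the stderr and returns 1 unless FILE is a readable
# regular file; meant to run before handing FILE to emacs, e.g.
#
#   require_readable 1.html && emacs -Q -batch 1.html ...
require_readable() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    echo "error: cannot read '$1'" >&2
    return 1
}
```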
Processing all files at once
If you know what you are doing & the files you're going to process are
under Git, overwriting the source is not a big deal. Perhaps for a
quick hack this script will do:
:; exec emacs -Q --script "$0" -- "$@" # -*- emacs-lisp -*-
(defun indent (file)
  ;; Go back to a buffer whose default-directory is the startup
  ;; directory, so that a relative FILE resolves correctly.
  (set-buffer "*scratch*")
  (if (not (file-directory-p file))
      (when (and (file-exists-p file) (file-writable-p file))
        (message "Indenting `%s`" file)
        (find-file file)
        (indent-region (point-min) (point-max))
        (save-buffer))))

(setq args (cdr argv)) ; rm --
(dolist (val args)
  (indent val))
If you put the source above into a file named emacs-indent-overwrite
& add executable bits to it, then the shell thinks it's an ordinary
sh script that doesn't have a shebang line. A colon in
sh is a noop, but on stumbling upon exec the shell replaces itself
with the command
emacs -Q --script emacs-indent-overwrite -- arg1 arg2 ...
When Emacs reads the script, it doesn't expect it to be a sh one, but
luckily the 1st line also passes for valid Elisp, for in Elisp a
symbol whose name starts with a colon is called a keyword symbol, & a
keyword evaluates to itself. The ; right after the colon starts an
Elisp comment, which hides the rest of the line from Emacs.
: is a keyword w/ nothing after the colon (a more usual one would be
:omg); it is accepted because it satisfies the keywordp
predicate:
ELISP> (keywordp :omg)
t
ELISP> (keywordp :)
t
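The shell half of the trick can be checked w/o Emacs. A small sketch; colon_demo is a made-up name:

```shell
# ':' is the shell's null command: it ignores its arguments and always
# succeeds, so ':;' at the start of a line is a no-op followed by a
# command separator. Whatever comes after the ';' runs as usual.
colon_demo() {
    :; echo "the shell runs this part"   # for sh, '#' starts a comment
}
colon_demo
```

In the real script the command after the ; is exec, so the shell never gets to read the Elisp lines that follow.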
Everything else in the file is ordinary Elisp code. Here's how it works:
$ emacs-indent-overwrite src/*
Indenting ‘src/1.html‘
Indenting region...
Indenting region...done
Indenting ‘src/2.html‘
Indenting region...
Indenting region...done
The only thing worth mentioning about the script is why the indent
procedure has the (set-buffer "*scratch*") call. It's an easy way to
switch the current directory back to the directory from which the script
was started. This is required, for find-file makes the buffer of the
visited file current, & the buffer-local default-directory variable
of that buffer points to the directory of the file, so the next
relative path would be resolved against it. The other way is to modify args
w/ something like
(setq args (mapcar 'expand-file-name args))
& just pass the file argument as a full path.
Reading from the stdin
There is next to no info about the stdin in the Emacs Lisp
manual. The section about minibuffers hints that for reading from
the standard input in the batch mode we ought to look for
functions that normally read keyboard input from the minibuffer.
One such function is read-from-minibuffer.
$ cat read-from-minibuffer
:; exec emacs -Q --script "$0"
(message "-%s-" (read-from-minibuffer ""))
$ printf hello | ./read-from-minibuffer
-hello-
$ printf 'hello\nworld\n' | ./read-from-minibuffer
-hello-
The function only read the input up to the 1st newline, which it also
impertinently ate. This leaves us w/ a dilemma: if we read multiple
lines in a loop, should we unconditionally assume that the last line
ended w/ a newline?
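From the shell side the question is easy to answer, as long as the input is a file (or was saved to one) rather than a pipe. A sketch; ends_with_newline is a hypothetical helper, & it assumes text input w/o NUL bytes:

```shell
# ends_with_newline FILE
# Succeeds if FILE is non-empty and its last byte is a newline.
# Command substitution strips trailing newlines, so "$(tail -c 1 FILE)"
# is empty exactly when that last byte is '\n'.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

A wrapper could use the answer to decide whether to strip the final newline from the output of a script that always adds one.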
read-from-minibuffer has another peculiarity: on receiving the EOF
character it raises an unhelpful error:
$ ./read-from-minibuffer
^D
Error reading from stdin
$ echo $?
255
The simplest Elisp cat program must watch out for that:
$ ./cat.el < cat.el
:; exec emacs -Q --script "$0"
(while (setq line (ignore-errors (read-from-minibuffer "")))
  (princ (concat line "\n")))
Next, if we read the stdin & print the result to the stdout, our
"ideal" reindent script cannot rely on find-file anymore, unless
we save the input in a temp file. Or we can leave out the "heavy"
find-file altogether & just create a temp buffer & copy the input
there for further processing. The latter implies we must manually
set the proper major mode for the buffer, otherwise indent-region
won't do anything good.
The set-auto-mode function does the major mode auto-detection. One of
the 1st hints it looks for is the file extension, but the stdin has
none. We can ask the user to provide one in the arguments of the
script.
:; exec emacs -Q --script "$0" -- "$@" # -*- emacs-lisp -*-
(defun indent (mode text)
  (with-temp-buffer
    ;; MODE is a file name or a bare extension like ".js";
    ;; set-auto-mode picks the major mode from it.
    (set-visited-file-name mode)
    (insert text)
    (set-auto-mode t)
    (message "`%s` major mode: %s" mode major-mode)
    (indent-region (point-min) (point-max))
    (buffer-string)))

(defun read-file (file)
  (with-temp-buffer
    (insert-file-contents file)
    (buffer-string)))

(defun read-stdin ()
  (let (line lines)
    ;; read-from-minibuffer signals an error on EOF; ignore-errors
    ;; turns that into nil, which ends the loop.
    (while (setq line (ignore-errors (read-from-minibuffer "")))
      (push line lines))
    ;; An empty last element makes the result end w/ a newline.
    (push "" lines)
    (mapconcat 'identity (reverse lines) "\n")))

(setq args (cdr argv)) ; rm --
(setq mode (car args))
(if (equal "-" mode)
    (progn
      (setq mode (nth 1 args))
      (if (not mode) (error "No filename hint argument, like .js"))
      (setq text (read-stdin)))
  (setq text (read-file mode)))
(princ (indent mode text))
If the 1st argument to the script is '-', we look for the hint in the
next argument & then start reading the stdin. If the 1st arg isn't '-',
we assume it's a file, which we insert into a temp buffer & whose
contents we return as a string.
The
(mapconcat 'identity (reverse lines) "\n")
line in the read-stdin procedure is the equivalent of
['world', 'hello'].reverse().join("\n")
in JavaScript.
Some examples:
$ printf "<p>\nhey, what's up?\n</p>" | ./emacs-indent - .html
‘.html‘ major mode: html-mode
Indenting region...
Indenting region...done
<p>
hey, what's up?
</p>
$ printf "<p>\nhey, what's up?\n</p>" | ./emacs-indent - .txt
‘.txt‘ major mode: text-mode
Indenting region...
Indenting region...done
<p>
hey, what's up?
</p>
$ ./emacs-indent src/2.html 2>/dev/null
<p>
not much
</p>
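For the Git hooks mentioned at the beginning, a script that prints to the stdout is handy as a checker as well: compare its output w/ the file itself. A sketch under the same assumption as before (a filter that takes a file name & prints the reindented text); needs_reindent is a made-up name.

```shell
# needs_reindent FILTER FILE
# Succeeds (exit 0) if FILTER's output for FILE differs from FILE itself,
# i.e. the file is not indented the way FILTER would indent it.
# Note: a FILTER that fails and prints nothing also counts as a difference
# for any non-empty FILE.
needs_reindent() {
    ! "$1" "$2" 2>/dev/null | cmp -s - "$2"
}
```

In a pre-commit hook it could be used as needs_reindent ./emacs-indent src/2.html && echo "src/2.html: bad indentation" >&2.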
Tags: ойті
Authors: ag