Writing a podcast client in GNU Make
Why? First, because I wanted parallel downloads & my old podcacher
didn't support that. Second, because it sounded like a joke.
The result is gmakepod.
Evidently, some ingredients for such a client are practically
impossible to write using plain Make (like an xml parser). Would the
client then be considered a true Make program? E.g., there is a clever
bash json parser, but when you look in its src you see that it uses awk & grep.
At least we can try to write in Make as many components as possible,
even if (when) they become a bottleneck. Again, why?
Overview
Using Make means constructing a proper DAG. gmakepod uses 6 vertices:
run → .download.mk → .files.new → .files → .enclosures → .feeds.
All except the 1st one are file targets.
target          | desc
.feeds (phony)  | parse a config file to extract feed names & urls
.enclosures     | fetch & parse each feed to extract enclosure urls
.files          | generate a proper output file name for each url
.files.new      | check if we've already downloaded a url in the past, filter out
.download.mk    | generate a makefile, where we list all the rules for all the enclosures
run (default)   | run the makefile
Every time a user runs gmakepod, it remakes all those files anew.
Config file
A user needs to keep a list of feed subscriptions somewhere. The 1st
thing that comes to mind is to use a list of newline-separated urls,
but what if we want to have diff options for each feed? E.g., an
enclosure filter of some sort? We can just add 'options' to the end
of the line (Idk, like url=http://example.com!filter.type=audio) but then
we need to choose a record sep that isn't a space, which means either
escaping the record sep in urls or living w/ the notion 'no !
char is allowed' or similar nonsense.
The next question is: how does a makefile process a single record? It
turns out, we can eval foo=bar inside of a recipe, so if we pass
make -f feed_parse.mk 'url=http://example.com!filter.type=audio'
where feed_parse.mk looks like
parse-record = ... # replace ! w/ a newline
%:
	$(eval $(call parse-record,$*))
	@echo $(url)
	@echo $(filter.type)
then every 'option' becomes a variable! This sounds good but is actually a
gimcrack.
Make will think that url=http://example.com!filter.type=audio is a
variable override & complain about missing targets. To ameliorate that
we can prefix the line w/ # or :. Sounds easy, but then we need
to slice the line in the parse-record macro. This is the easiest job
in any lang except Make--you won't do it correctly w/o invoking awk or
any other external tool.
If we use an external tool for parsing a mere config line, why use a
self-inflicted parody of the config file instead of a human-readable
one?
The ini format would fit perfectly. E.g.,
[JS Party]
url = https://changelog.com/jsparty/feed
lines are self-explanatory to anyone who has seen a computer once. We
can use Ruby, for example, to convert the lines to
:name=JS_Party!url=http://changelog.com...
or even better to
:\{\".name\":\"JS_Party\",\".url\":\"https://changelog.com/jsparty/feed\"\}
(notice the amount of shell escaping) & use Ruby again in the makefile to
transform that json to name=val pairs, that we eval in the recipe
later on.
How do we pass them to the makefile? If we escape each line correctly,
xargs will suffice:
ruby ini-parse.rb subs.ini | xargs make -f feed-parse.mk
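The ini → record conversion could be sketched roughly as below. This is not the actual ini-parse.rb; the function names (parse_ini, escape, records) are made up for illustration, the parser only handles [section] headers & key = value lines, & the set of escaped chars is an assumption based on the record shown above.

```ruby
require 'json'

# Parse a tiny ini subset: "[Section]" headers & "key = value" lines.
# Every key gets a leading dot; spaces in section names become underscores.
def parse_ini(text)
  feeds = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#', ';')
    if (m = line.match(/\A\[(.+)\]\z/))
      feeds << { '.name' => m[1].strip.gsub(/\s+/, '_') }
    elsif (m = line.match(/\A([^=]+)=(.*)\z/)) && !feeds.empty?
      feeds.last[".#{m[1].strip}"] = m[2].strip
    end
  end
  feeds
end

# Backslash-escape backslashes, quotes, whitespace & braces
# (an assumed set, chosen to match the record in the text).
def escape(str)
  str.gsub(/[\\'"\s{}]/) { |c| "\\#{c}" }
end

# One record per feed, prefixed w/ ':' so Make won't read it as a var override.
def records(text)
  parse_ini(text).map { |feed| ':' + escape(JSON.generate(feed)) }
end

puts records("[JS Party]\nurl = https://changelog.com/jsparty/feed\n")
```

For the [JS Party] section this prints the same escaped json record as the one shown earlier, ready to be piped to xargs.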
Parsing xml
Ruby, of course, has a full-fledged rss parser in its stdlib, but do
we need it? A fancy podcast client (that tracks your every inhalation
& exhalation) would display all the metadata it can obtain from an rss,
but I don't want a fancy podcast client; what I want is a program that
reliably downloads the last N enclosures from a list of feeds.
Thus the minimal parser looks like
$ curl -s https://emacsel.com/mp3.xml | \
nokogiri -e 'puts $_.css("enclosure,link[rel=\"enclosure\"]").\
map{|e| e["url"] || e["href"]}' \
| head -2
https://cdn.emacsel.com/episodes/emacsel-ep7.mp3
https://cdn.emacsel.com/episodes/emacsel-ep6.mp3
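For those w/o the nokogiri CLI at hand, the same idea can be sketched w/ REXML (a bundled gem in recent Rubies, so this assumes it's installed). The enclosures function name is hypothetical; it walks every element & picks up both the RSS & the Atom flavours of an enclosure, ignoring xml namespaces altogether.

```ruby
require 'rexml/document'

# Collect enclosure urls from an RSS or Atom document, in document order.
def enclosures(xml)
  urls = []
  REXML::Document.new(xml).each_recursive do |el|
    case el.name
    when 'enclosure' # RSS: <enclosure url="..."/>
      urls << el.attributes['url']
    when 'link'      # Atom: <link rel="enclosure" href="..."/>
      urls << el.attributes['href'] if el.attributes['rel'] == 'enclosure'
    end
  end
  urls.compact
end

rss = <<~XML
  <rss><channel>
    <item><enclosure url="http://example.com/ep2.mp3" type="audio/mpeg"/></item>
    <item><enclosure url="http://example.com/ep1.mp3" type="audio/mpeg"/></item>
  </channel></rss>
XML
puts enclosures(rss).first(1)
```

Taking the last N enclosures is then a matter of calling first(N) on the result, like head does in the pipeline above.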
Options
One of the obviously helpful user options is the number of enclosures
to download. E.g., when the user types
$ gmakepod g=emacs e=5
the client produces a .files file that has a list of 5 shell-escaped
json 'records'. The e=5 option could also appear in an .ini file. To
distinguish options passed from the CL from options read from the
.ini, we prefix options from the .ini w/ a dot. The opt macro is
used to get the final value:
opt = $(or $($1),$(.$1),$2)
E.g.: $(call opt,e,2) checks the CL opt first, then the .ini opt, &,
as a last resort, returns the def value 2.
Output file names
Not every enclosure url has a nice path name. What file name should we
assign to an .mp3 from the url below?
https://play.podtrac.com/npr-510289/npr.mc.tritondigital.com/NPR_510289/media/anon.npr-mp3/npr/pmoney/2018/05/20180504_pmoney_pmpod839v2.mp3?orgId=1&d=1606&p=510289&story=608577210&t=podcast&e=608577210&ft=pod&f=510289
Maybe we can use the last segment of the URI path, 20180504_pmoney_pmpod839v2.mp3 in
this case. Is it possible to extract it in pure Make?
In the most extreme case, the uri path may not even be unique. Say a
feed has 2 entries & each entry has 1 enclosure, then they both may
have the same path name:
<entry>
<title>Foo</title>
<link rel="enclosure" type="audio/mpeg" length="1234"
href="http://example.com/podcast?episode=2"/>
<id>2ba2a6ee-52fb-11e8-9176-000c2945132f</id>
</entry>
<entry>
<title>Bar</title>
<link rel="enclosure" type="audio/mpeg" length="5678"
href="http://example.com/podcast?episode=1"/>
<id>3f66b198-52fb-11e8-a2a8-000c2945132f</id>
</entry>
In addition, the output name must be 'safe' in terms of Make. This
means no spaces or $, :, %, ?, *, [, ~, \, # chars.
All of this leads us to another use of Ruby in Make's stead. We extract
the uri path from the url, strip out the extension, prefix the path w/
the name of the feed (listed in the .ini), append a random string + the
extension, so the output file from the above url looks similar
to:
media/NPR_Planet_Money/20180504_pmoney_pmpod839v2.84a8b172.mp3
A homework question: what should we do if a uri path lacks a file
extension?
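A rough sketch of such a name generator follows. It's an illustration, not gmakepod's actual code: output_name, sanitize & the 'media' prefix handling are assumptions, & the random part is taken from SecureRandom. It deliberately leaves the homework question open: a path w/o an extension just yields a name w/o one.

```ruby
require 'uri'
require 'securerandom'

# Chars listed above as unsafe in a Make target name, plus any whitespace.
UNSAFE = Regexp.union(/\s/, '$', ':', '%', '?', '*', '[', '~', '\\', '#')

def sanitize(str)
  str.gsub(UNSAFE, '_')
end

# feed:  feed name from the .ini
# url:   enclosure url
# token: random string that keeps names unique even when uri paths collide
def output_name(feed, url, token = SecureRandom.hex(4))
  base = File.basename(URI.parse(url).path) # the query string is dropped here
  ext  = File.extname(base)                 # '' if the path has no extension
  stem = File.basename(base, ext)
  File.join('media', sanitize(feed), "#{sanitize(stem)}.#{token}#{sanitize(ext)}")
end

puts output_name('Foobar Podcast', 'http://example.com/file.mp3')
```

W/ a fixed token of 84a8b172 & the feed name 'NPR Planet Money', the long podtrac url above maps to media/NPR_Planet_Money/20180504_pmoney_pmpod839v2.84a8b172.mp3.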
History
If we successfully downloaded an enclosure, there is rarely a need to
download it again. A 'real' podcast client would look at id/guid (the
date is usually useless) to determine if the entry has any updated
enclosures; our Mickey Mouse parser relies on urls only.
Make certainly doesn't have any key/value store. We could try
employing the sqlite CL interface or dig out gdbm or just append a
url+'\n' to some history.txt file.
The last one is tempting, for grep is uber-fast. As the history
file becomes a shared resource, we might get ourselves in trouble
during parallel downloads, though. The lockfile rubygem provides a CL
wrapper around a user-specified command, hence can protect our 'db':
rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' 'http://example.com/file.mp3'
It works similarly to flock(1), but supposedly is more portable.
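To illustrate the locking idea w/o the lockfile gem, here is a minimal sketch that swaps rlock for Ruby's own File#flock (advisory, & reportedly unreliable on some network filesystems, which is presumably where the portability remark comes from). The remember & seen? names are hypothetical.

```ruby
# Append a url to the history file under an exclusive lock.
def remember(history, url)
  File.open(history, 'a') do |f|
    f.flock(File::LOCK_EX) # blocks while another process holds the lock
    f.write(url + "\n")
  end                      # closing the file releases the lock
end

# True if the url is already listed in the history file.
def seen?(history, url)
  File.exist?(history) && File.readlines(history, chomp: true).include?(url)
end
```

A .files.new-like step could then keep only the urls for which seen? returns false, & each download rule would call remember after a successful fetch.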
Makefile generation
The last but one step is to generate a makefile named
.download.mk. After we have collected all the enclosure urls, we write to the
.mk file a set of rules like
media/Foobar_Podcast/file.84a8b172.mp3:
	@mkdir -p $(dir $@)
	curl 'http://example.com/file.mp3' > $@
	@rlock history.lock -- ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"' 'http://example.com/file.mp3'
Our last step is to run
make -f .download.mk -k -j1 -Oline
The number of jobs is 1 by default, but is controllable via the CL
param (gmakepod j=4).
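The generation step might be sketched like this. Again, an illustration under assumptions: download_mk & mk_quote are made-up names, urls are quoted w/ Shellwords instead of single quotes, & the leading 'all' target is my guess at how a bare make -f .download.mk gets to build every file (the text above doesn't say).

```ruby
require 'shellwords'

# Shell-quote a string, then double every '$' so Make passes it through.
def mk_quote(str)
  Shellwords.escape(str).gsub('$', '$$')
end

# jobs: an array of { file: 'media/...', url: 'http://...' } hashes.
def download_mk(jobs)
  append = %q{ruby -e 'IO.write "history.txt", ARGV[0]+"\n", mode: "a"'}
  rules = jobs.map do |job|
    url = mk_quote(job[:url])
    <<~RULE
      #{job[:file]}:
      \t@mkdir -p $(dir $@)
      \tcurl #{url} > $@
      \t@rlock history.lock -- #{append} #{url}
    RULE
  end
  # 'all' goes first (an assumption) so that it becomes the default goal.
  "all: #{jobs.map { |j| j[:file] }.join(' ')}\n\n" + rules.join("\n")
end

puts download_mk([{ file: 'media/Foobar_Podcast/file.84a8b172.mp3',
                    url: 'http://example.com/file.mp3' }])
```

The file names are expected to be already 'safe' (see the section on output file names), hence only the urls get quoted.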
Tags: ойті
Authors: ag