Claudio Dekker
July 18, 2020

Demystifying xargs

Today I ran into a situation where I needed to run a single command tons of times, and didn't want to sit around waiting for each one to be done.

I knew xargs existed, but because it doesn't just run a single command and call it a day, I always kind of felt that it was a big scary dangerous tool that I'd be better off just avoiding.

Today, I decided: no more. Let's demystify xargs!

One thing that I was already aware of, is that xargs basically takes the stdin input, splits it by newline characters (as well as all unescaped blanks), and then loops over them, appending the value to the command given and running that.

For example, the following takes each JSON file in a directory, and essentially loops over it, running cat <filename>.json on each and every one of them:

1ls -1 *.json | xargs cat

This wasn't what I needed, but I did need something similar, and I knew xargs to be more powerful. To give a bit more context: I needed to call an API for each filename in a list.

So, after referencing the manual, I quickly discovered it's ability to use placeholder values in commands, similar to how translation placeholders work in (web)apps. An example:

1# V-- Define the placeholder V-- This'll get replaced
2cat files.txt | xargs -I % http -f POST filename="%" token="secret" --ignore-stdin

Awesome. That works. It's still executing things one by one (which is actually what I needed it to do), but another useful thing that I discovered while reading the manual, is that we can actually split the work over multiple processes, which will then run in parallel. To do this, simply add the -P 2 flag, and done, now there's two workers!

1# VVVV-- Defines the maximum amount of concurrent worker processes
2cat files.txt | xargs -P 2 -I % http -f POST filename="%" token="secret" --ignore-stdin

Looking back, I realized that I could've of course just created a bash file and done a for-loop, but hey, today I learned an awesome new thing, and it'll definitely stay in my toolkit!