Using GNU Parallel with ia
GNU Parallel is a shell tool for executing jobs in parallel.
It is a very useful tool to use with ia for bulk jobs.
It can be installed via many OS package managers.
For example, on macOS it can be installed with Homebrew:
brew install parallel
Refer to the GNU Parallel homepage for more details on available packages, source code, installation, and other documentation and tutorials.
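Once the packages are installed, a quick check that the tools are on your PATH can save a failed bulk run later. This is a minimal sketch, assuming a POSIX shell; the function name check_tools is illustrative and not part of ia or GNU Parallel. jq is included because it is used to filter the JSON output in the examples on this page.

```shell
# check_tools TOOL...: report whether each named tool is on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
      missing=1
    fi
  done
  return "$missing"
}

check_tools parallel ia jq || echo "install the missing tools before continuing"
```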
Basic Usage
You can use parallel to retrieve metadata from archive.org items concurrently:
$ cat itemlist.txt
jj-test-2020-09-17-1
jj-test-2020-09-17-2
jj-test-2020-09-17-3
$ cat itemlist.txt | parallel 'ia metadata {}' | jq .metadata.date
"1999"
"1999"
"1999"
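Because jobs run concurrently, results can arrive in a different order than the input list. The sketch below wraps the command above in a function and adds two standard GNU Parallel options: --keep-order, which prints results in input order, and --jobs, which caps the number of simultaneous jobs. The function name fetch_dates and the limit of 4 jobs are assumptions chosen for illustration; pick a limit that suits your machine and workload.

```shell
# fetch_dates FILE: print the metadata date of every identifier listed in FILE.
# --keep-order keeps the output in the same order as the input lines.
# --jobs 4 runs at most four ia processes at a time (an arbitrary example value).
fetch_dates() {
  parallel --keep-order --jobs 4 'ia metadata {}' < "$1" | jq .metadata.date
}
```

With the file shown above, this would be called as fetch_dates itemlist.txt.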
You can run parallel with --dry-run to check your commands before running them:
$ cat itemlist.txt | parallel --dry-run 'ia metadata {}'
ia metadata jj-test-2020-09-17-2
ia metadata jj-test-2020-09-17-1
ia metadata jj-test-2020-09-17-3
Logging and retrying with Parallel
Parallel also offers an easy way to log and retry failed commands.
The following job retrieves metadata for every item listed in itemlist.txt and writes it to a file named output.jsonl.
It uses the --joblog option to record each command and its exit value in /tmp/my_ia_job.log:
$ cat itemlist.txt | parallel --joblog /tmp/my_ia_job.log 'ia metadata {}' > output.jsonl
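It can help to see which jobs failed before deciding what to do next. The joblog is a tab-separated file with a header row and the columns Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, and Command. The sketch below, assuming that layout and a POSIX awk, prints the exit value and command of every job that exited non-zero or was killed by a signal. The function name list_failed is illustrative.

```shell
# list_failed JOBLOG: print "exit value <TAB> command" for each unsuccessful job.
# Column 7 is Exitval, column 8 is Signal, column 9 is Command; NR > 1 skips the header.
list_failed() {
  awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $7 "\t" $9 }' "$1"
}
```

For the job above, this would be called as list_failed /tmp/my_ia_job.log. Empty output means every logged job succeeded.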
You can now retry any commands that failed by using the --retry-failed option. Don’t forget to switch > to >> in this example: >> appends to output.jsonl, whereas > would overwrite it.
$ parallel --retry-failed --joblog /tmp/my_ia_job.log 'ia metadata {}' >> output.jsonl
If there were no failed commands, nothing will be rerun.
You can rerun this command until it exits with 0, which means no jobs failed.
You can check the exit code by running echo $? directly after the parallel command finishes.
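The rerun-until-zero advice above can be wrapped in a small loop. This is a minimal sketch, assuming a POSIX shell and a joblog written by an earlier run; the function name retry_failed and the default cap of 5 attempts are illustrative and not part of ia or GNU Parallel. The cap is there so that an item that can never succeed does not cause an endless loop.

```shell
# retry_failed JOBLOG OUTPUT [MAX_ATTEMPTS]
# Rerun the failed jobs recorded in JOBLOG, appending their output to OUTPUT,
# until parallel exits with 0 or MAX_ATTEMPTS (default 5) is reached.
retry_failed() {
  joblog=$1
  output=$2
  max_attempts=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # --retry-failed reruns only the jobs that the joblog records as failed.
    # >> appends, so results from earlier runs are preserved.
    if parallel --retry-failed --joblog "$joblog" >> "$output"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "jobs still failing after $max_attempts attempts" >&2
  return 1
}
```

For the job above, this would be called as retry_failed /tmp/my_ia_job.log output.jsonl. A return value of 1 means some jobs were still failing when the attempt limit was reached.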
Resources
Intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Cheat sheet: https://www.gnu.org/software/parallel/parallel_cheat.pdf
Examples from the man page: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending