Run a Set of Jupyter Notebooks from the Command Line
2018-06-17
You’ve got a load of Jupyter notebooks in a directory. You’re going to put them on GitHub to share with the students in your class, users of your library, readers of your textbook or whoever. Do they all actually run through without errors? Are you sure? Even though you just ran conda update on that dependency?
You could go into each notebook, hit ‘Restart Kernel and Run All Cells…’ and scroll down to make sure there are no exceptions. It’d be nicer to batch run a set of notebooks from the command line. Here’s a script to do that.
#! python
# coding: utf-8

import argparse
import glob

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors.execute import CellExecutionError

# Parse args
parser = argparse.ArgumentParser(description="Runs a set of Jupyter notebooks.")
file_text = """ Notebook file(s) to be run, e.g. '*.ipynb' (default),
'my_nb1.ipynb', 'my_nb1.ipynb my_nb2.ipynb', 'my_dir/*.ipynb'
"""
parser.add_argument('file_list', metavar='F', type=str, nargs='*',
                    help=file_text)
parser.add_argument('-t', '--timeout', help='Length of time (in secs) a cell '
                    'can run before raising TimeoutError (default 600).',
                    default=600, required=False)
parser.add_argument('-p', '--run-path', help='The path the notebook will be '
                    'run from (default pwd).', default='.', required=False)
args = parser.parse_args()
print('Args:', args)
if not args.file_list:  # Default file_list
    args.file_list = glob.glob('*.ipynb')

# Check list of notebooks
notebooks = []
print('Notebooks to run:')
for f in args.file_list:
    # Find notebooks but not notebooks previously output from this script
    if f.endswith('.ipynb') and not f.endswith('_out.ipynb'):
        print(f[:-6])
        notebooks.append(f[:-6])  # Want the filename without '.ipynb'

# Execute notebooks and output
num_notebooks = len(notebooks)
print('*****')
for i, n in enumerate(notebooks, start=1):  # Count from 1 for the progress line
    n_out = n + '_out'
    with open(n + '.ipynb') as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=int(args.timeout), kernel_name='python3')
    try:
        print('Running', n, ':', i, '/', num_notebooks)
        ep.preprocess(nb, {'metadata': {'path': args.run_path}})
    except CellExecutionError:
        msg = 'Error executing the notebook "%s".\n' % n
        msg += 'See notebook "%s" for the traceback.' % n_out
        print(msg)
    except TimeoutError:
        msg = 'Timeout executing the notebook "%s".\n' % n
        print(msg)
    finally:
        # Write output file
        with open(n_out + '.ipynb', mode='wt') as f:
            nbformat.write(nb, f)
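The script prints a message when a notebook fails and then carries on, so a shell script or CI job calling it has no easy way to tell that something went wrong. Here’s a minimal sketch of one way to collect the failures and turn them into an exit status. The names run_all, exit_code and run_one are hypothetical and not part of the script; run_one stands in for whatever executes a single notebook and raises on failure.

```python
def run_all(notebooks, run_one):
    """Call run_one(name) for each notebook and return the names that failed.

    run_one is a hypothetical callable that executes one notebook and
    raises an exception if a cell errors or times out.
    """
    failed = []
    for name in notebooks:
        try:
            run_one(name)
        except Exception as exc:  # keep going so every notebook gets a run
            print('Failed:', name, '(%s)' % exc)
            failed.append(name)
    return failed


def exit_code(failed):
    """Return 0 if nothing failed, 1 otherwise, for use with sys.exit()."""
    return 1 if failed else 0
```

At the end of a script you could then call sys.exit(exit_code(failed)), so that one broken notebook is enough to fail the build.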
You can use it to run all cells in a single notebook from the command line with
python run_notebooks.py my_nb1.ipynb
You’ll get a new notebook, my_nb1_out.ipynb, for you to check the output. I’ve chosen not to overwrite the existing notebook because this can introduce git diffs you didn’t want on notebooks that don’t need fixing.
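The output name is built by chopping the last six characters ('.ipynb') off the filename and adding '_out'. Here’s a small sketch of the same naming rule using os.path.splitext instead of slicing. The helper names output_name and is_input_notebook are made up for illustration and don’t appear in the script.

```python
import os


def is_input_notebook(path):
    """True for notebooks that are not themselves output of a previous run."""
    return path.endswith('.ipynb') and not path.endswith('_out.ipynb')


def output_name(path):
    """Map 'my_nb1.ipynb' to 'my_nb1_out.ipynb', keeping any directory part."""
    stem, ext = os.path.splitext(path)
    return stem + '_out' + ext
```

Skipping anything that already ends in '_out.ipynb' is what stops a second run from producing my_nb1_out_out.ipynb.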
Run a set of notebooks with
python run_notebooks.py my_nb1.ipynb my_nb2.ipynb my_nb3.ipynb
Again you’ll get notebooks my_nb[1,2,3]_out.ipynb to check.
Run all the notebooks in a directory with
python run_notebooks.py notebooks/*.ipynb
The default is to run all notebooks in the working directory so
python run_notebooks.py
is the same as
python run_notebooks.py ./*.ipynb
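Note that a pattern like notebooks/*.ipynb is normally expanded by your shell before Python starts, so the script just receives a list of filenames. Its own glob call is only the fallback for an empty list. The sketch below mirrors the script’s arguments in a function so you can see what argparse hands back. build_parser and resolve_file_list are hypothetical names, not part of the script.

```python
import argparse
import glob


def build_parser():
    """A cut-down copy of the script's parser (help text omitted)."""
    parser = argparse.ArgumentParser(description='Runs a set of Jupyter notebooks.')
    parser.add_argument('file_list', metavar='F', type=str, nargs='*')
    parser.add_argument('-t', '--timeout', default=600)
    parser.add_argument('-p', '--run-path', default='.')
    return parser


def resolve_file_list(file_list):
    """Fall back to every notebook in the working directory if none are given."""
    return file_list if file_list else glob.glob('*.ipynb')


# What the script sees after the shell has expanded notebooks/*.ipynb
args = build_parser().parse_args(['notebooks/a.ipynb', 'notebooks/b.ipynb'])
print(resolve_file_list(args.file_list))
```

If you quote the pattern so the shell leaves it alone, the script receives the literal string 'notebooks/*.ipynb' and will fail to open it, since it only globs when no files are given.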
Flags
- --help gives help.
- --timeout sets a limit in seconds for cell execution. The default is 600. The script will give up and skip to the next notebook after printing a message.
- --run-path sets the path the notebook will be executed from. If you have relative paths in your notebook, you’ll probably want the run path to be set to the notebook path, so call e.g.
python run_notebooks.py notebooks/*.ipynb --run-path=notebooks/
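If your notebooks live in several directories, a single --run-path can’t suit all of them. One possible variation, sketched here under the assumption that each notebook should run from its own directory, derives the path from the notebook’s filename. run_path_for is a hypothetical helper, not something the script provides.

```python
import os


def run_path_for(notebook_path):
    """Return the directory containing the notebook, or '.' if it has none."""
    return os.path.dirname(notebook_path) or '.'


for nb_path in ['notebooks/intro.ipynb', 'intro.ipynb']:
    # The result could be passed as {'metadata': {'path': ...}} when executing
    print(nb_path, '->', run_path_for(nb_path))
```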
References
- Executing Notebooks from the nbconvert Docs
- Testing Jupyter Notebooks by Christian Moscardi