JupyterHub

Jupyter notebooks are an excellent resource for interactive development and data analysis using Python, R, and other languages. Notebooks can contain live code, equations, visualizations, and explanatory text, which makes them a good environment in which to use, learn, and teach interactive data analysis.

CU Research Computing (CURC) operates a JupyterHub server that enables users to run Jupyter notebooks on Summit or Blanca for serial (single-core) and shared-memory parallel (single-node) workflows. The CURC JupyterHub runs atop Anaconda. Additional documentation on the CURC Anaconda distribution is available and is a useful prerequisite for the CURC JupyterHub documentation that follows.

Step 1: Log in to CURC JupyterHub

CURC JupyterHub is available at https://jupyter.rc.colorado.edu. To log in, use your RC credentials. If you do not have an RC account, please request one before continuing.

Step 2: Start a notebook server

To start a notebook server, select one of the available options in the Select job profile menu under Spawner Options and click Spawn. Available options are:

  • Anaconda-based servers (recommended)
    • Summit interactive (12hr) (a 12-hour, 1-core job on a Summit “shas” node)
    • Summit Haswell (1 node, 12hr) (a 12-hour, 24-core job on a Summit “shas” node)
    • Blanca (12hr) (a 12-hour, 1-core job on your default Blanca partition; only available to Blanca users)
    • Blanca CSDMS (12hr) (a 12-hour, 1-core job on the Blanca CSDMS partition; only available to Blanca CSDMS users)
  • Module-based servers (legacy; no longer supported)
    • Legacy - Summit Haswell - 2hr (a 2-hour, 1-core job on a Summit “shas” node)
    • Legacy - Summit Haswell - 12hr (a 12-hour, 1-core job on a Summit “shas” node)
    • Legacy - Summit Knight’s Landing (a 2-hour, full-node job on a Summit “sknl” node)
    • Legacy - Blanca CSDMS (a 12-hour, 1-core job on the Blanca CSDMS partition; only available to Blanca CSDMS users)
    • Legacy - Blanca Sol (a 12-hour, 1-core job on the Blanca Sol partition; only available to Blanca Sol users)
    • Legacy - Blanca APPM (a 12-hour, 1-core job on the Blanca APPM partition; only available to Blanca APPM users)

The server will take a few moments to start. When it does, you will be taken to the Jupyter home screen, which will show the contents of your CURC /home directory under the Files tab. You will also see the following buttons in the upper right of the screen:

  • Quit: Will terminate your notebook server (i.e., terminates the job you just started).
  • Logout: Will log you out of CURC JupyterHub. As noted in “Step 4” below, this does not shut down a running notebook server.
  • Control Panel: Will enable you to manually terminate and (if desired) restart your server.
  • Upload: Enables you to upload files from your local computer to your CURC /home directory.
  • New: Enables you to open a new notebook via a chosen kernel (e.g., Python 2, Python 3, Bash, R).
    • Documentation on opening new notebooks is provided in “Step 3” below.

Default Notebook Features

  • Access to standard RC file systems:
    • /home
    • /projects/
    • /pl/active (for users with PetaLibrary allocations)
    • /scratch/summit (Summit only)
    • /rc_scratch (Blanca only)
  • Access to the following default kernels in the CURC Anaconda distribution (Note: documentation on creating and importing your own custom kernels is provided in the “Additional Documentation” section below):
    • Python 2 (idp): Python2 notebook (Intel Python distribution)
    • Python 3 (idp): Python3 notebook (Intel Python distribution)
    • Bash: BASH notebook
    • R: R notebook
  • IPyParallel/IPython clusters
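
As a quick sanity check, the file systems listed above can be probed from a Python 3 notebook. This is only a minimal sketch: the helper name check_paths is made up for illustration, and which paths show up as available depends on whether your server runs on Summit or Blanca and on your allocations.

```python
import os

# Paths taken from the list above; ~ stands in for your /home directory.
CANDIDATE_PATHS = [
    os.path.expanduser("~"),
    "/projects",
    "/pl/active",
    "/scratch/summit",
    "/rc_scratch",
]


def check_paths(paths):
    """Return a dict mapping each path to True if it is a readable directory."""
    return {p: os.path.isdir(p) and os.access(p, os.R_OK) for p in paths}


for path, ok in check_paths(CANDIDATE_PATHS).items():
    print(f"{path:20s} {'available' if ok else 'not available'}")
```

A path reported as not available is not necessarily an error; for example, /rc_scratch is only expected on Blanca.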

Step 3: Open a notebook

There are two ways to open a notebook:

  • To open a new notebook: Click the New button on the right-hand side of the Jupyter home screen and select one of the available options (kernels) under “Notebook”, depending on the programming language you wish to use in the notebook (e.g., Python, R, Bash). Once you are in the notebook, you can save it to myfilename.ipynb using the File -> Save as... option.
  • To open an existing notebook: Click on the myfilename.ipynb notebook that you want to work in. This will open the notebook in the appropriate kernel (assuming that kernel is available on CURC JupyterHub).

Tip: The Python 2 (idp) and Python 3 (idp) notebook environments have many preinstalled packages. To list the available packages from a Python notebook, you can use the following commands:

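import subprocess, sys
# Run pip with the kernel's own interpreter; pip's internal Python API is not a supported interface.
print(subprocess.check_output([sys.executable, "-m", "pip", "freeze"], universal_newlines=True))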

If the packages you need are not available, you can create your own custom environment and Jupyter kernel.
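
Before deciding whether a custom environment is necessary, it can help to test for just the few packages you care about. The following is a minimal sketch for a Python 3 kernel; the helper missing_packages and the package names in wanted are placeholders, not part of the CURC setup.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list; replace with the top-level import names you need.
wanted = ["numpy", "scipy", "pandas"]
print("Missing:", missing_packages(wanted) or "none")
```

Note that this checks import names, which sometimes differ from the names shown in the package list (for example, a package may install under a different module name).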

Step 4: Shut down a Notebook Server

Use the Stop My Server button in the Control Panel to shut down the Jupyter notebook server when finished (this cancels the job you are running on Summit or Blanca). You also have the option to restart a server if desired (for example, if you want to change from a “shas” to a “sknl” server).

Alternately, you can use the Quit button from the Jupyter home page to shut down the Jupyter notebook server.

Using the Logout button will log you out of CURC JupyterHub. It will not shut down your notebook server if one happens to be running.

Additional Documentation

Creating your own custom Jupyter kernels

The CURC JupyterHub runs on top of the CURC Anaconda distribution. Anaconda is an open-source Python and R distribution that uses the conda package manager to install software and packages. Software and associated Jupyter kernels for languages other than Python and R can also be installed using conda. The following steps describe how to create your own custom Anaconda environments and associated Jupyter kernels for use on CURC JupyterHub.

Follow these steps from a terminal session. You can get a new terminal session directly from Jupyter using New -> Terminal.

1. Activate the CURC Anaconda environment

For python2:

[johndoe@shas0137 ~]$ source /curc/sw/anaconda2/2019.03/bin/activate

For python3:

[johndoe@shas0137 ~]$ source /curc/sw/anaconda3/2019.03/bin/activate

You will know that you have properly activated the environment because you should see (base) in front of your prompt. E.g.:

(base) [johndoe@shas0137 ~]$

2. Modify your ~/.condarc file so that packages are downloaded to your /projects directory

By default, conda downloads packages to your /home/$USER directory when creating a new environment. Your /home/$USER directory (also denoted with ~) is small, only 2 GB. The steps here modify the conda configuration file, ~/.condarc, to change the default location of pkgs_dirs so that packages are downloaded to your (much bigger) /projects directory.

Open your ~/.condarc file in your favorite text editor (e.g., nano, vim). This file may not exist yet; if not, just create a new file with this name:

(base) [johndoe@shas0137 ~]$ nano ~/.condarc

…and add the following two lines:

pkgs_dirs:
  - /projects/$USER/.conda_pkgs

…then save and exit the file. You won’t need to perform this step again – it’s permanent unless you change pkgs_dirs by editing ~/.condarc again.

Note: You can customize a variety of conda settings using the ~/.condarc file.
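
To see whether package downloads are consuming your /home quota, a rough sketch like the one below sums file sizes under a directory. The helper dir_size_bytes is illustrative only, and both paths in the loop are assumptions: ~/.conda/pkgs is a common fallback location for conda's package cache, and the second path should be adjusted to match the pkgs_dirs entry in your own ~/.condarc.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under `root` (0 if it does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks so targets are not double counted
                total += os.path.getsize(path)
    return total


# Assumed locations; edit these to match your own configuration.
for cache in ("~/.conda/pkgs", "/projects/$USER/.conda_pkgs"):
    expanded = os.path.expandvars(os.path.expanduser(cache))
    print(f"{expanded}: {dir_size_bytes(expanded) / 1e9:.2f} GB")
```

If the first directory is large and the second is empty, conda is probably still caching packages under /home.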

3. Create a new environment in a predetermined location in your /projects directory.

Note: In the examples below the environment is created in /projects/$USER/software/anaconda/envs. This assumes that the software, anaconda, and envs directories already exist in /projects/$USER. Environments can be installed in any writable location the user chooses.

a. Create a custom environment “from scratch”: Here we create a new environment called mycustomenv:

(base) [johndoe@shas0137 ~]$ conda create --prefix /projects/$USER/software/anaconda/envs/mycustomenv

or if you want a specific version of python other than the default installed in the CURC Anaconda base environment:

(base) [johndoe@shas0137 ~]$ conda create --prefix /projects/$USER/software/anaconda/envs/mycustomenv python==2.7.16

or…

b. Create a custom environment by cloning a preexisting environment: Here we clone the preexisting Intel Python 3 distribution (idp) in the CURC Anaconda environment, creating a new environment called mycustomenv:

(base) [johndoe@shas0137 ~]$ conda create --clone idp --prefix /projects/$USER/software/anaconda/envs/mycustomenv

4. Activate your new environment

(base) [johndoe@shas0137 ~]$ conda activate /projects/$USER/software/anaconda/envs/mycustomenv

5. Create your own custom kernel, which will enable you to use this environment in CURC JupyterHub:

(mycustomenv) [johndoe@shas0137 ~]$ python -m ipykernel install --user --name mycustomenv --display-name mycustomenv

This command creates a kernel named mycustomenv with the Jupyter display name mycustomenv (the name and display name are not required to match the environment name; call them anything you want). Because of the --user flag, the kernel is installed in /home/$USER/.local/share/jupyter/kernels (a directory that is in the default JUPYTER_PATH), so your new kernel will be available the next time you use CURC JupyterHub. The command requires the ipykernel package to be present in the active environment; if python reports that the module is missing, install it first with conda install ipykernel.
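
To confirm that a kernel landed where expected, you can read the kernel.json file that each installed kernel keeps in its own subdirectory. The sketch below is a hedged illustration for Python 3: list_kernels is a hypothetical helper, and it relies only on the standard display_name and argv fields of a kernel spec.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map kernel name -> (display name, interpreter) for each kernel.json found."""
    found = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        argv = spec.get("argv") or [""]
        # The first argv entry is the executable the kernel launches.
        found[spec_file.parent.name] = (spec.get("display_name", ""), argv[0])
    return found


user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
for name, (display, interpreter) in list_kernels(user_kernels).items():
    print(f"{name}: display name {display!r}, runs {interpreter}")
```

For a kernel created from a custom environment, the interpreter path should point inside that environment's directory; if it points elsewhere, the kernel was probably installed while a different environment was active.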

Notes on creating environments:

  • You can create an environment in any directory location you prefer (as long as you have access to that directory). We recommend using your /projects directory because it is much larger than your /home directory.

  • Although we don’t show it here, it is expected that you will install whatever software and packages you need in this environment, as you normally would with conda.

  • We strongly recommend cloning the Intel Python distribution (idp) if you will be doing any computationally intensive work, or work that requires parallelization. The Intel Python distribution will run more efficiently on our Intel architecture than other Python distributions.

  • If you have already installed your own version of Anaconda or Miniconda, it is possible to create Jupyter kernels for your preexisting environments by following Step 5 above from within the active environment.

  • If you need to use custom kernels that are in a location other than /home/$USER/.local/share/jupyter (for example, if your research team has a group installation of Anaconda environments located in /pl/active/<some_env>), you can create a file in your home directory named ~/.jupyterrc containing the following line:

    export JUPYTER_PATH=/pl/active/<some_env>/share/jupyter

  • If you need assistance creating or installing environments or Jupyter kernels, contact us at rc-help@colorado.edu.
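
Regarding the JUPYTER_PATH note above: Jupyter looks for kernels in a kernels subdirectory of each JUPYTER_PATH entry as well as in the per-user data directory. The sketch below shows that mapping in simplified form. The helper kernel_search_dirs is hypothetical, it omits the environment and system directories that Jupyter also searches, and it assumes the JUPYTER_PATH value is visible inside the running kernel.

```python
import os


def kernel_search_dirs(jupyter_path, user_data_dir="~/.local/share/jupyter"):
    """Return kernel directories implied by a JUPYTER_PATH value plus the user dir."""
    entries = [p for p in jupyter_path.split(os.pathsep) if p]
    entries.append(os.path.expanduser(user_data_dir))
    return [os.path.join(p, "kernels") for p in entries]


for directory in kernel_search_dirs(os.environ.get("JUPYTER_PATH", "")):
    status = "found  " if os.path.isdir(directory) else "missing"
    print(status, directory)
```

A group kernel directory reported as missing usually means the path in JUPYTER_PATH does not end at the share/jupyter level.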

Troubleshooting

Jupyter notebook servers spawned on RC compute resources log to ~/.jupyterhub-spawner.log. Watching the contents of this file provides useful information regarding any problems encountered during notebook startup or execution.
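
If a server did start, the end of the log can also be inspected from a Python 3 notebook. This is a minimal sketch; the helper tail and the default of 20 lines are arbitrary choices for illustration.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last `n` lines of a text file, or [] if the file is absent."""
    path = Path(path).expanduser()
    if not path.is_file():
        return []
    with path.open(errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines in memory.
        return list(deque((line.rstrip("\n") for line in handle), maxlen=n))


for line in tail("~/.jupyterhub-spawner.log"):
    print(line)
```

If the server fails to start at all, view the same file from a regular login session instead.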