XDMoD (CURC system metrics)

A portal for viewing metrics at the system, partition, and user levels.

Overview: Would you like to know average queue wait times? Do you need to better understand your current and historical resource utilization, or how each user draws on your project account? The XDMoD (XD Metrics on Demand) web-based tool lets users obtain detailed metrics for high performance computing resources. This open-source tool was developed by the University at Buffalo Center for Computational Research (CCR). CU Boulder Research Computing runs its own instance, CURC XDMoD, which enables users to query metrics for the RMACC Summit and Blanca computing resources.

Getting started with XDMoD

All CURC users (CU Boulder, CSU, and RMACC) have access to XDMoD. At this time, login is only supported for CU Boulder users. Non-CU Boulder users may still query all of the statistics available to CU Boulder users; they just won’t be able to personalize metrics.

Step 1: Navigate to the CURC XDMoD instance

In your browser, navigate to https://xdmod.rc.colorado.edu. You will see a summary screen similar to the following image.

../_images/xdmod_homescreen.png

This screen provides some “quick stats” and summary plots that address some of the most common user questions, such as average wait times and recent resource usage by system (Summit or Blanca) and partition. These metrics may be all you need. If you want to personalize metrics, you can log in with your CURC username and password (currently supported for CU Boulder users only).

Step 2: Log in (CU Boulder users only)

Choose the Sign In option near the upper left of the screen. This will initiate a pop-up window that gives you the option to “Sign in with CU Boulder Research Computing” or “Sign in with a local XDMoD account”.

../_images/xdmod_sign_in.png

Choose the option for “Sign in with CU Boulder Research Computing” and enter your CURC username and password. The portal uses 2-factor authentication, so you will need to accept the Duo push to your phone to complete login.

Step 3: Familiarize yourself with XDMoD

Whether or not you log in, you’ll start on the “Summary” screen.

../_images/xdmod_post_login.png

The following tabs will be available, depending on whether you are logged in:

  • Summary (the screen you are on when you log in)
  • Usage (provides access to an expansive set of resource-wide metrics)
  • Metrics Explorer** (similar to the Usage tab, but with additional functionality)
  • Data Export** (enables raw data to be output in csv or json format for use in other apps)
  • Report Generator** (facilitates the creation of reports that can be saved and shared)
  • Job Viewer** (enables users to search for and view jobs that meet specified criteria)
  • About (provides general information on the XDMoD software)

** - only available to users who are logged in.

Notes on XDMoD Syntax

  • a “CPU Hour” is a “core hour” (e.g., for a single job, this would be the ntasks value a user specifies in their job script multiplied by the job’s run time in hours)
  • a “PI” is a project account (e.g., ucb-general or ucb124_summit1)
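
The core-hour definition above is simple arithmetic. The following is a minimal sketch in Python, not part of XDMoD: the helper name `core_hours` and the choice to express run time in seconds are assumptions made for illustration.

```python
def core_hours(ntasks: int, wall_seconds: float) -> float:
    """Return the core hours for one job: tasks requested times hours run.

    Hypothetical helper for illustration; XDMoD computes this internally
    and reports it as "CPU Hours".
    """
    if ntasks < 0 or wall_seconds < 0:
        raise ValueError("ntasks and wall_seconds must be non-negative")
    return ntasks * wall_seconds / 3600.0


# A job that requests 24 tasks and runs for 90 minutes.
example = core_hours(24, 90 * 60)
print(example)  # 36.0
```

Under this definition, a job that requests 24 tasks and runs for 90 minutes is charged 36 core hours, regardless of how busy those cores actually were.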

Step 4: Become a pro!

XDMoD can query a seemingly endless number of metrics, more than could ever be described in this documentation. To learn how to query specific metrics, customize your views, etc., please refer to the XDMoD documentation:

https://xdmod.rc.colorado.edu/user_manual/index.php

Example use case

Let’s say you want to see how many core hours your project account has used over time, including the usage by user.

  • Go to the Usage tab.
  • In the “Metrics and Options” menu, choose CPU Hours: Total to create a graph of total CPU hours consumed over a default period. In XDMoD syntax a “CPU Hour” refers to a “core hour” (for a single job, this would be the ntasks value a user chooses in their job script multiplied by how long the job runs).
  • Click anywhere on the blue line in the graph to expose the “Drill Down” menu:

../_images/xdmod_cpuhrs_total.png

  • Choose the “PI” option. In XDMoD syntax a “PI” is a project account (e.g., ucb-general or ucb124_summit1).
  • This will revise the graph to show CPU usage for different “PIs” (accounts), showing only the accounts with the greatest usage. Your account may not be shown. To find it, click the Filter tab at the top and search for your project (e.g., ucb-general).
  • You will now see a graph showing only core hours used by your account. To see core hours used for each user of the account, click anywhere on the line to expose the “Drill Down” menu and choose the User option.
  • This will revise the graph to show CPU usage by user. If you don’t see your user of interest, you can use the Filter tab at the top to find them.
  • You can change the time range of the x-axis by specifying the dates in the “Start” and “End” boxes near the top of the screen.
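
The drill-down in this example (total CPU hours, then one account, then each user of that account) can be mirrored offline as a filter followed by a per-user sum. The sketch below is in Python and is only an illustration: the job records, their field names, the user names, and the helper `core_hours_by_user` are all hypothetical and do not reflect XDMoD's actual export schema.

```python
from collections import defaultdict

# Hypothetical job records. The field names are illustrative only and are
# not taken from XDMoD's Data Export format.
jobs = [
    {"account": "ucb-general", "user": "alice", "ntasks": 24, "wall_seconds": 7200},
    {"account": "ucb-general", "user": "bob", "ntasks": 4, "wall_seconds": 1800},
    {"account": "ucb-general", "user": "alice", "ntasks": 1, "wall_seconds": 3600},
    {"account": "ucb124_summit1", "user": "carol", "ntasks": 48, "wall_seconds": 3600},
]


def core_hours_by_user(records, account):
    """Sum core hours per user for one project account ("PI" in XDMoD terms)."""
    totals = defaultdict(float)
    for job in records:
        if job["account"] == account:  # the "Filter" step
            # Core hours = tasks requested times hours run.
            totals[job["user"]] += job["ntasks"] * job["wall_seconds"] / 3600.0
    return dict(totals)


usage = core_hours_by_user(jobs, "ucb-general")
print(usage)  # {'alice': 49.0, 'bob': 2.0}
```

Restricting the records to a date window before summing would correspond to setting the “Start” and “End” boxes in the portal.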