HPC Configuration Checklist

This checklist walks through all the steps required to configure the fw-hpc-client for your HPC environment. Check off each item as you complete it.

Prerequisites

System Requirements

  • [ ] RAM: 32 GB minimum
  • [ ] Storage: 64 GB minimum (or ~2 TB if no shared storage)
  • [ ] CPUs: 4 minimum (16 recommended for building SIF files)
  • [ ] OS: Linux (tested on Ubuntu 20.04.3 LTS)
  • [ ] Python: 3.8 or later
  • [ ] Singularity: 3.8.1 or later
  • [ ] Scheduler: Slurm, LSF, or SGE
  • [ ] Git
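
A quick way to confirm several of these requirements at once, assuming a typical Linux login shell (the scheduler check is shown for Slurm; use the LSF or SGE equivalents noted in the comments):

```bash
python3 --version        # expect 3.8 or later
singularity --version    # expect 3.8.1 or later
git --version
sinfo --version          # Slurm; try `lsid` for LSF or `qstat -help` for SGE
free -g | head -2        # total RAM in GB
df -h .                  # free storage on the current filesystem
nproc                    # CPU count
```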

Access Requirements

  • [ ] HPC Access: Connect to your interactive HPC node
  • [ ] Flywheel API Key: Created for Singularity authentication

Phase 1: Planning and Preparation

1.1 Directory Structure Planning

  • [ ] Plan: Location for the configuration directory
  • [ ] Plan: Cache and temporary directories for the engine (shared storage if available; see step 3.2)

1.2 Integration Method Selection

  • [ ] Choose: Scheduling method for running fw-hpc-client (cron, tmux, or ssh; implemented in Phase 6)

1.3 Change Tracking Setup

  • [ ] Read: Tracking Changes Privately
  • [ ] Create: Private Git repository (GitHub/GitLab)
  • [ ] Generate: SSH deploy key
  • [ ] Configure: Repository access for collaborators
  • [ ] Clone: Repository to your HPC system

Phase 2: Environment Setup

2.1 Python Environment

  • [ ] Connect: To your interactive HPC node
  • [ ] Install/Load: Python 3.8+ (via module system or package manager)
  • [ ] Verify: python3 --version shows correct version
  • [ ] Record: Commands needed to load Python (for later use)
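
For example, on a cluster that uses environment modules (the module name below is an assumption; check what your site provides):

```bash
module avail python       # list the Python modules available on your cluster
module load python/3.8    # example module name; yours may differ
python3 --version         # verify Python 3.8 or later
```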

2.2 Configuration Directory

  • [ ] Create: Configuration directory
  • [ ] Navigate: cd <configuration-directory>
  • [ ] Initialize: Git repository (if using Git tracking)
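
A minimal sketch, using the example path from the cron entry in Phase 6 (substitute your planned location):

```bash
mkdir -p ~/hpc-client-config
cd ~/hpc-client-config
git init    # only if tracking changes in Git
```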

2.3 Install fw-hpc-client

  • [ ] Choose: Installation method (user site or system-wide)
  • [ ] Install: pip3 install --user fw-hpc-client (or chosen method)
  • [ ] Verify: fw-hpc-client --version works
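
For a user-site install (note that pip places the executable in ~/.local/bin, which may not be on PATH by default):

```bash
pip3 install --user fw-hpc-client
export PATH="$HOME/.local/bin:$PATH"   # only if fw-hpc-client is not found
fw-hpc-client --version
```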

2.4 Initial Setup

  • [ ] Run: fw-hpc-client setup in configuration directory
  • [ ] Verify: settings/ and logs/ directories created
  • [ ] Secure: chmod 0600 ./settings/credentials.sh
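
For example:

```bash
cd ~/hpc-client-config       # your configuration directory
fw-hpc-client setup          # creates settings/ and logs/
ls settings logs             # verify both directories exist
chmod 0600 ./settings/credentials.sh
```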

Phase 3: Configuration

3.1 Edit cast.yml

  • [ ] Open: ./settings/cast.yml in text editor
  • [ ] Set: cluster to your scheduler type (slurm, lsf, or sge)
  • [ ] Set: admin_contact_email to your email address
  • [ ] Configure: group_whitelist (true/false based on your needs)
  • [ ] Configure: cast_on_tag (true/false based on your needs)
  • [ ] Set: scheduler_ram default
  • [ ] Set: scheduler_cpu default
  • [ ] Review: Other settings and customize if needed
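
A sketch of the settings covered above (values and formats are illustrative; keep the defaults from the generated file unless you need to change them):

```yaml
cluster: slurm                     # slurm, lsf, or sge
admin_contact_email: you@example.edu
group_whitelist: false             # true/false based on your needs
cast_on_tag: true                  # true/false based on your needs
scheduler_ram: 4G                  # default scheduler RAM request
scheduler_cpu: 1                   # default scheduler CPU request
```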

3.2 Edit credentials.sh

  • [ ] Open: ./settings/credentials.sh in text editor
  • [ ] Set: ENGINE_CACHE_DIR to appropriate directory path
  • [ ] Set: ENGINE_TEMP_DIR to appropriate directory path
  • [ ] Set: SCITRAN_RUNTIME_HOST to your Flywheel site domain
  • [ ] Set: SCITRAN_CORE_DRONE_SECRET (get from Flywheel staff)
  • [ ] Configure: Singularity directories if needed:
      • [ ] SINGULARITY_WORKDIR (if custom location needed)
      • [ ] SINGULARITY_CACHEDIR (if custom location needed)
      • [ ] SINGULARITY_TMPDIR (if custom location needed)
  • [ ] Configure: SINGULARITY_BIND if additional mounts needed
  • [ ] Update: PATH if Singularity is in non-standard location
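
A sketch of the exports above (paths, domain, and secret are placeholders; the Singularity variables are only needed when overriding defaults):

```bash
export ENGINE_CACHE_DIR="/shared/flywheel/engine-cache"
export ENGINE_TEMP_DIR="/shared/flywheel/engine-tmp"
export SCITRAN_RUNTIME_HOST="your-site.flywheel.io"
export SCITRAN_CORE_DRONE_SECRET="<provided-by-flywheel-staff>"

# Optional overrides
export SINGULARITY_CACHEDIR="/shared/flywheel/singularity-cache"
export SINGULARITY_TMPDIR="/shared/flywheel/singularity-tmp"
export SINGULARITY_BIND="/shared/data:/data"     # extra bind mounts, if any
export PATH="/opt/singularity/bin:$PATH"         # non-standard Singularity install
```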

3.3 Edit start-cast.sh

  • [ ] Open: ./settings/start-cast.sh in text editor
  • [ ] Add: Any cluster-specific commands (module loads, etc.)
  • [ ] Add: Python environment activation (if needed)
  • [ ] Verify: Script sources credentials correctly
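
A sketch of what the completed script might contain, assuming an environment-module system and that the script finishes by invoking the client (compare against the file that setup generated):

```bash
#!/bin/bash
# Cluster-specific environment (example module names)
module load python/3.8
module load singularity/3.8.1

# Load Flywheel credentials, then run one casting pass
source "$(dirname "$0")/credentials.sh"
fw-hpc-client run
```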

Phase 4: Singularity Configuration

4.1 Singularity Remote Endpoint

  • [ ] Read: Singularity API Key Configuration
  • [ ] Create: Flywheel API key (if not already done)
  • [ ] Run: singularity registry login (or singularity remote login)
  • [ ] Enter: Flywheel API key when prompted
  • [ ] Verify: Authentication successful
  • [ ] Secure: chmod 0600 ~/.singularity/docker-config.json
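
The exact subcommand depends on your Singularity version, and the registry host shown is an assumption (see the Singularity API Key Configuration page for your site's details):

```bash
# Newer releases
singularity registry login docker://your-site.flywheel.io
# Older releases
singularity remote login docker://your-site.flywheel.io

# Paste your Flywheel API key when prompted, then lock down the stored credentials
chmod 0600 ~/.singularity/docker-config.json
```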

Phase 5: Flywheel Engine Installation

5.1 Collaborate with Flywheel Staff

  • [ ] Contact: Flywheel staff to install the Flywheel engine on your system
  • [ ] Obtain: SCITRAN_CORE_DRONE_SECRET (needed for credentials.sh in step 3.2)

Phase 6: Integration Method Implementation

6.1 Implement Chosen Method

If using cron:

  • [ ] Run: crontab -e
  • [ ] Add: Cron job entry (e.g., */1 * * * * ~/hpc-client-config/settings/start-cast.sh)
  • [ ] Save: Crontab file
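
The entry above, shown in context (the log redirection is an optional addition; adjust the path to your configuration directory):

```
# m  h  dom mon dow  command
*/1 *  *   *   *     ~/hpc-client-config/settings/start-cast.sh >> ~/hpc-client-config/logs/cron.log 2>&1
```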

If using tmux:

  • [ ] Start: tmux new -s cast
  • [ ] Run: while true; do <configuration directory>/settings/start-cast.sh; sleep 60; done
  • [ ] Detach: Ctrl+B then d

If using ssh:

  • [ ] Plan: Custom implementation with Flywheel staff

Phase 7: Testing and Validation

7.1 Initial Testing

  • [ ] Run: fw-hpc-client run manually to test configuration
  • [ ] Check: ./logs/cast.log for any errors
  • [ ] Test: Dry run mode (dry_run: true in cast.yml)
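
For example:

```bash
cd ~/hpc-client-config        # your configuration directory
fw-hpc-client run             # single manual pass
tail -n 50 ./logs/cast.log    # inspect recent output for errors
```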

7.2 Integration Testing

  • [ ] Monitor: <configuration directory>/logs/cast.log for regular execution
  • [ ] Verify: Integration method is running fw-hpc-client regularly

7.3 End-to-End Testing

  • [ ] Test: "Stress Test" (stress-test) gear from Gear Exchange
  • [ ] Test: GPU capabilities with fw-nvidia-cuda-test (if applicable)

Phase 8: Final Steps

8.1 Final Verification

  • [ ] Confirm: All configuration files are properly secured
  • [ ] Verify: Git repository is up to date (if using)
  • [ ] Complete: Integration method selected in Phase 1 (implemented in Phase 6)
  • [ ] Monitor: <configuration directory>/logs/cast.log and Flywheel user interface
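
A quick check of file permissions and repository state (assuming the layout used above):

```bash
ls -l ./settings/    # credentials.sh should show -rw------- (0600)
git status           # if using Git tracking; commit and push as needed
```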

Support Contacts