HPC Configuration Checklist
This checklist walks through all the steps required to configure the fw-hpc-client for your HPC environment. Check off each item as you complete it.
Prerequisites
System Requirements
- [ ] RAM: 32 GB minimum
- [ ] Storage: 64 GB minimum (or ~2 TB if no shared storage)
- [ ] CPUs: 4 minimum (16 recommended for building SIF files)
- [ ] OS: Linux (tested on Ubuntu 20.04.3 LTS)
- [ ] Python: 3.8 or later
- [ ] Singularity: 3.8.1 or later
- [ ] Scheduler: Slurm, LSF, or SGE
- [ ] Git
Access Requirements
- [ ] HPC Access: Connect to your interactive HPC node
- [ ] Flywheel API Key: Created for Singularity authentication
Phase 1: Planning and Preparation
1.1 Directory Structure Planning
- [ ] Read: Directory Configuration and Setup
- [ ] Choose: Configuration directory location
- [ ] Verify: Write access to chosen location
1.2 Integration Method Selection
- [ ] Read: Choose an Integration Method
- [ ] Choose: Integration method (cron, tmux, or ssh)
1.3 Git Repository Setup (Recommended)
- [ ] Read: Tracking Changes Privately
- [ ] Create: Private Git repository (GitHub/GitLab)
- [ ] Generate: SSH deploy key
- [ ] Configure: Repository access for collaborators
- [ ] Clone: Repository to your HPC system
Phase 2: Environment Setup
2.1 Python Environment
- [ ] Connect: To your interactive HPC node
- [ ] Install/Load: Python 3.8+ (via module system or package manager)
- [ ] Verify: `python3 --version` shows the correct version
- [ ] Record: Commands needed to load Python (for later use)
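The Python checks above can be done in one step. A sketch like the following works on most clusters; the `python/3.8` module name is an assumption, so adjust it to whatever `module avail python` shows on your system:

```shell
# Load Python if your cluster uses environment modules; harmless if it does not.
module load python/3.8 2>/dev/null || true

# Fail loudly if the interpreter is older than the 3.8 minimum.
python3 - <<'EOF'
import sys
assert sys.version_info >= (3, 8), f"Python too old: {sys.version}"
print("Python OK:", sys.version.split()[0])
EOF
```

Record whatever `module load` line you ended up needing, since it goes into start-cast.sh later.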
2.2 Configuration Directory
- [ ] Create: Configuration directory
- [ ] Navigate:
cd <configuration-directory> - [ ] Initialize: Git repository (if using Git tracking)
2.3 Install fw-hpc-client
- [ ] Choose: Installation method (user site or system-wide)
- [ ] Install: `pip3 install --user fw-hpc-client` (or chosen method)
- [ ] Verify: `fw-hpc-client --version` works
2.4 Initial Setup
- [ ] Run: `fw-hpc-client setup` in the configuration directory
- [ ] Verify: `settings/` and `logs/` directories were created
- [ ] Secure: `chmod 0600 ./settings/credentials.sh`
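The `Secure` step matters because credentials.sh will hold your drone secret. A quick way to confirm the resulting mode, shown here on a stand-in file since the real path depends on your configuration directory:

```shell
# Stand-in for ./settings/credentials.sh so the sketch is self-contained.
touch credentials.sh
chmod 0600 credentials.sh

# GNU stat prints the octal mode; expect 600 (owner read/write only).
stat -c '%a' credentials.sh
```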
Phase 3: Configuration
3.1 Edit cast.yml
- [ ] Open: `./settings/cast.yml` in a text editor
- [ ] Set: `cluster` to your scheduler type (`slurm`, `lsf`, or `sge`)
- [ ] Set: `admin_contact_email` to your email address
- [ ] Configure: `group_whitelist` (true/false based on your needs)
- [ ] Configure: `cast_on_tag` (true/false based on your needs)
- [ ] Set: `scheduler_ram` default
- [ ] Set: `scheduler_cpu` default
- [ ] Review: Other settings and customize if needed
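As a reference point, a filled-in cast.yml might look like the sketch below. The field names come from the checklist above; all values are placeholders, and the exact value formats (for example, whether RAM is written `4G` or `4096`) should be checked against the template that `fw-hpc-client setup` generates:

```yaml
# Placeholder values only -- compare against your generated cast.yml.
cluster: slurm                     # slurm, lsf, or sge
admin_contact_email: hpc-admin@example.edu
group_whitelist: false             # true to restrict casting to listed groups
cast_on_tag: true                  # true to cast only jobs tagged for HPC
scheduler_ram: 4G                  # default RAM request per job
scheduler_cpu: "1"                 # default CPU request per job
```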
3.2 Edit credentials.sh
- [ ] Open: `./settings/credentials.sh` in a text editor
- [ ] Set: `ENGINE_CACHE_DIR` to an appropriate directory path
- [ ] Set: `ENGINE_TEMP_DIR` to an appropriate directory path
- [ ] Set: `SCITRAN_RUNTIME_HOST` to your Flywheel site domain
- [ ] Set: `SCITRAN_CORE_DRONE_SECRET` (get from Flywheel staff)
- [ ] Configure: Singularity directories if needed:
  - [ ] `SINGULARITY_WORKDIR` (if a custom location is needed)
  - [ ] `SINGULARITY_CACHEDIR` (if a custom location is needed)
  - [ ] `SINGULARITY_TMPDIR` (if a custom location is needed)
- [ ] Configure: `SINGULARITY_BIND` if additional mounts are needed
- [ ] Update: `PATH` if Singularity is in a non-standard location
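A filled-in credentials.sh fragment might look like the sketch below. Every value is a placeholder; the real paths depend on your storage layout, and the drone secret comes from Flywheel staff:

```shell
# Placeholder values only; real values depend on your site and storage layout.
export ENGINE_CACHE_DIR="/shared/flywheel/engine-cache"
export ENGINE_TEMP_DIR="/shared/flywheel/engine-temp"
export SCITRAN_RUNTIME_HOST="example.flywheel.io"   # your Flywheel site domain
export SCITRAN_CORE_DRONE_SECRET="changeme"         # provided by Flywheel staff

# Optional overrides, only if the defaults do not suit your cluster:
export SINGULARITY_CACHEDIR="/shared/flywheel/singularity-cache"
export SINGULARITY_BIND="/shared/data"
export PATH="/opt/singularity/bin:$PATH"
```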
3.3 Edit start-cast.sh
- [ ] Open: `./settings/start-cast.sh` in a text editor
- [ ] Add: Any cluster-specific commands (module loads, etc.)
- [ ] Add: Python environment activation (if needed)
- [ ] Verify: Script sources credentials correctly
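Putting the pieces together, a minimal start-cast.sh could look like this sketch. The module names are assumptions, and the final line is likewise an assumption about the entry point, so match it to how your installation invokes fw-hpc-client:

```shell
#!/usr/bin/env bash
# Sketch only: adjust module names, paths, and the run command to your site.
set -euo pipefail

cd "$(dirname "$0")/.."        # the configuration directory

# Cluster-specific environment (assumed module names):
module load python/3.8 singularity/3.8.1

# Credentials plus engine/Singularity settings:
source ./settings/credentials.sh

fw-hpc-client run
```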
Phase 4: Singularity Configuration
4.1 Singularity Remote Endpoint
- [ ] Read: Singularity API Key Configuration
- [ ] Create: Flywheel API key (if not already done)
- [ ] Run:
singularity registry login(orsingularity remote login) - [ ] Enter: Flywheel API key when prompted
- [ ] Verify: Authentication successful
- [ ] Secure: `chmod 0600 ~/.singularity/docker-config.json`
Phase 5: Flywheel Engine Installation
5.1 Collaborate with Flywheel Staff
- [ ] Contact: Flywheel support (support@flywheel.io)
- [ ] Request: Latest Flywheel engine binary
- [ ] Request: Your site's drone secret
- [ ] Follow: Engine Installation Guide
Phase 6: Integration Method Implementation
6.1 Implement Chosen Method
If using cron:
- [ ] Run: `crontab -e`
- [ ] Add: Cron job entry (e.g., `*/1 * * * * ~/hpc-client-config/settings/start-cast.sh`)
- [ ] Save: Crontab file
If using tmux:
- [ ] Start: `tmux new -s cast`
- [ ] Run: `while true; do <configuration directory>/settings/start-cast.sh; sleep 60; done`
- [ ] Detach: `Ctrl+B` then `d`
If using ssh:
- [ ] Plan: Custom implementation with Flywheel staff
Phase 7: Testing and Validation
7.1 Initial Testing
- [ ] Run: `fw-hpc-client run` manually to test the configuration
- [ ] Check: `./logs/cast.log` for any errors
- [ ] Test: Dry-run mode (`dry_run: true` in cast.yml)
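For the log check, a simple grep is usually enough. The sketch below writes a sample log line so it is self-contained; on your system, point it at `./logs/cast.log` instead:

```shell
# Sample log stands in for ./logs/cast.log.
printf 'INFO starting cast\nINFO 0 jobs cast\n' > cast.log

# Case-insensitive scan for common failure markers.
if grep -qiE 'error|traceback|exception' cast.log; then
  echo "errors found -- inspect the log"
else
  echo "log clean"
fi
```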
7.2 Integration Testing
- [ ] Monitor: `<configuration directory>/logs/cast.log` for regular execution
- [ ] Verify: The integration method is running fw-hpc-client regularly
7.3 End-to-End Testing
- [ ] Test: The "Stress Test" (`stress-test`) gear from the Gear Exchange
- [ ] Test: GPU capabilities with `fw-nvidia-cuda-test` (if applicable)
Phase 8: Final Steps
8.1 Final Verification
- [ ] Confirm: All configuration files are properly secured
- [ ] Verify: Git repository is up to date (if using)
- [ ] Complete: The integration method chosen in Phase 1
- [ ] Monitor: `<configuration directory>/logs/cast.log` and the Flywheel user interface
Support Contacts
- Flywheel Support: support@flywheel.io
- Documentation: fw-hpc-client docs