Managing File Count (Inode) Limits
Most HPC systems enforce a file count (inode) quota that is shared across the project. Hitting it stops jobs from creating new files — even if storage space is available.
| System | Quota | Scope |
|---|---|---|
| NCI Gadi /scratch/pg06 | 202,000 files | Project-wide |
| Pawsey /software/pawsey1339 | Project limit | Project-wide |
| M3 /fs04/mh42 | No published file limit | n/a |
Check your current usage:
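Quota tools differ between systems, so check your site's documentation for the official command. As a portable fallback, the sketch below uses only find, wc, and sort to count inodes under a directory and show which subdirectories hold the most. The function names count_inodes and top_inode_dirs are hypothetical. A full scan of a large project directory can be slow and puts load on the filesystem's metadata servers, so run it sparingly.

```bash
# Count every inode (files, directories, symlinks) under a directory,
# staying on one filesystem (-xdev). Defaults to the current directory.
count_inodes() {
    find "${1:-.}" -xdev 2>/dev/null | wc -l | tr -d ' '
}

# List the ten immediate subdirectories holding the most inodes.
# Hidden (dot) directories are skipped by the glob.
top_inode_dirs() {
    local root="${1:-.}"
    local d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        printf '%s\t%s\n' "$(count_inodes "$d")" "$d"
    done | sort -rn | head -n 10
}
```

For example, `top_inode_dirs /scratch/pg06` would rank the top-level directories of the Gadi project from the table above, assuming you have read access to them.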
Why It Happens
A single conda/pip environment typically contains 30,000–80,000 files. When multiple team members each install their own environment, the project quota fills up fast. Large datasets stored as individual files (images, CSVs) make it worse.
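To make the arithmetic concrete, here is a small sketch using the 202,000-file Gadi quota from the table and the 30,000 to 80,000 files-per-environment range above. The helper name envs_that_fit is hypothetical, and the calculation ignores every other file in the project, so real headroom is smaller.

```bash
# Hypothetical helper: how many environments of a given size fit in a quota.
# Uses integer division, so any remainder is discarded.
envs_that_fit() {
    local quota="$1" files_per_env="$2"
    echo $(( quota / files_per_env ))
}

envs_that_fit 202000 80000   # large environments: prints 2
envs_that_fit 202000 30000   # small environments: prints 6
```

On those numbers, between two and six personal environments are enough to exhaust the whole project's quota.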
Strategies
1. Use containers instead of conda environments
A Singularity/Apptainer .sif image is 1 file regardless of how many packages are inside.
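```bash
# Load whichever module your system provides.
# NCI Gadi and Pawsey
module load singularity
# M3 and Virga
module load apptainer # same tool, rebranded

# Pull the image once, then run inside it with GPU support (--nv)
singularity pull pytorch.sif docker://pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
singularity exec --nv pytorch.sif python3 train.py
```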
Containers are read-only. To add extra packages, use one of these workarounds:
Option A — bind mount a writable directory (simpler, no persistent image change):
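```bash
mkdir -p "$SCRATCH/pip_extra"
# Install into the bind-mounted directory instead of the read-only image
singularity exec \
    --bind "$SCRATCH/pip_extra":/pip_extra \
    pytorch.sif pip install --target=/pip_extra my_package
# Host environment variables are passed into the container by default, so this
# makes /pip_extra importable as long as the same --bind is used at run time
export PYTHONPATH=/pip_extra:$PYTHONPATH
```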
Option B — overlay image (changes persist, still only 2 files total):
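```bash
# Create a 2048 MiB writable overlay image
singularity overlay create --size 2048 overlay.img
singularity exec --overlay overlay.img pytorch.sif pip install my_package
# Pass the same --overlay flag on later runs to see the installed packages
```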
Editable code (pip install -e .): bind-mount your repo instead — no install needed:
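A minimal sketch of that bind mount, assuming a Singularity or Apptainer version that supports the --pwd and --env flags. The function name run_in_repo, the REPO_DIR and IMAGE variables, and the /workspace mount point are all hypothetical choices for illustration.

```bash
# Hypothetical defaults: the current directory as the repo, and the image
# pulled earlier. Override either by exporting the variable first.
REPO_DIR="${REPO_DIR:-$PWD}"
IMAGE="${IMAGE:-pytorch.sif}"

# Run a command inside the container with the live source tree mounted.
run_in_repo() {
    # --bind exposes the repo at /workspace, --pwd starts there, and
    # PYTHONPATH lets imports resolve against the mounted source.
    singularity exec --nv \
        --bind "$REPO_DIR":/workspace \
        --pwd /workspace \
        --env PYTHONPATH=/workspace \
        "$IMAGE" "$@"
}

# Usage: run_in_repo python3 train.py
```

Because the container sees the host files directly, edits to the repo take effect on the next run without rebuilding or reinstalling anything.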
2. Share environments with teammates
If several people need the same packages, one shared .sif costs 1 inode instead of N. Put it somewhere accessible to the project group and point everyone at the same file.
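One way to publish an image for the group can be sketched with standard cp and chmod. The function name publish_image is hypothetical, and the sketch assumes the destination directory is already readable and traversable by the project group and that new files there inherit the project group.

```bash
# Hypothetical helper: copy an image into a shared directory and make it
# readable by the owner and group only.
publish_image() {
    local image="$1" shared_dir="$2"
    local target
    target="$shared_dir/$(basename "$image")"
    mkdir -p "$shared_dir"
    cp "$image" "$target"
    # Owner may read and write, group may read, others get no access.
    chmod u=rw,g=r,o= "$target"
    echo "$target"
}

# Usage (the destination path is a placeholder for your project's directory):
#   publish_image pytorch.sif /path/to/project/containers
```

Teammates then run `singularity exec` against the printed path instead of pulling their own copy.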
3. Datasets with many small files: archive and extract to RAM or local storage
Store datasets as a single archive. Extract to a RAM disk or job-local NVMe at job start — zero impact on the project inode quota.
```bash
# Extract to RAM disk (fast; limited by node RAM)
tar -xzf /scratch/mydata/dataset.tar.gz -C /dev/shm/
# Extract to job-local NVMe (NCI Gadi; request with #PBS -l jobfs=50gb)
tar -xzf /scratch/mydata/dataset.tar.gz -C "$PBS_JOBFS/"
# Extract to job-local temp (Pawsey Setonix; request with --gres=tmp:200G)
tar -xzf "$MYSCRATCH/dataset.tar.gz" -C "$TMPDIR/"
```
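The archive has to be built once before any of the extraction commands can be used. A minimal sketch of that step with tar follows. The function name pack_dataset is hypothetical. It packs a directory into a single compressed archive and prints the number of entries so you can compare it against the original.

```bash
# Hypothetical helper: pack a dataset directory into one .tar.gz file
# and print how many entries the archive contains.
pack_dataset() {
    local src_dir="${1%/}" archive="$2"
    # -C changes into the parent directory so the archive stores
    # relative paths starting at the dataset directory name.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
    # List the archive to confirm it is readable and count its entries.
    tar -tzf "$archive" | wc -l | tr -d ' '
}

# Usage: pack_dataset /scratch/mydata/dataset /scratch/mydata/dataset.tar.gz
```

Only after checking the entry count should the original directory be removed, since that deletion is what actually frees the inodes.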
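4. PyTorch DataLoader settings
Use num_workers > 0 with pin_memory=True, for example DataLoader(dataset, num_workers=4, pin_memory=True), where 4 is only an illustrative value. Worker processes load batches in parallel, and pinned (page-locked) host memory speeds up transfers to the GPU, so training relies less on OS-level caching of many small files.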