Managing File Count (Inode) Limits

Most HPC systems enforce a file count (inode) quota that is shared across the whole project. Once the quota is reached, no job in the project can create new files, even if plenty of storage space remains.

| System | File count quota | Scope |
|---|---|---|
| NCI Gadi `/scratch/pg06` | 202,000 files | Project-wide |
| Pawsey `/software/pawsey1339` | Project limit | Project-wide |
| M3 `/fs04/mh42` | No published file limit | n/a |

Check your current usage:

# NCI Gadi
lquota

# Pawsey Setonix
lfs quota -g pawsey1339 /software

# M3
user_info

Why It Happens

A single conda/pip environment typically contains 30,000–80,000 files. When several team members each install their own environment, the project quota fills up fast: Gadi's 202,000-file quota, for example, has room for only two to six such environments. Large datasets stored as individual files (images, CSVs) make it worse.
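To see where the files actually are, a per-directory count helps. The following is a minimal sketch rather than a site-provided tool: `inode_report` is a hypothetical helper name, and the count includes directories as well as files because each one uses an inode. It also looks inside hidden directories such as `.conda` and `.cache`, where environments and caches often live.

```shell
# inode_report DIR: print "<entries> <path>" for each immediate
# subdirectory of DIR (hidden ones included), largest first.
inode_report() {
    root="${1:-.}"
    for dir in "$root"/*/ "$root"/.[!.]*/; do
        # Skip glob patterns that matched nothing
        [ -d "$dir" ] || continue
        # Every file and directory uses one inode; -xdev stays on one filesystem
        count=$(find "$dir" -xdev | wc -l | tr -d ' ')
        printf '%s %s\n' "$count" "${dir%/}"
    done | sort -rn
}
```

For example, `inode_report /scratch/pg06/$USER` prints one line per subdirectory, largest first. The path assumes the usual per-user layout under the Gadi project directory and should be adjusted for other systems.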

Strategies

1. Use containers instead of conda environments

A Singularity/Apptainer .sif image is a single file, so it uses one inode regardless of how many packages are inside.

# NCI Gadi and Pawsey
module load singularity

# M3 and Virga
module load apptainer   # same tool, renamed; use `apptainer` below if the `singularity` alias is not provided

singularity pull pytorch.sif docker://pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
singularity exec --nv pytorch.sif python3 train.py

Container images are read-only. There are two workarounds for adding extra packages:

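Option A: bind-mount a writable directory (simpler, no persistent image change). The installed files live on the host filesystem, so they still count toward the inode quota:

mkdir -p "$SCRATCH/pip_extra"
singularity exec \
  --bind "$SCRATCH/pip_extra":/pip_extra \
  pytorch.sif pip install --target=/pip_extra my_package

# Host environment variables are passed into the container by default;
# every later run needs the same --bind so that /pip_extra exists
export PYTHONPATH=/pip_extra:$PYTHONPATH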

Option B: overlay image (changes persist inside the overlay, still only 2 files in total):

# --size is in MiB, so 2048 creates a 2 GiB overlay
singularity overlay create --size 2048 overlay.img
singularity exec --overlay overlay.img pytorch.sif pip install my_package

For code you would normally install in editable mode (pip install -e .), bind-mount the repository instead; no install step is needed:

singularity exec --nv \
  --bind /path/to/myrepo:/myrepo \
  pytorch.sif python /myrepo/train.py

If several people need the same packages, one shared .sif costs the project a single inode, instead of one full environment per person. Put it somewhere readable by the project group and point everyone at the same file.
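Publishing the shared image could look like the sketch below. `share_image` is a hypothetical helper, and the destination directory is whatever location your project agrees on; the only assumption is that project members belong to the group that owns that directory.

```shell
# share_image IMAGE DIR: copy IMAGE into DIR and make both readable
# by the owning group, so the whole project can use one copy.
share_image() {
    image="$1"        # path to the .sif file
    shared_dir="$2"   # hypothetical project-wide location
    mkdir -p "$shared_dir" || return 1
    # Group members need read + execute on the directory to reach the file
    chmod g+rx "$shared_dir"
    cp "$image" "$shared_dir/" || return 1
    chmod g+r "$shared_dir/$(basename "$image")"
}
```

Everyone then runs `singularity exec` against the path inside the shared directory instead of pulling a private copy.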

3. Datasets with many small files: zip and extract to RAM or local storage

Store each dataset as a single archive, which costs one inode. Extract it to a RAM disk or job-local NVMe at job start: the extracted files never touch the project filesystem, so they do not count toward the project inode quota.

# Extract to RAM disk (fast; limited by node RAM, and the files usually count against the job's memory allocation)
tar -xzf /scratch/mydata/dataset.tar.gz -C /dev/shm/

# Extract to job-local NVMe (NCI Gadi — request with #PBS -l jobfs=50gb)
tar -xzf /scratch/mydata/dataset.tar.gz -C $PBS_JOBFS/

# Extract to job-local temp (Pawsey Setonix — request with --gres=tmp:200G)
tar -xzf $MYSCRATCH/dataset.tar.gz -C $TMPDIR/
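The commands above assume the archive already exists. Creating it from an existing directory might be sketched as follows; `pack_dataset` is a hypothetical helper, and it compares the number of entries on disk with the number in the archive before you delete anything.

```shell
# pack_dataset SRC_DIR ARCHIVE: pack SRC_DIR into a gzip-compressed tar
# archive and check that every file and directory made it in.
pack_dataset() {
    src="$1"
    archive="$2"
    # -C keeps only the last path component inside the archive
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    expected=$(find "$src" | wc -l | tr -d ' ')
    packed=$(tar -tzf "$archive" | wc -l | tr -d ' ')
    if [ "$expected" -eq "$packed" ]; then
        echo "OK: $packed entries in $archive"
    else
        echo "Mismatch: $expected entries on disk, $packed in archive" >&2
        return 1
    fi
}
```

Only after the check passes is it safe to remove the original directory and recover its inodes. Write the archive outside the source directory so it does not end up inside itself.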

4. PyTorch DataLoader settings

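Use num_workers > 0 with pin_memory=True. This does not reduce the file count, but it helps hide the cost of reading many small files: worker processes load batches in parallel, and pin_memory=True places each batch in page-locked memory so the copy to the GPU is faster. persistent_workers=True keeps the workers alive between epochs: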

from torch.utils.data import DataLoader
loader = DataLoader(dataset, num_workers=4, pin_memory=True, persistent_workers=True)

5. Clean up unused environments regularly

# Conda
conda env list
conda env remove -n <env_name>
conda clean --all           # also clears conda's downloaded package cache

# pip cache
pip cache purge

# uv cache
uv cache clean              # or: rm -rf ~/.cache/uv (or $SCRATCH/.uv_cache if redirected)
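Finding candidates for removal can be partly automated. The sketch below lists subdirectories that have not been modified for a given number of days; `stale_dirs` is a hypothetical helper, and a directory's modification time is only a rough proxy for "unused", so treat the output as a list to review rather than a list to delete.

```shell
# stale_dirs DIR [DAYS]: list immediate subdirectories of DIR whose own
# modification time is more than DAYS days old (default 90).
stale_dirs() {
    root="${1:-.}"
    days="${2:-90}"
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" | sort
}
```

Point it at the directory that holds your environments, as reported by `conda env list`, for example `stale_dirs ~/.conda/envs 180` if that is where yours live.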