
Runtime Configuration

Environment Variables

| Variable | Source | Used By | Description |
|---|---|---|---|
| CDSW_READONLY_PORT | CML runtime | utils/dask_utils.py | Port for the Dask dashboard. Injected automatically by CML into every session. |

No additional environment variables are required. The project does not use API keys, database connections, or external service credentials.
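The following rough illustration shows how the dashboard port could be consumed. The helper reads CDSW_READONLY_PORT and turns it into a host:port string. The function name dashboard_address and the 0.0.0.0 bind host are assumptions made for this sketch. The actual logic in utils/dask_utils.py may differ.

```python
import os
from typing import Mapping, Optional


def dashboard_address(env: Optional[Mapping[str, str]] = None,
                      host: str = "0.0.0.0") -> str:
    """Build a "host:port" string for the Dask dashboard.

    Hypothetical helper: reads CDSW_READONLY_PORT, which CML injects into
    every session. The real utils/dask_utils.py may do this differently.
    """
    env = os.environ if env is None else env
    raw = env.get("CDSW_READONLY_PORT")
    if raw is None:
        raise RuntimeError("CDSW_READONLY_PORT is not set; is this a CML session?")
    port = int(raw)  # environment variables are strings; fail fast if not numeric
    return f"{host}:{port}"
```

The `--dashboard-address` option of the `dask-scheduler` CLI accepts a string in this form.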

Session Resources

Notebook Session

The JupyterLab session from which you run the notebooks:

| Resource | Minimum | Recommended |
|---|---|---|
| vCPU | 1 | 1 |
| Memory | 2 GiB | 2 GiB |
| GPU | 0 | 0 |

Dask Cluster

Resources are allocated via run_dask_cluster(). Each component runs as a separate CML worker session.

| Component | Count | CPU | Memory | GPU |
|---|---|---|---|---|
| Scheduler | 1 | 1 vCPU | 2 GiB | 0 |
| Workers | User-specified (default: 2) | User-specified | User-specified | Optional |

Total footprint with 2 workers at 1 vCPU / 2 GiB each: 4 CML sessions (1 notebook + 1 scheduler + 2 workers), 4 vCPU, 8 GiB RAM.

The scheduler always uses 1 vCPU / 2 GiB. Worker resources are passed as arguments to run_dask_cluster() and can be scaled up for larger datasets.
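The session arithmetic can be sketched as a small helper for sizing larger clusters. The per-worker defaults of 1 vCPU / 2 GiB are an assumption. They were chosen because they reproduce the 4 vCPU / 8 GiB total quoted for two workers. Footprint and cluster_footprint are hypothetical names and are not part of utils.dask_utils.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    sessions: int
    vcpu: float
    memory_gib: float


# Fixed (vCPU, GiB) sizes for the notebook session and the scheduler.
NOTEBOOK = (1, 2)
SCHEDULER = (1, 2)


def cluster_footprint(n_workers: int = 2, worker_cpu: float = 1,
                      worker_memory_gib: float = 2) -> Footprint:
    """Total resources for 1 notebook + 1 scheduler + n_workers workers.

    worker_cpu and worker_memory_gib are per-worker values; the defaults
    (1 vCPU / 2 GiB) are an assumption, not run_dask_cluster() defaults.
    """
    if n_workers < 0:
        raise ValueError("n_workers must be non-negative")
    sessions = 2 + n_workers  # notebook + scheduler + one session per worker
    vcpu = NOTEBOOK[0] + SCHEDULER[0] + n_workers * worker_cpu
    memory = NOTEBOOK[1] + SCHEDULER[1] + n_workers * worker_memory_gib
    return Footprint(sessions, vcpu, memory)


print(cluster_footprint())  # Footprint(sessions=4, vcpu=4, memory_gib=8)
print(cluster_footprint(4, worker_cpu=2, worker_memory_gib=8))  # Footprint(sessions=6, vcpu=10, memory_gib=36)
```

Only the worker terms scale with cluster size. The notebook and scheduler contribute a constant 2 vCPU / 4 GiB.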

Pinned Dependencies

All library versions are locked in requirements.txt:

| Package | Version | Role |
|---|---|---|
| dask[complete] | 2022.5.1 | Distributed computing framework (all optional dependencies included) |
| dask-ml | 2022.1.22 | Machine learning extensions for Dask (train/test split, etc.) |
| matplotlib | 3.3.4 | Plotting (precision-recall curves, class distributions) |
| numpy | 1.22.4 | Numerical arrays (inference input/output) |
| pandas | 1.4.2 | DataFrames (hyperparameter history, data exploration) |
| scikit-learn | 1.0.2 | Baseline models, preprocessing (StandardScaler), metrics |
| scipy | 1.8.1 | Statistical distributions for hyperparameter search |
| seaborn | 0.11.2 | Statistical visualization |
| tqdm | 4.64.0 | Progress bars for hyperparameter search |
| xgboost | 1.6.1 | Gradient boosted trees (distributed training and inference) |
| -e . | (local) | Installs the utils package from setup.py in editable mode |
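Because the project depends on exact versions, it can help to confirm that a session's environment actually matches requirements.txt. The sketch below assumes the file uses exact `name==version` pins, since the table does not show the file's literal syntax. parse_pins and check_pins are hypothetical helpers built on the standard library's importlib.metadata, and they compare versions as literal strings.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple

# Assumed pin syntax: name[optional,extras]==version. Lines such as "-e ."
# and comments do not match and are skipped.
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)(\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def parse_pins(text: str) -> Dict[str, str]:
    """Map distribution name -> pinned version, dropping extras like [complete]."""
    pins = {}
    for line in text.splitlines():
        m = _PIN.match(line)
        if m:
            pins[m.group(1)] = m.group(3)
    return pins


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], str] = version) -> List[Tuple[str, str, str]]:
    """Return (name, pinned, installed) for each package that is missing or differs."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems
```

If you run `check_pins(parse_pins(open("requirements.txt").read()))` from the project root and it returns an empty list, every pinned package is installed at its pinned version. The `lookup` parameter exists so the comparison can be exercised without touching the real environment.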

Local Package

The -e . entry installs cdsw-dask-utils (version 0.1.0) from setup.py. This makes utils.dask_utils importable as a regular Python package. The editable install means changes to dask_utils.py take effect immediately without reinstallation.
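One way to confirm the editable install is to ask Python where it would load the module from. The sketch below uses importlib.util.find_spec from the standard library. module_origin is a hypothetical helper name. If the editable install worked, module_origin("utils.dask_utils") should point at dask_utils.py inside the project checkout rather than at a copy under site-packages.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if it is not found.

    For a dotted name, find_spec imports the parent package first
    (e.g. "utils" for "utils.dask_utils").
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package is itself missing.
        return None
    return spec.origin if spec is not None else None
```

Namespace packages have no single origin file, so the helper also returns None for them.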