Runtime Configuration
Environment Variables
| Variable | Source | Used By | Description |
|---|---|---|---|
| CDSW_READONLY_PORT | CML runtime | utils/dask_utils.py | Port for the Dask dashboard. Injected by CML into every session automatically. |
No additional environment variables are required. The project does not use API keys, database connections, or external service credentials.
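As an illustration of how the injected variable can be consumed, here is a minimal sketch that reads CDSW_READONLY_PORT and builds a dashboard address from it. The helper name dashboard_address and the 127.0.0.1 host are assumptions made for this example; the actual logic in utils/dask_utils.py may differ.

```python
import os


def dashboard_address(env=os.environ, host="127.0.0.1"):
    """Return 'host:port' for the Dask dashboard, or None if the port is unset.

    `dashboard_address` and the default `host` are hypothetical; only the
    variable name CDSW_READONLY_PORT comes from the table above.
    """
    port = env.get("CDSW_READONLY_PORT")
    if port is None:
        # The variable is only injected inside a CML session.
        return None
    # int() rejects a malformed value early instead of passing it on.
    return f"{host}:{int(port)}"


if __name__ == "__main__":
    print(dashboard_address())
```

Outside a CML session the variable is normally absent, so the helper returns None rather than raising.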
Session Resources
Notebook Session
The JupyterLab session from which you run the notebooks:
| Resource | Minimum | Recommended |
|---|---|---|
| vCPU | 1 | 1 |
| Memory | 2 GiB | 2 GiB |
| GPU | 0 | 0 |
Dask Cluster
Resources for the cluster are allocated via run_dask_cluster(). Each component runs as a separate CML worker session.
| Component | Count | CPU | Memory | GPU |
|---|---|---|---|---|
| Scheduler | 1 | 1 vCPU | 2 GiB | 0 |
| Workers | User-specified (default: 2) | User-specified | User-specified | Optional |
Total footprint with 2 workers, assuming each worker is given 1 vCPU and 2 GiB: 4 CML sessions (1 notebook + 1 scheduler + 2 workers), 4 vCPU, 8 GiB RAM.
The scheduler always uses 1 vCPU / 2 GiB. Worker resources are passed as arguments to run_dask_cluster() and can be scaled up for larger datasets.
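The footprint arithmetic can be sketched as a small helper for estimating quota before scaling up. The function name cluster_footprint and its parameters are hypothetical and not part of utils.dask_utils. The per-worker defaults of 1 vCPU and 2 GiB are an assumption chosen to reproduce the 2-worker total quoted above.

```python
def cluster_footprint(n_workers=2, worker_cpu=1, worker_memory_gib=2):
    """Estimate total CML sessions, vCPU, and memory for notebook + Dask cluster.

    Hypothetical helper: notebook and scheduler sizes follow the tables above;
    worker sizes are whatever the user plans to request.
    """
    notebook_cpu, notebook_mem = 1, 2    # notebook session (vCPU, GiB)
    scheduler_cpu, scheduler_mem = 1, 2  # scheduler is fixed at 1 vCPU / 2 GiB

    return {
        "sessions": 1 + 1 + n_workers,
        "vcpu": notebook_cpu + scheduler_cpu + n_workers * worker_cpu,
        "memory_gib": notebook_mem + scheduler_mem + n_workers * worker_memory_gib,
    }


if __name__ == "__main__":
    print(cluster_footprint())  # default 2-worker cluster
    print(cluster_footprint(n_workers=4, worker_cpu=2, worker_memory_gib=8))
```

With the defaults this yields 4 sessions, 4 vCPU, and 8 GiB. Four workers at 2 vCPU and 8 GiB each would come to 6 sessions, 10 vCPU, and 36 GiB.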
Pinned Dependencies
All library versions are locked in requirements.txt:
| Package | Version | Role |
|---|---|---|
| dask[complete] | 2022.5.1 | Distributed computing framework (all optional dependencies included) |
| dask-ml | 2022.1.22 | Machine learning extensions for Dask (train/test split, etc.) |
| matplotlib | 3.3.4 | Plotting (precision-recall curves, class distributions) |
| numpy | 1.22.4 | Numerical arrays (inference input/output) |
| pandas | 1.4.2 | DataFrames (hyperparameter history, data exploration) |
| scikit-learn | 1.0.2 | Baseline models, preprocessing (StandardScaler), metrics |
| scipy | 1.8.1 | Statistical distributions for hyperparameter search |
| seaborn | 0.11.2 | Statistical visualization |
| tqdm | 4.64.0 | Progress bars for hyperparameter search |
| xgboost | 1.6.1 | Gradient boosted trees (distributed training and inference) |
| -e . | (local) | Installs the utils package from setup.py in editable mode |
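To confirm that a session actually matches these pins, a check along the following lines may help. It is a sketch using the standard-library importlib.metadata; the PINNED dictionary and the find_mismatches helper are illustrative names, not part of the project, and the pins are copied by hand from the table rather than parsed from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the table above; dask[complete] is installed under the
# distribution name "dask".
PINNED = {
    "dask": "2022.5.1",
    "dask-ml": "2022.1.22",
    "matplotlib": "3.3.4",
    "numpy": "1.22.4",
    "pandas": "1.4.2",
    "scikit-learn": "1.0.2",
    "scipy": "1.8.1",
    "seaborn": "0.11.2",
    "tqdm": "4.64.0",
    "xgboost": "1.6.1",
}


def find_mismatches(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version.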
Local Package
The -e . entry installs cdsw-dask-utils (version 0.1.0) from setup.py. This makes utils.dask_utils importable as a regular Python package. The editable install means changes to dask_utils.py take effect immediately without reinstallation.