This page defines the complete set of validation rules for a CML AMP project conforming to the Distributed XGBoost with Dask pattern. Rules are categorized by domain and severity.
Severity levels:
ERROR — the artifact is invalid and will fail at runtime.
WARNING — the artifact may work but deviates from the expected contract.
Rule   Severity  Description
S-001  ERROR     Repository must contain .project-metadata.yaml at root
S-002  ERROR     .project-metadata.yaml must parse as valid YAML
S-003  ERROR     requirements.txt must exist at root
S-004  ERROR     scripts/predict_fraud.py must exist
S-005  ERROR     model/best-xgboost-model must exist (post-training only)
S-006  ERROR     cdsw-build.sh must exist
S-007  ERROR     utils/dask_utils.py must exist
S-008  ERROR     setup.py must exist at root
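
The file-existence rules above lend themselves to a simple path check. The following is a minimal sketch, not a reference validator: the function name `check_structure` and the `post_training` flag are hypothetical, and S-002 (YAML parsing) is left out because it needs a YAML parser.

```python
from pathlib import Path

# Required paths, taken from rules S-001 and S-003 through S-008.
REQUIRED_PATHS = {
    "S-001": ".project-metadata.yaml",
    "S-003": "requirements.txt",
    "S-004": "scripts/predict_fraud.py",
    "S-005": "model/best-xgboost-model",
    "S-006": "cdsw-build.sh",
    "S-007": "utils/dask_utils.py",
    "S-008": "setup.py",
}


def check_structure(root, post_training=False):
    """Return the sorted rule IDs whose required path is missing under `root`.

    `post_training` is a hypothetical flag: S-005 is only evaluated when it
    is True, since the model artifact exists only after training.
    """
    root = Path(root)
    failed = []
    for rule_id, rel_path in REQUIRED_PATHS.items():
        if rule_id == "S-005" and not post_training:
            continue
        if not (root / rel_path).exists():
            failed.append(rule_id)
    return sorted(failed)
```

Calling `check_structure(".")` from the repository root returns an empty list when every required path is present.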
Rule   Severity  Description
D-001  ERROR     requirements.txt must include xgboost
D-002  ERROR     requirements.txt must include dask (with or without [complete] extra)
D-003  ERROR     requirements.txt must include scikit-learn
D-004  ERROR     requirements.txt must include -e . (local utils package)
D-005  ERROR     requirements.txt must include numpy
D-W01  WARNING   All packages should have pinned versions (==) for reproducibility
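
As an illustration of how the dependency rules could be checked, the sketch below scans the text of requirements.txt line by line. The function name `check_requirements` is hypothetical, and the parsing is deliberately simplified (it ignores environment markers, URLs, and nested `-r` includes), so treat it as a starting point rather than a full requirements parser.

```python
import re

# Package names required by D-001, D-002, D-003 and D-005.
REQUIRED_PACKAGES = {
    "D-001": "xgboost",
    "D-002": "dask",
    "D-003": "scikit-learn",
    "D-005": "numpy",
}


def check_requirements(text):
    """Return (errors, warnings) as lists of rule IDs for requirements.txt text."""
    names = set()
    has_editable_local = False
    unpinned = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("-"):
            # Option line; D-004 looks for the editable local install "-e .".
            if line.split() == ["-e", "."]:
                has_editable_local = True
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if not match:
            continue
        # Normalise the name so "scikit_learn" and "Scikit-Learn" compare equal;
        # extras such as "[complete]" are not part of the matched name.
        names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
        if "==" not in line:
            unpinned = True

    errors = [rule for rule, pkg in REQUIRED_PACKAGES.items() if pkg not in names]
    if not has_editable_local:
        errors.append("D-004")
    warnings = ["D-W01"] if unpinned else []
    return sorted(errors), warnings
```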
Rule   Severity  Description
E-001  ERROR     scripts/predict_fraud.py must parse as valid Python (no syntax errors)
E-002  ERROR     scripts/predict_fraud.py must define a function named predict_fraud
E-003  ERROR     predict_fraud must accept exactly one parameter (args)
E-004  ERROR     Module must load a model from /home/cdsw/model/best-xgboost-model
E-005  ERROR     Module must define a threshold variable
E-W01  WARNING   threshold should be a float between 0.0 and 1.0
E-W02  WARNING   predict_fraud should return an integer (0 or 1)
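
Most of the entry-point rules can be approximated statically with Python's `ast` module, without importing the script. The sketch below is one possible approach under stated assumptions: `check_entry_point` is a hypothetical name, E-004 is treated as "the model path appears as a string literal", `threshold` is assumed to be a module-level assignment, and E-W02 is skipped because the return type cannot be reliably determined without running the function.

```python
import ast

MODEL_PATH = "/home/cdsw/model/best-xgboost-model"


def check_entry_point(source):
    """Return (errors, warnings) as lists of rule IDs for the script source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return ["E-001"], []
    errors, warnings = [], []

    # E-002 / E-003: a top-level predict_fraud function taking only `args`.
    funcs = [n for n in tree.body
             if isinstance(n, ast.FunctionDef) and n.name == "predict_fraud"]
    if not funcs:
        errors.append("E-002")
    else:
        a = funcs[0].args
        positional = a.posonlyargs + a.args
        if (len(positional) != 1 or positional[0].arg != "args"
                or a.vararg or a.kwonlyargs or a.kwarg):
            errors.append("E-003")

    # E-004 (heuristic): the model path occurs somewhere as a string literal.
    literals = {n.value for n in ast.walk(tree)
                if isinstance(n, ast.Constant) and isinstance(n.value, str)}
    if MODEL_PATH not in literals:
        errors.append("E-004")

    # E-005 / E-W01: a module-level assignment `threshold = <value>`.
    values = [n.value for n in tree.body if isinstance(n, ast.Assign)
              for t in n.targets
              if isinstance(t, ast.Name) and t.id == "threshold"]
    if not values:
        errors.append("E-005")
    else:
        v = values[-1]  # the last assignment wins at import time
        if not (isinstance(v, ast.Constant) and isinstance(v.value, float)
                and 0.0 <= v.value <= 1.0):
            warnings.append("E-W01")
    return errors, warnings
```

Because the script is only parsed, never executed, this check runs safely even when xgboost or the model file is unavailable.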
Rule   Severity  Description
A-001  ERROR     .project-metadata.yaml must contain a runtimes list with at least one entry
A-002  ERROR     Runtime kernel must specify Python 3.9 or higher
A-003  ERROR     Runtime editor must be JupyterLab
A-004  ERROR     At least one task must reference scripts/install_dependencies.py
A-W01  WARNING   specification_version should be 1.0
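
The metadata rules can be evaluated once the YAML file has been parsed into a dictionary. The sketch below operates on that dictionary so it stays free of third-party imports. It assumes, beyond what the rules state, that each runtime is a mapping with `kernel` and `editor` keys, that the kernel string looks like `Python 3.9`, and that each task is a mapping that names its script under a `script` key; `check_metadata` is a hypothetical name.

```python
import re


def check_metadata(meta):
    """Return (errors, warnings) for a parsed .project-metadata.yaml dict."""
    errors, warnings = set(), []

    runtimes = meta.get("runtimes") or []
    if not isinstance(runtimes, list) or not runtimes:
        errors.add("A-001")
        runtimes = []

    for rt in runtimes:
        # A-002: extract "major.minor" from a kernel string such as "Python 3.10".
        found = re.search(r"Python (\d+)\.(\d+)", str(rt.get("kernel", "")))
        version = tuple(int(part) for part in found.groups()) if found else (0, 0)
        if version < (3, 9):
            errors.add("A-002")
        # A-003: the editor must be JupyterLab.
        if rt.get("editor") != "JupyterLab":
            errors.add("A-003")

    # A-004: some task must reference the dependency-install script.
    tasks = meta.get("tasks") or []
    if not any(t.get("script") == "scripts/install_dependencies.py" for t in tasks):
        errors.add("A-004")

    # A-W01: specification_version should be 1.0 (YAML may yield float or str).
    try:
        version_ok = float(meta.get("specification_version")) == 1.0
    except (TypeError, ValueError):
        version_ok = False
    if not version_ok:
        warnings.append("A-W01")

    return sorted(errors), warnings
```

Comparing the version as a tuple of integers avoids the string-comparison trap where "3.10" would sort below "3.9".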
Rule   Severity  Description
C-W01  WARNING   utils/dask_utils.py should define run_dask_cluster
C-W02  WARNING   run_dask_cluster should accept num_workers, cpu, memory parameters
C-W03  WARNING   run_dask_cluster should return a dict with keys: scheduler, workers, scheduler_address, dashboard_address
C-W04  WARNING   Scheduler should listen on TCP port 8786
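
The cluster rules split naturally into a static check on the source of utils/dask_utils.py (C-W01, C-W02) and a check on the value that `run_dask_cluster` returns (C-W03, C-W04). The sketch below shows both; the helper names are hypothetical, and the port check assumes `scheduler_address` is a URL-style string such as `tcp://host:8786`, which the rules do not spell out.

```python
import ast
from urllib.parse import urlsplit

EXPECTED_PARAMS = {"num_workers", "cpu", "memory"}
EXPECTED_KEYS = {"scheduler", "workers", "scheduler_address", "dashboard_address"}


def check_cluster_source(source):
    """Return warnings for C-W01 / C-W02 based on the module source text."""
    tree = ast.parse(source)
    funcs = [n for n in tree.body
             if isinstance(n, ast.FunctionDef) and n.name == "run_dask_cluster"]
    if not funcs:
        return ["C-W01"]
    a = funcs[0].args
    params = {p.arg for p in a.posonlyargs + a.args + a.kwonlyargs}
    # Extra parameters are allowed; only the three expected names are required.
    return [] if EXPECTED_PARAMS <= params else ["C-W02"]


def check_cluster_result(result):
    """Return warnings for C-W03 / C-W04 based on the returned object."""
    if not isinstance(result, dict) or not EXPECTED_KEYS <= result.keys():
        return ["C-W03"]
    try:
        port = urlsplit(str(result["scheduler_address"])).port
    except ValueError:  # raised by urlsplit for a malformed port
        port = None
    return [] if port == 8786 else ["C-W04"]
```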
Rule   Severity  Description
B-001  ERROR     cdsw-build.sh must be a valid shell script (begins with a command and parses without syntax errors)
B-W01  WARNING   cdsw-build.sh should install dependencies from requirements.txt
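
One way to approximate the build-script rules is to let the shell parse the script without executing it and to search the text for a pip install of requirements.txt. The sketch below does this under a few assumptions: `check_build_script` is a hypothetical name, the script is treated as a bash script, the syntax check is skipped when no `bash` executable is found, and the regular expression is a heuristic that recognises only `pip install -r` style commands.

```python
import re
import shutil
import subprocess

# Heuristic for B-W01: "pip install ... -r <path>requirements.txt".
PIP_INSTALL_RE = re.compile(
    r"pip3?\s+install\b.*(-r|--requirement)\s*\S*requirements\.txt"
)


def check_build_script(path):
    """Return (errors, warnings) as lists of rule IDs for cdsw-build.sh."""
    errors, warnings = [], []
    with open(path, encoding="utf-8") as fh:
        text = fh.read()

    bash = shutil.which("bash")
    if bash is not None:
        # `bash -n` reads and parses the script but does not run any command.
        result = subprocess.run([bash, "-n", path], capture_output=True)
        if result.returncode != 0:
            errors.append("B-001")

    if not PIP_INSTALL_RE.search(text):
        warnings.append("B-W01")
    return errors, warnings
```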