Validation Rules Reference

This page defines the complete set of validation rules for a Cloudera Machine Learning (CML) Applied ML Prototype (AMP) project that conforms to the Distributed XGBoost with Dask pattern. Rules are grouped by domain, and each rule carries a severity.

Severity levels:

  • ERROR — the artifact is invalid and will fail at runtime.
  • WARNING — the artifact may work but deviates from the expected contract.
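
One way to carry these two levels through a validator is a small record type per finding. The sketch below is illustrative only: the names Severity, Finding, and is_valid are hypothetical and are not defined anywhere in this reference.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "ERROR"      # artifact is invalid and will fail at runtime
    WARNING = "WARNING"  # artifact may work but deviates from the contract


@dataclass(frozen=True)
class Finding:
    """One rule violation (hypothetical record type)."""
    rule_id: str
    severity: Severity
    message: str


def is_valid(findings):
    """An artifact is treated as valid when no finding has ERROR severity."""
    return not any(f.severity is Severity.ERROR for f in findings)
```

Under this sketch, a project with only WARNING findings still counts as valid, which matches the definitions above.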

Structural Rules

| Rule | Severity | Description |
|------|----------|-------------|
| S-001 | ERROR | Repository must contain .project-metadata.yaml at root |
| S-002 | ERROR | .project-metadata.yaml must parse as valid YAML |
| S-003 | ERROR | requirements.txt must exist at root |
| S-004 | ERROR | scripts/predict_fraud.py must exist |
| S-005 | ERROR | model/best-xgboost-model must exist (post-training only) |
| S-006 | ERROR | cdsw-build.sh must exist |
| S-007 | ERROR | utils/dask_utils.py must exist |
| S-008 | ERROR | setup.py must exist at root |
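
The existence checks in this table can be sketched as a single pass over the repository root. This is a minimal illustration, not the reference validator: the function name check_structure and its post_training flag are assumptions, and S-002 is left out because it requires a YAML parser.

```python
from pathlib import Path

# Rule IDs and relative paths copied from the Structural Rules table.
REQUIRED_PATHS = {
    "S-001": ".project-metadata.yaml",
    "S-003": "requirements.txt",
    "S-004": "scripts/predict_fraud.py",
    "S-005": "model/best-xgboost-model",
    "S-006": "cdsw-build.sh",
    "S-007": "utils/dask_utils.py",
    "S-008": "setup.py",
}


def check_structure(root, post_training=False):
    """Return the sorted rule IDs whose required path is missing under root."""
    root = Path(root)
    failed = []
    for rule_id, rel_path in REQUIRED_PATHS.items():
        if rule_id == "S-005" and not post_training:
            continue  # the model artifact is only expected after training
        if not (root / rel_path).exists():
            failed.append(rule_id)
    return sorted(failed)
```

Calling check_structure(".", post_training=True) from the project root would also enforce S-005.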

Dependency Rules

| Rule | Severity | Description |
|------|----------|-------------|
| D-001 | ERROR | requirements.txt must include xgboost |
| D-002 | ERROR | requirements.txt must include dask (with or without [complete] extra) |
| D-003 | ERROR | requirements.txt must include scikit-learn |
| D-004 | ERROR | requirements.txt must include -e . (local utils package) |
| D-005 | ERROR | requirements.txt must include numpy |
| D-W01 | WARNING | All packages should have pinned versions (==) for reproducibility |
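
A rough sketch of how the dependency rules could be checked against the text of requirements.txt follows. The function name check_requirements is hypothetical, and the line parsing is deliberately simplified: it handles plain name and name[extra] entries with optional comments, not the full requirements file grammar.

```python
import re

# Rule IDs and package names copied from the Dependency Rules table.
REQUIRED_PACKAGES = {
    "D-001": "xgboost",
    "D-002": "dask",
    "D-003": "scikit-learn",
    "D-005": "numpy",
}


def check_requirements(text):
    """Return (errors, warnings) as lists of rule IDs for requirements.txt text."""
    names = set()
    has_local = False
    unpinned = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("-"):
            # D-004: editable install of the local package, written as "-e ."
            if re.fullmatch(r"(-e|--editable)\s+\.", line):
                has_local = True
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if not match:
            continue
        # Normalize separators and case so "Scikit_Learn" equals "scikit-learn".
        names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
        if "==" not in line:
            unpinned = True
    errors = [rid for rid, pkg in REQUIRED_PACKAGES.items() if pkg not in names]
    if not has_local:
        errors.append("D-004")
    warnings = ["D-W01"] if unpinned else []
    return sorted(errors), warnings
```

Because the package name is matched up to the first non-name character, dask[complete]==2023.1.0 and dask==2023.1.0 both satisfy D-002.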

Model Endpoint Rules

| Rule | Severity | Description |
|------|----------|-------------|
| E-001 | ERROR | scripts/predict_fraud.py must parse as valid Python (no syntax errors) |
| E-002 | ERROR | scripts/predict_fraud.py must define a function named predict_fraud |
| E-003 | ERROR | predict_fraud must accept exactly one parameter (args) |
| E-004 | ERROR | Module must load a model from /home/cdsw/model/best-xgboost-model |
| E-005 | ERROR | Module must define a threshold variable |
| E-W01 | WARNING | threshold should be a float between 0.0 and 1.0 |
| E-W02 | WARNING | predict_fraud should return an integer (0 or 1) |
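
The ERROR-level endpoint rules lend themselves to static inspection with Python's ast module, without importing the script. The sketch below is one possible approach under stated assumptions: check_endpoint is a hypothetical name, E-003 is read as a parameter count only (the parameter name is not checked), and E-004 is approximated by looking for the model path as a string literal rather than proving that a model is actually loaded.

```python
import ast

MODEL_PATH = "/home/cdsw/model/best-xgboost-model"


def check_endpoint(source):
    """Return the failed ERROR-level rule IDs for predict_fraud.py source text."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return ["E-001"]  # nothing else can be checked without a syntax tree

    failed = []

    # E-002 / E-003: a top-level predict_fraud function taking one parameter.
    funcs = [n for n in tree.body
             if isinstance(n, ast.FunctionDef) and n.name == "predict_fraud"]
    if not funcs:
        failed.append("E-002")
    else:
        a = funcs[0].args
        count = len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)
        if count != 1 or a.vararg or a.kwarg:
            failed.append("E-003")

    # E-004 (approximation): the model path appears as a string literal.
    strings = {n.value for n in ast.walk(tree)
               if isinstance(n, ast.Constant) and isinstance(n.value, str)}
    if MODEL_PATH not in strings:
        failed.append("E-004")

    # E-005: a module-level assignment to a name called threshold.
    assigned = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        assigned.update(t.id for t in targets if isinstance(t, ast.Name))
    if "threshold" not in assigned:
        failed.append("E-005")
    return failed
```

The two warnings, E-W01 and E-W02, concern runtime values and are easier to verify by importing the module in a test environment than by static analysis.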

AMP Configuration Rules

| Rule | Severity | Description |
|------|----------|-------------|
| A-001 | ERROR | .project-metadata.yaml must contain a runtimes list with at least one entry |
| A-002 | ERROR | Runtime kernel must specify Python 3.9 or higher |
| A-003 | ERROR | Runtime editor must be JupyterLab |
| A-004 | ERROR | At least one task must reference scripts/install_dependencies.py |
| A-W01 | WARNING | specification_version should be 1.0 |
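
As an illustration, the configuration rules can be applied to the metadata once it has been parsed into a dictionary (for example by a YAML library, which is not shown here). Several details are assumptions rather than facts from this page: the function name check_amp_config, the key names kernel, editor, tasks, and script, a kernel string of the form "Python 3.9", and the choice to apply A-002 and A-003 to every listed runtime.

```python
import re


def check_amp_config(meta):
    """Return (errors, warnings) as rule-ID lists for parsed project metadata."""
    errors, warnings = [], []

    runtimes = meta.get("runtimes")
    if not isinstance(runtimes, list) or not runtimes:
        errors.append("A-001")
        runtimes = []

    for runtime in runtimes:
        # A-002: extract "major.minor" from the kernel string and compare.
        match = re.search(r"(\d+)\.(\d+)", str(runtime.get("kernel", "")))
        too_old = match is None or (int(match[1]), int(match[2])) < (3, 9)
        if too_old and "A-002" not in errors:
            errors.append("A-002")
        # A-003: the editor must be exactly JupyterLab.
        if runtime.get("editor") != "JupyterLab" and "A-003" not in errors:
            errors.append("A-003")

    # A-004: at least one task points at the dependency installer script.
    tasks = meta.get("tasks") or []
    if not any(t.get("script") == "scripts/install_dependencies.py" for t in tasks):
        errors.append("A-004")

    # A-W01: compare as text so both the float 1.0 and the string "1.0" pass.
    if str(meta.get("specification_version")) != "1.0":
        warnings.append("A-W01")
    return errors, warnings
```

Comparing version tuples rather than strings matters here: as text, "3.10" sorts before "3.9", which would wrongly reject Python 3.10.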

Cluster Utility Rules

| Rule | Severity | Description |
|------|----------|-------------|
| C-W01 | WARNING | utils/dask_utils.py should define run_dask_cluster |
| C-W02 | WARNING | run_dask_cluster should accept num_workers, cpu, memory parameters |
| C-W03 | WARNING | run_dask_cluster should return a dict with keys: scheduler, workers, scheduler_address, dashboard_address |
| C-W04 | WARNING | Scheduler should listen on TCP port 8786 |
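
The cluster utility rules split into two kinds: C-W01 and C-W02 can be checked statically from the source, while C-W03 and C-W04 describe the value returned at runtime. A possible sketch of both halves is shown below. The function names are hypothetical, and the C-W04 check assumes scheduler_address is a string ending in ":8786" (for example "tcp://host:8786"), which this page does not specify.

```python
import ast

EXPECTED_PARAMS = {"num_workers", "cpu", "memory"}
EXPECTED_KEYS = {"scheduler", "workers", "scheduler_address", "dashboard_address"}


def check_cluster_source(source):
    """Static check of dask_utils.py source text: returns C-W01 or C-W02 if violated."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "run_dask_cluster":
            a = node.args
            params = {p.arg for p in a.posonlyargs + a.args + a.kwonlyargs}
            return [] if EXPECTED_PARAMS <= params else ["C-W02"]
    return ["C-W01"]  # no top-level run_dask_cluster definition found


def check_cluster_result(result):
    """Runtime check of the value returned by run_dask_cluster."""
    warnings = []
    if not isinstance(result, dict) or not EXPECTED_KEYS <= result.keys():
        warnings.append("C-W03")
    elif not str(result["scheduler_address"]).endswith(":8786"):
        warnings.append("C-W04")  # assumed address format, see lead-in
    return warnings
```

All findings here are warnings, so a project that fails them would still pass validation at the ERROR level.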

Build Script Rules

| Rule | Severity | Description |
|------|----------|-------------|
| B-001 | ERROR | cdsw-build.sh must be a valid shell script (starts with a command, not a syntax error) |
| B-W01 | WARNING | cdsw-build.sh should install dependencies from requirements.txt |
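
A text-only approximation of the build script rules might look like the following. It is a heuristic sketch with a hypothetical function name: B-001 is reduced to "the script contains at least one non-comment line", which does not detect real shell syntax errors, and B-W01 looks for a pip install command that mentions requirements.txt on the same line.

```python
import re


def check_build_script(text):
    """Return (errors, warnings) as rule-ID lists for cdsw-build.sh text."""
    errors, warnings = [], []
    # Keep non-blank lines that are not comments (the shebang counts as a comment).
    commands = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not commands:
        errors.append("B-001")  # heuristic: no command found at all
    installs = any(
        re.search(r"pip3?\s+install\b.*requirements\.txt", cmd) for cmd in commands
    )
    if not installs:
        warnings.append("B-W01")
    return errors, warnings
```

A stricter B-001 check could ask a shell to parse the file without executing it, at the cost of depending on a shell being available where the validator runs.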