System Overview
Fine Tuning Studio is a three-layer application running inside a single CML Application pod. A Streamlit frontend communicates with a gRPC backend over localhost; the backend persists metadata to SQLite and dispatches CML Jobs for training and evaluation workloads.
Component Topology
Layer Summary
Presentation Layer
Entry point: main.py. Page modules live in pgs/. Two navigation modes are controlled by the IS_COMPOSABLE environment variable:
- Composable mode (`IS_COMPOSABLE` set): Horizontal navbar with dropdown menus for Home, Database, Resources, Experiments, AI Workbench, Examples, and Feedback.
- Standard mode (default): Sidebar navigation with section headers and Material Design icons.
Pages obtain shared gRPC and CML client instances through @st.cache_resource decorators defined in pgs/streamlit_utils.py. See Streamlit Presentation Layer for full details.
Application Layer
A gRPC server runs on port 50051, started by bin/start-grpc-server.py as a background subprocess. The service class FineTuningStudioApp in ft/service.py implements FineTuningStudioServicer (generated from protobuf). It is a pure router – each RPC method delegates to a domain function in the corresponding module:
| Module | Domain |
|---|---|
| `ft/datasets.py` | Dataset import, listing, removal |
| `ft/models.py` | Model registration, export |
| `ft/adapters.py` | Adapter management, dataset split lookup |
| `ft/prompts.py` | Prompt template CRUD |
| `ft/jobs.py` | Fine-tuning job dispatch and tracking |
| `ft/evaluation.py` | Evaluation job dispatch and tracking |
| `ft/configs.py` | Configuration blob management |
| `ft/databse_ops.py` | Database export/import operations |
The servicer holds a cmlapi.default_client() and a FineTuningStudioDao instance, passing both to every domain function call. See gRPC Service Design for the full API surface.
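The pure-router shape described above can be sketched as follows. This is a minimal illustration, not the project's code: the request and response dataclasses, `InMemoryDao`, `list_fine_tuning_jobs`, and `RouterServicer` are hypothetical stand-ins for the generated protobuf messages, `FineTuningStudioDao`, a domain function in `ft/jobs.py`, and `FineTuningStudioApp`. The argument order `(request, cml, dao)` is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ListFineTuningJobsRequest:
    """Hypothetical stand-in for the generated protobuf request message."""


@dataclass
class ListFineTuningJobsResponse:
    """Hypothetical stand-in for the generated protobuf response message."""
    job_ids: List[str] = field(default_factory=list)


class InMemoryDao:
    """Hypothetical stand-in for FineTuningStudioDao (no database involved)."""

    def __init__(self, job_ids: List[str]) -> None:
        self._job_ids = list(job_ids)

    def list_job_ids(self) -> List[str]:
        return list(self._job_ids)


def list_fine_tuning_jobs(
    request: ListFineTuningJobsRequest, cml: Any, dao: InMemoryDao
) -> ListFineTuningJobsResponse:
    """Domain function: the actual logic lives here, not in the servicer."""
    return ListFineTuningJobsResponse(job_ids=dao.list_job_ids())


class RouterServicer:
    """Holds the shared CML client and DAO, and forwards each RPC unchanged."""

    def __init__(self, cml: Any, dao: InMemoryDao) -> None:
        self.cml = cml
        self.dao = dao

    def ListFineTuningJobs(
        self, request: ListFineTuningJobsRequest, context: Any
    ) -> ListFineTuningJobsResponse:
        # No branching or business logic: delegate and return.
        return list_fine_tuning_jobs(request, self.cml, self.dao)
```

Keeping the servicer free of logic means each domain function can be exercised in isolation by passing in a fake client and DAO, without starting a gRPC server.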
Data Layer
SQLite at .app/state.db via SQLAlchemy ORM. Seven tables: models, datasets, adapters, prompts, fine_tuning_jobs, evaluation_jobs, configs. The DAO manages sessions with connection pooling (pool_size=5, max_overflow=10, pool_timeout=30, pool_recycle=1800). See Data Tier for schemas and the DAO API.
Initialization Sequence
The startup sequence is defined in .project-metadata.yaml and executed by bin/start-app-script.sh:
- Install dependencies – `bin/install-dependencies-uv.py` installs from `requirements.txt` and performs `pip install -e .` to install the `ft` package in dev mode.
- Create template CML Jobs – `Accel_Finetuning_Base_Job` and `Mlflow_Evaluation_Base_Job` are created as reusable job templates for fine-tuning and evaluation dispatch.
- Initialize project defaults – `bin/initialize-project-defaults-uv.py` populates default datasets, prompts, models, and adapters from `data/project_defaults.json`.
- Start gRPC server – `bin/start-grpc-server.py` launches as a background process (`&`), binds to port 50051 with a `ThreadPoolExecutor(max_workers=10)`, and sets `FINE_TUNING_SERVICE_IP` and `FINE_TUNING_SERVICE_PORT` as CML project environment variables via `cmlapi`.
- Start Streamlit – `uv run -m streamlit run main.py --server.port $CDSW_APP_PORT --server.address 127.0.0.1`.
Both processes (gRPC server and Streamlit) run in the same pod. The gRPC server is the subprocess; Streamlit is the foreground process that keeps the CML Application alive.
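The background/foreground split in the last two steps can be sketched in Python. The real startup is a shell script, so this is only an equivalent illustration: the function names are hypothetical, running the gRPC script with the current interpreter is an assumption, and terminating the server when Streamlit exits is an addition the source does not describe.

```python
import subprocess
import sys
from typing import List, Mapping


def grpc_server_command() -> List[str]:
    # Assumption: the script is run with the current Python interpreter.
    return [sys.executable, "bin/start-grpc-server.py"]


def streamlit_command(app_port: str) -> List[str]:
    # Mirrors the Streamlit command from the startup sequence.
    return [
        "uv", "run", "-m", "streamlit", "run", "main.py",
        "--server.port", app_port,
        "--server.address", "127.0.0.1",
    ]


def launch(env: Mapping[str, str]) -> int:
    """Start the gRPC server in the background, then block on Streamlit."""
    grpc_proc = subprocess.Popen(grpc_server_command())  # returns immediately
    try:
        # Foreground process: the application lives as long as this call runs.
        completed = subprocess.run(streamlit_command(env["CDSW_APP_PORT"]))
        return completed.returncode
    finally:
        # Not described in the source: stop the server once Streamlit exits.
        grpc_proc.terminate()
```

Calling `launch(os.environ)` inside the pod would reproduce the two-process layout; outside CML, `CDSW_APP_PORT` has to be supplied by hand.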
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `FINE_TUNING_SERVICE_IP` | gRPC server IP address | Set at startup from `CDSW_IP_ADDRESS` |
| `FINE_TUNING_SERVICE_PORT` | gRPC server port | `50051` |
| `FINE_TUNING_STUDIO_SQLITE_DB` | SQLite database file path | `.app/state.db` |
| `CDSW_PROJECT_ID` | CML project identifier | Set by CML runtime |
| `CDSW_APP_PORT` | Streamlit server port | Set by CML runtime |
| `HUGGINGFACE_ACCESS_TOKEN` | HuggingFace Hub token for gated models | Optional (empty string) |
| `IS_COMPOSABLE` | Enable horizontal navbar mode | Optional (unset = sidebar) |
| `CUSTOM_LORA_ADAPTERS_DIR` | Directory for custom LoRA adapters | `data/adapters/` |
| `FINE_TUNING_STUDIO_PROJECT_DEFAULTS` | Path to project defaults JSON | `data/project_defaults.json` |
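A harness or test fixture may want to resolve these variables in one place. The sketch below applies the defaults from the table; `StudioSettings` and `load_settings` are hypothetical names, not part of the `ft` package, and treating any set value of `IS_COMPOSABLE` (including an empty string) as "enabled" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StudioSettings:
    """Hypothetical container for the variables in the table above."""
    service_ip: Optional[str]   # no static default; set at startup
    service_port: int
    sqlite_db: str
    hf_token: str
    composable_nav: bool
    adapters_dir: str
    project_defaults: str


def load_settings(env: Mapping[str, str] = os.environ) -> StudioSettings:
    """Read settings from a mapping, falling back to the documented defaults."""
    return StudioSettings(
        service_ip=env.get("FINE_TUNING_SERVICE_IP"),
        service_port=int(env.get("FINE_TUNING_SERVICE_PORT", "50051")),
        sqlite_db=env.get("FINE_TUNING_STUDIO_SQLITE_DB", ".app/state.db"),
        hf_token=env.get("HUGGINGFACE_ACCESS_TOKEN", ""),
        # Assumption: presence of the variable is what enables the navbar.
        composable_nav="IS_COMPOSABLE" in env,
        adapters_dir=env.get("CUSTOM_LORA_ADAPTERS_DIR", "data/adapters/"),
        project_defaults=env.get(
            "FINE_TUNING_STUDIO_PROJECT_DEFAULTS", "data/project_defaults.json"
        ),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without mutating the process environment.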
Key Takeaway for Harness Builders
The gRPC API is the sole interface to application logic. The Streamlit UI is one client of this API, not the source of truth. Any external harness, CLI tool, or automation script should instantiate a FineTuningStudioClient (or use the generated gRPC stub directly) and interact through the protobuf contract. The database is an implementation detail behind the DAO – never access .app/state.db directly from external code.
To build a custom training harness:
- Import `FineTuningStudioClient` from `ft.client`.
- Register resources (datasets, models, prompts) via `Add*` RPCs.
- Dispatch training via `StartFineTuningJob` with the desired resource IDs and compute configuration.
- Poll job status via `GetFineTuningJob` or `ListFineTuningJobs`.
- Evaluate results via `StartEvaluationJob`.
All resource IDs are UUIDs assigned by the service. Pass them by value between RPCs.
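The polling step is the part of this workflow most likely to need custom code. A minimal sketch follows, written against a plain callable so that it makes no claims about the client's method signatures or message fields. `wait_for_job` is a hypothetical helper, and the status strings in the usage example are placeholders, not values defined by the service.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    fetch_status: Callable[[], str],
    terminal_states: Collection[str],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a terminal state or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Placeholder statuses; a real harness would wrap a GetFineTuningJob call
    # in fetch_status and map the response to a status string.
    fake_statuses = iter(["pending", "running", "done"])
    final = wait_for_job(
        lambda: next(fake_statuses),
        terminal_states={"done", "failed"},
        interval_s=0.0,
    )
    print(final)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and the timeout guards against a job that never reaches a terminal state.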