
Data Tier

All Fine Tuning Studio metadata is persisted in a SQLite database at .app/state.db (configurable via FINE_TUNING_STUDIO_SQLITE_DB). The ORM layer uses SQLAlchemy declarative models defined in ft/db/model.py. Access is managed through FineTuningStudioDao in ft/db/dao.py.
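The lookup can be sketched as a one-line helper. The real `get_sqlite_db_location` lives in the project, so the function below is only an illustration of the documented behavior; the env var name and default path come from the paragraph above.

```python
import os

# Plausible sketch of the lookup; the real get_sqlite_db_location lives in the
# project, but the env var name and default path are documented above.
def get_sqlite_db_location() -> str:
    return os.environ.get("FINE_TUNING_STUDIO_SQLITE_DB", ".app/state.db")

os.environ["FINE_TUNING_STUDIO_SQLITE_DB"] = "/tmp/custom/state.db"
print(get_sqlite_db_location())  # /tmp/custom/state.db
```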

Schema Topology

The seven application tables form a hub-and-spoke graph: fine_tuning_jobs and evaluation_jobs reference models, datasets, prompts, adapters, and configs via String foreign keys, while adapters and prompts in turn point back at the entities they were derived from. The per-table schemas below list every relationship.

Table Schemas

All primary keys are String type (UUIDs assigned by domain logic). All columns are nullable except id. ORM classes are defined in ft/db/model.py.

models

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Source type (e.g., huggingface, cml) |
| framework | String | | Model framework identifier |
| name | String | | Display name |
| description | String | | Human-readable description |
| huggingface_model_name | String | | HuggingFace Hub model ID |
| location | String | | Local filesystem path |
| cml_registered_model_id | String | | CML Model Registry ID |
| mlflow_experiment_id | String | | Associated MLflow experiment |
| mlflow_run_id | String | | Associated MLflow run |
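As a concrete illustration of the schema above, here is a minimal SQLAlchemy declarative mapping of the models table. Treat it as a sketch: the real class in ft/db/model.py also inherits the serialization mix-ins covered in the ORM Mix-ins section.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Illustrative mapping of the `models` table; column names follow the table
# above, but the real class lives in ft/db/model.py.
class Model(Base):
    __tablename__ = "models"
    id = Column(String, primary_key=True)  # UUID assigned by domain logic
    type = Column(String)                  # e.g. "huggingface", "cml"
    framework = Column(String)
    name = Column(String)
    description = Column(String)
    huggingface_model_name = Column(String)
    location = Column(String)
    cml_registered_model_id = Column(String)
    mlflow_experiment_id = Column(String)
    mlflow_run_id = Column(String)

# Quick round-trip against an in-memory SQLite database:
engine = create_engine("sqlite+pysqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Model(id="abc-123", name="llama-2", type="huggingface"))
session.commit()
print(session.query(Model).one().name)  # llama-2
```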

datasets

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Source type (e.g., huggingface, local) |
| name | String | | Display name |
| description | Text | | Long-form description |
| huggingface_name | String | | HuggingFace Hub dataset ID |
| location | Text | | Local filesystem path |
| features | Text | | JSON string of dataset feature names |

adapters

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Source type |
| name | String | | Display name |
| description | String | | Human-readable description |
| huggingface_name | String | | HuggingFace Hub adapter ID |
| model_id | String | FK -> models.id | Base model this adapter targets |
| location | Text | | Local filesystem path to adapter weights |
| fine_tuning_job_id | String | FK -> fine_tuning_jobs.id | Job that produced this adapter |
| prompt_id | String | FK -> prompts.id | Prompt template used during training |
| cml_registered_model_id | String | | CML Model Registry ID |
| mlflow_experiment_id | String | | Associated MLflow experiment |
| mlflow_run_id | String | | Associated MLflow run |

prompts

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Prompt type |
| name | String | | Display name |
| description | String | | Human-readable description |
| dataset_id | String | FK -> datasets.id | Dataset this prompt is designed for |
| prompt_template | String | | Full prompt format string |
| input_template | String | | Input portion template |
| completion_template | String | | Completion portion template |

fine_tuning_jobs

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| base_model_id | String | FK -> models.id | Base model to fine-tune |
| dataset_id | String | FK -> datasets.id | Training dataset |
| prompt_id | String | FK -> prompts.id | Prompt template |
| num_workers | Integer | | Number of worker processes |
| cml_job_id | String | | CML Job ID for tracking |
| adapter_id | String | FK -> adapters.id | Resulting adapter |
| num_cpu | Integer | | CPU allocation |
| num_gpu | Integer | | GPU allocation |
| num_memory | Integer | | Memory allocation (GB) |
| num_epochs | Integer | | Training epochs |
| learning_rate | Double | | Learning rate |
| out_dir | String | | Output directory for adapter weights |
| training_arguments_config_id | String | FK -> configs.id | Training arguments config |
| model_bnb_config_id | String | FK -> configs.id | Model BitsAndBytes quantization config |
| adapter_bnb_config_id | String | FK -> configs.id | Adapter BitsAndBytes quantization config |
| lora_config_id | String | FK -> configs.id | LoRA hyperparameters config |
| training_arguments_config | String | | Serialized training arguments (snapshot) |
| model_bnb_config | String | | Serialized model BnB config (snapshot) |
| adapter_bnb_config | String | | Serialized adapter BnB config (snapshot) |
| lora_config | String | | Serialized LoRA config (snapshot) |
| dataset_fraction | Double | | Fraction of dataset to use |
| train_test_split | Double | | Train/test split ratio |
| user_script | String | | Custom user training script path |
| user_config_id | String | FK -> configs.id | Custom user config |
| framework_type | String | | Training framework (legacy, axolotl, etc.) |
| axolotl_config_id | String | FK -> configs.id | Axolotl YAML config |
| gpu_label_id | Integer | | GPU label selector |
| adapter_name | String | | Name assigned to the output adapter |

The fine_tuning_jobs table stores both config ID references (foreign keys to configs) and serialized config snapshots (plain string columns). This allows job records to remain self-describing even if the referenced config is later deleted.
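The dual-storage pattern can be illustrated with plain dictionaries. Field names follow the table above; everything else here is hypothetical.

```python
import json
import uuid

# Hypothetical illustration of the dual-storage pattern: the job row keeps
# both the FK to the configs row and a serialized copy of its content.
lora_config_row = {
    "id": str(uuid.uuid4()),
    "config": json.dumps({"r": 16, "lora_alpha": 32}),
}

job_row = {
    "id": str(uuid.uuid4()),
    "lora_config_id": lora_config_row["id"],   # FK; may dangle after deletion
    "lora_config": lora_config_row["config"],  # self-describing snapshot
}

# Even if the referenced configs row is later deleted, the snapshot remains:
del lora_config_row
print(json.loads(job_row["lora_config"])["r"])  # 16
```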

evaluation_jobs

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Evaluation type |
| cml_job_id | String | | CML Job ID for tracking |
| parent_job_id | String | | Parent fine-tuning job (if derived) |
| base_model_id | String | FK -> models.id | Model under evaluation |
| dataset_id | String | FK -> datasets.id | Evaluation dataset |
| prompt_id | String | FK -> prompts.id | Prompt template |
| num_workers | Integer | | Number of worker processes |
| adapter_id | String | FK -> adapters.id | Adapter under evaluation |
| num_cpu | Integer | | CPU allocation |
| num_gpu | Integer | | GPU allocation |
| num_memory | Integer | | Memory allocation (GB) |
| evaluation_dir | String | | Output directory for evaluation artifacts |
| model_bnb_config_id | String | FK -> configs.id | Model BnB quantization config |
| adapter_bnb_config_id | String | FK -> configs.id | Adapter BnB quantization config |
| generation_config_id | String | FK -> configs.id | Generation config for inference |
| model_bnb_config | String | | Serialized model BnB config (snapshot) |
| adapter_bnb_config | String | | Serialized adapter BnB config (snapshot) |
| generation_config | String | | Serialized generation config (snapshot) |

configs

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | String | PK, NOT NULL | UUID |
| type | String | | Config type (training_arguments, bnb, lora, generation, axolotl) |
| description | String | | Human-readable description |
| config | Text | | JSON or YAML content stored as string |
| model_family | String | | Model family this config targets |
| is_default | Integer | | 1 = shipped default, 0 = user-created |

ORM Mix-ins

All ORM model classes inherit from three bases: Base (SQLAlchemy declarative base), MappedProtobuf, and MappedDict. These mix-ins provide bidirectional serialization.

MappedProtobuf

Converts between protobuf messages and ORM instances.

# Protobuf message -> ORM instance
adapter_orm = Adapter.from_message(adapter_proto_msg)

# ORM instance -> Protobuf message
adapter_proto = adapter_orm.to_protobuf(AdapterMetadata)

from_message() uses ListFields() (protobuf >= 3.15) to extract only fields that were explicitly set in the message, avoiding default-value contamination. to_protobuf() iterates the ORM instance’s non-null columns and sets matching fields on a new protobuf message.
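A standalone sketch of the from_message pattern, using a stub class in place of a real protobuf message. The stub only mimics ListFields(), which yields (field_descriptor, value) pairs for explicitly set fields; the real mix-in operates on generated protobuf types.

```python
# Stub standing in for a generated protobuf message and its field descriptors.
class StubFieldDescriptor:
    def __init__(self, name: str):
        self.name = name

class StubMessage:
    def __init__(self, **fields):
        self._fields = fields

    def ListFields(self):
        # Like protobuf, yield only fields that were explicitly set.
        return [(StubFieldDescriptor(k), v) for k, v in self._fields.items()]

class MappedProtobuf:
    @classmethod
    def from_message(cls, msg):
        obj = cls()
        # Copy only explicitly-set fields, avoiding default-value contamination.
        for descriptor, value in msg.ListFields():
            setattr(obj, descriptor.name, value)
        return obj

# Hypothetical stand-in for the real Adapter ORM class:
class Adapter(MappedProtobuf):
    pass

adapter_orm = Adapter.from_message(StubMessage(id="a1", name="qlora-demo"))
print(adapter_orm.name)  # qlora-demo
```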

MappedDict

Converts between Python dictionaries and ORM instances.

# Dict -> ORM instance
model_orm = Model.from_dict({"id": "abc", "name": "llama-2"})

# ORM instance -> Dict (non-null fields only)
model_dict = model_orm.to_dict()
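A minimal sketch of the MappedDict idea using plain attributes; the real mix-in in ft/db/model.py inspects the SQLAlchemy-mapped columns instead, so this is only illustrative.

```python
# Illustrative mix-in: dict round-trip via plain instance attributes.
class MappedDict:
    @classmethod
    def from_dict(cls, data: dict):
        obj = cls()
        for key, value in data.items():
            setattr(obj, key, value)
        return obj

    def to_dict(self) -> dict:
        # Mirror the documented behavior: emit non-null fields only.
        return {k: v for k, v in vars(self).items() if v is not None}

# Hypothetical stand-in for the real ORM class:
class Model(MappedDict):
    pass

model_orm = Model.from_dict({"id": "abc", "name": "llama-2", "location": None})
print(model_orm.to_dict())  # {'id': 'abc', 'name': 'llama-2'}
```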

Table-Model Registry

ft/db/model.py exports two lookup dictionaries for programmatic table access:

TABLE_TO_MODEL_REGISTRY = {
    'datasets': Dataset,
    'models': Model,
    'prompts': Prompt,
    'adapters': Adapter,
    'fine_tuning_jobs': FineTuningJob,
    'evaluation_jobs': EvaluationJob,
    'configs': Config
}

MODEL_TO_TABLE_REGISTRY = {v: k for k, v in TABLE_TO_MODEL_REGISTRY.items()}

These are used by the database import/export logic to iterate all application tables.

DAO

FineTuningStudioDao in ft/db/dao.py manages SQLAlchemy engine and session lifecycle.

Constructor

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class FineTuningStudioDao:
    def __init__(self, engine_url=None, echo=False, engine_args={}):
        if engine_url is None:
            engine_url = f"sqlite+pysqlite:///{get_sqlite_db_location()}"
        self.engine = create_engine(engine_url, echo=echo, **engine_args)
        self.Session = sessionmaker(bind=self.engine, autoflush=True, autocommit=False)
        Base.metadata.create_all(self.engine)

The servicer instantiates the DAO with connection pool parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| pool_size | 5 | Persistent connections in the pool |
| max_overflow | 10 | Additional connections beyond pool_size |
| pool_timeout | 30 | Seconds to wait for a connection |
| pool_recycle | 1800 | Seconds before a connection is recycled |

Tables are auto-created on first initialization via Base.metadata.create_all(engine).
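These pool parameters map directly onto SQLAlchemy's create_engine keyword arguments. A minimal sketch follows; QueuePool is forced explicitly here because SQLite's default pool class varies across SQLAlchemy versions and may reject these options.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Sketch: the servicer's pool settings, passed through as engine arguments.
engine_args = {
    "poolclass": QueuePool,
    "pool_size": 5,        # persistent connections in the pool
    "max_overflow": 10,    # extra connections beyond pool_size
    "pool_timeout": 30,    # seconds to wait for a free connection
    "pool_recycle": 1800,  # recycle a connection after 30 minutes
}
engine = create_engine("sqlite+pysqlite:///:memory:", **engine_args)
print(engine.pool.size())  # 5
```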

Session Context Manager

All domain functions access the database through dao.get_session():

@contextmanager
def get_session(self):
    session = self.Session()
    try:
        yield session
        session.commit()
    except Exception as e:
        session.rollback()
        raise e
    finally:
        session.close()

Usage in domain code:

def list_datasets(request, cml, dao):
    with dao.get_session() as session:
        datasets = session.query(Dataset).all()
        # ... convert and return

The context manager guarantees: commit on success, rollback on exception, close in all cases.
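The whole pattern can be exercised end-to-end against an in-memory database. This standalone sketch mirrors the context manager above; the Dataset mapping is abbreviated for illustration.

```python
from contextlib import contextmanager

from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Abbreviated stand-in for the real Dataset mapping in ft/db/model.py:
class Dataset(Base):
    __tablename__ = "datasets"
    id = Column(String, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite+pysqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, autoflush=True)

@contextmanager
def get_session():
    # Same shape as FineTuningStudioDao.get_session():
    # commit on success, rollback on exception, close in all cases.
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

# Writes are committed when the block exits without raising:
with get_session() as session:
    session.add(Dataset(id="d1", name="squad"))

with get_session() as session:
    print(session.query(Dataset).one().name)  # squad
```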

Database Export and Import

ft/db/db_import_export.py provides DatabaseJsonConverter for full database serialization.

Export

export_to_json(output_path=None) iterates all non-system tables (excluding sqlite_* internal tables), captures the CREATE TABLE schema and all row data, and returns a JSON string:

{
  "models": {
    "schema": "CREATE TABLE IF NOT EXISTS models (...)",
    "data": [
      {"id": "abc-123", "name": "llama-2", "type": "huggingface", ...}
    ]
  },
  "datasets": { ... },
  ...
}

If output_path is provided, the JSON is also written to that file.

Import

import_from_json(json_path) reads a JSON file in the export format, executes each table’s CREATE TABLE IF NOT EXISTS statement, and inserts all rows. Rows that fail to insert (e.g., due to duplicate primary keys) are logged but do not abort the import.
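The skip-on-failure behavior can be sketched with the standard-library sqlite3 module. The export payload below is fabricated and only mirrors the documented format.

```python
import sqlite3

# Fabricated export payload in the documented {table: {schema, data}} shape.
export = {
    "models": {
        "schema": "CREATE TABLE IF NOT EXISTS models (id TEXT PRIMARY KEY, name TEXT)",
        "data": [
            {"id": "abc-123", "name": "llama-2"},
            {"id": "abc-123", "name": "dup"},  # duplicate PK: logged, then skipped
        ],
    }
}

conn = sqlite3.connect(":memory:")
for table, payload in export.items():
    conn.execute(payload["schema"])
    for row in payload["data"]:
        columns = ", ".join(row)
        placeholders = ", ".join("?" for _ in row)
        try:
            conn.execute(
                f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
                tuple(row.values()),
            )
        except sqlite3.IntegrityError as exc:
            # Log and continue rather than aborting the whole import.
            print(f"skipping row in {table}: {exc}")
conn.commit()

print(conn.execute("SELECT name FROM models").fetchall())  # [('llama-2',)]
```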

Alembic Migrations

Schema migrations are managed by Alembic. Configuration is at alembic.ini with migration scripts in db_migrations/. When adding or modifying columns, generate a new migration with:

alembic revision --autogenerate -m "description of change"
alembic upgrade head

The DAO’s create_all() call handles initial table creation, but column additions and type changes on existing databases require Alembic migrations.

Cross-References