SIGDG Financial Transaction Ontology

The Signals Data Governance ontology (prefix SIGDG) is grounded in BFO 2020 (Basic Formal Ontology, ISO/IEC 21838-2:2021). BFO provides the upper-level categories that make ontologies from separate teams composable. This guards against the class of integration failures Barry Smith illustrates with the 2006 Airbus A380 wiring debacle, where incompatible data representations across engineering teams reportedly cost about $6 billion to correct. A shared top-level ontology is the mechanism that makes data descriptions joinable without ad-hoc reconciliation.

This page defines the financial transaction risk extension of SIGDG. It expands the existing SIGDG:0050 TransactionInformation node — which the Signals-360 metadata classification project defines as a leaf — into a full subtree covering fraud detection, transaction context, risk indicators, and regulatory scope. Both ontologies share the same CURIE prefix and BFO grounding, so categories from either project can be composed in a single governance policy.

BFO Grounding

BFO:0000031 generically dependent continuant
  └── SIGDG:0001 InformationEntity             ── "data that can be stored and transferred"
        └── SIGDG:0050 TransactionInformation   ── "event records: orders, sessions, timestamps"
              └── SIGDG:0100 FinancialTransaction  ── "monetary transfer event between parties"

BFO:0000019 quality
  ├── SIGDG:1001 SensitivityLevel              ── "inheres in an information entity"
  └── SIGDG:1050 TransactionRiskLevel          ── "inheres in a financial transaction record"

BFO:0000023 role
  ├── SIGDG:2001 DataSubjectRole               ── "externally grounded in governance context"
  └── SIGDG:2040 RegulatoryFrameworkRole       ── "externally grounded in regulatory scope"

BFO:0000015 process
  ├── SIGDG:3001 DataLifecycleProcess          ── "transformation, migration, classification"
  └── SIGDG:3040 FraudDetectionProcess         ── "scoring, retraining, threshold calibration"

BFO’s generically dependent continuant (GDC) is the natural home for information entities — they depend on some material carrier (disk, memory, wire) but can be copied and transferred between carriers. A financial transaction record is a GDC: it describes a real-world monetary event but exists independently of any single storage medium. Risk levels are qualities that inhere in transaction records. Regulatory scope is a role — the same transaction carries different obligations under PCI-DSS vs. AML/KYC.

CURIE Prefix

SIGDG → https://signals360.dev/ontology/dg/
BFO   → http://purl.obolibrary.org/obo/BFO_

The SIGDG prefix is shared with the Signals-360 metadata governance ontology. Code ranges are partitioned to avoid collision:

Range	Owner	Domain
`0001`–`0091`	Signals-360	Metadata column classification
`0100`–`0199`	This project	Financial transaction risk
`1010`–`1040`	Signals-360	Data sensitivity levels
`1050`–`1059`	This project	Transaction risk levels
`2010`–`2030`	Signals-360	Data subject roles
`2040`–`2049`	This project	Regulatory framework roles
`3010`–`3030`	Signals-360	Data lifecycle processes
`3040`–`3049`	This project	Fraud detection processes
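
The prefix map and range partition above can be sketched in code. This is a minimal illustration, not part of either project: the function names `expand_curie` and `range_owner` are hypothetical, and expanding a CURIE by simple concatenation of prefix IRI and local code is an assumption about how SIGDG identifiers resolve.

```python
from typing import Optional, Tuple

# Prefix map copied from the CURIE Prefix section.
PREFIXES = {
    "SIGDG": "https://signals360.dev/ontology/dg/",
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
}

# Code ranges copied from the partition table (inclusive bounds).
RANGES = [
    (1, 91, "Signals-360", "Metadata column classification"),
    (100, 199, "This project", "Financial transaction risk"),
    (1010, 1040, "Signals-360", "Data sensitivity levels"),
    (1050, 1059, "This project", "Transaction risk levels"),
    (2010, 2030, "Signals-360", "Data subject roles"),
    (2040, 2049, "This project", "Regulatory framework roles"),
    (3010, 3030, "Signals-360", "Data lifecycle processes"),
    (3040, 3049, "This project", "Fraud detection processes"),
]


def expand_curie(curie: str) -> str:
    """Expand 'PREFIX:local' to a full IRI (assumed: plain concatenation)."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local


def range_owner(curie: str) -> Optional[Tuple[str, str]]:
    """Return (owner, domain) for a SIGDG code, or None if no range covers it."""
    prefix, _, local = curie.partition(":")
    if prefix != "SIGDG" or not local.isdigit():
        return None
    code = int(local)
    for low, high, owner, domain in RANGES:
        if low <= code <= high:
            return owner, domain
    return None
```

For example, `range_owner("SIGDG:1053")` resolves to this project's transaction risk levels, while a code that falls outside every listed range returns `None`, which makes an unassigned or colliding code easy to spot in review.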

Financial Transaction Hierarchy

SIGDG:0050  TransactionInformation                 (⊑ SIGDG:0001, from Signals-360)
└── SIGDG:0100  FinancialTransaction               "monetary transfer event between parties"
    ├── SIGDG:0110  TransactionRiskClassification   "risk assessment outcome for a transaction"
    │   ├── SIGDG:0111  LegitimateTransaction       "transaction consistent with cardholder behavior"
    │   └── SIGDG:0112  FraudulentTransaction        "unauthorized or deceptive transaction"
    │       ├── SIGDG:0113  CardNotPresentFraud      "fraud via remote channel without physical card"
    │       ├── SIGDG:0114  CounterfeitCardFraud     "fraud using cloned or fabricated card"
    │       ├── SIGDG:0115  LostStolenCardFraud      "fraud using a card obtained without consent"
    │       └── SIGDG:0116  AccountTakeoverFraud     "fraud via compromised account credentials"
    ├── SIGDG:0120  TransactionContext               "observable properties of a transaction event"
    │   ├── SIGDG:0121  MonetaryContext              "amount, currency, denomination"
    │   ├── SIGDG:0122  TemporalContext              "timestamp, time-of-day, day-of-week patterns"
    │   ├── SIGDG:0123  BehavioralContext            "latent patterns derived from transaction history"
    │   └── SIGDG:0124  MerchantContext              "merchant category, location, channel"
    └── SIGDG:0130  RiskIndicator                    "anomalous pattern suggesting elevated risk"
        ├── SIGDG:0131  AmountAnomaly                "amount deviates from cardholder norm"
        ├── SIGDG:0132  VelocityAnomaly              "transaction frequency exceeds cardholder baseline"
        ├── SIGDG:0133  GeographicAnomaly            "location inconsistent with cardholder history"
        └── SIGDG:0134  BehavioralAnomaly            "latent behavioral pattern deviates from norm"
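
One way to make the tree above queryable is a child-to-parent map with a subsumption check. The sketch below transcribes the hierarchy as shown; the `PARENT`, `ancestors`, and `is_a` names are hypothetical, and a production system would more likely load the ontology from an OWL or RDF serialization.

```python
# Child -> parent edges transcribed from the hierarchy above.
PARENT = {
    "SIGDG:0050": "SIGDG:0001",
    "SIGDG:0100": "SIGDG:0050",
    "SIGDG:0110": "SIGDG:0100",
    "SIGDG:0111": "SIGDG:0110",
    "SIGDG:0112": "SIGDG:0110",
    "SIGDG:0113": "SIGDG:0112",
    "SIGDG:0114": "SIGDG:0112",
    "SIGDG:0115": "SIGDG:0112",
    "SIGDG:0116": "SIGDG:0112",
    "SIGDG:0120": "SIGDG:0100",
    "SIGDG:0121": "SIGDG:0120",
    "SIGDG:0122": "SIGDG:0120",
    "SIGDG:0123": "SIGDG:0120",
    "SIGDG:0124": "SIGDG:0120",
    "SIGDG:0130": "SIGDG:0100",
    "SIGDG:0131": "SIGDG:0130",
    "SIGDG:0132": "SIGDG:0130",
    "SIGDG:0133": "SIGDG:0130",
    "SIGDG:0134": "SIGDG:0130",
}


def ancestors(curie: str) -> list:
    """Walk parent links upward; nearest ancestor first."""
    chain = []
    while curie in PARENT:
        curie = PARENT[curie]
        chain.append(curie)
    return chain


def is_a(curie: str, ancestor: str) -> bool:
    """True if `curie` equals `ancestor` or sits below it in the tree."""
    return curie == ancestor or ancestor in ancestors(curie)
```

A policy written against `SIGDG:0112 FraudulentTransaction` then covers all four fraud subtypes without listing them, since `is_a("SIGDG:0115", "SIGDG:0112")` holds.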

Hierarchy Rationale

TransactionRiskClassification (0110) models the outcome of the fraud detection model. In the current binary implementation, only LegitimateTransaction and FraudulentTransaction are used. The four fraud subtypes (0113–0116) are provided for future multi-class models and for downstream governance policies that distinguish fraud mechanisms — a lost-card fraud may trigger a card replacement workflow while an account takeover triggers a credential reset.

TransactionContext (0120) models what the features describe. The 29 features the model uses from the credit card fraud dataset (Amount plus V1–V28, after Time is dropped) map to this subtree:

Feature(s)	SIGDG Category	Notes
`Amount`	`SIGDG:0121 MonetaryContext`	Raw transaction amount; the only non-PCA feature kept after preprocessing
`Time` (dropped)	`SIGDG:0122 TemporalContext`	Seconds since first transaction; dropped in preprocessing
`V1`–`V28`	`SIGDG:0123 BehavioralContext`	PCA-transformed latent dimensions; original features undisclosed
(not in dataset)	`SIGDG:0124 MerchantContext`	Placeholder for merchant data in future datasets

The PCA features are deliberately anonymous — the dataset curators applied PCA for confidentiality. Each V-component encodes a linear combination of original transaction attributes. We classify them collectively as BehavioralContext rather than attempting to reverse-engineer individual semantics, because the ontology should describe what the features represent (behavioral patterns), not their mathematical form (principal components).
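
The feature mapping in the table above can be expressed as a small lookup. This is a sketch under the assumption that columns keep the dataset's names (`Amount`, `Time`, `V1` through `V28`); the function name `feature_category` is hypothetical.

```python
import re
from typing import Optional

# Matches V1 through V28 only, the PCA components named in the table.
_PCA_COLUMN = re.compile(r"V([1-9]|1[0-9]|2[0-8])")


def feature_category(column: str) -> Optional[str]:
    """Map a dataset column name to its SIGDG TransactionContext category."""
    if column == "Amount":
        return "SIGDG:0121"  # MonetaryContext
    if column == "Time":
        return "SIGDG:0122"  # TemporalContext (column is dropped in preprocessing)
    if _PCA_COLUMN.fullmatch(column):
        return "SIGDG:0123"  # BehavioralContext, assigned collectively
    return None  # no column currently maps to SIGDG:0124 MerchantContext
```

Returning `None` for unknown columns, rather than guessing a category, keeps the mapping honest when a future dataset adds merchant or device fields that still need classifying.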

RiskIndicator (0130) models the evidence supporting a risk classification. These correspond to interpretable signals that a human analyst would recognize:

Indicator	What it captures	Dataset proxy
`AmountAnomaly`	Unusually large or small transaction	Extreme values in `Amount`
`VelocityAnomaly`	Burst of transactions in short window	(requires `Time`; not used in current model)
`GeographicAnomaly`	Transaction from unexpected location	(encoded in PCA components; not directly observable)
`BehavioralAnomaly`	Deviation from established spending pattern	High-magnitude PCA components

Not all indicators are directly observable in the current dataset. The ontology defines them for completeness — a production fraud system would populate all four from raw transaction data.

Transaction Risk Levels

Risk level is modeled as a BFO quality (BFO:0000019) that inheres in a financial transaction record. This parallels the sensitivity levels in Signals-360 (SIGDG:1010–1040), but describes fraud risk rather than data sensitivity.

CURIE	Code	Label	Definition	Threshold Guidance
`SIGDG:1051`	1051	LowRisk	Transaction consistent with all behavioral baselines	Model score ≤ 0.10
`SIGDG:1052`	1052	ElevatedRisk	Transaction deviates on one or more risk indicators	0.10 < score ≤ 0.35
`SIGDG:1053`	1053	HighRisk	Transaction strongly suggests fraudulent intent	0.35 < score ≤ 0.80
`SIGDG:1054`	1054	ConfirmedFraud	Transaction verified as unauthorized post-investigation	score > 0.80 or manual label

The current binary model applies a single threshold at 0.35 (the boundary between ElevatedRisk and HighRisk). The four-level scheme supports graduated response:

LowRisk: Approve automatically.
ElevatedRisk: Approve but flag for batch review.
HighRisk: Require step-up authentication or manual approval.
ConfirmedFraud: Block and initiate chargeback/investigation.
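
The threshold guidance and graduated responses can be sketched as a score-to-level mapping. Two assumptions are made here that the table does not state: the model score is a probability in [0, 1], and the binary model flags a transaction when its score is strictly greater than 0.35. The manual-label route to ConfirmedFraud is not modeled, and the names `risk_level`, `RESPONSE`, and `binary_decision` are hypothetical.

```python
BINARY_THRESHOLD = 0.35  # single threshold used by the current binary model

# Graduated responses, copied from the list above.
RESPONSE = {
    "SIGDG:1051": "Approve automatically.",
    "SIGDG:1052": "Approve but flag for batch review.",
    "SIGDG:1053": "Require step-up authentication or manual approval.",
    "SIGDG:1054": "Block and initiate chargeback/investigation.",
}


def risk_level(score: float) -> str:
    """Map a model score to a risk-level CURIE using the table's bounds."""
    if not 0.0 <= score <= 1.0:  # also rejects NaN
        raise ValueError(f"score out of range: {score!r}")
    if score <= 0.10:
        return "SIGDG:1051"  # LowRisk
    if score <= 0.35:
        return "SIGDG:1052"  # ElevatedRisk
    if score <= 0.80:
        return "SIGDG:1053"  # HighRisk
    return "SIGDG:1054"      # ConfirmedFraud (score route only)


def binary_decision(score: float) -> bool:
    """Assumed binary rule: flag when the score exceeds the threshold."""
    return score > BINARY_THRESHOLD
```

Under these assumptions the binary decision and the four-level scheme agree at the boundary: a score of exactly 0.35 is ElevatedRisk and is not flagged.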

Relationship to Sensitivity Levels

Transaction risk levels and data sensitivity levels are orthogonal qualities that can co-occur on the same entity:

Entity	Risk Level	Sensitivity Level	Implication
Transaction record with PAN	HighRisk	Restricted (PCI-DSS)	Require step-up authentication or manual approval AND encrypt PAN at rest
Transaction record without PII	LowRisk	Internal	Approve; standard access controls
Aggregated fraud statistics	N/A	Confidential	No per-transaction risk; protect business intelligence

Regulatory Framework Roles

The same transaction may be subject to multiple regulatory frameworks simultaneously. These are modeled as BFO roles (BFO:0000023) — externally grounded in the legal/regulatory context rather than intrinsic to the data.

CURIE	Code	Label	Definition
`SIGDG:2041`	2041	PCIDSSScope	Transaction involves payment card data; subject to PCI-DSS requirements
`SIGDG:2042`	2042	AMLKYCScope	Transaction subject to Anti-Money Laundering / Know Your Customer rules
`SIGDG:2043`	2043	EMVLiabilityScope	Transaction subject to EMV chip liability shift rules

A single transaction can carry all three roles. The fraud detection model’s output feeds into each framework differently:

PCI-DSS (2041): A HighRisk classification triggers additional logging and encryption requirements for the transaction’s card data fields.
AML/KYC (2042): Patterns of ElevatedRisk transactions across accounts trigger Suspicious Activity Report (SAR) filing obligations.
EMV Liability (2043): The party that failed to support chip-based authentication bears fraud liability; the risk classification determines which party initiates the chargeback.

Fraud Detection Processes

Classification and model maintenance are modeled as BFO processes (BFO:0000015):

CURIE	Code	Label	Definition
`SIGDG:3041`	3041	RealTimeScoring	Sub-second risk scoring at transaction authorization time
`SIGDG:3042`	3042	BatchRetraining	Periodic model retraining on accumulated labeled transactions
`SIGDG:3043`	3043	ThresholdCalibration	Adjusting the decision boundary to optimize precision-recall tradeoff

In the current AMP (Applied ML Prototype) implementation:

RealTimeScoring corresponds to the predict_fraud(args) model endpoint.
BatchRetraining corresponds to the notebook-driven distributed XGBoost/CatBoost training pipeline.
ThresholdCalibration corresponds to the precision-recall curve analysis that selected the 0.35 threshold.
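
To illustrate what ThresholdCalibration involves, the sketch below evaluates precision and recall at candidate thresholds and picks the one with the highest F1 score. The source does not say which criterion produced the 0.35 threshold, so F1 is an assumed stand-in; the function names are hypothetical, and this is plain Python rather than the project's actual analysis code.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when flagging every score above `threshold`."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    # Convention: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibrate(scores: Sequence[float], labels: Sequence[int],
              candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the highest F1."""
    if not candidates:
        raise ValueError("at least one candidate threshold is required")
    best_threshold, best_f1 = candidates[0], -1.0
    for t in candidates:
        p, r = precision_recall(scores, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:  # ties keep the earlier candidate
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1
```

With fraud at well under 1% of transactions, a criterion built on precision and recall is more informative than accuracy, which a model could maximize by never flagging anything.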

Mapping to the Credit Card Fraud Dataset

The Kaggle credit card fraud dataset maps to this ontology as follows:

Class Label Mapping

The binary Class column in the dataset maps to the first two children of TransactionRiskClassification:

Class Value	SIGDG Category	Count	Rate
0	`SIGDG:0111 LegitimateTransaction`	94,777	99.84%
1	`SIGDG:0112 FraudulentTransaction`	149	0.16%

The fraud subtypes (0113–0116) are not distinguishable in this dataset — the Kaggle data does not include fraud mechanism labels. A production deployment would label confirmed fraud cases with the appropriate subtype during investigation, enabling future multi-class models.
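
The class label mapping and the rates in the table can be reproduced with a few lines. The counts are the ones listed above; the names `CLASS_TO_CATEGORY`, `CLASS_COUNTS`, and `class_rates` are hypothetical.

```python
# Binary Class value -> SIGDG category, as in the table above.
CLASS_TO_CATEGORY = {
    0: "SIGDG:0111",  # LegitimateTransaction
    1: "SIGDG:0112",  # FraudulentTransaction
}

# Counts copied from the table above.
CLASS_COUNTS = {0: 94_777, 1: 149}


def class_rates(counts: dict) -> dict:
    """Return each category's share of all transactions as a fraction."""
    total = sum(counts.values())
    return {CLASS_TO_CATEGORY[label]: n / total for label, n in counts.items()}


rates = class_rates(CLASS_COUNTS)
for category, rate in rates.items():
    print(f"{category}: {rate:.2%}")
```

Keeping the mapping in one dictionary means a future multi-class model only has to extend it with the subtype codes (0113–0116) once investigation labels exist.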

Extension Points

This ontology is intentionally minimal — it covers exactly what the current dataset and model support, with clearly marked placeholders for production extensions.

Near-term extensions (Phase 2–3 of CatBoost migration)

Category	What it enables
`MerchantContext` (`0124`) subtypes	MCC-based risk stratification when merchant data is available
Fraud subtype labels (`0113`–`0116`)	Multi-class classification instead of binary
Additional `RiskIndicator` subtypes	Device fingerprint anomaly, IP geolocation anomaly

Cross-ontology composition with Signals-360

Because both ontologies share the SIGDG prefix and BFO grounding, they compose naturally. A governance policy can reference both:

“Any column classified as SIGDG:0070 PaymentCardData (from Signals-360) whose transaction records are scored SIGDG:1053 HighRisk (from this extension) must be encrypted at rest and trigger a Suspicious Activity Report.”

This is the integration that BFO’s top-level alignment makes possible: the column-level metadata classification from Signals-360 and the row-level transaction risk classification from this project are two views of the same governed data, connected through a shared formal ontology.
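
The quoted policy can be sketched as a predicate over a column-level category and a row-level risk level. Everything below other than the two CURIEs is hypothetical: the `ScoredValue` type, the `policy_actions` function, and the action names are illustrative, and neither project is known to expose such an API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoredValue:
    column_category: str  # column-level classification from Signals-360
    risk_level: str       # row-level risk level from this extension


def policy_actions(value: ScoredValue) -> List[str]:
    """Apply the quoted policy; return the actions it requires, if any."""
    is_payment_card_data = value.column_category == "SIGDG:0070"
    is_high_risk = value.risk_level == "SIGDG:1053"
    if is_payment_card_data and is_high_risk:
        # Hypothetical action names standing in for real workflow hooks.
        return ["encrypt_at_rest", "file_suspicious_activity_report"]
    return []
```

The policy fires only when both conditions hold, which is the point of the composition: neither the column classification nor the risk score alone is enough to trigger it.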
