SIGDG Financial Transaction Ontology
The Signals Data Governance ontology (prefix SIGDG) is grounded in BFO 2020 (Basic Formal Ontology, ISO/IEC 21838-2:2021). BFO provides the upper-level categories that make ontologies from separate teams composable — preventing the class of integration failures Barry Smith illustrates with the 2006 Airbus A380 wiring debacle, where incompatible data representations across engineering teams cost $6 billion to correct. A shared top-level ontology is the mechanism that makes data descriptions joinable without ad-hoc reconciliation.
This page defines the financial transaction risk extension of SIGDG. It expands the existing SIGDG:0050 TransactionInformation node — which the Signals-360 metadata classification project defines as a leaf — into a full subtree covering fraud detection, transaction context, risk indicators, and regulatory scope. Both ontologies share the same CURIE prefix and BFO grounding, so categories from either project can be composed in a single governance policy.
BFO Grounding
BFO:0000031 generically dependent continuant
└── SIGDG:0001 InformationEntity ── "data that can be stored and transferred"
    └── SIGDG:0050 TransactionInformation ── "event records: orders, sessions, timestamps"
        └── SIGDG:0100 FinancialTransaction ── "monetary transfer event between parties"
BFO:0000019 quality
├── SIGDG:1001 SensitivityLevel ── "inheres in an information entity"
└── SIGDG:1050 TransactionRiskLevel ── "inheres in a financial transaction record"
BFO:0000023 role
├── SIGDG:2001 DataSubjectRole ── "externally grounded in governance context"
└── SIGDG:2040 RegulatoryFrameworkRole ── "externally grounded in regulatory scope"
BFO:0000015 process
├── SIGDG:3001 DataLifecycleProcess ── "transformation, migration, classification"
└── SIGDG:3040 FraudDetectionProcess ── "scoring, retraining, threshold calibration"
BFO’s generically dependent continuant (GDC) is the natural home for information entities — they depend on some material carrier (disk, memory, wire) but can be copied and transferred between carriers. A financial transaction record is a GDC: it describes a real-world monetary event but exists independently of any single storage medium. Risk levels are qualities that inhere in transaction records. Regulatory scope is a role — the same transaction carries different obligations under PCI-DSS vs. AML/KYC.
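The subsumption edges in the grounding tree can be checked mechanically. The sketch below is illustrative only. It uses a plain child-to-parent dictionary instead of an OWL reasoner and covers only the classes shown in the tree. The names `PARENT`, `ancestors`, and `is_subclass_of` are hypothetical and are not part of a published SIGDG artifact.

```python
from __future__ import annotations

# Hypothetical encoding of the is_a edges in the BFO grounding tree.
# Keys are class CURIEs; values are their direct superclass CURIEs.
PARENT: dict[str, str] = {
    "SIGDG:0001": "BFO:0000031",  # InformationEntity ⊑ generically dependent continuant
    "SIGDG:0050": "SIGDG:0001",   # TransactionInformation ⊑ InformationEntity
    "SIGDG:0100": "SIGDG:0050",   # FinancialTransaction ⊑ TransactionInformation
    "SIGDG:1001": "BFO:0000019",  # SensitivityLevel ⊑ quality
    "SIGDG:1050": "BFO:0000019",  # TransactionRiskLevel ⊑ quality
    "SIGDG:2001": "BFO:0000023",  # DataSubjectRole ⊑ role
    "SIGDG:2040": "BFO:0000023",  # RegulatoryFrameworkRole ⊑ role
    "SIGDG:3001": "BFO:0000015",  # DataLifecycleProcess ⊑ process
    "SIGDG:3040": "BFO:0000015",  # FraudDetectionProcess ⊑ process
}


def ancestors(curie: str) -> list[str]:
    """Return the superclass chain of `curie`, nearest first, ending at a BFO class."""
    chain: list[str] = []
    current = curie
    while current in PARENT:
        current = PARENT[current]
        chain.append(current)
    return chain


def is_subclass_of(child: str, parent: str) -> bool:
    """True if `parent` equals `child` or appears in its superclass chain."""
    return child == parent or parent in ancestors(child)


print(ancestors("SIGDG:0100"))  # ['SIGDG:0050', 'SIGDG:0001', 'BFO:0000031']
print(is_subclass_of("SIGDG:1050", "BFO:0000019"))  # True
```

A production setup would more likely load the ontology as OWL and query it with a reasoner. The dictionary form simply makes the single-inheritance chains easy to inspect.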
CURIE Prefix
SIGDG → https://signals360.dev/ontology/dg/
BFO → http://purl.obolibrary.org/obo/BFO_
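A CURIE expands to an IRI by concatenating the prefix IRI with the local part. The minimal sketch below applies that rule to the two prefixes above. The `expand_curie` helper is hypothetical, and the sketch does not check whether the resulting SIGDG IRIs actually resolve.

```python
# Prefix map copied from the CURIE Prefix section.
PREFIXES = {
    "SIGDG": "https://signals360.dev/ontology/dg/",
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'SIGDG:0100' by appending its local part to the prefix IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("SIGDG:0100"))   # https://signals360.dev/ontology/dg/0100
print(expand_curie("BFO:0000031"))  # http://purl.obolibrary.org/obo/BFO_0000031
```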
The SIGDG prefix is shared with the Signals-360 metadata governance ontology. Code ranges are partitioned to avoid collision:
| Range | Owner | Domain |
|---|---|---|
| 0001–0091 | Signals-360 | Metadata column classification |
| 0100–0199 | This project | Financial transaction risk |
| 1010–1040 | Signals-360 | Data sensitivity levels |
| 1050–1059 | This project | Transaction risk levels |
| 2010–2030 | Signals-360 | Data subject roles |
| 2040–2049 | This project | Regulatory framework roles |
| 3010–3030 | Signals-360 | Data lifecycle processes |
| 3040–3049 | This project | Fraud detection processes |
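Two projects mint codes under one prefix, so a simple guard can confirm that the ranges stay disjoint and report which project owns a given code. This is a hypothetical helper, not existing tooling. The ranges are copied from the table above and treated as inclusive.

```python
from __future__ import annotations

# (low, high, owner, domain), inclusive, copied from the partition table.
RANGES = [
    (1, 91, "Signals-360", "Metadata column classification"),
    (100, 199, "This project", "Financial transaction risk"),
    (1010, 1040, "Signals-360", "Data sensitivity levels"),
    (1050, 1059, "This project", "Transaction risk levels"),
    (2010, 2030, "Signals-360", "Data subject roles"),
    (2040, 2049, "This project", "Regulatory framework roles"),
    (3010, 3030, "Signals-360", "Data lifecycle processes"),
    (3040, 3049, "This project", "Fraud detection processes"),
]


def ranges_disjoint(ranges) -> bool:
    """True if no two inclusive (low, high, ...) ranges overlap."""
    ordered = sorted(ranges)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def owner_of(code: int) -> str | None:
    """Return the project that owns a numeric code, or None if it is unallocated."""
    for low, high, owner, _domain in RANGES:
        if low <= code <= high:
            return owner
    return None


assert ranges_disjoint(RANGES)
print(owner_of(120))  # This project
print(owner_of(70))   # Signals-360
```

A check like `ranges_disjoint` could run in CI whenever a new range is reserved, which catches collisions before any code is minted.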
Financial Transaction Hierarchy
SIGDG:0050 TransactionInformation (⊑ SIGDG:0001, from Signals-360)
└── SIGDG:0100 FinancialTransaction "monetary transfer event between parties"
    ├── SIGDG:0110 TransactionRiskClassification "risk assessment outcome for a transaction"
    │   ├── SIGDG:0111 LegitimateTransaction "transaction consistent with cardholder behavior"
    │   └── SIGDG:0112 FraudulentTransaction "unauthorized or deceptive transaction"
    │       ├── SIGDG:0113 CardNotPresentFraud "fraud via remote channel without physical card"
    │       ├── SIGDG:0114 CounterfeitCardFraud "fraud using cloned or fabricated card"
    │       ├── SIGDG:0115 LostStolenCardFraud "fraud using a card obtained without consent"
    │       └── SIGDG:0116 AccountTakeoverFraud "fraud via compromised account credentials"
    ├── SIGDG:0120 TransactionContext "observable properties of a transaction event"
    │   ├── SIGDG:0121 MonetaryContext "amount, currency, denomination"
    │   ├── SIGDG:0122 TemporalContext "timestamp, time-of-day, day-of-week patterns"
    │   ├── SIGDG:0123 BehavioralContext "latent patterns derived from transaction history"
    │   └── SIGDG:0124 MerchantContext "merchant category, location, channel"
    └── SIGDG:0130 RiskIndicator "anomalous pattern suggesting elevated risk"
        ├── SIGDG:0131 AmountAnomaly "amount deviates from cardholder norm"
        ├── SIGDG:0132 VelocityAnomaly "transaction frequency exceeds cardholder baseline"
        ├── SIGDG:0133 GeographicAnomaly "location inconsistent with cardholder history"
        └── SIGDG:0134 BehavioralAnomaly "latent behavioral pattern deviates from norm"
Hierarchy Rationale
TransactionRiskClassification (0110) models the outcome of the fraud detection model. In the current binary implementation, only LegitimateTransaction and FraudulentTransaction are used. The four fraud subtypes (0113–0116) are provided for future multi-class models and for downstream governance policies that distinguish fraud mechanisms — a lost-card fraud may trigger a card replacement workflow while an account takeover triggers a credential reset.
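The routing idea in this rationale can be sketched as a lookup that degrades gracefully under the current binary model. The workflow tags (`card_replacement`, `credential_reset`, `generic_fraud_review`) and the `response_workflow` function are hypothetical placeholders. Only the two subtype-to-workflow pairings come from the text above.

```python
from __future__ import annotations

FRAUD_SUBTYPES = {
    "SIGDG:0113": "CardNotPresentFraud",
    "SIGDG:0114": "CounterfeitCardFraud",
    "SIGDG:0115": "LostStolenCardFraud",
    "SIGDG:0116": "AccountTakeoverFraud",
}

# Hypothetical workflow tags; only these two pairings are named in the rationale.
SUBTYPE_WORKFLOW = {
    "SIGDG:0115": "card_replacement",   # LostStolenCardFraud
    "SIGDG:0116": "credential_reset",   # AccountTakeoverFraud
}


def response_workflow(classification: str) -> str | None:
    """Pick a downstream workflow for a TransactionRiskClassification CURIE."""
    if classification == "SIGDG:0111":  # LegitimateTransaction: nothing to trigger
        return None
    if classification == "SIGDG:0112":  # binary model output: mechanism unknown
        return "generic_fraud_review"
    if classification in FRAUD_SUBTYPES:
        # Subtypes without a dedicated workflow fall back to the generic review.
        return SUBTYPE_WORKFLOW.get(classification, "generic_fraud_review")
    raise ValueError(f"not a TransactionRiskClassification class: {classification!r}")


print(response_workflow("SIGDG:0116"))  # credential_reset
print(response_workflow("SIGDG:0112"))  # generic_fraud_review
```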
TransactionContext (0120) models what the features describe. The credit card fraud dataset’s 30 input columns (Time, V1–V28, Amount; 29 once Time is dropped) map to this subtree:
| Feature(s) | SIGDG Category | Notes |
|---|---|---|
| Amount | SIGDG:0121 MonetaryContext | Raw transaction amount; the only non-PCA feature retained for modeling |
| Time (dropped) | SIGDG:0122 TemporalContext | Seconds elapsed since the first transaction in the dataset; dropped in preprocessing |
| V1–V28 | SIGDG:0123 BehavioralContext | PCA-transformed latent dimensions; original features undisclosed |
| (not in dataset) | SIGDG:0124 MerchantContext | Placeholder for merchant data in future datasets |
The PCA features are deliberately anonymous — the dataset curators applied PCA for confidentiality. Each V-component encodes a linear combination of original transaction attributes. We classify them collectively as BehavioralContext rather than attempting to reverse-engineer individual semantics, because the ontology should describe what the features represent (behavioral patterns), not their mathematical form (principal components).
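Column-level annotation of the dataset follows directly from the mapping table. The sketch below assumes the Kaggle column names (`Time`, `V1` through `V28`, `Amount`) and uses a hypothetical `feature_category` helper. It returns the category CURIE for each column and rejects any column the ontology does not cover. The `Class` label column is deliberately excluded because it is an outcome, not context.

```python
import re


def feature_category(column: str) -> str:
    """Map a credit card fraud dataset column to its SIGDG TransactionContext CURIE."""
    if column == "Amount":
        return "SIGDG:0121"  # MonetaryContext
    if column == "Time":
        return "SIGDG:0122"  # TemporalContext (dropped before training)
    if re.fullmatch(r"V([1-9]|1[0-9]|2[0-8])", column):
        return "SIGDG:0123"  # BehavioralContext: PCA components V1-V28
    raise KeyError(f"no SIGDG mapping for column {column!r}")


columns = ["Time", *(f"V{i}" for i in range(1, 29)), "Amount"]
annotations = {c: feature_category(c) for c in columns}
print(len(annotations))  # 30
```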
RiskIndicator (0130) models the evidence supporting a risk classification. These correspond to interpretable signals that a human analyst would recognize:
| Indicator | What it captures | Dataset proxy |
|---|---|---|
| AmountAnomaly | Unusually large or small transaction | Extreme values in Amount |
| VelocityAnomaly | Burst of transactions in short window | (requires Time; not used in current model) |
| GeographicAnomaly | Transaction from unexpected location | (encoded in PCA components; not directly observable) |
| BehavioralAnomaly | Deviation from established spending pattern | High-magnitude PCA components |
Not all indicators are directly observable in the current dataset. The ontology defines them for completeness — a production fraud system would populate all four from raw transaction data.
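The two dataset proxies that can be computed directly can be sketched as simple flags. Everything in this sketch is an assumption. The Kaggle data carries no cardholder identifier, so `history` stands in for whatever per-cardholder baseline a production system would keep. The amount check uses the modified z-score (median and MAD) with its customary 3.5 cutoff, and the PCA cutoff of 5.0 is illustrative. Neither value is used by the current model.

```python
from statistics import median


def amount_anomaly(amount, history, cutoff=3.5):
    """SIGDG:0131 AmountAnomaly proxy using the modified z-score (median and MAD)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate baseline (at least half the amounts identical): any deviation counts.
        return amount != med
    modified_z = 0.6745 * (amount - med) / mad
    return abs(modified_z) > cutoff


def behavioral_anomaly(components, cutoff=5.0):
    """SIGDG:0134 BehavioralAnomaly proxy: any PCA component with a large magnitude."""
    return max(abs(v) for v in components) > cutoff


history = [10.0, 12.0, 11.0, 9.0, 10.0, 13.0, 11.0]
print(amount_anomaly(500.0, history))  # True
print(amount_anomaly(12.0, history))   # False
```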
Transaction Risk Levels
Risk level is modeled as a BFO quality (BFO:0000019) that inheres in a financial transaction record. This parallels the sensitivity levels in Signals-360 (SIGDG:1010–1040), but describes fraud risk rather than data sensitivity.
| CURIE | Code | Label | Definition | Threshold Guidance |
|---|---|---|---|---|
| SIGDG:1051 | 1051 | LowRisk | Transaction consistent with all behavioral baselines | Model score ≤ 0.10 |
| SIGDG:1052 | 1052 | ElevatedRisk | Transaction deviates on one or more risk indicators | 0.10 < score ≤ 0.35 |
| SIGDG:1053 | 1053 | HighRisk | Transaction strongly suggests fraudulent intent | 0.35 < score ≤ 0.80 |
| SIGDG:1054 | 1054 | ConfirmedFraud | Transaction verified as unauthorized post-investigation | score > 0.80 or manual label |
The current binary model applies a single threshold at 0.35 (the boundary between ElevatedRisk and HighRisk). The four-level scheme supports graduated response:
- LowRisk: Approve automatically.
- ElevatedRisk: Approve but flag for batch review.
- HighRisk: Require step-up authentication or manual approval.
- ConfirmedFraud: Block and initiate chargeback/investigation.
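The threshold table and the response list combine into a small lookup. In this sketch the upper bounds are inclusive to match the table, so a score of exactly 0.35 is ElevatedRisk, and a manual investigation label overrides the score. The action tags and the `risk_level` name are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds from the Threshold Guidance column.
BOUNDS = [0.10, 0.35, 0.80]
LEVELS = [
    ("SIGDG:1051", "LowRisk", "approve"),
    ("SIGDG:1052", "ElevatedRisk", "approve_and_flag_for_batch_review"),
    ("SIGDG:1053", "HighRisk", "step_up_auth_or_manual_approval"),
    ("SIGDG:1054", "ConfirmedFraud", "block_and_investigate"),
]


def risk_level(score, manual_fraud_label=False):
    """Return (CURIE, label, action) for a model score in [0, 1]."""
    if manual_fraud_label:
        return LEVELS[-1]
    # bisect_left puts a score equal to a bound into the lower band,
    # which makes each band's upper bound inclusive.
    return LEVELS[bisect_left(BOUNDS, score)]


for s in (0.05, 0.35, 0.36, 0.95):
    print(s, risk_level(s)[1])
# 0.05 LowRisk
# 0.35 ElevatedRisk
# 0.36 HighRisk
# 0.95 ConfirmedFraud
```

HighRisk and ConfirmedFraud together cover exactly the scores above 0.35. Treating those two levels as positive therefore reproduces the current binary decision.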
Relationship to Sensitivity Levels
Transaction risk levels and data sensitivity levels are orthogonal qualities that can co-occur on the same entity:
| Entity | Risk Level | Sensitivity Level | Implication |
|---|---|---|---|
| Transaction record with PAN | HighRisk | Restricted (PCI-DSS) | Block transaction AND encrypt PAN at rest |
| Transaction record without PII | LowRisk | Internal | Approve; standard access controls |
| Aggregated fraud statistics | N/A | Confidential | No per-transaction risk; protect business intelligence |
Regulatory Framework Roles
The same transaction may be subject to multiple regulatory frameworks simultaneously. These are modeled as BFO roles (BFO:0000023) — externally grounded in the legal/regulatory context rather than intrinsic to the data.
| CURIE | Code | Label | Definition |
|---|---|---|---|
| SIGDG:2041 | 2041 | PCIDSSScope | Transaction involves payment card data; subject to PCI-DSS requirements |
| SIGDG:2042 | 2042 | AMLKYCScope | Transaction subject to Anti-Money Laundering / Know Your Customer rules |
| SIGDG:2043 | 2043 | EMVLiabilityScope | Transaction subject to EMV chip liability shift rules |
A single transaction can carry all three roles. The fraud detection model’s output feeds into each framework differently:
- PCI-DSS (2041): A HighRisk classification triggers additional logging and encryption requirements for the transaction’s card data fields.
- AML/KYC (2042): Patterns of ElevatedRisk transactions across accounts trigger Suspicious Activity Report (SAR) filing obligations.
- EMV Liability (2043): The party that failed to support chip-based authentication bears fraud liability. A fraud classification triggers the chargeback, and the liability shift rules determine which party absorbs the loss.
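The per-framework effects can be expressed as a function from a transaction's risk level and role set to obligation tags. This is a hypothetical sketch with placeholder tags. It reads "HighRisk" as "HighRisk or above" and "ElevatedRisk" as "ElevatedRisk or above", and both readings are assumptions. The cross-account pattern is passed in as a flag because detecting it requires aggregation beyond a single transaction.

```python
from __future__ import annotations

PCI_DSS, AML_KYC, EMV = "SIGDG:2041", "SIGDG:2042", "SIGDG:2043"
ORDER = ["LowRisk", "ElevatedRisk", "HighRisk", "ConfirmedFraud"]


def at_least(level: str, floor: str) -> bool:
    """True if `level` is at or above `floor` in the four-level risk scheme."""
    return ORDER.index(level) >= ORDER.index(floor)


def obligations(level: str, roles: set[str], in_cross_account_pattern: bool = False) -> list[str]:
    """Hypothetical obligation tags for one transaction, given its regulatory roles."""
    duties: list[str] = []
    if PCI_DSS in roles and at_least(level, "HighRisk"):
        duties.append("pci_extra_logging_and_encryption")
    if AML_KYC in roles and in_cross_account_pattern and at_least(level, "ElevatedRisk"):
        duties.append("aml_sar_filing_review")
    if EMV in roles and level == "ConfirmedFraud":
        # Chargeback proceeds; liability shift rules decide who absorbs the loss.
        duties.append("emv_liability_assessment")
    return duties


all_roles = {PCI_DSS, AML_KYC, EMV}
print(obligations("HighRisk", all_roles, in_cross_account_pattern=True))
# ['pci_extra_logging_and_encryption', 'aml_sar_filing_review']
```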
Fraud Detection Processes
Classification and model maintenance are modeled as BFO processes (BFO:0000015):
| CURIE | Code | Label | Definition |
|---|---|---|---|
| SIGDG:3041 | 3041 | RealTimeScoring | Sub-second risk scoring at transaction authorization time |
| SIGDG:3042 | 3042 | BatchRetraining | Periodic model retraining on accumulated labeled transactions |
| SIGDG:3043 | 3043 | ThresholdCalibration | Adjusting the decision boundary to optimize precision-recall tradeoff |
In the current AMP implementation:
- RealTimeScoring corresponds to the `predict_fraud(args)` model endpoint.
- BatchRetraining corresponds to the notebook-driven distributed XGBoost/CatBoost training pipeline.
- ThresholdCalibration corresponds to the precision-recall curve analysis that selected the 0.35 threshold.
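This page does not state which criterion selected 0.35. The sketch below assumes one common choice, maximizing F-beta over candidate thresholds. It flags a transaction when `score > t`, the same rule the risk table uses. It is written in plain Python for clarity. scikit-learn's `precision_recall_curve` computes a comparable curve, although it flags scores `>=` each threshold.

```python
def precision_recall(scores, labels, t):
    """Precision and recall when transactions with score > t are flagged as fraud."""
    tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: precision taken as 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels, beta=1.0):
    """Return (threshold, F-beta) for the candidate threshold with the highest F-beta."""
    best_t, best_f = 0.0, -1.0
    # Candidates: 0.0 plus every observed score.
    for t in sorted(set(scores) | {0.0}):
        p, r = precision_recall(scores, labels, t)
        denom = beta ** 2 * p + r
        f = (1 + beta ** 2) * p * r / denom if denom else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


scores = [0.05, 0.2, 0.3, 0.4, 0.6, 0.9]
labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(scores, labels))  # (0.3, 1.0)
```

Setting `beta` above 1 weights recall more heavily. That suits settings where a missed fraud costs more than a manual review. `beta = 1` is only a neutral default.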
Mapping to the Credit Card Fraud Dataset
The Kaggle credit card fraud dataset maps to this ontology as follows:
Class Label Mapping
The binary Class column in the dataset maps to the first two children of TransactionRiskClassification:
| Class Value | SIGDG Category | Count | Rate |
|---|---|---|---|
| 0 | SIGDG:0111 LegitimateTransaction | 94,777 | 99.84% |
| 1 | SIGDG:0112 FraudulentTransaction | 149 | 0.16% |
The fraud subtypes (0113–0116) are not distinguishable in this dataset — the Kaggle data does not include fraud mechanism labels. A production deployment would label confirmed fraud cases with the appropriate subtype during investigation, enabling future multi-class models.
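The label mapping and the rates in the table can be reproduced from the counts alone. `CLASS_TO_CURIE` and `label_curie` are hypothetical names. The counts are the ones reported in the table.

```python
CLASS_TO_CURIE = {0: "SIGDG:0111", 1: "SIGDG:0112"}  # Legitimate / Fraudulent


def label_curie(class_value: int) -> str:
    """Map the dataset's binary Class value to a TransactionRiskClassification CURIE."""
    try:
        return CLASS_TO_CURIE[class_value]
    except KeyError:
        raise ValueError(f"Class must be 0 or 1, got {class_value!r}") from None


counts = {0: 94_777, 1: 149}  # counts from the Class Label Mapping table
total = sum(counts.values())
rates = {label_curie(k): round(100 * n / total, 2) for k, n in counts.items()}
print(total, rates)  # 94926 {'SIGDG:0111': 99.84, 'SIGDG:0112': 0.16}
```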
Extension Points
This ontology is intentionally minimal — it covers exactly what the current dataset and model support, with clearly marked placeholders for production extensions.
Near-term extensions (Phase 2–3 of CatBoost migration)
| Category | What it enables |
|---|---|
| MerchantContext (0124) subtypes | MCC-based risk stratification when merchant data is available |
| Fraud subtype labels (0113–0116) | Multi-class classification instead of binary |
| Additional RiskIndicator subtypes | Device fingerprint anomaly, IP geolocation anomaly |
Cross-ontology composition with Signals-360
Because both ontologies share the SIGDG prefix and BFO grounding, they compose naturally. A governance policy can reference both:
“Any column classified as SIGDG:0070 PaymentCardData (from Signals-360) whose transaction records are scored SIGDG:1053 HighRisk (from this extension) must be encrypted at rest and trigger a Suspicious Activity Report.”
This is the integration that BFO’s top-level alignment makes possible: the column-level metadata classification from Signals-360 and the row-level transaction risk classification from this project are two views of the same governed data, connected through a shared formal ontology.
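The quoted policy can be read as a predicate over a column annotation (from Signals-360) and a row's risk level (from this extension). The sketch below is hypothetical. `ColumnAnnotation`, `policy_actions`, the `card_number` column, and the action tags are illustrative names. It reads the rule literally, so only SIGDG:1053 triggers it, and a real policy engine would evaluate many rules rather than one hard-coded check.

```python
from __future__ import annotations

from dataclasses import dataclass

PAYMENT_CARD_DATA = "SIGDG:0070"  # Signals-360 column classification
HIGH_RISK = "SIGDG:1053"          # this extension's transaction risk level


@dataclass(frozen=True)
class ColumnAnnotation:
    name: str      # physical column name (hypothetical)
    category: str  # Signals-360 column classification CURIE


def policy_actions(column: ColumnAnnotation, row_risk: str) -> set[str]:
    """Actions the quoted policy requires for one (column, row) cell."""
    if column.category == PAYMENT_CARD_DATA and row_risk == HIGH_RISK:
        return {"encrypt_at_rest", "file_sar"}
    return set()


pan = ColumnAnnotation(name="card_number", category=PAYMENT_CARD_DATA)
print(sorted(policy_actions(pan, HIGH_RISK)))  # ['encrypt_at_rest', 'file_sar']
print(policy_actions(pan, "SIGDG:1052"))       # set()
```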