SIGDG Financial Transaction Ontology

The Signals Data Governance ontology (prefix SIGDG) is grounded in BFO 2020 (Basic Formal Ontology, ISO/IEC 21838-2:2021). BFO provides the upper-level categories that make ontologies from separate teams composable — preventing the class of integration failures Barry Smith illustrates with the 2006 Airbus A380 wiring debacle, where incompatible data representations across engineering teams cost $6 billion to correct. A shared top-level ontology is the mechanism that makes data descriptions joinable without ad-hoc reconciliation.

This page defines the financial transaction risk extension of SIGDG. It expands the existing SIGDG:0050 TransactionInformation node — which the Signals-360 metadata classification project defines as a leaf — into a full subtree covering fraud detection, transaction context, risk indicators, and regulatory scope. Both ontologies share the same CURIE prefix and BFO grounding, so categories from either project can be composed in a single governance policy.

BFO Grounding

BFO:0000031 generically dependent continuant
  └── SIGDG:0001 InformationEntity             ── "data that can be stored and transferred"
        └── SIGDG:0050 TransactionInformation   ── "event records: orders, sessions, timestamps"
              └── SIGDG:0100 FinancialTransaction  ── "monetary transfer event between parties"

BFO:0000019 quality
  ├── SIGDG:1001 SensitivityLevel              ── "inheres in an information entity"
  └── SIGDG:1050 TransactionRiskLevel          ── "inheres in a financial transaction record"

BFO:0000023 role
  ├── SIGDG:2001 DataSubjectRole               ── "externally grounded in governance context"
  └── SIGDG:2040 RegulatoryFrameworkRole       ── "externally grounded in regulatory scope"

BFO:0000015 process
  ├── SIGDG:3001 DataLifecycleProcess          ── "transformation, migration, classification"
  └── SIGDG:3040 FraudDetectionProcess         ── "scoring, retraining, threshold calibration"

BFO’s generically dependent continuant (GDC) is the natural home for information entities — they depend on some material carrier (disk, memory, wire) but can be copied and transferred between carriers. A financial transaction record is a GDC: it describes a real-world monetary event but exists independently of any single storage medium. Risk levels are qualities that inhere in transaction records. Regulatory scope is a role — the same transaction carries different obligations under PCI-DSS vs. AML/KYC.

CURIE Prefix

SIGDG → https://signals360.dev/ontology/dg/
BFO   → http://purl.obolibrary.org/obo/BFO_

The SIGDG prefix is shared with the Signals-360 metadata governance ontology. Code ranges are partitioned to avoid collision:

| Range | Owner | Domain |
| --- | --- | --- |
| 0001–0091 | Signals-360 | Metadata column classification |
| 0100–0199 | This project | Financial transaction risk |
| 1010–1040 | Signals-360 | Data sensitivity levels |
| 1050–1059 | This project | Transaction risk levels |
| 2010–2030 | Signals-360 | Data subject roles |
| 2040–2049 | This project | Regulatory framework roles |
| 3010–3030 | Signals-360 | Data lifecycle processes |
| 3040–3049 | This project | Fraud detection processes |
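The prefix map and range partition can be checked mechanically. The following is a minimal sketch, assuming CURIEs expand by plain concatenation of the namespace and the local code; `expand_curie` and `range_owner` are illustrative names, not part of any Signals-360 tooling.

```python
# Prefix map copied from the CURIE Prefix section.
PREFIXES = {
    "SIGDG": "https://signals360.dev/ontology/dg/",
    "BFO": "http://purl.obolibrary.org/obo/BFO_",
}

# (low, high, owner) rows copied from the range table.
RANGES = [
    (1, 91, "Signals-360"),
    (100, 199, "This project"),
    (1010, 1040, "Signals-360"),
    (1050, 1059, "This project"),
    (2010, 2030, "Signals-360"),
    (2040, 2049, "This project"),
    (3010, 3030, "Signals-360"),
    (3040, 3049, "This project"),
]


def expand_curie(curie: str) -> str:
    """Expand 'PREFIX:local' to a full IRI (assumes plain concatenation)."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def range_owner(curie: str):
    """Return the owner of a SIGDG code, or None if it falls in no listed range."""
    prefix, local = curie.split(":", 1)
    if prefix != "SIGDG":
        return None
    code = int(local)
    for low, high, owner in RANGES:
        if low <= code <= high:
            return owner
    return None


print(expand_curie("BFO:0000031"))  # http://purl.obolibrary.org/obo/BFO_0000031
print(range_owner("SIGDG:0100"))    # This project
```

A check like `range_owner` could run in CI to reject a new code that lands in the other project's partition.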

Financial Transaction Hierarchy

SIGDG:0050  TransactionInformation                 (⊑ SIGDG:0001, from Signals-360)
└── SIGDG:0100  FinancialTransaction               "monetary transfer event between parties"
    ├── SIGDG:0110  TransactionRiskClassification   "risk assessment outcome for a transaction"
    │   ├── SIGDG:0111  LegitimateTransaction       "transaction consistent with cardholder behavior"
    │   └── SIGDG:0112  FraudulentTransaction        "unauthorized or deceptive transaction"
    │       ├── SIGDG:0113  CardNotPresentFraud      "fraud via remote channel without physical card"
    │       ├── SIGDG:0114  CounterfeitCardFraud     "fraud using cloned or fabricated card"
    │       ├── SIGDG:0115  LostStolenCardFraud      "fraud using a card obtained without consent"
    │       └── SIGDG:0116  AccountTakeoverFraud     "fraud via compromised account credentials"
    ├── SIGDG:0120  TransactionContext               "observable properties of a transaction event"
    │   ├── SIGDG:0121  MonetaryContext              "amount, currency, denomination"
    │   ├── SIGDG:0122  TemporalContext              "timestamp, time-of-day, day-of-week patterns"
    │   ├── SIGDG:0123  BehavioralContext            "latent patterns derived from transaction history"
    │   └── SIGDG:0124  MerchantContext              "merchant category, location, channel"
    └── SIGDG:0130  RiskIndicator                    "anomalous pattern suggesting elevated risk"
        ├── SIGDG:0131  AmountAnomaly                "amount deviates from cardholder norm"
        ├── SIGDG:0132  VelocityAnomaly              "transaction frequency exceeds cardholder baseline"
        ├── SIGDG:0133  GeographicAnomaly            "location inconsistent with cardholder history"
        └── SIGDG:0134  BehavioralAnomaly            "latent behavioral pattern deviates from norm"

Hierarchy Rationale

TransactionRiskClassification (0110) models the outcome of the fraud detection model. In the current binary implementation, only LegitimateTransaction and FraudulentTransaction are used. The four fraud subtypes (0113–0116) are provided for future multi-class models and for downstream governance policies that distinguish fraud mechanisms. For example, a lost-card fraud may trigger a card replacement workflow, while an account takeover triggers a credential reset.
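To illustrate how a downstream policy might use the hierarchy, the sketch below encodes the tree as a child-to-parent map and walks it to answer subsumption questions. `PARENT`, `WORKFLOW`, `ancestors`, and `is_a` are names invented for this example. The two workflow entries mirror the examples in the paragraph above and are not an exhaustive policy.

```python
# Child -> parent edges, transcribed from the Financial Transaction Hierarchy.
PARENT = {
    "SIGDG:0050": "SIGDG:0001",
    "SIGDG:0100": "SIGDG:0050",
    "SIGDG:0110": "SIGDG:0100",
    "SIGDG:0111": "SIGDG:0110",
    "SIGDG:0112": "SIGDG:0110",
    "SIGDG:0113": "SIGDG:0112",
    "SIGDG:0114": "SIGDG:0112",
    "SIGDG:0115": "SIGDG:0112",
    "SIGDG:0116": "SIGDG:0112",
    "SIGDG:0120": "SIGDG:0100",
    "SIGDG:0121": "SIGDG:0120",
    "SIGDG:0122": "SIGDG:0120",
    "SIGDG:0123": "SIGDG:0120",
    "SIGDG:0124": "SIGDG:0120",
    "SIGDG:0130": "SIGDG:0100",
    "SIGDG:0131": "SIGDG:0130",
    "SIGDG:0132": "SIGDG:0130",
    "SIGDG:0133": "SIGDG:0130",
    "SIGDG:0134": "SIGDG:0130",
}

# Hypothetical mechanism-specific workflows, per the rationale text.
WORKFLOW = {
    "SIGDG:0115": "card replacement",
    "SIGDG:0116": "credential reset",
}


def ancestors(curie: str) -> list:
    """Return the chain of ancestors from nearest parent up to the root."""
    chain = []
    while curie in PARENT:
        curie = PARENT[curie]
        chain.append(curie)
    return chain


def is_a(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or is subsumed by it."""
    return child == ancestor or ancestor in ancestors(child)


print(is_a("SIGDG:0115", "SIGDG:0112"))  # True: lost/stolen card fraud is fraud
print(is_a("SIGDG:0111", "SIGDG:0112"))  # False: legitimate is not fraudulent
```

Because the binary model only emits 0111 or 0112, a policy written against `is_a(label, "SIGDG:0112")` would keep working unchanged if a later model starts emitting the subtypes.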

TransactionContext (0120) models what the features describe. The credit card fraud dataset’s 29 model features (Amount and V1–V28; Time is dropped in preprocessing) map to this subtree:

| Feature(s) | SIGDG Category | Notes |
| --- | --- | --- |
| Amount | SIGDG:0121 MonetaryContext | Raw transaction amount; the only non-PCA feature |
| Time (dropped) | SIGDG:0122 TemporalContext | Seconds since first transaction; dropped in preprocessing |
| V1–V28 | SIGDG:0123 BehavioralContext | PCA-transformed latent dimensions; original features undisclosed |
| (not in dataset) | SIGDG:0124 MerchantContext | Placeholder for merchant data in future datasets |

The PCA features are deliberately anonymous — the dataset curators applied PCA for confidentiality. Each V-component encodes a linear combination of original transaction attributes. We classify them collectively as BehavioralContext rather than attempting to reverse-engineer individual semantics, because the ontology should describe what the features represent (behavioral patterns), not their mathematical form (principal components).
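The feature mapping can be written as a small lookup, which is handy for tagging columns in a data catalog. This is a sketch only: `feature_category` is a hypothetical helper, and it assumes the column names are exactly `Amount`, `Time`, and `V1` through `V28`.

```python
import re


def feature_category(column: str) -> str:
    """Map a dataset column name to its SIGDG TransactionContext category."""
    if column == "Amount":
        return "SIGDG:0121"  # MonetaryContext
    if column == "Time":
        return "SIGDG:0122"  # TemporalContext (dropped in preprocessing)
    match = re.fullmatch(r"V(\d+)", column)
    if match and 1 <= int(match.group(1)) <= 28:
        return "SIGDG:0123"  # BehavioralContext: all PCA components together
    raise KeyError(f"no SIGDG category defined for column {column!r}")


model_columns = ["Amount"] + [f"V{i}" for i in range(1, 29)]
tags = {col: feature_category(col) for col in model_columns}
print(len(tags))  # 29
```

Classifying V1–V28 with a single rule, rather than one entry per component, keeps the code aligned with the decision to treat the PCA features collectively.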

RiskIndicator (0130) models the evidence supporting a risk classification. These correspond to interpretable signals that a human analyst would recognize:

| Indicator | What it captures | Dataset proxy |
| --- | --- | --- |
| AmountAnomaly | Unusually large or small transaction | Extreme values in Amount |
| VelocityAnomaly | Burst of transactions in short window | (requires Time; not used in current model) |
| GeographicAnomaly | Transaction from unexpected location | (encoded in PCA components; not directly observable) |
| BehavioralAnomaly | Deviation from established spending pattern | High-magnitude PCA components |

Not all indicators are directly observable in the current dataset. The ontology defines them for completeness — a production fraud system would populate all four from raw transaction data.
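As one possible reading of "extreme values in Amount", the sketch below flags amounts using a modified z-score based on the median and the median absolute deviation (MAD). This technique is swapped in purely for illustration; the source does not say how an AmountAnomaly would be computed. The dataset has no cardholder identifier, so the baseline here is global rather than per cardholder. The function name, the cutoff of 3.5, and the sample amounts are all assumptions.

```python
from statistics import median


def amount_anomalies(amounts, cutoff=3.5):
    """Flag amounts whose modified z-score exceeds `cutoff`.

    Modified z-score: 0.6745 * |x - median| / MAD. The median and MAD are
    used instead of mean and standard deviation so a few extreme values
    do not distort the baseline they are compared against.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # No spread at all: nothing can be ranked as extreme.
        return [False] * len(amounts)
    return [0.6745 * abs(a - med) / mad > cutoff for a in amounts]


sample = [10, 12, 11, 9, 10, 500]  # made-up amounts, not from the dataset
print(amount_anomalies(sample))    # [False, False, False, False, False, True]
```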

Transaction Risk Levels

Risk level is modeled as a BFO quality (BFO:0000019) that inheres in a financial transaction record. This parallels the sensitivity levels in Signals-360 (SIGDG:1010–1040), but describes fraud risk rather than data sensitivity.

| CURIE | Code | Label | Definition | Threshold Guidance |
| --- | --- | --- | --- | --- |
| SIGDG:1051 | 1051 | LowRisk | Transaction consistent with all behavioral baselines | Model score ≤ 0.10 |
| SIGDG:1052 | 1052 | ElevatedRisk | Transaction deviates on one or more risk indicators | 0.10 < score ≤ 0.35 |
| SIGDG:1053 | 1053 | HighRisk | Transaction strongly suggests fraudulent intent | 0.35 < score ≤ 0.80 |
| SIGDG:1054 | 1054 | ConfirmedFraud | Transaction verified as unauthorized post-investigation | score > 0.80 or manual label |

The current binary model applies a single threshold at 0.35 (the boundary between ElevatedRisk and HighRisk). The four-level scheme supports graduated response:

  • LowRisk: Approve automatically.
  • ElevatedRisk: Approve but flag for batch review.
  • HighRisk: Require step-up authentication or manual approval.
  • ConfirmedFraud: Block and initiate chargeback/investigation.
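The threshold guidance and the graduated responses above can be sketched as a single mapping from model score to risk level to action. The names `risk_level`, `RESPONSE`, and the `manually_confirmed` flag are illustrative. The strict inequality for the binary decision (`score > 0.35`) is an assumption inferred from the table boundaries.

```python
# Graduated responses, keyed by risk-level CURIE.
RESPONSE = {
    "SIGDG:1051": "approve automatically",
    "SIGDG:1052": "approve but flag for batch review",
    "SIGDG:1053": "require step-up authentication or manual approval",
    "SIGDG:1054": "block and initiate chargeback/investigation",
}


def risk_level(score: float, manually_confirmed: bool = False) -> str:
    """Map a model score in [0, 1] to a SIGDG risk-level CURIE."""
    if manually_confirmed or score > 0.80:
        return "SIGDG:1054"  # ConfirmedFraud
    if score > 0.35:
        return "SIGDG:1053"  # HighRisk
    if score > 0.10:
        return "SIGDG:1052"  # ElevatedRisk
    return "SIGDG:1051"      # LowRisk


def binary_flag(score: float) -> bool:
    """The current single-threshold decision at the ElevatedRisk/HighRisk boundary."""
    return score > 0.35


for score in (0.05, 0.20, 0.50, 0.95):
    level = risk_level(score)
    print(score, level, RESPONSE[level])
```

Checking from the highest band downward means each score matches exactly one level, and the upper bounds in the table (≤ 0.10, ≤ 0.35, ≤ 0.80) fall out of the ordering without being restated.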

Relationship to Sensitivity Levels

Transaction risk levels and data sensitivity levels are orthogonal qualities that can co-occur on the same entity:

| Entity | Risk Level | Sensitivity Level | Implication |
| --- | --- | --- | --- |
| Transaction record with PAN | HighRisk | Restricted (PCI-DSS) | Block transaction AND encrypt PAN at rest |
| Transaction record without PII | LowRisk | Internal | Approve; standard access controls |
| Aggregated fraud statistics | N/A | Confidential | No per-transaction risk; protect business intelligence |

Regulatory Framework Roles

The same transaction may be subject to multiple regulatory frameworks simultaneously. These are modeled as BFO roles (BFO:0000023) — externally grounded in the legal/regulatory context rather than intrinsic to the data.

| CURIE | Code | Label | Definition |
| --- | --- | --- | --- |
| SIGDG:2041 | 2041 | PCIDSSScope | Transaction involves payment card data; subject to PCI-DSS requirements |
| SIGDG:2042 | 2042 | AMLKYCScope | Transaction subject to Anti-Money Laundering / Know Your Customer rules |
| SIGDG:2043 | 2043 | EMVLiabilityScope | Transaction subject to EMV chip liability shift rules |

A single transaction can carry all three roles. The fraud detection model’s output feeds into each framework differently:

  • PCI-DSS (2041): A HighRisk classification triggers additional logging and encryption requirements for the transaction’s card data fields.
  • AML/KYC (2042): Patterns of ElevatedRisk transactions across accounts trigger Suspicious Activity Report (SAR) filing obligations.
  • EMV Liability (2043): The party that failed to support chip-based authentication bears fraud liability; the risk classification determines which party initiates the chargeback.
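Because roles are not mutually exclusive, a combinable flag type is a natural in-code representation. The sketch below uses Python's standard `enum.Flag`; the class name, member names, and `role_curies` helper are illustrative and do not come from the SIGDG definition.

```python
from enum import Flag, auto


class RegulatoryRole(Flag):
    PCI_DSS_SCOPE = auto()        # SIGDG:2041
    AML_KYC_SCOPE = auto()        # SIGDG:2042
    EMV_LIABILITY_SCOPE = auto()  # SIGDG:2043


ROLE_CURIE = {
    RegulatoryRole.PCI_DSS_SCOPE: "SIGDG:2041",
    RegulatoryRole.AML_KYC_SCOPE: "SIGDG:2042",
    RegulatoryRole.EMV_LIABILITY_SCOPE: "SIGDG:2043",
}


def role_curies(roles: RegulatoryRole) -> list:
    """List the CURIEs for every role present in a combined flag value."""
    return [curie for role, curie in ROLE_CURIE.items() if role in roles]


# A single transaction carrying all three roles at once.
all_three = (
    RegulatoryRole.PCI_DSS_SCOPE
    | RegulatoryRole.AML_KYC_SCOPE
    | RegulatoryRole.EMV_LIABILITY_SCOPE
)
print(role_curies(all_three))  # ['SIGDG:2041', 'SIGDG:2042', 'SIGDG:2043']
```

Keeping roles separate from the record's class reflects the BFO distinction: the transaction does not become a different kind of thing when a framework applies to it.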

Fraud Detection Processes

Classification and model maintenance are modeled as BFO processes (BFO:0000015):

| CURIE | Code | Label | Definition |
| --- | --- | --- | --- |
| SIGDG:3041 | 3041 | RealTimeScoring | Sub-second risk scoring at transaction authorization time |
| SIGDG:3042 | 3042 | BatchRetraining | Periodic model retraining on accumulated labeled transactions |
| SIGDG:3043 | 3043 | ThresholdCalibration | Adjusting the decision boundary to optimize precision-recall tradeoff |

In the current AMP implementation:

  • RealTimeScoring corresponds to the predict_fraud(args) model endpoint.
  • BatchRetraining corresponds to the notebook-driven distributed XGBoost/CatBoost training pipeline.
  • ThresholdCalibration corresponds to the precision-recall curve analysis that selected the 0.35 threshold.
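For readers unfamiliar with ThresholdCalibration, the sketch below shows the basic computation behind a precision-recall analysis: precision and recall evaluated at candidate thresholds. It does not reproduce the project's actual analysis, and the source does not state which criterion led to 0.35. The labels and scores are toy values invented for the example.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting fraud for score > threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s > threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s > threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s <= threshold)
    # With no positive predictions, precision is reported as 1.0 here by convention.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: 1 = fraud, 0 = legitimate. Not drawn from the real dataset.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.30, 0.40, 0.60, 0.90]

for threshold in (0.10, 0.35, 0.80):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold raises recall at the cost of precision. Calibration amounts to picking the point on that curve that matches the cost of a missed fraud versus a false alarm.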

Mapping to the Credit Card Fraud Dataset

The Kaggle credit card fraud dataset maps to this ontology as follows:

Class Label Mapping

The binary Class column in the dataset maps to the first two children of TransactionRiskClassification:

| Class Value | SIGDG Category | Count | Rate |
| --- | --- | --- | --- |
| 0 | SIGDG:0111 LegitimateTransaction | 94,777 | 99.84% |
| 1 | SIGDG:0112 FraudulentTransaction | 149 | 0.16% |

The fraud subtypes (0113–0116) are not distinguishable in this dataset because the Kaggle data does not include fraud mechanism labels. A production deployment would label confirmed fraud cases with the appropriate subtype during investigation, enabling future multi-class models.
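The class label mapping and the rates in the table can be reproduced with a few lines of arithmetic. The counts are the ones listed above; the dictionary names are illustrative.

```python
# Class column value -> SIGDG category.
CLASS_TO_SIGDG = {
    0: "SIGDG:0111",  # LegitimateTransaction
    1: "SIGDG:0112",  # FraudulentTransaction
}

# Counts per class, as listed in the Class Label Mapping table.
COUNTS = {0: 94_777, 1: 149}

total = sum(COUNTS.values())
rates = {cls: count / total for cls, count in COUNTS.items()}

for cls, rate in rates.items():
    print(CLASS_TO_SIGDG[cls], f"{rate:.2%}")
# SIGDG:0111 99.84%
# SIGDG:0112 0.16%
```

With roughly one fraud case per 640 transactions, plain accuracy is uninformative, which is why the threshold is chosen from a precision-recall analysis.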

Extension Points

This ontology is intentionally minimal — it covers exactly what the current dataset and model support, with clearly marked placeholders for production extensions.

Near-term extensions (Phase 2–3 of CatBoost migration)

| Category | What it enables |
| --- | --- |
| MerchantContext (0124) subtypes | MCC-based risk stratification when merchant data is available |
| Fraud subtype labels (0113–0116) | Multi-class classification instead of binary |
| Additional RiskIndicator subtypes | Device fingerprint anomaly, IP geolocation anomaly |

Cross-ontology composition with Signals-360

Because both ontologies share the SIGDG prefix and BFO grounding, they compose naturally. A governance policy can reference both:

“Any column classified as SIGDG:0070 PaymentCardData (from Signals-360) whose transaction records are scored SIGDG:1053 HighRisk (from this extension) must be encrypted at rest and trigger a Suspicious Activity Report.”

This is the integration that BFO’s top-level alignment makes possible: the column-level metadata classification from Signals-360 and the row-level transaction risk classification from this project are two views of the same governed data, connected through a shared formal ontology.
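The example policy quoted above can be sketched as a predicate over one category from each ontology. The function name, argument names, and obligation strings are hypothetical. A real policy engine would likely evaluate such rules declaratively rather than in hand-written code.

```python
PAYMENT_CARD_DATA = "SIGDG:0070"  # column-level category from Signals-360
HIGH_RISK = "SIGDG:1053"          # row-level risk level from this extension


def policy_obligations(column_category: str, record_risk_level: str) -> set:
    """Return the obligations the example policy attaches to a column/record pair."""
    if column_category == PAYMENT_CARD_DATA and record_risk_level == HIGH_RISK:
        return {"encrypt_at_rest", "file_suspicious_activity_report"}
    return set()


print(sorted(policy_obligations("SIGDG:0070", "SIGDG:1053")))
# ['encrypt_at_rest', 'file_suspicious_activity_report']
print(policy_obligations("SIGDG:0070", "SIGDG:1051"))  # set()
```

Both arguments are plain SIGDG CURIEs, so the rule needs no translation layer between the two projects' vocabularies.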