Use Cases Overview

Our clients are subject to regulations and risk-based compliance obligations that require them to meet certain data management standards. Examples of regulations include:

  • Bank Secrecy Act, regulated by the Financial Crimes Enforcement Network (FinCEN)
  • Qualified Financial Contracts (QFCs) reporting, regulated by the Federal Deposit Insurance Corporation (FDIC)
  • FR 2052a Complex Institution Liquidity Monitoring Report, regulated by the Federal Reserve Board
  • AnaCredit, regulated by the European Central Bank

Data Refinery Designer and Workbench allow our clients to comply with these standards by enacting data quality controls that produce a more accurate, consistent, and complete data set. The following use cases demonstrate how our clients can use Data Refinery Designer and Workbench to meet reporting needs, evaluate counterparty risk, and ultimately improve the accuracy of their data.

Data Consistency

Data consistency improves the quality of data by ensuring values are structured, aligned, and uniform within or across similar fields. Consistent data can be trusted and relied upon for making informed decisions, while inconsistencies can lead to poor business decisions and unreliable results. Data Refinery Designer is used to analyze the data and find inconsistencies. Then, Data Refinery Designer or Workbench is used for remediation.

Classification Consistency

Our clients face regulatory compliance requirements when dealing with industry classifications for clients, accounts, or counterparties. Clients use industry classifications for risk-based reporting and analysis to make well-informed decisions and increase revenue streams. In addition, clients leverage industry classifications to create and supplement their own internal entity classifications.

With global regulations requiring different industry classifications, it is common practice for our clients to manage more than one industry classification. When doing so, it is crucial that classifications are consistent with one another. When inconsistencies arise, it becomes increasingly challenging to determine which classification is accurate and which classification to report on.

Overview

This use case demonstrates a client managing three industry classifications based on reporting needs: North American Industry Classification System (NAICS), Standard Industrial Classification (SIC), and Nomenclature of Economic Activities (NACE). The client is non-compliant with respect to industry classifications and needs to identify where the issues are. The Data Refinery offering includes a Classification Consistency Model to help identify classification inconsistencies. It features an expected cross reference among the three industry classifications. The Classification Consistency Model performs the following:

  • Compare to Cross Reference. The client classifications are compared to the Data Refinery cross reference.
  • Results. Results are shown in a dashboard and each result is provided with a consistency of high, medium, or low.
  • Remediate. Results with a consistency of medium or low should be remediated.
    • If the model is able to recommend a classification, then the remediation can be automated with Data Refinery Designer.
    • Otherwise, classification results can be queued in Data Refinery Workbench to manually review and remediate.

The consistency model performs a check on the following industry classifications:

  • NAICS. North American Industry Classification System is the standard used by Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy.
  • SIC. Standard Industrial Classification is an older system for classifying businesses that has been replaced by NAICS. However, it is still widely used in the industry as a supplemental classification.
  • NACE. Nomenclature of Economic Activities is a European classification system used to classify businesses.

Compare to Cross Reference

Data uploaded to Data Refinery Designer can be cross referenced to compare the NAICS, SIC, and NACE classifications. The classifications are similar, but not identical. This means that a 1:1 mapping of classifications does not always exist and that one classification can be aligned to multiple classifications of another type. The following table provides an example:

| NAICS | NAICS Description | NACE | NACE Description | SIC | SIC Description |
| --- | --- | --- | --- | --- | --- |
| 221122 | Electric Power Distribution | 35.13 | Distribution of Electricity | 4911 | Electric Services |
| | | | | 4931 | Electric and Other Services Combined |
| | | | | 4939 | Combination Utilities, NEC |
| | | 35.14 | Trade of Electricity | 4911 | Electric Services |
| | | | | 4931 | Electric and Other Services Combined |
| | | | | 4939 | Combination Utilities, NEC |
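
Because the mapping is one-to-many, a cross reference is easier to work with as a set of aligned rows than as a single lookup value. The following is a minimal Python sketch, assuming the cross reference is available as (NAICS, NACE, SIC) rows. The names `CROSS_REFERENCE` and `aligned_codes` are hypothetical and are not part of Data Refinery.

```python
from collections import defaultdict

# Rows copied from the example table above: (NAICS, NACE, SIC).
CROSS_REFERENCE = {
    ("221122", "35.13", "4911"),
    ("221122", "35.13", "4931"),
    ("221122", "35.13", "4939"),
    ("221122", "35.14", "4911"),
    ("221122", "35.14", "4931"),
    ("221122", "35.14", "4939"),
}


def aligned_codes(cross_reference):
    """Index each NAICS code to the sets of NACE and SIC codes aligned with it."""
    index = defaultdict(lambda: {"nace": set(), "sic": set()})
    for naics, nace, sic in cross_reference:
        index[naics]["nace"].add(nace)
        index[naics]["sic"].add(sic)
    return index


index = aligned_codes(CROSS_REFERENCE)
print(sorted(index["221122"]["nace"]))  # ['35.13', '35.14']
print(sorted(index["221122"]["sic"]))   # ['4911', '4931', '4939']
```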

Consistency Results

Results are shown in a dashboard and each result is assigned a consistency of high, medium, or low. If the consistency model is able to determine which classification is inconsistent, that classification is identified in the result. Where possible, a recommended classification is also provided to help automate the remediation effort.

  • High. All classifications are aligned in the cross reference and are considered consistent.

| ID | Name | Consistency | NAICS | NAICS Description | NACE | NACE Description | SIC | SIC Description | Inconsistent Classification | Recommended Classification | Recommended Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 203871484 | SP Distribution PLC | High | 221122 | Electric Power Distribution | 35.13 | Distribution of Electricity | 4911 | Electric Services | | | |

  • Medium. One or more of the classifications are not aligned in the cross reference and are considered inconsistent.

| ID | Name | Consistency | NAICS | NAICS Description | NACE | NACE Description | SIC | SIC Description | Inconsistent Classification | Recommended Classifications | Recommended Descriptions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 945733216 | FirstEnergy Corp | Medium | 221122 | Electric Power Distribution | 35.13 | Distribution of Electricity | 6282 | Investment Advice | SIC | 4911 | Electric Services |
| | | | | | | | | | | 4931 | Electric and Other Services Combined |
| | | | | | | | | | | 4939 | Combination Utilities, NEC |

  • Low. None of the classifications are aligned in the cross reference and are all considered inconsistent.

| ID | Name | Consistency | NAICS | NAICS Description | NACE | NACE Description | SIC | SIC Description | Inconsistent Classification | Recommended Classification | Recommended Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 329865418 | OGE Energy Corp | Low | 221122 | Electric Power Distribution | 85.11 | Hospital activities | 6282 | Investment Advice | ALL | | |
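
The three consistency levels can be illustrated with a short Python sketch. It is not the Classification Consistency Model itself: the function name `rate_consistency` and the rule for picking the inconsistent classification (the one field whose other two fields agree with a cross-reference row) are assumptions made for illustration.

```python
# Cross-reference rows from the earlier example: (NAICS, NACE, SIC).
CROSS_REFERENCE = {
    ("221122", "35.13", "4911"),
    ("221122", "35.13", "4931"),
    ("221122", "35.13", "4939"),
    ("221122", "35.14", "4911"),
    ("221122", "35.14", "4931"),
    ("221122", "35.14", "4939"),
}
FIELDS = ("NAICS", "NACE", "SIC")


def rate_consistency(record, cross_reference=CROSS_REFERENCE):
    """Return (consistency, inconsistent classification, recommended codes).

    `record` is a (NAICS, NACE, SIC) tuple for one entity.
    """
    if record in cross_reference:
        return "High", None, []
    # Assumed rule: a field is the outlier when the other two fields
    # agree with at least one cross-reference row.
    outliers = {}
    for i, field in enumerate(FIELDS):
        others = [j for j in range(3) if j != i]
        matches = sorted(
            row[i]
            for row in cross_reference
            if all(row[j] == record[j] for j in others)
        )
        if matches:
            outliers[field] = matches
    if not outliers:
        return "Low", "ALL", []
    if len(outliers) == 1:
        field, recommended = next(iter(outliers.items()))
        return "Medium", field, recommended
    # Several pairings agree, so no single outlier can be named.
    return "Medium", None, []


print(rate_consistency(("221122", "35.13", "4911")))  # SP Distribution PLC
print(rate_consistency(("221122", "35.13", "6282")))  # FirstEnergy Corp
print(rate_consistency(("221122", "85.11", "6282")))  # OGE Energy Corp
```

Under these assumptions, the three calls reproduce the high, medium, and low rows shown above, including the three recommended SIC codes for the medium result.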

Remediation

The client should consider remediation via Data Refinery Designer for consistency results of medium or low. If results have a recommended classification provided by the model, then the client can choose to accept the result and apply the update. For results without a recommended classification, the client can queue the results in Data Refinery Workbench to manually review and remediate issues.

Address Consistency

Our clients face regulatory and risk-based compliance requirements when dealing with address data, as it is used to identify a client, account, or counterparty. The location of the entity can also dictate specific regulations, restrictions, and reporting standards based on the country or region. Data Refinery Designer is used to analyze the data and find address consistency issues. Data Refinery Designer or Workbench can be used to remediate these issues.

Most clients follow ISO standards, such as ISO 3166-1 (country) and ISO 3166-2 (state province), to conform to international standards and meet regulatory compliance. However, addresses are complicated and contain multiple fields that need to remain consistent. In addition, some address fields are free text, which can lead to inconsistent terms, abbreviations, and values across similar or identical addresses. Clients may also manage more than one address type, such as physical and registered addresses, which adds a layer of complexity.

Overview

This use case demonstrates a client managing two address types: physical and registered. The client has been found to be non-compliant because of incorrect locations on counterparties and needs to identify where the issues are. The Data Refinery offering includes an Address Consistency Model to help identify address inconsistencies. It compares United States addresses to United States Postal Service (USPS) data and evaluates international addresses. The Address Consistency Model performs the following:

  • Compare Address to USPS. United States addresses are matched to the USPS external data source to determine inconsistencies.
  • Evaluate International City & State. International addresses are evaluated based on city and state to determine inconsistencies.
  • Evaluate International City & Postal. International addresses are evaluated based on city and postal to determine inconsistencies.
  • Results. Results are shown in a dashboard for both United States and International address inconsistencies.
  • Remediate. USPS addresses can be used to remediate and automate the United States address results using Data Refinery Designer. Data Refinery Workbench can be used to manually review and remediate both United States and International addresses.

Compare Address to USPS

The client addresses in the United States are compared to the USPS external data source by matching on postal code. The results identify inconsistencies in the address based on the combination of city, state, and postal code. For each result, one or more of the address fields is inaccurate and needs to be updated. The following table provides an example:

Note. The Entity Count provides the number of impacted entities that contain the address.

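| Address Type | State | City | Postal Code | USPS State | USPS City | USPS Postal Code | Entity Count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Physical | Alabama | Birmingham | 35223 | Alabama | Vestavia | 35223 | 9 |
| | | | 35243 | | Vestavia | 35243 | 12 |
| | | | 35244 | | Hoover | 35243 | 2 |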
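
The comparison can be sketched in Python as a lookup keyed on postal code. This is a minimal illustration, assuming the USPS data has been loaded into a dictionary. `USPS_BY_POSTAL` and `find_usps_inconsistencies` are hypothetical names, and the sample addresses are modeled on the first two rows of the table.

```python
from collections import Counter

# Hypothetical stand-in for the USPS reference data: postal code -> (state, city).
USPS_BY_POSTAL = {
    "35223": ("Alabama", "Vestavia"),
    "35243": ("Alabama", "Vestavia"),
}


def find_usps_inconsistencies(addresses, usps_by_postal=USPS_BY_POSTAL):
    """Count entities for each (client address, USPS address) mismatch.

    `addresses` holds one (address_type, state, city, postal) tuple per entity.
    """
    counts = Counter()
    for address_type, state, city, postal in addresses:
        usps = usps_by_postal.get(postal)
        if usps is None:
            continue  # no USPS match on postal code; not handled in this sketch
        if (state, city) != usps:
            counts[(address_type, state, city, postal, *usps)] += 1
    return counts


addresses = (
    [("Physical", "Alabama", "Birmingham", "35223")] * 9
    + [("Physical", "Alabama", "Birmingham", "35243")] * 12
    + [("Physical", "Alabama", "Vestavia", "35243")] * 3  # already consistent
)
results = find_usps_inconsistencies(addresses)
for key, entity_count in results.items():
    print(key, entity_count)
```

The sketch compares values exactly. A real comparison would likely standardize case and abbreviations before matching.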

Evaluate International City & State

International addresses are evaluated to determine where a city is aligned to more than one state province within the same country. In the example below, it is recommended to use “Lombardia” as the state province because 98% of the results have this value.

| Address Type | Country | City | State Provinces | Entity Count |
| --- | --- | --- | --- | --- |
| Registered | Italy | Milano | Abruzzo | 1 |
| | | | Lazio | 1 |
| | | | Lombardia | 120 |
| | | | Veneto | 1 |
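
The 98% figure is the share of entities with the most common state province: 120 of 123. A minimal Python sketch of this evaluation follows, assuming one (country, city, state province) tuple per entity. The function name `city_state_conflicts` is hypothetical.

```python
from collections import Counter, defaultdict


def city_state_conflicts(addresses):
    """Find cities aligned to more than one state province within a country.

    Returns {(country, city): (most common state province, share of entities)}.
    """
    states_by_city = defaultdict(Counter)
    for country, city, state in addresses:
        states_by_city[(country, city)][state] += 1

    conflicts = {}
    for key, state_counts in states_by_city.items():
        if len(state_counts) > 1:
            state, count = state_counts.most_common(1)[0]
            conflicts[key] = (state, count / sum(state_counts.values()))
    return conflicts


addresses = [("Italy", "Milano", "Lombardia")] * 120 + [
    ("Italy", "Milano", "Abruzzo"),
    ("Italy", "Milano", "Lazio"),
    ("Italy", "Milano", "Veneto"),
]
state, share = city_state_conflicts(addresses)[("Italy", "Milano")]
print(state, f"{share:.0%}")  # Lombardia 98%
```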

Evaluate International City & Postal

International addresses are evaluated to determine where a postal code is aligned to more than one city within the same country. In the example below, it is recommended to use “New South Wales” as the state province and “Sydney” as the city based on 99% of the results having this combination.

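| Address Type | Country | Postal | State Province | City/Cities | Entity Counts |
| --- | --- | --- | --- | --- | --- |
| Physical | Australia | 2000 | Australian Capital Territory | Sydney | 1 |
| | | | New South Wales | Barangaroo | 1 |
| | | | | North Sydney | 1 |
| | | | | Sydney | 260 |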
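
Here the recommendation is a combination of state province and city, so the entity counts are compared per combination: 260 of 263 entities is roughly 99%. The Python sketch below shows one way to turn that share into a decision. The 95% cut-off (`AUTOMATION_THRESHOLD`) and the function name `recommend_for_postal` are assumptions for illustration, not documented Data Refinery behavior.

```python
from collections import Counter

# Assumed cut-off for treating the dominant combination as safe to automate.
AUTOMATION_THRESHOLD = 0.95


def recommend_for_postal(combinations, threshold=AUTOMATION_THRESHOLD):
    """Recommend a (state province, city) combination for one postal code.

    `combinations` maps (state_province, city) -> entity count.
    Returns (recommended combination, share of entities, suggested action).
    """
    counts = Counter(combinations)
    best, best_count = counts.most_common(1)[0]
    share = best_count / sum(counts.values())
    action = "automate" if share >= threshold else "manual review"
    return best, share, action


postal_2000 = {
    ("Australian Capital Territory", "Sydney"): 1,
    ("New South Wales", "Barangaroo"): 1,
    ("New South Wales", "North Sydney"): 1,
    ("New South Wales", "Sydney"): 260,
}
best, share, action = recommend_for_postal(postal_2000)
print(best, f"{share:.0%}", action)  # ('New South Wales', 'Sydney') 99% automate
```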

Consistency Results

Results are shown in a dashboard for both United States and International address inconsistencies. A breakdown by address type is also provided.

Remediation

The client should consider remediation of all inconsistencies found. The client can use the USPS address data to automate the remediation for United States inconsistencies. The client can leverage the entity count to determine the most likely result to automate for International addresses. Otherwise, the results can be queued in Data Refinery Workbench to manually review and remediate.

Data Enrichment

Data enrichment improves the usefulness of data by leveraging external data sources and analysis techniques to add new fields or values that are otherwise not available. Enrichment provides a better overall picture of the data, making it more reliable and accurate. Changes in regulation or reporting compliance may require a client to quickly onboard a new source or attribute, and data enrichment can streamline this process.

Classification Enrichment

Our clients must be flexible when new regulation and compliance requirements oblige them to report on additional classifications and attributes. This forces our clients to enrich their data and onboard new classifications. Even though the client may already manage industry classifications, mapping an existing classification to a new one is challenging, time-consuming, and can lead to inaccuracies. In addition, it requires existing classifications to be accurate; otherwise, any inaccuracies are carried over to the new classification.

Overview

This use case demonstrates a client needing to comply with new regulation related to the FR 2052a Complex Institution Liquidity Monitoring Report. The client only manages NAICS as an industry classification and needs to onboard FR 2052a to be compliant. The Data Refinery offering includes a Classification Enrichment Model that incorporates NAICS recommendations to help onboard FR 2052a classifications. The Classification Enrichment Model takes the client’s current data and recommends FR 2052a classifications based on the following criteria:

  • External Data Sources. Client data is matched to a number of external data sources that are used to determine a classification.
  • Evaluate Name. The entity name is evaluated using analysis techniques and methods, such as name parsing and keyword identification, to help determine a classification.
  • Evaluate Data Points. Other classifications and data points, such as NAICS, SIC, NACE, and entity type are used in determining a classification.
  • Recommendations. A confidence level of high, medium, or low is provided for each recommendation based on the strength of criteria used for evaluation.
  • Enrich Data. The client may use Designer to choose what recommendations to accept based on confidence, and may queue lower confidence recommendations in Data Refinery Workbench for manual remediation.

The enrichment model produces recommendations for the following industry classifications:

  • FR 2052a. Federal Reserve 2052a report contains counterparty classification types used as additional information to monitor the liquidity data for banks.
  • NAICS. North American Industry Classification System is the standard used by Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy.

External Data Sources

The Classification Enrichment Model uses regulatory data sets and publicly available sources to provide an FR 2052a recommendation. This requires matching the client data to the external data source using name, address, and/or company identifier. Standardization is performed on matched data points to ensure the highest possible match rate. This includes removing spaces, punctuation, and special characters, and standardizing common abbreviations and legal structures in the entity name.
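
The standardization steps can be sketched in Python as follows. This is an illustration only: the abbreviation list `ABBREVIATIONS` and the function name `standardize_name` are hypothetical, and the mappings the model actually applies are not documented here.

```python
import re

# Hypothetical abbreviation and legal-structure mappings; a real list would be longer.
ABBREVIATIONS = {
    "CORPORATION": "CORP",
    "INCORPORATED": "INC",
    "LIMITED": "LTD",
    "COMPANY": "CO",
}


def standardize_name(name):
    """Normalize an entity name into a key suitable for matching."""
    # Uppercase, then drop punctuation and special characters inside each
    # token so that "L.L.C." and "LLC" compare equal.
    tokens = [re.sub(r"[^A-Z0-9]", "", token) for token in name.upper().split()]
    # Standardize abbreviations while word boundaries are still available.
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens if token]
    # Remove spaces last.
    return "".join(tokens)


print(standardize_name("Blackrock Advisors, LLC"))    # BLACKROCKADVISORSLLC
print(standardize_name("BlackRock Advisors L.L.C."))  # BLACKROCKADVISORSLLC
print(standardize_name("Neva Plastics Manufacturing Corporation"))
```

Abbreviations are replaced before spaces are removed because the replacement depends on whole words. Once the tokens are joined, "CORPORATION" can no longer be told apart from the same letters inside a longer name.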

Evaluate Name

The name of the entity is evaluated using analysis methods, such as name parsing and keyword identification. The enrichment model identifies key terms and naming conventions that are specific to an industry classification type. Conversely, it looks for terms that would suggest a different classification to help reduce false positives.
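
A minimal Python sketch of keyword identification with supporting and conflicting terms is shown below. The keyword lists and the function name `evaluate_name` are hypothetical, and the terms the enrichment model actually uses may differ.

```python
import re

# Hypothetical keyword lists for a single FR 2052a counterparty type.
SUPPORTING_TERMS = {
    "Investment Company or Advisor": {"ADVISORS", "ADVISERS", "CAPITAL", "ASSET"},
}
CONFLICTING_TERMS = {
    "Investment Company or Advisor": {"MANUFACTURING", "HOSPITAL", "UTILITY"},
}


def evaluate_name(name, classification):
    """Return 'supports', 'conflicts', or 'neutral' for a candidate classification."""
    tokens = set(re.findall(r"[A-Z0-9]+", name.upper()))
    # Check conflicting terms first: a contrary term overrides a supporting
    # one, which is how this sketch guards against false positives.
    if tokens & CONFLICTING_TERMS.get(classification, set()):
        return "conflicts"
    if tokens & SUPPORTING_TERMS.get(classification, set()):
        return "supports"
    return "neutral"


candidate = "Investment Company or Advisor"
print(evaluate_name("Blackrock Advisors, LLC", candidate))           # supports
print(evaluate_name("Neva Plastics Manufacturing Corp", candidate))  # conflicts
print(evaluate_name("Capital Manufacturing Inc", candidate))         # conflicts
```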

Evaluate Data Points

The Classification Enrichment Model leverages other data points, such as additional classifications and entity type, to help refine and strengthen the results. However, this relies on the client’s other classifications being accurate; otherwise, inaccuracies can skew the recommendation results.

Recommendations

Classification recommendations are shown in a dashboard and each recommendation has a confidence of high, medium, or low. If the Classification Enrichment Model is unable to recommend a classification because of limited or conflicting information, then no recommendation is provided.

  • High. A recommendation is matched to a trusted regulatory source, or there is strong agreement between the evaluated criteria.
    • For example, Blackrock Advisors, LLC is matched to the regulatory source IAPD Advisors and is a registered investment advisor.

| ID | Name | Confidence | NAICS | NAICS Description | Recommended FR 2052a | Recommended NAICS | Recommended Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 986415584 | Blackrock Advisors, LLC | High | 523940 | Portfolio Management and Investment Advice | Investment Company or Advisor | 523940 | Portfolio Management and Investment Advice |

  • Medium. A recommendation is matched to a third-party data source, or there is moderate agreement between the evaluated criteria.
    • For example, Neva Plastics Manufacturing Corp contains the keyword “Manufacturing” and the NAICS is in the manufacturing sector.

| ID | Name | Confidence | NAICS | NAICS Description | Recommended FR 2052a | Recommended NAICS | Recommended Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 836529547 | Neva Plastics Manufacturing Corp | Medium | 311511 | Fluid Milk Manufacturing | Non-Financial Corporate | 311511 | Fluid Milk Manufacturing |

  • Low. A recommendation has minimal agreement between the evaluated criteria.
    • For example, Brahman C.P.F. Partners, L.P. has naming conventions and structuring similar to a private equity or hedge fund. However, the NAICS of “525910” conflicts with this and suggests it is a mutual fund. A recommended NAICS of “525990” is provided, suggesting the current NAICS is inaccurate.

| ID | Name | Confidence | NAICS | NAICS Description | Recommended FR 2052a | Recommended NAICS | Recommended Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 622781863 | Brahman C.P.F. Partners, L.P. | Low | 525910 | Open-End Investment Funds | Non-regulated Fund | 525990 | Other Financial Vehicles |
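
One way to picture how the confidence levels relate to the criteria is the Python sketch below. The scoring rule is an assumption for illustration: how the Classification Enrichment Model actually weighs its criteria is not documented here. The function name `confidence_level` and the numeric thresholds are hypothetical.

```python
def confidence_level(regulatory_match, third_party_match, agreeing, conflicting):
    """Return 'High', 'Medium', 'Low', or None when no recommendation is made.

    agreeing / conflicting: how many evaluated criteria (name, NAICS, SIC,
    NACE, entity type, ...) support or contradict the candidate classification.
    The thresholds below are assumed for illustration.
    """
    net = agreeing - conflicting
    if regulatory_match or net >= 3:
        return "High"    # trusted regulatory source or strong agreement
    if third_party_match or net == 2:
        return "Medium"  # third-party source or moderate agreement
    if net == 1:
        return "Low"     # minimal agreement
    return None          # limited or conflicting information


# Matched to a regulatory source.
print(confidence_level(True, False, agreeing=1, conflicting=0))   # High
# Keyword and NAICS sector agree, no source match.
print(confidence_level(False, False, agreeing=2, conflicting=0))  # Medium
# Name and legal structure agree, current NAICS conflicts.
print(confidence_level(False, False, agreeing=2, conflicting=1))  # Low
# Evidence cancels out, so no recommendation is made.
print(confidence_level(False, False, agreeing=1, conflicting=1))  # None
```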

Enrich Data

The client can enrich their data model by adding the recommended FR 2052a classification. They may choose to accept the recommendations without review for the higher confidence levels and may choose to queue lower confidence recommendations in Data Refinery Workbench for manual review. In addition, a recommended NAICS is provided and the client can choose to remediate any differences by accepting the recommended NAICS.
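
The accept-or-queue decision can be sketched in Python as a simple split by confidence. Which confidence levels to accept without review is the client's choice. The default of accepting only high confidence, the record layout, and the function name `route_recommendations` are assumptions for illustration.

```python
def route_recommendations(recommendations, auto_accept=("High",)):
    """Split recommendations into those accepted as-is and those queued for review."""
    accepted, queued = [], []
    for rec in recommendations:
        target = accepted if rec["confidence"] in auto_accept else queued
        target.append(rec)
    return accepted, queued


# IDs, confidence levels, and NAICS codes from the example tables above.
recommendations = [
    {"id": "986415584", "confidence": "High", "naics": "523940", "recommended_naics": "523940"},
    {"id": "836529547", "confidence": "Medium", "naics": "311511", "recommended_naics": "311511"},
    {"id": "622781863", "confidence": "Low", "naics": "525910", "recommended_naics": "525990"},
]
accepted, queued = route_recommendations(recommendations)
# Entities whose current NAICS differs from the recommended NAICS.
naics_differences = [r["id"] for r in recommendations if r["naics"] != r["recommended_naics"]]

print([r["id"] for r in accepted])  # ['986415584']
print([r["id"] for r in queued])    # ['836529547', '622781863']
print(naics_differences)            # ['622781863']
```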


Copyright © 2025 Kingland Systems LLC