Use Cases Overview

Our clients are subject to regulations and risk-based compliance requirements that oblige them to meet certain data management standards. Examples of regulation include:

Bank Secrecy Act, regulated by the Financial Crimes Enforcement Network (FinCEN)
Qualified Financial Contracts (QFCs) reporting, regulated by the Federal Deposit Insurance Corporation (FDIC)
FR 2052a Complex Institution Liquidity Monitoring Report, regulated by the Federal Reserve Board
AnaCredit, regulated by the European Central Bank

Data Refinery Designer and Workbench allow our clients to comply with these standards by enacting data quality controls that produce a more accurate, consistent, and complete data set. The following use cases demonstrate how our clients can use Data Refinery Designer and Workbench to meet reporting needs, evaluate counterparty risk, and ultimately improve the accuracy of their data.

Table of contents

Data Consistency
- Classification Consistency
- Address Consistency
Data Enrichment
- Classification Enrichment

Data Consistency

Data consistency improves the quality of data by ensuring values are structured, aligned, and uniform within or across similar fields. Consistent data can be trusted and relied upon for making informed decisions, while inconsistencies can lead to poor business decisions and inconclusive results. Data Refinery Designer is used to analyze the data and find inconsistencies. Then, Data Refinery Designer or Workbench is used for remediation.

Classification Consistency

Our clients face regulatory compliance requirements when dealing with industry classifications for clients, accounts, or counterparties. Clients use industry classifications for risk-based reporting and analysis to make well-informed decisions and increase revenue streams. In addition, clients leverage industry classifications to create and supplement their own internal entity classifications.

With global regulations requiring different industry classifications, it is common practice for our clients to manage more than one industry classification. When doing so, it is crucial that classifications are consistent with one another. When inconsistencies arise, it becomes increasingly challenging to determine which classification is accurate and which classification to report on.

Overview

This use case demonstrates a client managing three industry classifications based on reporting needs: North American Industry Classification System (NAICS), Standard Industrial Classification (SIC), and Nomenclature of Economic Activities (NACE). The client is non-compliant with respect to industry classifications and needs to identify where the issues are. The Data Refinery offering includes a Classification Consistency Model to help identify classification inconsistencies. It features an expected cross reference among the three industry classifications. The Classification Consistency Model performs the following:

Compare to Cross Reference. The client classifications are compared to the Data Refinery cross reference.
Results. Results are shown in a dashboard and each result is provided with a consistency of high, medium, or low.
Remediate. Results with a consistency of medium or low should be remediated.
- If the model is able to recommend a classification, then the remediation can be automated with Data Refinery Designer.
- Otherwise, classification results can be queued in Data Refinery Workbench to manually review and remediate.

The consistency model performs a check on the following industry classifications:

NAICS. North American Industry Classification System is the standard used by Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy.
SIC. Standard Industrial Classification is an outdated industry classification, replaced by NAICS, used to classify businesses. However, it is still widely used in the industry as a supplemental classification.
NACE. Nomenclature of Economic Activities is a European classification system used to classify businesses.

Compare to Cross Reference

Data uploaded to Data Refinery Designer can be cross referenced to compare the NAICS, SIC, and NACE classifications.
The classifications are similar, but not identical. This means that a 1:1 mapping of classifications does not always exist and that one classification can be aligned to multiple classifications of another type. The following table provides an example:

NAICS	NAICS Description	NACE	NACE Description	SIC	SIC Description
221122	Electric Power Distribution	35.13	Distribution of Electricity	4911	Electric Services
				4931	Electric and Other Services Combined
				4939	Combination Utilities, NEC
		35.14	Trade of Electricity	4911	Electric Services
				4931	Electric and Other Services Combined
				4939	Combination Utilities, NEC
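As a rough illustration of the one-to-many alignment shown in the table, the sketch below stores each cross-reference row as a (NAICS, NACE, SIC) tuple and indexes the rows by NAICS code. The names `CROSS_REFERENCE` and `build_index` are hypothetical, and the actual storage format of the Data Refinery cross reference is not described here.

```python
from collections import defaultdict

# Hypothetical cross-reference rows (NAICS, NACE, SIC), copied from the example table.
CROSS_REFERENCE = [
    ("221122", "35.13", "4911"),
    ("221122", "35.13", "4931"),
    ("221122", "35.13", "4939"),
    ("221122", "35.14", "4911"),
    ("221122", "35.14", "4931"),
    ("221122", "35.14", "4939"),
]


def build_index(rows):
    """Map each NAICS code to the NACE and SIC codes it is aligned with."""
    index = defaultdict(lambda: {"nace": set(), "sic": set()})
    for naics, nace, sic in rows:
        index[naics]["nace"].add(nace)
        index[naics]["sic"].add(sic)
    return index


index = build_index(CROSS_REFERENCE)
print(sorted(index["221122"]["nace"]))  # ['35.13', '35.14']
print(sorted(index["221122"]["sic"]))   # ['4911', '4931', '4939']
```

One NAICS code maps to two NACE codes and three SIC codes here, which is why a consistency check cannot assume a single expected value per classification.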

Consistency Results

Results are shown in a dashboard and each result contains a consistency of high, medium, or low. If the consistency model is able to determine which classification is inconsistent, then that classification is identified. Where possible, the model also provides a recommended classification to help automate the remediation effort.

High. All classifications are aligned in the cross reference and are considered consistent.

ID	Name	Consistency	NAICS	NAICS Description	NACE	NACE Description	SIC	SIC Description	Inconsistent Classification	Recommended Classification	Recommended Description
203871484	SP Distribution PLC	High	221122	Electric Power Distribution	35.13	Distribution of Electricity	4911	Electric Services	–	–	–

Medium. One or more of the classifications are not aligned in the cross reference and are considered inconsistent.

ID	Name	Consistency	NAICS	NAICS Description	NACE	NACE Description	SIC	SIC Description	Inconsistent Classification	Recommended Classifications	Recommended Descriptions
945733216	FirstEnergy Corp	Medium	221122	Electric Power Distribution	35.13	Distribution of Electricity	6282	Investment Advice	SIC	4911	Electric Services
										4931	Electric and Other Services Combined
										4939	Combination Utilities, NEC

Low. None of the classifications are aligned in the cross reference and are all considered inconsistent.

ID	Name	Consistency	NAICS	NAICS Description	NACE	NACE Description	SIC	SIC Description	Inconsistent Classification	Recommended Classification	Recommended Description
329865418	OGE Energy Corp	Low	221122	Electric Power Distribution	85.11	Hospital activities	6282	Investment Advice	ALL	–	–
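The three consistency levels above can be sketched as a small rating function. This is only an approximation under stated assumptions: the model's actual rules are not documented here, so the sketch assumes that a result is High when the full combination appears in the cross reference, Low when no pair of classifications is aligned, and Medium otherwise. The names `rate`, `pair_aligned`, and `CROSS_REFERENCE` are hypothetical.

```python
from itertools import combinations

FIELDS = ("naics", "nace", "sic")

# Hypothetical cross reference, built from the example rows for NAICS 221122.
CROSS_REFERENCE = [
    {"naics": "221122", "nace": nace, "sic": sic}
    for nace in ("35.13", "35.14")
    for sic in ("4911", "4931", "4939")
]


def pair_aligned(record, a, b):
    """True if the record's values for fields a and b appear together in a row."""
    return any(row[a] == record[a] and row[b] == record[b] for row in CROSS_REFERENCE)


def rate(record):
    """Return (consistency, inconsistent classification, recommended codes)."""
    if any(all(row[f] == record[f] for f in FIELDS) for row in CROSS_REFERENCE):
        return "High", None, []
    aligned = [(a, b) for a, b in combinations(FIELDS, 2) if pair_aligned(record, a, b)]
    if not aligned:
        return "Low", "ALL", []
    if len(aligned) == 1:
        # Two classifications agree, so the third one is the outlier.
        a, b = aligned[0]
        odd = next(f for f in FIELDS if f not in (a, b))
        recommended = sorted(
            {row[odd] for row in CROSS_REFERENCE
             if row[a] == record[a] and row[b] == record[b]}
        )
        return "Medium", odd.upper(), recommended
    return "Medium", None, []  # assumed fallback: outlier cannot be isolated


print(rate({"naics": "221122", "nace": "35.13", "sic": "6282"}))
# ('Medium', 'SIC', ['4911', '4931', '4939'])
```

With the example rows, this reproduces the three sample results: SP Distribution PLC rates High, FirstEnergy Corp rates Medium with SIC flagged, and OGE Energy Corp rates Low.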

Remediation

The client should consider remediation via Data Refinery Designer for consistency results of medium or low. If results have a recommended classification provided by the model, then the client can choose to accept the result and apply the update. For results without a recommended classification, the client can queue the results in Data Refinery Workbench to manually review and remediate issues.
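The routing described above might look like the following sketch. It assumes each result is a plain dictionary with hypothetical `id`, `consistency`, and `recommended` keys; it does not represent a Data Refinery API.

```python
def plan_remediation(results):
    """Split medium and low results into Designer updates and a Workbench queue."""
    designer, workbench = [], []
    for result in results:
        if result["consistency"] == "High":
            continue  # already consistent, nothing to remediate
        if result["recommended"]:
            designer.append(result["id"])   # a recommendation exists: client may accept it
        else:
            workbench.append(result["id"])  # no recommendation: manual review
    return designer, workbench


results = [
    {"id": 203871484, "consistency": "High", "recommended": []},
    {"id": 945733216, "consistency": "Medium", "recommended": ["4911", "4931", "4939"]},
    {"id": 329865418, "consistency": "Low", "recommended": []},
]
designer, workbench = plan_remediation(results)
print(designer)   # [945733216]
print(workbench)  # [329865418]
```

When a result carries several recommended classifications, as in the FirstEnergy Corp example, the client still has to choose one of them before applying the update.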

Address Consistency

Our clients face regulatory and risk-based compliance requirements when dealing with address data, as it is used to identify a client, account, or counterparty. The location of the entity can also dictate specific regulations, restrictions, and reporting standards based on the country or region. Data Refinery Designer is used to analyze the data and find address consistency issues. Data Refinery Designer or Workbench can be used to remediate these issues.

Most clients follow ISO standards, such as ISO 3166-1 (country) and ISO 3166-2 (state province), to conform to international standards and meet regulatory compliance. However, addresses are complicated and contain multiple fields that need to remain consistent. In addition, some address fields are free text, which can lead to inconsistent terms, abbreviations, and values across similar or identical addresses. Clients may also manage more than one address type, such as physical and registered addresses, which adds a layer of complexity.

Overview

This use case demonstrates a client managing two address types: physical and registered. The client has been found to be non-compliant because of incorrect locations on counterparties and needs to identify where the issues are. The Data Refinery offering includes an Address Consistency Model to help identify address inconsistencies. It compares United States addresses to United States Postal Service (USPS) data and evaluates international addresses. The Address Consistency Model performs the following:

Compare Address to USPS. United States addresses are matched to the USPS external data source to determine inconsistencies.
Evaluate International City & State. International addresses are evaluated based on city and state to determine inconsistencies.
Evaluate International City & Postal. International addresses are evaluated based on city and postal to determine inconsistencies.
Results. Results are shown in a dashboard for both United States and International address inconsistencies.
Remediate. USPS addresses can be used to remediate and automate the United States address results using Data Refinery Designer. Data Refinery Workbench can be used to manually review and remediate both United States and International addresses.

Compare Address to USPS

The client addresses in the United States are compared to the USPS external data source by matching on postal code. The results identify inconsistencies in the combination of city, state, and postal code. When an inconsistency is found, one or more of the address fields is inaccurate and needs to be updated. The following table provides an example:

Note. The Entity Count provides the number of impacted entities that contain the address.

Address Type	State	City	Postal Code	USPS State	USPS City	USPS Postal Code	Entity Count
Physical	Alabama	Birmingham	35223	Alabama	Vestavia	35223	9
			35243		Vestavia	35243	12
			35244		Hoover	35244	2
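A minimal sketch of the postal-code match follows. `USPS_BY_POSTAL` is a hypothetical in-memory extract that stands in for the USPS external data source, and `compare_to_usps` is an illustrative name, not part of the product.

```python
# Hypothetical extract of USPS reference data keyed by postal code.
USPS_BY_POSTAL = {
    "35223": {"state": "Alabama", "city": "Vestavia"},
    "35243": {"state": "Alabama", "city": "Vestavia"},
}


def compare_to_usps(address):
    """Return the fields that disagree with USPS, or None if the postal code is unknown."""
    reference = USPS_BY_POSTAL.get(address["postal"])
    if reference is None:
        return None  # no USPS row to compare against
    return [
        field for field in ("state", "city")
        if address[field].strip().casefold() != reference[field].casefold()
    ]


client_address = {"state": "Alabama", "city": "Birmingham", "postal": "35223"}
print(compare_to_usps(client_address))  # ['city']
```

The comparison ignores case and surrounding whitespace, which is an assumption made here so that free-text differences such as "birmingham " do not register as separate inconsistencies.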

Evaluate International City & State

International addresses are evaluated to determine where a city is aligned to more than one state province within the same country. In the example below, it is recommended to use “Lombardia” as the state province because 98% of the results have this value.

Address Type	Country	City	State Provinces	Entity Count
Registered	Italy	Milano	Abruzzo	1
			Lazio	1
			Lombardia	120
			Veneto	1
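The majority-based recommendation in this example can be sketched as follows. The function name `recommend_state` and the 90% threshold (`min_share`) are assumptions for illustration; the model's actual cutoff is not stated.

```python
from collections import Counter, defaultdict


def recommend_state(records, min_share=0.9):
    """For each (country, city) linked to several state provinces, suggest the dominant one."""
    counts = defaultdict(Counter)
    for country, city, state, entity_count in records:
        counts[(country, city)][state] += entity_count
    suggestions = {}
    for key, states in counts.items():
        if len(states) < 2:
            continue  # only one state province: already consistent
        state, top = states.most_common(1)[0]
        share = top / sum(states.values())
        if share >= min_share:
            suggestions[key] = (state, round(share, 2))
    return suggestions


records = [
    ("Italy", "Milano", "Abruzzo", 1),
    ("Italy", "Milano", "Lazio", 1),
    ("Italy", "Milano", "Lombardia", 120),
    ("Italy", "Milano", "Veneto", 1),
]
print(recommend_state(records))  # {('Italy', 'Milano'): ('Lombardia', 0.98)}
```

Here 120 of 123 entities carry "Lombardia", which rounds to the 98% quoted above.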

Evaluate International City & Postal

International addresses are evaluated to determine where a postal code is aligned to more than one city within the same country. In the example below, it is recommended to use “New South Wales” as the state province and “Sydney” as the city based on 99% of the results having this combination.

Address Type	Country	Postal	State Province	City/Cities	Entity Counts
Physical	Australia	2000	Australian Capital Territory	Sydney	1
			New South Wales	Barangaroo	1
				North Sydney	1
				Sydney	260
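For the postal code example, a similar sketch can separate the dominant state province and city combination from the outliers that are candidates for review. The name `split_by_majority` is hypothetical, and the rows are copied from the table for postal code 2000.

```python
from collections import Counter

# (state province, city, entity count) for Australia, postal code 2000.
rows = [
    ("Australian Capital Territory", "Sydney", 1),
    ("New South Wales", "Barangaroo", 1),
    ("New South Wales", "North Sydney", 1),
    ("New South Wales", "Sydney", 260),
]


def split_by_majority(rows):
    """Return the dominant (state, city) pair, its share, and the outlier pairs."""
    counts = Counter()
    for state, city, entity_count in rows:
        counts[(state, city)] += entity_count
    dominant, top = counts.most_common(1)[0]
    share = top / sum(counts.values())
    outliers = sorted(pair for pair in counts if pair != dominant)
    return dominant, share, outliers


dominant, share, outliers = split_by_majority(rows)
print(dominant, round(share, 2))  # ('New South Wales', 'Sydney') 0.99
```

The outlier list is only a starting point for review: a minority value is not necessarily wrong, so lower-share combinations may still warrant a manual check in Data Refinery Workbench.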

Consistency Results

Results are shown in a dashboard for both United States and International address inconsistencies. A breakdown by address type is also provided.

Remediation

The client should consider remediation of all inconsistencies found. The client can use the USPS address data to automate the remediation for United States inconsistencies. The client can leverage the entity count to determine the most likely result to automate for International addresses. Otherwise, the results can be queued in Data Refinery Workbench to manually review and remediate.

Data Enrichment

Data enrichment improves the usefulness of data by leveraging external data sources and analysis techniques to add new fields or values that are otherwise not available. Enrichment provides a better overall picture of the data, making it more reliable and accurate. Changes in regulation or reporting compliance may require a client to quickly onboard a new source or attribute, and data enrichment can streamline this process.

Classification Enrichment

Our clients must be flexible when new regulation and compliance requirements oblige them to report on additional classifications and attributes. This forces our clients to enrich their data and onboard new classifications. Even though the client may already manage industry classifications, mapping an existing classification to a new one is challenging and time consuming, and it can lead to inaccuracies. In addition, it requires existing classifications to be accurate; otherwise, any inaccuracies are carried over to the new classification.

Overview

This use case demonstrates a client needing to comply with new regulation related to the FR 2052a Complex Institution Liquidity Monitoring Report. The client only manages NAICS as an industry classification, and needs to onboard FR 2052a to be compliant. The Data Refinery offering includes a Classification Enrichment Model that incorporates NAICS recommendations to help onboard FR 2052a classifications. The Classification Enrichment Model takes the client’s current data and recommends FR 2052a classifications based on the following criteria:

External Data Sources. Client data is matched to a number of external data sources that are used to determine a classification.
Evaluate Name. The entity name is evaluated using analysis techniques and methods, such as name parsing and keyword identification, to help determine a classification.
Evaluate Data Points. Other classifications and data points, such as NAICS, SIC, NACE, and entity type are used in determining a classification.
Recommendations. A confidence level of high, medium, or low is provided for each recommendation based on the strength of criteria used for evaluation.
Enrich Data. The client may use Designer to choose what recommendations to accept based on confidence, and may queue lower confidence recommendations in Data Refinery Workbench for manual remediation.

The enrichment model produces recommendations for the following industry classifications:

FR 2052a. Federal Reserve 2052a report contains counterparty classification types used as additional information to monitor the liquidity data for banks.
NAICS. North American Industry Classification System is the standard used by Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy.

External Data Sources

The Classification Enrichment Model uses regulatory data sets and publicly available sources to provide an FR 2052a recommendation. This requires matching the client data to the external data source using name, address, and/or company identifier. Standardization is performed on matched data points to ensure the highest possible match rate. For entity names, this includes removing spaces, punctuation, and special characters, and standardizing common abbreviations and legal structures.
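The name standardization described above could be sketched roughly as below. The abbreviation table is a hypothetical stand-in (a production list would be much longer), and the accent folding step is an added assumption about how special characters are handled.

```python
import re
import unicodedata

# Hypothetical abbreviation table for legal structures.
ABBREVIATIONS = {
    "corporation": "corp",
    "incorporated": "inc",
    "limited": "ltd",
    "company": "co",
}


def standardize_name(name):
    """Build a comparison key for an entity name."""
    # Fold accented characters to ASCII (assumption: accents are not significant).
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then drop punctuation and special characters.
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    # Standardize common abbreviations and legal structures.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    # Remove spaces by joining the tokens.
    return "".join(tokens)


print(standardize_name("Blackrock Advisors, LLC"))  # blackrockadvisorsllc
print(standardize_name("Blackrock Advisors LLC"))   # blackrockadvisorsllc
```

Both spellings produce the same key, so a match on the standardized name succeeds even though the comma differs between the client record and the external source.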

Evaluate Name

The name of the entity is evaluated using analysis methods, such as name parsing and keyword identification. The enrichment model identifies key terms and naming conventions that are specific to an industry classification type. Conversely, it looks for terms that would suggest a different classification to help combat false positives.
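A simplified sketch of keyword identification with conflicting terms is shown below. The keyword lists `SUPPORTING` and `CONFLICTING` are invented for illustration; the model's actual terms and naming conventions are not documented here.

```python
import re

# Hypothetical keyword lists per FR 2052a classification type.
SUPPORTING = {
    "Non-Financial Corporate": {"manufacturing", "plastics"},
    "Investment Company or Advisor": {"advisors", "advisers", "asset", "investment"},
}
# Terms that suggest a different classification and so guard against false positives.
CONFLICTING = {
    "Non-Financial Corporate": {"bank", "fund", "capital"},
    "Investment Company or Advisor": {"manufacturing"},
}


def keyword_candidates(name):
    """Return classifications supported by the name and not contradicted by it."""
    tokens = set(re.findall(r"[a-z]+", name.lower()))
    candidates = []
    for classification, terms in SUPPORTING.items():
        supported = bool(tokens & terms)
        contradicted = bool(tokens & CONFLICTING.get(classification, set()))
        if supported and not contradicted:
            candidates.append(classification)
    return candidates


print(keyword_candidates("Neva Plastics Manufacturing Corp"))
# ['Non-Financial Corporate']
```

A name that mixes signals, such as one containing both "Investment" and "Manufacturing", yields no candidate in this sketch, which mirrors the idea of withholding a name-based suggestion when the terms conflict.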

Evaluate Data Points

The Classification Enrichment Model leverages other data points, such as additional classifications and entity type, to help refine and strengthen the results. However, this relies on the client's other classifications being accurate; otherwise, inaccuracies can skew the recommendation results.

Recommendations

Classification recommendations are shown in a dashboard and each recommendation has a confidence of high, medium, or low. If the Classification Enrichment Model is unable to recommend a classification based on limited or conflicting information, then no recommendation is provided.

High. A recommendation is matched to a trusted regulatory source, or there is strong agreement between the evaluated criteria.
- For example, Blackrock Advisors, LLC is matched to the regulatory source IAPD Advisors and is a registered investment advisor.

ID	Name	Confidence	NAICS	NAICS Description	Recommended FR 2052a	Recommended NAICS	Recommended Description
986415584	Blackrock Advisors, LLC	High	523940	Portfolio Management and Investment Advice	Investment Company or Advisor	523940	Portfolio Management and Investment Advice

Medium. A recommendation is matched to a third-party data source, or there is moderate agreement between the evaluated criteria.
- For example, Neva Plastics Manufacturing Corp contains the keyword “Manufacturing” and the NAICS is in the manufacturing sector.

ID	Name	Confidence	NAICS	NAICS Description	Recommended FR 2052a	Recommended NAICS	Recommended Description
836529547	Neva Plastics Manufacturing Corp	Medium	311511	Fluid Milk Manufacturing	Non-Financial Corporate	311511	Fluid Milk Manufacturing

Low. A recommendation has minimal agreement between the evaluated criteria.
- For example, Brahman C.P.F. Partners, L.P. has naming conventions and structuring similar to a private equity or hedge fund. However, the NAICS of “525910” conflicts with this and suggests it is a mutual fund. A recommended NAICS of “525990” is provided, suggesting the current NAICS is inaccurate.

ID	Name	Confidence	NAICS	NAICS Description	Recommended FR 2052a	Recommended NAICS	Recommended Description
622781863	Brahman C.P.F. Partners, L.P.	Low	525910	Open-End Investment Funds	Non-regulated Fund	525990	Other Financial Vehicles

Enrich Data

The client can enrich their data model by adding the recommended FR 2052a classification. They may choose to accept the recommendations without review for the higher confidence levels and may choose to queue lower confidence recommendations in Data Refinery Workbench to manually review. In addition, recommended NAICS is provided and the client can choose to remediate any differences by accepting the recommended NAICS.
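The accept-or-queue decision can be sketched as a simple split on confidence. The dictionary keys, the function name `split_recommendations`, and the default of auto-accepting only High confidence are assumptions; the client chooses the actual threshold.

```python
CONFIDENCE_RANK = {"Low": 0, "Medium": 1, "High": 2}


def split_recommendations(recommendations, auto_accept_from="High"):
    """Partition recommendations into auto-accepted and queued-for-review lists."""
    threshold = CONFIDENCE_RANK[auto_accept_from]
    accepted, queued = [], []
    for rec in recommendations:
        target = accepted if CONFIDENCE_RANK[rec["confidence"]] >= threshold else queued
        target.append(rec)
    return accepted, queued


# Values copied from the three example recommendations.
recommendations = [
    {"id": 986415584, "confidence": "High", "naics": "523940", "recommended_naics": "523940"},
    {"id": 836529547, "confidence": "Medium", "naics": "311511", "recommended_naics": "311511"},
    {"id": 622781863, "confidence": "Low", "naics": "525910", "recommended_naics": "525990"},
]

accepted, queued = split_recommendations(recommendations)
print([rec["id"] for rec in accepted])  # [986415584]
print([rec["id"] for rec in queued])    # [836529547, 622781863]

# Records where the recommended NAICS differs from the current NAICS.
naics_changes = [rec["id"] for rec in recommendations
                 if rec["naics"] != rec["recommended_naics"]]
print(naics_changes)  # [622781863]
```

Passing `auto_accept_from="Medium"` would also accept the Neva Plastics Manufacturing Corp recommendation without review, leaving only the Low confidence result in the queue.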