Export Source Data

This document will explain how to export Source information from the Data Warehouse and how to re-upload the exported data to a new Source. The export of this data will require setting up a database tool, such as DBeaver. See the Query the Data Warehouse page for help on how to do this.

Additionally, to export the Source data, Sources must be uploaded to the Data Warehouse. See Set-up and Query Source Versions for further instruction on this.

Table of contents

About the Example

Two versions of the same Source customer_table will be used for this example. The two versions of the customer data uploaded have the same number and naming of columns. However, the first version does not have language data and the second version does not have any job_title data.

Moreover, not all of the last names match. So assume, for this example, the first version is more accurate than the second. However, the first version will not always have last names populated. So, if a last_name exists in the first version, then that will be used. Otherwise, the last_name from second version will be used.

Query the Source Data Versions

  1. From the database tool, open the table and view the two versions of your Source. Sort by id to easily view the versions.

    Query the Source Data Versions

  2. Create a query that compares the table to itself.

    Query the Source Data Versions

  3. Adjust the query to alter the result set. In this query, there are three changes being made to the data.

    • First, the last_name field is being populated by the first version, unless the field is null. In which case, the second version last_name field would be used. In this case, the query above Matias (id = 5), has a different last_name value for each version and this query has made the first version a priority for this field.
    • Second, the first version has the job_title field populated and the second version does not. So, the first version is used to populate that field in the result set.
    • Third, the second version has the language field populated and the first version does not. So, the second version is used to populate the language field in the result set.

    Query the Source Data Versions

Export Queried Source Data

  1. Right click the result set in the database tool and click Export Data.
  2. Then, click the Export to CSV file option and click Next.

    Export Queried Source Data

  3. If the result set is over 10,000 rows, click the Extract type dropdown and select By segments to increase or reduce “Fetch size” as needed. When done, click Next.

    Export Queried Source Data

  4. For the Format settings, the default settings for a csv are sufficient.

    Export Queried Source Data

  5. Next, on the Output page, there is the “Directory” field that represents where the file will be saved. This can be changed as needed. The file name field uses the table name and timestamp as the default value for the name, but this can also be changed as needed. Once the needed changes have been made, click Next.

    Export Queried Source Data

  6. Verify Source and Target settings then confirm by clicking Proceed.

    Export Queried Source Data

Upload Refined Source Data and View All Sources

Create a third version for the Source and upload the new file to it. For instructions on how to do this, see the Create a New Version and Upload Data to the New Version sections of the Set-up and Query Versions page.

Glue Crawlers are scheduled to run every hour. The uploaded data will be available to view in the database tool once the Glue Crawler has finished traversing the new data.

After the Crawler has run, open the database tool and refresh the table. Open the customers_table and sort by id. All three versions of the Source, including the refined Source version, can be found here.

Export Queried Source Data


Copyright © 2025 Kingland Systems LLC