Sources

Data Refinery Sources represent external systems that provide data to Data Refinery. Sources are created and managed under the Projects tab of Data Refinery Designer (DR Designer), where each Source is displayed under every Project it is associated with. A user can see a Project and its Sources only with the proper permissions or an assigned User Project Role (UPR). Once a Source is created, use the Source Upload action to bind the Source to its external data. Sources can also be managed through the Data Refinery Designer API (DR Designer API).


How to Create a Source

Data is made available for querying in Data Refinery through Data Refinery Sources. Follow the steps below to create a new Source.

  1. After a user logs into Data Refinery Designer, the Projects page loads with the Projects list in a column on the left of the screen. Select the Project in which to create the new Source.

    If a user needs to create a new Project, see How to Create a Project.

  2. Once the Project is selected, its details appear to the right of the Projects list. Click the Create Source button.

    Create Source Button

  3. Fill out the fields in the “Create Source” form.

    Source Form Example

    • Name: Required. Preferred name of the Source.
    • Description: Required. Additional details about the Source, including an explanation of its Type and any special characters.
    • Type: Required. Source Types indicate characteristics of the data bound to the Source, such as whether it comes from an external system, is a mix of data, or is finished for external use. Although the value is informational for now, it may eventually be used to support different Source behaviors.
      • External: Outside system data.
      • Blended: Mix of data or Sources to create a new Source.
      • Mastered: Reporting or dashboard data in its final form.

      Note: Source Types can be configured.

    • Classification: Required. File type of the data to be uploaded. Options are CSV, JSON, JSONL, ORC, and Parquet.
  4. Click Create at the bottom of the form.

    Once the Source has been created, it will appear in the Sources section of the Project details.

    Source List Example

    Refer to the POST (Create a Source) API in the DR Designer API Projects Reference for instructions on how to create Sources programmatically.
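    For readers scripting Source creation, here is a minimal Python sketch that only builds the request with the standard library and checks the form rules described above: all four fields required, and classification limited to the listed file types. The base URL, the `/sources` path, the JSON field names, and the `project_id` parameter are hypothetical placeholders, and authentication is omitted. Check the DR Designer API Projects Reference for the actual request contract.

```python
import json
import urllib.request
from typing import Iterable

# Hypothetical base URL; replace with your DR Designer API endpoint.
BASE_URL = "https://dr-designer.example.com/api"

# Default Source Types from the form. Source Types are configurable,
# so pass your own set if your deployment differs.
DEFAULT_SOURCE_TYPES = ("External", "Blended", "Mastered")
CLASSIFICATIONS = {"CSV", "JSON", "JSONL", "ORC", "PARQUET"}


def build_create_source_request(
    project_id: str,
    name: str,
    description: str,
    source_type: str,
    classification: str,
    allowed_types: Iterable[str] = DEFAULT_SOURCE_TYPES,
) -> urllib.request.Request:
    """Build, but do not send, a hypothetical "create Source" POST request."""
    # Name and Description are required text fields.
    for label, value in (("name", name), ("description", description)):
        if not value.strip():
            raise ValueError(f"{label} is required")
    if source_type not in set(allowed_types):
        raise ValueError(f"unknown Source type: {source_type}")
    # Classification is compared case-insensitively ("parquet" == "Parquet").
    if classification.upper() not in CLASSIFICATIONS:
        raise ValueError(f"unsupported classification: {classification}")
    payload = {  # hypothetical field names
        "projectId": project_id,
        "name": name,
        "description": description,
        "type": source_type,
        "classification": classification,
    }
    return urllib.request.Request(
        f"{BASE_URL}/sources",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

    After you add whatever authentication your deployment requires, you could send the request with `urllib.request.urlopen(req)`. Validating on the client only gives earlier feedback. The API remains the authority on which values it accepts.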

How to Update Source Details

Any user who can view a Source in DR Designer can update its details. To edit an existing Source, follow the steps below.

  1. Select the Projects tab in the navigational bar at the top of the page.

  2. To locate the desired Source, first find the Project it is associated with by scrolling down the Projects list or typing the Project name in the search bar. Select the Project from the list on the left of the page and click its arrow to reveal a dropdown list of associated Sources.

    Project Arrow Button

  3. Select the Source that needs to be updated.

    Source Selection Under Project

  4. Once the Source details are open, the “Name,” “Type,” and “Description” fields can be updated, as indicated by the transparent pencil icon to the right of each field.

    Hover over “Name,” “Type,” or “Description” and select the field to edit it.

    • For Source Name, edit the field by selecting the box and typing a new name. All printable ASCII characters are valid.
    • For Source Type, click the field to display a dropdown arrow, then select the arrow to view the available options.
    • For Source Description, edit the field by selecting the box and typing a new description.

    Editable Fields for Source

  5. After making a change, select the green checkmark that appears directly below the “Name,” “Type,” or “Description” field (see image below), or click away, to save it. Select the ‘X’ to discard the change.

    Approve Edit Green Checkmark

Refer to the PUT (Update Source) API in the DR Designer API Projects Reference for instructions on how to update Sources programmatically.
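The Python sketch below builds an update request for the three fields that are editable in the UI. It also checks the documented rule that a Source name may contain only printable ASCII characters. The base URL, the `/sources/{id}` path, and the JSON field names are hypothetical. The sketch also assumes that PUT expects all three editable fields together, which should be confirmed against the API reference.

```python
import json
import urllib.request

BASE_URL = "https://dr-designer.example.com/api"  # hypothetical


def is_valid_source_name(name: str) -> bool:
    """True if the name is non-empty and every character is printable ASCII (space through '~')."""
    return bool(name) and all(" " <= ch <= "~" for ch in name)


def build_update_source_request(
    source_id: str, name: str, source_type: str, description: str
) -> urllib.request.Request:
    """Build, but do not send, a hypothetical PUT request for a Source's editable fields."""
    if not is_valid_source_name(name):
        raise ValueError("Source name must contain only printable ASCII characters")
    # Hypothetical field names; all three editable fields are sent together.
    payload = {"name": name, "type": source_type, "description": description}
    return urllib.request.Request(
        f"{BASE_URL}/sources/{source_id}",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

The name check compares character ranges instead of using `str.isprintable()`, because `isprintable()` also accepts non-ASCII characters such as `é`.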

How to Upload Source Data

Once a Data Refinery Source is defined, it must be bound to external data before it can support queries. The action that binds external data to a Source is called “Upload.” If a Source does not have a classification, one is assigned based on the extension of the next file uploaded to it. Follow the steps below to upload Source data.

  1. To locate the desired Source, first find the Project it is associated with by scrolling down the list or typing the Project name in the search bar on the Projects page. Select the Project from the list on the left of the page and click its arrow to reveal a dropdown list of associated Sources.

    Project Arrow Button

  2. Select the Source to upload data to.

    Source Selection Under Project

  3. The Source details will open on the right side of the page. Click the Upload Dataset button. This will open an “Upload Dataset” form.

    Upload Dataset Button

  4. Use the dropdown in the “Upload to Version” field to select the version to upload data to. Then choose a file by clicking the Choose File button under the “Upload File” field.

    Upload Dataset Form

    Currently supported file types are CSV, JSON, JSONL, ORC, and Parquet.

  5. When the fields are complete, select Create. A green banner will appear at the top of the screen indicating a successful upload of the dataset.

    Success Banner

    The dataset is now uploaded and linked to the Source. However, the uploaded file is not available to query until AWS Glue crawls the dataset, and when that crawl runs depends on configuration settings.
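The classification rule at the start of this section says that a Source without a classification takes one from the extension of the next uploaded file. It can be sketched in Python as follows. The extension-to-classification mapping is an assumption built from the supported file types listed above. So is the choice to reject unsupported extensions. Data Refinery's actual behavior for mismatched or unknown extensions is not documented here.

```python
from pathlib import PurePath
from typing import Optional

# Assumed mapping from lowercase file extension to Source classification.
EXTENSION_TO_CLASSIFICATION = {
    ".csv": "CSV",
    ".json": "JSON",
    ".jsonl": "JSONL",
    ".orc": "ORC",
    ".parquet": "Parquet",
}


def infer_classification(filename: str, current: Optional[str] = None) -> str:
    """Return the Source classification after uploading `filename`.

    An existing classification is returned unchanged; otherwise one is
    derived from the file extension.
    """
    if current:
        return current
    suffix = PurePath(filename).suffix.lower()
    try:
        return EXTENSION_TO_CLASSIFICATION[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None
```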

Refer to the POST (Upload data to Source) sources/{ID}/upload API in the DR Designer API Projects Reference for instructions on uploading Source data programmatically.
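Here is a minimal Python sketch of building a call to the `sources/{ID}/upload` endpoint. The endpoint path comes from the reference above. The base URL is a placeholder. Passing the version and file name as query parameters, and sending the file as raw bytes, are assumptions, since the real API may expect a multipart form. The caller reads the file, so the function has no disk I/O of its own.

```python
import urllib.parse
import urllib.request
from pathlib import PurePath

BASE_URL = "https://dr-designer.example.com/api"  # hypothetical
SUPPORTED_EXTENSIONS = {".csv", ".json", ".jsonl", ".orc", ".parquet"}


def build_upload_request(
    source_id: str, version: int, filename: str, content: bytes
) -> urllib.request.Request:
    """Build, but do not send, a POST to sources/{ID}/upload.

    Assumptions: `version` and `filename` travel as query parameters and the
    file body is sent as raw bytes.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    query = urllib.parse.urlencode({"version": version, "filename": PurePath(filename).name})
    url = f"{BASE_URL}/sources/{urllib.parse.quote(source_id, safe='')}/upload?{query}"
    return urllib.request.Request(
        url,
        data=content,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

A caller might pass `Path("prices.csv").read_bytes()` as `content`. As with uploads in the UI, a successful response does not make the data queryable until AWS Glue has crawled it.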

How to Create a New Source Version

Data Refinery Sources can be versioned: each Source can hold many versions of data, and Data Refinery automatically indexes queries against these versions. To create a new Source Version, follow the steps below.

  1. To locate the desired Source, first find the Project it is associated with by scrolling down the list or typing the Project name in the search bar on the Projects page. Select the Project from the list on the left of the page and click its arrow to reveal a dropdown list of associated Sources.

    Project Arrow Button

  2. Select the Source for which to create a new version.

    Source Selection Under Project

  3. The Source details will open on the right side of the page. Click the Create Version button. This will open a “Create Version” form.

    Create Version Button

  4. Add a version comment, if desired, and click the Create button. The new version appears under the version tab of the Source details view, and data can now be uploaded to it.

    Create Version Form

For instructions on how to create a new Version for a Source programmatically, refer to the POST sources/{ID}/versions API in the DR Designer API Projects Reference.
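Here is a short Python sketch of building that `sources/{ID}/versions` request. The endpoint path comes from the reference above. The base URL and the `comment` field name are hypothetical. The comment is optional, matching the “Create Version” form.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://dr-designer.example.com/api"  # hypothetical


def build_create_version_request(
    source_id: str, comment: Optional[str] = None
) -> urllib.request.Request:
    """Build, but do not send, a POST to sources/{ID}/versions."""
    # Only include the (hypothetically named) comment field when one is given.
    payload = {"comment": comment} if comment else {}
    return urllib.request.Request(
        f"{BASE_URL}/sources/{urllib.parse.quote(source_id, safe='')}/versions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```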

How to Sample Data

DR Designer runs a limited query to show a sample of the data uploaded to a Source. This preview lets users confirm that the data is in the correct place. The sample appears under the Sample Data tab of a Source in DR Designer. To sample the data uploaded to a Source, follow the steps below.

  1. To locate the desired Source, first find the Project it is associated with by scrolling down the list or typing the Project name in the search bar on the Projects page. Select the Project from the list on the left of the page and click its arrow to reveal a dropdown list of associated Sources.

    Project Arrow Button

  2. Select the Source whose data should be sampled.

    Source Selection Under Project

  3. The Source details will open to the right of the page. Click the Sample Data tab.

    Sample Data Button

  4. If the data has been uploaded and processed in the database (that is, AWS Glue has crawled the dataset), a table automatically appears showing the first ten entries.

    Sample Data Table
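Sample data only appears after AWS Glue has crawled the dataset. A script that uploads data may therefore want to wait for it. The Python sketch below polls until rows come back and then keeps the first ten, matching the Sample Data tab. `fetch_sample` is a hypothetical callable standing in for whatever API call returns sample rows. The sketch also assumes that an empty result means "not crawled yet". The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, List, Optional, Sequence

SAMPLE_SIZE = 10  # the Sample Data tab shows the first ten entries


def wait_for_sample(
    fetch_sample: Callable[[], Sequence[dict]],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Poll `fetch_sample` until it returns rows.

    Returns the first SAMPLE_SIZE rows, or None once roughly `timeout_s`
    seconds have been spent sleeping between empty results.
    """
    waited = 0.0
    while True:
        rows = list(fetch_sample())
        if rows:
            return rows[:SAMPLE_SIZE]
        if waited >= timeout_s:
            return None
        sleep(interval_s)  # injectable so tests can skip real waiting
        waited += interval_s
```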

How to Delete a Source

To delete a Source, follow the steps below.

  1. On the Projects page, navigate to the desired Project and select the dropdown arrow to expose its associated Sources.

    Project Arrow Button

  2. Select the Source to delete. This opens the Source details on the right side of the page.

    Source Selection Under Project

  3. Click the Delete Source button in the top right-hand corner of the page.

    Delete Source Button

  4. A dialog appears asking the user to acknowledge that all Project associations and Source versions will be removed. Selecting both acknowledgements enables the Delete button, which deletes the Source.

    Source Delete Button

After deletion, previously uploaded data remains queryable, but no new data will be uploaded to the Source, so no new schema changes will become queryable. The Source is removed asynchronously, and removal may fail without notice.

Refer to the DELETE (Delete Source) API in the DR Designer API Projects Reference for instructions on how to programmatically delete a Source.
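The Python sketch below builds a delete request. It adds a client-side guard that mirrors the confirmation dialog: both acknowledgements must be true before the request is built. The guard is a design choice of this sketch, not a documented API requirement. The base URL and the `/sources/{id}` path are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dr-designer.example.com/api"  # hypothetical


def build_delete_source_request(
    source_id: str, *, ack_remove_associations: bool, ack_remove_versions: bool
) -> urllib.request.Request:
    """Build, but do not send, a hypothetical DELETE request for a Source.

    Mirrors the DR Designer dialog: the caller must accept that all Project
    associations and Source versions will be removed.
    """
    if not (ack_remove_associations and ack_remove_versions):
        raise ValueError("both acknowledgements are required before deleting a Source")
    return urllib.request.Request(
        f"{BASE_URL}/sources/{urllib.parse.quote(source_id, safe='')}",  # hypothetical path
        method="DELETE",
    )
```

Removal is asynchronous and may fail without notice. A script should therefore check later that the Source no longer appears under its Projects, rather than treat a successful response as proof of deletion.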

How to Manage Project-Source Associations

Project-Source associations link Sources to Projects, or unlink them, for querying. A user with an Owner UPR or the PROJECT_ADMIN permission can manage these associations. See the Editing Project-Source Associations section on the Projects page for more information.
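The authorization rule above can be written as a small Python check. This user model is hypothetical. It assumes that PROJECT_ADMIN is a permission held independently of any one Project, and that UPRs are stored per Project ID. Data Refinery enforces the real rule on its side, so a check like this is only useful for hiding actions a user cannot perform.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class User:
    # Hypothetical model: permission names held by the user, e.g. "PROJECT_ADMIN".
    permissions: Set[str] = field(default_factory=set)
    # Hypothetical model: User Project Role (UPR) per Project ID, e.g. {"proj-1": "Owner"}.
    project_roles: Dict[str, str] = field(default_factory=dict)


def can_manage_associations(user: User, project_id: str) -> bool:
    """True if the user may link or unlink Sources for the given Project."""
    return "PROJECT_ADMIN" in user.permissions or user.project_roles.get(project_id) == "Owner"
```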


Copyright © 2025 Kingland Systems LLC