
On AWS Redshift, it is recommended to ingest data via CSV files in an S3 bucket rather than using SQL INSERT statements. Data Virtuality already implements this strategy; you can activate it in the settings of your Redshift data source.

Since both Data Virtuality and Redshift need to access the S3 bucket, some setup work is required. There are two ways to handle the authentication:

  1. IAM User: Create a new user with an access key ID and secret access key that grant access to the S3 bucket.
  2. IAM Role: Create a new role that is assumed by your Redshift cluster and by the EC2 instance running Data Virtuality.

Using an IAM Role is generally preferred as you will not have to deal with keys that might eventually get compromised or need to be rotated regularly. Instead, AWS will handle this implicitly for you.

Create an S3 bucket

  1. Navigate to the S3 section in your AWS account
  2. Click on the "Create bucket" button and follow the instructions
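
If you prefer to script this step rather than use the console, a minimal boto3 sketch could look like the following. The bucket name dv-test-upload (matching the policy example below) and the region are assumptions; replace them with your own values.

    import boto3

    # Example values -- replace with your own bucket name and region.
    BUCKET_NAME = "dv-test-upload"
    REGION = "eu-central-1"

    s3 = boto3.client("s3", region_name=REGION)

    # Outside of us-east-1 the region has to be passed as a location constraint.
    s3.create_bucket(
        Bucket=BUCKET_NAME,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )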

Create an IAM policy

  1. Navigate to the IAM section in your AWS account
  2. Click on "Policies" and "Create policy"
  3. Switch to the "JSON" tab and paste the following code (replace the bucket name with your own):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::dv-test-upload/*",
                    "arn:aws:s3:::dv-test-upload"
                ]
            }
        ]
    }
  4. Finish creating the policy by following the wizard
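
As an alternative to the console wizard, the same policy can be created with boto3. The policy name dv-s3-upload-policy is only an example.

    import json
    import boto3

    iam = boto3.client("iam")

    # The policy document from step 3, with dv-test-upload as the example bucket.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::dv-test-upload/*",
                    "arn:aws:s3:::dv-test-upload",
                ],
            }
        ],
    }

    iam.create_policy(
        PolicyName="dv-s3-upload-policy",  # example name
        PolicyDocument=json.dumps(policy_document),
    )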

Depending on which authentication strategy you chose, please follow the corresponding section below.

IAM Role authentication

 

Setting up the role

  1. Navigate to the IAM section in your AWS account
  2. Click on "Roles" and "Create role"
  3. Choose EC2 from the list of services and continue via the "Next: Permissions" button
  4. Select the previously created IAM policy and click "Next: Review"
  5. Provide a name and click "Create role"
  6. On the list of all roles, click on the name of the newly created role and select the "Trust relationships" tab
  7. Click "Edit Trust Relationship", adjust the JSON document as follows, and click "Update Trust Policy"
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "ec2.amazonaws.com",
                        "redshift.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
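
For reference, the role setup can also be scripted with boto3. The role name dv-s3-load-role, the account ID, and the policy name are example values; the instance profile is created explicitly here because the console only does this automatically when you use the EC2 wizard.

    import json
    import boto3

    iam = boto3.client("iam")

    ROLE_NAME = "dv-s3-load-role"  # example name -- replace with your own

    # Trust policy from step 7: both EC2 and Redshift may assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": ["ec2.amazonaws.com", "redshift.amazonaws.com"]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }

    iam.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )

    # Attach the previously created IAM policy (replace account ID and policy name).
    iam.attach_role_policy(
        RoleName=ROLE_NAME,
        PolicyArn="arn:aws:iam::123456789012:policy/dv-s3-upload-policy",
    )

    # EC2 expects an instance profile wrapping the role; via the API it has to be
    # created and linked to the role explicitly.
    iam.create_instance_profile(InstanceProfileName=ROLE_NAME)
    iam.add_role_to_instance_profile(InstanceProfileName=ROLE_NAME, RoleName=ROLE_NAME)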

Assigning the role to EC2

  1. Navigate to the EC2 section in your AWS account
  2. Right-click on the instance that hosts Data Virtuality, select "Instance Settings" and then "Attach/Replace IAM Role"
  3. Choose the previously created role from the dropdown and click "Apply"
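
The same attachment can be done with boto3; the instance profile name and the instance ID below are placeholders for your own values.

    import boto3

    ec2 = boto3.client("ec2")

    # Placeholder instance ID -- use the ID of the instance running Data Virtuality.
    ec2.associate_iam_instance_profile(
        IamInstanceProfile={"Name": "dv-s3-load-role"},
        InstanceId="i-0123456789abcdef0",
    )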

Assigning the role to Redshift

  1. Navigate to the Redshift section in your AWS account
  2. Click on your cluster and on "See IAM role"
  3. Select the previously created role and click "Apply changes"
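
A scripted alternative with boto3 looks like this; the cluster identifier, account ID, and role name are example values.

    import boto3

    redshift = boto3.client("redshift")

    # Example identifiers -- replace with your own cluster and role ARN.
    redshift.modify_cluster_iam_roles(
        ClusterIdentifier="my-redshift-cluster",
        AddIamRoles=["arn:aws:iam::123456789012:role/dv-s3-load-role"],
    )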

Configuring Data Virtuality

  1. Open the Data Virtuality Studio and connect to your instance
  2. Right-click on "Analytical Storage <dwh>" and click "Edit Analytical Storage"
  3. Add the following translator parameters (replace placeholders with your own values): uploadMode=s3Load,bucketName=<name of the S3 bucket>,awsAccountId=<your AWS account ID>,iamRole=<the name of the IAM role>
  4. Click on "Finish"
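
With the example names used above, the resulting parameter string would look like this (bucket name, account ID, and role name are placeholders):

    uploadMode=s3Load,bucketName=dv-test-upload,awsAccountId=123456789012,iamRole=dv-s3-load-role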

IAM User authentication

 

Creating an IAM User

  1. Navigate to the IAM section in your AWS account
  2. Click on "Users" and "Add user"
  3. Choose a name, select "Programmatic access" and click "Next: Permissions"
  4. Click on "Attach existing policies directly" and select the previously created policy
  5. Click on "Next: Review" and "Create user". Note down the Access Key ID and Secret Access Key.
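
If you want to script the user creation, a minimal boto3 sketch could look like the following. The user name is an example, and the account ID and policy name must match the policy created earlier; note that the secret access key is only available at creation time.

    import boto3

    iam = boto3.client("iam")

    USER_NAME = "dv-s3-upload-user"  # example name

    iam.create_user(UserName=USER_NAME)

    # Attach the previously created policy (replace account ID and policy name).
    iam.attach_user_policy(
        UserName=USER_NAME,
        PolicyArn="arn:aws:iam::123456789012:policy/dv-s3-upload-policy",
    )

    # Create the programmatic credentials; the secret access key cannot be
    # retrieved again later, so store it securely.
    keys = iam.create_access_key(UserName=USER_NAME)
    print("Access Key ID:    ", keys["AccessKey"]["AccessKeyId"])
    print("Secret Access Key:", keys["AccessKey"]["SecretAccessKey"])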

Configuring Data Virtuality

  1. Open the Data Virtuality Studio and connect to your instance
  2. Right-click on "Analytical Storage <dwh>" and click "Edit Analytical Storage"
  3. Add the following translator parameters (replace placeholders with your own values): uploadMode=s3Load,bucketName=<name of the S3 bucket>,secretKey=<your Secret Access Key>,keyId=<your Access Key ID>
  4. Click on "Finish"
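
With the example bucket name and AWS's well-known documentation example credentials, the parameter string would look like this:

    uploadMode=s3Load,bucketName=dv-test-upload,secretKey=wJalrXUtnFfEMI/K7MDENG/bPxRfiCYEXAMPLEKEY,keyId=AKIAIOSFODNN7EXAMPLE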

Now your S3 load is fully set up and will be used automatically when inserting data into Redshift.
