Loading Data from Amazon S3 Bucket to IBM Netezza Performance Server

Mario Galjanić

/ 2021-08-25

Introduction

With the increase of cloud computing, more and more users are adopting secure, scalable, and redundant services for data storage. One of the most popular is Amazon S3 service. S3 stands for Simple Scalable Storage, and with broad accompanying services available from AWS it is easy to understand its popularity.

With IBM's new flagship product called IBM CP4D NPS (Cloud Pak for Data - Netezza Performance Server), we can use the newly acquired connector as part of the nzrestore utility. With Netezza, there are many ways to pull data from a backup, but a relatively new feature has been included which allows the use of an AWS stack. Currently, it is possible to move data from a remote backup directory using an S3 bucket. Depending on your preferences, network speeds, size of data, and availability, there are multiple options. For example, if the target system is airgapped (meaning it is not connected to internet and can only be accessed via VPN), the process is a bit more complicated. In that case, you need to have the capability to save data to a networked drive that has external access. Pull the data using the AWS CLI commands which we will describe shortly. Once you have the backups saved on the local network, mounting the drive to NPS and standard nzload / nzrestore with the known features is smooth sailing. In our case today we will cover the more complex nzrestore directly from Amazon S3 bucket to a given database in Netezza Performance Server (NPS).

To summarize, the process can be performed in two ways:

Move backups from S3 to a mounted NPS drive using the AWS CLI utility
- Perform the ingestion via nzrestore
Move the data from S3 bucket and directly ingest via nzrestore utility using the S3 connector.

So, let's dig in!

Setup Amazon S3 Bucket for Connection

If you don't already have the bucket with the backups that you want to restore on the target environment, you will have to create it. If you already have it setup, you can proceed to the next step. In order to access the S3 bucket, you will first need to create it. Login to your AWS web console and navigate to S3 service and create a new bucket:

center-medium

From there, you will be directed to the wizard where you can select various options. For the purposes of this tutorial we will leave everything as it is and only modify the bucket name.

center-medium

Next, once the bucket is created, we need to pull the information to access it. In the upper right corner click on your user profile and go to My Security Credentials as shown below.

center-medium

From there, you will be taken to the screen where you can create new security credentials and review existing ones. Click on create access key and a similar screen will appear:

center-medium

From there, copy your Access Key ID and Secret Access Key to a safe place as these will be needed for the nzrestore application to successfully connect to your S3 bucket.

Restoring Database from S3 Bucket

On your NPS environment, execute the following command:

nzrestore -db DummyDB -sourcedb DummyDB \ -connector s3 \ -connectorArgs UNIQUE_ID=dummyBackupFolder:ACCESS_KEY_ID=asdf1234321fdsa:SECRET_ACCESS_KEY=1234shfdskfdjds3dsakdjda33s:DEFAULT_REGION=eu-central-1:BUCKET_URL=s3bucket-dummy \ -npshost ipshost

Some context is needed to explain how to properly configure the connector arguments. The connector consists of the following elements:

UNIQUE_ID
- Necessary to identify the folder where the nzrestore script will look for the Netezza backups.
ACCESS_KEY_ID
- The previously acquired unique id tied to your user and used to authenticate your access privileges to the S3 bucket
SECRET_ACCESS_KEY
- The previously acquired Secret Key ID tied to your user and used to authenticate your access privileges to the S3 bucket
DEFAULT_REGION
- Region where the S3 bucket is hosted
BUCKET_URL
- Unique address of the S3 bucket

Remaining elements of the command are standard ones when using the nzrestore command.

Pro tip: Use the nohup & to send the process to the background and ensure process completion in case of a connection loss to your NPS server.

Restore Monitoring

In case our session gets terminated and command "jobs" does not return our restore command, we can use a query to list the 10 most recent restores.

SELECT seqno, optype, status, dbname, starttime, '/nz/kit/log/restoresvr/' || logfile AS log_file_path FROM _v_restore_history ORDER BY starttime DESC LIMIT 10;

Column “STATUS” will give us information if restore is still in progress or if it has finished (successful/failed). By issuing tail –f <log_file_path> we can monitor the progress of our command in real time.

Conclusion

Today, we learned how to use the nzrestore utility in conjunction with Amazon S3 service to perform database restores on the fly. We hope this article helps you streamline your usual DBA processes and if you have any questions, feel free to reach out to us.

Loading Data from Amazon S3 Bucket to IBM Netezza Performance Server

Mario Galjanić

Introduction

Setup Amazon S3 Bucket for Connection

Restoring Database from S3 Bucket

Restore Monitoring

Conclusion

Share This Story, Choose Your Platform!

Share This Story

Drive your business forward!