Using SSL to securely connect to Amazon Redshift Cluster

Amazon Redshift is fully managed Cloud Datawarehouse from AWS for running analytic workloads. Lot of customers have requirements to encrypt data in transit as part of security guidelines. Redshift provides support for SSL connections to encrypt data and server certificates to validate the server certificate that the client connects to.

Before we proceed how SSL works with Redshift, lets understand why we need SSL. SSL is required to encrypt client/server communications and provides protection against three types of attack:

Eavesdropping – In case of un-encrypted connections, a hacker could use network tools to inspect traffic between client and server and steal data and database credentials. With SSL, all the traffic is encrypted.

Man in the middle (MITM) – In this case a hacker could hack DNS and redirect the connection to a different server than intended. SSL uses certificate verification to prevent this, by authenticating the server to the client.

Impersonation – In this hacker could use database credentials to connect to server and gain access to data. SSL uses client certificates to prevent this, by making sure that only holders of valid certificates can access the server.

For more details, refer to this link .

You can easily enable SSL on Amazon Redshift Cluster by setting require_ssl to True in Redshift parameter group.

In this blog post, I would discuss 3 sslmode settings – require, verify-ca, and verify-full , through use of psql and python.

Setup details

I have a Amazon Redshift cluster with publicly accessible set to “No” and I would be accessing it via my local machine. Since database is in private subnet, I would need to use port forwarding via bastion host. Make sure this bastion host ip is whitelisted in Redshift security group to allow connections

## Add the key in ssh agent
ssh-add <your key>

## Here bastion host ip is 1.2.3.4 and we would like to connect to a redshift cluster in Singapore running on port 5439. We would like to forward traffic on localhost , port 9200 to redshift 
ssh -L 9200:redshift-cluster.xxxxxx.ap-southeast-1.redshift.amazonaws.com:5439 [email protected]

When we enable require_ssl to true, we have instructed Redshift to allow encrypted connections. So if any client tries to connect without SSL, then those connections are rejected. To test this , let’s modify sslmode settings for psql client, by setting PGSSLMODE environment variable.

export PGSSLMODE=disable
psql -h localhost -p 9200 -d dev -U dbmaster
psql: FATAL: no pg_hba.conf entry for host "::ffff:172.31.xx.9", user "dbmaster", database "dev", SSL off

As we can see, all database connections are rejected. Let’s now discuss SSL mode – require, verify-ca, and verify-full

1.  sslmode=require

With sslmode setting of require, we are telling that we need a encrypted connection. If a certificate file is present, then it will also make use of it to validate the server and behavior will be same as verify-ca. But if the certificate file is not present, then it won’t complain (unlike verify-ca) and connect to Redshift cluster.

export PGSSLMODE=require
psql -h localhost -p 9200 -d dev -U dbmaster
psql (11.5, server 8.0.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

We can see that psql has made a SSL connection and is using TLS 1.2.

2. sslmode=verify-ca

If we use, verify-ca, then the server is verified by checking the certificate chain up to the root certificate stored on the client. At this point, we need to also give Redshift certificate which has tobe  be downloaded from download link provided in Amazon Redshift documentation. To demonstrate, that we really need a certificate, let’s try connecting without certificate

export PGSSLMODE=verify-ca 
psql -h localhost -p 9200 -d dev -U dbmaster 
psql: error: could not connect to server: root certificate file "/Users/amit/.postgresql/root.crt" does not exist
Either provide the file or change sslmode to disable server certificate verification.

psql is complaining that it couldn’t find the certificate and we should either provide ssl certificate or change sslmode settings. Let’s download the certificate and store under home directory

mkdir ~/.postgresql 
cd .postgresql

curl -o root.crt https://s3.amazonaws.com/redshift-downloads/redshift-ca-bundle.crt

After downloading certificate , and placing under the desired directory, our connection attempt succeeds

3. sslmode=verify-full

Next, if we want to prevent Man in the Middle Attack (MITM), we need to enable sslmode=verify-full . In this case the server host name provided in psql host argument will be matched against the name stored in the server certificate. If hostname matches, the connection is successful, else it will be rejected.

export PGSSLMODE=verify-full
psql -h localhost -p 8192 -d dev -U awsuser
psql: error: could not connect to server: server certificate for "*.xxxxxx.ap-southeast-1.redshift.amazonaws.com" does not match host name "localhost"

In our test connection  fails, as we are using port forwarding and localhost doesn’t match the redshift hostname pattern

What this means is that if you use any services like route53 to have friendly names, verify-full won’t work as the hostname specified in psql command and host presented in certificate don’t match. If your security team is ok with verify-ca option, then you can revert to that setting, else you will have to get rid of aliases and use actual hostname.

In my case, I can resolve the error by connecting to Redshift Cluster from bastion host (instead of my local host tunnel setup) and using psql command  with actual hostname

psql -h redshift-cluster.xxxxxx.ap-southeast-1.redshift.amazonaws.com -p 5439 -d dev -U dbmaster

SSL Connection using Python

Next, let’s see how you can connect to Redshift cluster using python code. This is useful, if you have Lambda code or other client applications which are written in Python. For this example, we will use PyGreSQL module for connecting to Redshift Cluster.

$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import pgdb
>>> rs_port =5439
>>> rs_user = 'dbmaster'
>>> rs_passwd='****'
>>> rs_dbname = 'dev'
>>> rs_host='redshift-cluster.xxxxx.ap-southeast-1.redshift.amazonaws.com'
>>> rs_port =5439

>>> conn = pgdb.connect(dbname=rs_dbname,host=rs_host,port=rs_port , user=rs_user, password=rs_passwd,sslmode=rs_sslmode)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/miniconda3/envs/venv37/lib/python3.7/site-packages/pgdb.py", line 1690, in connect
cnx = _connect(dbname, dbhost, dbport, dbopt, dbuser, dbpasswd)
pg.InternalError: root certificate file "/home/ec2-user/.postgresql/root.crt" does not exist
Either provide the file or change sslmode to disable server certificate verification.

Before running above code, I removed the certificate file to show that pgdb.connect requires SSL certificate. Let’s now add the certificate to non-default location like “/home/ec2-user/root.crt” and use sslrootcert argument to pass the location

>>> rs_cert_path='/home/ec2-user/root.crt'
>>> conn = pgdb.connect(dbname=rs_dbname,host=rs_host,port=rs_port , user=rs_user, password=rs_passwd,sslmode=rs_sslmode,sslrootcert=rs_cert_path)
>>> cursor=conn.cursor()
>>> cursor.execute("select current_database()")
<pgdb.Cursor object at 0x7fa73bfe15d0>
>>> print(cursor.fetchone())
Row(current_database='dev')

As you can see above, after passing the SSL certificate, connection succeeds and we can fetch data from Redshift cluster.