Is Amazon's Redshift a Game Changer?

msteen - 4 minute(s) read.

Now and then new technologies, ideas, and even buzzwords come along that fundamentally change the way people look at the IT game.  When Amazon first released Amazon Web Services (AWS), it changed the game of cloud-based data centers by introducing pay-as-you-go pricing for servers and storage.  By replacing large up-front capital infrastructure expenditures with much lower costs that could scale as a business grew, Amazon fostered many more entrants into the e-commerce space, and in many cases those entrants also turned to Amazon's logistics and fulfillment services, which grew Amazon's own business.  That was a game-changing moment and a win-win for all parties.

In February of this year, AWS brought that same game-changing perspective to the world of data warehousing by releasing Redshift.  Redshift is a new, petabyte-scale data warehouse-as-a-service (DWaaS) platform that offers the same low-cost, cloud-based, pay-as-you-go model available with other AWS products.

When you read about Redshift at a high level on Amazon's own website, it sounds very inviting:

Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and parallelizing and distributing queries across multiple nodes. We’ve made Amazon Redshift easy to use by automating most of the common administrative tasks associated with provisioning, configuring, monitoring, backing up, and securing a data warehouse.

Powerful security functionality is built-in. Amazon Redshift supports Amazon VPC out of the box and you can encrypt all your data and backups with just a few clicks. Once you’ve provisioned your cluster, you can connect to it and start loading data and running queries using the same SQL-based tools you use today.

From a pricing perspective, in its simplest form, Redshift has no up-front cost whatsoever.  Beginning users can provision a new instance with a few clicks and in a matter of minutes be up and running with 2TB of space for $0.85 per hour.  Later, as storage requirements and utilization grow, users can also take advantage of reserved instance pricing, a lower long-term price model that comes in under $1,000 per TB per year.
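To put those two figures on the same footing, the hourly rate can be converted to a yearly cost per TB.  The following is only a rough sketch: it assumes the $0.85 rate is paid around the clock for a 365-day year and covers the full 2TB, and it treats the $1,000 figure as the ceiling quoted above, not as an actual reserved rate.  The function and variable names are made up for illustration.

```python
# Rough comparison of on-demand versus reserved pricing, per TB per year.
# Assumption: the instance runs 24 hours a day for a 365-day year.
HOURS_PER_YEAR = 24 * 365


def on_demand_cost_per_tb_year(hourly_rate, terabytes):
    """Yearly on-demand cost divided by the storage it buys."""
    return hourly_rate * HOURS_PER_YEAR / terabytes


# Figures quoted in the article: $0.85 per hour for 2TB.
on_demand = on_demand_cost_per_tb_year(0.85, 2)

# The article only says reserved pricing is "under $1,000 per TB per year",
# so this is an upper bound, not a published rate.
reserved_ceiling = 1000.0

print(f"On-demand, always on: about ${on_demand:,.0f} per TB per year")
print(f"Reserved (ceiling):   under ${reserved_ceiling:,.0f} per TB per year")
```

Under these assumptions, an always-on on-demand instance works out to roughly $3,700 per TB per year, which is why the reserved model matters once usage becomes steady.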

While the low cost and overall performance characteristics are compelling, Redshift’s key differentiator lies more in its capabilities for both scalability and elasticity. With traditional data warehouse appliances one can only take advantage of the computing capacity they own. In comparison, Redshift changes the paradigm to simply taking advantage of as much computing power as needed when and only when it is needed.

For example, a day in the life of the typical data warehouse begins in the early morning hours with periods of extreme utilization to assimilate the prior day’s information.  This is followed by varying periods of relative quiet until the business day begins.  Mornings bring a flurry of reporting activity to review new activity and analyze emerging trends.  Reporting activity sometimes lessens around midday only to stage a comeback that nearly equals or even surpasses the morning levels.  This continues to the end of the business day and then falls off throughout the evening until the next early morning load window begins again.

For this reason, traditional data warehouse appliances have you plan for the peaks.  You can re-allocate resources between load and query tasks throughout the day, but in order to have the capacity to manage the peaks you have to own it.  With Redshift, you can take full advantage of workload elasticity by starting the day with a minimally sized 2 node cluster.  Just prior to the early morning load window, that cluster can then be expanded to 8 or more nodes depending on the expected workload.  Immediately following the successful conclusion of the load routines, the cluster can be reduced back to 2 nodes.  Later, at the beginning of the business day, the cluster would be increased again to perhaps 4 or 6 nodes depending on expected query needs.  The cluster would probably remain at that level throughout the business day before being reduced again to 2 nodes in the early evening hours and remaining at that level until the start of the next load window.  During this entire process, the associated costs automatically adjust to match only the amount of capacity being used at the time.
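The effect of that daily pattern can be sketched in a few lines of arithmetic.  Everything specific here is an assumption for illustration: the clock hours of each phase are invented, the business-day size is taken as 6 nodes, the time a resize itself takes is ignored, and the $0.85 rate is applied per node-hour, which the pricing above does not state explicitly.

```python
# Hypothetical daily schedule: (start_hour, end_hour, node_count).
# The hours are illustrative; only the node counts follow the article.
SCHEDULE = [
    (0, 2, 2),    # overnight quiet period
    (2, 5, 8),    # early morning load window
    (5, 8, 2),    # quiet until the business day starts
    (8, 18, 6),   # business-day reporting
    (18, 24, 2),  # evening
]

# Assumed price per node-hour (not confirmed as a per-node rate).
RATE_PER_NODE_HOUR = 0.85


def covers_full_day(schedule):
    """Return True if the segments tile hours 0..24 with no gaps or overlaps."""
    expected_start = 0
    for start, end, _nodes in schedule:
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return expected_start == 24


def node_hours(schedule):
    """Total node-hours consumed by a schedule."""
    return sum((end - start) * nodes for start, end, nodes in schedule)


elastic_node_hours = node_hours(SCHEDULE)

# A fixed-size system must be sized for the peak all day long.
peak_nodes = max(nodes for _start, _end, nodes in SCHEDULE)
fixed_node_hours = peak_nodes * 24

print(f"Elastic: {elastic_node_hours} node-hours, "
      f"${elastic_node_hours * RATE_PER_NODE_HOUR:.2f} per day")
print(f"Fixed at peak: {fixed_node_hours} node-hours, "
      f"${fixed_node_hours * RATE_PER_NODE_HOUR:.2f} per day")
```

With this made-up schedule the elastic cluster uses 106 node-hours a day against 192 for a cluster held at its 8 node peak, a little over half.  The real ratio depends entirely on how long the peaks last.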

However, as with any new technology, there are always the rumblings of both enthusiasm and concern.

Enthusiasm

  • Cost Savings. With a cloud-based alternative, the potential to save substantial costs over the years is a powerful argument—and no up-front costs are simply too good to ignore.
  • Capital Expense Considerations. Companies have the ability to use large databases without going through a time-consuming procurement process to obtain the hardware and software—which is also a capital expense.  Some organizations are becoming proponents of the pay-as-you-go IaaS concept.
  • Rapid Deployment. Reduced pricing allows you to focus on developing, testing, and delivering even small BI and Analytics projects that show strong return on investment (ROI) that in turn fuel new, focused projects.
  • Enterprise Ready. The platform appears to be capable of speed and performance that enterprise users need.  Peak workload management is also a consideration in order to economically deal with unplanned and unforeseen demands. Redshift has the ability to scale to enormous capacity, claiming capabilities into the petabyte range.

Concerns

  • Migration.  Some have expressed concerns over the way data is initially migrated to the Redshift platform—what are the best practices and how much effort does it take?
  • Stability.  Some IT and other managers have concerns about the stability, newness, and viability of enterprise-level cloud systems.  Convincing people this is a good way to go might be difficult in some environments.
  • Will costs really be reduced?  Without proper management and understanding of the costs, you might actually find yourself running up a large bill.  A new type of cost management needs to be evaluated and put into place or this could be an issue.
  • Pioneer Syndrome. There is a lack of published use cases since it’s a new platform.  In talking to AWS we know they are beginning to promote user success stories and real cost-saving data and more information is starting to become available.

As with many new technology paradigms, higher capacity, increased efficiencies and lower costs are market catalysts.  Redshift appears to be following this model as well.

In a May 22, 2013, blog post (www.allthingsdistributed), Werner Vogels, CTO of Amazon.com, said the following: “Since we launched [Redshift], we’ve been adding over a hundred customers a week and are well over a thousand today. That’s pretty stunning. As far as I know, it’s unprecedented for this space. We’ve enabled our customers to save tens of millions of dollars in up front capital expenses by using Amazon Redshift.”

He clearly sounds excited and optimistic.  Will Redshift succeed?  I say yes, and it will be interesting to watch it evolve.  But like many things in technology, it’s hard to predict exactly what will happen.  Will cloud-based data warehousing and analytics replace today’s traditional environments, or will the primary vendor market begin to embrace the new systems and morph into various hybrid variants?  Time will tell.