Businesses continue to evaluate and adopt the cloud for some of their IT requirements. The idea of the cloud as a viable storage solution, and as a way to cut costs, is gaining momentum. That raises a question: is the cloud the right place for a data warehouse?
This is an interesting question for many, and a problematic one for some. For most large IT organizations, the most frequently cited answer is, “we’re concerned about security, especially for customer-sensitive or business-critical information.” This article addresses that concern from a somewhat historical viewpoint, and also discusses the trends and upsides that we believe will shape the answer in the coming years.
It is quite clear that many of the issues and concerns about the cloud are abating. In a research report presenting its IT predictions for 2012 and beyond, Gartner stated, “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.”
As with many things, what we think is new (the cloud) isn’t really so new after all. In the 1970s, many companies could not afford their own computers (mainframes), so they connected to large shared systems, often over dial-up. It wasn’t called “the cloud”; it was called “time-sharing.” Important and critical company information was “out there” on someone else’s computers. From then until today, the security of shared and managed resources has always been a priority for the providers of those resources.
Cloud data storage is probably not right for every application. There can be compelling reasons, sometimes contractual ones, that make on-premise storage an absolute requirement. There can also be cases where the latency or speed required by certain transactions calls for a non-cloud solution. However, data warehouse workloads that demand blinding, sub-millisecond speeds are few. When it comes to data warehousing and analytics, there are many environments where the cloud could be a very cost-effective solution.
Security is an interesting discussion. While organizations may feel more secure having their hard disks and servers down the hall, one could also argue that putting that data in the hands of a company whose very existence depends on its ability to provide stable and secure environments is about as risk-free as you can get. Large cloud providers, with their powerful security teams, are probably more skilled at security and compliance than the handful of security professionals at your company.
Food for thought: how many of your company’s offices are set up like Ft. Knox, with absolute physical security? Within those semi-secured buildings, how many Ethernet jacks are hidden under a desk or behind a door, available to anyone with physical access: employee, contractor, and data thief alike? Does network security know about every renegade router and every personal Wi-Fi access point? Can they physically stop wireless signals from extending into the parking lot or across the street, where they offer convenient access to anyone with the right skills? That is highly unlikely, if not impossible. Now, how many people have access to your cloud provider’s facilities? Those ARE secured much like military bases. And while you need to be extremely prudent about network intrusions in any scenario, cloud or on-premise, the fact is that 99% of on-premise systems today are far more exposed and vulnerable than the top cloud provider solutions ever will be.
Clouds on the Horizon
It is unlikely that organizations with huge enterprise data warehouse investments will want to move their entire data warehouse and analytics platform to a cloud environment. We are not hearing or seeing much buzz in that regard. However, even for large organizations, data that is siloed or subject to variable demand may be a candidate for relocation to the cloud to reduce costs.
In addition, as more and more medium-size organizations hear and read about the business intelligence, analytics, and data discovery systems that large companies are using, that interest is trickling down to them. Some of these companies don’t have a legacy data warehouse or the on-premise infrastructure to build and manage one. These organizations are perfect candidates for a cloud-based “data warehouse-as-a-service” approach, and we are seeing more and more interest in that market.
One primary issue that has held things back in the past is pricing. With scalable, elastic cloud platforms, lower tool prices, and improvements in BI and ETL (extract, transform, load) tooling, we may be approaching the point where the benefits of large-scale data warehousing can reach down into smaller companies.
Amazon Web Services Redshift
At iOLAP, we have been very interested in one of the latest cloud offerings purpose-built for data warehousing. Within the last year, Amazon Web Services (AWS) announced and released Redshift, a likely game-changer for the world of data warehousing. Redshift is a petabyte-scale data warehouse-as-a-service (DWaaS) platform that offers the same low-cost, cloud-based, pay-as-you-go model available with other AWS products.
In its simplest form, Redshift pricing has no up-front cost. New users can provision an instance with a few clicks and, in a matter of minutes, be up and running with 2 TB of space for $0.85 per hour. Later, as storage requirements and utilization grow, users can take advantage of reserved-instance pricing, a lower, longer-term price model that comes in under $1,000 per TB per year.
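The arithmetic behind those two figures is easy to check. The minimal Python sketch below takes the $0.85-per-hour, 2 TB figure quoted above at face value (pricing at the time of writing; current AWS rates may differ) and assumes a single node left running around the clock for a full year. The function name is hypothetical, chosen for illustration only.

```python
# Annual on-demand cost per terabyte for one always-on node, using the
# figures quoted in the text: $0.85 per hour for 2 TB of storage.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def on_demand_cost_per_tb_year(hourly_rate: float, tb_per_node: float) -> float:
    """Hypothetical helper: yearly cost of running one node 24x7, divided by its capacity."""
    return hourly_rate * HOURS_PER_YEAR / tb_per_node


if __name__ == "__main__":
    cost = on_demand_cost_per_tb_year(0.85, 2.0)
    print(f"On-demand: ${cost:,.2f} per TB per year")  # On-demand: $3,723.00 per TB per year
```

Under those assumptions, always-on, on-demand usage works out to roughly $3,723 per TB per year, which is why reserved pricing under $1,000 per TB per year becomes attractive once utilization is steady.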
While the low cost and overall performance characteristics are compelling, Redshift’s key differentiator lies more in its scalability and elasticity. With traditional data warehouse appliances, an organization can use only the computing capacity it owns. Redshift, by comparison, shifts the paradigm to using as much computing power as needed, when and only when it is needed. Customers are only now evaluating and understanding the full ramifications of that approach to pricing and elasticity. It will be interesting to watch how it disrupts traditional pricing and deployment models.
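The difference between owning capacity and renting it on demand can be illustrated with node-hours. The Python sketch below is an idealized model under stated assumptions: the demand profile is invented for illustration, the function names are hypothetical, and it assumes capacity can be added and removed hour by hour, which is simpler than resizing a real cluster.

```python
from typing import Sequence


def fixed_capacity_node_hours(hourly_demand: Sequence[int]) -> int:
    """Owned capacity must be sized for the peak and is paid for every hour."""
    return max(hourly_demand) * len(hourly_demand)


def elastic_node_hours(hourly_demand: Sequence[int]) -> int:
    """Idealized elastic model: pay only for the nodes actually used each hour."""
    return sum(hourly_demand)


if __name__ == "__main__":
    # Hypothetical day: 2 nodes of baseline load for 20 hours,
    # then an 8-node batch/reporting window for 4 hours.
    demand = [2] * 20 + [8] * 4
    fixed = fixed_capacity_node_hours(demand)
    elastic = elastic_node_hours(demand)
    print(f"fixed: {fixed} node-hours, elastic: {elastic} node-hours")
    # fixed: 192 node-hours, elastic: 72 node-hours
```

The gap between the two numbers grows with how "spiky" the workload is; for a flat, always-busy workload the two models converge, and reserved pricing becomes the better lever.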
Below are some of the most commonly touted benefits of a cloud-based infrastructure, and of Redshift in particular.
- Cost Savings. The potential to save substantial costs over the years is a powerful argument, and the prospect of no up-front costs can be too good to ignore. Couple that with the flexibility of on-demand, metered pricing, and the savings can quickly balloon.
- Capital Expense Considerations. Companies can use large databases without going through a time-consuming procurement process to obtain hardware and software, which is a capital expense. Some organizations are becoming proponents of the pay-as-you-go Infrastructure-as-a-Service (IaaS) concept.
- Rapid Deployment. Because an instance can be provisioned in minutes at low cost, you can focus on developing, testing, and delivering even small or “proof of concept” BI and analytics projects that show a strong return on investment (ROI), which in turn fuels new, focused projects.
- Enterprise Ready. Redshift appears capable of the speed and performance that enterprise users need. Peak workload management is also a consideration, so that unplanned and unforeseen demands can be handled economically. Redshift can scale to enormous capacity, into the petabyte range.
There are other considerations that are discussed less often. One interesting benefit is data monetization, which simply means turning the business data you already spend effort maintaining into a profit center.
Cloud-based data, when properly structured and protected, is accessible via the internet. That is not earth-shaking by itself, but the simple concept offers tremendous opportunities for collaboration and monetization. How valuable would it be if your business partners had access to some of your information? How valuable would it be if you had access to data from your partners or suppliers? What if you had easy and inexpensive access to additional third-party data for demographics, climate information, and so on? When companies partner to share information, they can identify new revenue patterns, measure sales campaign effectiveness, extend market reach, and more, and all partners benefit.
The concept of cloud-based data warehousing is intriguing. It comes with some concerns, but every organization approaches common cloud issues, and new projects, differently. The concept is also garnering interest among smaller organizations, where cost is a bigger concern and existing infrastructure won’t support what they want and need to do in business intelligence and analytics. To take advantage of a cloud platform for data warehousing, however, some organizations need to let go of pre-existing biases. People have built their careers around creating in-house infrastructure, and their concerns and pushback, however unwarranted, will be part of the organizational culture.
Just remember what Gartner’s report stated: “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.”
Maybe it’s time to jump in the pool, or at least dip the first toe in.