We were recently asked by a customer to help spin up their Cloudera environment on Azure. While this has been done many times before, the customer's security requirements presented some unique challenges for us to solve. This post covers the major prerequisites and the challenges we faced along the way. Professional services teams from Cloudera and Microsoft worked alongside us as we performed the installation for the client.
Over the last few years there has been a lot of industry buzz about the future of the enterprise data warehouse (EDW). Maybe we should repurpose the classic EDW acronym to stand for something new: Extended Data Warehouse.
Several advancements within the Hadoop ecosystem have brought Hadoop closer to the data warehousing community than ever before. With the series of Hadoop 2.0 releases that began in October 2013, Hadoop is now much closer to being a viable platform for a data warehouse.
How old is your data warehouse? It's a simple question, and probably one you don't think about much. The majority of production data warehouses are now 15-20 years old and probably heavily transaction-centric. Over the years, you've probably remodeled "the house" more than a few times, adding some "rooms" and "upgrades" here and there. It's starting to feel its age as more Business Intelligence requirements have been added, including mobile applications and specialized analytics. And more ideas seem to show up in your inbox every day, especially Big Data questions.
So what exactly is Big Data? In real-world terms, Big Data is the culmination of years' worth of data that your company has stored in its data warehouse, as instructed by your DBA since, well, forever. This data, archived in different locations for safekeeping and possible later use, is extremely valuable to marketing, sales, and other decision makers in your organization. Wikipedia defines Big Data as "a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization." You will see that definition used in a lot of places.
Every successful technology goes through several cycles of invention, discovery, socialization, adoption, and continuous improvement. Hadoop is no exception. It has been embraced by early adopters and is now on the "discovery path" for other customers and vendors. Adoption is well supported by third-party vendors, who have customized and extended their product offerings with their own Hadoop distributions and implementations to help customers adopt the new technology.