Building A Data Lake For The Enterprise

In today’s environment, businesses are feeling the heat of competition from all sides, in many cases from disruptors within their own industries who are wiping their markets clean with technology and data. To compete, businesses need to innovate, and to innovate, they need data.

There’s a fire hose of data moving through enterprises, and ongoing analysis from Unisphere Research shows that many organizations already have data well into the petabyte range. However, not enough of this growing pile of data is reaching the tools and platforms that decision makers use, or any accompanying decision-making applications. Only 31% of respondents say that the majority of their data actually makes it to the analysis stage.

Big data is enabling many types of business opportunities, from predictive analytics to the Internet of Things (IoT). IoT in particular makes it critical to be able to take in many data feeds, pull out the points that are of material importance to customers, and engage with those customers in real time. Other emerging initiatives include artificial intelligence, machine learning, and robotic process automation. Yet most of the data that is generated is never captured for analysis. As it moves through the enterprise, it is either discarded or ends up in silos, locked away in proprietary databases. It may even end up on tapes stored in a basement.

Why isn’t more data being made available to help the organization? Part of the answer is that data moving into the analytics stage is still shepherded through the extract, transform, and load (ETL) model, which is not going away anytime soon. ETL is a technology and methodology that has worked well for enterprises for more than two decades, especially for bringing data from enterprise sources into one place where it can be examined, such as a data warehouse.

However, the ETL methodology is expensive, and as more data flows in, those costs will invariably rise. Aggregating, transforming, and reloading data between systems takes time and resources, requiring investments in people, skills, and systems. And the time it takes to move data from one system to another does not fit a world that demands real-time or near-real-time responsiveness.
