Data generated at the enterprise level is massive and grows exponentially, and the data approach an organization takes plays a key role in unlocking its value. Data volumes have grown tremendously over the last few years, at a compound annual rate of about 40% and even faster in some industries. In some cases, storing and processing new data for analytics is cost prohibitive. The combination of dated design, rigid data models and the cost of existing enterprise data warehouse solutions makes it difficult for organizations to unlock the value of new data sources.

Organizations can gain new insights by combining data from various sources. Unlike traditional relational databases, which require data to be transformed into specific formats before it is stored in tables, an Enterprise Data Lake stores data from various sources in its raw format and applies structure at the time of data access. New types of data can be loaded without a data model in place. The data model is then designed around the questions the data is expected to answer. This "Schema on Read" approach lets enterprises quickly store data in any format and apply structure in a flexible and agile manner, based on the problem at hand.
- Create a centralized data lake to hold all the raw data.
- Use Hadoop clusters to convert the raw data into cleansed data and save it back to the data lake.
- Load the cleansed data into the data warehouse.
- Branch out into new areas by storing parts of the data in columnar data warehouse solutions in the cloud without affecting existing processes.
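The flow above, together with the "Schema on Read" idea, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the sample records, the field names, and the helpers `read_with_schema` and `load_into_warehouse` are hypothetical; an in-memory list stands in for the data lake, plain Python stands in for the Hadoop cleansing step, and SQLite stands in for the data warehouse.

```python
import json
import sqlite3

# Hypothetical raw events, kept exactly as they arrived (one JSON document per line).
# In a real deployment these would be files in the data lake, not a Python list.
RAW_LAKE = [
    '{"device": "sensor-1", "temp": "21.5", "ts": "2020-01-01T00:00:00"}',
    '{"device": "sensor-2", "temp": "n/a", "ts": "2020-01-01T00:01:00"}',
    '{"device": "sensor-1", "temp": "22.0", "ts": "2020-01-01T00:02:00", "fw": "1.2"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: keep only the named fields and cast them."""
    for line in raw_lines:
        record = json.loads(line)
        try:
            yield {name: cast(record[name]) for name, cast in schema.items()}
        except (KeyError, ValueError):
            # Records that do not fit this schema are skipped for this question;
            # the raw copy in the lake is left untouched.
            continue


def load_into_warehouse(rows):
    """Load cleansed rows into a table (SQLite is a stand-in for the warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (:device, :temp)", rows)
    return conn


# The schema is chosen for the question being asked, not fixed at ingestion time.
schema = {"device": str, "temp": float}
cleansed = list(read_with_schema(RAW_LAKE, schema))
conn = load_into_warehouse(cleansed)
avg = conn.execute("SELECT AVG(temp) FROM readings").fetchone()[0]
print(avg)
```

A different question, such as one about firmware versions, could reuse the same raw records with a different schema, which is the flexibility the approach is meant to provide.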
USE CASES FOR CREATING A DATA LAKE
- Store all types of data. A data lake can store structured, semi-structured and unstructured data. This makes it easier and more economical to store data that was difficult and costly to keep in the past, especially in scenarios such as IoT and weblogs.
- Experimental analysis. A data lake can hold data for experimental analysis before the value of that analysis has been established.
OUR ARCHITECTURE APPROACH
- Metadata management
- Secure and Compliant
- Centralized and easy to manage
- Open Architecture
- Enterprise Data Governance
- Low-cost and effective