A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytic processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users in the format of Reports / Dashboards.
Data Warehousing System Architecture
Every Data warehouse system should have the below setup to get complete benefits of Data warehouse features in the report & Analysis area.
Brief definition for Data warehouse system – We have all different type of sources and those sources are extracted to Stage (ODS) database using any of the ETL tool. Then process the data whatever the way you want (apply calculation, aggregation, Ranking, etc.,) and load into Warehouse database. From Data warehouse you can generate standard reports, Dashboards for end users and do Data analytics using any of the Reporting / Dashboard / Analytics tools. Also I have listed some of emerging ETL & Report tools for your reference.
Data Warehouse Characteristics
Data warehouse systems have their own specific characteristics and below are some major characteristics.
Subject-Oriented: Data warehouses are designed to help you analyze data and Information is presented according to specific subjects or areas of interest, not simply as files. Data is manipulated to provide information about a particular subject. For example, to learn more about your company’s sales data, you can build a warehouse that concentrates on sales subject which is accessible to end-users, but is provided structure and organized according to the specific needs.
Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. Data warehouse is a single source of information for multiple areas of interest. The data warehouse provides one-stop shopping and contains information about a variety of subjects.
Non-Volatile: Once entered into the warehouse, data should not change. . Information is consistent regardless of when the warehouse is accessed. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse. In order to discover trends in business, analysts need large amounts of data.
Accessible: Main purpose of a data warehouse is to provide readily accessible information to end-users.
Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.
Comparison of Data Warehouse and OLTP system
Data warehouse is distinctly different from the OLTP system and maintained by day-to-day operational systems. Data warehousing is not simply an ‘access wrapper’ for operational data, where data is simply ‘dumped’ into tables for direct access. Among the differences:
Data warehouse system uses many definitions and each one having their own specific characteristics.
Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers.
Staging Area: Any data store that is designed primarily to receive data into a warehousing environment.
Operational Data Store (ODS): A collection of data that addresses operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs.
OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.
Multidimensional Analysis: The ability to manipulate information by a variety of relevant categories or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as “drilling-down”, “drilling-across” and “slicing and dicing”
Hypercube: A means of visually representing multidimensional data.
Star Schema: A means of aggregating data based on a set of known dimensions. It stores data multidimensional in a two dimensional Relational Database Management System (RDBMS), such as Oracle.
Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the dimensions of a star schema in a relational environment.
Multidimensional Database: Also known as MDDB or MDDBS. A class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems.
OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. OLTP tools can incorporate data acquisition, data access, data manipulation, or any combination thereof.