Just today, the thesis work of us has been passed a long way and the topic has been selected — “Data compression for large multidimensional data warehouses”
So, time to go with some documentations. I have googled through “data warehousing” and collected some things for my ease to make it easier. Would find out the zest out of these long long lines to very short “mahmudio” expressions about “data warehousing”… 😀
Here the collection goes:
Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database. Common accessing systems of data warehousing include queries, analysis and reporting. Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course. The final result, however, is homogeneous data, which can be more easily manipulated.
Data warehousing is commonly used by companies to analyze trends over time. In other words, companies may very well use data warehousing to view day-to-day operations, but its primary function is facilitating strategic planning resulting from long-term data overviews. From such overviews, business models, forecasts, and other reports and projections can be made. Routinely, because the data stored in data warehouses is intended to provide more overview-like reporting, the data is read-only. If you want to update the data stored via data warehousing, you’ll need to build a new query when you’re done.
This is not to say that data warehousing involves data that is never updated. On the contrary, the data stored in data warehouses is updated all the time. It’s the reporting and the analysis that take more of a long-term view.
Data warehousing is not the be-all and end-all for storing all of a company’s data. Rather, data warehousing is used to house the necessary data for specific analysis. More comprehensive data storage requires different capacities that are more static and less easily manipulated than those used for data warehousing.
Data warehousing is typically used by larger companies analyzing larger sets of data for enterprise purposes. Smaller companies wishing to analyze just one subject, for example, usually access data marts, which are much more specific and targeted in their storage and reporting. Data warehousing often includes smaller amounts of data grouped into data marts. In this way, a larger company might have at its disposal both data warehousing and data marts, allowing users to choose the source and functionality depending on current needs.
source: click here
A data warehouse is a repository of an organization’s electronically stored data. Data warehouses are designed to facilitate reporting and analysis .
This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
Datawarehousing arises in an organisation’s need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.
The practical reality of most organisations is that their data infrastructure is made up by a collection of heterogeneous systems. For example, an organisation might have one system that handles customer-relationship, a system that handles employees, systems that handles sales data or production data, yet another system for finance and budgeting data etc. In practice, these systems are often poorly or not at all integrated and simple questions like: “How much time did sales person A spend on customer C, how much did we sell to Customer C, was customer C happy with the provided service, Did Customer C pay his bills” can be very hard to answer, even though the information is available “somewhere” in the different data systems.
Another problem is that ERP systems are designed to support relevant operations. For example, a finance system might keep track of every single stamp bought; When it was ordered, when it was delivered, when it was paid and the system might offer accounting principles (like double bookkeeping) that further complicates the data model. Such information is great for the person in charge of buying “stamps” or the accountant trying to sort out an irregularity, but the CEO is definitely not interested in such detailed information, the CEO wants to know stuff like “What’s the cost?”, “What’s the revenue?”, “did our latest initiative reduce costs?”.
Yet another problem might be that the organisation is, internally, in disagreement about which data is correct. For example, the sales department might have one view of its costs, while the finance department has another view of that cost. In such cases the organisation can spend unlimited time discussing who’s got the correct view of the data.
It is partly the purpose of Datawarehousing to bridge such problems. It is important to note that in Datawarehousing the source data systems are considered as given: It is not the task of the datawarehousing consultant to figure out, that since the problem is that the CRM system identifies a person by initials, while the Employee-Time-Management system identifies a person by full name while the ERP system identifies a person by social security number; and since a person can change his name: things do not work and the organization should invest in and implement one or two new systems to handle CRM, ERP etc. in a more consistent manner.
Rather, the datawarehousing consultant is charged with making the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The datawarehousing consultant achieves this by employing different datawarehousing techniques, creating one or more new data repositories (i.e. the datawarehouse) whose data model(s) support the needed reporting and analysis.