What is Data Warehouses? Uses, Challenges

Coursera 7-Day Trail offer

What are Data Warehouses?

A data warehouse is a large, centralized repository of data that is used for analysis and reporting. It is designed to support business intelligence activities by consolidating data from multiple sources into a single, unified view. Data warehouses are typically used to store structured data from transactional systems such as customer relationship management (CRM), enterprise resource planning (ERP), and supply chain management (SCM) systems. The data is extracted, transformed, and loaded (ETL) into the warehouse, where it is organized and optimized for analytical processing.

Since the inception of desktop computing, in the mid-1980s, around the world, there has been a proliferation of data use and needs for data storage. Almost all employees of organizations, above a certain size, now use computers and produce, modify, or read data. For very large organizations, the amount of data that is used on a day-to-day basis could be as high as in petabytes. With this huge explosion in data, organizations felt the need for:

  • Consolidating much of the data from various databases into a whole that could be understood clearly.

  • Focusing on the use of data for decision-making, as opposed to simply running transactions.

The need for creating data warehouses arose from the above two needs. The technology of data warehouses draws on enterprise databases to create a separate set of tables and relations that can be used to run particular kinds of queries and analytical tools. Warehouses are different from transaction databases, as users can run complex queries on them, which are related to the functions of the enterprise that need not affect the transaction processing.

To create a data warehouse, data is extracted from transactional tables and pre-processed to remove unwanted data types and then loaded into tables in the warehouse. The extraction process requires making queries into transactional databases that are currently being used.

This is a challenge as the data tables may be distributed across various servers, and the data may be changing rapidly. The data obtained from these tables are maintained in a staging area, a temporary storage area, where the data is scrubbed. The idea of data scrubbing is to remove recognizable erroneous data. This task is often difficult, as errors are not obvious – say a misspelled name or a wrong address – and require careful examination to remove them. At the scrubbing stage, data is not corrected in any manner; it is invariably removed from the collection of raw data.

Once the data is scrubbed or cleaned, it is loaded onto the tables that constitute the warehouse. When an organization is creating a warehouse for the first time, the entire data is loaded into a database, using a particular design. Subsequent data that is obtained from the transaction databases are then extracted, cleaned, and loaded incrementally to the earlier tables.

Data about a particular domain or a problem to be analyzed is maintained in data marts. For example, a mart may be created to examine sales data alone. This mart will collect data related to the sales activities across the organization and store them in the warehouse. However, it will exclude the data related to production, finance, employees, and so on. The mart can then be analyzed for particular problems related to sales trends, sales predictions, and so on. Furthermore, the mart may be updated periodically to include the fresh data available.

Data in warehouses can be stored in tables with timestamps. This is the dimensional method of creating warehouses. The idea here is to store data in a single or a few, unrelated tables that are given one additional attribute of a timestamp (that indicates when the data was collected or created).

For example, one table in a dimensional warehouse may include data on customers, sales, products, orders, shipping, and a timestamp of each transaction. Each timestamp will pertain to one particular event in the life of the organization when a transaction occurred and the data was created. Such a table can be analyzed to examine trends in sales, fluctuations in orders across seasons, and so on.

Another method of storing data is in the regular tables-and-relations format of relational databases. Here too an additional attribute of a timestamp is included within the tables. Various kinds of analysis can be conducted on data available in warehouses including data mining, online analytical processing, and data visualization. These different methods are designed to extract patterns and useful information from very large data sets.

An online analytical process (OLAP) is used to analyze and report data on sales trends, forecasts, and other time-series-based analyses. Such analyses allow managers to see and visualize the data in a form that shows interesting and unusual patterns that would not be easily visible from the analysis of transaction data alone.

In modern organizations, ones that have a strong online presence and collect data from customer visits to websites and transaction data from different types of e-commerce sites, the extent and size of the data are such that analyzing it for patterns is almost impossible unless a warehouse is used. For example, one firm analyses data, using OLAP, from millions of visitors to different pages of its website to dynamically place advertisements that would conform to the visitors’ interests, as determined by the regions on the page the visitor hovers over or clicks on.


Data Mining Uses of Data Stored in Warehouses

Data mining means extracting patterns and knowledge from historical data that is typically housed in data warehouses. Data mining is a loose term that means many things at the same time – data storing, data analysis, use of artificial intelligence techniques, statistical pattern recognition, and others.

The original idea of data mining came from the field known as ‘knowledge discovery in databases (KDD). KDD is a sub-field of artificial intelligence and is concerned with finding useful, human-understandable knowledge from data. Several other terms are now used to describe the same ideas – business intelligence, data analytics, and web analytics. Data mining is used with data accumulated in data warehouses.

Following are some examples of data stored in warehouses that are used for mining:

Click-stream Data

This data is collected from website pages as users click on links or other items on the web page. Data on where a user clicks, after what interval, what page the user goes to, does the user return and visit other links, etc., are collected. The data are mined to identify which links are most frequently visited, for how long, and by what kind of users. The online search firm, Google, has initiated an entire field of mining click-stream data that is known as web analytics.

Point-of-sale Purchase Data

Data obtained from retail sales counters is the classic data set to which mining software was applied. The data pertains to the item quantities, price values, date and time of purchase, and details about customers that are obtained from point-of-sale terminals. The data is used to perform a ‘market basket’ analysis, which essentially shows what kinds of items are typically purchased with each other.

In a famous case, a large retailer found from a market basket analysis that men in a certain part of the USA were likely to buy beer and diapers on Thursday evenings. This was an unusual finding and the retailer sought to explain why. It was later learned that many families with young children planned their weekend trips on Thursday evenings, at which point the women would ask men to go and buy diapers from the store. The men would take this opportunity to buy beer, also for the weekend. The retailer used this information to announce promotions and increase sales of both these products.

Online Search Data

This data is about a search that users type in search boxes on web pages. Many organizations collect the text typed in by users while they are searching for some information. This text data reveals what users are interested in and is mined for patterns. The data collected pertains to the search texts typed in, the time at which they are typed, and the number of times different searches are done.

Many online retailers, such as Flipkart, mine this data to identify what users are interested in and then make product suggestions based on association rules. For example, users searching for books on Java programming may be offered what others have seen and purchased, including associated books on programming they have not considered.

Text Data

This is text data that is posted by users on web pages, blogs, emails, wikis, twitter feeds, and others. Many organizations have found that by mining this data they can glean interesting insights and trends. Many tools and software programs have been created recently to mine text data. One example is provided by the online site called Wordle.

The www.wordle.net site hosts an application that mines text submitted to it. The application counts the frequency of words appearing in the submitted text and then creates a word ‘cloud’ with the most frequent words appearing as the largest.

For example, a word cloud of the text of a portion of the Constitution of India. Text from Part III of the Constitution, comprising of the Fundamental Rights, was submitted to wordle.net. This part consists of about 13 pages of printed text.


Challenges of Enterprise Systems Implementations

Enterprise systems have faced severe challenges of implementation and even failures. The Foxmeyer Drug Co in the USA went bankrupt after a failed ERP implementation. Researchers estimate that about 75% of ERP implementation projects fail; the failure implies excessive overrun of budgets or outright inability to perform after implementation.

One of the most famous cases of failure of a CRM project was that of a Siebel implementation at AT&T. AT&T was one of the largest telecommunication firms in the USA and had a huge mobile phone business. It had implemented Siebel software to allow customers to interact with its staff, maintaining a history of all the interactions. In 2003, AT&T upgraded the Siebel software to a new version and this resulted in large-scale problems. Servers crashed routinely and millions of customers were left unserved.

The problems with the CRM were that it did not allow the creation of new customer accounts, would not allow customers to change their numbers under the new number portability laws in the USA, and also blocked call access for many customers. Many of their customers moved to their competitors. This had a profound impact on AT&T’s business and it later was bought out by a rival firm.

Much criticism too has been leveled at enterprise software. The most prominent one is that by adopting industry practices, as embodied in the software, firms are losing their competitive advantage. After all, say critics, many firms derive a competitive advantage by how they do the same things as their rivals albeit in a better manner, so customers come to them.

If every firm in the industry follows the same practices, they will be indistinguishable from each other. Another criticism is that the enterprise-level focus forced by enterprise systems is, in many cases, so severe that it shuts out the innovative ways in which employees can change their work practices to meet the demands of the market. Such systems deaden the innovation potential of employees. Other critics argue that the changes forced by the system rupture the organisational culture, quite adversely, in many cases.

However, despite the criticism enterprise systems have been adopted widely across industry segments, as also by small and medium enterprises and by government agencies. The current trend of providing enterprise services through the cloud model, by firms such as Salesforce.com, has lowered many of the implementation risks as also the costs.

Issues Having a Bearing on the Successful Implementation of an Enterprise System

Possibly the most important issue in ensuring the success of enterprise systems is that of managing change effectively. Such systems introduce massive changes in terms of procedures and practices that the firm’s employees are used to. Large systemic changes pose a threat and raise the levels of anxiety in the firm. Job roles are changed and redefined, the employees are trained for using the system in a manner that suits the new processes rather than what the employees were comfortable with, and the flexibility that they enjoyed, in many cases, is replaced by a rigid process.

Change Management

Change management entails training staff in the new procedures, creating new teams and defining new tasks, creating new reporting relationships, and creating a new culture in which employees will work. This requires careful and sensitive handling of involved employees, right from the start of the project. Key employees from departments and groups have to be recruited for guiding the system’s implementation, and they have to be encouraged to participate freely rather than resist the system.

Although change is not easily accepted, it is achieved after employees begin to see the visible benefits of the system through reduced workloads or the effective use of the system. The management has to tread carefully and nurture this change when it becomes visible.

Top Management Support

The second most important criterion for the successful implementation of an enterprise system is that of top management support. Almost all the successful cases of ERP deployments show that the top management was deeply committed to the project and supported it all through the implementation steps.

The top management has to be involved from the start and stay with the project management team right through the duration of the project. They should openly take responsibility for the success of the system and announce rewards and incentives for its successful operation.

Top management is also the key ingredient for change management. Their acceptance, support, and encouragement of change make possible the creation of new cultural norms and business practices. In several large and successful ERP roll-outs, the top management spent much time talking about the project to employees, participated in meetings and training workshops, wrote personal web pages and emails to staff, and spoke to news media about the project. They created a positive environment for the new system and celebrated its success.

Controlling Costs

Controlling costs is the other critical issue in ensuring the success of enterprise system implementations. The costs of implementing and running an ERP system range from USD 400,000 to about USD 3 million, depending on the size of the company. These figures are from a study conducted in 2007 and are much lower than an average of USD 15 million per firm, which was found in 2003.

These figures are largely for the USA and European commercial firms. In India, the average figures are reported to be in the range of Rs 6–9 million (USD 130,000–200,000) for an ERP system implementation per firm. These are the figures that are arrived at after the implementation is complete and firms know or can approximate the total amounts they have spent. The initial estimates that they had for the projects were, invariably, lower.

Of the many items that managers usually underestimate, training of staff and personnel is the most frequent. Since almost all employees of the firm are affected and have to learn the new system, this cost grows as people realize the challenges of entering and using data that affects many others in the firm, and demand more help and training.

Restricting Customisation

Another area of increased costs is the customization of the package. As the implementation proceeds, many firms realize that the changes in processes being demanded are not possible or difficult and then they resort to customization, which requires extra work from the vendor and hence extra costs. Customization also leads to further demands on testing and integration, which also add to the cost.

Other areas that add to increased costs are data conversion and migration (from old formats to the formats required by the system); costs of consultants who are hired to solve specific and hard problems; costs of time lost by the best employees in helping with the new system rather than working on their tasks; and the loss in productivity immediately after implementation as employees have to work with an entirely new system.

Vendor Selection

Project management and vendor selection are also key issues for the success of the project. Project management entails that the management and the implementation team have to keep a sharp tab on the deadlines and .milestones. If these are missed, certainly, later milestones will also be missed. Delayed implementations cause a loss of morale and a poor image among all stakeholders. Key to the success of the project is the implementation partners.

Few firms have the internal skills to manage an enterprise system implementation on their own. They invariably rely on the vendor to do the implementation or a third party, who acts as a consultant. The selection of this partner is very important as the partner needs to have the necessary skills, experience, and a cost structure that matches the expectations of the client firm. There are many examples of lawsuits against implementation consultants by clients as the project went sour or failed and wasted millions of dollars.


Management Information Systems

(Click on Topic to Read)

Leave a Reply