Saturday, January 30, 2010

Topic 11 - Future of Data Warehousing, Data Mining and BI

The Future of Data Mining – Predictive Analytics

Information Management Magazine, August 2004

Lou Agosta

The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just in time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, segment and predict customer behavior and forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support as well as the fragility of the resulting predictive model; but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.

Unfulfilled Expectations: In addition to a perfect storm of tough economic times, now improving measurably, one reason data mining technology has not lived up to its promise is that "data mining" is a vague and ambiguous term. It overlaps with data profiling, data warehousing and even such approaches to data analysis as online analytic processing (OLAP) and enterprise analytic applications. When high-profile success has occurred (see the front-page article in the Wall Street Journal, "Lucky Numbers: Casino Chain Mines Data on Its Gamblers, And Strikes Pay Dirt" by Christina Binkley, May 4, 2000), this has been a mixed blessing. Such results have attracted a variety of imitators with claims, solutions and products that ultimately fall short of the promises. The promises build on the mining metaphor and typically are made to sound like easy money - "gold in them thar hills." This has resulted in all the usual dilemmas of confused messages from vendors, hyperbole in the press and unfulfilled expectations from end-user enterprises.

Common Goals: The goals of data warehousing, data mining and the emerging trend in predictive analytics overlap. All aim at understanding consumer behavior, forecasting product demand, managing and building the brand, tracking performance of customers or products in the market and driving incremental revenue from transforming data into information and information into knowledge. However, they cannot be substituted for one another. Ultimately, the path to predictive analytics lies through data mining, but the latter is like the parent who must step aside to let the child develop her or his full potential. This is a trends analysis, not a manifesto in predictive analytics. Yet the slogan rings true, "Data mining is dead! Long live predictive analytics!" The center of design for cutting-edge technology and breakthrough commercial business results has shifted from data warehousing and mining to predictive analytics. From a business perspective, they employ different methods. They are positioned in different places in the technology hierarchy. Finally, they are at different stages of growth in the life cycle of technology innovation.

Technology Cycle: Data warehousing is a mature technology, with approximately 70 percent of Forrester Research survey respondents indicating they have one in production. Data mining has endured significant consolidation of products since 2000, in spite of initial high-profile success stories, and has sought shelter in encapsulating its algorithms in the recommendation engines of marketing and campaign management software. Statistical inference has been transformed into predictive modeling. As we shall see, the emerging trend in predictive analytics has been enabled by the convergence of a variety of factors, with the market really taking off in the late 1990s after a long gestation period (see Figure 1).


Figure 1: Predictive Analytics Enabling Technologies

Technology Hierarchy: In the technology hierarchy, data warehousing is generally considered an architecture for data management. Of course, when implemented, a data warehouse is a database providing information about (among many other things) what customers are buying or using which products or services and when and where they are doing so. Data mining is a process for knowledge discovery, primarily relying on generalizations of the "law of large numbers" and the principles of statistics applied to them. Predictive analytics emerges as an application that both builds on and delimits these two predecessor technologies, exploiting large volumes of data and forward-looking inference engines, by definition, providing predictions about diverse domains.

Methods: The method of data warehousing is structured query language (SQL) and its various extensions. Data mining employs the "law of large numbers" and the principles of statistics and probability that address the issues around decision making under uncertainty. Predictive analytics carries forward the work of the two predecessor domains. Though not a silver bullet, better algorithms in operations research, risk minimization and parallel processing, when combined with hardware improvements and the lessons of usability testing, have resulted in successful new predictive applications emerging in the market. (Again, see Figure 1 on predictive analytics enabling technologies.) Widely diverging domains such as the behavior of consumers, stocks and bonds, and fraud detection have been attacked with significant success by predictive analytics on a progressively incremental scale and scope. The work of the past decade in building the data warehouse, and especially its closely related techniques, particularly parallel processing, is a key enabling factor. Statistical processing has been useful in data preparation, model construction and model validation. However, it is only with predictive analytics that the inference and knowledge are actually encoded into the model that, in turn, is encapsulated in a business application.

Definition

This results in the following definition of predictive analytics: Methods of directed and undirected knowledge discovery, relying on statistical algorithms, neural networks and optimization research to prescribe (recommend) and predict (future) actions based on discovering, verifying and applying patterns in data to predict the behavior of customers, products, services, market dynamics and other critical business transactions. In general, tools in predictive analytics employ methods to identify and relate independent and dependent variables - the independent variable being "responsible for" the dependent one and the way in which the variables "relate," providing a pattern and a model for the behavior of the downstream variables. Differentiators are summarized in Figure 2.
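To make the relationship between independent and dependent variables concrete, here is a minimal sketch, assuming scikit-learn and pandas are available. The column names, data and churn scenario are invented for illustration and are not from the article:

```python
# A minimal sketch (invented data) of relating independent variables
# (customer attributes) to a dependent variable (churn) with a decision tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "tenure_months": [3, 40, 12, 60, 5, 24, 2, 36],   # independent variables
    "monthly_spend": [20, 80, 35, 90, 15, 55, 10, 70],
    "support_calls": [4, 0, 2, 1, 5, 1, 6, 0],
    "churned":       [1, 0, 0, 0, 1, 0, 1, 0],        # dependent variable
})

X = data[["tenure_months", "monthly_spend", "support_calls"]]
y = data["churned"]

# The fitted model encodes the discovered pattern and can score new customers.
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

new_customer = pd.DataFrame(
    [{"tenure_months": 4, "monthly_spend": 18, "support_calls": 5}]
)
print("Predicted churn:", model.predict(new_customer)[0])
```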


Figure 2: Data Warehousing, Data Mining and Predictive Analytic Differentiators

Differentiators

In data warehousing, the analyst asks a question of the data set with a predefined set of conditions and qualifications, and a known output structure. The traditional data cube addresses: What customers are buying or using which product or service and when and where are they doing so? Typically, the question is represented in a piece of SQL against a relational database. The business insight needed to craft the question to be answered by the data warehouse remains hidden in a black box - the analyst's head. Data mining gives us tools with which to engage in question formulation based primarily on the "law of large numbers" of classic statistics. Predictive analytics have introduced decision trees, neural networks and other pattern-matching algorithms constrained by data percolation. It is true that in doing so, technologies such as neural networks have themselves become a black box. However, neural networks and related technologies have enabled significant progress in automating, formulating and answering questions not previously envisioned. In science, such a practice is called "hypothesis formation," where the hypothesis is treated as a question to be defined, validated and refuted or confirmed by the data. The confirmation or refutation of the hypothesis counts as knowledge in the strict sense. In neither data mining nor predictive analytics is a decision made. A prediction is a prediction, not a decision. The ultimate determining mark of predictive analytics (and applications) is that the prediction is inside the model.

A few examples will make clear the differentiators, sharpen the distinction between data mining and predictive analytics, and show how predictive analytics emerges from the context of data mining.

  • Prescriptive, not merely descriptive: Scanning through a terabyte haystack of billing data for a few needles of billing errors is properly described as data mining. However, it is descriptive, not prescriptive. When a model is able to predict errors based on a correlation of variables ("root cause analysis"), then the analysis is able to recommend what one ought to do about the problem (and is, therefore, prescriptive). Note the model expresses a "correlation" not a "causation," though a cause-and-effect relationship can often be inferred. For example, Xerox uses Oracle's data mining software for clustering defects and building predictive models to analyze usage profile history, maintenance data and representation of knowledge from field engineers to predict photocopy component failure. The copier then sends an e-mail to the repair staff to schedule maintenance prior to the breakdown.
  • Stop predicting the past; predict the future: Market trend analysis as performed in data warehousing, OLAP and analytic applications often asks what customers are buying or using which product or service, and then draws a straight line from the past into the future, extrapolating a trend. This too can be described as data mining. One might argue this predicts the future because it says something about what will happen, but a more accurate description would be that it "predicts the past" and then projects that into the future. The prediction is not really in the analysis. Furthermore, data mining in the limited sense used here is only able to envision continuous change - extending the trend from past to future. Predictive analytics is also able to generate scores from models that envision discontinuous changes - not only peaks and valleys, but also cliffs and crevasses. This is especially the case with "black box" type functions such as neural networks and genetic programming (which, of course, contain special challenges of their own). Rarely do applications in OLAP, query and reporting or data warehousing explicitly relate independent and dependent variables, but that is of the essence in predictive analytics. For example, KXEN is used to find the optimal point between the savings of catching a bad customer versus the cost of turning away a good paying customer (opportunity cost); a worked sketch of this trade-off follows this list.
  • Invent hypotheses, not merely test them: Finally, data mining is distinguished from predictive analytics in terms of hypothesis formulation and validation. For example, one hypothesis is that people default on loans due to high debt. Once the analyst formulates this hypothesis by means of imaginative invention out of her or his own mind, the OLAP analyst then launches queries against the data cube to confirm or invalidate this hypothesis. Predictive analytics is different in that it can look in the data for patterns that are useful in formulating a hypothesis. The analyst might not have thought that age was a determinant of risk, but a pattern in the data might suggest that as a useful hypothesis for further investigation.
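The opportunity-cost trade-off in the second example above can be worked through numerically: choose the score cutoff that minimises expected cost. The following is a hedged sketch with simulated scores and invented cost figures; it illustrates the idea only and is not KXEN's actual method:

```python
# Pick the score cutoff that minimises expected cost, given assumed costs
# for accepting a bad customer vs. rejecting a good one. Scores and costs
# are simulated/invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Simulated model scores: higher score = more likely to be a "bad" customer.
bad_scores = rng.beta(5, 2, size=1_000)   # bad customers skew high
good_scores = rng.beta(2, 5, size=9_000)  # good customers skew low

LOSS_PER_BAD_ACCEPTED = 500    # assumed loss if a bad customer is accepted
MARGIN_PER_GOOD_REJECTED = 50  # assumed opportunity cost of rejecting a good one

def expected_cost(threshold):
    """Total cost if every applicant scoring above `threshold` is rejected."""
    bad_accepted = (bad_scores <= threshold).sum()
    good_rejected = (good_scores > threshold).sum()
    return (bad_accepted * LOSS_PER_BAD_ACCEPTED
            + good_rejected * MARGIN_PER_GOOD_REJECTED)

thresholds = np.linspace(0, 1, 101)
costs = [expected_cost(t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]
print(f"Optimal cutoff ~ {best:.2f}, expected cost {int(min(costs)):,}")
```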

Borderline examples are abundant. These include cases such as fraud detection, which are not primarily predictive but also deserve attention. They map to the current approach because the assertion that the credit card transaction or insurance claim is fraudulent is like a hypothesis to be further tested and invalidated or confirmed. Because the hypothesis formulation and verification is similar to the scientific method, the results of predictive analytics are often dignified by being described as "knowledge discovery." For example, when Farmer's Insurance determined that drivers of sports cars were at higher risk for collisions than drivers of Volvos, they were predicting the obvious. However, they used IBM's Intelligent Miner to predict that drivers of sports cars who also owned "safe" cars such as Volvos fit the profile of the "safe" family car driver, not the risky sports car driver. Therefore, they were able to make them an offer of a modest discount and pocket the difference, creating a win/win scenario. This is something that a packaged solution is often unable to do because it lacks the flexibility and the collaborative context needed to explore variations in the independent variables.

In spite of precise differentiators between data mining and other specialties, the average person is likely to continue to refer to "data mining" as any computationally intense process that uses a large volume of data. No one is proposing to legislate how people speak. Gold mining metaphors will continue to be pervasive in marketing messages and industry discussions. The suggestion here is to substitute "predictive (analytics)" for the term when it is used in data mining requirements analyses, marketing messages, analyst reports or other professional conversations to see if it holds up to the differentiators detailed in this article. The predictable result is to deflate the hyperbole, including that of "wanna-be" data warehousing, business intelligence or data mining vendors aspiring to predictive analytics without predictive technology. In spite of laboring mightily to differentiate between the terms, do not forget that data warehousing, data mining, OLAP and predictive analytics can often be complementary and strengthen one another. The point is simply that functionality must be diligently qualified. While the term "data mining" will continue to be used, it is important to realize that the truth and future of the term lies in predictive analytics.

Lou Agosta is an independent industry analyst in data warehousing. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, data mining and data quality. He can be reached at LAgosta@acm.org.

Source: http://www.information-management.com/issues/20040801/1007209-1.html?pg=2

Saturday, January 23, 2010

Topic 10 - Implementing Enterprise BI systems

This week's lecture is about implementing a BI system and a Business Intelligence Competency Center (BICC). The following is a guide on implementing a BI system:

Definitions and Overview

Business Performance Management (BPM) establishes a framework to improve business performance by measuring key business characteristics, which can be fed back into the decision process to guide operations and improve strategic organisational performance. Other popular terms for this include Enterprise PM (EPM), Corporate PM (CPM), Enterprise Information Systems (EIS), Decision Support Systems (DSS) and Management Information Systems (MIS).

BPM: Cycle of setting objectives, monitoring performance and feeding back to new objectives.
Business Intelligence (BI) can be defined as the set of tools which allows end users easy access to relevant information and the facility to analyse this to aid decision making. More widely, the 'intelligence' is the insight derived from this analysis (e.g. trends and correlations).

BI: Tools to Access & Analyse Data

Key Performance Indicators (KPIs) are strategically aligned corporate measures that are used to monitor, predict and anticipate the performance of the organisation. They form the basis of any BPM solution, and in an ideal world it should be possible to relate strategic KPIs to actual operational performance within the BI application.
KPIs provide a quick indication on the health of the organisation and guide management to the operational areas affecting performance.

In many companies, analysis of data is complicated by the fact that data is fragmented within the business. This causes problems of duplication, inconsistent definitions, inconsistent and inaccurate data, and wasted effort.
Silos of Data: Fragmented, Departmental Data Stores, often aligned with specific business areas.
Data Warehousing (DWH) is often the first step towards BI. A Data Warehouse is a centralised pool of data structured to facilitate access and analysis.

DWH: Centralised/Consolidated Data Store

The DWH will be populated from various (heterogeneous) sources using an ETL (Extract, Transform & Load) or data integration tool. This update may be done in regular periodic batches, as a one-off load or even synchronised with the source data (real time).

ETL: The process of extracting data from a source system, transforming (or validating) it and loading it into a structured database.
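As a concrete illustration of the ETL pattern just defined, here is a minimal sketch using only the Python standard library. The source file name, column names and target table are assumptions made for the example, not part of the guide:

```python
# A minimal extract-transform-load sketch: read raw rows from a source file,
# validate/standardise them, and load them into a structured database.
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a source CSV export."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: validate and standardise each row; skip bad records."""
    for row in rows:
        try:
            yield {
                "customer_id": int(row["customer_id"]),
                "country": row["country"].strip().upper(),
                "amount": round(float(row["amount"]), 2),
            }
        except (KeyError, ValueError):
            continue  # in practice, log and route to an error table

def load(rows, conn):
    """Load: insert the cleaned rows into a warehouse staging table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS sales_staging
                    (customer_id INTEGER, country TEXT, amount REAL)""")
    conn.executemany(
        "INSERT INTO sales_staging VALUES (:customer_id, :country, :amount)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    with sqlite3.connect("warehouse.db") as conn:       # assumed target DB
        load(transform(extract("sales_export.csv")), conn)  # assumed source file
```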

A reporting (or BI) layer can then be used to analyse the consolidated data and create dashboards and user defined reports. A modelling layer can be used to integrate budgets and forecasting.

As these solutions get more complex, the definitions of the systems and what they are doing become more important. This is known as metadata and represents the data defining the actual data and its manipulation. Each part of the system has its own metadata defining what it is doing. Good management & use of metadata reduces development time, makes ongoing maintenance simpler and provides users with information about the source of the data, increasing their trust and understanding of it.

Metadata: Data about data, describing how and where it is being used, where it came from and what changes have been made to it.

Commercial Justifications

There is clear commercial justification to improve the quality of information used for decision making. A survey conducted by IDC found that the mean payback of BI implementation was 1.6 years and that 54% of businesses had a 5 year ROI of >101% and 20% had ROI > 1000%.

ROI on BI > 1000% from 20% of organisations

There are now also regulatory requirements to be considered. Sarbanes-Oxley requires that US-listed companies disclose and monitor key risks and relevant performance indicators - both financial and non-financial - in their annual reports. A robust reporting infrastructure is essential for achieving this.

SarbOx requires disclosure of financial & non-financial KPIs

Poor data quality is a common barrier to accurate reporting and informed decision making. A good data quality strategy, encompassing non-system issues such as user training and procedures, can have a large impact. Consolidating data into a DWH can help ensure consistency and correct poor data, but it also provides an accurate measure of data quality, allowing it to be managed more proactively.

Data Quality is vital and a formal data quality strategy is essential to continually manage and improve it.

Recent research (PMP Research) asked a broad cross section of organisations their opinion of their data quality before and after a DWH implementation.

- "Don't know" responses decreased from 17% to 7%
- "Bad" or "Very Bad" decreased from 40% to 9%
- Satisfactory (or better) increased from 43% to 84%

DWH implementations improve Data Quality.

Tools Market Overview

At present BI is seen as a significant IT growth area and as such everyone is trying to get onto the BI bandwagon:

- ERP vendors have BI solutions, e.g. SAP BW, Oracle Apps
- CRM vendors are doing it: Siebel Analytics
- ETL vendors are adding BI capabilities: Informatica
- BI vendors are adding ETL tools: Business Objects (BO) Data Integrator (DI), Cognos DecisionStream
- Database vendors are extending their BI & ETL tools:
  - Oracle: Oracle Warehouse Builder, EPM
  - Microsoft: SQL Server 2005 Integration Services, Reporting Services, Analysis Services

Improved Tools

Like all maturing markets, this one has seen consolidation, whereby fewer suppliers now cover more functionality. This is good for customers, as more standardisation, better use of metadata and improved functionality are now easily available. BI tools today can satisfy the most demanding customer's requirements for information.

Thinking and tools have moved on - we can now build rapid, business-focussed solutions in small chunks, allowing the business to see data, store knowledge, learn the capabilities of new tools and refine their requirements during the project! Gone are the days of the massive data warehousing project, which was obsolete before it was completed.
A typical DWH project should provide usable results within 3 - 6 Months.

Advice & Best Practice

Initial Phase < 6 months

A successful BI project will never finish. It should perpetually evolve to meet the changing needs of the business. So the first 'wins' need to come quickly, and tools and techniques need to be flexible, quick to develop and quick to deploy.

Experience is Essential

Often we have been brought in to correct failed projects and it is frightening how many basic mistakes are made through inexperience. A data warehouse is fundamentally different to your operational systems and getting the initial design and infrastructure correct is crucial to satisfying business demands.

Keep Internal Control

We believe that BI is too close to the business and changes too fast to outsource. Expertise is required in the initial stages to ensure that a solid infrastructure is in place and that the best tools and methods are used. If sufficient experience is not available internally, external resources can be useful in the initial stages, but this MUST include skills transfer to internal staff. The DWH can then grow and evolve (with internal resourcing) to meet the changing needs of the business.

Ensure Management and User Buy In

It may sound obvious, but internal knowledge and support are essential for the success of a DWH, yet 'Reporting' is often given a low priority and can easily be neglected unless it is supported at a senior business level. It is common to find that there is limited knowledge of user requirements. It is also true that requirements will change over time, both in response to changing business needs and to the findings/outcomes of the DWH implementation and use of new tools.

Strong Project Management

The complex and iterative nature of a data warehouse project requires strong project management. The relatively unquantifiable risk around data quality needs managing, along with changing user requirements. Plan for change and allow extra budget for the unexpected. Using rapid application development (RAD) techniques mitigates some of the risks by exposing them early in the project with the use of prototypes.

Educating the End Users

Do not underestimate the importance of training when implementing a new BI/DWH solution. Trained users are 60% more successful in realising the benefits of BI than untrained users. But this training needs to consider specific data analysis techniques as well as how to use the BI tools. In the words of Gartner, "it is more critical to train users on how to analyse the data." Gartner goes on to say "... that focusing only on BI tool training can triple the workload of the IT help desk and result in user disillusionment. A user who is trained on the BI tool but does not know how to use it in the context of his or her BI/DWH environment will not be able to get the analytical results he or she needs...". Hence bespoke user training on your BI system and data is essential.

Careful planning of the training needs and making the best use of the different training media now available can overcome this issue. Look for training options such as: structured classroom (on or off site), web-based e-learning (CBT), on-the-job training & skills transfer, and bespoke training around your solution & data.

Technical Overview

Information Portal: This allows users to manage & access reports and other information via a corporate web portal. As users create & demand more reports the ability to easily find, manage & distribute them is becoming more important.
Collaboration: The ability for the Information Portal to support communication between relevant people centred around the information in the portal. This could be discussion threads attached to reports or workflow around strategic goal performance.
Guided Analysis: The system guides users where to look next during data analysis. Taking knowledge from people's heads and placing it in the BI system.
Security: Access to system functionality and data (both rows and columns) can be controlled down to user level and based on your network logon.
Dashboards & Scorecards: Providing management with a high-level, graphical view of their business performance (KPIs) with easy drill down to the underlying operational detail.
Ad-hoc Reporting and Data Analysis: End users can easily extract data, analyse it (slice, dice & drill) and formally present it in reports & distribute them.
Formatted/ Standard Reports: Pre-defined, pixel perfect, often complex reports created by IT. The power of end user reporting tools and data warehousing is now making this type of report writing less technical and more business focussed.
Tight MS Office integration: More users depend on MS Office software, therefore the BI tool needs to seamlessly link into these tools.
Write Back: The BI portal should provide access to write back to the database to maintain: reference data, targets, forecasts, workflow.
Business Modelling/Alerting: Modelling and alerting around centrally maintained data, with pre-defined, end-user-maintained business rules.
Real Time: As the source data changes it is instantly passed through to the user. Often via message queues.
Near Real Time: Source data changes are batched up and sent through on a short time period, say every few minutes - this requires special ETL techniques (a minimal micro-batch sketch follows this list).
Batch Processing: Source Data is captured in bulk, say overnight, whilst the BI system is offline.
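The near real time pattern above can be sketched as a simple micro-batch loop: poll the source every few minutes and load only the rows changed since the last run. The table names and the updated_at change-tracking column below are assumptions for illustration only:

```python
# A hedged micro-batch sketch: copy only rows changed since the last poll.
# The orders tables and updated_at column are assumed, not prescribed.
import sqlite3
import time

POLL_SECONDS = 120  # "every few minutes"

def sync_changes(source, target, last_seen):
    """Copy rows changed since `last_seen` from the source to the warehouse."""
    rows = source.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ?",
        (last_seen,),
    ).fetchall()
    if rows:
        target.executemany(
            "INSERT OR REPLACE INTO orders_dw (id, status, updated_at) "
            "VALUES (?, ?, ?)",
            rows,
        )
        target.commit()
        last_seen = max(row[2] for row in rows)
    return last_seen

if __name__ == "__main__":
    source = sqlite3.connect("operational.db")  # assumed source system
    target = sqlite3.connect("warehouse.db")    # assumed warehouse
    target.execute("""CREATE TABLE IF NOT EXISTS orders_dw
                      (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)""")
    last_seen = "1970-01-01 00:00:00"
    while True:
        last_seen = sync_changes(source, target, last_seen)
        time.sleep(POLL_SECONDS)
```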

Relational Database Vs OLAP (cubes, slice & dice, pivot)

This is a complex argument, but put simply most things performed in an OLAP cube can be achieved in the relational world but may be slower both to execute and develop. As a rule of thumb, if you already work in a relational database environment, OLAP should only be necessary where analysis performance is an issue or you require specialist functionality, such as budgeting, forecasting or 'what if' modelling. The leading BI tools seamlessly provide access to data in either relational or OLAP form, making this primarily a technology decision rather than a business one.
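To illustrate the point that a cube-style view and a relational query can answer the same question, here is a small pandas sketch with invented sales data: the pivot gives the slice-and-dice view, while the group-by gives the relational equivalent:

```python
# Same question, two forms: OLAP-style pivot vs. relational-style group-by.
# The sales data is invented for illustration.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q1"],
    "amount":  [100, 120, 90, 140, 60, 30],
})

# Cube view: regions by quarters, ready to slice and dice.
cube_view = sales.pivot_table(index="region", columns="quarter",
                              values="amount", aggfunc="sum")

# Relational view: the same numbers via a group-by (what SQL would do).
relational_view = sales.groupby(["region", "quarter"])["amount"].sum()

print(cube_view)
print(relational_view)
```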

Top Down or Bottom Up Approach?

The top down approach focuses on strategic goals and the business processes and organisational structure to support them. This may produce the ideal company processes but existing systems are unlikely to support them or provide the data necessary to measure them. This can lead to a strategy that is never adopted because there is no physical delivery and strategic goals cannot be measured.

The bottom up approach takes the existing systems and data and presents them to the business to measure & analyse. This may not produce the best strategic information due to the limited data available and data quality.

We recommend a compromise of both approaches: build the pragmatic bottom up solution as a means to get accurate measures of the business and a better understanding of current processes, whilst performing a top down analysis to understand what the business needs strategically. The gap analysis of what can be achieved today and what is desired strategically will then provide the future direction for the solution; if the solution has been designed with change in mind, this should be relatively straightforward, building upon the system foundations already in place.

Advanced Business Intelligence

The following describes some advanced BI requirements that some organisations may want to consider: delivering an integrated BPM solution which has business rules and workflow built in, allowing the system to quickly guide the decision maker to the relevant information.

Collaboration and Guided Analysis to help manage the action required as a result of the information obtained.
More user-friendly Data Mining and Predictive Analytics, where the system finds correlations between unrelated data sets in order to find the 'golden nugget' of information.

More integration of BI information into the front office systems, e.g. a gold-rated customer gets VIP treatment when they call in, or data profiling suggests that a customer may churn, so they are offered an incentive to stay.

Increased usage of Real Time data.

End to end Data Lineage automatically captured by the tools. Better metadata management of the systems will mean that users can easily see where the data came from and what transformations it has undergone, improving the trust in the data & reports. Systems will also be self documenting providing users with more help information and simplifying ongoing maintenance.

Integrated, real time Data Quality Management as a means to measure accuracy of operational process performance. This would provide cross system validation, and verify business process performance by monitoring data accuracy, leading to better and more dynamic process modelling, business process re-engineering and hence efficiency gains.

Packaged Analytical Applications: like finance systems in the '80s and packaged ERP (Enterprise Resource Planning) in the '90s, packaged BI may become the standard for this decade. Why build your own data warehouse and suite of reports and dashboards from scratch when your business is similar to many others? Buy packaged elements and use rapid deployment templates and tools to configure them to meet your precise needs. This rapid deployment capability then supports you as your business evolves.


BI for the masses: As information becomes more critical to managing operational efficiencies, more people need access to that information. Now that BI tools can technically and cost-effectively provide more people with access to information, BI for the masses is a reality and can provide significant improvement to a business. The increased presence of Microsoft in the BI space will also increase usage of BI and make it more attractive. BusinessObjects' acquisition of Crystal and recent release of XI will also extend BI to more people, in and outside the organisation - now everyone can be given secure access to information!

Conclusion

The potential benefits from a BI/DWH implementation are huge, but far too many companies fail to realise them through lack of experience, poor design, poor selection and use of tools, poor management of data quality, poor or no project management, limited understanding of the importance of metadata, no realisation that a successful system will inevitably evolve and grow, and limited awareness of the importance of training. With all these areas to consider, using a specialist consultancy such as IT Performs makes considerable sense.

Source: http://ezinearticles.com/?Quick-Guide-to-Implementing-Business-Intelligence,-Data-Warehousing-and-BPM&id=3465076

Saturday, January 16, 2010

Topic 9 - Text and Web Mining

This week's lecture is quite interesting as it involves something that most people have done before. Text mining is a method to uncover information hidden in text; it applies data mining to unstructured or less structured text files. It entails generating meaningful numerical indices from the unstructured text and processing those indices with various data mining algorithms. It attempts to categorise textual data rather than understand its contents. Text mining is used in the automatic detection of email spam or phishing through analysis of the document content, in the analysis of warranty claims, help desk calls/reports and so on to identify the most common problems and relevant responses, etc.
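As a concrete illustration of the spam-detection use just mentioned, here is a minimal sketch assuming scikit-learn: the text is turned into numerical indices (TF-IDF) and categorised without any understanding of its content. The training examples are invented and far too few for a real system:

```python
# Turn text into numerical features (TF-IDF) and classify it as spam/ham.
# Training examples are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win a free prize now, click here",
    "Cheap loans approved instantly",
    "Meeting moved to 3pm, see agenda attached",
    "Please review the quarterly warranty claims report",
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Click here for a free loan"]))  # likely 'spam'
```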

There are 3 types of Web Mining. Web content mining refers to the extraction of useful information from Web pages, Web structure mining refers to the development of useful information from the links included in the Web documents, and Web usage mining refers to the extraction of useful information from clickstream analysis of Web server logs containing details of webpage visits, transactions etc. As mentioned above, web mining can extract information such as visits to websites like Amazon.com. From there, the website can make recommendations based on the customer's past visits and purchases.
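A hedged sketch of the web usage mining idea above: from visitor sessions extracted from server logs (an invented format), count which items are viewed together and recommend the most frequent co-visits:

```python
# A simple co-occurrence recommender built from clickstream sessions.
# Session data and item names are invented for illustration.
from collections import defaultdict
from itertools import combinations

# Each inner list is one visitor session extracted from web server logs.
sessions = [
    ["bookA", "bookB", "dvdC"],
    ["bookA", "dvdC"],
    ["bookB", "dvdC", "cdD"],
    ["bookA", "bookB"],
]

co_counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for a, b in combinations(set(session), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def recommend(item, top_n=2):
    """Return the items most often seen in the same session as `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: -kv[1])
    return [other for other, _ in ranked[:top_n]]

print(recommend("bookA"))  # e.g. ['bookB', 'dvdC']
```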

Sunday, January 3, 2010

Topic 8 - Regression and Neural Networks

This week, the lecture is about regression and neural networks. Regression models use a mathematical equation to relate one or more numeric input attributes to a single numeric output attribute. We can use regression models to fit data, time-series data (for forecasting) or other data (for prediction). Some of the terms used are R² (R-squared) and the intercept.
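A minimal regression sketch, assuming scikit-learn and invented data: one numeric input attribute is related to one numeric output attribute, and the slope, intercept and R² are reported:

```python
# Fit a simple linear regression and report slope, intercept and R-squared.
# The advertising/sales numbers are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

advertising_spend = np.array([[10], [20], [30], [40], [50], [60]])
sales = np.array([25, 44, 68, 81, 110, 122])

model = LinearRegression().fit(advertising_spend, sales)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R^2:", model.score(advertising_spend, sales))
```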

Neural networks are a computing technology that attempts to build machines that operate like a human brain: they possess simultaneous memory storage and work with ambiguous information. Neural networks accept numerical inputs and outputs and suit problems where the relationships between inputs and outputs are not linear or the input data are not normally distributed. Some applications include approval of loan applications, fraud prevention and time-series forecasting.
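And a tiny neural-network sketch for the loan-approval use mentioned above, assuming scikit-learn's MLPClassifier; the applicant data is invented and far too small for a real model:

```python
# A small multilayer perceptron for loan approval on invented data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Inputs: income (thousands), debt ratio, years employed. Output: 1 = approve.
X = np.array([[65, 0.2, 5], [30, 0.6, 1], [90, 0.1, 10],
              [25, 0.7, 0], [55, 0.3, 4], [40, 0.5, 2]])
y = np.array([1, 0, 1, 0, 1, 0])

model = make_pipeline(
    StandardScaler(),  # neural networks train better on scaled inputs
    MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0),
)
model.fit(X, y)

print(model.predict([[70, 0.25, 6]]))  # likely approve
```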

Saturday, December 5, 2009

Topic 5/6 - Information Dashboard Design

Even though the lectures span two weeks, I'll combine them as they cover the same topic. In this topic, we learnt about the types of dashboards and what makes a good dashboard. Firstly, dashboards are visual representations of data and are supposed to deliver a clear message to the user at a glance. Therefore it is important to understand the power of visual perception. The Gestalt Principles of Visual Perception are useful in designing a dashboard. The six principles are proximity, closure, similarity, continuity, enclosure and connection. This website shows good examples of the various principles: http://graphicdesign.spokanefalls.edu/tutorials/process/gestaltprinciples/gestaltprinc.htm

To create a good dashboard, we should reduce the non-data pixels by eliminating all unnecessary non-data pixels and de-emphasizing and regularizing the non-data pixels that remain, and enhance the data pixels by eliminating all unnecessary data pixels and highlighting the most important data pixels that remain. Non-data pixels would include decorations and borders.

The second part of information dashboard design is basically about choosing the appropriate display media. For example, line graphs are good for showing trends while bar graphs are good for comparison.
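A quick matplotlib sketch of that choice of display media, with invented data: a line graph for a trend over time and a bar graph for comparing categories:

```python
# Line graph for a trend over time, bar graph for comparing categories.
# The revenue and sales figures are invented for illustration.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [120, 135, 128, 150, 162, 170]
regions = ["North", "South", "East", "West"]
sales = [340, 290, 410, 380]

fig, (trend_ax, compare_ax) = plt.subplots(1, 2, figsize=(9, 3))
trend_ax.plot(months, revenue, marker="o")  # trend: line graph
trend_ax.set_title("Revenue trend")
compare_ax.bar(regions, sales)              # comparison: bar graph
compare_ax.set_title("Sales by region")
plt.tight_layout()
plt.show()
```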

Saturday, November 14, 2009

Topic 4 - Data Warehouse and OLAP

This week's BI lecture is mainly about the Data Warehouse. This week's lecture also showed the difference between DBMS, OLAP and Data Mining. We also learnt about the Data Mart, which is something like a smaller version of the Data Warehouse and caters to one department, for example sales. A few data marts can make up a data warehouse.

The various schemas are revisited this week in more depth. Even though this was covered in Data Mining, I have completely forgotten what it is about. http://en.wikipedia.org/wiki/Star_schema

There are also parent-child dimensions and slowly changing dimensions (SCD). The former are based on two dimension table columns that together define the relationships among the members of the dimension. SCDs are dimensions that change over time, for example a salesperson moving from one sales territory to another, or changes to the organisation chart as employees are promoted or resign. There are a total of three types of SCDs.
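A hedged sketch of how a Type 2 slowly changing dimension handles the salesperson-territory example above: the current dimension row is closed and a new row is added, so history is preserved. The row layout is an assumption made for illustration:

```python
# Type 2 SCD in miniature: close the old version of the row, add a new one.
# The dimension layout (valid_from/valid_to/current) is an assumption.
from datetime import date

salesperson_dim = [
    {"salesperson": "Alice", "territory": "North",
     "valid_from": date(2008, 1, 1), "valid_to": None, "current": True},
]

def change_territory(dim, name, new_territory, effective):
    """Close the current row for `name` and append a new versioned row."""
    for row in dim:
        if row["salesperson"] == name and row["current"]:
            row["valid_to"] = effective   # close the old version
            row["current"] = False
    dim.append({"salesperson": name, "territory": new_territory,
                "valid_from": effective, "valid_to": None, "current": True})

change_territory(salesperson_dim, "Alice", "South", date(2009, 6, 1))
for row in salesperson_dim:
    print(row)
```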

We also learnt about Relational OLAP(ROLAP), Multidimensional OLAP(MOLAP) and Hybrid OLAP(HOLAP) and the differences.
http://en.wikipedia.org/wiki/ROLAP
http://en.wikipedia.org/wiki/MOLAP

Sunday, November 8, 2009

Topic 3 - Developing Dashboards

This week's BI is mainly about creating dashboards and process maps. First, we need to know what a business process is. A business process is a series of related activities that "flow" through an organisation. It is not limited to a single function or department and is something that can be viewed from end to end.

We learnt about the advantages of process mapping and the process framework. We also learnt how to create a cross-functional flowchart, which is also called a swim lane diagram. It basically looks like a swimming pool, with each lane showing a different department involved, as well as customers. It shows how each department contributes to the process and which department contributes to which action. http://en.wikipedia.org/wiki/Swim_lane