Thursday, January 20, 2011

Data-Warehouse Concepts

Q. What is a Data Warehouse?


Ø  A data warehouse is integrated information collected from multiple sources that becomes the foundation for decision support and data analysis.
Ø  A data warehouse is a relational database that is designed for query and analysis rather than transaction processing. It usually contains historical data that is derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
    Characteristics of Data Warehouse
1.     Subject Oriented: A data warehouses is designed with the purpose of analyzing a particular area of business, e.g. Sales or Finance.
2.     Integrated: Data warehouses have the data from disparate sources put into a consistent format.
3.     Static/Non volatile: As the data is made for analysis it is said to be static or Nonvolatile means that the data should not change once entered into the warehouse.
4.     Time Variant: Historical data has to be maintained to analyze the business or market trends.

Q. What is Data Mart?
Ø  Data Marts can be said to be a subset of data warehouse or can be a small data warehouse itself. It is logical grouping of the data warehouse dimensions and the related fact, created to meet a specific group of users or requirements. It is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. E.g. costing Data Mart, sales Data Mart. A Data Mart tends to be tactical and aimed at meeting an immediate need or meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. Users of a Data Mart can expect to have data presented in terms that are familiar or specific to a group of users. Data marts can be part of an Enterprise Wide Data Warehouse.

Q. What are the different types of data-mart?

Ø  Data-marts are classified based on the data-sources used to build the data-mart.
The different types of data-mart are:
a)     Dependent data-mart: built using data-warehouse as a source.
b)    Independent data-mart: built using the operational data source.
c)     Hybrid/Federated data-mart: using both data-warehouse and operational data source.

Q. What is metadata?

Ø  Metadata is the information about the data. This is the layer of the data warehouse, which stores the information about the various aspects of data warehouse, like the source data, transformed data, date and time of data extraction, target databases, date and time of data loading, how the structures and calculation rules are stored, plus, possibly, additional information on data sources, definitions, quality, transformations, date of last update, user access privileges, etc. For ex in BO, the repository is the Metadata. In an ETL tool, the metadata contains information about the source tables, target tables, transformations, mappings etc. An Ideal situation is when we can control everything in a Data Warehouse through a Useful Metadata; that means data loading, data cleansing, transformations, reporting, admin activities, security, etc.

Q. What is DataWarehousing? Why is it useful and important?
Ø  Data Warehousing is a way to convert huge volume of data  into useful information, which can be used for making business decisions.
           
It is useful in the below situations:
a)     Complex Analysis
b)    What if Analysis?
c)     Past & present trend analysis
d)    Moving averages
e)     Multidimensional Analysis
f)     Slice and Dice of data
g)    Drill Down and Drill Up to different levels of data

Q. What are the differences between OLTP and OLAP (Data-Warehouse)?

Ø  The differences between OLAP and OLTP can be listed as below:


OLAP
OLTP
Definition
On Line Analytical Processing
On Line Transactional Processing
Example
Data Warehouse
ERP, Legacy system
Data
Static. A time frame is decided to load the data into data warehouse, so data remains stagnant over a certain period.
Dynamic. As the updates, deletions and modifications are online the data is continuously changing thus not helpful for analysis or decision making
History
Historical data is stored thus making it difficult to study the trend of the business over the past to help in analysis.
Old data is purged or archived
Data Atomicity
Data is aggregated or summarized and stored at the higher level.
Data is stored at microscopic level
Normalization
Denormalized database are used to maintain detailed information in a row of record
Normalized databases
Joins in queries
Lesser and easier joins as the tables are Denormalized
More and complex joins as the tables are normalized
User
Senior management or Sales and marketing head to analyze the business trends and make decisions
Operational staff adding, modifying or deleting day to day transactions
Performance
Faster and better and ease of use as non technical people can create ad-hoc reports
Complex
Read-write
Read only data
Data can be updated, modified or deleted
Update Frequency
Updated at a fixed interval of time
Continuously updated
Example
Data Warehouse
ERP, Legacy system


Q. What are the major stages/steps in a Data Warehousing project?

Ø  The steps in a data warehousing project can be listed as:

a)     Understanding the business
b)    Understanding the present and future needs
c)     Strategic Planning
d)    Design of Data Warehouse
e)     Extraction of Data from different sources to a common staging area
f)      Cleaning of data
g)    Transformation of Data
h)    Transportation of  Data
i)      Analysis of data (OLAP)

Q. What are the different tools used in data-warehousing projects?

Ø  The tools used in data-warehousing projects can be categorized into three technical streams:

a)      Database (Data warehouse, Data Mart).
b)    ETL (Extraction, Transformation, Load). 
c)     Reporting (OLAP Tools).

Some tools and products available from different vendors are listed below:

 

10 comments:

  1. nice meterial,but it doesn't have full details

    ReplyDelete
  2. This is such a well detailed information regarding the concepts of a warehouse making. I'm going to bookmark this for future reference. I'm sure that this will be helpful for me in the near future. Thanks for sharing.

    ReplyDelete
  3. Nice helpful....please add some more concept which you think we should know..
    Thank You

    ReplyDelete
  4. Being new to the blogging world I feel like there is still so much to learn. Your tips helped to clarify a few things for me as well as giving..

    Informatica Training in Chennai

    Dataware Housing Training in Chennai

    ReplyDelete
  5. Sharing the same interest, Infycle feels so happy to share our detailed information about all these courses with you all! Do check them out oracle plsql training in chennai & get to know everything you want to about software trainings.

    ReplyDelete
  6. Grab Data Science Certification in Chennai for skyrocketing your career with Infycle Technologies, the best Software Training & Placement institutes in and around Chennai. In addition to the Certification, Infycle also gives the best placement training for personality tests, interview preparation, and mock interviews for leveling up the candidate's grades to a professional level.

    ReplyDelete
  7. Thanks a lot very much for the high quality and results-oriented help.
    I won’t think twice to endorse your blog post to anybody who wants
    and needs support about this area.
    software testing institute in Chennai
    javascript course in Chennai
    mysql dba online training in Chennai

    ReplyDelete