Data Mining for Business Decisions Part-1
Data Mining
Data mining refers to the process of searching and analyzing large batches of raw data to discover patterns, extract useful information and transform it into an understandable structure for further use.
With the rise of new technologies, the volume of data available for mining keeps growing.
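As a small illustration of "discovering patterns", the sketch below counts which pairs of items are frequently bought together in a handful of made-up shopping baskets, a simplified form of frequent-itemset (market-basket) mining. The transactions, the `frequent_pairs` helper and the 0.5 support threshold are hypothetical choices for illustration; production tools use algorithms such as Apriori or FP-Growth to scale the same idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each basket is the set of items in one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets containing both) >= min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so ("milk", "bread") is never counted separately
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

result = frequent_pairs(transactions, min_support=0.5)
for pair, support in sorted(result.items()):
    print(pair, round(support, 2))
```

Here bread appears with both butter and milk in three of five baskets, so those two pairs clear the threshold while rarer combinations are filtered out.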
Business Decisions
A business decision is a choice that influences the direction, growth and success of a company and determines its short-term and long-term organizational activities.
Why is data mining required for business decisions?
In today’s data-driven world, data mining is required for business decisions because it helps organizations extract valuable insights from large datasets and enables them to make more informed, accurate and strategic choices.
Strategic information
Strategic information is the kind of data and insights that are important for making long-term decisions that shape the overall direction of a business.
It supports strategic planning and helps businesses align their goals, resources and actions with their mission and vision.
Need for strategic information
Strategic information drives effective decision-making, competitive positioning and planning.
Strategic information is important and required for making informed decisions that guide an organization towards achieving its long-term goals.
Why is strategic information needed?
· Market Positioning – Strategic information helps organizations understand customer needs, market trends and competitor activities, allowing them to position themselves effectively.
· Informed Decision-Making – Strategic information provides the insights and data needed to make informed choices that align with the organization’s long-term goals.
· Competitive Advantage – Strategic information helps organizations develop unique strategies that differentiate them from competitors.
· Long-Term Planning – Organizations need strategic information to develop effective long-term plans.
· Resource Allocation – Strategic information is required to allocate resources efficiently.
· Risk Management – Strategic information helps in identifying risks and uncertainties.
· Performance Monitoring – Strategic information is required to track performance against objectives and goals.
· Innovation and Growth – Strategic information is needed to identify opportunities for growth and innovation.
· Stakeholder Communication – Strategic information is required for effective communication with stakeholders.
Operational and Informational Data Stores
Data stores are repositories used to store, manage and distribute data sets.
Operational Data Store
An operational data store is a central database that aggregates data from multiple systems and is used for operational reporting and as a source of data for the enterprise data warehouse (EDW).
Informational Data Store
An informational data store refers to a data storage system designed to support business intelligence, analytics and reporting by storing and managing large volumes of data over extended periods.
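One common way to picture the contrast between these two kinds of store: an operational store keeps only the current value of each record and overwrites it when the source changes, while an informational store keeps every version so that history can be analyzed later. The sketch below is a plain-Python illustration under that assumption; the customer address feed and the `ods` and `history` structures are hypothetical.

```python
from datetime import date

# Hypothetical stream of customer address changes from source systems.
updates = [
    (date(2024, 1, 5), "C1", "Pune"),
    (date(2024, 3, 9), "C1", "Mumbai"),
    (date(2024, 2, 1), "C2", "Delhi"),
]

ods = {}       # operational store: one current row per customer
history = []   # informational store: every version is kept

# Process changes in time order (tuples sort by their first element, the date).
for as_of, customer, city in sorted(updates):
    ods[customer] = city                      # overwrite: only the latest value survives
    history.append((customer, city, as_of))   # append: earlier values are preserved

print(ods)                                         # current state for operational reporting
print([row for row in history if row[0] == "C1"])  # full history of C1 for analysis
```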
Difference between operational and informational data stores
Data Warehouse
A data warehouse is an enterprise system and a centralized repository used for the analysis and reporting of structured and semi-structured data from multiple sources, such as customer relationship management (CRM) and marketing automation systems.
Characteristics
· Historical Data Storage
· Subject-Oriented
· Integrated Data
· Non-Volatile
· Data Cleansing and Transformation
· Optimized for Querying and Reporting
· Scalability
· Metadata Management
· High Availability
· Data Modeling
Role and Structure
Role
· Optimized for Complex Queries
· Data Analysis
· Decision Support
· Performance Optimization
· Data Integration
· Data Consistency and Quality
· Historical Data Storage
· Centralized Data Repository
Structure
A data warehouse has a multi-layered structure that supports efficient data processing and analysis.
Key Components
- Data Sources
- ETL Process
- Staging Area
- Data Marts
- Metadata
- Data Governance and Security
- Access Layer
Introduction to Business Intelligence
Business Intelligence or BI is the technology-driven process of analyzing data and presenting actionable information to help managers, executives and other corporate end users make informed business decisions.
It combines business analytics, data mining, data tools and infrastructure, data visualization and best practices to help organizations make more data-driven decisions.
Key Components
- Data Sources
- Data Warehousing
- Data Analysis Tools
- Decision-Making Support
- Reporting and Visualization
Benefits
- Competitive Advantage
- Increased Operational Efficiency
- Improved Decision-Making
- Enhanced Customer Understanding
Some BI Tools and Platforms
- Tableau
- Microsoft Power BI
- QlikView/Qlik Sense
- IBM Cognos Analytics
- SAP BusinessObjects
- Sisense
- Looker
- Zoho Analytics
- Oracle Analytics Cloud
- Domo
Introduction to OLAP and its Operations
Online analytical processing (OLAP) is a software technology used to analyze business data from different points of view.
Organizations collect and store data from multiple data sources, such as websites and applications. OLAP combines and groups this data into categories to provide actionable insights for strategic planning.
It helps organizations process and benefit from a growing amount of digital information.
OLAP solves complex analytical problems.
It processes large amounts of data from a data mart, data warehouse or other data storage unit.
Benefits
- Non-technical user support
- Faster decision making
- Integrated data view
- Multidimensional Analysis
Types
- Multidimensional OLAP
- Relational OLAP
- Hybrid OLAP
OLAP Operations
- Roll-Up – Roll-up involves aggregating data by climbing up the hierarchy of dimensions.
- Drill-Down – Drill-down is the opposite of roll-up and involves breaking down data into more detailed levels.
- Slice – Slicing involves selecting a single dimension from the OLAP cube and fixing it at a particular value, creating a subcube.
- Dice – Dicing is similar to slicing, but it involves selecting two or more dimensions to create a smaller subcube.
- Pivot – Pivoting involves rotating the data cube to view the data from different perspectives.
- Drill-Through – Drill-through allows users to access detailed data from the underlying transactional database.
- Drill-Across – Drill-across involves accessing related data from different fact tables within the same schema.
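The core operations can be sketched on a tiny in-memory cube. This is a plain-Python illustration, not how an OLAP engine actually stores data: the fact rows, dimension names and helper functions (`roll_up`, `slice_`, `dice`, `pivot`) are hypothetical. Roll-up is shown as grouping by fewer levels of the time hierarchy (quarter up to year), and drill-down is simply the reverse direction.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "North", "Laptop", 100),
    (2023, "Q1", "South", "Laptop", 80),
    (2023, "Q2", "North", "Phone", 60),
    (2024, "Q1", "North", "Laptop", 120),
    (2024, "Q2", "South", "Phone", 90),
]
DIMS = ("year", "quarter", "region", "product")

def roll_up(rows, keep):
    """Sum sales grouped only by the dimensions in `keep`; dropping a level aggregates upward."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension at a single value (named slice_ to avoid the built-in)."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, criteria):
    """Dice: keep rows whose values fall in an allowed set on two or more dimensions."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in allowed for d, allowed in criteria.items())]

def pivot(rows, row_dim, col_dim):
    """Pivot: lay one dimension down the side and another across the top."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[DIMS.index(row_dim)]][r[DIMS.index(col_dim)]] += r[-1]
    return {k: dict(v) for k, v in table.items()}

by_quarter = roll_up(FACTS, ["year", "quarter"])   # detailed level
by_year = roll_up(FACTS, ["year"])                 # roll-up: quarter -> year
sales_2024 = slice_(FACTS, "year", 2024)
north_laptops = dice(FACTS, {"region": {"North"}, "product": {"Laptop"}})
region_by_year = pivot(FACTS, "region", "year")

print(by_year)
print(by_quarter)
print(region_by_year)
```

Going from `by_year` back to `by_quarter` is the drill-down; drill-through and drill-across need access to source tables or other fact tables, so they are not modelled here.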
Data Mart
A data mart refers to a
data storage system that contains information specific to an organization's
business unit. It contains a small and selected part of the data that the
company stores in a larger storage system.
Companies use data marts to analyze department-specific information more efficiently.
It is a subset of a data
warehouse which is focused on a specific business line or team.
Features
- Smaller in Scope
- Faster Access
- Subject-Oriented
- Simplified Data Structure
Types
- Dependent Data Mart
- Independent Data Mart
- Hybrid Data Mart
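A dependent data mart draws its data from the enterprise data warehouse. The sketch below uses Python's built-in `sqlite3` module and models the mart as a SQL view that exposes only the Sales department's data, pre-aggregated by month. The table, view and column names are hypothetical, and a view is only one option; many marts are built as separate physical tables or databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table covering several departments.
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("Marketing", "Ads", "2024-01", 500.0),
        ("Sales", "Laptop", "2024-01", 1200.0),
        ("Sales", "Phone", "2024-02", 800.0),
        ("Finance", "Audit", "2024-02", 300.0),
    ],
)

# Dependent data mart: a smaller, subject-oriented subset of the warehouse,
# limited to the Sales department and summarized by month.
conn.execute("""
    CREATE VIEW sales_mart AS
    SELECT month, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE dept = 'Sales'
    GROUP BY month
""")

rows = conn.execute("SELECT month, total_amount FROM sales_mart ORDER BY month").fetchall()
print(rows)
```

Analysts in the Sales team query `sales_mart` without needing to know about the other departments' rows, which reflects the "smaller in scope" and "simplified data structure" features.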
Building a Data Warehouse
Building a data warehouse is a complex process that involves designing and implementing a system to consolidate, store and manage large volumes of data from various sources.
Steps to build a data warehouse
- Define Business Requirements
- Data Modeling and Design
- Choosing the Right Technology
- ETL Process
- Data Integration and Quality Management
- Build and Populate the Data Warehouse
- Create Reports, Dashboards and Analytics
- Performance Tuning and Optimization
- Security and Data Governance
- Maintenance and Support
- Evaluate and Iterate
Introduction to Dimensional Modeling and ETL Process
Dimensional Modeling
Dimensional modeling is a design technique used in BI systems and data warehouses to structure data in a way that is optimized for querying and reporting.
It focuses on ease of use and performance in analytical tasks.
Concepts in Dimensional Modeling
- Fact Table- It contains quantitative data that users want to analyze.
- Dimension Table- It stores descriptive information about the business entities related to the facts such as time, products etc.
- Star Schema- It is the simplest type of dimensional model, where a central fact table is directly linked to dimension tables. It resembles a star, with the fact table at the center and the dimensions as points radiating out.
- Factless Fact Table- It captures events or conditions that don’t have associated numerical measures but are important to track.
- Snowflake Schema- It is a more normalized version of the star schema. In this, dimension tables are further broken down into related tables, resulting in a “snowflake” structure.
- Grain of a Fact Table- It describes the level of detail represented by each record.
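The concepts above can be made concrete with a minimal star schema. This sketch uses Python's built-in `sqlite3` module; the table names (`fact_sales`, `dim_date`, `dim_product`), the sample rows and the chosen grain (one row per product per day) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes of the business entities
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Fact table: one row per product per day (the grain), holding a numeric measure
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units_sold  INTEGER
    );

    INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01', 2024),
                                (20240102, '2024-01-02', '2024-01', 2024);
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'),
                                   (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES (20240101, 1, 5), (20240101, 2, 2),
                                  (20240102, 1, 3);
""")

# Typical star-join query: the fact table joins directly to each dimension, then groups.
query = """
    SELECT p.category, d.month, SUM(f.units_sold) AS units
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category, d.month
    ORDER BY p.category
"""
rows = conn.execute(query).fetchall()
print(rows)
```

In a snowflake version of the same model, `category` would move out of `dim_product` into its own table referenced by a key, adding one more join to the query.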
Benefits
· Flexibility
· Supports Complex Analysis
· Efficient Query Performance
· User-Friendly
ETL Process
ETL (extract, transform, load) refers to a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
ETL improves business intelligence and analytics by making data preparation more reliable, efficient, detailed and accurate.
It involves extracting data from various sources, transforming it into a suitable format and loading it into a data warehouse.
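The three steps can be sketched end to end with Python's standard library: `csv` for extraction, a small cleaning function for transformation and `sqlite3` as a stand-in target warehouse. The CSV content, column names and cleaning rules (trim and title-case names, drop duplicate order IDs, skip rows with no amount) are hypothetical examples of data cleaning and formatting, not a prescribed pipeline.

```python
import csv
import io
import sqlite3

# Extract source: hypothetical raw CSV export with messy values.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2024-01-05
2,BOB,80,2024-01-06
2,BOB,80,2024-01-06
3,carol,,2024-01-07
"""

def extract(text):
    """Extract: parse the raw export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and standardize rows into tuples matching the target schema."""
    seen, clean = set(), []
    for r in rows:
        if not r["amount"].strip():
            continue                          # cleaning: skip incomplete records
        key = r["order_id"].strip()
        if key in seen:
            continue                          # cleaning: remove duplicate orders
        seen.add(key)
        clean.append((
            int(key),
            r["customer"].strip().title(),    # formatting: consistent name case
            float(r["amount"]),
            r["order_date"].strip(),
        ))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```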
Key Stages of the ETL Process
· Extract – This phase involves collecting data from various source systems, which may include databases, flat files, APIs and cloud services.
· Transform – This phase converts the raw data into a clean and usable format that aligns with the schema of the data warehouse.
Common Transformations – Data Cleaning, Data Integration, Data Aggregation, Data Normalization/Denormalization, Data Formatting
· Load – This phase involves transferring the transformed data into the data warehouse.
Types of Loading:
1. Initial Load – Loading all historical data into the warehouse for the first time.
2. Incremental Load – Periodically loading new or updated data, typically on a daily, weekly or monthly basis.
3. Full Refresh – Replacing existing data with new data (less common).
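The three loading strategies can be contrasted in a short `sqlite3` sketch. It assumes, hypothetically, that every source row carries an `updated_at` timestamp in ISO format (so string comparison matches time order) and that each change receives a timestamp later than the previous load; the incremental load uses the latest timestamp already in the warehouse as a watermark. Table and function names are illustrative only.

```python
import sqlite3

def create_target(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS dw_orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")

def initial_load(conn, source_rows):
    """Initial load: copy all historical rows into the empty warehouse table."""
    create_target(conn)
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", source_rows)

def incremental_load(conn, source_rows):
    """Incremental load: copy only rows changed since the newest row already loaded."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM dw_orders").fetchone()
    changed = [r for r in source_rows if r[2] > watermark]
    # INSERT OR REPLACE inserts new order_ids and overwrites rows whose order_id exists.
    conn.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", changed)
    return len(changed)

def full_refresh(conn, source_rows):
    """Full refresh: delete everything in the target and reload the whole source."""
    conn.execute("DELETE FROM dw_orders")
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", source_rows)

conn = sqlite3.connect(":memory:")
day1 = [(1, 100.0, "2024-01-01"), (2, 50.0, "2024-01-01")]
initial_load(conn, day1)

# Next day: order 2 was corrected and order 3 is new.
day2 = [(1, 100.0, "2024-01-01"), (2, 55.0, "2024-01-02"), (3, 75.0, "2024-01-02")]
n_changed = incremental_load(conn, day2)
print(n_changed, conn.execute("SELECT SUM(amount) FROM dw_orders").fetchone()[0])

full_refresh(conn, day2)
print(conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0])
```

The incremental load touches only the two changed rows, while the full refresh rewrites the whole table; that difference in work is why incremental loads are the usual choice for regular updates.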
Benefits
- Improved Data Quality
- BI Support
- Scalability
- Data Consolidation
Thank You