Oracle Advanced person Needed

I need attached work done in exact 7 hours

Dont bid before you read the work.

Only oracle master needed who knows star schema.

My budget is fixed USD 40.

If you need more, dont bid else I will report you.

FIT5195 S1 2020 – Take Home Test (10%)

Due date: Week 6, Friday 1-May-2020, 11:55pm

This Take Home Test consists of two Case Studies. You are required to answer all questions.

Case Study 1: Government Hospital Data Warehouse (50 marks)
Currently, the Victorian government has a simple database system implemented to store the

collective data of all government-run hospitals. The database system includes information about their

services, staff, and patients. The simple database system consists of the following tables and

attributes:

Figure 1 : A simplified structure of the Government Hospital’s current E/R Diagram

Due to the nationwide impact of a new pandemic, the Premier of Victoria (i.e. the head of government

in the Australian state of Victoria) requires a data warehouse for analyzing trends and forecasting

purposes. The goal of the analyzing is to identify which frequently used services, popular hospital

locations, etc.

You are required to design a small Data Warehouse to keep track of the statistics. The Premier is

particularly interested in analyzing the total number (population quantity) of patients and total

service charged by services used, time periods, age groups, and hospital locations.

The Premier also wants to be able to drill down based on the cost of services range (low price < $20,

medium price between $20 and $50, and high price >$50).

Tasks:

[1]. Create the operational database based on the above E/R Diagram, and then populate with
sufficient numbers of records in each table. It is suggested that each table would have 5-20

records depending on the use of that table in the operational database. You need to create a script

file containing the CREATE TABLE and INSERT INTO commands.

You can follow the list of services from this link:

https://www.monash.edu/health/medical/services.

[2]. Then develop a GOVERNMENT HOSPITAL star schema. Identify the fact table, dimensions
and attributes required to support the schema. If the star schema consists of a Bridge Table, you

have

to also include the Weight Factor and List Aggregate.

The result of this task is a star schema diagram. You can use any drawing tool, such as

Lucidchart, to draw the star schema.

[3]. Validate your star schema using the Two-Column Table Methodology. You are required to
illustrate some two-column tables for this task based on your star schema design.

[4]. Write the SQL commands to create the fact and dimension tables. You need to create a script
file containing the appropriate SQL commands to create the fact and dimension tables.

[5]. Write the SQL commands to answer the following queries: (you need to make sure that there are
records in your fact and dimensions tables. For each of the following queries, write the SQL and

show the results):

senior 65+) in April 2020.

d) Show the total service charged for general medical consultations in each suburb.

Submission checklist for Case Study 1:

A. A pdf file or word document containing:

i. The SQL commands to create tables and insert records to the operational database –

Task 1

ii. The star schema diagram – Task 2

iii. The Two-Column Table Methodology illustration – Task 3

iv. The SQL commands to create the dimension and fact tables, as well as the contents of

these tables – Task 4

v. The SQL commands to answer the queries in Task 5 and the query results

B. The .sql files containing:

i. The SQL commands to create the operational database.

i. The SQL commands to create the data warehouse.

iii. The SQL commands to answer the queries in Task 5.

(Notes: The marker of this test will simply run this file. So you need to make sure that the

SQL commands are written correctly in the .sql file. Pay a particular attention to the quote,

especially if you cut and paste from a Word file).

C. A folder:

i. Save all the files from [A] and the SQL files from [B] in one folder.

ii. Name the folder CaseStudy1_yourStudentID.

https://www.monash.edu/health/medical/services

Case Study 2: Accident Records Data Warehouse (50 marks)

Figure 2 : A simplified structure of the Accident Record’s current database system

The above figure displays how the Victorian Roads and Safety (VicR&S) Department stores accident

related data. When there are any road accidents, the details about the accidents such as the location,

date, time and event (e.g. what exactly happened during the accident) are recorded. For each accident,

the information (e.g. name, contact number, employment start date, current employment branch)

about one police officer who is in charge of overseeing the events of the accident (e.g finding out

who/what caused the accident) are stored. In addition, to identify a particular accident, each accident

is given a unique accident number.

On the other hand, one accident may be caused by several vehicles (e.g. a collision between 2

vehicles). All vehicles have owners who may or may not be the same as the vehicle driver during the

accident. Therefore, the VicR&S database stores information about vehicles involved in the accident

(i.e. vehicle number, vehicle model, vehicle make and vehicle manufacturer), the information about

the owners of the vehicles (i.e. owner name, owner address, owner contact number) and the

information about the drivers at the time of the accident (driver name, driver address, driver contact

number, driver licence no, and when the driver started to drive the vehicle). Furthermore, for each

accident, the severity of damage incurred by each vehicle is recorded.

Currently, to reduce the number of road accidents VicR&S department is interested in the following:

– The total number of accidents happening by different locations and by different lighting
periods (daytime: 6AM – 5:59PM and nighttime 6PM – 5:59AM).

– The total number of accidents by each vehicle model.
– The number of vehicles involved in every accident event on different locations.
– The number of accidents taken care of different police officer branches.

Tasks:

[1]. Develop an ACCIDENT RECORDS star schema. Identify the fact table, dimensions and
attributes required to support the schema. If the star schema consists of a Bridge Table, you have

to also include the Weight Factor and List Aggregate.
The result of this task is a star schema diagram. You can use any drawing tool, such as
Lucidchart, to draw the star schema.

[2]. Validate your star schema using the Two-Column Table Methodology. You are required to
illustrate some two-column tables for this task based on your star schema design.

[3]. Write the SQL commands to create the fact and dimension tables. You need to create a script
file containing the appropriate SQL commands to create the fact and dimension tables. The

operational tables are accessible from the ACCIDENT account. The result of this task is the SQL

commands. You will also need to show the contents of the tables that you have created.

[4]. Write the SQL commands to answer the following queries: (you need to make sure that there are
records in your fact and dimensions tables. For each of the following queries, write the SQL and

show the results):

a) Show the total number of accidents happening by different locations and by different
lighting periods (daytime: 6AM – 5:59PM and nighttime 6PM – 5:59AM).

b) Show the total number of accidents by each vehicle model.
c) Show the number of vehicles involved in every accident event on different locations.
d) Show the number of accidents taken care of by different police officer branches.

[5]. You need to come up with additional two more questions and answer these questions using the
SQL commands. Also explain the reason for why the management would like to have such

information.

Submission checklist for Case Study 2:

A. A pdf file or word document containing:

i. The star schema diagram – Task 1

ii. The Two-Column Table Methodology illustration – Task 2

iii. The SQL commands to create the dimension and fact tables, as well as the contents of

these tables – Task 3

iv. The SQL commands to answer the queries in Task 4 and the query results

v. The additional two questions with the SQL commands, query result, and your
explanation – Task 5

B. The .sql files containing:
i. The SQL commands to create the data warehouse.

ii. The SQL commands to answer the queries in Task 4 and Task 5.

(Notes: The marker of this test will simply run this file. So you need to make sure that the
SQL commands are written correctly in the .sql file. Pay a particular attention to the quote,
especially if you cut and paste from a Word file).
C. A folder:
i. Save all the files from [A] and the SQL files from [B] in one folder.

ii. Name the folder CaseStudy2_yourStudentID.

Submission Method:

1. A zip file:
a. Save the CaseStudy1_yourStudentID and CaseStudy2_yourStudentID folders into one

folder named FIT5195_Test_yourname_studentID.

b. ZIP the folder FIT5195_Test_yourname_studentID.zip. This must be a ZIP file and
not other types of compressed folder. The zip file should contain the prescribed folder

structure as follows:

● FIT5195_Test_yourname_studentID/
○ CaseStudy1_yourStudentID/

○ Files for tasks

○ CaseStudy2_yourStudentID/

○ Files for tasks

2. Upload your zip file on Moodle by the due date: Friday 1-May-2020, 11:55pm (No late
submission is accepted).

END OF TAKE HOME TEST

Oracle/FIT5195-2-Star Schema

Week 2 – Star Schema

Semester 1, 2020

FIT5195 – Business Intelligence
and Data Warehousing

Developed by:
Agnes Haryanto

Agnes.Haryanto@monash.edu

MONASH
INFORMATION

TECHNOLOGY

Agenda

1. Notations and Processes

1. Star Schema Notation

2. E/R Diagram Notation

3. Transformation Process (Case Study)

2. Two-Column Table Methodology

Recall – The Big Picture

Using FLUX

1. Visit http://flux.qa/ on your internet enabled device

2. Log in using your Monash account (not required if
you are already logged in to Monash)

3. Click on the “+” to join audience

4. Enter the Audience Code:

• Caulfield – 3GANT7

• Fully Flex – 39WRG8

• Malaysia – VTVPLW

5. Select FIT5195 in the Active Presentation menu

6. Answer questions when they pop up

http://flux.qa/1AW6N8

Recall – Data Warehouse

▪ To address the drawback of operational database, and a need for decision-

making support data, data warehouse is needed.

▪ A data warehouse is a multi-dimensional view of databases, with

aggregates and pre-computed summaries.

➢ In many ways, it is basically doing aggregates in advance; that is exactly pre-

computation done at the design level, rather than at the query level.

Recall – Data Warehouse

Star Schema

▪ A Star Schema is a design representation of a multi-dimensional view. It is a

data modeling technique used to map multidimensional decision support

data into a relational database.

▪ The reason for the star schema’s development is that existing relational

modeling techniques: ER and normalization, did not yield a database

structure that served the advanced data analysis requirements well.

Star Schema Components

▪ There are Three main components of the Star Schema:

1. Facts

2. Dimensions

3. Attributes

Star Schema Components

1. Facts

Facts are numeric measurements (values) that represent a specific business aspect

or activity.

For example, sales figures are numeric measurements that represent product and/or

service sales.

2. Dimensions

Dimensions are qualifying characteristics that provide additional perspectives to a

given fact.

For example, sales might be viewed from specific dimension(s), such as sales

location, sales period, sales product, etc.

Star Schema Notation

▪ A Sales Star Schema

➢ Fact:

• Sales

➢ Dimensions:

• Time

• Product

• Branch

▪ Notation-wise, the Fact uses a bolder line, to differentiate between Fact

from Dimensions.

Star Schema Notation

▪ A Sales Star Schema

➢ Fact:

• Sales

➢ Dimensions:

• Time

• Product

• Branch

▪ The lines that represent a relationship

between the fact and dimensions can be

straight lines or bended lines.

Star Schema Notation

▪ Using the star schema notation, the

number of dimensions can be unlimited.

▪ If there is more dimensions, then we just

add more dimensions linked to the Fact.

Star Schema Components

3. Attributes

Each dimension table contains attributes.

For example:

Product dimension: Prod Type,

Description.

Location dimension: Region,

State,

City.

Time dimension: Year,

Month.

Star Schema Notation

▪ Sales Star Schema

(b) Sales Star Schema complete with the Attributes(a) Outline star schema for Sales

E/R Diagram Notation

(a) An Entity in E/R Diagram

(b) Relationships in E/R Diagram

E/R Diagram Notation

Associative

Relationship

E/R Diagram Notation

Non-Associative

Relationship

Transformation Process

Case Study #1

Case Study #1: International College

The admission office handles enrolment, payment, and marketing campaigns to international

students, often through educational agents located overseas. This admission office has an

operational system that maintains all the details of international students enrolled in the College.

Payment details are also handled by this office. Basically, the operational system has the following

features:

▪ Every student details are kept in the database. This includes the courses that the students enroll.

▪ As the College is a multi-campus university, some courses are offered in a different campus. The

admission office handles international students of all campuses.

▪ Some international students coming to the College are handled by an educational agent. This is

particularly common for the first course that a student enrolls in. Subsequent courses are not

normally handled by an agent, because the students themselves deal directly with the College.

▪ International students pay tuition fees several times (normally once every semester) for each

course they are doing.

Case Study #1: International College

The College now requires a data warehouse for analysis

purposes. The analysis is needed for identifying at least the

following questions:

1. How many students come from certain countries?

2. What is the total income for certain postgraduate

courses?

3. How many students are handled by certain agents?

4. How the number of enrolment of courses fluctuates

across the year?

Case Study #1: International College

▪ College Star Schema

➢ Fact:

• Number of Students

• Total Income

➢ Dimensions:

• Country

• Agent

• Course

• Year

Transformation Process

Case Study #2

Case Study #2: Sales

▪ Suppose that we would like to analyze

Total Sales from various point of

views, such as Quarter, Branch, and

Product Category.

Case Study #2: Sales

▪ Sales Star Schema

➢ Fact:

• Total Sales

➢ Dimensions:

• Time

• Branch

• Product Category

Two Column Table
Methodology

Two Column Table Methodology

When creating a star schema, you need to

imagine that the data you want to analyse

consists of two columns.

The first column is the category (e.g. A, C,

D, E), and the second column is the

statistical numerical figure (e.g. B).

The second column (e.g. B) has to be

consistent throughout all the two-column

tables.

One Fact Measurement:

Two Column Table Methodology

▪ Case Study 1: Analysis of Accountants

Suppose the CPA organization would like to analyze its members (i.e.

accountants) in a particular city. Assume that the organization has the full

details of its members.

Education Number of Accountants

Diploma 84953

Bachelor 349203

Higher Degree 98943

Others 2322

Two Column Table Methodology

▪ We can also look at the figures from the gender point of view, like:

▪ Another way to analyze number of accountants is form the type of the accountant job

itself; something like:

▪ Note that the figures are fictitious, and the “Types” of Accountants (indicating different

roles of accountants) are also fictitious.

Gender Number of Accountants

Male 434322

Female 89932

Type Number of Accountants

Government 3843

Private Business 45303

Personal 45930

etc

Two Column Table Methodology

▪ You can further identify other example to analyze number of accountants. In

the above three tables, the first angle to look at the number of accountants

is from the educational background, the last one is from the type of the

accountant itself, whether it is a private business accountant, etc.

▪ As you can see, the second column is CONSISTENTLY UNIFORM. In the

above example, it is number of accountants. The first column changes

depending on from which angle that you want to see.

Two Column Table Methodology

▪ Therefore, in this case study, the star schema could look like the following:

Two Column Table Methodology

The second column in the two-column

tables, which is the numerical fact

measurement (e.g. column B) can

actually be multiple columns (call them:

B1, B2, B3), as long as all of these

columns (e.g. B1, B2, B3) relate to all of

the categories (e.g. A, C, D, E).

Multiple Fact Measurements:

Two Column Table Methodology

▪ Case Study 2: Student Enrollment

The University Administrator(s) needs to keep track of the number of

enrollment for particular unit or campus and the students’ performance

each year in order to maintain the University performance. The head of

admin has assigned you the task of developing a small Data Warehouse in

which to keep track the enrollment and performance statistics.

Two Column Table Methodology

▪ For example:

▪ Another example could be something like this:

Subject Number of Students Total Score

Database 8 539

Java 5 327

SAP 1 63

Network 2 105

Semester Number of Students Total Score

One 9 618

Two 7 416

F1 F2

Two Column Table Methodology

▪ In analyzing number of students (apart from the subject and semester as

shown above), you could also see the number of student from another

angle, for example from the campus and grade:

Campus Number of Students Total Score

Main 9 658

City 5 271

DE 2 105

F1 F2

Two Column Table Methodology

▪ For example:

Grade Number of Students Total Score

HD 3 253

D 4 300

C 4 256

P 2 105

N 3 120

F1 F2

Two Column Table Methodology

▪ The first columns of the above examples are the dimensions, whereas

the other columns that contain the statistical/summarized/aggregated

values is the fact.

▪ In the above example, the fact is then STUDENT_ENROLLMENT_FACT,

and the dimensions are SUBJECT, SEMESTER, GRADE and CAMPUS.

Two Column Table Methodology

▪ The star schema for the STUDENT ENROLLMENT is shown as follows:

End of Week 2 Lecture

Oracle/FIT5195-3-Bridge Table

Week 3 – Bridge Tables

Semester 1, 2020

FIT5195 – Business Intelligence
and Data Warehousing

Developed by:
Agnes Haryanto

Agnes.Haryanto@monash.edu

Soon Lay-Ki

Soon.LayKi@monash.edu

MONASH
INFORMATION

TECHNOLOGY

Agenda

1. Bridge Tables

2. Temporary Tables

1. Temporary Dimension Tables

2. Temporary Tables in the Operational Database

Using FLUX

1. Visit http://flux.qa/ on your internet enabled device

2. Log in using your Monash account (not required if
you are already logged in to Monash)

3. Click on the “+” to join audience

4. Enter the Audience Code:

• Caulfield – 3GANT7

• Fully Flex – 39WRG8

• Malaysia – VTVPLW

5. Select FIT5195 in the Active Presentation menu

6. Answer questions when they pop up

http://flux.qa/1AW6N8

Bridge Tables

▪ A bridge table is a table that links between two dimensions; and only one

of these two dimensions are linked to the fact.

➢ As a result, the star schema becomes a snowflake schema.

Bridge Tables

▪ Two reasons on why a dimension cannot be connected directly to the Fact:
▪

a) The Fact table has a fact measure, and the dimension has a key identity. In

order to connect a dimension to the Fact, the dimension’s key identity must

contribute directly to the calculation of the fact measure. Unfortunately, this

cannot happen if the operational database does not have this data.

b) The operational database does not have this data if the relationship between

two entities in the operational database that hold the information about

dimension’s key identity and the intended fact measure is a many-many

relationship.

Bridge Tables

Case Study #1

Case Study #1 – A Product Sales Case Study

▪ A company management team would like to

analyze the statistics of its product sales

history. The analysis is needed to identify

popular products, suppliers supplying those

products, the best time to purchase more

stock, etc.

▪ A small data warehouse is to be built to keep

track of the statistics.

▪ The management is particularly interested in

analyzing the total sales (quantity * price) by

product, customer suburbs, sales time

periods (month and year), and supplier.

Case Study #1 – A Product Sales Case Study

▪ The management is particularly interested in

analyzing the total sales (quantity * price) by

product, customer suburbs, sales time

periods (month and year), and supplier.

Case Study #1 – A Product Sales Case Study

▪ The management is particularly interested in

analyzing the total sales (quantity * price) by

product, customer suburbs, sales time

periods (month and year), and supplier.

▪ Sales Star Schema

➢ Fact:

• Total Sales

➢ Dimensions:

• Product

• Customer locations/suburbs

• Time period

• Supplier

▪ Possible Two-Column Methodology Tables:

Case Study #1 – A Product Sales Case Study

ProductNo TotalSales

A1 $130,000

B2 $15,900

C3 $2,500,000

… …

TimeID TotalSales

201801 $25,000

201802 $4,700

201803 $3,500

… …

Suburb TotalSales

Caulfield $6,500

Chadstone $12,000

Clayton $1,800

… …

(a) Product point of view (b) Time point of view (c) Suburb point of view

Case Study #1 – A Product Sales Case Study

▪ Sales Star Schema

➢ Fact:

• Total Sales

➢ Dimensions:

• Product

• Customer locations/suburbs

• Time period

• Supplier

Case Study #1 – A Product Sales Case Study

SupplierID TotalSales

S1 $77,000

S2 $5,700

S3 $12,500

… …

Supplier point of view

Case Study #1 – A Product Sales Case Study

SupplierID TotalSales

S1 $77,000

S2 $5,700

S3 $12,500

… …

Supplier point of view

Case Study #1 – A Product Sales Case Study

SupplierID TotalSales

S1 $77,000

S2 $5,700

S3 $12,500

… …

Case Study #1 – A Product Sales Case Study

Bridge Table

▪ To create Time Dimension:
– create table TimeDim as

select

distinct to_char(SalesDate, ’YYYYMM’) as TimeID,

to_char(SalesDate, ’YYYY’) as Year,

to_char(SalesDate, ’MM’) as Month

from Sales;

▪ To create Customer Location Dimension:
– create table CustLocDim as

select distinct Suburb, Postcode

from Customer;

Case Study #1 – A Product Sales Case Study

▪ To create Product Dimension:
– create table ProductDim as

select distinct ProductNo, ProductName

from Product;

▪ To create Bridge Table:
– create table ProductSupplierBridge as

select *

from StockSupplier;

▪ To create Supplier Dimension:
– create table SupplierDim as

select SupplierID, Name as SupplierName

from Supplier;

Case Study #1 – A Product Sales Case Study

▪ To create Fact Table:
– create table ProductSalesFact as

Select

to_char(S.SalesDate, ’YYYYMM’) as TimeID,

P.ProductNo,

C.Suburb,

sum(SI.QtySold*P.Price) as TotalSales

from Sales S, Product P, Customer C, SalesItem SI

where S.SalesNo = SI.SalesNo

and SI.ProductNo= P.ProductNo

and C.CustomerID = S.CustomerID

group by

to_char(S.SalesDate, ’YYYYMM’), P.ProductNo, C.Suburb;

Case Study #1 – A Product Sales Case Study

Bridge Tables

Case Study #2

Case Study #2 – A Truck Delivery Case Study

▪ A trucking company is responsible for picking up goods from warehouses of a retail chain

company, and delivering the goods to individual retail stores.

▪ A truck carry goods during a single trip, which is

identified by TripID, and delivers these goods to

multiple stores. Trucks have different capacities

for both the volumes they can hold and the

weights they can carry.

▪ At the moment, a truck makes several trips each

week. An operational database is being used to

keep track the deliveries, including the scheduling

of trucks, which provide timely deliveries to

stores.

Case Study #2 – A Truck Delivery Case Study

▪ A trip may pick up goods from many

warehouses

o i.e. a many-many relationship between

Warehouse and Trip

▪ A trip uses one truck only, and a truck may

have many trips in the history

o i.e. a many-1 relationship between Trip and

Truck

▪ A trip delivers goods (e.g. TVs, fridges, etc)

potentially to several stores

o a many-many relationship between Trip and

Store, which is represented by the Destination

table

▪ Sample data in the operational database:

Case Study #2 – A Truck Delivery Case Study

WarehouseID Location

W1 Warehouse1

W2 Warehouse1

W3 Warehouse1

… …

(d) Truck Table

(b) Trip Table (c) TripFrom Table

TripID Date TotalKm TruckID

Trip1 14-Apr-2018 370 Truck1

Trip2 14-Apr-2018 570 Truck2

Trip3 14-Apr-2018 250 Truck3

Trip4 15-Jul-2018 450 Truck1

… … … …

TripID WarehouseID

Trip1 W1

Trip1 W2

Trip1 W3

Trip2 W1

Trip2 W2

… …

TruckID VolCapacity WeightCategory CostPerKm

Truck1 250 Medium $1.20

Truck2 300 Medium $1.50

Truck3 100 Small $0.80

Truck4 550 Large $2.30

Truck5 650 Large $2.50

… … … …

StoreID StoreName Address

M1 MyStore City Melbourne

M2 MyStore Chaddy Chadstone

M3 MyStore HiPoint High Point

M4 MyStore Donc Doncaster

M5 MyStore North Northland

M6 MyStore South Southland

M7 MyStore East Eastland

M8 MyStore Knox Knox

… …

TripID StoreID

Trip1 M1

Trip1 M2

Trip1 M4

Trip1 M3

Trip1 M8

Trip2 M4

Trip2 M1

Trip2 M2

… …

(a) Warehouse Table

(e) Store Table (f) Destination Table

Case Study #2 – A Truck Delivery Case Study

▪ The management of this trucking company would like to analyze the deliver cost, based on

trucks, time period, and store.

Case Study #2 – A Truck Delivery Case Study

▪ Sales Star Schema

➢ Fact:

• Total Delivery Cost

(distance * cost per kilometre)

➢ Dimensions:

• Truck

• Time period

• Store

Case Study #2 – A Truck Delivery Case Study

▪ From the Truck point of view, Truck1 has two trips (e.g. Trip1 and Trip4), with the total kilometres of

820km (370km + 450km). The cost for Truck1 is $1.20. Hence, calculating the cost for Truck1 is

straightforward. Other trucks can be calculated this way.

▪ From the Period point of view, 14-Apr-2018 has three trips (e.g. Trip1,Trip2, and Trip3). Trip1 (370km) is

delivered by Truck1 which costs $1.20/km. Trip2 and Trip 3, on the same day, can be calculated the same

way. Hence, on 14-Apr-2018, the total cost can be calculated.

▪ From the Store point of view; The cost is calculated

based on Trip, but a trip delivers goods to many

stores. Therefore, the delivery cost for each store

cannot be calculated. The delivery cost is for the trip –

not for the store.

Case Study #2 – A Truck Delivery Case Study

Solution Model 1 – Using a Bridge Table

Case Study #2 – A Truck Delivery Case Study

Solution Model 2 – add a Weight Factor attribute

A weight factor is only needed if we want to estimate

the contribution that a dimension made to the fact

Case Study #2 – A Truck Delivery Case Study

StoreID StoreName Address

M1 MyStore City Melbourne

M2 MyStore Chaddy Chadstone

M3 MyStore HiPoint High Point

M4 MyStore Donc Doncaster

M5 MyStore North Northland

M6 MyStore South Southland

M7 MyStore East Eastland

M8 MyStore Knox Knox

… …

TripID StoreID

Trip1 M1

Trip1 M2

Trip1 M4

Trip1 M3

Trip1 M8

Trip2 M4

Trip2 M1

Trip2 M2

… …

(a) Trip Dimension Table

▪ To create Trip Dimension:
– create table TripDim2 as

select T.TripID, T.TripDate, T.TotalKm,

1.0/count(*) as WeightFactor

from Trip T, Destination D

where T.TripID = D.TripID

group by T.tripid, T.tripdate, T.totalkm;

Case Study #2 – A Truck Delivery Case Study

Solution Model 3 – a List Aggregate version

Case Study #2 – A Truck Delivery Case Study

TripID StoreID

Trip1 M1

Trip1 M2

Trip1 M4

Trip1 M3

Trip1 M8

Trip2 M4

Trip2 M1

Trip2 M2

… …

(b) Bridge Table

(a) Trip Dimension Table

listagg (Attr1, ‘_’) within group

(order by Attr1) as ColumnName

▪ To create Trip Dimension:
– create table TripDim3 as

select T.TripID, T.TripDate, T.TotalKm,

1.0/count(D.StoreID) as WeightFactor,

listagg (D.StoreID, ’_’) within group

(order by D.StoreID) as StoreGroupList

from Trip T, Destination D

where T.TripID = D.TripID

group by T.TripID, T.TripDate, T.TotalKm;

Case Study #2 – A Truck Delivery Case Study

▪ Joining based on the StoreGroupList attribute in the Trip dimension table
and the StoreID in the Store dimension table:

– select *

from TripDim3 T, StoreDim3 S

where T.StoreGroupList like ’%’||S.StoreID||’%’;

▪ Without the StoreGroupList attribute in the Trip dimension, we need to join
three tables:

– select *

from TripDim3 T, BridgeTable3 B, StoreDim3 S

where T.TripID = B.TripID

and B.StoreID = S.StoreID;

Case Study #2 – A Truck Delivery Case Study

Bridge Tables

Summary

Bridge Tables – Summary

▪ In principal, a Bridge Table is used:
▪

a) When it is impossible to have a dimension connected directed to the Fact table,

because simply there is no relationship between this dimension and the Fact table

(e.g. in the Product Sales case study, it is impossible to have a direct link from

SupplierDim to ProductSalesFact)

b) When an entity (which will become a dimension) has a many-many relationship with

another entity (dimension) in the E/R schema of the operational database (e.g.

Supplier and Stock has a many-many relationship).

c) When temporality aspect (data history) is maintained in the operational database

and the bridge table can be used to accommodate the dimension that has temporal

attributes (e.g. product supply history is maintained in the second snowflake schema

example).

Bridge Tables – Summary

▪ When a Bridge Table is used in the schema, there are two additional options:
▪

a) A Weight Factor is used to estimate the contribution of a dimension in the calculation

of the fact measure. Because this is only an estimate, a weight factor is option.

b) Every snowflake schema (whether it has Weight Factor or not) can be implemented

in two ways: a List Aggregate version, and a non-List Aggregate version.

Temporary Dimension Tables

1. Direct copying from a relational table in the operational database

2. Selecting certain attributes (or records) from a relational table in the

operational database

3. Creating the dimension table from scratch

– Insert records into the newly created dimension table

– Suitable for small dimension table

– Temporary dimension table is needed in circumstances where a new

dimension table cannot be created directly

Ways to Create Dimension Tables

Temporary Dimension Tables – Sales Case Study

An E/R Diagram for the

Sales Case Study

A Star Schema for the Sales Case Study

QuarterID – ‘YYYYQ’ is a unique

combination between year and quarter,

where Q is between 1 and 4

▪ TimeDIM has to be created manually

▪ Questions:

Can we insert the records into TimeDIM

one by one?

▪ If yes, how many records to be

inserted?

▪ If no, why?

Temporary Dimension Tables – Sales Case Study (cont.)

Solution for TimeDIM:

1. Create a temporary Time Dimension table that contains Month and Year

2. Add with a new attribute, called QuarterID.

3. Perform a series of updates to fill in the QuarterID column with the correct

values

4. Create the final TimeDIM table

Temporary Dimension Tables – Sales Case Study (cont.)

Exercise:

Construct the SQL command for Step 4

Temporary fact table is required

because TimeDIM is not directly copied

from the operational database

Temporary Fact Tables – Sales Case Study

Temporary Fact Tables – Sales Case Study (cont.)

2
3

Temporary Tables in Operational Database

Total_Sales = SUM(TotalPrice)

An E/R Diagram for the

Sales Case Study

A Star Schema for the Sales Case Study

Temporary Tables in Operational Database – Case Study

An E/R Diagram for the

Sessional Contract Jobs

Temporary Tables in Operational Database – Case Study (cont.)

A Star Schema for the

Sessional Contract Jobs

Last Degree of the

Employee

▪ One employee may have multiple

degrees.

▪ Most of the work requires a Bachelor

degree, to allow current Master

students to work as a teaching

assistant or as a programmer

▪ Current PhD students who have a

Master degree often work as a

teaching assistant too

▪ BUT, only the last degree is required

in DigreeDIM table.

Temporary Tables in Operational Database – Case Study (cont.)

Temporary Tables – EmployeeTemp Table

Retrieves the last degree of the

employees

Sample Records for Employee and Emp_Degree Table

Employee

Emp_Degree

RANK() OVER Function

Retrieved for

EmployeeTemp

table.

Create Fact Table from EmployeeTemp Table

Temporary Dimension table

Needed when it is not possible to create a dimension table directly; need to

apply a series of additional steps when creating a dimension table through the

creation of an intermediate table

Temporary table in operational database

Tables in the operational database are not readily used to create the Fact table.

Need transformation phase, where an intermediate table, called the Temporary

Operational Database Table is created.

Summary

Oracle/FIT5195-4-Snowflake Schema

Week 4 – Snowflake Schema

Semester 1, 2020

FIT5195 – Business Intelligence
and Data Warehousing

Developed by:
Agnes Haryanto

Agnes.Haryanto@monash.edu

Soon Lay-Ki

Soon.LayKi@monash.edu

MONASH
INFORMATION

TECHNOLOGY

Agenda

1. Hierarchies

1. Hierarchy vs. Non-Hierarchy

2. Hierarchy versus Multiple Independent Dimensions

3. Linked Dimensions

4. Hierarchy Design Considerations

2. Determinant Dimensions

1. Determinant vs. Non-Determinant Dimensions

2. Determinant Dimension vs. Pivoted Fact Table

3. Non-Type as a Determinant Dimension

4. Multiple Relationship between a Dimension and the Fact

Using FLUX

1. Visit http://flux.qa/ on your internet enabled device

2. Log in using your Monash account (not required if
you are already logged in to Monash)

3. Click on the “+” to join audience

4. Enter the Audience Code:

• Caulfield – 3GANT7

• Fully Flex – 39WRG8

• Malaysia – VTVPLW

5. Select FIT5195 in the Active Presentation menu

6. Answer questions when they pop up

http://flux.qa/1AW6N8

Hierarchies

▪ A Hierarchy is formed when a dimension is broken down to two or

more dimensions in a hierarchical manner.

➢ As a result, the star schema becomes a snowflake schema.

(a) A simple Star Schema

(b) A Snowflake with Hierarchy

Hierarchies

(a) A Snowflake with a Bridge Table (b) A Snowflake with Hierarchy

Hierarchies

1. Hierarchy versus Non-Hierarchy Dimensions

2. Hierarchy versus Multiple Independent Dimensions

3. Linked Dimensions

4. Hierarchy Design Considerations

1. Hierarchy vs. Non-Hierarchy

HierarchyNon-Hierarchy

1. Hierarchy vs. Non-Hierarchy

CityID City Name State CountryID

MEL Melbourne Victoria AU

SYD Sydney New South Wales AU

KUL Kuala Lumpur Selangor MA

JNB Johannesburg Johannesburg SA

CampusID CampusName Address Suburb Postcode CityID

CL Clayton Campus Wellington Rd Clayton 3800MEL

CA Caulfield Campus Dandenong Rd Caulfield East 3145MEL

PA Parkville Campus Royal Parade Parkville 3052MEL

SY Sydney Campus Opera Boulevard Sydney 2001SYD

MUM Malaysia Campus Jalan Lagoon Bandar Sunway 47500KUL

MSA South Africa Campus Peter St Johannesburg 1725JNB

(a) Campus Dimension Table

CountryID Country Name

AU Australia

MA Malaysia

SA South Africa

(b) City Dimension Table
(c) Country Dimension Table

1. Hierarchy vs. Non-Hierarchy

(a) Correct Hierarchy (b) A Wrong Hierarchy

1. Hierarchy vs. Non-Hierarchy

1. One table vs. many tables

The consequence is in the join query processing when producing reports.

2. Normalized vs. Un-normalized

With the hierarchy option, the tables are normalized, which follows the relational

model. In contrast, the non-hierarchy option, the table (e.g. the Campus

Dimension table) is unnormalized, which is basically 1NF with visible replication

on information, which is prone to anomalies (e.g. insert, update, and delete

anomalies).

3. Drilling down and rolling up

The hierarchy model is not about drilling down information exploration.

1. Hierarchy vs. Non-Hierarchy – Summary

▪ From the query point of view, both versions need two queries.

▪ From the query processing point of view, the non-hierarchy version uses

one join operation only, because it only needs to join the fact and one

dimension.

▪ From the conceptual point of view, the hierarchy model does not actually

offer a better roll up or drill down features.

2. Hierarchy vs. Multiple Independent Dimensions

(a) Separate Dimensions (b) Combined Dimensions

2.1. Separate vs Combined Dimension

(a) Separate Dimensions (b) Combined Dimensions

2.1. Separate vs Combined Dimension

(a) Separate Dimension Model – Campus Dimension Table

(b) Separate Dimension Model –

Country Dimension Table

2.1. Separate vs Combined Dimension

(d) Combined Dimension Model – Campus Dimension Table

(e) Combined Dimension Model

– Fact Table

2.1. Separate vs Combined Dimension
(c) Separate Dimension Model – Fact Table (e) Combined Dimension Model – Fact Table

2.1. Separate vs Combined Dimension

(a) Separate Dimension Model – Campus Dimension Table (b) Separate Dimension Model –

Country Dimension Table

(d) Combined Dimension Model – Campus Dimension Table

2.1. Separate vs Combined Dimension

(a) Separate Dimensions (b) Combined Dimensions

2.2. Combined Dimension vs. Hierarchy

(b) Combined Dimensions (c) Hierarchy Dimensions

2.2. Combined Dimension vs. Hierarchy

(a) Hierarchy Model – Campus Dimension Table

(b) Hierarchy Model – Country Dimension Table

2.2. Combined Dimension vs. Hierarchy

▪ Reasons on why the Hierarchy model is not ideal:

➢ An efficient query processing to query the fact and dimensions,

➢ The identifier of the Campus Dimension (e.g. CampusID) already covers the

child dimension (e.g. Country Dimension), and

➢ Campus and Country information is often regarded as one entity; at least in this

case study.

(b) Combined Dimensions (c) Hierarchy Dimensions

3. Linked Dimensions

(a) Linked Dimension Model – Campus Dimension Table

(b) Linked Dimension Model – Country Dimension Table

4. Hierarchy Design Considerations

Hierarchy – Summary

▪ A dimension hierarchy is connecting two or more dimensions in a hierarchical

manner, using a many-1 relationship.

➢ As a result, the dimensions in a hierarchy are normalized, in 3NF, using the

context of Relational Database Design.

▪ Compare and contrasts five different models:

1. Separate Dimension model

2. Combined Dimension model

3. Hierarchy model

4. Linked Dimension model

5. Hierarchy Design Considerations

Determinant Dimensions

Dimensions and Fact Tables – Recap

A Star Schema for the Sales Case Study

Example #1:

Querying Fact Table

Example #2:

Querying Fact Table (cont.)

A star schema may have a special dimension

all data retrieval from this star schema must use this dimension

else, the retrieved data will become meaningless.

Determinant Dimensions

Determinant

_DIM

Determinant_

Att

Special dimension

Key identifier

Introducing a Determinant Dimension – Petrol Station Case Study

Sample operational database of a company about petrol prices from all petrols

stations in Victoria:

The requirements for the data warehouse

are to answer the questions related to

(a) average petrol price,

(b) minimum petrol price, and

Petrol Station – Star Schema Version-1

Chosen Dimensions:

(i) day of week,

(ii) suburb, and

(iii) petrol company.

Two-column Table for Petrol Prices based on Day of Week

Category Fact Measures

Minimum price for which petrol type?

Petrol Station – Star Schema Version-1 (cont.)

▪ Not all petrol stations sell all types of petrol, e.g. Premium 98.

▪ Calculating the average price for Premium 98 on Monday cannot be:

Total_Petrol_Price_P98 / Number_of_Petrol_Station

▪ The number of petrol stations is different for different petrol type!

Petrol Station – Star Schema Version-1 (cont.)

▪ Solution: Split the Number of Petrol Station column into six columns – one

for each petrol type.

▪ Question: How many columns in total?

Petrol Station – Star Schema Version-1 (cont.)

Two-column

Table for Petrol

Prices based

on Suburb
Two-column Table for Petrol Prices based on Company

Petrol Station – Star Schema Version-1 (cont.)

Petrol Station – Star Schema Version-2

Min. price for

which petrol

type??

select DayofWeek,

Min_Petrol_Price

from PetrolFact

where DayofWeek = ’Monday’

and PetrolType = ’Unleaded’;

Petrol Station – Star Schema Version-2

select DayofWeek, PetrolType,

Min_Petrol_Price

from PetrolFact

where DayofWeek = ’Monday’;

Petrol Station – Star Schema Version-2

Determinant Dimension uses dotted box in the star schema.

The new star schema with a Determinant Dimension for Petrol Case Study

Determinant vs. Non-

Determinant

Dimensions

The Olympic Games Case Study

▪ The Olympic Games committee maintains an operational database that

stores all matches, games, as well as the medal winners of the Olympic

Games over the years

▪ Let’s build a data warehouse to analyse the medal counts, by each country,

sport, and at which Olympic Games

Determinant vs. Non-Determinant Dimensions

The Olympic Games Case Study – The Star Schemas

Version-1

Version-2

The questions are:

1. What is the difference

between these two

versions of star schema?

2. Is Medal Type Dimension

a Determinant

Dimension?

The Olympic Games Case Study – The Star Schema Version-1

Two-column tables

(pivot tables) for each

dimension category in

star schema version-1

The Olympic Games Case Study – The Star Schema Version-2

▪ Is MedalTypeDIM a determinant dimension?

– To retrieve the data from version-2 star schema, must we have the

information from Medal Type Dimension?

Determinant vs. Non-Determinant Dimensions

Version-2

COUNT = total medals from

other dimensions, without the
medal type dimension)

Fact Table for Version-1

The Olympic Games Case Study – The Star Schema Version-1 vs Version-2

Fact Table for Version-2

The Olympic Games Case Study – The Star Schema Version-1 vs Version-2

Differenc

Version 1 Version 2 (with

MedalTypeDIM)

Storage Lower with only three records Nine records

Modelling The schema looks more

complex and crowded; but the

storage cost is lower

More concise and compact with

less number of measures in the

fact, easier to understand

Query

Processin

Less join processing between

dimension tables and the fact

table.

Requires additional join with

MedalTypeDIM

Determinant

Dimensions vs.

Pivoted Fact Table

The PTE Academic Test Case Study
▪ PTE Test consists of four main components: Listening, Reading, Speaking, and

Writing

▪ There are many test venues (all major cities in Australia, and overseas as well), each

country may have a different price setting

▪ A student taking a PTE test will have one score for each of these four components,

as well as one overall score

▪ The score is a numerical score, ranging from 10 to 90

▪ The results will normally be released five days after the actual test date

▪ A student may take a PTE Academic Test several times, in order to improve the

score

▪ A student is identified by his/her RegistrationID, which stays the same student stays

the same

Determinant Dimensions vs. Pivoted Fact Table

ERD for PTE Academic Test

The PTE Academic Test Case Study – Sample Records for 8 Students

The PTE Academic Test Case Study – Star Schema with Determinant Dim.

To implement the star schema in SQL:

create table CountryVenueDim

select distinct C.CountryCode, C.CountryName, T.TestPrice

from Test_Venue T, Country C

where T.CountryCode = C.CountryCode;

create table CitizenshipDim

select distinct C.CountryCode as Citizenship, C.CountryName

from Student S, Country C

where S.Citizenship = C.CountryCode;

create table YearDim

select distinct to_char(TestDate, ’YYYY’) as Year from Test;

The PTE Academic Test – Star Schema with Determinant Dim. (cont.)

create table GradeDim (

Grade varchar2(3),

Description varchar2(20),

MinScore number,

MaxScore number

);

insert into GradeDim values (’4.5’, ’Functional’, 30, 35);

insert into GradeDim values (’5’, ’Vocational’, 36, 49);

insert into GradeDim values (’6’, ’Competent’, 50, 64);

insert into GradeDim values (’7’, ’Proficient’, 65, 78);

insert into GradeDim values (’8-9’, ’Superior’, 79, 90);

The PTE Academic Test – Star Schema with Determinant Dim. (cont.)

Superior

create table TestComponentDim (

TestComponent varchar2(20)

);

insert into TestComponentDim values

(’Listening’);

insert into TestComponentDim values

(’Reading’);

insert into TestComponentDim values

(’Writing’);

insert into TestComponentDim values

(’Speaking’);

insert into TestComponentDim values

(’Overall’);

The PTE Academic Test – Star Schema with Determinant Dim. (cont.)

Create temporary fact table:

create table TempFact

select TV.CountryCode, S.Citizenship,

to_char(T.TestDate, ’YYYY’) as Year,

TR.ListeningScore, TR.ReadingScore,

TR.WritingScore, TR.SpeakingScore,

TR.OverallScore, TR.RegistrationID

from Test_Venue TV, Test T, Student S,

Test_Result TR

where TV.VenueID = T.VenueID

and T.TestNo = TR.TestNo

and TR.RegistrationID = S.RegistrationID;

alter table TempFact

add (

GradeOverall varchar2(3),

GradeListening varchar2(3),

GradeReading varchar2(3),

GradeWriting varchar2(3),

GradeSpeaking varchar2(3)

);

The PTE Academic Test – Star Schema with Determinant Dim. (cont.)

update TempFact

set GradeOverall = (

case

when OverallScore >= 30 and

OverallScore <= 35 then ’4.5’ when OverallScore >= 36 and

OverallScore <= 49 then ’5’ when OverallScore >= 50 and

OverallScore <= 64 then ’6’ when OverallScore >= 65 and

OverallScore <= 78 then ’7’ when OverallScore >= 79 and

OverallScore <= 90 then ’8-9’ end); Repeat for GradeListening, GradeReading, GradeWriting, and GradeSpeaking Superior 63 ▪ The TempFact table has the correct grades for each student. ▪ The problem now is to breakdown the grades for a student into multiple test components – Create another temporary fact for each of the test components, eg. a temporary fact called ’OverallFact’ for the Overall Score, and so on so forth create table OverallFact as select CountryCode, Citizenship, Year, GradeOverall As Grade, ’Overall’ as TestComponent, count(RegistrationID) as Total_Students_Overall from TempFact group by CountryCode, Citizenship, Year, GradeOverall, ’Overall’; The PTE Academic Test - Star Schema with Determinant Dim. (cont.) Repeat for GradeListening, GradeReading, GradeWriting, and GradeSpeaking 64 ▪ There were eleven students taking the test which are now grouped into nine records in Table OverallFact The PTE Academic Test - Star Schema with Determinant Dim. (cont.) 65 ▪ There are also three other fact tables: ReadingFact, WritingFact, and SpeakingFact, with identical table structures ▪ These five fact tables can now be ’combined’ to form one final fact table using a union operator: create table FinalFact as select CountryCode, Citizenship, Year, Grade, TestComponent, Total_Students_Overall as Total_Students from OverallFact union select * from ListeningFact union select * from ReadingFact union select * from WritingFact union select * from SpeakingFact; The PTE Academic Test - Star Schema with Determinant Dim. (cont.) 66 ▪ The final fact table contains 45 records ▪ Conclusion: – When the input record in the operational database contains the scores of all the test components as one record, it will be challenging to break down the records into multiple records in the final fact table, because there is a Test Component dimension. The PTE Academic Test - Star Schema with Determinant Dim. (cont.) 67 The PTE Academic Test - Star Schema without Determinant Dim. ● Fact measure for all the test components ● This means that the fact table is a Pivoted Fact Table ○ Shifts the Test Component into the fact measure; ○ The Determinant Dimension - TestComponentDim is then removed from the star schema 68 ▪ How many records do we have in fact table with five fact measures? ▪ However, since our test result data is incomplete, we will have problems with the final fact, because in the non-determinant dimension version, we need to keep track the zeroes – Even if the data is the test result is a lot, but still does not cover all possible combinations from the dimension! The PTE Academic Test - Star Schema without Determinant Dim. (cont.) Answer: 60 records in the final fact table = 2 countries of test venues * 6 countries of citizenship * 1 year * 5 grades 69 ▪ ‘Superior’ grade is only available for Listening and Reading scores ▪ Using the normal fact table, there will not be entry in the fact table for Superior-Speaking, and for Superior-Writing; ▪ There will not be zero value in TotalStudents, where the Grade column is ’Superior’ and the Test Component column is either ’Speaking’ or ’Writing’ → less than 60 rows of records in the fact table The PTE Academic Test - Star Schema without Determinant Dim. (cont.) Superior 70 Solution: Step1 Get all possible combinations from all dimensions using Cartesian product between all dimensions: create table AllDimensions as select CO.CountryCode, CI.Citizenship, Y.Year, G.Grade from CountryVenueDim CO,CitizenshipDim CI, YearDim Y, GradeDim G; The PTE Academic Test - Star Schema without Determinant Dim. (cont.) 71 Solution: Step 2 Re-create five temporary fact tables, one for each test component, using Outer Join operation between AllDimensions table and each of the temporary fact tables created in the previous section (see slide #35). The PTE Academic Test - Star Schema without Determinant Dim. (cont.) create table OverallFactNew as select A.CountryCode, A.Citizenship, A.Year, A.Grade, nvl(O.Total_Students_Overall, 0) as Total_Students_Overall from AllDimensions A, OverallFact O where A.CountryCode = O.CountryCode(+) and A.Citizenship = O.Citizenship(+) and A.Year = O.Year(+) and A.Grade = O.Grade(+); create table ListeningFactNew as select A.CountryCode, A.Citizenship, A.Year, A.Grade, nvl(O.Total_Students_Listening, 0) as Total_Students_Listening from AllDimensions A, ListeningFact O where A.CountryCode = O.CountryCode(+) and A.Citizenship = O.Citizenship(+) and A.Year = O.Year(+) and A.Grade = O.Grade(+); create table ReadingFactNew as select ...; create table WritingFactNew as select ...; create table SpeakingFactNew as select ...; 72 Solution: Step 3 Create the final fact table using the join operation The PTE Academic Test - Star Schema without Determinant Dim. (cont.) create table FinalFact2 as select O.CountryCode, O.Citizenship, O.Year, O.Grade, O.Total_Students_Overall, L.Total_Students_Listening, R.Total_Students_Reading, W.Total_Students_Writing, S.Total_Students_Speaking from OverallFactNew O, ListeningFactNew L, ReadingFactNew R, WritingFactNew W, SpeakingFactNew S where O.CountryCode = L.CountryCode and L.CountryCode = R.CountryCode and R.CountryCode = W.CountryCode and W.CountryCode = S.CountryCode and O.Citizenship = L.Citizenship and L.Citizenship = R.Citizenship and R.Citizenship = W.Citizenship and W.Citizenship = S.Citizenship and O.Year = L.Year and L.Year = R.Year and R.Year = W.Year and W.Year = S.Year and O.Grade = L.Grade and L.Grade = R.Grade and R.Grade = W.Grade and W.Grade = S.Grade; 73 Solution: Step 4 Because our sample data is small, there are many records with zeroes values in the Total Students columns. Delete records where Total Students in all of the five test components are equal to zero. The PTE Academic Test - Star Schema without Determinant Dim. (cont.) delete from FinalFact2 where Total_Students_Overall = 0 and Total_Students_Listening = 0 and Total_Students_Reading = 0 and Total_Students_Writing = 0 and Total_Students_Speaking = 0; Non-Type as a Determinant Dimension 75 ▪ Type Dimension maybe a candidate for a Determinant Dimension: – PetroTypeDim is a Determinant Dimension in the Petrol Price case study, – MedalTypeDim in the Olympic Games case study is a not a Determinant Dimension ▪ Type Dimension is a good candidate for a Determinant Dimension, but needs further examination to check if it is really a Determinant Dimension ▪ Other dimensions - non-type dimension can potentially be a Determinant Dimension Non-Type as a Determinant Dimension 76 ▪ The Class Type Dimension in this example is not a Determinant Dimension because we could get Number of Students without referring to any of the class types. ▪ E.g. retrieving number of students in the Science Faculty last year is certainly a useful information, even without Class Type information. Non-Type as a Determinant Dimension – University Enrolment Case Study Type Dimension 77 Total student of Science Faculty Non-Type as a Determinant Dimension – University Enrolment Case Study Total student of Science Faculty last year vs. The information is dependent on the YearDIM - determinant dimension 78 Converting a Determinant Dimension to a Pivoted Fact Table Non-Type as a Determinant Dimension – University Enrolment Case Study 79 Question 1: What if the number of records in the Determinant Dimension is huge? – Number of fact measures increased to 20, 50? – May not be practical Solution: Enforce Determinant Dimension through the User-Interface level ▪ Users are forced to enter/choose a criteria from the Determinant Dimension. ▪ In filling a web-based form, this enforcement is normally indicated by a star (*) next to the text entry to indicate that this text entry must be filled in, before the form can proceed. Pivoted Fact Table for University Enrolment Case Study 80 Question 2: What if there are a lot of attributes in the Determinant Dimension? – If we shift the Determinant Dimension to the fact measures, only the key identifier is kept in the fact measure. – All other attributes from the Determinant Dimension will be lost from the star schema. This issue will be discussed in the next section. Pivoted Fact Table for University Enrolment Case Study (cont.) Multiple Relationship between a Dimension and the Fact 82 ▪ Private Taxi Case Study – The dimension (or a Determinant Dimension) to be shifted to the fact measure (e.g. Pivoted Fact Table) has many attributes. – If only the key identifier of the dimension (or the Determinant Dimension) is used as the fact measure, all other attributes of the dimension (or the Determinant Dimension) are no longer exists in the star schema Multiple Relationship between a Dimension and the Fact 83 ▪ Assume that – The business rules stated that there is no Determinant Dimension. Even without a Determinant Dimension, we could potentially shift a dimension to become fact measures – The company is a small private taxi company, has only five cars ▪ In the fact table, instead of having three fact measures, for each fact measure, we could replicate five times – one for each car. Private Taxi Case Study Replicate for each car ===> 15 fact

measures

Remove this

CarDIM?

▪ CarDIM should not be removed
– Other details of the car are

included in the data retrieval

▪ Because Car Dimension is not a

Determinant Dimension, we can

simply keep the original star

schema
– Include only CarNo in the fact

table

– Use join operation to get more

details about car in query

Private Taxi Case Study (cont.)

select F.MonthYear, D.Make,

D.Model,

sum(F.Total_Kilometers)

from PrivateTaxiFact1 F,

CarDim D

where F.CarNo = D.CarNo

and F.CarNo = 5

and F.MonthYear = ’1908’

group by F.MonthYear, D.Make,

D.Model;

Insist that fact measures are broken into five cars

▪ Need to have five Car Dimension’s key

identifier in the fact

Private Taxi Case Study (cont.)

To retrieve Total Kilometers of five cars:

select F.MonthYear, D1.CarMake, D1.Year,

D2.CarMake, D2.Year, D3.CarMake, D3.Year,

D4.CarMake, D4.Year, D5.CarMake, D5.Year,

sum(Total_Kilometers)

from PrivateTaxiFact2 F, CarDim D1, CarDim

D2, CarDim D3, CarDim D4, CarDim D5

where F.CarNo1 = D1.CarNo

and F.CarNo2 = D2.CarNo

and F.CarNo3 = D3.CarNo

and F.CarNo4 = D4.CarNo

and F.CarNo5 = D5CarNo

and MonthYear = ’1908’

group by F.MonthYear, D1.CarMake, D1.Year,

D2.CarMake, D2.Year, D3.CarMake, D3.Year,

D4.CarMake, D4.Year, D5.CarMake, D5.Year;

▪ Although conceptually the dimension is already shifted to the fact measure,

we still need to keep the original dimension in the star schema.

▪ This is the only way to preserve all other attributes of that dimension.

▪ This relationship replication is applicable to shifting a Determinant

Dimension to the fact measure.

Multiple Relationship between a Dimension and the Fact

Two alternatives to enforce the implementation of a star schema that features a

Determinant Dimension

1. Enforce through the user-interface, whereby the user has to fill in or to

choose the Determinant Attribute during the search.

2. Shift the Determinant Dimension into a Pivoted Fact Table, whereby the key

identifier of the Determinant Dimension is incorporated into the fact

measure.

Determinant Dimension – Conclusion

Oracle/FIT5195-5-MoreComplexProcesses_update (1)

Week 5 – More Complex Processes

Semester 1, 2020

FIT5195 – Business Intelligence
and Data Warehousing

Developed by:
Agnes Haryanto

Agnes.Haryanto@monash.edu

Soon Lay-Ki

Soon.LayKi@monash.edu

MONASH
INFORMATION

TECHNOLOGY

Agenda

1. More Complex Processes

1. Use of count Function

2. Average in the Fact

3. Outer Join

2. One Attribute Dimensions

Using FLUX

1. Visit http://flux.qa/ on your internet enabled device

2. Log in using your Monash account (not required if
you are already logged in to Monash)

3. Click on the “+” to join audience

4. Enter the Audience Code:

• Caulfield – 3GANT7

• Fully Flex – 39WRG8

• Malaysia – VTVPLW

5. Select FIT5195 in the Active Presentation menu

6. Answer questions when they pop up

http://flux.qa/1AW6N8

More Complex Processes

in Creating Fact

Recall – Star Schema Components

▪ There are Three main components of the Star Schema:

1. Facts

2. Dimensions

3. Attributes

Recall – Fact

▪ A Fact Table consists of key attributes from each dimension,

and fact measures.

▪ A Fact Table is created by a join operation, that joins several

tables from the operational database.

▪ Fact tables are created either through TempFact or directly

retrieval from the tables in the operational database.

▪ The fact measure itself is an aggregated value.

➢ In the SQL command, the fact measure attribute in the Fact
Table is created using an aggregate function, such as count

or sum, and the group by operation.

Use of count Function

▪ The count function is one of the most common aggregate functions used to

create the fact measures.

▪ In SQL, there are three ways of using the count function:

1. count(*) – counts number of records in the query result

2. count(attribute) – counts number of records of that attribute excluding

null values

3. count(distinct attribute) – counts number of unique values in that

mentioned attribute

Case Study #1: Mobile Apps Repositories

Monalisa University is an international university. It has an online mobile app store: Monalisa App Store.

The Monalisa App Store allows students from any university in the world to publish their applications and

receive feedbacks. This app store is considered as a research environment where applications developed

during researches and studies can be tested and used by real users. The applications are free and open

source.

Basically, the operational system has the following features:

▪ Every user details are kept in the database, which includes the universities that the students enroll.

▪ Users can publish their apps or download other apps.

▪ Users can give feedbacks and ratings to other apps.

▪ The download and feedback statistics are stored in the database.

▪ Apps are organized into different categories; and many authors may have more than one app in

different categories.

Case Study #1: Mobile Apps Repositories

▪ A data warehouse is needed

analyze the ratings and feedbacks

by different authors and apps, so

that the staff at Monalisa University

can connect with the talented

authors and send them awards

annually.

▪ The author with the highest

average apps rating will be named

author of the year, and will receive

an award.

Case Study #1: Mobile Apps Repositories

create table CategoryDim as

select * from Category;

Case Study #1: Mobile Apps Repositories

create table UniversityDim as

select distinct UniversityID,

UniversityName

from University;

Case Study #1: Mobile Apps Repositories

create table LocationDim as

select distinct

Country || City as LocationID,

City,

Country

from University;

Case Study #1: Mobile Apps Repositories

create table TimeDim as

select distinct

to_char(DownloadDate, ’YYYYMM’) as TimeID,

to_char(DownloadDate, ’MM’) as Month,

to_char(DownloadDate, ’YYYY’) as Year

from Download;

Case Study #1: Mobile Apps Repositories

create table AppsDownloadFact as

select

to_char(D.DownloadDate, ’YYYYMM’) as DownloadMonth,

U.Country || U.City as LocationID,

A.CategoryID,

A.ApplicationID,

U.UniversityID,

count(*) as TotalDownloads

from University U, App_User R, Download D, Application A

where

U.UniversityID = R.UniversityID and

R.UserID = D.DownloaderID and

D.ApplicationID = A.ApplicationID

group by

to_char(D.DownloadDate, ’YYYYMM’),

U.Country || U.City,

A.CategoryID,

A.ApplicationID,

U.UniversityID;

Case Study #1: Mobile Apps Repositories

create table TimeDim as

select distinct

to_char(CreationDate, ’YYYYMM’) as TimeID,

to_char(CreationDate, ’MM’) as Month,

to_char(CreationDate, ’YYYY’) as Year

from Application;

Case Study #1: Mobile Apps Repositories

create table AppsFact as

select

to_char(A.CreationDate, ’YYYYMM’) as CreationMonth,

U.Country || U.City as LocationID,

A.CategoryID,

A.ApplicationID,

U.UniversityID,

count(distinct ApplicationID) as TotalApps

from University U, App_User R, Download D, Application A

where

U.UniversityID = R.UniversityID and

R.UserID = D.DownloaderID and

D.ApplicationID = A.ApplicationID

group by

to_char(A.CreationDate, ’YYYYMM’),

U.Country || U.City,

A.CategoryID,

A.ApplicationID,

U.UniversityID;

Average in the Fact

Average in the Fact
Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

Average in the Fact
Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

The operational database

contains:

• 9 records of Semester one

• 7 records of Semester two

Average in the Fact
Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

The operational database

contains:

• 9 records of Semester one

• 7 records of Semester two

• 8 records of Database Unit

(6 Semester one and 2

Semester two)

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact
(b) Subject Dimension

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact
(b) Subject Dimension

Average Score for the Database Unit in:

• Semester one: (81+41+74+85+87+75)/6 = 73.833

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact
(b) Subject Dimension

Average Score for the Database Unit in:

• Semester one: (81+41+74+85+87+75)/6 = 73.833

• Semester two: (64+32)/2 = 48

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact
(b) Subject Dimension

Average Score for the Database Unit in:

• Semester one: (81+41+74+85+87+75)/6 = 73.833

• Semester two: (64+32)/2 = 48

These are actually

incorrect!

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact
(b) Subject Dimension

The SQL command:
select avg(Average_Score)

from EnrolmentFact

where UnitCode = ’IT001’;

Query: Calculate Average Score for the Database Unit:

• (73.833 + 48) / 2 = 60.9165

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

(a) Fact

(b) Operational Database
Query: Calculate Average Score for the Database Unit.

Calculation using Fact:

• (73.833 + 48) / 2 = 60.9165

Calculation based on the Operational Database:

• (81+41+74+85+87+75+64+32) / 8 =

539 / 8 = 67.375

Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

(a) Fact

(b) Operational Database
Query: Calculate Average Score for the Java Unit in both Semesters.

Calculation using Fact:

• (56+71.667) / 2 = 63.833

Calculation based on the Operational Database:
• (65+47+78+73+64) / 5 = 65.4

Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

(a) Fact

(b) Operational Database
Query: Calculate Average Score for Semester One.

Calculation using Fact:

• (73.833+56+63) / 3 = 64.287

Calculation based on the Operational Database:
• (81+41+74+85+87+75+65+47+63) / 9 = 68.667

Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

Average in the Fact

Unit Code Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

(a) Fact

(b) Operational Database
Query: Calculate Average Score for Semester Two.

Calculation using Fact:

• (48+71.667+52.5) / 3 = 57.389

Calculation based on the Operational Database:
• (64+32+78+73+64+53+53) / 7 = 59.4286

Unit Code Unit Title Semester Student First Name Score

IT001 Database 1 Mirriam 81

IT001 Database 1 Allan 41

IT001 Database 1 Ben 74

IT001 Database 1 Kate 85

IT001 Database 1 Larry 87

IT001 Database 1 Leonard 75

IT001 Database 2 Juan 64

IT001 Database 2 Andy 32

IT002 Java 1 Ally 65

IT002 Java 1 Menson 47

IT002 Java 2 Mirriam 78

IT002 Java 2 Ben 73

IT002 Java 2 Larry 64

IT003 SAP 1 Ally 63

IT004 Network 2 Juan 53

IT004 Network 2 Menson 52

Average in the Fact

Unit Code Semester Total_Score Number_of_Students

IT001 1 443 6

IT001 2 96 2

IT002 1 112 2

IT002 2 215 3

IT003 1 63 1

IT004 2 105 2

(a) Fact Version 2

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(b) Subject Dimension

Average in the Fact

Unit Code Semester Total_Score Number_of_Students

IT001 1 443 6

IT001 2 96 2

IT002 1 112 2

IT002 2 215 3

IT003 1 63 1

IT004 2 105 2

(a) Fact Version 2

The SQL command:
select sum(Total_Score)/

sum(Number_of_Students)

as Average_Score

from EnrolmentFact2

where UnitCode = ’IT001’;

Query: Calculate Average Score for

the Database Unit:

• (443+96)/(6+2) = 67.375

Average in the Fact

▪ The problem of Average in the Fact is known as the Average of an Average

problem.

➢ This problem is well known in Mathematics and Statistics.

➢ Average of an average will simply produce an incorrect average result.

▪ Hence, it is not desirable to have an average measure in the fact.

➢ Exceptional case: when the analysis ALWAYS uses all the dimensions (e.g.

Determinant Dimensions).

Min & Max in the Fact

▪ If Average should not be used in the fact, how about Min or Max?

➢ Yes, we can.

➢ Because Max of Max is always a global Max, and Min of Min is always a global

Min.

Ucode Semester Min_Score Max_Score

IT001 1 41 87

IT001 2 32 64

IT002 1 47 65

IT002 2 64 78

IT003 1 63 63

IT004 2 52 53

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact

(b) Subject Dimension
(c) Semester Dimension

Min & Max in the Fact

▪ Query: Find the Maximum Score of Database Unit.

• Max of {87, 64} is 87.

▪ The SQL command:
select max(Max_Score) from EnrolmentFact

where UnitCode = ’IT001’;

Ucode Semester Min_Score Max_Score

IT001 1 41 87

IT001 2 32 64

IT002 1 47 65

IT002 2 64 78

IT003 1 63 63

IT004 2 52 53

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact

(b) Subject Dimension
(c) Semester Dimension

Min & Max in the Fact

▪ Query: Find the Minimum Score of Database Unit.

• Min of {41, 32} is 32.

▪ The SQL command:
select min(Min_Score) from EnrolmentFact

where UnitCode = ’IT001’;

Ucode Semester Min_Score Max_Score

IT001 1 41 87

IT001 2 32 64

IT002 1 47 65

IT002 2 64 78

IT003 1 63 63

IT004 2 52 53

Unit Code Unit Title

IT001 Database

IT002 Java

IT003 SAP

IT004 Network

Semester

(a) Fact

(b) Subject Dimension
(c) Semester Dimension

Average in the Fact – Conclusion

▪ Average in the Fact is not desirable, although technically it satisfies the two

criteria of the fact (e.g. must be a numerical and aggregate value).

▪ Min and Max in the Fact can still be used, since Min Score and Max Score are

valid fact measures (e.g. they are numerical and aggregated values).

▪ In general, count and sum are more common.

Outer Join

Case Study #2: Employment Agency

The Employment Agency has a file of candidates who are willing to work.

▪ They record the candidates number, name, contact address, contact phone number and

maximum hours the candidate is available per week.

▪ Each candidate may have several qualifications.

▪ The Agency uses special codes to record the candidate’s qualification for a job opening.

▪ As well as recording the code, the Agency also records the experience of the candidate in

each qualification area, expressed as the number of months experience.

Case Study #2: Employment Agency

The Agency also has a list of companies that request temporaries.

▪ Each company is assigned a company number as an identifier.

▪ The company name, type of business and principal contact for employment placements is
also recorded.

▪ Each time a company requests a temporary employee, the Agency makes an entry in the
(job) openings file. This file contains an opening number, the company requesting an
employee, required qualification, starting date, anticipated ending date, and hourly pay.

▪ Each opening requires only one specific or main qualification.

▪ The Agency may be able to fill the opening from the staff on its books, however in some
circumstances it cannot fill the request.

Case Study #2: Employment Agency

▪ When a candidate matches the qualification, and is available, he/she is given the job, and
an entry is made in the placement record folder.

➢ This folder contains an opening number, candidate number, actual start date, total
hours worked to date, and end date if the placement is completed.

➢ The placement record folder is used by the Agency as a source of placement
histories for it’s various temporaries.

Case Study #2: Employment Agency

▪ Note:

➢ Not all Openings have

Placements, but each Placement

comes from the Opening.

➢ A Candidate may have several

Placements.

➢ A Company may open several

Openings.

➢ Each Opening has one main

qualification requirement.

Case Study #2: Employment Agency

▪ A data warehouse to analyze

the total openings and total

placements by job durations,

qualification requirements, and

placement starting month.

▪ Several types of job durations:

➢ Short-term (<10 days) ➢ Medium-term (10 to 30 days) ➢ Long-term (>30 days)

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

▪ Several types of job durations:

➢ Short-term (<10 days) ➢ Medium-term (10 to 30 days) ➢ Long-term (>30 days)

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

Case Study #2: Employment Agency

▪ Star Schema

➢ Fact:

• Total Opening

• Total Placement

➢ Dimensions:

• Duration

• Qualification

• Month

1. How many short-term job openings?

2. How many jobs requiring certain qualifications?

3. How many people started working in a certain

month?

Case Study #2: Employment Agency

create table QualificationDim as

select * from Qualification;

Case Study #2: Employment Agency

create table MonthDim as

select distinct

to_char(ActualStartDate, ’Month’)

as MonthName

from Placement;

Case Study #2: Employment Agency

create table DurationDim

(DurationID number,

DurationDesc varchar2(20));

insert into DurationDim values (1, ’Short-Term’);

insert into DurationDim values (2, ’Medium-Term’);

insert into DurationDim values (3, ’Long-Term’);

Case Study #2: Employment Agency

create table TempFact as

select O.QCode,

O.StartDate,

O.EndDate,

to_char(P.ActualStartDate, ’Month’)

as MonthName,

O.OpenNo,

P.CandNo

from Opening O, Placement P

where O.OpenNo = P.OpenNo (+);

Case Study #2: Employment Agency

create table TempFact as

select O.QCode,

O.StartDate,

O.EndDate,

to_char(P.ActualStartDate, ’Month’)

as MonthName,

O.OpenNo,

P.CandNo

from Opening O, Placement P

where O.OpenNo = P.OpenNo (+);
Left Outer Join

Case Study #2: Employment Agency

alter table TempFact

add (DurationID number);

update TempFact

set DurationID = 1

where EndDate – StartDate < 10; update TempFact set DurationID = 2 where EndDate - StartDate >= 10

and EndDate – StartDate <=30; update TempFact Set DurationID = 3 where EndDate - StartDate > 30;

Case Study #2: Employment Agency

create table AgencyFact as

select

QCode, DurationID, MonthName,

count(OpenNo) as TotalOpening,

count(CandNo) as TotalPlacement

from TempFact

group by QCode, DurationID, MonthName;

One-Attribute

Dimensions

One-attribute dimensions

One-attribute dimensions
● Dimensions with only one attribute, i.e. key attribute

Column-based Solution in the Fact

● Medal Type Dimension table seems unnecessary, because Medal Type also

exists in the fact table

● Hence, there is no need to reference to the Medal Type Dimension

Column-based Solution in the Fact (cont.)

Column-based solution
● Convert the dimension attribute into

new columns (fact measures) in the

fact table

● Only applicable for dimensions with

low cardinality, because it is not

practical to create a new column in

the fact table for each record of the

dimension

Row-based Solution in the Fact

● What happens if the one-attribute dimension has a lot of records?

● The solution:
○ Remove the dimension and keep the attribute of the dimension as a dimension-

less key in the fact table

Row-based solution whereby

each medal type is

represented by records in the

fact table

Row-based Solution in the Fact – Sales Case Study

● Assume that there are only four quarters (e.g. Q1, Q2, …, Q4), regardless

the year

● The Quarter could be shifted to the fact measures

● Result: Pivoted fact table

Q1 to Q4

A Sales Star Schema with Pivoted Fact

▪ What if the Quarter also embeds the Year, e.g. Q12020, Q22018, and etc.?
– Pivot Fact table cannot be the solution, because the number of possible records

in the Time Dimension can be large

▪ The solution:
– Include the Quarter attribute as a Dimension-less Key in the fact, and to remove

the Time Dimension from the star schema

– Result: Row-based solutions – not restricted by the number of records in the

one-attribute dimension

Row-based Solution in the Fact (cont.)

▪ Similarity:
– The contents of the fact table are identical.

▪ Difference:
– The dimension-less key attribute has no references to the Time Dimension,

because the Time Dimension has been removed, whereas the Quarter in the star

schema has a reference to the Time Dimension.

Star Schema with Dimension-less Key vs. One-Attribute Dimension

Star Schema with Dimension-less key

Star Schema with TimeDIM

▪ Looking at the high level design, it is

often more desirable to keep the

one-attribute dimension, i.e. the Time

Dimension in this case

– Provides a clearer design,

considering that the fact table

remain the same, with or without

the one-attribute dimensions

Star Schema with Dimension-less Key vs. One-Attribute Dimension (cont.)

One-attribute dimensions

▪ If there is more than one dimension with single attribute, and these

dimensions are not related to each other, it is often desirable to combine

them into a junk dimension

▪ Case study:
– Analyse the number of sales based on

these dimensions

Combine all one-attribute dimensions

Sales Fact Table

Combine all one-attribute dimensions (cont.)

Cartesian product of 4

dimension tables

▪ Assuming that each book can only have one category, then Category is

related to Book

▪ In this case, we can simply move the category into the book dimension

Combine with other normal dimensions – Book Sales Case Study

Combine with other normal dimensions – Book Sales Case Study (cont.)

Determinant Dimension with one-attribute only – Weather Data Case Study

▪ WeatherMonth attribute is a Determinant Dimension in this case.

▪ Hence, WeatherMonth Dimension must be kept in the star schema

– To enforce that WeatherMonth Dimension (or WeatherMonth attribute)

must be used in all queries to this star schema

WeatherMonthDI

Questions:

1. What is the granularity of this

star schema?

2. How can we have a higher

granularity in this star schema

(no category to go for)?

Answer to Question #2:
▪ Combine books into groups

based on the authors

One-attribute dimension with bridge – Book Sales Case Study

▪ The ISBN Group List can be

created by using the
listagg function

▪ The Book Group List

Dimension is a one-attribute

dimension

▪ It is not possible to remove

this dimension as it links the

fact and the bridge table

One-attribute dimension with bridge – Book Sales Case Study (cont.)

1. To move the one attribution dimensions to the fact table

– The one-attribute dimension must have low cardinality

– Else, keep the dimension attribute in the fact as a dimension-less key

– From high level point of view, sometimes it is desirable to keep the

attribute dimension in the design

2. To combine the one-attribute dimension with other dimensions

– Form a junk dimension that combines several one-attribute dimensions

– Move the one-attribute dimension into another normal dimension

– One-attribute dimensions cannot be removed if:

▪ It is a determinant dimension

▪ It is connected to a bridge table

Summary for One-Attribute Dimensions

End of Week 5 Lecture

Oracle/Requiremnts-Oracle

FIT5195 S1 2020 – Take Home Test (10%)

Due date: Week 6, Friday 1-May-2020, 11:55pm

This Take Home Test consists of two Case Studies. You are required to answer all questions.

Case Study 1: Government Hospital Data Warehouse (50 marks)
Currently, the Victorian government has a simple database system implemented to store the

collective data of all government-run hospitals. The database system includes information about their

services, staff, and patients. The simple database system consists of the following tables and

attributes:

Figure 1 : A simplified structure of the Government Hospital’s current E/R Diagram

Due to the nationwide impact of a new pandemic, the Premier of Victoria (i.e. the head of government

in the Australian state of Victoria) requires a data warehouse for analyzing trends and forecasting

purposes. The goal of the analyzing is to identify which frequently used services, popular hospital

locations, etc.

You are required to design a small Data Warehouse to keep track of the statistics. The Premier is

particularly interested in analyzing the total number (population quantity) of patients and total

service charged by services used, time periods, age groups, and hospital locations.

The Premier also wants to be able to drill down based on the cost of services range (low price < $20, medium price between $20 and $50, and high price >$50).

Tasks:

[1]. Create the operational database based on the above E/R Diagram, and then populate with
sufficient numbers of records in each table. It is suggested that each table would have 5-20

records depending on the use of that table in the operational database. You need to create a script

file containing the CREATE TABLE and INSERT INTO commands.

You can follow the list of services from this link:

https://www.monash.edu/health/medical/services.

[2]. Then develop a GOVERNMENT HOSPITAL star schema. Identify the fact table, dimensions
and attributes required to support the schema. If the star schema consists of a Bridge Table, you

have to also include the Weight Factor and List Aggregate.

The result of this task is a star schema diagram. You can use any drawing tool, such as

Lucidchart, to draw the star schema.

[3]. Validate your star schema using the Two-Column Table Methodology. You are required to
illustrate some two-column tables for this task based on your star schema design.

[4]. Write the SQL commands to create the fact and dimension tables. You need to create a script
file containing the appropriate SQL commands to create the fact and dimension tables.

[5]. Write the SQL commands to answer the following queries: (you need to make sure that there are
records in your fact and dimensions tables. For each of the following queries, write the SQL and

show the results):

a) Show the total number of patients making appointments during Winter.
b) Show the total service charged for each service cost type.
c) Show the total number of patients by each age group (infant <1, children <18, adult 18+, senior 65+) in April 2020. d) Show the total service charged for general medical consultations in each suburb. Submission checklist for Case Study 1: A. A pdf file or word document containing: i. The SQL commands to create tables and insert records to the operational database – Task 1 ii. The star schema diagram – Task 2 iii. The Two-Column Table Methodology illustration – Task 3 iv. The SQL commands to create the dimension and fact tables, as well as the contents of these tables – Task 4 v. The SQL commands to answer the queries in Task 5 and the query results B. The .sql files containing: i. The SQL commands to create the operational database. ii. The SQL commands to create the data warehouse. iii. The SQL commands to answer the queries in Task 5. (Notes: The marker of this test will simply run this file. So you need to make sure that the SQL commands are written correctly in the .sql file. Pay a particular attention to the quote, especially if you cut and paste from a Word file). C. A folder: i. Save all the files from [A] and the SQL files from [B] in one folder. ii. Name the folder CaseStudy1_yourStudentID. https://www.monash.edu/health/medical/services Case Study 2: Accident Records Data Warehouse (50 marks) Figure 2 : A simplified structure of the Accident Record’s current database system The above figure displays how the Victorian Roads and Safety (VicR&S) Department stores accident related data. When there are any road accidents, the details about the accidents such as the location, date, time and event (e.g. what exactly happened during the accident) are recorded. For each accident, the information (e.g. name, contact number, employment start date, current employment branch) about one police officer who is in charge of overseeing the events of the accident (e.g finding out who/what caused the accident) are stored. In addition, to identify a particular accident, each accident is given a unique accident number. On the other hand, one accident may be caused by several vehicles (e.g. a collision between 2 vehicles). All vehicles have owners who may or may not be the same as the vehicle driver during the accident. Therefore, the VicR&S database stores information about vehicles involved in the accident (i.e. vehicle number, vehicle model, vehicle make and vehicle manufacturer), the information about the owners of the vehicles (i.e. owner name, owner address, owner contact number) and the information about the drivers at the time of the accident (driver name, driver address, driver contact number, driver licence no, and when the driver started to drive the vehicle). Furthermore, for each accident, the severity of damage incurred by each vehicle is recorded. Currently, to reduce the number of road accidents VicR&S department is interested in the following: - The total number of accidents happening by different locations and by different lighting periods (daytime: 6AM - 5:59PM and nighttime 6PM - 5:59AM). - The total number of accidents by each vehicle model. - The number of vehicles involved in every accident event on different locations. - The number of accidents taken care of different police officer branches. Tasks: [1]. Develop an ACCIDENT RECORDS star schema. Identify the fact table, dimensions and attributes required to support the schema. If the star schema consists of a Bridge Table, you have to also include the Weight Factor and List Aggregate. The result of this task is a star schema diagram. You can use any drawing tool, such as Lucidchart, to draw the star schema. [2]. Validate your star schema using the Two-Column Table Methodology. You are required to illustrate some two-column tables for this task based on your star schema design. [3]. Write the SQL commands to create the fact and dimension tables. You need to create a script file containing the appropriate SQL commands to create the fact and dimension tables. The operational tables are accessible from the ACCIDENT account. The result of this task is the SQL commands. You will also need to show the contents of the tables that you have created. [4]. Write the SQL commands to answer the following queries: (you need to make sure that there are records in your fact and dimensions tables. For each of the following queries, write the SQL and show the results): a) Show the total number of accidents happening by different locations and by different lighting periods (daytime: 6AM - 5:59PM and nighttime 6PM - 5:59AM). b) Show the total number of accidents by each vehicle model. c) Show the number of vehicles involved in every accident event on different locations. d) Show the number of accidents taken care of by different police officer branches. [5]. You need to come up with additional two more questions and answer these questions using the SQL commands. Also explain the reason for why the management would like to have such information. Submission checklist for Case Study 2: A. A pdf file or word document containing: i. The star schema diagram – Task 1 ii. The Two-Column Table Methodology illustration – Task 2 iii. The SQL commands to create the dimension and fact tables, as well as the contents of these tables – Task 3 iv. The SQL commands to answer the queries in Task 4 and the query results v. The additional two questions with the SQL commands, query result, and your explanation – Task 5 B. The .sql files containing: i. The SQL commands to create the data warehouse. ii. The SQL commands to answer the queries in Task 4 and Task 5. (Notes: The marker of this test will simply run this file. So you need to make sure that the SQL commands are written correctly in the .sql file. Pay a particular attention to the quote, especially if you cut and paste from a Word file). C. A folder: i. Save all the files from [A] and the SQL files from [B] in one folder. ii. Name the folder CaseStudy2_yourStudentID. Submission Method: 1. A zip file: a. Save the CaseStudy1_yourStudentID and CaseStudy2_yourStudentID folders into one folder named FIT5195_Test_yourname_studentID. b. ZIP the folder FIT5195_Test_yourname_studentID.zip. This must be a ZIP file and not other types of compressed folder. The zip file should contain the prescribed folder structure as follows: ● FIT5195_Test_yourname_studentID/ ○ CaseStudy1_yourStudentID/ ○ Files for tasks ○ CaseStudy2_yourStudentID/ ○ Files for tasks 2. Upload your zip file on Moodle by the due date: Friday 1-May-2020, 11:55pm (No late submission is accepted). END OF TAKE HOME TEST Oracle/Week-1c-SQL-revision 11 SQL - Revision Assoc. Prof. David Taniar 22 33 SQL - Revision Outline: A. Introduction to SQL client environment B. Create tables C. Insert into D. Simple query retrieval E. Updating and deleting records F. Commit G. Joining multiple tables H. Aggregate functions and group by I. Alter tables 44 A. Introduction to SQL client • There are several SQL client software: • SQL Developer • SQL*Plus (Windows, Mac, Unix) • Login details: • Username: S12345678 (12345678 is your student id) • Password: student • Host string: the unit code 5 Introduction to SQL Developer Oracle username which starts with S and followed by your student ID. Password is student Location of the server This is the hoststring, which is the database name rhino.its.monash.edu FIT5195 FIT5195 6 Introduction to SQL Developer Single Statement Execution Script Execution Output of Execution Type Command 77 B. Create Tables • General Syntax: CREATE TABLE

(attribute1 data_type ,

attribute2 data_type,

…

PRIMARY KEY (attribute1),

FOREIGN KEY (attribute1)

REFERENCES (key_attribute));

Create Tables – Data Types

• Data type denotes a kind of data of an attribute value

• Character data types: VARCHAR2 and CHAR

Make VARCHAR2(20)

Model VARCHAR2(30)

• Number data types: NUMBER

Year NUMBER(4)

StampDuty NUMBER(8,2)

• Date data type: DATE

Salesdate DATE

Create Tables – Data Types

• Date Data Type

• DATE stores dates from 1/1/4712 BC to 12/31/4712 AD

• Default date format: DD-MON-YYYY

• example: 05-JUN-2011

• Example declaration: salesDate DATE

• DATE data type also stores time values

1010

Create Tables – Data Types

• Default time format: HH:MI:SS A.M.

• If no time value is given when a date is inserted, default value is

12:00:00 A.M.

• If no date value is given when a time is inserted, default date is

first day of current month

• Example salesDate field: 07-JUN-2016 12:00:00 A.M.

1111

Create Tables – Constraints

• Integrity Constraints

• Primary Key attribute

• NOT NULL constraints

• Specifies that a field cannot be NULL

• Sample Declaration: Field_name data_type NOT NULL

• Foreign Key attribute in a table refers to another record in another table

1212

Create Tables – Example 1

• Example: create CAR, CARSALES and CUSTOMER tables

• Commands:

• CAR table

CREATE TABLE Car

(carID NUMBER(5) NOT NULL,

make VARCHAR2(20) NOT NULL,

model VARCHAR2(30),

year NUMBER(4),

colour VARCHAR2(25),

advertisedPrice NUMBER(10),

PRIMARY KEY (carID)

);

1313

Create Tables – Example 2

• CUSTOMER table

CREATE TABLE Customer

(customerID NUMBER(5) NOT NULL,

cNAME VARCHAR2(20) NOT NULL,

cPhone VARCHAR2(10),

address VARCHAR2(50),

PRIMARY KEY (customerID)

);

1414

Create Tables – Example 3

• CARSALES table

CREATE TABLE Carsales

(carID NUMBER(5) NOT NULL,

customerID NUMBER(5) NOT NULL,

salesDate DATE NOT NULL,

purchasedPrice NUMBER(10) NOT NULL,

stampDuty NUMBER(8,2) NOT NULL,

PRIMARY KEY (carID,customerID),

FOREIGN KEY (carID) REFERENCES Car(carID),

FOREIGN KEY (customerID) REFERENCES Customer(customerID)

);

1515

Create Tables – by Copying

• You can create a new table by copying from an existing table:

CREATE TABLE

AS SELECT *

FROM … ;

• For example:

CREATE TABLE car

AS SELECT *

FROM dtaniar.car;

• Notes:

• It creates and copy the records from the existing table

• However, it does not copy the PK and FK

• In the above example, it copies the table from dtaniar account.

1616

Create Tables – View Tables

• Viewing Information about Tables

• To view all tables in the database, the general syntax is:

SELECT * FROM TAB;

• For example:

SELECT * FROM tab;

TNAME TABTYPE CLUSTERID

—————————— ——- ———-

CAR TABLE

CARSALES TABLE

CUSTOMER TABLE

1717

Create Tables – Describe Tables

• To view the table structure, the general syntax is:

DESCRIBE

• For example:

DESC car;

Name Null Type

————— ——– ————

CARID NOT NULL NUMBER(5)

MAKE NOT NULL VARCHAR2(20)

MODEL VARCHAR2(30)

YEAR NUMBER(4)

COLOUR VARCHAR2(25)

ADVERTISEDPRICE NUMBER(10)

1818

Create Tables – Drop Tables

• To drop an unwanted table, the general syntax is:

DROP TABLE ;

• For example:

DROP TABLE Car;

• If the table that you want to delete (e.g. table Car) is being used as a FK by

another table (e.g. table Carsales), then you cannot delete table Car.

• In this case, you need to delete table Carsales first, before deleting table Car.

1919

C. Insert Into

• General Syntax

• To insert values for every single attribute in a record:

INSERT INTO

VALUES (attribute1_value, attribute2_value,….);

• Example:

INSERT INTO Car

VALUES (1, ‘Holden’, ‘Cruze’, 2015, ‘Black’, 25780);

• Strings are enclosed in single quotes (‘) and are case-sensitive

(e.g. ‘General Practice’ is different from ‘general practice’)

Insert Into

• To insert a value of selected attributes:

INSERT INTO (attribute1,attribute2,…)

VALUES (attribute1_value, attribute2_value,….)

• Example:

INSERT INTO Car (carID, make) VALUES (’16’, ’Audi’);

2121

Insert Into

• The TO_DATE function:

• TO_DATE (‘date_value’, ’format mask’);

• Example:

INSERT INTO Carsales

VALUES (1,4, TO_DATE(’04/Feb/2015′, ‘DD/MON/YYYY’),25780,824.96);

2222

Insert Into

• The common DATE format masks

Format Mask Formatted Data

DD-MON-YYYY 05-FEB-2007

MM/DD/YYYY 02/05/2007

HH:MI AM 02:30 PM

MONTH DAY, YYYY FEB 5, 2007

MM/DD/YYYY HH:MI AM 02/05/2007 02:30 PM

Sample DATE format masks

Insert Into

• Insert multiple records one-by-one:

INSERT INTO Car VALUES (2, ‘BMW’, ‘520d’, 2016, ‘Grey’, 98800);

INSERT INTO Car VALUES (3, ‘Audi’, ‘A5’, 2016, ‘Black’, 68200);

INSERT INTO Car VALUES (4, ‘Holden’, ‘Commodore’, 2008, ‘Grey’, 12650);

• Insert multiple records at once:

INSERT ALL

INTO Car VALUES (2, ‘BMW’, ‘520d’, 2016, ‘Grey’, 98800)

INTO Car VALUES (3, ‘Audi’, ‘A5’, 2016, ‘Black’, 68200)

INTO Car VALUES (’ 4, ‘Holden’, ‘Commodore’, 2008, ‘Grey’, 12650)

SELECT * FROM DUAL;

2424

D. Simple Query Retrieval

• Simple Retrieval

• Retrieve all Records

• General Syntax :

SELECT *

FROM ;

• Example: retrieve everything from the CAR table:

SELECT * FROM car;

• Output:

2525

Simple Query Retrieval

• Retrieve Specific Fields

• General Syntax:

SELECT

FROM ;

• Example: select only the model from the CAR table

SELECT model FROM car;

• Output:

2626

Simple Query Retrieval

• Eliminating Duplicated Records (DISTINCT qualifier)

• General Syntax:

SELECT DISTINCT

FROM ;

• Example: eliminating duplicates for the MAKE values:

SELECT DISTINCT make FROM car;

2727

Simple Query Retrieval

• Output (with DISTINCT):

• Original Output (without DISTINCT):

2828

Simple Query Retrieval

• DISTINCT Multiple Attributes

• Example: eliminating duplicates for the MAKE and YEAR values:

SELECT DISTINCT make, year

FROM car;

MAKE YEAR

——————– ———-

Volkwagen 2015

Toyota 2013

Mini 2009

Holden 2015

BMW 2016

Honda 2012

Subaru 2014

Holden 2008

Jeep 2006

Mazda 2011

Audi 2016

Lexus 2009

Honda 2010

Nissan 2010

Mercedes Benz 2006

15 rows selected

2929

Simple Query Retrieval

• Conditional Retrieval

• Search Conditions specified for more complex data retrieval

• The WHERE Clause

• Operators:

• equal (=)

• greater than (>)

• less than (<) • greater than or equal to (>=)

• less than or equal to (<=) • Not equal (<>)

3030

Simple Query Retrieval

• General Syntax:

SELECT

FROM

WHERE

;

• Example:

SELECT year, model, advertisedPrice

FROM car

WHERE make=’Holden’

YEAR MODEL ADVERTISEDPRICE

———- ———— —————

2015 Cruze 25780

2008 Commodore 12650

3131

Simple Query Retrieval

• “BETWEEN”

• Example: list the carid, salesDate and purchasedPrice after 2014 and

before 2016

SELECT carid, salesdate, purchasedPrice

FROM carsales

WHERE salesdate

BETWEEN TO_DATE(’01-JAN-2014′, ‘DD-MON-YYYY’)

AND TO_DATE(’31-DEC-2015′,’DD-MON-YYYY’);

CARID SALESDATE PURCHASEDPRICE

———- ———– ————–

1 04-FEB-2015 25780

8 12-DEC-2015 13999

3232

Simple Query Retrieval

• Using a String Comparison

• Example: list the carid, salesDate and purchasedPrice after 2014 and

before 2016

SELECT carid, salesdate, purchasedPrice

FROM carsales

WHERE TO_CHAR(salesdate, ‘YYYYMMDD’) >’20140101’

AND TO_CHAR(salesdate, ‘YYYY’MMDD’) < ‘20151231’; CARID SALESDATE PURCHASEDPRICE ---------- ----------- -------------- 1 04-FEB-2015 25780 8 12-DEC-2015 13999 D 3333 Simple Query Retrieval • “AND” or “OR” • AND: both conditions must be true • Example: SELECT make, model, year FROM car WHERE year>2014 AND make=’Holden’;

MAKE MODEL YEAR

——— —————- ———-

Holden Cruze 2015

3434

Simple Query Retrieval

• OR: either one of the condition is true

• Example:

SELECT make, model, colour, year, advertisedPrice

FROM car

WHERE colour=’Red’ OR year>2014;

MAKE MODEL COLOUR YEAR ADVERTISEDPRICE

——— ———- ———- ———- —————-

Holden Cruze Black 2015 25780

BMW 520d Grey 2016 98800

Audi A5 Black 2016 68200

Honda Civic Red 2012 13488

Lexus RX350 Red 2009 33500

Volkwagen Golf White 2015 39888

6 rows selected

3535

Simple Query Retrieval

• Other Conditions

• LIKE/NOT LIKE

• Example: displaying all MAKE from car table that has their first

character as ‘M’

SELECT make

FROM CAR

WHERE make LIKE ‘M%’;

MAKE

—————-

Mazda

Mercedes Benz

Mini

3636

Simple Query Retrieval

• IN/NOT

• suitable to perform a set member search

• Example:

• IN – displaying all MODEL that its MAKE is either ‘Holden’ or ‘Honda’

SELECT make, model

FROM car

WHERE make IN (‘Holden’, ‘Honda’);

MAKE MODEL

——————– ————-

Holden Cruze

Holden Commodore

Honda Civic

Honda City

3737

Simple Query Retrieval

• IN/NOT

• Example:

• NOT IN – displaying make, model and colour of cars that are not ‘Black’

and ‘White’

SELECT make, model, colour

FROM car

WHERE colour NOT IN (‘Black’, ‘White’);

MAKE MODEL COLOUR

————– ———— ————

BMW 520d Grey

Holden Commodore Grey

Honda Civic Red

Lexus RX350 Red

Mazda 3Series Blue

Mercedes Benz C200 Blue

Honda City Grey

3838

Simple Query Retrieval

• NULL/NOT NULL

• Example:

• NULL Operator:

SELECT carid, make

FROM car

WHERE colour IS NULL;

CARID MAKE

—— ———

12 Subaru

3939

Simple Query Retrieval
• NOT NULL Operator:

SELECT carid, make

FROM car

WHERE colour IS NOT NULL;

CARID MAKE

——- ——————–

1 Holden

2 BMW

3 Audi

4 Holden

5 Honda

6 Jeep

7 Lexus

8 Mazda

9 Nissan

10 Volkwagen

11 Mercedes Benz

13 Honda

14 Mini

15 Toyota

14 rows selected

4040

Simple Query Retrieval

• MULTIPLE OPERATORs

• Example: list cars make start with ‘M’ and was made before 2016

SELECT year, make, model

FROM car

WHERE make LIKE ‘M%’

AND year<2016; YEAR MAKE MODEL ---- ---------------- ------------- 2011 Mazda 3Series 2006 Mercedes Benz C200 2009 Mini Cooper D 4141 Simple Query Retrieval • Sorting • specify to sort the output by using ORDER BY • General Syntax: SELECT

FROM

ORDER BY [DESC];

4242

Simple Query Retrieval
• Example: retrieving all make, model, advertisedPrice and colour in a descending

order of the advertisedPrice
SELECT make, model, advertisedPrice, colour

FROM car

ORDER BY advertisedPrice DESC;

MAKE MODEL ADVERTISEDPRICE COLOUR

————— —————— ————— ——–

BMW 520d 98800 Grey

Audi A5 68200 Black

Volkwagen Golf 39888 White

Subaru Outback 33950 NULL

Lexus RX350 33500 Red

Holden Cruze 25780 Black

Toyota Corolla 23888 White

Jeep Grand Cherokee 21999 Black

Mini Cooper 19750 Black

Nissan Dualis 18888 White

Mercedes Benz C200 16995 Blue

Mazda 3Series 13999 Blue

Honda Civic 13488 Grey

Holden Commodore 12650 Grey

Honda City 7490 Grey

15 rows selected

Simple Query Retrieval

• “AS”: to rename a column

• Example: list all SalesDate, PurchasedPrice, StampDuty and

TotalPrice (TotalPrice = PurchasedPrice+StampDuty )

SELECT salesdate, purchasedPrice, stampduty,

(purchasedPrice+StampDuty) as TotalPrice

FROM carSales;

SALESDATE PURCHASEDPRICE STAMPDUTY TOTALPRICE

———– ————– ———- ———-

04-FEB-2015 25780 824.96 26604.96

13-JUL-2016 12650 506 13156

12-DEC-2015 13999 559.96 14558.96

14-JUN-2016 39888 1276.42 41164.42

18-MAY-2016 98800 5631.6 104431.6

4444

E. Updating and Deleting Records

• In the created tables

• UPDATE command – updating

• DELETE command – deletion

4545

Updating Records

• Update

• General Syntax:

UPDATE

SET =

WHERE ;

• records can be updated in only one table at a time

• update multiple fields that are within the same table

• WHERE clause – make the command updates specific records only

Updating Records

• Example:

• Updating colour of carID ‘5’ from ‘Red’ to ‘Grey’

UPDATE car

SET colour=’Grey’

WHERE carID =5;

Deleting Records

• Delete

• General Syntax:

DELETE FROM

WHERE ;

• remove specific records from a database table

• use WHERE clause to specify multiple records to delete multiple records at

one time

• If the search condition is omitted, all records in the table are deleted.

Deleting Records

• Example:

• deleting a single record from the CAR table

DELETE FROM car

WHERE carid= 5;

• deleting multiple records from the CAR table that contain MAKE

starting with ‘M’

DELETE FROM car

WHERE make LIKE ‘M%’;

• deleting all records from the CAR table

DELETE FROM car;

Deleting Records

• Notice:

• not allowed to delete a primary key record that has its corresponded foreign

key somewhere else in another table

• Example: delete a primary key CarId is 5 in the CAR table that has a foreign

key record that CarId is 5 in the CARSALES table

DELETE FROM car WHERE carid=5;

DELETE FROM car

ERROR at line 1:

ORA-02292: integrity constraint (SYSTEM.SYS_C005453) violated – child

record found

5050

F. Commit

• Commit

• When inserted data by issuing the INSERT command

• the changes are only saved in the local database buffer

• are not saved in the database

• until you COMMIT the transaction

• it is important to remember to COMMIT whenever you have finished inserting

values or make changes to the database values

5151

Commit

• General Syntax:

• Sample of inserting a new record with commit

Effect of Inserting New Record with Commit

Insert a new record

‘16’ with commit

All original

records are

currently in CAR

table

5252

G. Joining Multiple Tables

• Join

• database query to join multiple database tables together

• the data needed or the conditions specified come from more than one table.

• Syntax:

SELECT

FROM

WHERE =

AND

Joining Multiple Tables

• joining the CAR, CARSALES and CUSTOMER tables:

SELECT customer.cname, customer.cphone, carsales.salesdate, carsales.purchasedPrice,

carsales.stampDuty, car.make, car.model

FROM customer, carsales, car

WHERE customer.customerID = carsales.customerID

AND car.carID = carsales.carID;

• using prefix when joining tables:

• when more than one table is involved, a prefix for each attribute is

recommended to avoid ambiguity

SELECT ct.cname, ct.cphone, cs.salesdate,

cs.purchasedPrice,cs.stampDuty,

c.make, c.model

FROM customer ct, carsales cs, car c

WHERE ct.customerID = cs.customerID

AND c.carID = cs.carID;;

• The output of both example:

Joining Multiple Tables

• Subquery

• Example: retrieving the make, model and advertisedPrice of the cars that

are not sold

SELECT make, model, advertisedPrice FROM car

WHERE carid NOT IN

(SELECT carid FROM carsales);

MAKE MODEL ADVERTISEDPRICE

—————– —————– —————

Subaru Outback 33950

Lexus RX350 33500

Mini Cooper 19750

Nissan Dualis 18888

Mercedes Benz C200 16995

Jeep Grand Cherokee 21999

6 rows selected

5555

H. Aggregate Functions and Group By

• Aggregate Functions

• summarize the input table

• often used include:

• COUNT – count number of records in the input table

• SUM – calculate the sum of a numerical attribute

• MIN and MAX – find the smallest and the largest value of a certain

attribute

Aggregate Functions

• Count(*)

• Example: returning the number of records that is available from

the CAR table

SELECT COUNT(*)

FROM car;

COUNT(*)

——————

Aggregate Functions

• Example: returning the number of values of colour that is available from the CAR table

SELECT COUNT(colour) FROM car;

COUNT(COLOUR)

————-

• Example: returning the number of distinct values of colour that is available from the CAR table

SELECT COUNT(DISTINCT colour) FROM car;

COUNT(DISTINCTCOLOUR)

———————

COUNT(attribute) & COUNT(DISTINCT attribute)

Cause by a

record with

null value
15 records

include one

NULL value

Output:

Aggregate Functions

• MIN and MAX

• Example: calculate the minimum and maximum advertisedPrice

of the cars

SELECT MIN(advertisedPrice) AS “Lowest Price”, MAX(advertisedPrice) AS “Highest Price”

FROM car;

Lowest Price Highest Price

———— ————–

7490 98800

5959

Aggregate Functions and Group By

• Group By

• group an input table into a number of groups based on one or more

nominated attributes

• often used in conjunction with aggregate functions

• CAR table: group by Make

SELECT make, count(*)

FROM car

GROUP BY make;

MAKE COUNT(*)

——————– ———-

Holden 2

Lexus 1

Subaru 1

BMW 1

Mazda 1

Nissan 1

Audi 1

Volkwagen 1

Toyota 1

Honda 2

Mercedes Benz 1

Jeep 1

Mini 1

Aggregate Functions and Group By

• CAR table: grouping by Make with combination of getting only the record

groups that contain the count value greater than 1 is as follows

SELECT make, COUNT(DISTINCT model)

FROM car

GROUP BY make

HAVING COUNT(DISTINCT model)>1;

MAKE COUNT(DISTINCTMODEL)

——————– ——————–

Holden 2

Honda 2

Aggregate Functions and Group By

• multiple tables: selecting the number of sold cars for each colour

SELECT colour, COUNT(DINSTINCT customerId)

FROM car c, carsales cs

WHERE c.carid=cs.carid

GROUP BY colour;

COLOUR COUNT(DISTINCTCUSTOMERID)

———– ————————-

White 2

Grey 4

Blue 1

Black 2

Red 1

6262

Aggregate Functions

• Count vs. Sum

• COUNT – count number of records in the input table

• SUM – calculate the sum of a numerical attribute

SELECT , COUNT(*)

FROM

GROUP BY ;

SELECT , SUM (attribute3)

FROM

GROUP BY ;

Aggregate Functions

• Sum without Group By

• Example: calculate the sum of purchasedPrice

SELECT SUM(purchasedPrice) AS totalsales

FROM carsales;

TOTALSALES

———-

402983

Aggregate Functions

• Sum with Group By

• Example: calculate the sum of purchasedPrice for each make in

carsales table

SELECT c.make, SUM(cs.purchasedPrice) AS totalsales

FROM carsales cs, car c

WHERE c.CARID=cs.CARID

GROUP BY c.make;

MAKE TOTALSALES

——————– ———-

Holden 38430

BMW 197600

Mazda 13999

Audi 68200

Simple Query Retrieval

• SUM with Calculations

• Example: list all Cname, SalesDate and calculate the TotalPrice for

each customer (TotalPrice = PurchasedPrice+StampDuty )

SELECT ct.cname, cs.salesdate, SUM(purchasedPrice+StampDuty) as TotalPrice

FROM carSales cs, customer ct

WHERE cs.customerid=ct.customerid

GROUP BY ct.cname, cs.salesdate;

CNAME SALESDATE TOTALPRICE

——————– ———– ———-

Jone 14-JUN-2016 14027.52

Rex 12-DEC-2015 14558.96

Lily 13-JUL-2016 13156

6666

I. Alter Tables – Add New Fields

• Syntax:

ALTER TABLE

ADD (attribute_name data_type_declare constraints_declare);

• attribute name: referring to the new attribute that you want to add into the

existing table

• data type: defines the data type and the size of the new attribute

• constraint: defines the constraints that the new attribute might be enforced by

certain constraints

• Example: add a transmission attribute to the CAR table

ALTER TABLE car

ADD (transmission VARCHAR2(10));

Alter Tables – Modify Fields

• Syntax:

ALTER TABLE

MODIFY (attribute_name new_data_type);

• attribute name: refers to the attribute that you want to modify

• new data: defines the new data type that you want to use replacing the old one

• Example: change the data type of the transmission to CHAR with a size of

ALTER TABLE car

MODIFY (transmission CHAR(30));

Alter Tables – Drop Columns

• Syntax:

ALTER TABLE

DROP COLUMN attribute_name;

• Example: delete the attribute transmission from the CAR table

ALTER TABLE car

DROP COLUMN transmission ;

Oracle/Week-4a-StarSchema-StudentEnrollment

More on Student Enrollment Case Study (Tutorial 2)

1. Two dimensions vs. One dimension

The star schema of the case study used in Tutorial 2 (Student-Enrollment
system) is shown as follows.

Star Schema-1 (one semester-year dimension)

The SQL codes to create dimension and fact tables are as follows:

–Campus dimension

Create table campus_dim as

SELECT distinct Ocampus

FROM Offering;

–Semester_year dimension

Create table sem_year_dim as

SELECT distinct Oyear||Osem as sem_id, Oyear, Osem

FROM Offering;

–Subject Dimension

Create table subject_dim as

SELECT *

FROM subject;

–Grade Dimension

Create table grade_dim as

SELECT distinct Grade

FROM Enrollment;

–The Fact

CREATE TABLE student_enrollment_fact as

SELECT

o.Ocampus,

o.Oyear||o.Osem as sem_id,

s.Ucode,

e.Grade,

count(st.sid) as num_of_student,

sum(e.score) as Total_score

FROM subject s, enrollment e, offering o, student st

WHERE e.OID = o.OID

and s.Ucode = o.Ucode

and st.SID = e.SID

GROUP BY

o.Ocampus,

o.Oyear||o.Osem,

s.Ucode,

e.Grade;

The records in the data warehouse are as follows:

SQL> select * from campus_dim;

OCAMPUS

———

City

Main

SQL> select * from sem_year_dim;

SEM_ID OYEAR OSEM

———- ———- —-

20092 2009 2

20091 2009 1

SQL> select * from subject_dim;

UCODE UTITLE UCREDIT

———- ——————– ———-

IT001 Database 5

IT002 Java 5

IT003 SAP 10

IT004 Network 5

IT005 ASP.NET 5

SQL> select * from grade_dim;

GRADE

—–

SQL> select * from student_enrollment_fact;

OCAMPUS SEM_ID UCODE GRADE NUM_OF_STUDENT TOTAL_SCORE

——— ——— ———- —– ————– ———–

Main 20091 IT001 HD 3 253

Main 20091 IT001 D 2 149

Main 20091 IT002 C 1 65

Main 20091 IT002 N 1 47

Main 20092 IT002 D 2 151

City 20092 IT001 C 1 64

Main 20091 IT001 N 1 41

Main 20092 IT002 C 1 64

DE 20092 IT004 P 2 105

City 20091 IT003 C 1 63

City 20092 IT001 N 1 32

11 rows selected.

Look at particularly the sem_year_dim dimension, both in the SQL code and
the contents of this dimension. In the above version of the star schema, the
semester and the year information are actually combined into one dimension,
called sem_year_dim dimension.

Can we have two separate dimensions: one for semester, and the other for year?
The answer is “yes, you can”. The new star schema would look like as follows.
Because the original sem_year_dim is now split into two dimensions: sem_dim,
and year_dim, consequently, in the fact table, we will have two attributes:
semester and year, instead of just one attribute sem_id.

Star Schema-2 (two separate dimensions for semester and year)

The SQL codes to create these two new dimensions (sem_dim and year_dim) and
the fact tables are as follows:

–Semester dimension

Create table sem_dim as

SELECT distinct Osem as sem_id

FROM Offering;

–Year dimension

Create table year_dim as

SELECT distinct Oyear as year_id

FROM Offering;

–The Fact

CREATE TABLE student_enrollment_fact_2 as

SELECT

o.Ocampus,

o.Osem as sem_id,

o.Oyear as year_id,

s.Ucode,

e.Grade,

count(st.sid) as num_of_student,

sum(e.score) as Total_score

FROM subject s, enrollment e, offering o, student st

WHERE e.OID = o.OID

and s.Ucode = o.Ucode

and st.SID = e.SID

GROUP BY

o.Ocampus,

o.Osem,

o.Oyear,

s.Ucode,

e.Grade;

The contents of the two dimensions and the new fact are as follows:

SQL> select * from sem_dim;

SEM_ID

———-

SQL> select * from year_dim;

YEAR_ID

———-

2009

SQL> select * from student_enrollment_fact_2;

OCAMPUS SEM_ID YEAR_ID UCODE GR NUM_OF_STUDENT TOTAL_SCORE

——— —— ——– ———- — ————– ———–

DE 2 2009 IT004 P 2 105

Main 1 2009 IT001 N 1 41

Main 1 2009 IT001 D 2 149

Main 2 2009 IT002 C 1 64

Main 1 2009 IT002 N 1 47

Main 2 2009 IT002 D 2 151

City 2 2009 IT001 N 1 32

Main 1 2009 IT002 C 1 65

City 1 2009 IT003 C 1 63

Main 1 2009 IT001 HD 3 253

City 2 2009 IT001 C 1 64

11 rows selected.

Now compare the contents of Fact-1 and Fact-2. They are almost the same,
except that Fact-1 has one attribute from the sem_year_dim dimension, whereas

Fact-2 has two attributes, called sem_id (from semester_dim) and year_id (from
year_dim).
Conclusion: From the FACT point of view, it does not really matter whether we
have two separate dimensions or just one dimension. They are the same.
However, if two information, like semester and year, is often seen as one entity
or one piece of information, it would be easier if the two information is present
in one dimension.

2. Dimension Hierarchy (Hierarchy vs. No Hierarchy)

Suppose we have richer information about campus, as follows:

Campus Dimension
CampusID CampusName Address Suburb City State Postcode Country
CL Clayton

Campus
Wellington
Rd

Clayton Melbourne Victoria 3800 Australia

CA Caulfield
Campus

Dandenong
Rd

Caulfield
East

Melbourne Victoria 3145 Australia

PA Parkville
Campus

Royal
Parade

Parkville Melbourne Victoria 3052 Australia

SY Sydney
Campus

Opera
Boulevard

Sydney Sydney New
South
Wales

2001 Australia

MA Malaysia
Campus

Jalan
Lagoon

Bandar
Sunway

Sydney Selangor 47500 Malaysia

SA South Africa
Campus

Peter St Johanne
sburg

Johannesb
urg

Johanne
sburg

1725 South
Africa

In the star schema, we have two options:

Option 1 is to have one dimension called campusDIM containing all of these
attributes.

Option 1 (no hierarchy)

The SQL to create the Campus Dimension is shown as follows:
–Campus dimension

Create Table campus_dim As

SELECT Distinct Ocampus, <>

FROM Offering;

Option 2 is to have a hierarchy of three dimensions (e.g. campusDIM connected
to the Fact, campusDIM is also the parent of cityDIM – because campus-city
forms a hierarchy, and cityDIM becomes the parent of countryDIM).

Option 2 (with hierarchy)

The Campus dimension is still linked to the Fact, whereas the City dimension is
linked (through hierarchy) to the Campus, and the Country dimension is linked

to the City dimension. Therefore, there is a hierarchy from Campus to City, and to
Country.

Note that the information about city is pulled out from the campus, and then
create a hierarchy between campus and city. The information about country is
also pulled out from the campus and city.

The SQL to create the three dimensions in the hierarchy is as follows:

–Campus dimension

Create Table campus_dim As

SELECT Distinct Ocampus, <>

FROM <

–City dimension

Create Table city_dim As

SELECT Distinct Ocity, <>

FROM <

–Country dimension

Create Table country_dim As

SELECT Distinct Ocountry, <>

FROM <

The contents of the dimensions in the hierarchy (campus, city, and country
dimensions are as follows):

Campus Dimension
CampusID CampusName Address Suburb Postcode CityID
CL Clayton Campus Wellington Rd Clayton 3800 MEL
CA Caulfield Campus Dandenong Rd Caulfield East 3145 MEL
PA Parkville Campus Royal Parade Parkville 3052 MEL
SY Sydney Campus Opera Boulevard Sydney 2001 SYD
MA Malaysia Campus Jalan Lagoon Bandar Sunway 47500 KUL
SA South Africa Campus Peter St Johannesburg 1725 JNB

City Dimension
CityID CityName State CountryID
MEL Melbourne Victoria AU
SYD Sydney New South Wales AU
KUL Kuala Lumpur Selangor MA
JNB Johannesburg Johannesburg SA

Country Dimension
CountryID CountryName
AU Australia
MA Malaysia
SA South Africa

Note that the creation of the fact table is not affected at all. Either model (option
1 or option 2) will not affect the fact and other dimensions (e.g. grade, subject,
semester_year dimensions).

Hierarchy should use the higher detail dimension (e.g. campus) as the parent,
which is linked to the fact. The lower detail dimension (e.g. city) becomes the
child in the hierarchy. The lowest detail dimension (e.g. country) becomes the
lowest in the hierarchy.

One of the most common mistakes is the wrong ordering of the dimensions in
the hierarchy, such as in the following star schema:

An incorrect hierarchy (upside down)

Hierarchy vs No Hierarchy

Since there are two options in implementing a star schema: with and without
hierarchy, which option is more preferred?

1. One table vs. many tables
Without hierarchy, there is only one table (e.g. the campus dimension
table). With hierarchy, there are three tables (e.g. campus, city, and
country dimension tables). The consequence is in the join query
processing when producing reports.

When you want to produce a report involving the fact and campus, with
the no hierarchy option, we only need to use two tables: the campus
dimension table, and the fact table. Hence, the join query processing is
simple.

With the hierarchy option, we need to join the fact with three tables
(campus, city, and country). Consequently, the join query processing will
be more complex

2. Normalized vs. Unnormalized
With the hierarchy option, the tables are normalized, which follows the
relational model. In contrast, the non-hierarchy option, the table (e.g. the
campus dimension table) is unnormalized, which is basically 1NF with
visible replication on information which is prone to anomalies (e.g. insert,
update, and delete anomalies).

Since no update, insert, or delete operations are performed to the
dimension tables (once they have created), hence, anomalies become
irrelevant. Therefore, the unnormalized option may be preferred, also due
to a simpler join query processing.

3. Drilling down and rolling up
People often mistakenly thought that the hierarchy model is good for
drilling down. The hierarchy is not from country to city and to campus.
The correct hierarchy is from the detail to the more general, such as from
campus to city, and to country. Hence, we are not drilling down from
campus to country, because drilling down should from something that is
general (such as country) to something more detail (such as city and
campus). In other words, the hierarchy model is not about drilling down
information exploration.

How about rolling up? Rolling up is exploring information from the detail
to a more general. For example, retrieve number of students located at a
campus level, and then rolling up to the country level.

With the hierarchy option, we still need to do two queries: one for the
campus and the second for the country.

Select CampusName, Sum(Num_of_student)
From Fact F, CampusDim C
Where F.CampusID = C.CampusID
Group By CampusName;

Select CountryName, Sum(Num_of_student)
From Fact F, CampusDim C1, CityDim C2, CountryDim C3
Where F.CampusID = C1.CampusID
And C1.CityID = C2.CityID
And C2.CountryID = C3.CountryID
Group By CountryName;

Whereas with the non-hierarchy option, we still need to do two queries:

Select CampusName, Sum(Num_of_student)
From Fact F, CampusDim C
Where F.CampusID = C.CampusID
Group By CampusName;

Select Country, Sum(Num_of_student)
From Fact F, CampusDim C
Where F.CampusID = C.CampusID
Group By Country;

From the query point of view, both need two queries.
From the query processing point of view, the non-hierarchy option uses
one join operation only, because it only needs to join the fact and one
dimension.
From the conceptual point of view, the hierarchy model does not actually
offer a better roll up or drill down features.

If we insist to have a drill down, that is from country to campus, we still
need to do the two queries above, but we will do the country query first,
and then the campus query second.

Oracle/Week-4b-Determinant Dimensions

Determinant
Dimensions

The
Petrol
Price
Case
Study

1.
Snapshot
of
an
operational
database

The
following
is
a
sample
operational
database
that
keeps
the
petrol
prices
(of

different
kind
of
petrol)
of
petrol
stations
within
a
period
of
time

2.
Star
Schema

The
requirements
for
the
data
warehouse
are
to
answer
questions
related
to
(a)

average
petrol
price,
(b)
min
petrol
price,
and
(c)
max
petrol
prices.

From
the
sample
data
above,
we
choose
the
following
three
angles
to
view
the

fact
measures:
(i)
day
of
week,
(ii)
suburb,
and
(iii)
petrol
company.

From
the
Day
of
Week
point
of
view,
it
is
sensible
to
answer
the
question

something
like
which
day
is
has
the
lowest
petrol
price.
We
use
a
two-‐column

table
methodology
to
visualize
a
conceptual
model
of
the
star
schema.
Hence,
the

two-‐column
table
for
the
day
of
week
is
as
follows:

Day
of
week
Total
Petrol

Price

Num
of
Petrol

Station

Min
Petrol

Price

Max
Petrol

Price

Monday

Tuesday

…

Notes:

1. In
the
above
2-‐column
table,
the
first
column
(say
column
A,
which
is
Day

of
Week)
is
a
category.
The
second
columns
(say,
columns
B1,
B2,
B3,
and

B4)
are
the
fact
measures.

2. One
of
the
requirements
for
the
fact
measure
is
average
petrol
price.
Since,

the
fact
table
should
not
include
an
average
as
a
fact
measure,
we
include

two
attributes:
total
petrol
price,
and
number
of
petrol
stations,
instead.

The
visualization
of
the
above
two-‐column
table
for
Day
of
Week
is
rather

incomplete,
because
there
is
no
data
in
the
fact
measure
columns.
Take
a
look
at

the
Min
Petrol
Price
column
in
the
above
table.
What
kind
of
value
should
be
in

that
column?
Supposed
Monday
has
a
value
in
the
Min
Petrol
Price
column,
what

does
that
value
min?
It
does
not
specify
which
petrol
type,
whether
it
is
Unleaded

91,
or
Premium
95,
or
Premium
98,
for
example.
Therefore,
the
Min
Petrol
Price

column,
although
seems
right,
does
not
make
sense
at
all.
How
can
this
problem

be
solved?

One
solution
is
to
have
one
Min
Petrol
Price
for
each
petrol
type
(e.g.
Unleaded

91,
Premium
95,
Premium
98,
E10,
Diesel,
and
LPG).
Therefore,
we
have
6

attributes
to
represent
Min
Petrol
Prices,
namely:
Min
Unleaded
91
Price,
Min

Premium
95
Price,
Min
Premium
98
Price,
Min
E10
Price,
Min
Diesel
Price,
and

Min
LPG
Price.

Min
Petrol
Price
Day
of

week

Total

Petrol

Price

Num
of

Petrol

Station

Unleaded
P95
P98
E10
Diesel
LPG

Max

Petrol

Price

Monday

119.9c
132.9c
139.9c
114.9c
115.9c
55.7c

Tuesday

118.9c
131.9c
136.9c
114.9c
115.9c
56.9c

…

We
need
to
do
the
same
for
the
Total
Price,
and
Max
Petrol
Price;
each
of
which

should
be
divided
into
6
columns.
The
new
two-‐column
table
for
Day
of
Week
is

then
shown
as
follows:

Total
Petrol
Price
Min
Petrol
Price
Max
Petrol
Price
Day
of

Week
U
95
98
E
D
LPG

Num
of

Petrol

Station

U
95
98
E
D
LPG
U
95
98
E
D
LPG

Monday

Tuesday

The
matter
will
become
more
complicated
if
number
of
petrol
stations
that
has

Unleaded
91
is
different
from
that
of
Premium
98,
for
example.
Then
to
calculate

an
average
of
Unleaded
91
on
Monday,
and
an
average
of
Premium
98
on

Monday
cannot
be
simply
a
division
between
Tot91
and
Num
of
Petrol
Stations,

and
between
Tot98
and
Num
of
Petrol
Stations,
because
the
based
line,
which
is

Num
of
Petrol
Stations
for
different
fuel
type
is
different.
Therefore,
we
need
to

divide
the
Num
of
Petrol
Stations
column
into
6
columns
–
one
for
each
petrol

type.
As
a
result,
we
will
end
up
with
24
columns
to
store
the
fact
measures,

instead
of
just
4
columns,
as
initially
thought.

Using
the
same
principle,
we
can
have
a
two-‐column
table
for
Suburb,
and

another
one
for
Company.

Total
Petrol
Price
Num
of
Petrol

Station

Min
Petrol
Price
Max
Petrol
Price
Suburb

U
95
98
E10
D
L
U
95
98
E10
D
L
U
95
98
E10
D
L
U
95
98
E10
D
L

Box
Hill

Clayton

Total
Petrol
Price
Num
of
Petrol

Station

Min
Petrol
Price
Max
Petrol
Price
Company

U
95
98
E
D
L
U
95
98
E
D
L
U
95
98
E
D
L
U
95
98
E
D
L

Caltex

7-‐Eleven

Based
on
the
above
three
two-‐column
tables,
we
are
now
confident
to
have
the

following
star
schema:

3.
Star
Schema
version-2

Is
there
another
way
to
reduce
the
number
of
fact
measures?
The
above
star

schema
has
24
fact
measures,
because
there
are
6
different
kinds
of
petrol.
The

answer
is
yes!!!

We
can
have
a
new
dimension
called
Petrol
Type
DIM,
which
stores
the
different

kinds
of
petrol.
With
this
new
dimension,
the
fact
measure
is
reduced
to
4,
which

is
the
original
fact
measures
we
had
in
the
beginning.
Therefore,
the
new
star

schema
becomes
as
follows:

With
this
new
star
schema,
as
an
example,
we
can
retrieve
data
that
shows
the

Min
Petrol
Price
of
Premium
98
on
Monday.
Or
another
example
is
to
get
the

Average
Petrol
Price
(which
is
Total
Petrol
Price/Num
of
Petrol
Station)
of

Unleaded
Petrol
in
7-‐Eleven
petrol
station
located
at
Clayton.

Note
that
when
retrieving
data
from
a
data
warehouse,
there
is
no
obligation
that

we
must
use
all
dimensions.
In
the
Monday
Premium
98
example,
only
two

dimensions
are
used,
which
are
Day
of
Week
DIM,
and
Petrol
Type
DIM.
And
in

the
second
example,
only
three
dimensions
are
used:
Petrol
Type
DIM,
Company

DIM,
and
Suburb
DIM.
So,
to
retrieve
data
from
the
data
warehouse,
we
can
use

as
few
as
one
dimension
table;
but
we
can
also
use
all
dimension
tables.
We
are

not
restricted
to
which
dimension
we
have
to
use.

However,
this
imposes
a
new
problem.
For
example,
the
above
star
schema

allows
us
to
retrieve
the
Min
Petrol
Price
of
Monday,
which
uses
only
Day
of

Week
DIM.
This
data
retrieval
does
not
make
any
sense;
which
petrol
type
are

we
referring
to?

Based
on
this
case
study,
it
is
clear
that
the
Petrol
Type
DIM
holds
the
key
to
any

data
retrieval.
We
must
use
the
Petrol
Type
DIM
in
any
retrieval,
such
as
the
Min

Petrol
Price
of
Premium
98
on
Monday,
or
the
Average
Petrol
Price
of
Unleaded

in
7-‐Eleven
petrol
station
located
in
Clayton
suburb.

The
Petrol
Type
DIM,
in
this
example,
is
called
a
“Determinant
Dimension”
–
a

dimension
which
needs
to
be
used
in
the
data
retrieval,
because
the
fact

measures
are
determined
by
this
dimension.
In
order
to
differentiate
between

Determinant
Dimension
and
all
other
dimensions,
Determinant
Dimension
is

denoted
by
a
star.
The
fact
measures
are
also
denoted
by
a
star.
The
new
star

schema
with
a
Determinant
Dimension
is
as
follows:

Oracle/Week-4c-Determinant -vs-non-determinant

Determinant
vs.
Non-Determinant

Dimensions

1.
The
Olympic
Games
Case
Study

The
Olympic
Games
committee
maintains
an
operational
database
that
stores
all

matches,
games,
as
well
as
the
medal
winners
of
the
Olympic
Games
over
the

years
(https://www.olympic.org/).

We
would
like
to
build
a
data
warehouse
to
analyze
the
medal
counts,
by
each

country,
sport,
and
at
which
Olympic
Games.

There
are
two
possible
star
schemas.
Version-‐1
star
schema
contains
3

dimensions,
whereas
Version-‐2
star
schema
contains
4
dimensions.
For

simplicity,
a
very
minimal
number
of
attributes
are
included
in
each
dimension.

Version-‐1
star
schema
has
CountryDIM,
SportDIM,
and
OlympicGamesDIM
as
the

dimensions.
Three
fact
measures
are
included
in
the
fact
table,
namely
number

of
gold
medals,
silver
medals,
and
bronze
medals.

Version-1
Star
Schema

Version-‐2
star
schema
has
four
dimensions
–
with
an
additional
of
Medal
Type

DIM
(which
is
either
Gold,
Silver,
or
Bronze),
but
there
is
only
one
fact
measure

in
the
Fact
table,
which
is
Num
of
Medals.

Version-2
Star
Schema
with
the
Medal
Type
DIM

What
is
the
difference
between
these
two
versions
of
star
schema?

Is
Medal
Type
DIM
a
Determinant
Dimension?

In
order
to
answer
these
questions,
we
need
to
visualize
two-‐column
tables
for

each
dimension
category.

2.
Two-Column
Tables
for
Version-1
Star
Schema
(without
the

Medal
Type
DIM)

The
first
two-‐column
table
is
from
the
Country
point
of
view,
which
is
as
follows:

Country
Num
of
Gold
Num
of
Silver
Num
of
Bronze

USA
733
602
488

China
199
143
133

Australia
167
170
189

These
are
the
number
of
Gold,
Silver
and
Bronze
medals
that
these
country
got

on
all
Olympic
Games
(several
Olympic
Games)
recorded
in
the
operational

database.
Note
that
this
two-‐column
table
methodology
is
to
help
the
data

warehouse
designer
to
visualize
the
view
of
the
fact
measures
from
each

dimension.

The
second
two-‐column
table
is
from
the
Sport
point
of
view.
Assuming
that
in

the
operational
database,
it
records
20
past
Olympic
Games,
and
at
each
Olympic,

there
is
only
one
gold
for
100m
Butterfly
Men,
for
instance.

Sport
Num
of

Gold

Num
of

Silver

Num
of

Bronze

Swimming
100m
Butterfly
Men
20
20
20

Swimming
400m
Freestyle
Women
20
20
20

Swimming
4x100m
Medley
Relay
Men
20
20
20

The
third
two-‐column
table
is
from
the
Olympic
Name
point
of
view.

Olympic
Name
Num
of

Gold

Num
of

Silver

Num
of

Bronze

London
2012
302
304
356

Beijing
2008
302
303
353

Athens
2004
301
301
327

All
the
three
two-‐column
tables
above
seem
to
be
reasonably
correct.
The
first

columns
are
the
categories,
while
the
other
columns
are
the
fact
measures
which

are
numeric
and
aggregate
values.
Because
these
three
two-‐column
tables
make

sense,
we
are
confident
that
version-‐1
star
schema
is
correct.

3.
Two-Column
Tables
for
Version-2
Star
Schema
(with
the

Medal
Type
DIM)

The
two-‐column
tables
for
the
first
three
dimensions,
namely
Country,
Sport,
and

Olympic
Names
are
as
follows:

Country
Num
of
Medals

USA
1823

China
475

Australia
526

Sport
Num
of
Medals

Swimming
100m
Butterfly
Men
60

Swimming
400m
Freestyle
Women
60

Swimming
4x100m
Medley
Relay
Men
60

Olympic
Name
Num
of
Medals

London
2012
962

Beijing
2008
958

Athens
2004
929

The
question
is
whether
these
two-‐column
tables
make
sense.
If
we
look
at
the

country,
it
makes
sense
to
see
how
many
medals
Australia
has
received
in
all

Olympic
Games;
the
same
with
Sport,
and
Olympic
Names.
Finding
how
many

medals
(regardless
the
medal
types)
for
each
country,
for
each
sport,
and
for

each
Olympic
seems
to
be
reasonable.

The
fourth
two-‐column
table
for
version-‐2
star
schema
is
the
Medal
Type,
which

is
as
follows:

Medal
Type
Num
of
Medals

Gold
4115

Silver
4095

Bronze
4474

This
two-‐column
table
on
Medal
Type
seems
to
be
reasonable
too.
Hence,

version-‐2
star
schema
(with
Medal
Type
DIM)
is
correct.

So
in
conclusion,
both
star
schemas
(with
or
without
Medal
Type
DIM)
are

correct.
Now
going
back
to
the
original
question:
Is
Medal
Type
DIM
a

Determinant
Dimension?

4.
Determinant
or
Non-Determinant
Dimensions

A
“Determinant
Dimension”
is
a
dimension
that
the
fact
measure
relies
on,
and

consequently,
all
data
retrieval
from
the
data
warehouse
must
include
this

dimension.
If
the
data
retrieval
from
the
data
warehouse
does
not
include
this

determinant
dimension,
the
retrieval
result
will
not
make
sense
at
all.

In
the
previous
case
study
on
Petrol
Price,
Petrol
Type
DIM
is
a
Determinant

Dimension
(refer
to
the
star
schema
below).
Note
that
a
Determinant
Dimension

is
denoted
by
a
star.
The
fact
measures
affected
by
the
determinant
dimension

are
also
starred.

The
fact
measures:
Total
Petrol
Price,
Num
of
Petrol
Station,
Min
Petrol
Price,

and
Max
Petrol
Price,
depend
on
the
Petrol
Type,
which
is
indicated
by
the
Petrol

Type
DIM.
That
means
analyzing
the
min
petrol
price
from
the
day
of
week
point

of
view,
must
include
the
petrol
type.
Otherwise,
it
doesn’t
make
sense
to

retrieve
data
to
show
that
on
Monday
the
lowest
(min)
petrol
price
is,
for

example,
109.90cents.
As
it
does
not
indicate
which
petrol
type
it
is,
this
lowest

petrol
price
is
meaningless.
Therefore,
a
better
data
retrieval
is
to
retrieve
the

record
to
show
that
for
example,
on
Monday,
the
min
“Unleaded”
petrol
price
is

109.90cents
(e.g.
Unleaded
is
a
petrol
type
obtained
from
the
Petrol
Type
DIM).

Now
going
back
to
the
Olympic
Games
case
study
(refer
to
Version-‐2
star
schema

with
Medal
Type
DIM).
Is
Medal
Type
DIM
a
Determinant
Dimension?

The
answer
to
this
question
can
be
answered
by
another
question.
To
retrieve

the
data
from
version-‐2
star
schema,
must
we
have
the
information
from
Medal

Type
DIM?
The
answer
is
clearly
no,
because
we
can
simply
retrieve
a
record

from
the
fact
to
show
that
Australia
in
London
2012
Olympic
Games
received
10

medals
in
Swimming.
This
covers
three
dimensions,
namely
Country
(Australia),

Olympic
Name
(London
2012),
and
Sport
(Swimming).
In
this
example,
Medal

Type
DIM
is
not
involved,
and
the
information
retrieved
still
makes
sense.

So
the
answer
to
the
question
whether
Medal
Type
DIM
is
a
Determinant

Dimension
or
not,
the
answer
is
clearly
No!!

The
next
question
is:
what
is
the
difference
between
the
Olympic
Games
case

study
and
the
Petrol
Price
case
study.
Both
are
very
similar,
but
the
Olympic

Games
Medal
Type
DIM
is
not
a
determinant
dimension,
whereas
the
Petrol
Type

DIM
is
a
determinant
dimension.

The
answer
is
the
aggregate
function
used
in
the
fact
measure.
In
the
Olympic

Games
case
study,
the
fact
measure
function
is
COUNT,
which
is
count
of
medals.

The
breakdown
of
the
medals
is
gold,
silver,
and
bronze;
but
the
main
aggregate

function
of
the
fact
measure
is
number
of
medals,
which
is
a
count.
If
the
fact

measure
is
a
count,
then
the
dimension
(e.g.
Medal
Type
DIM)
is
not
a

determinant
dimension,
because
we
can
still
analyze
the
fact
measure
which
is

the
total
medals
from
other
dimensions,
without
the
medal
type
dimension.

On
the
other
hand,
the
Petrol
Price
case
study
uses
AVG,
MIN,
and
MAX
as
the

aggregation
functions.
Note
that
we
do
not
store
average
as
a
fact
measure,
but

total
price
and
number
of
stations.
These
two
fact
measures
will
be
used
to

calculate
the
average.
Although
the
average
is
not
explicitly
stored
in
the
fact,

implicitly,
the
total
price
and
number
of
stations
represent
the
average.
If
the

aggregate
function
to
calculate
the
fact
measures
is
not
COUNT,
then
a

determinant
dimension
is
needed.
In
this
case,
Petrol
Type
DIM
is
hence
a

determinant
dimension,
because
Min
Petrol
Price,
for
example,
does
not
have

any
meaning
without
petrol
type.

5.
Version-1
(without
Medal
Type
DIM)
vs.
version-2
(with

Medal
Type
DIM)

As
both
versions
in
the
Olympic
Games
case
study
are
correct,
let’s
compare
and

contract
these
two
versions.
In
order
to
do
this,
let’s
have
a
look
at
the
records
in

the
respective
fact
tables.

The
fact
table
for
version-‐1
star
schema
(without
Medal
Type
DIM)
has
6

attributes:
three
from
the
dimensions,
and
the
other
three
for
the
fact
measures.

The
contents
of
the
fact
table
are
as
follows:

Fact
(version-1
star
schema)

Country
Sport
Olympic
Name
Num
of

Gold

Num
of

Silver

Num
of

Bronze

USA
Swimming
London
2012
16
9
6

China
Swimming
London
2012
5
1
4

Australia
Swimming
London
2012
1
6
3

The
fact
table
for
version-‐2
star
schema
(with
Medal
Type
DIM)
consists
of
5

columns:
4
from
the
dimension,
but
only
one
fact
measure.

Fact
(version-2
star
schema)

Country
Sport
Olympic
Name
Medal
Type
Num
of
Medals

USA
Swimming
London
2012
Gold
16

USA
Swimming
London
2012
Silver
9

USA
Swimming
London
2012
Bronze
6

China
Swimming
London
2012
Gold
5

China
Swimming
London
2012
Silver
1

China
Swimming
London
2012
Bronze
4

Australia
Swimming
London
2012
Gold
1

Australia
Swimming
London
2012
Silver
6

Australia
Swimming
London
2012
Bronze
3

From
the
storage
point
of
view,
it
is
clear
that
version-‐1
is
the
winner.
It
has
only

3
records,
whereas
in
version-‐2,
the
same
information
is
represented
in
9

records.

From
the
modeling
point
of
view,
some
may
prefer
version-‐2,
because
the
model

is
concise
and
more
compact.

When
the
number
of
fact
measure
is
reasonably
large
(like
in
the
Petrol
Price

case
study),
the
star
schema
with
a
determinant
dimension
looks
very
slim
and

compact
–
hence,
it
is
easy
to
understand.
But
consequently,
the
storage

requirement
increases
as
well.

In
contrast,
with
many
different
petrol
types,
if
the
star
schema
does
not
use
a

determinant
dimension,
the
number
of
attributes
in
the
fact
will
dramatically

increase,
and
the
schema
looks
more
complex
and
crowded;
but
the
storage
cost

is
lower.

Oracle/Week-5-AVG-in-Fact

Page 1/5

Average in the Fact?

Should we store an “average” fact measure in the fact table?
No!!!!!!!

Consider the following example. Suppose we have the following 16 records as
our data source in the operational database. Note that there are 9 Semester One
and 7 Semester Two records, respectively. Out of the 8 Database unit recodes, 6
of them are Semester One, and 2 of them are Semester Two.

An Operational Database
Ucode Utitle Semester Sfname Score

IT001 Database 1 Mirriam 81
IT001 Database 1 Allan 41
IT001 Database 1 Ben 74
IT001 Database 1 Kate 85
IT001 Database 1 Larry 87
IT001 Database 1 Leonard 75
IT001 Database 2 Juan 64
IT001 Database 2 Andy 32
IT002 Java 1 Ally 65
IT002 Java 1 Menson 47
IT002 Java 2 Mirriam 78
IT002 Java 2 Ben 73
IT002 Java 2 Larry 64
IT003 SAP 1 Ally 63
IT004 Network 2 Juan 53
IT004 Network 2 Menson 52

The star schema of the above operational database contains one fact and two
dimensions. The dimensions are: Subject and Semester (One or Two); and the
fact measure is Average Score.

The fact table aggregates these score records based on their dimensions, which
are subject and semester. If we store Average Score in the fact table, this is how
the fact table will look like:

Page 2/5

Fact Table
UCode Semester Average_Score

IT001 1 73.833

IT001 2 48

IT002 1 56

IT002 2 71.667

IT003 1 63

IT004 2 52.5

The dimension tables look like as follows:

SubjectDIM Table
UCode UTitle

IT001 Database
IT002 Java
IT003 SAP

IT004 Network

SemesterDIM Table
Semester

1
2

Looking at the Fact Table, the average score for the unit Database in Semester
One is 73.833 (average of the first 6 score records); the average score for the unit
Database in Semester Two is 48 ((64 + 32)/2).

Is this fact table correct?
It looks correct. But actually it is incorrect.

For example, if we want to query the fact table to find out what is the average
score of the Database unit, by looking at the above fact table, the answer would
be (73.833+48)/2= 60.9165.

The SQL to query the Fact Table is as follows:

Select Avg(Average_Score)
From FactTable
Where UCode = ‘IT001’;

Is this correct? No.

In the Operational Database, there are 8 records for Database unit in Semester
One and Two (see the first eight records in the operational database). If we sum
all the score of these eight records and divided by eight records, the result will be
539/8=67.375; not 60.9165.

Let’s do further comparisons:

Page 3/5

The average score for Java unit in Semester One and Two using the above fact is
(56+71.667)/2=63.833. The actual average score for Java unit in Semester One
and Two is not 63.833, but 65.4 (see the next 5 records in the above score list,
and sum these scores and then divide by 5, 327/5). So again, the above fact table,
which stores the average score, will not produce correct results.

Ok, now let’s calculate further. The average score for Semester One using the
above fact is (73.833+56+63)/3=64.278. In the above score list records, there are
nine Semester One records, and the average is in fact 68.667.

For Semester Two, using the above fact the average scare for Semester Two is
(48+71.667+52.5)/3=57.389; whereas the actual average score for the seven
Semester Two records is 59.4286.

So, storing average as a fact measurement is not a good idea.
How do we solve the above problems?

In the fact table, we should store the “total score” and “number of students” in
each aggregate group. Hence, the fact table should look like this.

Fact Table 2
UCode Semester Total_Score NumberofStudents

IT001 1 443 6

IT001 2 96 2

IT002 1 112 2

IT002 2 215 3

IT003 1 63 1

IT004 2 105 2

Note that the dimension tables remain unchanged:

SubjectDIM Table
UCode UTitle

IT001 Database
IT002 Java
IT003 SAP

IT004 Network

SemesterDIM Table
Semester

1
2

Using the correct fact table above, it is easy to calculate the average score of
Database unit, which is (443+96)/(6+2)=67.375.

The SQL to query the Correct Fact Table is as follows:

Page 4/5

Select Sum(Total_Score)/Sum(NumberofStudents)
From FactTable2
Where UCode = ‘IT001’;

The correct star schema is then as follows:

Conclusion

The problem of AVG in the Fact is known as the “Average of an Average”
problem. This problem is well known in Mathematics. Average of an average will
simply produce an incorrect average result (almost all the time). Hence, it is not
desirable to have an average measure in the fact – unless the analysis ALWAYS
uses all the dimensions.

How about Min or Max in the Fact? Can we do it? – Yes we can.
Because Max of Max is always a global max, and Min of Min is always a global
min. For example, using the above sample data, assume we have Max_Score and
Min_Score in the Fact, as follows:

Fact Table 3
Ucode Semester Min_Score Max_Score

IT001 1 41 87

IT001 2 32 64

IT002 1 47 65

IT002 2 64 78

IT003 1 63 63

IT004 2 52 53

SubjectDIM Table
UCode UTitle

IT001 Database
IT002 Java
IT003 SAP

IT004 Network

Page 5/5

SemesterDIM Table
Semester

1
2

(*Note: the dimension tables are unchanged)

Assuming we want to get the Max_Score of IT001, then the max of {87, 64} will
produce 87, and 87 is the maximum score of IT001, because 87 is the max in
semester 1, which is greater than any max of IT001 (e.g. in semester 2). In other
words, “Max of Max” is correct.

The SQL to retrieve the maximum score of IT001 is as follows:

Select Max(Max_Score)
From FactTable3
Where UCode = ‘IT001’;

The same applies to “Min of Min”. If we want to get the minimum score of IT001,
the result will be 32, which is the minimum between 41 and 32.

Select Min(Min_Score)
From FactTable3
Where UCode = ‘IT001’;

We certainly don’t want to mix between min and max. For example, retrieving
the minimum of Max_Score would be meaningless; the same as retrieving the
maximum of Min_Score.

As a final conclusion:

• Average in the fact is not desirable, although technically it satisfies the
two criteria of the fact (e.g. must be a numerical and aggregate value)

• Min and Max in the fact can still be used, since min_score and max_score
are valid fact measures (e.g. they are numerical and aggregated values)

• In general, count and sum are more common. Count is “number of”, and
Sum is “total of”.

Turn in your highest-quality paper
Get a qualified writer to help you with

“ Oracle Advanced person Needed ”

Get high-quality paper

NEW! AI matching with writer

Order an Essay Now & Get These Features For Free:

Turnitin Report

Formatting

Title Page

Citation

Outline

Place an Order

Type of paper

Academic level

Deadline

Pages (275 words)

$0.00