Provide a reflection of at least 500 words (or 2 pages double spaced) on how the knowledge, skills, or theories of this course have been applied, or could be applied, in a practical manner to your current work environment. If you are not currently working, share times when you have observed, or could observe, how these theories and knowledge could be applied to an employment opportunity in your field of study.
Requirements:
Provide a 500-word (or 2 pages double spaced) minimum reflection.
Use proper APA formatting and citations. If supporting evidence from outside resources is used, it must be properly cited.
Share a personal connection that identifies specific knowledge and theories from this course.
Demonstrate a connection to your current work environment. If you are not employed, demonstrate a connection to your desired work environment.
You should NOT provide an overview of the assignments in the course. The assignment asks that you reflect on how the knowledge and skills obtained through meeting course objectives were applied or could be applied in the workplace.
www.circuitmix.com
FUNDAMENTALS
OF DATABASE
MANAGEMENT
SYSTEMS
Second Edition
MARK L. GILLENSON
Fogelman College of Business and Economics
University of Memphis
John Wiley & Sons, Inc.
CREDITS
VP & PUBLISHER Don Fowley
EDITOR Beth Lang Golub
EDITORIAL ASSISTANT Elizabeth Mills
MARKETING MANAGER Christopher Ruel
DESIGNER James O’Shea
SENIOR PRODUCTION MANAGER Janis Soo
SENIOR PRODUCTION EDITOR Joyce Poh
This book was set in 10/12 Times New Roman by LaserWords and printed and bound by RR Donnelley. The
cover was printed by RR Donnelley.
This book is printed on acid-free paper.
Founded in 1807, John Wiley & Sons, Inc. has been a valued source of knowledge and understanding for
more than 200 years, helping people around the world meet their needs and fulfill their aspirations. Our
company is built on a foundation of principles that include responsibility to the communities we serve and
where we live and work. In 2008, we launched a Corporate Citizenship Initiative, a global effort to address
the environmental, social, economic, and ethical challenges we face in our business. Among the issues we are
addressing are carbon impact, paper specifications and procurement, ethical conduct within our business and
among our vendors, and community and charitable support. For more information, please visit our website:
www.wiley.com/go/citizenship.
Copyright © 2012, 2005 John Wiley & Sons, Inc. All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical,
photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976
United States Copyright Act, without either the prior written permission of the Publisher, or authorization
through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc. 222 Rosewood
Drive, Danvers, MA 01923, website www.copyright.com. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030-5774, (201)748-6011, fax (201)748-6008, website http://www.wiley.com/go/permissions.
Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in
their courses during the next academic year. These copies are licensed and may not be sold or transferred to a
third party. Upon completion of the review period, please return the evaluation copy to Wiley. Return
instructions and a free of charge return mailing label are available at www.wiley.com/go/returnlabel. If you
have chosen to adopt this textbook for use in your course, please accept this book as your complimentary
desk copy. Outside of the United States, please contact your local sales representative.
Library of Congress Cataloging-in-Publication Data
Gillenson, Mark L.
Fundamentals of database management systems / Mark L. Gillenson.—2nd ed.
p. cm.
Includes index.
ISBN 978-0-470-62470-8 (pbk.)
1. Database management. I. Title.
QA76.9.D3G5225 2011
005.74—dc23
2011039274
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
OTHER JOHN WILEY & SONS, INC. DATABASE BOOKS
BY MARK L. GILLENSON
Strategic Planning, Systems Analysis, and Database Design
(with Robert Goldberg), 1984
DATABASE Step-by-Step
1st edition, 1985
2nd edition, 1990
To my mother Sunny’s memory
and to my favorite mother-in-law, Moo
BRIEF CONTENTS
Preface
About The Author
CHAPTER 1 DATA: THE NEW CORPORATE RESOURCE
CHAPTER 2 DATA MODELING
CHAPTER 3 THE DATABASE MANAGEMENT SYSTEM CONCEPT
CHAPTER 4 RELATIONAL DATA RETRIEVAL: SQL
CHAPTER 5 THE RELATIONAL DATABASE MODEL: INTRODUCTION
CHAPTER 6 THE RELATIONAL DATABASE MODEL: ADDITIONAL CONCEPTS
CHAPTER 7 LOGICAL DATABASE DESIGN
CHAPTER 8 PHYSICAL DATABASE DESIGN
CHAPTER 9 OBJECT-ORIENTED DATABASE MANAGEMENT
CHAPTER 10 DATA ADMINISTRATION, DATABASE ADMINISTRATION, AND DATA DICTIONARIES
CHAPTER 11 DATABASE CONTROL ISSUES: SECURITY, BACKUP AND RECOVERY, CONCURRENCY
CHAPTER 12 CLIENT/SERVER DATABASE AND DISTRIBUTED DATABASE
CHAPTER 13 THE DATA WAREHOUSE
CHAPTER 14 DATABASES AND THE INTERNET
Index
CONTENTS
Preface
About The Author
CHAPTER 1 DATA: THE NEW CORPORATE RESOURCE
Introduction
The History of Data
The Origins of Data
Data Through the Ages
Early Data Problems Spawn Calculating Devices
Swamped with Data
Modern Data Storage Media
Data in Today’s Information Systems Environment
Using Data for Competitive Advantage
Problems in Storing and Accessing Data
Data as a Corporate Resource
The Database Environment
Summary
CHAPTER 2 DATA MODELING
Introduction
Binary Relationships
What is a Binary Relationship?
Cardinality
Modality
More About Many-to-Many Relationships
Unary Relationships
One-to-One Unary Relationship
One-to-Many Unary Relationship
Many-to-Many Unary Relationship
Ternary Relationships
Example: The General Hardware Company
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 3 THE DATABASE MANAGEMENT SYSTEM CONCEPT
Introduction
Data Before Database Management
Records and Files
Basic Concepts in Storing and Retrieving Data
The Database Concept
Data as a Manageable Resource
Data Integration and Data Redundancy
Multiple Relationships
Data Control Issues
Data Independence
DBMS Approaches
Summary
CHAPTER 4 RELATIONAL DATA RETRIEVAL: SQL
Introduction
Data Retrieval with the SQL SELECT Command
Introduction to the SQL SELECT Command
Basic Functions
Built-In Functions
Grouping Rows
The Join
Subqueries
A Strategy for Writing SQL SELECT Commands
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Relational Query Optimizer
Relational DBMS Performance
Relational Query Optimizer Concepts
Summary
CHAPTER 5 THE RELATIONAL DATABASE MODEL: INTRODUCTION
Introduction
The Relational Database Concept
Relational Terminology
Primary and Candidate Keys
Foreign Keys and Binary Relationships
Data Retrieval from a Relational Database
Extracting Data from a Relation
The Relational Select Operator
The Relational Project Operator
Combination of the Relational Select and Project Operators
Extracting Data Across Multiple Relations: Data Integration
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 6 THE RELATIONAL DATABASE MODEL: ADDITIONAL CONCEPTS
Introduction
Relational Structures for Unary and Ternary Relationships
Unary One-to-Many Relationships
Unary Many-to-Many Relationships
Ternary Relationships
Referential Integrity
The Referential Integrity Concept
Three Delete Rules
Summary
CHAPTER 7 LOGICAL DATABASE DESIGN
Introduction
Converting E-R Diagrams into Relational Tables
Introduction
Converting a Simple Entity
Converting Entities in Binary Relationships
Converting Entities in Unary Relationships
Converting Entities in Ternary Relationships
Designing the General Hardware Co. Database
Designing the Good Reading Bookstores Database
Designing the World Music Association Database
Designing the Lucky Rent-A-Car Database
The Data Normalization Process
Introduction to the Data Normalization Technique
Steps in the Data Normalization Process
Example: General Hardware Co.
Example: Good Reading Bookstores
Example: World Music Association
Example: Lucky Rent-A-Car
Testing Tables Converted from E-R Diagrams with Data Normalization
Building the Data Structure with SQL
Manipulating the Data with SQL
Summary
CHAPTER 8 PHYSICAL DATABASE DESIGN
Introduction
Disk Storage
The Need for Disk Storage
How Disk Storage Works
File Organizations and Access Methods
The Goal: Locating a Record
The Index
Hashed Files
Inputs to Physical Database Design
The Tables Produced by the Logical Database Design Process
Business Environment Requirements
Data Characteristics
Application Characteristics
Operational Requirements: Data Security, Backup, and Recovery
Physical Database Design Techniques
Adding External Features
Reorganizing Stored Data
Splitting a Table into Multiple Tables
Changing Attributes in a Table
Adding Attributes to a Table
Combining Tables
Adding New Tables
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 9 OBJECT-ORIENTED DATABASE MANAGEMENT
Introduction
Terminology
Complex Relationships
Generalization
Inheritance of Attributes
Operations, Inheritance of Operations, and Polymorphism
Aggregation
The General Hardware Co. Class Diagram
The Good Reading Bookstores Class Diagram
The World Music Association Class Diagram
The Lucky Rent-A-Vehicle Class Diagram
Encapsulation
Abstract Data Types
Object/Relational Database
Summary
CHAPTER 10 DATA ADMINISTRATION, DATABASE ADMINISTRATION, AND DATA DICTIONARIES
Introduction
The Advantages of Data and Database Administration
Data as a Shared Corporate Resource
Efficiency in Job Specialization
Operational Management of Data
Managing Externally Acquired Databases
Managing Data in the Decentralized Environment
The Responsibilities of Data Administration
Data Coordination
Data Planning
Data Standards
Liaison to Systems Analysts and Programmers
Training
Arbitration of Disputes and Usage Authorization
Documentation and Publicity
Data’s Competitive Advantage
The Responsibilities of Database Administration
DBMS Performance Monitoring
DBMS Troubleshooting
DBMS Usage and Security Monitoring
Data Dictionary Operations
DBMS Data and Software Maintenance
Database Design
Data Dictionaries
Introduction
A Simple Example of Metadata
Passive and Active Data Dictionaries
Relational DBMS Catalogs
Data Repositories
Summary
CHAPTER 11 DATABASE CONTROL ISSUES: SECURITY, BACKUP AND RECOVERY, CONCURRENCY
Introduction
Data Security
The Importance of Data Security
Types of Data Security Breaches
Methods of Breaching Data Security
Types of Data Security Measures
Backup and Recovery
The Importance of Backup and Recovery
Backup Copies and Journals
Forward Recovery
Backward Recovery
Duplicate or “Mirrored” Databases
Disaster Recovery
Concurrency Control
The Importance of Concurrency Control
The Lost Update Problem
Locks and Deadlock
Versioning
Summary
CHAPTER 12 CLIENT/SERVER DATABASE AND DISTRIBUTED DATABASE
Introduction
Client/Server Databases
Distributed Database
The Distributed Database Concept
Concurrency Control in Distributed Databases
Distributed Joins
Partitioning or Fragmentation
Distributed Directory Management
Distributed DBMSs: Advantages and Disadvantages
Summary
CHAPTER 13 THE DATA WAREHOUSE
Introduction
The Data Warehouse Concept
The Data is Subject Oriented
The Data is Integrated
The Data is Non-Volatile
The Data is Time Variant
The Data Must Be High Quality
The Data May Be Aggregated
The Data is Often Denormalized
The Data is Not Necessarily Absolutely Current
Types of Data Warehouses
The Enterprise Data Warehouse (EDW)
The Data Mart (DM)
Which to Choose: The EDW, the DM, or Both?
Designing a Data Warehouse
Introduction
General Hardware Co. Data Warehouse
Good Reading Bookstores Data Warehouse
Lucky Rent-A-Car Data Warehouse
What About a World Music Association Data Warehouse?
Building a Data Warehouse
Introduction
Data Extraction
Data Cleaning
Data Transformation
Data Loading
Using a Data Warehouse
On-Line Analytic Processing
Data Mining
Administering a Data Warehouse
Challenges in Data Warehousing
Summary
CHAPTER 14 DATABASES AND THE INTERNET
Introduction
Database Connectivity Issues
Expanded Set of Data Types
Database Control Issues
Performance
Availability
Scalability
Security and Privacy
Data Extraction into XML
Summary
INDEX
PREFACE
PURPOSE OF THIS BOOK
A course in database management has become well established as a required
course in both undergraduate and graduate management information systems degree
programs. This is as it should be, considering the central position of the database
field in the information systems environment. Indeed, a solid understanding of the
fundamentals of database management is crucial for success in the information
systems field. An IS professional should be able to talk to the users in a business
setting, ask the right questions about the nature of their entities, their attributes, and
the relationships among them, and quickly decide whether their existing data and
database designs are properly structured or not. An IS professional should be able
to design new databases with confidence that they will serve their owners and users
well. An IS professional should be able to guide a company in the best use of the
various database-related technologies.
Over the years, at the same time that database management has increased
in importance, it has also increased tremendously in breadth. In addition to such
fundamental topics as data modeling, relational database concepts, logical and
physical database design, and SQL, a basic set of database topics today includes
object-oriented databases, data administration, data security, distributed databases,
data warehousing, and Web databases, among others. The dilemma faced by
database instructors and by database books is to cover as much of this material as
is reasonably possible so that students will come away with a solid background
in the fundamentals without being overwhelmed by the tremendous breadth and
depth of the field. Exposure to too much material in too short a time at the expense
of developing a sound foundation is of no value to anyone. We believe that a
one-semester course in database management should provide a firm grounding in
the fundamentals of databases and provide a solid survey of the major database
subfields, while deliberately not being encyclopedic in its coverage. With these
goals in mind, this book:
■ Is designed to be a carefully and clearly written, friendly, narrative introduction
to the subject of database management that can reasonably be completed in a
one-semester course.
■ Provides a clear exposition of the fundamentals of database management while
at the same time presenting a broad survey of all of the major topics of the field.
It is an applied book of important basic concepts and practical material that can
be used immediately in business.
■ Makes extensive use of examples. Four major examples are used throughout the
text where appropriate, plus two minicases that are included among the chapter
exercises at the end of every chapter. Having multiple examples solidifies the
material and helps the student not miss the point because of the peculiarities of a
particular example.
■ Starts with the basics of data and file structures and then builds up in a progressive,
step-by-step way through the distinguishing characteristics of database.
■ Has a story and accompanying photograph of a real company’s real use of
database management at the beginning of every chapter. This is both for
motivational purposes and to give the book a more practical, real-world feel.
■ Includes a chapter on SQL that concentrates on the data-retrieval aspect and
applies to essentially every relational database product on the market.
NEW IN THE SECOND EDITION
It is important to reflect advances in the database management systems environment
in this book as the world of information systems continues to progress. Furthermore,
we want to continue adding materials for the benefit of the students who use this
book. Thus we have made the following changes to the second edition.
■ A “mobile chapter” on data retrieval with SQL that can be covered early in the
book, where it appears as Chapter 4, or later in the book after the chapters on
database design. This is introduced in response to a large reviewer survey that
indicated a roughly 50–50 split between instructors who like to introduce data
retrieval with SQL early in their courses to engage their students in hands-on
exercises as soon as possible to pique their interest and instructors who feel that
data retrieval with SQL should come after database design.
■ Internet-accessible databases that match the four main examples running through
the book’s chapters for hands-on student practice in data retrieval with SQL, plus
additional hands-on material.
■ The conversion of the book’s entity-relationship diagrams to today’s standard
practice format that is compatible with MS Visio, among other software tools.
■ The addition of examples for creating and updating databases using SQL.
■ The addition of “It’s Your Turn” exercises and the new formatting of the
“Concepts in Action” real example vignettes.
■ The merging of the material about disk devices and access methods and file
organizations into the chapter on physical database design, to create a complete
package on this subject in one chapter.
ORGANIZATION OF THIS BOOK
The book effectively divides into two halves. After the introduction in Chapter 1,
Chapter 2 lays the foundation of data modeling. Chapter 3 describes the fundamental
concepts of databases and contrasts them with ordinary files. Importantly, this is
done separately from and prior to the discussion of relational databases. Chapter 4 is
the “mobile chapter” on data retrieval with SQL that can be covered as Chapter 4
or can be covered after the chapters on database design. Chapters 5 and 6 explain
the major concepts of relational databases. In turn, this is done separately from and
prior to the discussion of logical database design in Chapter 7 and physical database
design (yes, a whole chapter on this subject) in Chapter 8. Separating out general
database concepts from relational database concepts from relational database design
serves to bring the student along gradually and deliberately with the goal of a solid
understanding at the end.
Then, in the second half of the book, each chapter describes one or more of
the major database subfields. These latter chapters are generally independent and
for the most part can be approached in any order. They include Chapter 9 on object-
oriented database, Chapter 10 on data administration, database administration, and
data dictionaries, Chapter 11 on security, backup and recovery, and concurrency,
Chapter 12 on client/server database and distributed database, Chapter 13 on the
data warehouse, and Chapter 14 on database and the Internet.
SUPPLEMENTS
(www.wiley.com/college/gillenson)
The Web site includes several resources designed to aid the learning process:
■ PowerPoint slides for each chapter that instructors can use as is or tailor as they
wish and that students can use both to take notes on in the classroom and to help
in studying at home.
■ Quizzes for each chapter that students can take on their own to test their
knowledge.
■ For instructors: The Instructors’ Manual, written by the author. For each chapter
it includes a guide to presenting the chapter, discussion stimulation points, and
answers to every question, exercise, and minicase at the end of each chapter.
■ For instructors: The Test Bank, written by the author. Questions are organized
by chapter and are designed to test the level of understanding of the chapter’s
concepts, as well as such basic knowledge as the definitions of key terms presented
in the chapter.
Database Software
Now available to educational institutions adopting this Wiley textbook is a free
3-year membership to the MSDN Academic Alliance. The MSDN AA is designed
to provide the easiest and most inexpensive way for academic departments to make
the latest Microsoft software available in labs, classrooms, and on student and
instructor PCs.
Database software, including Access and SQL Server, is available through
this Wiley and Microsoft publishing partnership, free of charge with the adoption
of Gillenson’s textbook. (Note that schools that have already taken advantage of
this opportunity through Wiley are not eligible again, and Wiley cannot offer free
membership renewals.) Each copy of the software is the full version with no time
limitation, and can be used indefinitely for educational purposes. Contact your
Wiley sales representative for details. For more information about the MSDN AA
program, go to http://msdn.microsoft.com/academic.
ACKNOWLEDGMENTS
I would like to thank the reviewers of the manuscript for their time, their efforts,
and their insightful comments:
Paul Bergstein University of Massachusetts Dartmouth
Susan Bickford Tallahassee Community College
Jim Q. Chen St. Cloud State University
Shamsul Chowdhury Roosevelt University
Deloy Cole Greenville College
Terrence Fries Indiana University of Pennsylvania
Dick Grant Seminole Community College
Betsy Headrick Chattanooga State Community College
Shamim Khan Columbus State University
Barbara Klein University of Michigan—Dearborn
Karl Konsdorf Sinclair Community College
Yunkai Liu Gannon University
Margaret McClintock Mississippi University for Women
Thomas Mertz Kansas State University
Keith R. Nelms Piedmont College
Bob Nielson Dixie State College
Rachida F. Parks Pennsylvania State University
Lara Preiser-Houy California State University Pomona
Il-Yeol Song Drexel University
Brian West University of Louisiana at Lafayette
R. Alan Whitehurst Southern Virginia University
Diana Wolfe Oklahoma State University at Oklahoma City
Hong Zhou Saint Joseph College
In addition, I would like to acknowledge and thank several people who read
and provided helpful comments on specific chapters and portions of the manuscript:
Mark Cooper of FedEx Corp., Satish Puranam of the University of Memphis, David
Tegarden of Virginia Tech, and Trent Sanders.
I would also like to thank the people and companies who agreed to participate
in the Concepts in Action vignettes that appear at the beginning of each chapter and,
in some cases, which appear later in the chapters. I strongly believe that business
students should not have to study subjects like database management in a vacuum.
Rather, they should be regularly reminded of the real ways in which real companies
put these concepts and techniques to use. Whether the products involved are power
tools, auto parts, toys, or books, it is important always to remember that database
management supports businesses in which millions and billions of dollars are at stake
every year. Thus, the people and companies who participated in these vignettes have
significantly added to the educational experience of the students using this book.
Finally, I would like to thank the crew at John Wiley & Sons for their
continuous support and professionalism, in particular Rachael Leblond, my editor
for this edition of the book, and Beth Lang Golub, my long-time editor and friend,
and her excellent staff.
Mark L. Gillenson
Memphis, TN
April 2011
ABOUT THE AUTHOR
Dr. Mark L. Gillenson has been practicing, researching, teaching, writing, and,
most importantly, thinking, about data and database management for over 35
years, split between working for the IBM Corporation and being a professor in the
academic world. While working for IBM he designed databases for IBM’s corporate
headquarters, consulted on database issues for some of IBM’s largest customers,
taught database management at the prestigious IBM Systems Research Institute in
New York, and conducted database seminars throughout the United States and on
four continents. In one such seminar, he taught introduction to database to an IBM
development group that went on to develop one of IBM’s first relational database
management system products, SQL/DS.
Dr. Gillenson conducted some of the earliest studies on data and database
administration and has written extensively about that subject as well as about
database design. He is an associate editor of the Journal of Database Management,
with which he has been associated since its inception. This is his third book on
database management, all published by John Wiley & Sons, Inc. Dr. Gillenson is
currently a professor of MIS in the Fogelman College of Business and Economics of
The University of Memphis. His degrees are from Rensselaer Polytechnic Institute
and The Ohio State University.
Oh, and speaking of interesting kinds of data, as a graduate student
Dr. Gillenson invented the world’s first computerized facial compositor and
codeveloped an early computer graphics system that, among other things, was
used to produce some of the special effects in the first Star Wars movie.
CHAPTER 1
DATA: THE NEW CORPORATE RESOURCE
The development of database management systems, as well as the development of
modern computers, came about as a result of society’s recognition of the crucial
importance of storing, managing, and retrieving its rapidly expanding volumes of business
data. To understand how far we have come in this regard, it is important to know where
we began and how the concept of managing data has developed. This chapter begins
with the historical background of the storage and uses of data and then continues with a
discussion of the importance of data to the modern corporation.
OBJECTIVES
■ Explain why humankind’s interest in data dates back to ancient times.
■ Describe how data needs have historically driven many information technology
developments.
■ Describe the evolution of data storage media during the last century.
■ Relate the idea of data as a corporate resource that can be used to gain a
competitive advantage to the development of the database management systems
environment.
CHAPTER OUTLINE
Introduction
The History of Data
The Origins of Data
Data Through the Ages
Early Data Problems Spawn Calculating Devices
Swamped with Data
Modern Data Storage Media
Data in Today’s Information Systems Environment
Using Data for Competitive Advantage
Problems in Storing and Accessing Data
Data as a Corporate Resource
The Database Environment
Summary
INTRODUCTION
What a fascinating world we live in today! Technological advances are all around
us in virtually every aspect of our daily lives. From cellular telephones to satellite
television to advanced aircraft to modern medicine to computers—especially
computers—high tech is with us wherever we look. Businesses of every description
and size rely on computers and the information systems they support to a degree that
would have been unimaginable just a few short years ago. Businesses routinely use
automated manufacturing and inventory-control techniques, automated financial
transaction procedures, and high-tech marketing tools. As consumers, we take
for granted being able to call our banks, insurance companies, and department
stores to instantly get up-to-the-minute information on our accounts. And everyone,
businesses and consumers alike, has come to rely on the Internet for instant
worldwide communications. Beneath the surface, the foundation for all of this
activity is data: the stored facts that we need to manage all of our human endeavors.
This book is about data. It’s about how to think about data in a highly
organized and deliberate way. It’s about how to store data efficiently and how to
retrieve it effectively. It’s about ways of managing data so that the exact data that
we need will be there when we need it. It’s about the concept of assembling data
into a highly organized collection called a “database” and about the sophisticated
software known as a “database management system” that controls the database
and oversees the database environment. It’s about the various approaches people
have taken to database management and about the roles people have assumed in
the database environment. We will see many real-world examples of data usage
throughout this book.
Computers came into existence because we needed help in processing and
using the massive amounts of data we have been accumulating. Is the converse true?
Could data exist without computers? The answer to this question is a resounding
“yes.” In fact, data has existed for thousands of years in some very interesting, if
by today’s standards crude, forms. Furthermore, some very key points in the history
of the development of computing devices were driven, not by any inspiration about
computing for computing’s sake, but by a real need to efficiently handle a pesky data
management problem. Let’s begin by tracing some of these historical milestones in
the evolution of data and data management.
THE HISTORY OF DATA
The Origins of Data
What is data? To start, what is a single piece of data? A single piece of data is a
single fact about something we are interested in. Think about the world around you,
about your environment. In any environment there are things that are important to
you and there are facts about those things that are worth remembering. A “thing”
can be an obvious object like an automobile or a piece of furniture. But the concept
of an object is broad enough to include a person, an organization like a company, or
an event that took place such as a particular meeting. A fact can be any characteristic
of an object. In a university environment it may be the fact that student Gloria
Thomas has completed 96 credits; or it may be the fact that Professor Howard Gold
graduated from Ohio State University; or it may be the fact that English 349 is being
CONCEPTS IN ACTION
1-A AMAZON.COM
When one thinks of online shopping,
one of the first companies that comes to mind is certainly
Amazon.com. This highly innovative company, based in
Seattle, WA, was one of the first online stores and has
consistently been one of the most successful. Amazon.com
seeks to be the world’s most customer-centric company,
where customers can find and discover anything they
might want to buy online. Amazon.com and its sellers list
millions of unique new and used items in categories such
as electronics, computers, kitchen products and house-
wares, books, music, DVDs, videos, camera and photo
items, toys, baby and baby registry, software, computer
and video games, cell phones and service, tools and
hardware, travel services, magazine subscriptions, and
outdoor living products. Through Amazon Marketplace,
zShops and Auctions, any business or individual can sell
virtually anything to Amazon.com’s millions of customers.
Demonstrating the reach of the Internet, Amazon.com
has sold to people in over 220 countries.
‘‘Photo Courtesy of Amazon.com’’
Initially implemented in 1995 and continually
improved ever since, Amazon.com’s ‘‘order pipeline’’
is a very sophisticated, information-intensive system that
accepts, processes, and fulfills customer orders. When
someone visits Amazon.com’s Web site, its system tries
to enhance the shopping experience by offering the
customer products on a personalized basis, based on
past buying patterns. Once an order is placed, the system
validates the customer’s credit-card information and sends
the customer an email order confirmation. It then goes
through a process of determining how best to fulfill the
order, including deciding which of several fulfillment sites
from which to ship the goods. When the order is shipped,
the system emails the customer a shipping confirmation.
Throughout the entire process, the system keeps track of
the current status of every order at any point in time.
Amazon.com’s order pipeline system is totally built
on relational database technology. Most of it uses Oracle
running on Hewlett Packard Unix systems. In order to
www.circuitmix.com
4 C h a p t e r 1 Data: The New Corporate Resource
achieve high degrees of scalability and availability, the
system is organized around the concept of distributed
databases, including replicated data that is updated
simultaneously at several domestic and international
locations. The system is integrated with the Oracle Finan-
cials enterprise resource planning (ERP) system and the
transactional data is shared with the company’s account-
ing and finance functions. In addition, Amazon.com
has built a multiterabyte data warehouse that imports its
transactional data and creates a decision support system
with a menu-based facility system of its own design.
Programs utilizing the data warehouse send personally
targeted promotional mailers to the company’s customers.
Amazon.com’s database includes hundreds of
individual tables. Among these are catalog tables listing
its millions of individual books and other products,
acustomer table with millions of records, personalization
tables, promotional tables, shopping-cart tables that
handle the actual purchase transactions, and order-history
tables. An order processing subsystem that determines
which fulfillment center to ship goods from uses tables that
keep track of product inventory levels in these centers.
held in Room 830 of Alumni Hall. In a commercial environment, it may be the fact
that employee John Baker’s employee number is 137; or it may be the fact that one
of a company’s suppliers, the Superior Products Co., is located in Chicago; or it
may be the fact that the refrigerator with serial number 958304 was manufactured
on November 5, 2004.
Actually, people have been interested in data for at least the past 12,000 years.
While today we often associate the concept of data with the computer, historically
there have been many more primitive methods of data storage and handling.
In the ancient Middle East, shepherds kept track of their flocks with pebbles,
Figure 1.1. As each sheep left its pen to graze, the shepherd placed one pebble in
a small sack. When all of the sheep had left, the shepherd had a record of how
many sheep were out grazing. When the sheep returned, the shepherd discarded one
pebble for each animal, and if there were more pebbles than sheep, he knew that
some of his sheep still hadn’t returned or were missing. This is, indeed, a primitive
but legitimate example of data storage and retrieval. What is important to realize
about this example is that the count of the number of sheep going out and coming
back in was all that the shepherd cared about in his ‘‘business environment’’ and
that his primitive data storage and retrieval system satisfied his needs.
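The shepherd's sack can be read as a very small data store with two update operations and one query. The following is only an illustrative sketch of that idea in Python; the class name `PebbleSack` and its method names are hypothetical and are not taken from the text.

```python
class PebbleSack:
    """Hypothetical model of the shepherd's sack: one pebble per sheep out grazing."""

    def __init__(self):
        self.pebbles = 0  # the only stored data: a count

    def sheep_leaves(self):
        # Store: add one pebble as a sheep leaves the pen.
        self.pebbles += 1

    def sheep_returns(self):
        # Update: discard one pebble as a sheep comes back.
        if self.pebbles == 0:
            raise ValueError("more sheep returned than left")
        self.pebbles -= 1

    def missing(self):
        # Retrieve: pebbles still in the sack are sheep not yet back.
        return self.pebbles


sack = PebbleSack()
for _ in range(12):   # twelve sheep go out to graze
    sack.sheep_leaves()
for _ in range(10):   # only ten come back
    sack.sheep_returns()
print(sack.missing())  # 2
```

The sketch stores nothing but a count, which mirrors the point of the example: the system holds exactly the data its user cares about and no more.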
FIGURE 1.1 Shepherd using pebbles to keep track of sheep
Excavations in the Zagros region of Iran, dated to 8500 B.C., have unearthed
clay tokens or counters that we think were used for record keeping in primitive
forms of accounting. Such tokens have been found at sites from present-day Turkey
to Pakistan and as far afield as the present-day Khartoum in Sudan, dating as long
ago as 7000 B.C. By 3000 B.C., in the present-day city of Susa in Iran, the use
of such tokens had reached a greater level of sophistication. Tokens with special
markings on them, Figure 1.2, were sealed in hollow clay vessels that accompanied
commercial goods in transit. These primitive bills of lading certified the contents
of the shipments. The tokens represented the quantity of goods being shipped and,
obviously, could not be tampered with without the clay vessel being broken open.
Inscriptions on the outside of the vessels and the seals of the parties involved
provided a further record. The external inscriptions included such words or concepts
as ‘‘deposited,’’ ‘‘transferred,’’ and ‘‘removed.’’
FIGURE 1.2 Ancient clay tokens used to record goods in transit
At about the same time that the Susa culture existed, people in the city-state
of Uruk in Sumeria kept records in clay texts. With pictographs, numerals, and
ideographs, they described land sales and business transactions involving bread,
beer, sheep, cattle, and clothing. Other Neolithic means of record keeping included
storing tallies as cuts and notches in wooden sticks and as knots in rope. The former
continued in use in England as late as the medieval period; South American Indians
used the latter.
Data Through the Ages
As in Susa and Uruk, much of the very early interest in data can be traced to the rise
of cities. Simple subsistence hunting, gathering, and, later, farming had only limited
use for the concept of data. But when people live in cities they tend to specialize
in the goods and services they produce. They become dependent on one another,
bartering and using money to trade these goods and services for mutual survival.
This trade encouraged record keeping—the recording of data—to track how much
someone has produced and what it can be bartered or sold for.
FIGURE 1.3 New types of data with the advance of civilization
As time went on, more and different kinds of data and records were kept.
These included calendars, census data, surveys, land ownership records, marriage
records, records of church contributions, and family trees, Figure 1.3. Increasingly
sophisticated merchants had to keep track of inventories, shipments, and wage
payments in addition to production data. Also, as farming went beyond the
subsistence level and progressed to the feudal manor stage, there was a need
to keep data on the amount of produce to consume, to barter with, and to keep as
seed for the following year.
The Crusades took place from the late eleventh to the late thirteenth centuries.
One side effect of the Crusades was a broader view of the world on the part of the
Europeans, with an accompanying increase in interest in trade. A common method of
trade in that era was the establishment of temporary partnerships among merchants,
ships’ captains, and owners to facilitate commercial voyages. This increased level of
commercial sophistication brought with it another round of increasingly complex
record keeping, specifically, double-entry bookkeeping.
Double-entry bookkeeping originated in the trading centers of fourteenth-
century Italy. The earliest known example, from a merchant in Genoa, dates to the
year 1340. Its use gradually spread, but it was not until 1494, in Venice (about
25 years after Venice’s first movable type printing press came into use), that
a Franciscan monk named Luca Pacioli published his ‘‘Summa de Arithmetica,
Geometria, Proportioni et Proportionalita,’’ a work important in spreading the use
of double-entry bookkeeping. Of course, as a separate issue, the increasing use of
paper and the printing press furthered the advance of record keeping as well.
As the dominance of the Italian merchants declined, other countries became
more active in trade and thus in data and record keeping. Furthermore, as the use
of temporary trading partnerships declined and more stable long-term mercantile
organizations were established, other types of data became necessary. For example,
annual as opposed to venture-by-venture statements of profit and loss were needed.
In 1673 the ‘‘Code of Commerce’’ in France required every businessman to draw up
a balance sheet every two years. Thus the data had to be periodically accumulated
for reporting purposes.
Early Data Problems Spawn Calculating Devices
It was also in the seventeenth century that data began to prompt people to take
an interest in devices that could ‘‘automatically’’ process their data, if only in
a rudimentary way. Blaise Pascal produced one of the earliest and best known
such devices in France in the 1640s, reputedly to help his father track the data
associated with his job as a tax collector, Figure 1.4. This was a small box containing
interlocking gears that was capable of doing addition and subtraction. In fact, it was
the forerunner of today’s mechanical automobile odometers.
In 1805, Joseph Marie Jacquard of France invented a device that automatically
reproduced patterns used in textile weaving. The heart of the device was a series
of cards with holes punched in them; the holes allowed strands of material to
be interwoven in a sequence that produced the desired pattern, Figure 1.5. While
Jacquard’s loom wasn’t a calculating device as such, his method of storing fabric
patterns, a form of graphic data, as holes in punched cards was a very clever
means of data storage that would have great importance for computing devices to
follow. Charles Babbage, a nineteenth-century English mathematician and inventor,
picked up Jacquard’s concept of storing data in punched cards. In 1833,
Babbage began to think about an invention that he called the ‘‘Analytical Engine.’’
Although he never completed it (the state of the art of machinery was not developed
enough), included in its design were many of the principles of modern computers.
The Analytical Engine was to consist of a ‘‘store’’ for holding data items and a
‘‘mill’’ for operating upon them. Babbage was very impressed by Jacquard’s work
with punched cards. In fact, the Analytical Engine was to be able to store calculation
instructions in punched cards. These would be fed into the machine together with
punched cards containing data, would operate on that data, and would produce the
desired result.
FIGURE 1.4 Blaise Pascal and his adding machine (Photo courtesy of IBM Archives)
FIGURE 1.5 The Jacquard loom recorded patterns in punched cards (Photo courtesy of IBM Archives)
Swamped with Data
In the late 1800s, an enormous (for that time) data storage and retrieval problem and
greatly improved machining technology ushered in the era of modern information
processing. The 1880 U.S. Census took about seven years to compile by hand. With
a rapidly expanding population fueled by massive immigration, it was estimated that
with the same manual techniques, the compilation of the 1890 census would not be
completed until after the 1900 census data had begun to be collected. The solution
to processing census data was provided by a government engineer named Herman
Hollerith. Basing his work on Jacquard’s punched-card concept, he arranged to
have the census data stored in punched cards. He built devices to punch the holes
into cards and devices to sort the cards, Figure 1.6. Wire brushes touching the
cards completed circuits when they came across the holes and advanced counters.
The equipment came to be classified as ‘‘electromechanical,’’ ‘‘electro’’ because
it was powered by electricity and ‘‘mechanical’’ because the electricity powered
mechanical counters that tabulated the data. By using Hollerith’s equipment, the
total population count of the 1890 census was completed a month after all the data
was in. The complete set of tabulations, including data on questions that had never
before even been practical to ask, took two years to complete. In 1896, Hollerith
formed the Tabulating Machine Company to produce and commercially market his
devices. That company, combined with several others, eventually formed what is
today the International Business Machines Corporation (IBM).
Towards the turn of the century, immigrants kept coming and the U.S.
population kept expanding. The Census Bureau, while using Hollerith’s equipment,
continued experimenting on its own to produce even more advanced data-tabulating
machinery. One of its engineers, James Powers, developed devices to automatically
feed cards into the equipment and automatically print results. In 1911 he formed the
Powers Tabulating Machine Company, which eventually formed the basis for the
UNIVAC division of the Sperry Corporation, which eventually became the Unisys
Corporation.
FIGURE 1.6 Herman Hollerith and his tabulator/sorter, circa 1890
From the days of Hollerith and Powers through the 1940s, commercial data
processing was performed on a variety of electromechanical punched-card-based
devices. They included calculators, punches, sorters, collators, and printers. The data
was stored in punched cards, while the processing instructions were implemented as
collections of wires plugged into specially designed boards that in turn were inserted
into slots in the electromechanical devices. Indeed, electromechanical equipment
overlapped with electronic computers, which were introduced commercially in the
mid-1950s.
In fact, the introduction of electronic computers in the mid-1950s coincided
with a tremendous boom in economic development that raised the level of data
storage and retrieval requirements another notch. This was a time of rapid
commercial growth in the post-World War II U.S.A. as well as the rebuilding
of Europe and the Far East. From this time onward, the furious pace of new data
storage and retrieval requirements, as more and more commercial functions and
procedures were automated, and of technological advances in computing devices
has been one big blur. From this point on, it would be virtually impossible to
tie advances in computing devices to specific, landmark data storage and retrieval
needs. And there is no need to try to do so.
Modern Data Storage Media
Paralleling the growth of equipment to process data was the development of new
media on which to store the data. The earliest form of modern data storage was
punched paper tape, which was introduced in the 1870s and 1880s in conjunction
with early teletype equipment. Of course we’ve already seen that Hollerith in the
1890s and Powers in the early 1900s used punched cards as a storage medium. In
fact, punched cards were the only data storage medium used in the increasingly
sophisticated electromechanical accounting machines of the 1920s, 1930s, and
1940s. They were still used extensively in the early computers of the 1950s and
1960s and could even be found well into the 1970s in smaller information systems
installations, to a progressively reduced degree.
YOUR TURN
1.1 THE DEVELOPMENT OF DATA
The need to organize and store data has arisen many times and in many ways
throughout history. In addition to the data-focused events presented in this chapter,
what other historical events can you think of that have made people think about
organizing and storing data? As a hint, you might think about the exploration and
conquest of new lands, wars, changes in type of governments such as the introduction
of democracy, and the implications of new inventions such as trains, printing
presses, and electricity.
QUESTION:
Develop a timeline showing several historical events that influenced the need to
organize and store data. Include a few noted in this chapter as well as a few that
you can think of independently.
The middle to late 1930s saw the beginning of the era of erasable magnetic
storage media, with Bell Laboratories experimenting with magnetic tape for sound
storage. By the late 1940s, there was early work on the use of magnetic tape for
recording data. By 1950, several companies, including RCA and Raytheon, were
developing the magnetic tape concept for commercial use. Both UNIVAC and
Raytheon offered commercially available magnetic tape units in 1952, followed by
IBM in 1953, Figure 1.7. During the mid-1950s and into the mid-1960s, magnetic
tape gradually became the dominant data-storage medium in computers. Magnetic
tape technology has been continually improved since then and is still in limited use
today, particularly for archived data.
FIGURE 1.7 Early magnetic tape drive, circa 1953
The original concept that eventually grew into the magnetic disk actually
began to be developed at MIT in the late 1930s and early 1940s. By the early 1950s,
several companies including UNIVAC, IBM, and Control Data had developed
prototypes of magnetic ‘‘drums’’ that were the forerunners of magnetic disk
technology. In 1953, IBM began work on its 305 RAMAC (Random Access
Method of Accounting and Control) fixed disk storage device. By 1954 there was a
multi-platter version, which became commercially available in 1956, Figure 1.8.
During the mid-1960s a massive conversion from tape to magnetic disk as
the preeminent data storage medium began, and disk storage is still the data storage
medium of choice today. After the early fixed disks, the disk storage environment
became geared towards the removable disk-pack philosophy, with a dozen or more
packs being juggled on and off a single drive as a common ratio. But, with the
increasingly tight environmental controls that fixed disks permitted, more data per
square inch (or square centimeter) could be stored on fixed disk devices. Eventually,
the disk drives on mainframes and servers, as well as the fixed disks or ‘‘hard
drives’’ of PCs, all became non-removable, sealed units. But the removable disk
concept stayed with us a while in the form of PC diskettes and the Iomega Corp.’s
Zip Disks, and today in the form of so-called external hard drives that can be easily
moved from one computer to another simply by plugging them into a USB port.
These have been joined by the laser-based, optical technology compact disk (CD),
introduced as a data storage medium in 1985. Originally, data could be recorded
on these CDs only at the factory and once created, they were non-erasable. Now,
data can be recorded on them, erased, and re-recorded in a standard PC. Finally,
solid-state technology has become so miniaturized and inexpensive that a popular
option for removable media today is the flash drive.
FIGURE 1.8 IBM RAMAC disk storage device, circa 1956
DATA IN TODAY’S INFORMATION SYSTEMS ENVIRONMENT
Using Data for Competitive Advantage
Today’s computers are technological marvels. Their speeds, compactness, ease of
use, price as related to capability, and, yes, their data storage capacities are truly
amazing. And yet, our fundamental interest in computers is the same as that of the
ancient Middle-Eastern shepherds in their pebbles and sacks: they are the vehicles
we need to store and utilize the data that is important to us in our environment.
Indeed, data has become indispensable in every kind of modern business
and government organization. Data, the applications that process the data, and
the computers on which the applications run are fundamental to every aspect of
every kind of endeavor. When speaking of corporate resources, people used to
list such items as capital, plant and equipment, inventory, personnel, and patents.
Today, any such list of corporate resources must include the corporation’s data. It
has even been suggested that data is the most important corporate resource because
it describes all of the others.
Data can provide a crucial competitive advantage for a company. We
routinely speak of data and the information derived from it as competitive weapons
in hotly contested industries. For example, FedEx had a significant competitive
advantage when it first provided access to its package tracking data on its Web
site. Then, once one company in an industry develops a new application that takes
advantage of its data, the other companies in the industry are forced to match it to
remain competitive. This cycle continually moves the use of data to ever-higher
levels, making it an ever more important corporate resource than before. Examples
of this abound. Banks give their customers online access to their accounts. Package
shipping companies provide up-to-the-minute information on the whereabouts of
a package. Retailers send manufacturers product sales data that the manufacturers
use to adjust inventories and production cycles. Manufacturers automatically send
their parts suppliers inventory data and expect the suppliers to use the data to keep
a steady stream of parts flowing.
YOUR TURN
1.2 DATA AS A COMPETITIVE WEAPON
Think about a company with which you or your family regularly does business. This
might be a supermarket, a department store, or a pharmacy, as examples. What kind
of data do you think they collect about their suppliers, their inventory, their sales,
and their customers? What kind of data do you think they should collect and how do
you think they might be able to use it to gain a competitive advantage?
QUESTION:
Choose one of the companies that you or your family does business with and develop
a plan for the kinds of data it might collect and the ways in which it might use the
data to gain a business advantage over its competitors.
Problems in Storing and Accessing Data
But being able to store and provide efficient access to a company’s data while also
maintaining its accuracy so that it can be used to competitive advantage is anything
but simple. In fact, several factors make it a major challenge. First and foremost,
the volume or amount of data that companies have is massive and growing all
the time. Walmart estimates that its data warehouse (a type of database we will
explore later) alone contains hundreds of terabytes (trillions of characters) of data
and is constantly growing. The number of people who want access to the data is
also growing: at one time, only a select group of a company’s own employees were
concerned with retrieving its data, but this has changed. Now, not only do vastly
more of a company’s employees demand access to the company’s data but also so
do the company’s customers and trading partners. All major banks today give their
depositors Internet access to their accounts. Increasingly tightly linked ‘‘supply
chains’’ require that companies provide other companies, such as their suppliers and
customers, with access to their data. The combination of huge volumes of data and
large numbers of people demanding access to it has created a major performance
challenge. How do you sift through so much data for so many people and give them
the data that they want in an acceptably small amount of time? How much patience
would you have with an insurance company that kept you on the phone for five or
ten minutes while it retrieved claim data about which you had a question? Of course,
the tremendous advances in computer hardware, including data storage hardware,
have helped—indeed, it would have been impossible to have gone as far as we have
in information systems without them. But as the hardware continues to improve,
the volumes of data and the number of people who want access to it also increase,
making it a continuing struggle to provide them with acceptable response times.
Other factors that enter into data storage and retrieval include data security,
data privacy, and backup and recovery. Data security involves a company protecting
its data from theft, malicious destruction, deliberate attempts to make phony changes
to the data (e.g. someone trying to increase his own bank account balance), and even
accidental damage by the company’s own employees. Data privacy implies assuring
that even employees who normally have access to the company’s data (much less
outsiders) are given access only to the specific data they need in their work. Put
another way, sensitive data such as employee salary data and personal customer
data should be accessible only by employees whose job functions require it. Backup
and recovery means the ability to reconstruct data if it is lost or corrupted, say in
a hardware failure. The extreme case of backup and recovery is known as disaster
recovery, which applies when an information system is destroyed by fire, a
hurricane, or other calamity.
Another whole dimension involves maintaining the accuracy of a company’s
data. Historically, and in many cases even today, the same data is stored several,
sometimes many, times within a company’s information system. Why does this
happen? For several reasons. Many companies are simply not organized to share
data among multiple applications. Every time a new application is written, new data
files are created to store its data. As recently as the early 1990s, I spoke to a database
administration manager (more on this type of position later) in the securities industry
who told me that one of the reasons he was hired was to reduce duplicate data
appearing in as many as 60–70 files! Furthermore, depending on how database files
are designed, data can even be duplicated within a single file. We will explore this
issue much more in this book, but for now, suffice it to say that duplicate data, either
in multiple files or in a single file, can cause major data accuracy problems.
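To make the accuracy problem concrete, here is a minimal sketch in Python of two applications that each keep their own copy of the same supplier fact. Everything in it is an assumption made for illustration: the file names, the supplier key `S1`, the move to Memphis, and the helper `inconsistent_keys` are hypothetical; only the supplier name and its Chicago location echo the earlier example.

```python
# Non-database style: each application has its own file with its own copy
# of the supplier's city (hypothetical data for illustration).
billing_file = {"S1": {"name": "Superior Products Co.", "city": "Chicago"}}
shipping_file = {"S1": {"name": "Superior Products Co.", "city": "Chicago"}}

# The supplier relocates, but only the billing application is told.
billing_file["S1"]["city"] = "Memphis"


def inconsistent_keys(file_a, file_b):
    """Return the keys whose records differ between the two files."""
    shared = file_a.keys() & file_b.keys()
    return sorted(key for key in shared if file_a[key] != file_b[key])


conflicts = inconsistent_keys(billing_file, shipping_file)
print(conflicts)  # ['S1']  (the two copies now disagree)

# Shared style: one stored copy that both applications read.
suppliers = {"S1": {"name": "Superior Products Co.", "city": "Chicago"}}


def billing_city(supplier_id):
    return suppliers[supplier_id]["city"]


def shipping_city(supplier_id):
    return suppliers[supplier_id]["city"]


suppliers["S1"]["city"] = "Memphis"  # one update, seen by every application
print(billing_city("S1") == shipping_city("S1"))  # True
```

With duplicated copies, every update has to be repeated in every file, and a single missed update leaves the company with two conflicting answers to the same question. With one shared copy there is nothing to fall out of step.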
Data as a Corporate Resource
Every corporate resource must be carefully managed so that the company can
keep track of it, protect it, and distribute it to those people and purposes in the
company that need it. Furthermore, public companies have a responsibility to
their shareholders to competently manage the company’s assets. Can you imagine
a company’s money just sort of out there somewhere without being carefully
managed? In fact, the chief financial officer with a staff of accountants and financial
professionals is responsible for the money, with outside accounting firms providing
independent audits of it. Typically vice presidents of personnel and their staffs are
responsible for the administrative functions necessary to manage employee affairs.
Production managers at various levels are responsible for parts inventories, and so
on. Data is no exception.
But data may just be the most difficult corporate resource to manage. In data,
we have a resource of tremendous volume, billions, trillions, and more individual
pieces of data, each piece of which is different from the next. And it has the
characteristic that much of it is in a state of change at any one time. It’s not as if
we’re talking about managing a company’s employees. Even the largest companies
have only a few hundred thousand of them, and they don’t change all that frequently.
Or the money a company has: sure, there is a lot of it, but it’s all the same in the
sense that a dollar that goes to payroll is the same kind of dollar that goes to paying
a supplier for raw materials.
As far back as the early to mid-1960s, barely ten years after the introduction
of commercially viable electronic computers, some forward-looking companies
began to realize that storing each application’s data separately, in simple files, was
becoming problematic and would not work in the long run, for just the reasons
that we’ve talked about: the increasing volumes of data (even way back then), the
increasing demand for data access, the need for data security, privacy, backup,
and recovery, and the desire to share data and cut down on data redundancy.
Several things were becoming clear. The task was going to require both a new
kind of software to help manage the data and progressively faster hardware to
keep up with the increasing volumes of data and data access demands. And
data-management specialists would have to be developed, educated, and made
responsible for managing the data as a corporate resource.
Out of this need was born a new kind of software, the database management
system (DBMS), and a new category of personnel, with titles like database
administrator and data management specialist. And yes, hardware has progressively
gotten faster and cheaper for the performance it provides. The integration of these
advances adds up to much more than the simple sum of their parts. They add up to
the database environment.
The Database Environment
Back in the early 1960s, the emphasis in what was then called data processing was on
programming. Data was little more than a necessary afterthought in the application
development process and in running the data-processing installation. There was a
good reason for this. By today’s standards, the rudimentary computers of the time
had very small main memories and very simplistic operating systems. Even relatively
basic application programs had to be shoehorned into main memory using low-level
programming techniques and a lot of cleverness. But then, as we progressed further
into the 1960s and beyond, two things happened simultaneously that made this
picture change forever. One was that main memories became progressively larger
and cheaper and operating systems became much more powerful. Plus, computers
progressively became faster and cheaper on a price/performance basis. All these
changes had the effect of permitting the use of higher-level programming languages
that were easier for a larger number of personnel to use, allowing at least some of
the emphasis to shift elsewhere. Well, nature hates a vacuum, and at the same time
that all of this was happening, companies started becoming aware of the value of
thinking of data as a corporate resource and using it as a competitive weapon.
The result was the development of database management systems (DBMS)
software and the creation of the ‘‘database environment.’’ Supported by ever-
improved hardware and specialized database personnel, the database environment
is designed largely to correct all the problems of the non-database environment.
It encourages data sharing and the control of data redundancy with important
improvements in data accuracy. It permits storage of vast volumes of data with
acceptable access and response times for database queries. And it provides the tools
to control data security, data privacy, and backup and recovery.
This book is a straightforward introduction to the fundamentals of database
in the current information systems environment. It is designed to teach you the
important concepts of the database approach and also to teach you specific skills, such
as how to design relational databases, how to improve database performance, and
how to retrieve data from relational databases using the SQL language. In addition,
as you proceed through the book you will explore such topics as entity-relationship
diagrams, object-oriented database, database administration, distributed database,
data warehousing, Internet database issues, and others.
We start with the basics of database and take a step-by-step approach to
exploring all the various components of the database environment. Each chapter
progressively adds more to an understanding of both the technical and managerial
aspects of the field. Database is a very powerful concept. Overall it provides ingenious
solutions to a set of very difficult problems. As a result, it tends to be a multifaceted
and complex subject that can appear difficult when one attempts to swallow it in
one gulp. But database is approachable and understandable if we proceed carefully,
cautiously, and progressively step by step. And this is an understanding that no one
involved in information systems can afford to be without.
SUMMARY
Recognition of the commercial importance of data, of storing it, and of retrieving
it can be traced back to ancient times. As trade routes lengthened and cities grew
larger, data became increasingly important. Eventually, the importance of data led
to the development of electromechanical calculating devices and then to modern
electronic computers, complete with magnetic and optical disk-based data storage
media.
While the use of data has given many companies a competitive advantage in
their industries, the storage and retrieval of today’s vast amounts of data holds many
challenges. These include speedy retrieval of data when many people try to access
the data at the same time, maintaining the accuracy of the data, the issue of data
security, and the ability to recover the data if it is lost.
The recognition that data is a critical corporate resource and that managing data
is a complex task has led to the development and continuing refinement of specialized
software known as database management systems, the subject of this book.
KEY TERMS
Balance sheet
Barter
Calculating devices
Census
Compact disk
Competitive advantage
Corporate resource
Data
Data storage
Database
Database environment
Database management system
Disk drive
Double-entry bookkeeping
Electromechanical equipment
Electronic computer
Flash drive
Information processing
Magnetic disk
Magnetic drum
Magnetic tape
Optical disk
Punched cards
Punched paper tape
Record keeping
Tally
Token
QUESTIONS
1. What did the Middle Eastern shepherds’ pebbles and
sacks, Pascal’s calculating device, and Hollerith’s
punched-card devices all have in common?
2. What did the growth of cities have to do with the
need for data?
3. What did the growth of trade have to do with the
need for data?
4. What did Jacquard’s textile weaving device have to
do with the development of data?
5. Choose what you believe to be the:
a. One most important
b. Two most important
c. Three most important landmark events in the
history of data. Defend your choices.
6. Do you think that computing devices would have
been developed even if specific data needs had not
come along? Why or why not?
7. What did the need for data among ancient Middle
Eastern shepherds have in common with the need
for data of modern corporations?
8. List several problems in storing and accessing data
in today’s large corporations. Which do you think is
the most important? Why?
9. How important an issue do you think data accuracy
is? Explain.
10. How important a corporate resource is data com-
pared to other corporate resources? Explain.
11. What factors led to the development of database
management systems?
EXERCISES
1. Draw a timeline showing the landmark events in
the history of data from ancient times to the present
day. Do not include the development of computing
devices in this timeline.
2. Draw a timeline for the last four hundred years
comparing landmark events in the history of data to
landmark events in the development of computing
devices.
3. Draw a timeline for the last two hundred years
comparing the development of computing devices
to the development of data storage media.
4. Invent a fictitious company in one of the following
industries and list several ways in which the
company can use data to gain a competitive
advantage.
a. Banking
b. Insurance
c. Manufacturing
d. Airlines
5. Invent a fictitious company in one of the following
industries and describe the relationship between
data as a corporate resource and the company’s
other corporate resources.
a. Banking
b. Insurance
c. Manufacturing
d. Airlines
MINICASES
1. Worldwide, vacation cruises on increasingly larger ships
have been steadily growing in popularity. People like the
all-inclusive price for food, room, and entertainment, the
variety of shipboard activities, and the ability to unpack
just once and still visit several different places. The
first of the two minicases used throughout this book is
the story of Happy Cruise Lines. Happy Cruise Lines
has several ships and operates (begins its cruises) from
a number of ports. It has a variety of vacation cruise
itineraries, each involving several ports of call. The
company wants to keep track of both its past and future
cruises and of the passengers who sailed on the former
and are booked on the latter. Actually, you can think of
a cruise line as simply a somewhat specialized instance
of any passenger transportation company, including
airlines, trains, and buses. Beyond that, a cruise line
is, after all, a business and like any other business of any
kind it must be concerned about its finances, employees,
equipment, and so forth.
a. Using this introductory description of (and hints
about) Happy Cruise Lines, make a list of the things
in Happy Cruise Lines’ business environment about
which you think the company would want to maintain
data. Do some or all of these qualify as ‘‘corporate
resources?’’ Explain.
b. Develop some ideas about how the data you identified
in part a above can be used by Happy Cruise Lines to
gain a competitive advantage over other cruise lines.
2. Sports are universally enjoyed around the globe.
Whether the sport is a team or individual sport, whether
a person is a participant or a spectator, and whether
the sport is played at the amateur or professional
level, one way or another this kind of activity can be
enjoyed by people of all ages and interests. Furthermore,
professional sports today are a big business involving
very large sums of money. And so, the second of
the two minicases to be used throughout this book is
the story of the professional Super Baseball League.
Like any sports league, the Super Baseball League
wants to maintain information about its teams, coaches,
players, and equipment, among other things. If you are
not particularly familiar with baseball or simply prefer
another sport, bear in mind that most of the issues
that will come up in this minicase easily translate to
any team sport at the amateur, college, or professional
levels. After all, all team sports have teams, coaches,
players, fans, equipment, and so forth. When specialized
equipment or other baseball-specific items come up, we
will explain them.
a. Using this introductory description of (and hints
about) the Super Baseball League, list the things in
the Super Baseball League’s business environment
about which you think the league would want to
maintain data. Do some or all of these qualify as
‘‘corporate resources,’’ where the term is broadened
to include the resources of a sports league? Explain.
b. Develop some ideas about how the data that you
identified in part a above can be used by the Super
Baseball League to gain a competitive advantage
over other sports leagues for the fans’ interest and
entertainment dollars (Euros, pesos, yen, etc.)
CHAPTER 2
DATA MODELING
Before reaching database management, there is an important preliminary to cover.
In order ultimately to design databases to support an organization, we must have
a clear understanding of how the organization is structured and how it functions. We
have to understand its components, what they do and how they relate to each other. The
bottom line is that we have to devise a way of recording, of diagramming, the business
environment. This is the essence of data modeling.
OBJECTIVES
■ Explain the concept and practical use of data modeling.
■ Recognize which relationships in the business environment are unary, binary,
and ternary relationships.
■ Describe one-to-one, one-to-many, and many-to-many unary, binary, and ternary
relationships.
■ Recognize and describe intersection data.
■ Model data in business environments by drawing entity-relationship diagrams
that involve unary, binary, and ternary relationships.
CHAPTER OUTLINE
Introduction
Binary Relationships
What is a Binary Relationship?
Cardinality
Modality
More About Many-to-Many
Relationships
Unary Relationships
One-to-One Unary Relationship
One-to-Many Unary Relationship
Many-to-Many Unary Relationship
Ternary Relationships
Example: The General Hardware
Company
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
INTRODUCTION
The diagramming technique we will use is called the entity-relationship or
E-R Model. It is well named, as it diagrams entities (together with their attributes)
and the relationships among them. Actually, there are many variations of E-R
diagrams and drawing them is as much an art as a science. We will use the E-R dia-
gramming technique provided by Microsoft Visio with the ‘‘crow’s foot’’ variation.
To begin, an entity is an object or event in our environment that we want to
keep track of. A person is an entity. So is a building, a piece of inventory sitting
on a shelf, a finished product ready for sale, and a sales meeting (an event). An
attribute is a property or characteristic of an entity. Examples of attributes include
an employee’s employee number, the weight of an automobile, a company’s address,
or the date of a sales meeting. Figure 2.1, with its rectangular shape, represents
a type of entity. The name of the entity type (SALESPERSON) is set in caps at
the top of the box. The entity type’s attributes are shown below it. The attribute
label PK and the boldface type denote the one or more attributes that constitute the
entity type’s unique identifier. Visio uses the abbreviation PK to stand for ‘‘primary
key,’’ which is a concept we define later in this book. For now, just consider these
attributes as the entity type’s unique identifier.
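As a rough sketch of these ideas in code (Python is used here purely for illustration; it is not the notation this book uses), an entity type can be pictured as a record layout and each entity occurrence as one record. The attribute names follow the SALESPERSON entity type; the `add_salesperson` helper, the `registry` dictionary, and the sample values are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Salesperson:
    """One occurrence of the SALESPERSON entity type."""
    salesperson_number: int      # unique identifier (marked PK in the diagram)
    salesperson_name: str
    commission_percentage: int
    year_of_hire: int


def add_salesperson(registry: dict, person: Salesperson) -> None:
    """Store an occurrence, refusing a second one with the same identifier."""
    if person.salesperson_number in registry:
        raise ValueError(f"duplicate salesperson number {person.salesperson_number}")
    registry[person.salesperson_number] = person


# Hypothetical sample data for the sketch.
registry = {}
add_salesperson(registry, Salesperson(528, "Jane Adams", 15, 2003))

try:
    # Same identifier, different attribute values: must be rejected.
    add_salesperson(registry, Salesperson(528, "Another Name", 10, 2005))
    duplicate_accepted = True
except ValueError:
    duplicate_accepted = False
```

Keying the registry on the salesperson number is what makes that attribute act as a unique identifier in this sketch: two occurrences may share a name or a year of hire, but never a number.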
Entities in the real world never really stand alone. They are typically associated
with one another. Parents are associated with their children, automobile parts are
associated with the finished automobile in which they are installed, firefighters are
associated with the fire engines to which they are assigned, and so forth. Recognizing
and recording the associations among entities provides a far richer description of
an environment than recording the entities alone. In order to deal intelligently and
usefully with the associations or relationships among entities, we have to recognize
that there are several different kinds of relationships and several different aspects of
describing them. The most basic way of categorizing a relationship is by the number
of entity types involved.
FIGURE 2.1
An E-R model entity and its attributes
[Entity box SALESPERSON: PK Salesperson Number; Salesperson Name; Commission Percentage; Year of Hire]
BINARY RELATIONSHIPS
What is a Binary Relationship?
The simplest kind of relationship is known as a binary relationship. A binary
relationship is a relationship between two entity types. Figure 2.2 shows a small
E-R diagram with a binary relationship between two entity types, salespersons and
CONCEPTS IN ACTION
2-A THE WALT DISNEY COMPANY
The Walt Disney Company is world-
famous for its many entertainment ventures but it is
especially identified with its theme parks. First there
was Disneyland in Los Angeles, then the mammoth Walt
Disney World in Orlando. These were followed by parks
in Paris and Tokyo, and one now under development in
Hong Kong. The Disney theme parks are so well run that
they create a wonderful feeling of natural harmony with
everyone and everything being in the right place at the
right time. When you’re there, it’s too much fun to stop
to think about how all this is organized and carried off
with such precision. But, is it any wonder to learn that
databases play a major part?
One of the Disney theme parks’ interesting
database applications keeps track of all of the costumes
worn by the workers or ‘‘cast members’’ in the parks. The
system is called the Garment Utilization System or GUS
(which was also the name of one of the mice that helped
Cinderella sew her dress!). Managing these costumes is
no small task. Virtually all of the cast members, from the
actors and dancers to the ride operators, wear some
kind of costume. Disneyland in Los Angeles has 684,000
costume parts (each costume is typically made up of
several garments), each of which is uniquely bar-coded,
for its 46,000 cast members. The numbers in Orlando
are three million garments and 90,000 cast members.
Using bar-code scanning, GUS tracks the life cycle of
every garment. This includes the points in time when a
garment is in the storage facility, is checked out to a cast
member, is in the laundry, or is being repaired (in house
or at a vendor). In addition to managing the day-to-day
movements of the costumes, the system also provides a
rich data analysis capability. The industrial engineers in
Disney’s business planning group use the accumulated
data to decide how many garments to keep in stock and
how many people to have staffing the garment check-
out windows based on the expected wait times. They
also use the data to determine whether certain fabrics
or the garments made by specific manufacturers are not
holding up well through a reasonable number of uses or
of launderings.
GUS, which was inaugurated at Disneyland in
Los Angeles in 1998 and then again at Walt Disney
World in Orlando in 2002, replaced a manual system
in which the costume data was written on index cards.
It is implemented in Microsoft’s SQL Server DBMS and
runs on a Compaq server. It is also linked to an SAP
personnel database to help maintain the status of the
cast members. If GUS is ever down, the process shifts to
a Palm Pilot-based backup system that can later update
the database. In order to keep track of the costume
parts and cast members, not surprisingly, there is a
relational table for costume parts with one record for
each garment and there is a table for cast members
with one record for each cast member. The costume
parts records include the type of garment, its size, color,
and even such details as whether its use is restricted
to a particular cast member and whether it requires
a special laundry detergent. Correspondingly, the cast
member records include the person’s clothing sizes and
other specific garment requirements.
Ultimately, GUS’s database precision serves several
purposes in addition to its fundamental managerial value.
The Walt Disney Company feels that consistency in how
its visitors or ‘‘guests’’ look at a given ride gives them
an important comfort level. Clearly, GUS provides that
consistency in the costuming aspect. In addition, GUS
takes the worry out of an important part of each cast
member’s workday. One of Disney’s creeds is to strive to
take good care of its cast members so that they will take
good care of Disney’s guests. Database management is
a crucial tool in making this work so well.
products. The E-R diagram in Figure 2.2 tells us that a salesperson ‘‘sells’’ products.
Conversely, products are ‘‘sold by’’ salespersons. That’s good information, but we
can do better than that at the price of a very small increase in effort. Just knowing that
a salesperson sells products leaves open several obvious and important questions.
Is a particular salesperson allowed to sell only one kind of product, or two, or
three, or all of the available products? Can a particular product be sold by only a
single salesperson or by all salespersons? Might we want to keep track of a new
salesperson who has just joined the company but has not yet been assigned to sell
any products (assuming that there is indeed a restriction on which salespersons can
sell which products)?
FIGURE 2.2
A binary relationship
[SALESPERSON (PK Salesperson Number; Salesperson Name; Commission Percentage; Year of Hire) Sells / Sold by PRODUCT (PK Product Number; Product Name; Unit Price). The ends of the relationship line are labeled Many Salespersons and Many Products.]
Cardinality
One-to-One Binary Relationship Figure 2.3 shows three binary relationships of
different cardinalities, representing the maximum number of entities that can be
involved in a particular relationship. Figure 2.3a shows a one-to-one (1-1) binary
relationship, which means that a single occurrence of one entity type can be
associated with a single occurrence of the other entity type and vice versa. A
particular salesperson is assigned to one office. Conversely, a particular office (in
this case they are all private offices!) has just one salesperson assigned to it. Note the
‘‘bar’’ or ‘‘one’’ symbol on either end of the relationship in the diagram indicating
the maximum one cardinality. The way to read these diagrams is to start at one
entity, read the relationship on the connecting line, pick up the cardinality on the
other side of the line near the second entity, and then finally reach the other entity.
Thus, Figure 2.3a, reading from left to right, says, ‘‘A salesperson works in one
(really at most one, since it is a maximum) office.’’ The bar or one symbol involved
in this statement is the one just to the left of the office entity box. Conversely,
reading from right to left, ‘‘An office is occupied by one salesperson.’’
FIGURE 2.3
Binary relationships with cardinalities
(a) One-to-one (1-1) binary relationship: SALESPERSON Works in / Occupied by OFFICE (PK Office Number; Telephone; Size). One Salesperson, One Office.
(b) One-to-many (1-M) binary relationship: SALESPERSON Sells to / Buys from CUSTOMER (PK Customer Number; Customer Name; HQ City). One Salesperson, Many Customers.
(c) Many-to-many (M-M) binary relationship: SALESPERSON Sells / Sold by PRODUCT (PK Product Number; Product Name; Unit Price). Many Salespersons, Many Products.
One-to-Many Binary Relationship Associations can also be multiple in nature.
Figure 2.3b shows a one-to-many (1-M) binary relationship between salespersons
and customers. The ‘‘crow’s foot’’ device attached to the customer entity box
represents the multiple association. Reading from left to right, the diagram indicates
that a salesperson sells to many customers. (Note that ‘‘many,’’ as the maximum
number of occurrences that can be involved, means a number that can be 1, 2, 3, …n.
It also means that the number is not restricted to being exactly one, which would
require the ‘‘one’’ or ‘‘bar’’ symbol instead of the crow’s foot.) Reading from
right to left, Figure 2.3b says that a customer buys from only one salesperson. This
is reasonable, indicating that in this company each salesperson has an exclusive
territory and thus each customer can be sold to by only one salesperson from the
company.
Many-to-Many Binary Relationship Figure 2.3c shows a many-to-many (M-M)
binary relationship between salespersons and products. A salesperson is authorized
to sell many products; a product can be sold by many salespersons. By the way,
in some circumstances, in either the 1-M or M-M case, ‘‘many’’ can be either an
exact number or have a known maximum value. For example, a company rule may
set a limit of a maximum of ten customers in a sales territory. Then the ‘‘many’’ in
the 1-M relationship of Figure 2.3b can never be more than 10 (a salesperson can
have many customers but not more than 10). Sometimes people include this exact
number or maximum next to or even instead of the crow’s foot in the E-R diagram.
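The three cardinalities can be illustrated with a small Python sketch. Note the caveat: a cardinality is a business rule that the modeler states, not something inferred from data, so the hypothetical function `observed_cardinality` below only reports which maximums a given set of sample associations actually exhibits. The salesperson, office, customer, and product values are made up for the example.

```python
from collections import defaultdict


def observed_cardinality(pairs):
    """Return '1-1', '1-M', 'M-1', or 'M-M' for a list of (left, right) pairs.

    The right-hand symbol is 'M' if any left occurrence is associated with
    more than one right occurrence, and vice versa for the left-hand symbol.
    """
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    right_max = "M" if any(len(v) > 1 for v in left_to_right.values()) else "1"
    left_max = "M" if any(len(v) > 1 for v in right_to_left.values()) else "1"
    return f"{left_max}-{right_max}"


# Hypothetical sample associations, in (salesperson, other entity) order.
works_in = [(137, "Office 1"), (186, "Office 2")]
sells_to = [(137, "Customer A"), (137, "Customer B"), (186, "Customer C")]
sells = [(137, 24013), (137, 16386), (186, 24013)]

print(observed_cardinality(works_in))  # 1-1
print(observed_cardinality(sells_to))  # 1-M
print(observed_cardinality(sells))     # M-M
```

A rule such as "no more than ten customers per salesperson" would be an additional check on the size of each set in `left_to_right`, which this sketch does not attempt.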
Modality
Figure 2.4 shows the addition of the modality, the minimum number of entity
occurrences that can be involved in a relationship. In our particular salesperson
environment, every salesperson must be assigned to an office. On the other hand, a
given office might be empty or it might be in use by exactly one salesperson. This
situation is recorded in Figure 2.4a, where the ‘‘inner’’ symbol, which can be a zero
or a one, represents the modality—the minimum—and the ‘‘outer’’ symbol, which
can be a one or a crow’s foot, represents the cardinality—the maximum. Reading
Figure 2.4a from left to right tells us that a salesperson works in a minimum of one
and a maximum of one office, which is another way of saying exactly one office.
Reading from right to left, an office may be occupied by or assigned to a minimum
of no salespersons (i.e. the office is empty) or a maximum of one salesperson.
Similarly, Figure 2.4b indicates that a salesperson may have no customers
or many customers. How could a salesperson have no customers? (What are we
paying her for?!?) Actually, this allows for the case in which we have just hired
a new salesperson and have not as yet assigned her a territory or any customers.
On the other hand, a customer is always assigned to exactly one salesperson. We
never want customers to be without a salesperson—how would they buy anything
from us when they need to? We never want to be in a position of losing sales! If
a salesperson leaves the company, the company’s procedures require that another
salesperson or, temporarily, a sales manager be immediately assigned the departing
salesperson’s customers. Figure 2.4c says that each salesperson is authorized to sell
at least one or many of our products and each product can be sold by at least one
or many of our salespersons. This includes the extreme, but not surprising, case in
which each salesperson is authorized to sell all the products and each product can
be sold by all the salespersons.
FIGURE 2.4
Binary relationships with cardinalities (maximums) and modalities (minimums)
(a) One-to-one (1-1) binary relationship: a SALESPERSON works in exactly one OFFICE; an OFFICE is occupied by no salespersons or one salesperson.
(b) One-to-many (1-M) binary relationship: a SALESPERSON sells to no customers or many customers; a CUSTOMER buys from exactly one salesperson.
(c) Many-to-many (M-M) binary relationship: a SALESPERSON sells one product or many products; a PRODUCT is sold by one salesperson or many salespersons.
On each relationship line, the inner symbol is the modality and the outer symbol is the cardinality.
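One way to picture the modality and cardinality pair is as a minimum and a maximum on how many occurrences may sit at one end of a relationship. The Python sketch below assumes that reading; the class name `Participation`, its fields, and the constant names are hypothetical, and `None` is used as a stand-in for an unbounded "many".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Participation:
    """Minimum (modality) and maximum (cardinality) at one end of a relationship."""
    minimum: int             # modality: 0 or 1
    maximum: Optional[int]   # cardinality: 1, or None meaning "many"

    def allows(self, count: int) -> bool:
        """True if `count` associated occurrences satisfies both limits."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# The rules described in the text, read one direction at a time.
SALESPERSON_WORKS_IN_OFFICE = Participation(minimum=1, maximum=1)
OFFICE_OCCUPIED_BY_SALESPERSON = Participation(minimum=0, maximum=1)
SALESPERSON_SELLS_TO_CUSTOMER = Participation(minimum=0, maximum=None)
CUSTOMER_BUYS_FROM_SALESPERSON = Participation(minimum=1, maximum=1)
SALESPERSON_SELLS_PRODUCT = Participation(minimum=1, maximum=None)

print(OFFICE_OCCUPIED_BY_SALESPERSON.allows(0))   # True: an office may be empty
print(CUSTOMER_BUYS_FROM_SALESPERSON.allows(0))   # False: every customer needs a salesperson
print(SALESPERSON_SELLS_TO_CUSTOMER.allows(0))    # True: a newly hired salesperson
```

A known upper limit, such as ten customers per territory, would simply replace `None` with that number.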
More About Many-to-Many Relationships
Intersection Data Generally, we think of attributes as facts about entities. Each
salesperson has a salesperson number, a name, a commission percentage, and a
year of hire. At the entity occurrence level, for example, one of the salespersons
has salesperson number 528, the name Jane Adams, a commission percentage of
15%, and the year of hire of 2003. In an E-R diagram, these attributes are written
or drawn together with the entity, as in Figure 2.1 and the succeeding figures. This
certainly appears to be very natural and obvious. Are there ever any circumstances
in which an attribute can describe something other than an entity?
Consider the many-to-many relationship between salespersons and products
in Figure 2.4c. As usual, salespersons are described by their salesperson number,
name, commission percentage, and year of hire. Products are described by their
product number, name, and unit price. But, what if there is a requirement to keep
track of the number of units (call it ‘‘quantity’’) of a particular product that a
particular salesperson has sold? Can we add the quantity attribute to the product
entity box? No, because for a particular product, while there is a single product
number, product name, and unit price, there would be lots of ‘‘quantities,’’ one
for each salesperson selling the product. Can we add the quantity attribute to the
salesperson entity box? No, because for a particular salesperson, while there is a
single salesperson number, salesperson name, commission percentage, and year of
hire, there will be lots of ‘‘quantities,’’ one for each product that the salesperson
sells. It makes no sense to try to put the quantity attribute in either the salesperson
entity box or the product entity box. While each salesperson has a single salesperson
number, name, commission percentage, and year of hire, each salesperson has
many ‘‘quantities,’’ one for each product he sells. Similarly, while each product
has a single product number, product name, and unit price, each product has many
‘‘quantities,’’ one for each salesperson who sells that product. But an entity box in
an E-R diagram is designed to list the attributes that simply and directly describe
the entity, with no complications involving other entities. Putting quantity in either
the salesperson entity box or the product entity box just will not work.
The quantity attribute doesn’t describe either the salesperson alone or the
product alone. It describes the combination of a particular salesperson and a
particular product. In general, we can say that it describes the combination of a
particular occurrence of one entity type and a particular occurrence of the other
entity type. Let’s say that since salesperson number 137 joined the company, she has
sold 170 units of product number 24013. The quantity 170 doesn’t make sense as
a description or characteristic of salesperson number 137 alone. She has sold many
different kinds of products. To which one does the quantity 170 refer? Similarly,
the quantity 170 doesn’t make sense as a description or characteristic of product
number 24013 alone. It has been sold by many different salespersons.
In fact, the quantity 170 falls at the intersection of salesperson number 137 and
product number 24013. It describes the combination of or the association between
that particular salesperson and that particular product and it is known as intersection
data. Figure 2.5 shows the many-to-many relationship between salespersons and
products with the intersection data, quantity, represented in a separate box attached
to the relationship line. That is the natural place to draw it. Pictorially, it looks as
if it is at the intersection between the two entities, but there is more to it than that.
The intersection data describes the relationship between the two entities. We know
that an occurrence of the Sells relationship specifies that salesperson 137 has sold
some of product 24013. The quantity 170 is an attribute of this occurrence of that
relationship, further describing this occurrence of the relationship. Not only do we
know that salesperson 137 sold some of product 24013 but we know how many
units of that product that salesperson sold.
FIGURE 2.5
Many-to-many binary relationship with intersection data
[SALESPERSON (PK Salesperson Number; Salesperson Name; Commission Percentage; Year of Hire) Sells / Sold by PRODUCT (PK Product Number; Product Name; Unit Price), with a Quantity box attached to the relationship line.]
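A minimal Python sketch can make the same point: the quantity has to be stored against the pair (salesperson, product), not against either one alone. The entry for salesperson 137 and product 24013 comes from the text; the other entries and the helper name `quantities_for_salesperson` are hypothetical.

```python
# Intersection data: keyed by the combination (salesperson number, product number).
quantity_sold = {
    (137, 24013): 170,   # the example from the text
    (137, 16386): 40,    # hypothetical
    (186, 24013): 65,    # hypothetical
}


def quantities_for_salesperson(salesperson_number):
    """All quantities tied to one salesperson, one per product sold."""
    return {
        product: quantity
        for (salesperson, product), quantity in quantity_sold.items()
        if salesperson == salesperson_number
    }


# Salesperson 137 alone has several quantities, so "the quantity for 137" is ambiguous.
print(quantities_for_salesperson(137))
# Only the pair pins down a single value.
print(quantity_sold[(137, 24013)])
```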
Associative Entity Since we know that entities can have attributes and now we
see that many-to-many relationships can have attributes, too, does that mean that
entities and many-to-many relationships can in some sense be treated in the same
way within E-R diagrams? Indeed they can! Figure 2.6 shows the many-to-many
relationship Sells converted into the associative entity SALES. An occurrence of
the SALES associative entity does exactly what the many-to-many relationship did:
it indicates a relationship between a salesperson and a product, specifically the
fact that a particular salesperson has been involved in selling a particular product,
and includes any intersection data that describes this relationship. Note very, very
carefully the reversal of the cardinalities and modalities when the many-to-many
relationship is converted to an associative entity. SALES is now a kind of entity in
its own right. Again, a single occurrence of the new SALES entity type records the
fact that a particular salesperson has been involved in selling a particular product.
A single occurrence of SALES relates to a single occurrence of SALESPERSON
and to a single occurrence of PRODUCT, which is why the diagram indicates that
a sales occurrence involves exactly one salesperson and exactly one product. On
the other hand, since a salesperson sells many products, the diagram shows that a
salesperson will tie into many sales occurrences. Similarly, since a product is sold
by many salespersons, the diagram shows that a product will tie into many sales
occurrences.
If the many-to-many relationship E-R diagram style of Figure 2.5 is equivalent
to the associative entity style of Figure 2.6, which one should you use? This is
an instance in which this type of diagramming is an art with a lot of leeway for
personal taste. However, you should be aware that over time the preference has
shifted towards the associative entity style of Figure 2.6, and that is what we will
use from here on in this book.
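The conversion to an associative entity, and the reversal of the cardinalities that comes with it, can be sketched in Python as follows. The `Sale` class and the variable names are hypothetical, and all quantities other than the 170 units for salesperson 137 and product 24013 are invented sample values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One occurrence of the SALES associative entity."""
    salesperson_number: int   # ties to exactly one salesperson
    product_number: int       # ties to exactly one product
    quantity: int             # the intersection data


# The many-to-many relationship, written as pair -> quantity.
sells = {(137, 24013): 170, (137, 16386): 40, (186, 24013): 65}

# Converting it: each pair becomes one SALES occurrence.
sales = [Sale(s, p, q) for (s, p), q in sells.items()]

# Reading in the other direction, a salesperson or a product ties into many SALES occurrences.
sales_per_salesperson = Counter(sale.salesperson_number for sale in sales)
sales_per_product = Counter(sale.product_number for sale in sales)

print(sales_per_salesperson[137])  # 2
print(sales_per_product[24013])    # 2
```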
FIGURE 2.6
Associative entity with intersection data
[SALESPERSON (PK Salesperson Number; Salesperson Name; Commission Percentage; Year of Hire), SALES (PK Salesperson Number; PK Product Number; Quantity), and PRODUCT (PK Product Number; Product Name; Unit Price). The SALES associative entity sits between SALESPERSON and PRODUCT.]
The Unique Identifier in Many-to-Many Relationships Since, as we have just seen,
a many-to-many relationship can appear to be a kind of an entity, complete with
attributes, it also follows that it should have a unique identifier, like other entities.
(If this seems a little strange or even unnecessary here, it will become essential later
in the book when we actually design databases based on these E-R diagrams.) In
its most basic form, the unique identifier of the many-to-many relationship or the
associative entity is the combination of the unique identifiers of the two entities
in the many-to-many relationship. So, the unique identifier of the many-to-many
relationship of Figure 2.5 or, as shown in Figure 2.6, of the associative entity, is the
combination of the Salesperson Number and Product Number attributes.
Sometimes, an additional attribute or attributes must be added to this
combination to produce uniqueness. This often involves a time element. As currently
constructed, the E-R diagram in Figure 2.6 indicates the quantity of a particular
product sold by a particular salesperson since the salesperson joined the company.
Thus, there can be only one occurrence of SALES combining a particular salesperson
with a particular product. But if, for example, we wanted to keep track of the sales on
an annual basis, we would have to include a year attribute and the unique identifier
would be Salesperson Number, Product Number, and Year. Clearly, if we want to
know how many units of each product were sold by each salesperson each year,
the combination of Salesperson Number and Product Number would not be unique
because for a particular salesperson and a particular product, the combination of
those two values would be the same each year! Year must be added to produce
uniqueness, not to mention to make it clear in which year a particular value of the
Quantity attribute applies to a particular salesperson-product combination.
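As a rough illustration of why Year has to join the identifier, the annual version of SALES can be sketched with Python's standard sqlite3 module. The table name, column names, product number, and quantities below are assumptions made up for this sketch; only salesperson number 137 is borrowed from the chapter's examples.

```python
import sqlite3


def build_annual_sales(conn):
    """Create a SALES table whose identifier is salesperson + product + year."""
    conn.execute(
        """
        CREATE TABLE sales (
            salesperson_number INTEGER NOT NULL,
            product_number     INTEGER NOT NULL,
            year               INTEGER NOT NULL,
            quantity           INTEGER NOT NULL,
            PRIMARY KEY (salesperson_number, product_number, year)
        )
        """
    )


def record_sale(conn, salesperson, product, year, quantity):
    """Insert one occurrence; return False if its identifier already exists."""
    try:
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            (salesperson, product, year, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_annual_sales(conn)

# Hypothetical data: the same salesperson-product pair in two different years.
first = record_sale(conn, 137, 19440, 2010, 473)
second = record_sale(conn, 137, 19440, 2011, 510)
# Same salesperson, product, AND year: the identifier is no longer unique.
duplicate = record_sale(conn, 137, 19440, 2011, 12)
```

If Year were left out of the PRIMARY KEY clause, the second insert would be rejected as well, which is the uniqueness problem described above.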
The third and last possibility occurs when the nature of the associative entity
is such that it has its own unique identifier. For example, a company might specify
a unique serial number for each sales record. Another example would be the many-
to-many relationship between motorists and police officers who give traffic tickets
for moving violations. (Hopefully it’s not too many for each motorist!) The unique
identifier could be the combination of police officer number and motorist driver’s
license number plus perhaps date and time. But, typically, each traffic ticket has a
unique serial number and this would serve as the unique identifier.
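The third case, in which the associative entity carries its own identifier, might look like the following sketch, again using sqlite3. The table name, column names, and sample values are hypothetical; the point is only that the ticket serial number alone is the identifier, so the same officer and motorist can appear together more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE traffic_ticket (
        ticket_serial_number  INTEGER PRIMARY KEY,  -- the ticket's own identifier
        officer_number        INTEGER NOT NULL,
        driver_license_number TEXT    NOT NULL,
        issued_at             TEXT    NOT NULL
    )
    """
)


def issue_ticket(conn, serial, officer, license_number, issued_at):
    """Store one ticket; return False if the serial number was already used."""
    try:
        conn.execute(
            "INSERT INTO traffic_ticket VALUES (?, ?, ?, ?)",
            (serial, officer, license_number, issued_at),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Same officer and same motorist twice: allowed, because the serial numbers differ.
ok_first = issue_ticket(conn, 900001, 55, "LICENSE-0001", "2010-05-01 09:15")
ok_second = issue_ticket(conn, 900002, 55, "LICENSE-0001", "2010-05-01 17:40")
# Reusing a serial number is rejected.
reused = issue_ticket(conn, 900002, 61, "LICENSE-0002", "2010-05-02 08:00")
```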
UNARY RELATIONSHIPS
Unary relationships associate occurrences of an entity type with other occurrences
of the same entity type. Take the entity person, for example. One person may be
married to another person and vice versa. One person may be the parent of other
people; conversely, a person may have another person as one of their parents.
One-to-One Unary Relationship
Figure 2.7a shows the one-to-one unary relationship called Back-Up involving the
salesperson entity. The salespersons are organized in pairs as backup to each other
when one is away from work. Following one of the links, say the one that extends
from the right side of the salesperson entity box, we can say that salesperson
number 137 backs-up salesperson number 186. Then, going in the other direction,
salesperson number 186 backs-up salesperson 137. Notice that in each direction the
modality of one rather than zero forbids the situation of a salesperson not having a
backup.
YOUR TURN
2.1 MODELING YOUR WORLD: PART 1
Whether it’s a business environment or a personal environment, the entities,
attributes, and relationships around us can be modeled with E-R diagrams.
QUESTION:
How many binary relationships can you think of in your school environment? The
entities might be students, professors, courses, sections, buildings, departments,
textbooks, and so forth. Make a list of the binary relationships between pairs of
these entities and diagram them with E-R diagrams. Do any of the many-to-many
binary relationships have intersection data? Explain.
One-to-Many Unary Relationship
Some of the salespersons are also sales managers, managing other salespersons.
A sales manager can manage several other salespersons. Further, there can be
several levels of sales managers, i.e. several low-level sales managers can be
managed by a higher-level sales manager. Each salesperson (or sales manager) is
managed by exactly one sales manager. This situation describes a one-to-many
unary relationship. Consider Figure 2.7b and follow the downward branch out of
its salesperson entity box. It says that a salesperson manages zero to many other
salespersons, meaning that a salesperson may not be a sales manager (the zero
modality case) or may be a sales manager with several subordinate salespersons
(the many cardinality case). Following the branch that extends from the right side
of the salesperson entity box, the diagram says that a salesperson is managed by
exactly one other salesperson (who must, of course, be a sales manager).
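One way to picture a one-to-many unary relationship is as a table that refers back to itself. The sketch below uses sqlite3 with hypothetical names and numbers. It also assumes that the person at the very top of the hierarchy has no manager recorded, which is a simplification that the diagram's "exactly one" modality does not actually allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE salesperson (
        salesperson_number INTEGER PRIMARY KEY,
        salesperson_name   TEXT NOT NULL,
        -- "Reports to": each row points at another row of the same table.
        manager_number     INTEGER REFERENCES salesperson (salesperson_number)
    )
    """
)
conn.executemany(
    "INSERT INTO salesperson VALUES (?, ?, ?)",
    [
        (1, "Top manager", None),      # assumption: no manager recorded at the top
        (2, "Regional manager", 1),
        (3, "Salesperson A", 2),
        (4, "Salesperson B", 2),
    ],
)


def subordinates(conn, manager_number):
    """'Manages' direction: zero to many salespersons."""
    rows = conn.execute(
        "SELECT salesperson_number FROM salesperson "
        "WHERE manager_number = ? ORDER BY salesperson_number",
        (manager_number,),
    )
    return [row[0] for row in rows]


def management_chain(conn, salesperson_number):
    """'Reports to' direction, followed upward through every level."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(n, depth) AS (
            SELECT manager_number, 1
            FROM salesperson WHERE salesperson_number = ?
            UNION ALL
            SELECT s.manager_number, chain.depth + 1
            FROM salesperson AS s JOIN chain ON s.salesperson_number = chain.n
        )
        SELECT n FROM chain WHERE n IS NOT NULL ORDER BY depth
        """,
        (salesperson_number,),
    )
    return [row[0] for row in rows]
```

Here `subordinates(conn, 3)` returns an empty list (the zero modality case), while `subordinates(conn, 2)` returns two salespersons (the many cardinality case).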
Many-to-Many Unary Relationship
Unary relationships also come in the many-to-many variety. One classic example
of a many-to-many unary relationship is known as the ‘‘bill of materials’’ problem.
Consider a complex mechanical object like an automobile, an airplane, or a large
factory machine tool. Any such object is made of basic parts like nuts and bolts
that are used to make other components or sub-assemblies of the object. Small sub-
assemblies and basic parts go together to make bigger sub-assemblies, and so on
until ultimately they form the entire object. Each basic part and each sub-assembly
can be thought of as a ‘‘part’’ of the object. Then, the parts are in a many-to-many
unary relationship to each other. Any one particular part can be made up of several
other parts while at the same time itself being a component of several other parts.
In Figure 2.7c, think of the products sold in hardware and home improvement
stores. Basic items like hammers and wrenches can be combined and sold as sets.
Larger tool sets can be composed of smaller sets plus additional single tools. All of
these, single tools and sets of all sizes can be classified as products. Thus, as shown
in Figure 2.7c, a product can be part of no other products or part of several other
products. Going in the reverse direction, a product can be composed of no other
products or be composed of several other products.
FIGURE 2.7
Unary relationships
[E-R diagrams:
(a.) One-to-one (1–1) unary relationship: SALESPERSON (PK Salesperson Number; Salesperson Name, Commission Percentage, Year of Hire), with relationship labels Backs-up and Backed-up by.
(b.) One-to-many (1–M) unary relationship: SALESPERSON (same attributes), with relationship labels Manages and Reports to.
(c.) Many-to-many (M–M) unary relationship: PRODUCT (PK Product Number; Product Name, Unit Price) and the associative entity COMPONENT (PK Subassembly Number, PK Product Number; Quantity), with relationship labels Part of and Includes.]
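The tool-set version of the bill of materials problem can be sketched with two tables, PRODUCT and an associative COMPONENT table that pairs a set with each product it includes. The attribute names follow Figure 2.7c; the particular tools, sets, and quantities are invented for this sketch, and the recursive query is just one possible way to expand a set into single tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        product_number INTEGER PRIMARY KEY,
        product_name   TEXT NOT NULL
    );
    CREATE TABLE component (
        subassembly_number INTEGER NOT NULL REFERENCES product (product_number),
        product_number     INTEGER NOT NULL REFERENCES product (product_number),
        quantity           INTEGER NOT NULL,
        PRIMARY KEY (subassembly_number, product_number)
    );
    INSERT INTO product VALUES
        (1, 'Hammer'), (2, 'Wrench'), (3, 'Basic tool set'), (4, 'Deluxe tool set');
    INSERT INTO component VALUES
        (3, 1, 1),   -- basic set includes 1 hammer
        (3, 2, 2),   -- basic set includes 2 wrenches
        (4, 3, 1),   -- deluxe set includes 1 basic set
        (4, 2, 1);   -- deluxe set also includes 1 extra wrench
    """
)


def single_tools(conn, product_number):
    """Expand a product into the single tools it ultimately contains."""
    rows = conn.execute(
        """
        WITH RECURSIVE parts(product_number, quantity) AS (
            SELECT product_number, quantity
            FROM component WHERE subassembly_number = ?
            UNION ALL
            SELECT c.product_number, c.quantity * parts.quantity
            FROM component AS c
            JOIN parts ON c.subassembly_number = parts.product_number
        )
        SELECT p.product_name, SUM(parts.quantity)
        FROM parts JOIN product AS p ON p.product_number = parts.product_number
        WHERE parts.product_number NOT IN (SELECT subassembly_number FROM component)
        GROUP BY p.product_name
        ORDER BY p.product_name
        """,
        (product_number,),
    )
    return dict(rows.fetchall())
```

In this made-up data the wrench is part of two different sets, and the deluxe set is composed of two different products, which is the many-to-many behavior in both directions.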
TERNARY RELATIONSHIPS
A ternary relationship involves three different entity types. Assume for the
moment that any salesperson can sell to any customer. Then, Figure 2.8 shows
the most general, many-to-many-to-many ternary relationship among salespersons,
customers, and products. It means that we know which salesperson sold which
product to which customer. Each sale has intersection data consisting of the date of
the sale and the number of units of the product sold.
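A ternary relationship can be sketched as one table that refers to three others at once. The example below follows the attribute list shown for SALE in Figure 2.8 (the three identifying numbers plus Date and Quantity); the table and column spellings, names, and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE salesperson (
        salesperson_number INTEGER PRIMARY KEY,
        salesperson_name   TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE product (
        product_number INTEGER PRIMARY KEY,
        product_name   TEXT NOT NULL
    );
    -- SALE ties one salesperson, one product, and one customer together
    -- and carries the intersection data (date and quantity).
    CREATE TABLE sale (
        salesperson_number INTEGER NOT NULL REFERENCES salesperson (salesperson_number),
        product_number     INTEGER NOT NULL REFERENCES product (product_number),
        customer_number    INTEGER NOT NULL REFERENCES customer (customer_number),
        sale_date          TEXT    NOT NULL,
        quantity           INTEGER NOT NULL,
        PRIMARY KEY (salesperson_number, product_number, customer_number)
    );
    INSERT INTO salesperson VALUES (137, 'Salesperson A'), (186, 'Salesperson B');
    INSERT INTO customer VALUES (1, 'Hardware Store A'), (2, 'Home Center B');
    INSERT INTO product VALUES (10, 'Hammer'), (20, 'Wrench');
    INSERT INTO sale VALUES
        (137, 10, 1, '2010-03-01', 50),
        (137, 10, 2, '2010-03-02', 30),
        (186, 20, 1, '2010-03-05', 75);
    """
)


def sales_report(conn):
    """List which salesperson sold which product to which customer."""
    rows = conn.execute(
        """
        SELECT s.salesperson_name, p.product_name, c.customer_name, sale.quantity
        FROM sale
        JOIN salesperson AS s ON s.salesperson_number = sale.salesperson_number
        JOIN product     AS p ON p.product_number     = sale.product_number
        JOIN customer    AS c ON c.customer_number    = sale.customer_number
        ORDER BY sale.sale_date
        """
    )
    return rows.fetchall()
```

Each row of the report answers all three questions at once, which two separate binary relationships could not guarantee.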
EXAMPLE: THE GENERAL HARDWARE COMPANY
Figure 2.9 is the E-R diagram for the General Hardware Company, parts of which
we have been using throughout this chapter. General Hardware is a wholesaler
and distributor of various manufacturers’ tools and other hardware products. Its
customers are hardware and home improvement stores, which in turn sell the
products at retail to individual consumers. Again, as a middleman it buys its goods
from the manufacturers and then sells them to the retail stores. How exactly does
General Hardware operate? Now that we know something about E-R diagrams, let’s
see if we can figure it out from Figure 2.9!
FIGURE 2.8
Ternary relationship
[E-R diagram: SALESPERSON (PK Salesperson Number; Salesperson Name, Commission Percentage, Year of Hire), CUSTOMER (PK Customer Number; Customer Name, HQ City), and PRODUCT (PK Product Number; Product Name, Unit Price) are each linked to SALE (PK Salesperson Number, PK Product Number, PK Customer Number; Date, Quantity). Relationship labels: Sold, Sold by, Sold to, Purchased, Sold Product.]
FIGURE 2.9
The General Hardware Company E-R diagram
[E-R diagram: OFFICE (PK Office Number; Telephone, Size), SALESPERSON (PK Salesperson Number; Salesperson Name, Commission Percentage, Year of Hire), CUSTOMER (PK Customer Number; Customer Name, HQ City), CUSTOMER EMPLOYEE (PK Customer Number, PK Employee Number; Employee Name, Title), PRODUCT (PK Product Number; Product Name, Unit Price), and SALES (PK Salesperson Number, PK Product Number; Quantity). Relationship labels: Occupied by, Works in, Sells to, Buys from, Employs, Employed by, Sold, Sold by, Sold Product.]
YOUR TURN
2.2 MODELING YOUR WORLD: PART 2
Can you think of unary and ternary relationships in your world?
QUESTION:
How many unary and ternary relationships can you think of in your school
environment? As in Your Turn 2.1, make a list of the unary and ternary relationships
in the school environment and diagram them with E-R diagrams. Do any of the
many-to-many-to-many ternary relationships have intersection data? Explain.
Begin with the SALESPERSON entity box in the middle on the left.
SALESPERSON has four attributes with one of them, Salesperson Number, serving
as the unique identifier of the salespersons. Looking upwards from SALESPERSON,
a salesperson works in exactly one office (indicated by the double ones or bars
encountered on the way to the OFFICE entity). OFFICE has three attributes;
Office Number is the unique identifier. Looking back downwards from the OFFICE
entity box, an office has either no salespersons working in it (the zero modality
symbol) or one salesperson (the one or bar cardinality symbol). Starting again
at the SALESPERSON entity box and moving to the right, a salesperson has no
customers or many customers. (Remember that the customers are hardware or
home improvement stores.) The CUSTOMER entity has three attributes; Customer
Number is the unique identifier. In the reverse direction, a customer must have
exactly one General Hardware salesperson.
Below the CUSTOMER entity is the CUSTOMER EMPLOYEE entity.
According to the figure, a customer must have at least one but can have many
employees. An employee works for exactly one customer. This is actually a special
situation. General Hardware only has an interest in maintaining data about the people
who are its customers’ employees as long as their employer remains a customer of
General Hardware. If a particular hardware store or home improvement chain stops
buying goods from General Hardware, then General Hardware no longer cares about
that store’s or chain’s employees. Furthermore, while General Hardware assumes
that each of its customers assigns their employees unique employee numbers, those
numbers can be assumed to be unique only within that customer store or chain.
Thus, the unique identifier for a customer employee must be the combination
of the Customer Number and the Employee Number attributes. In this situation,
CUSTOMER EMPLOYEE is called a dependent or weak entity.
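The two properties of a dependent entity described here, an identifier that borrows the parent's identifier and an existence that depends on the parent, can be sketched as follows. The table and column spellings and the sample rows are hypothetical, and deleting dependent rows automatically (ON DELETE CASCADE) is only one way a database could reflect that General Hardware "no longer cares" about a former customer's employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,
        customer_name   TEXT NOT NULL
    );
    CREATE TABLE customer_employee (
        customer_number INTEGER NOT NULL
            REFERENCES customer (customer_number) ON DELETE CASCADE,
        employee_number INTEGER NOT NULL,
        employee_name   TEXT NOT NULL,
        -- Employee numbers are unique only within one customer,
        -- so the identifier needs both attributes.
        PRIMARY KEY (customer_number, employee_number)
    );
    INSERT INTO customer VALUES (1, 'Hardware Store A'), (2, 'Home Center B');
    INSERT INTO customer_employee VALUES
        (1, 30, 'Employee X'),
        (2, 30, 'Employee Y'),   -- same employee number, different customer
        (2, 31, 'Employee Z');
    """
)


def employee_count(conn):
    """Count all customer employees currently on file."""
    return conn.execute("SELECT COUNT(*) FROM customer_employee").fetchone()[0]


before = employee_count(conn)
# Customer 2 stops buying from General Hardware; its employees go with it.
conn.execute("DELETE FROM customer WHERE customer_number = 2")
after = employee_count(conn)
```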
Returning to the SALESPERSON entity box and looking downward, there
is a one-to-many relationship between salespersons and sales. But, below that,
there is also a one-to-many relationship from products to sales. Also note that the
unique identifier of SALES is the combination of Salesperson Number and Product
Number. This is the signal that there is a many-to-many relationship between
salespersons and products! A salesperson is authorized to sell at least one and
generally many products. A product is sold by at least one and generally many
salespersons. The PRODUCT entity has three attributes, with Product Number being
the unique identifier. The attribute Quantity is intersection data in the many-to-many
relationship and so becomes an attribute in the associative entity SALES that links
salespersons with the products they have sold in a many-to-many relationship.
EXAMPLE: GOOD READING BOOKSTORES
Figure 2.10 shows the E-R diagram for Good Reading Bookstores. Good Reading
is a chain of bookstores that wants to keep track of the books that it sells, their
publishers, their authors, and the customers who buy them. The BOOK entity has
four attributes. Book Number is the unique identifier. A book has exactly one
publisher. Publisher Name is the unique identifier of the PUBLISHER entity. A
publisher may have (and generally has) published many books that Good Reading
carries; however, Good Reading also wants to be able to keep track of some
publishers that currently have no books in Good Reading’s inventory (note the
zero-modality symbol from PUBLISHER towards BOOK). A book must have at
least one author but can have many (where in this case ‘‘many’’ means a few,
generally two or three at most). For a person to be of interest to Good Reading
as an author, she must have written at least one and possibly many books that
Good Reading carries.
FIGURE 2.10
Good Reading Bookstores entity-relationship diagram
[E-R diagram: PUBLISHER (PK Publisher Name; City, Country, President, Year Founded), BOOK (PK Book Number; Book Name, Publication Year, Pages), AUTHOR (PK Author Number; Author Name, Year Born, Year Died), CUSTOMER (PK Customer Number; Customer Name, Street, City, State, Country), and the associative entities WROTE (Author Number, Book Number) and SALE (Customer Number, Book Number, Date, Price, Quantity). Relationship labels: Published, Published by, Wrote, Written by, Bought, Bought by, Sold, In sale.]
YOUR TURN
2.3 MODELING YOUR WORLD: PART 3
Now it’s time to put the university environment all together.
QUESTION:
Create one comprehensive E-R diagram for your university environment that you
developed in Your Turn Parts 1 and 2.
Note that there is a many-to-many relationship between the BOOK and AUTHOR
entities that is realized in the associative entity WROTE, which has
no intersection data. The company wants to keep track of which authors wrote
which books, but there are no attributes that further describe that many-to-many
relationship. The associative entity SALE indicates that there is a many-to-many
relationship between books and customers. A book can be involved in many sales
and so can a customer. But a particular sale involves just one book and one customer.
Date, Price, and Quantity are intersection data in the many-to-many relationship
between the BOOK and CUSTOMER entities.
Does this make sense? Might a customer have bought several copies of the
same book on the same date? After all, that’s what the presence of the Quantity
attribute implies. And might she have then bought more copies of the same book on
a later date? Yes to both questions! A grandmother bought a copy of a book for each
of three of her grandchildren one day and they liked it so much that she returned and
bought five more copies of the same book for her other five grandchildren several
days later. By the way, notice that the modality 0 going from book to sale says
that a book may not have been involved in any sales (maybe it just came out). The
modality of 1 going from customer to sale says that for a person to be considered
a customer, he must have participated in at least one sale, which is reasonable.
EXAMPLE: WORLD MUSIC ASSOCIATION
The World Music Association (WMA) is an organization that maintains information
about its member orchestras and the recordings they have made. The WMA
E-R diagram in Figure 2.11 shows the information about the orchestras and their
musicians across the top and the information about the recordings in the rest of
the diagram. Each orchestra has at least one and possibly many musicians. (In this
case, the modality expressing ‘‘at least one’’ is a technicality. Certainly an orchestra
must have many musicians.) A musician might not work for any orchestra (perhaps
she is currently unemployed but WMA wants to keep track of her anyway) or may
work for just one orchestra. A musician may not be a college graduate or may have
several college degrees. A degree belongs to just one musician (for the moment we
ignore the possibility that more than one musician earned the same degree from
the same university in the same year). Since the DEGREE entity is dependent on
the MUSICIAN entity, the unique identifier for DEGREE is the combination of the
Musician Number and Degree (e.g. B.A.) attributes.
FIGURE 2.11
World Music Association entity-relationship diagram
[E-R diagram: ORCHESTRA (PK Orchestra Name; City, Country, Music Director), MUSICIAN (PK Musician Number; Musician Name, Instrument, Annual Salary), DEGREE (PK Musician Number, PK Degree; University, Year), COMPOSER (PK Composer Name; Country, Date of Birth), COMPOSITION (PK Composer Name, PK Composition Name; Year), and RECORDING (PK Orchestra Name, PK Composer Name, PK Composition Name; Year, Price). Relationship labels: Employs, Employed by, Earned, Earned by, Recorded, Recorded by, Contains, Wrote, Written by.]
Looking downward from the ORCHESTRA entity box, an orchestra may
have made no recordings of a particular composition or may have made many. In
the reverse direction, a composition may not have been recorded by any orchestra
(but we still want to maintain data about it) or may have been recorded by many
orchestras. For a particular recording, we note the year of the recording and the retail
price, as intersection data of the many-to-many relationship between orchestras and
compositions. In fact, RECORDING is an associative entity. A composer may have
several compositions to his credit but must have at least one to be of interest to
WMA. A composition is associated with exactly one composer. COMPOSITION
is a dependent entity to COMPOSER, which means that the unique identifier of
COMPOSITION is the combination of Composer Name and Composition Name.
After all, there could be Beethoven’s ‘‘Third Symphony’’ and Mozart’s ‘‘Third
Symphony.’’ This has an important implication for the RECORDING associative
entity. To uniquely identify a recording (and attach the year and price intersection
data to it) requires an Orchestra Name, Composition Name, and Composer Name.
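The way COMPOSITION's two-part identifier carries over into RECORDING can be sketched like this. The attribute names follow Figure 2.11, but the table and column spellings, the orchestra name, and the years and prices are all made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE orchestra (orchestra_name TEXT PRIMARY KEY);
    CREATE TABLE composer  (composer_name  TEXT PRIMARY KEY);
    CREATE TABLE composition (
        composer_name    TEXT NOT NULL REFERENCES composer (composer_name),
        composition_name TEXT NOT NULL,
        PRIMARY KEY (composer_name, composition_name)
    );
    CREATE TABLE recording (
        orchestra_name   TEXT NOT NULL REFERENCES orchestra (orchestra_name),
        composer_name    TEXT NOT NULL,
        composition_name TEXT NOT NULL,
        year             INTEGER,
        price            REAL,
        PRIMARY KEY (orchestra_name, composer_name, composition_name),
        -- A composition is identified by two attributes, so both are needed here.
        FOREIGN KEY (composer_name, composition_name)
            REFERENCES composition (composer_name, composition_name)
    );
    INSERT INTO orchestra VALUES ('Example Philharmonic');
    INSERT INTO composer VALUES ('Beethoven'), ('Mozart');
    INSERT INTO composition VALUES
        ('Beethoven', 'Third Symphony'), ('Mozart', 'Third Symphony');
    INSERT INTO recording VALUES
        ('Example Philharmonic', 'Beethoven', 'Third Symphony', 2005, 14.99),
        ('Example Philharmonic', 'Mozart',    'Third Symphony', 2007, 12.99);
    """
)


def composers_recorded(conn, orchestra_name, composition_name):
    """Return the composers whose work of this name the orchestra has recorded."""
    rows = conn.execute(
        "SELECT composer_name FROM recording "
        "WHERE orchestra_name = ? AND composition_name = ? ORDER BY composer_name",
        (orchestra_name, composition_name),
    )
    return [row[0] for row in rows]
```

With Composition Name alone, the two recordings of a "Third Symphony" by the same orchestra could not be told apart; adding Composer Name keeps them distinct.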
EXAMPLE: LUCKY RENT-A-CAR
Lucky Rent-A-Car’s business environment is, obviously, centered on its cars. This
is literally true in its E-R diagram, shown in Figure 2.12. A car was manufactured by
exactly one manufacturer. A manufacturer manufactured at least one and generally
many of Lucky’s cars. A car has had many maintenance events (but a brand new
car may not have had any yet). A car may not have been rented to any customers
(again, the case of a brand new car) or to many customers. A customer may have
rented many cars from Lucky, and to be in Lucky’s business environment must
have rented at least one. Rental Date, Return Date, and Total Cost are intersection
data to the many-to-many relationship between CAR and CUSTOMER, as shown
in the associative entity RENTAL.
FIGURE 2.12
Lucky Rent-A-Car entity-relationship diagram
[E-R diagram: MANUFACTURER (PK Manufacturer Name; Manufacturer Country, Sales Rep Name, Sales Rep Number), CAR (PK Car Serial Number; Model, Year, Class), CUSTOMER (PK Customer Number; Customer Name, Customer Address, Customer Credit Rating), MAINTENANCE EVENT (PK Repair Number; Date, Procedure, Mileage, Repair Time), and the associative entity RENTAL (Customer Number, Car Serial Number, Rental Date, Return Date, Total Cost). Relationship labels: Manufactured, Manufactured by, Rented, Rented by, Car rented, Repaired, Car Repaired.]
SUMMARY
Being able to express entities, attributes, and relationships is an important
preliminary step towards database management. The Entity-Relationship Model
is a diagramming technique that gives us this capability. The E-R model can
display unary relationships (relationships between entities of the same type), binary
relationships (relationships between entities of two different types), and ternary
relationships (relationships between entities of three different types). Based on the
number of distinct entities involved in a relationship, we expand this to one-to-one,
one-to-many, and many-to-many unary relationships, one-to-one, one-to-many, and
many-to-many binary relationships, and ternary relationships (which we generally
consider to be many-to-many-to-many).
Other terms and concepts discussed include cardinality (the maximum number
of entities that can be involved in a particular relationship), modality (the minimum
number of entity occurrences that can be involved in a relationship), intersection
data (data that describes a many-to-many relationship), and associative entities.
KEY TERMS
Attribute
Associative entity
Binary relationship
Cardinality
Data modeling
Entity
Entity-relationship (E-R) diagram
Entity-relationship (E-R) model
Intersection data
Many-to-many relationship
Modality
One-to-many relationship
One-to-one relationship
Relationship
Ternary relationship
Unary relationship
Unique identifier
QUESTIONS
1. What is data modeling? Why is it important?
2. What is the Entity-Relationship model?
3. What is a relationship?
4. What are the differences among a unary relationship,
a binary relationship, and a ternary relationship?
5. Explain and compare the cardinality of a relationship
and the modality of a relationship.
6. Explain the difference between a one-to-one, a one-
to-many, and a many-to-many binary relationship.
7. What is intersection data in a many-to-many binary
relationship? What does the intersection data
describe?
8. Can a many-to-many binary relationship have no
intersection data? Explain.
9. Can intersection data be placed in the entity box
of one of the two entities in the many-to-many
relationship? Explain.
10. What is an associative entity? How does intersection
data relate to an associative entity?
11. Describe the three cases of unique identifiers for
associative entities.
12. Describe the concept of the unary relationship.
13. Explain how a unary relationship can be described
as one-to-one, one-to-many, and many-to-many if
only one entity type is involved in the relationship.
14. Describe the ternary relationship concept.
15. Can a ternary relationship have intersection data?
Explain.
16. What is a dependent entity? (See the description in
the General Hardware example.)
EXERCISES
1. Draw an entity-relationship diagram that describes
the following business environment.
The city of Chicago, IL, wants to maintain
information about its extensive system of high
schools, including its teachers and their university
degrees, its students, administrators, and the subjects
that it teaches.
Each school has a unique name, plus an address,
telephone number, year built, and size in square
feet. Students have a student number, name, home
address, home telephone number, current grade,
and age. Regarding a student’s school assignment,
the school system is only interested in keeping
track of which school a student currently attends.
Each school has several administrators, such as the
principal and assistant principals. Administrators are
identified by an employee number and also have a
name, telephone number, and office number.
Teachers are also identified by an employee
number and each has a name, age, subject specialty
such as English (assume only one per teacher),
and the year that they entered the school system.
Teachers tend to move periodically from school to
school and the school system wants to keep track
of the history of which schools the teacher has
taught in, including the current school. Included
will be the year in which the teacher entered the
school, and the highest pay rate that the teacher
attained at the school. The school system wants
to keep track of the universities that each teacher
attended, including the degrees earned and the
years in which they were earned. The school
system wants to record each university’s name,
address, year founded, and Internet URL (address).
Some teachers, as department heads, supervise other
teachers. The school system wants to keep track of
these supervisory relationships but only for teachers’
current supervisors.
The school system also wants to keep track of
the subjects that it offers (e.g. French I, Algebra III,
etc.). Each subject has a unique subject number, a
subject name, the grade level in which it is normally
taught, and the year in which it was introduced in
the school system. The school system wants to keep
track of which teacher taught which student which
subject, including the year this happened and the
grade received.
2. The following entity-relationship diagram describes
the business environment of Video Centers of
Europe, Ltd., which is a chain of videotape and
DVD rental stores. Write a verbal description of
how VCE conducts its business, based on this E-R
diagram.
Figure for Exercise 2
[E-R diagram: ACTOR (PK Name; Date of Birth, Nationality), MOVIE (PK Title; Length, Year Made), DISK (PK Serial Number; Type (DVD or Blu Ray)), STORE (PK Store Number; City, Country, Telephone), CUSTOMER (PK Customer Number; Name, Address, Telephone), and RENTAL (PK Serial Number, PK Customer Number, PK Date; Rental Price). Relationship labels: Acts in, Has actor, Recorded on, Contains, Owns, Located in, Rents, Rented by, Is rented, Involves.]
MINICASES
1. Draw an entity-relationship diagram that describes the
following business environment.
Happy Cruise Lines has several ships and a variety
of cruise itineraries, each involving several ports of
call. The company wants to maintain information on
the sailors who currently work on each of its ships.
It also wants to keep track of both its past and future
cruises and of the passengers who sailed on the former
and are booked on the latter.
Each ship has at least one and, of course, normally
many sailors on it. The unique identifier of each ship
is its ship number. Other ship attributes include ship
name, weight, year built, and passenger capacity. Each
sailor has a unique sailor identification number, as well
as a name, date of birth, and nationality. Some of the
sailors are in supervisory positions, supervising several
other sailors. Each sailor reports to just one supervisor.
A cruise is identified by a unique cruise serial number.
Other cruise descriptors include a sailing date, a return
date, and a departure port (which is also the cruise’s
ending point). Clearly, a cruise involves exactly one
ship; over time a ship sails on many cruises, but there
is a requirement to be able to list a new ship that has
not yet sailed on any cruises at all. Each cruise stops
at at least one and usually several ports of call, each of
which is normally host to many cruises, over time. In
addition, the company wants to maintain information
about ports that it has not yet used in its cruises but may
use in the future. A port is identified by its name and the
country it is in. Other information about a port includes
its population, whether a passport is required for
passengers to disembark there, and its current docking
fee, which is assumed to be the same for all ships.
Passenger information includes a unique passenger
number, name, home address, nationality, and date of
birth. A cruise typically has many passengers on it
(certainly at least one). Hoping for return business,
the company assumes that each passenger may have
sailed on several of its cruises (and/or may be booked
for a future cruise). For a person to be of interest to
the company, he or she must have sailed on or be
booked on at least one of the company’s cruises. The
company wants to keep track of how much money each
passenger paid (or will pay) for each of their cruises,
as well as their satisfaction rating of the cruise, if it has
been completed.
2. Draw an entity-relationship diagram that describes the
following business environment. The Super Baseball
League wants to maintain information about its teams,
their coaches, players, and bats. The information about
players is historical. For each team, the league wants
to keep track of all of the players who have ever played
on the team, including the current players. For each
player, it wants to know about every team the player
ever played for. On the other hand, coach affiliation
and bat information is current, only.
The league wants to keep track of each team’s team
number, which is unique, its name, the city in which
it is based, and the name of its manager. Coaches
have a name (which is assumed to be unique only
within its team) and a telephone number. Coaches
have units of work experience that are described by
the type of experience and the number of years of
that type of experience. Bats are described by their
serial numbers (which are unique only within a team)
and their manufacturer’s name. Players have a player
number that is unique across the league, a name, and
an age.
A team has at least one and usually several coaches.
A coach works for only one team. Each coach has
several units of work experience or may have none.
Each unit of work experience is associated with the
coach to whom it belongs. Each team owns at least one
and generally many bats. Currently and historically,
each team has and has had many players. To be of
interest to the league, a player must have played on at
least one and possibly many teams during his career.
Further, the league wants to keep track of the number
of years that a player has played on a team and the
batting average that he compiled on that team.
CHAPTER 3
THE DATABASE MANAGEMENT SYSTEM CONCEPT
Data has always been the key component of information systems. In the beginning
of the modern information systems era, data was stored in simple files. As
companies became more and more dependent on their data for running their businesses,
shortcomings in simple files became apparent. These shortcomings led to the development
of the database management system concept, which provides a solid basis for the modern
use of data in organizations of all descriptions.
OBJECTIVES
■ Define data-related terms such as entity and attribute and storage-related terms
such as field, record, and file.
■ Identify the four basic operations performed on stored data.
■ Compare sequential access of data with direct access of data.
■ Discuss the problems encountered in a non-database information systems
environment.
■ List the five basic principles of the database concept.
■ Describe how data can be considered to be a manageable resource.
■ List the three problems created by data redundancy.
■ Describe the nature of data redundancy among many files.
■ Explain the relationship between data integration and data redundancy in one file.
■ State the primary defining feature of a database management system.
■ Explain why the ability to store multiple relationships is an important feature of
the database approach.
■ Explain why providing support for such control issues as data security, backup
and recovery, and concurrency is an important feature of the database approach.
■ Explain why providing support for data independence is an important feature of
the database approach.
CHAPTER OUTLINE
Introduction
Data Before Database Management
Records and Files
Basic Concepts in Storing and Retrieving Data
The Database Concept
Data as a Manageable Resource
Data Integration and Data Redundancy
Multiple Relationships
Data Control Issues
Data Independence
DBMS Approaches
Summary
INTRODUCTION
Before the database concept was developed, all data in information systems (then
generally referred to as ‘‘data processing systems’’) was stored in simple linear
files. Some applications and their programs required data from only one file. Some
applications required data from several files. Some of the more complex applications
used data extracted from one file as the search argument (the item to be found)
for extracting data from another file. Generally, files were created for a single
application and were used only for that application. There was no sharing of files or
of data among applications and, as a result, the same data often appeared redundantly
in multiple files. In addition to this data redundancy among multiple files, a lack of
sophistication in the design of individual files often led to data redundancy within
those individual files.
As information systems continued to grow in importance, a number of
the ground rules began to change. Hardware became cheaper—much cheaper
relative to the computing power that it provided. Software development took on a
more standardized, ‘‘structured’’ form. Large backlogs of new applications to be
implemented built up, making the huge amount of time spent on maintaining existing
programs more and more unacceptable. It became increasingly clear that the lack of
a focus on data was one of the major factors in this program maintenance dilemma.
Furthermore, the redundant data across multiple files and even within individual
files was causing data accuracy nightmares (to be explained further in this chapter),
just as companies were relying more and more on their information systems to
substantially manage their businesses. As we will begin to see in this chapter, the
technology that came to the rescue was the database management system.
Summarizing, the problems included:
■ Data was stored in different formats in different files.
■ Data was often not shared among different programs that needed it, necessitating
the duplication of data in redundant files.
■ Little was understood about file design, resulting in redundant data within
individual files.
■ Files often could not be rebuilt after damage by a software error or a hardware
failure.
■ Data was not secure and was vulnerable to theft or malicious mischief by people
inside or outside the company.
■ Programs were usually written in such a manner that if the way that the data was
stored changed, the program had to be modified to continue working.
■ Changes in everything from access methods to tax tables required programming
changes.
This chapter begins by presenting some basic definitions and concepts
about data. It then describes the type of file environment that existed before
database management emerged, examines the problems inherent in that
environment, and shows how the database concept overcame them and set the
stage for a vastly improved information systems environment.
DATA BEFORE DATABASE MANAGEMENT
As we said in Chapter 1, pieces of data are facts in our environment that are
important to us. Usually we have many facts to describe something of interest to us.
For example, let’s consider the facts we might be interested in about an employee
of ours named John Baker. Our company is a sales-oriented company and John
Baker is one of our salespersons. We want to remember that his employee number
(which we will now call his salesperson number) is 137. We are also interested in
the facts that his commission percentage on the sales he makes is 10%, his home
city is Detroit, his home state is Michigan, his office number is 1284, and he was
hired in 1995. There are, of course, reasons that we need to keep track of these facts
about John Baker, such as generating his paycheck every week. It certainly seems
reasonable to collect together all of the facts about Baker that we need and to hold
all of them together. Figure 3.1 shows all of these facts about John Baker presented
in an organized way.
Records and Files
Since we have to generate a paycheck each week for every employee in our
company, not just for Baker, we are obviously going to need a collection of facts
like those in Figure 3.1 for every one of our employees. Figure 3.2 shows a portion
of that collection.
FIGURE 3.1
Facts about salesperson Baker
Salesperson  Salesperson                   Office  Commission  Year of
Number       Name         City      State  Number  Percentage  Hire
137          Baker        Detroit   MI     1284    10          1995
FIGURE 3.2
Salesperson file
Salesperson  Salesperson                   Office  Commission  Year of
Number       Name         City      State  Number  Percentage  Hire
119          Taylor       New York  NY     1211    15          2003
137          Baker        Detroit   MI     1284    10          1995
186          Adams        Dallas    TX     1253    15          2001
204          Dickens      Dallas    TX     1209    10          1998
255          Lincoln      Atlanta   GA     1268    20          2003
361          Carlyle      Detroit   MI     1227    20          2001
420          Green        Tucson    AZ     1263    10          1993
CONCEPTS IN ACTION
3-A MEMPHIS LIGHT, GAS AND WATER
Memphis Light, Gas and Water
(MLGW) is the largest ‘‘three-service’’ (electricity, natural
gas and water) municipal utility system in the United
States. It serves over 400,000 customers in Memphis and
Shelby County, TN, and has 2,600 employees. MLGW is
the largest of the 159 distributors of the federal Tennessee
Valley Authority’s electricity output. It brings in natural
gas via commercial pipelines and it supplies water from
a natural aquifer beneath the city of Memphis.
Like any supplier of electricity, MLGW is particularly
sensitive to electrical outages. It has developed a two-
stage application system to determine the causes of
outages and to dispatch crews to fix them. The first
stage is the Computer-Aided Restoration of Electric
Service (CARES) system, which was introduced in 1996.
Beginning with call-in patterns as customers report
outages, CARES uses automated data from MLGW’s
electric grid, wiring patterns to substations, and other
information, to function as an expert system to determine
the location and nature of the problem. It then feeds
its conclusion to the second-stage Mobile Dispatching
System (MDS), which was introduced in 1999. MDS
sends a repairperson to an individual customer’s location
if that is all that has been affected or sends a crew to
a malfunctioning or damaged piece of equipment in the
grid that is affecting an entire neighborhood. There is a
feedback loop in which the repairperson or crew reports
back to indicate whether the problem has been fixed or
a higher-level crew is required to fix it.
The CARES and MDS systems are supported by
an Oracle database running on Hewlett-Packard and
Compaq Alpha Unix platforms. The database includes
a wide range of tables: a Customer Call table has one
record per customer reporting call; an Outage table has
one record per outage; a Transformer table has one
record for each transformer in the grid; a Device table
has records for other devices in the grid. These can
also interface to the Customer Information System, which
has a Customer table with one record for each of the
over 400,000 customers. In addition to its operational
value, CARES and other systems feed a System Reliability
Monitoring database that generates reports on outages
and can be queried to gain further knowledge of outage
patterns for improving the grid.
Let’s proceed by revisiting some terminology from Chapter 2, and introducing
some additional terminology along with some additional concepts. What we have
been loosely referring to as a ‘‘thing’’ or ‘‘object’’ in our environment that we want
to keep track of is called an entity. Remember that this is the real physical object or
event, not the facts about it. John Baker, the real, living, breathing person whom you
can go over to and touch, is an entity. A collection of entities of the same type (e.g.,
all the company’s employees) is called an entity set. An attribute is a property of,
a characteristic of, or a fact that we know about an entity. The characteristics and
properties of John Baker, including his salesperson number 137, his name, city of
Detroit, state of Michigan, office number 1284, commission percentage 10, and year
of hire 1995, are all attributes of John Baker. Some attributes have unique values
within an entity set. For example, the salesperson numbers are unique within the
salesperson entity set, meaning each salesperson has a different salesperson number.
We can use the fact that salesperson numbers are unique to distinguish among the
different salespersons.
Using the structure in Figure 3.2, we can define some standard file-structure
terms and relate them to the terms entity, entity set, and attribute. Each row in
Figure 3.2 describes a single entity. In fact, each row contains all the facts that we
know about a particular entity. The first row contains all the facts about salesperson
119, the second row contains all the facts about salesperson 137, and so on. Each
row of a structure like this is called a record. The columns representing the facts
are called fields. The entire structure is called a file. The file in Figure 3.2, which
is about the most basic kind of file imaginable, is often called a simple file or a
simple linear file (linear because it is a collection of records listed one after the
other in a long line). Since the salesperson number attribute is unique, the salesperson number field
values can be used to distinguish the individual records of the file. Speaking loosely
at this point, the salesperson number field can be referred to as the key field or key
of the file.
Tying together the two kinds of terminology that we have developed, a record
of a file describes an entity, a whole file contains the descriptions of an entire entity
set, and a field of a record contains an attribute of the entity described by that
record. In Figure 3.2, each row is a record that describes an entity, specifically a
single salesperson. The whole file, row by row or record by record, describes each
salesperson in the collection of salespersons. Each column of the file represents a
different attribute of salespersons. At the row or entity level, the salesperson name
field for the third row of the file indicates that the third salesperson, salesperson
186, has Adams as his salesperson name attribute, i.e. he is named Adams.
One last terminology issue is the difference between the terms ‘‘type’’ and
‘‘occurrence.’’ Let’s talk about it in the context of a record. If you look at a file,
like that in Figure 3.2, there are two ways to describe ‘‘a record.’’ One, which is
referred to as the record type, is a structural description of each and every record in
the file. Thus, we would describe the salesperson record type as a record consisting
of a salesperson number field, a salesperson name field, a city field, and so forth.
This is a general description of what any of the salesperson records looks like. The
other way of describing a record is referred to as a record occurrence or a record
instance. A specific record of the salesperson file is a record occurrence or instance.
Thus, we would say that, for example, the set of values {186, Adams, Dallas, TX,
1253, 15, 2001} is an occurrence of the salesperson record type.
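The terminology above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SalespersonRecord`, the field names, and the use of a Python list to stand in for a simple linear file are choices made for this sketch and do not come from the text. The data values are those of Figure 3.2.

```python
from dataclasses import dataclass, astuple


# Record type: the structural description shared by every salesperson record.
# (Class and field names are hypothetical, chosen for this sketch.)
@dataclass
class SalespersonRecord:
    salesperson_number: int  # the key field: unique within the file
    salesperson_name: str
    city: str
    state: str
    office_number: int
    commission_percentage: int
    year_of_hire: int


# The file: a collection of record occurrences listed one after the other.
# Each occurrence describes one entity (one salesperson).
salesperson_file = [
    SalespersonRecord(119, "Taylor", "New York", "NY", 1211, 15, 2003),
    SalespersonRecord(137, "Baker", "Detroit", "MI", 1284, 10, 1995),
    SalespersonRecord(186, "Adams", "Dallas", "TX", 1253, 15, 2001),
    SalespersonRecord(204, "Dickens", "Dallas", "TX", 1209, 10, 1998),
    SalespersonRecord(255, "Lincoln", "Atlanta", "GA", 1268, 20, 2003),
    SalespersonRecord(361, "Carlyle", "Detroit", "MI", 1227, 20, 2001),
    SalespersonRecord(420, "Green", "Tucson", "AZ", 1263, 10, 1993),
]

# A field can serve as the key only if its values are unique across the file.
key_values = [record.salesperson_number for record in salesperson_file]
key_is_unique = len(key_values) == len(set(key_values))

# The third record occurrence, written out as a plain set of values.
third_occurrence = astuple(salesperson_file[2])
```

Here `SalespersonRecord` plays the role of the record type, while each element of `salesperson_file` is a record occurrence; `third_occurrence` holds the values (186, Adams, Dallas, TX, 1253, 15, 2001) cited above.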
YOUR TURN
3.1 ENTITIES AND ATTRIBUTES
Entities and their attributes are all
around us in our everyday lives. Normally, we don’t stop
to think about the objects or events in our world formally
as entities with their attributes, but they’re there.
QUESTION:
Choose an object in your world that you interact with
frequently. It might be a university, a person, an
automobile, your home, etc. Make a list of some of
the chosen entity’s attributes. Then, generalize them to
‘‘type.’’ For example, you may have a backpack (an
entity) that is green in color (an attribute of that entity).
Generalize that to the entity set of all backpacks and
to the attribute type color. Next, go through the same
exercise for an event in your life, such as taking a
particular exam, your last birthday party, eating dinner
last night, etc.
Basic Concepts in Storing and Retrieving Data
Having established the idea of a file and its records, we can now, in simple terms at
this point, envision a company’s data as a large collection of files. The next step is to
discuss how we might want to access data from these files and otherwise manipulate
the data in them.
Retrieving and Manipulating Data
There are four fundamental operations that can
be performed on stored data, whether it is stored in the form of a simple linear file,
such as that of Figure 3.2, or in any other form. They are:
■ Retrieve or Read
■ Insert
■ Delete
■ Update
It is convenient to think of each of these operations as basically involving one
record at a time, although in practice they can involve several records at once, as
we will see later in the book. Retrieving or reading a record means looking at a
record’s contents without changing them. For example, using the Salesperson file
of Figure 3.2, we might read the record for salesperson 204 because we want to
find out what year she was hired. Insertion means adding a new record to the file,
as when a new salesperson is hired. Deletion means deleting a record from the
file, as when a salesperson leaves the company. Updating means changing one or
more of a record’s field values, for example if we want to increase salesperson
420’s commission percentage from 10 to 15. There is clearly a distinction between
retrieving or reading data and the other three operations. Retrieving data allows a
user to refer to the data for some business purpose without changing it. All of the
other three operations involve changing the data. Different topics in this book will
focus on one or another of these operations simply because a particular one of the
four operations may be more important for a particular topic than the others.
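As a minimal sketch of the four operations, the Python below treats the Salesperson file as a dictionary keyed by salesperson number. The function names, the dictionary layout, and the inserted salesperson 500 (Evans) are assumptions made for illustration; only a subset of the fields of Figure 3.2 is carried.

```python
# The stored data: salesperson number -> a subset of the fields of Figure 3.2.
salesperson_file = {
    119: {"name": "Taylor", "commission": 15, "year_of_hire": 2003},
    137: {"name": "Baker", "commission": 10, "year_of_hire": 1995},
    186: {"name": "Adams", "commission": 15, "year_of_hire": 2001},
    204: {"name": "Dickens", "commission": 10, "year_of_hire": 1998},
    255: {"name": "Lincoln", "commission": 20, "year_of_hire": 2003},
    361: {"name": "Carlyle", "commission": 20, "year_of_hire": 2001},
    420: {"name": "Green", "commission": 10, "year_of_hire": 1993},
}


def retrieve(records, number):
    """Read a record without changing it (a copy is returned)."""
    return dict(records[number])


def insert(records, number, record):
    """Add a new record; refuse to overwrite an existing key value."""
    if number in records:
        raise KeyError(f"salesperson {number} already exists")
    records[number] = dict(record)


def delete(records, number):
    """Remove a record from the file."""
    del records[number]


def update(records, number, **changes):
    """Change one or more field values of an existing record."""
    records[number].update(changes)


# Retrieve: find the year in which salesperson 204 was hired.
year_204 = retrieve(salesperson_file, 204)["year_of_hire"]

# Insert, then delete, a hypothetical new salesperson (not in Figure 3.2).
insert(salesperson_file, 500, {"name": "Evans", "commission": 10, "year_of_hire": 2010})
delete(salesperson_file, 500)

# Update: raise salesperson 420's commission percentage from 10 to 15.
update(salesperson_file, 420, commission=15)
```

Note that only `retrieve` leaves the stored data untouched; the other three functions all change the contents of `salesperson_file`, which mirrors the distinction drawn above.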
One particularly important concept concerning data retrieval is that, while
information systems applications come in a countless number of variations, there
are fundamentally only two kinds of access to stored data that any of them require.
These two ways of retrieving data are known as sequential access and direct
access.
Sequential Access
The term sequential access means the retrieval of all or a portion
of the records of a file one after another, in some sequence, starting from the
beginning, until all the required records have been retrieved. This could mean all the
records of the file, if that is the goal, or all the records up to some point, such as up
to the point that a record being searched for is found. The records will be retrieved
in some order and there are two possibilities for this. In ‘‘physical’’ sequential
access, the records are retrieved one after the other, just as they are stored on the disk
device (more on these devices later). In ‘‘logical’’ sequential access the records
are retrieved in order based on the values of one or a combination of the fields.
Assuming the records of the Salesperson file of Figure 3.2 are stored on the
disk in the order shown in the figure, if they are retrieved in physical sequence they
will be retrieved in the order shown in the figure. However, if, for example, they
are to be retrieved in logical sequence based on the Salesperson Name field, then
the record for Adams would be retrieved first, followed by the record for Baker,
followed by the record for Carlyle, and so on in alphabetic order. An example of
an application that would require the sequential retrieval of the records of this file
would be the weekly payroll processing. If the company wants to generate a payroll
check for each salesperson in the order of their salesperson numbers, it can very
simply retrieve the records physically sequentially, since that’s the order in which
they are stored on the disk. If the company wants to produce the checks in the order
of the salespersons’ names, it will have to perform a logical sequential retrieval
based on the Salesperson Name field. It can do this either by sorting the records on
the Salesperson Name field or by using an index (see below) that is built on this
field.
We said that sequential access could involve retrieving a portion of the records
of a file. This sense of sequential retrieval usually means starting from the beginning
of the file and searching every record, in sequence, until finding a particular record
that is being sought. Obviously, this could take a long time for even a moderately
large file and so is not a particularly desirable kind of operation, which leads to the
concept of direct access.
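The three flavors of sequential access described above can be sketched as follows. This is an illustration only: a Python list stands in for the records as stored on disk, sorting is used for the logical sequence (an index would be the alternative), and the function name `sequential_search` is hypothetical.

```python
from operator import itemgetter

# (salesperson number, salesperson name) in the stored order of Figure 3.2.
salesperson_file = [
    (119, "Taylor"),
    (137, "Baker"),
    (186, "Adams"),
    (204, "Dickens"),
    (255, "Lincoln"),
    (361, "Carlyle"),
    (420, "Green"),
]

# Physical sequential access: read the records exactly as they are stored.
physical_order = [number for number, _name in salesperson_file]

# Logical sequential access: read the records in order of a field's values,
# here by sorting on the Salesperson Name field.
logical_order = [name for _number, name in sorted(salesperson_file, key=itemgetter(1))]


def sequential_search(records, wanted_number):
    """Scan from the beginning until the wanted record is found.

    Returns the record (or None) and the number of records that were read.
    """
    records_read = 0
    for record in records:
        records_read += 1
        if record[0] == wanted_number:
            return record, records_read
    return None, records_read


found, records_read = sequential_search(salesperson_file, 361)
```

Finding salesperson 361 this way reads six of the seven records. The cost grows with the size of the file, which is why this kind of search is undesirable for large files.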
Direct Access
The other mode of access is direct access. Direct access is the
retrieval of a single record of a file or a subset of the records of a file based on
one or more values of a field or a combination of fields in the file. For example, in
the Salesperson file of Figure 3.2, if we need to retrieve the record for salesperson
204 to find out her year of hire, we would perform a direct access operation on
the file specifying that we want the record with a value of 204 in the Salesperson
Number field. How do we know that we would retrieve only one record? Because
the Salesperson Number field is the unique, key field of the file, there can only be
one record (or none) with any one particular value. Another possibility is that we
want to retrieve the records for all the salespersons with a commission percentage of
10. The subset of the records retrieved would consist of the records for salespersons
137, 204, and 420.
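One way to picture the two direct access examples above is with Python dictionaries standing in for whatever lookup mechanism the file system provides. This is an assumption made for illustration, not a description of any particular access method; the names `by_number` and `by_commission` are hypothetical.

```python
from collections import defaultdict

# (number, name, commission percentage, year of hire) from Figure 3.2.
records = [
    (119, "Taylor", 15, 2003),
    (137, "Baker", 10, 1995),
    (186, "Adams", 15, 2001),
    (204, "Dickens", 10, 1998),
    (255, "Lincoln", 20, 2003),
    (361, "Carlyle", 20, 2001),
    (420, "Green", 10, 1993),
]

# Lookup on the unique key field: each key value leads to at most one record.
by_number = {record[0]: record for record in records}

# Lookup on a non-unique field: each value leads to a subset of the records.
by_commission = defaultdict(list)
for record in records:
    by_commission[record[2]].append(record[0])

# Direct access by key: the year of hire of salesperson 204.
year_204 = by_number[204][3]

# A key value that is not in the file yields no record at all.
missing = by_number.get(999)

# Direct access by a non-unique field: everyone with a commission of 10.
ten_percent = by_commission[10]
```

In both cases the wanted records are reached from the field value itself, without reading through the records that precede them in the file.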
Direct access is a crucial concept in information systems today. If you
telephone a bank with a question about your account, you would not be happy
having to wait on the phone while the bank’s information system performs a
sequential access of its customer file until it finds your record. Clearly this example
calls for direct access. In fact, the vast majority of information systems operations
that all companies perform today require direct access.
Both sequential access and direct access can certainly be accomplished with
data stored in simple files. But simple files leave a lot to be desired. What is the
concept of database and what are its advantages?
THE DATABASE CONCEPT
The database concept is one of the most powerful, enduring technologies in
the information systems environment. It encompasses a variety of technical and
managerial issues and features that are at the heart of today’s information systems
scene. In order to get started and begin to develop the deep understanding of
database that we seek, we will focus on five issues that establish a set of basic
principles of the database concept:
1. The creation of a datacentric environment in which a company’s data can
truly be thought of as a significant corporate resource. A key feature of this
environment is the ability to share data among those inside and outside of the
company who require access to it.
2. The ability to achieve data integration while at the same time storing data
in a non-redundant fashion. This, alone, is the central, defining feature of the
database approach.
3. The ability to store data representing entities involved in multiple relationships
without introducing data redundancy or other structural problems.
4. The establishment of an environment that manages certain data control issues,
such as data security, backup and recovery, and concurrency control.
5. The establishment of an environment that permits a high degree of data
independence.
Data as a Manageable Resource
Broadly speaking, the information systems environment consists of several
components including hardware, networks, applications software, systems software,
people, and data. The relative degree of focus placed on each of these has varied
over time. In particular, the amount of attention paid to data has undergone a
radical transformation. In the earlier days of ‘‘data processing,’’ most of the time
and emphasis in application development was spent on the programs, as opposed
to on the data and data structures. Hardware was expensive and the size of main
memory was extremely limited by today’s standards. Programming was a new
discipline and there was much to be learned about it in order to achieve the goal
of efficient processing. Standards for effective programming were unknown. In this
environment, the treatment of the data was hardly the highest-priority concern.
At the same time, as more and more corporate functions at the operational,
tactical, and strategic levels became dependent on information systems, data
increasingly became recognized as an important corporate resource. Furthermore,
the corporate community became increasingly convinced that a firm’s data
about its products, manufacturing processes, customers, suppliers, employees,
and competitors could, with proper storage and use, give the firm a significant
competitive advantage.
FIGURE 3.3
Corporate resources
[Figure: people, money, plant and equipment, inventory, and data]
Money, plant and equipment, inventories, and people are all important
enterprise resources and, indeed, a great deal of effort has always been expended
to manage them. As corporations began to realize that data is also an important
enterprise resource, it became increasingly clear that data would have to be managed
in an organized way, too (Figure 3.3). What was needed was a software utility that
could manage and protect data while providing controlled shared access to it so that
it could fulfill its destiny as a critical corporate resource. Out of this need was born
the database management system.
As we look to the future and look back at the developments of the last few years,
we see several phenomena that emphasize the importance of data and demand its
careful management as a corporate resource. These include reengineering, electronic
commerce, and enterprise resource planning (ERP) systems that have placed an
even greater emphasis on data. In reengineering, data and information systems are
aggressively used to redesign business processes for maximum efficiency. At the
heart of every electronic commerce Web site is a database through which companies
and their customers transact business. Another very important development was that
of enterprise resource planning (ERP) systems, which are collections of application
programs built around a central shared database. ERP systems very much embody
the principles of shared data and of data as a corporate resource.
Data Integration and Data Redundancy
Data integration and data redundancy, each in their own right, are critical issues in
the field of database management.
■ Data integration refers to the ability to tie together pieces of related data within
an information system. If a record in one file contains customer name, address,
and telephone data and a record in another file contains sales data about an item
that the customer has purchased, there may come a time when we want to contact
the customer about the purchased item.
■ Data redundancy refers to the same fact about the business environment being
stored more than once within an information system. Data integration is clearly a
positive feature of a database management system. Data redundancy is a negative
feature (except for performance reasons under certain circumstances that will be
discussed later in this book).
In terms of the data structures used in database management systems, data
integration and data redundancy are tied together and will be discussed together in
this section of the book.
Data stored in an information system describes the real-world business
environment. Put another way, the data is a reflection of the environment. Over the
years that information systems have become increasingly sophisticated, they and
the data that they contain have revolutionized the ways that we conduct virtually
all aspects of business. But, as valuable as the data is, if the data is duplicated
and stored multiple times within a company’s information systems facilities, it can
result in a nightmare of poor performance, lack of trust in the accuracy of the data,
and a reduced level of competitiveness in the marketplace. Data redundancy and
the problems it causes can occur within a single file or across multiple files. The
problems caused by data redundancy are threefold:
■ First, the redundant data takes up a great deal of extra disk space. This alone can
be quite significant.
■ Second, if the redundant data has to be updated, additional time is needed to do
so since, if done correctly, every copy of the redundant data must be updated.
This can create a major performance issue.
■ Third, and potentially most significant, is the risk of data integrity
problems. The term data integrity refers to the accuracy of the data. Obviously,
if the data in an information system is inaccurate, it and the whole information
system are of limited value. The problem with redundant data, whether in a single
file or across multiple files, occurs when it has to be updated (or possibly when
it is first stored). If data is held redundantly and not every copy of the data
being updated is correctly changed to the new value, there is clearly
a problem in data integrity. There is an old saying that has some applicability
here: ‘‘The person with one watch always knows what time it is. The person with
several watches is never quite sure’’ (Figure 3.4).
FIGURE 3.4
With several watches the correct time might not be clear
Data Redundancy Among Many Files
Beginning with data redundancy across multiple
files, consider the following situation involving customer names and addresses.
Frequently, different departments in an enterprise in the course of their normal
everyday work need the same data. For example, the sales department, the accounts
receivable department, and the credit department may need customer name and
address data. Often, the solution to this multiple need is redundant data. The sales
department has its own stored file that, among other things, contains the customer
name and address, and likewise for the accounts receivable and credit departments,
Figure 3.5.
FIGURE 3.5
Three files with redundant data
Sales file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  123 Elm Street
Accounts Receivable file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  123 Elm Street
Credit file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  123 Elm Street
One day customer John Jones, who currently lives at 123 Elm Street, moves
to 456 Oak Street. If his address is updated in two of the files but not the third, then
the company’s data is inconsistent, Figure 3.6. Two of the files indicate that John
Jones lives at 456 Oak Street but one file still shows him living at 123 Elm Street.
The company can no longer trust its information system. How could this happen?
It could have been a software or a hardware error. But more likely it was because
whoever received the new information and was responsible for updating one or two
of the files simply did not know of the existence of the third. As mentioned earlier,
at various times in information systems history it has not been unusual in large
companies for the same data to be held redundantly in sixty or seventy files! Thus,
the possibility of data integrity problems is great.
FIGURE 3.6
Three files with a data integrity problem
Sales file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  456 Oak Street
Accounts Receivable file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  456 Oak Street
Credit file
Customer  Customer    Customer
Number    Name        Address
2746795   John Jones  123 Elm Street
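The John Jones scenario can be reproduced in a short Python sketch. The dictionaries standing in for the three departmental files and the helper names `update_address` and `addresses_on_record` are assumptions made for illustration; the data values are those of Figures 3.5 and 3.6.

```python
# Three departmental files, each holding its own copy of the same facts.
sales_file = {2746795: {"name": "John Jones", "address": "123 Elm Street"}}
accounts_receivable_file = {2746795: {"name": "John Jones", "address": "123 Elm Street"}}
credit_file = {2746795: {"name": "John Jones", "address": "123 Elm Street"}}

all_files = [sales_file, accounts_receivable_file, credit_file]


def update_address(files, customer_number, new_address):
    """Change the address in every file the updater knows about."""
    for customer_file in files:
        customer_file[customer_number]["address"] = new_address


def addresses_on_record(files, customer_number):
    """Collect the distinct addresses held for one customer."""
    return {customer_file[customer_number]["address"] for customer_file in files}


# The person making the change knows about only two of the three files.
update_address([sales_file, accounts_receivable_file], 2746795, "456 Oak Street")

addresses = addresses_on_record(all_files, 2746795)
is_consistent = len(addresses) == 1
```

After the partial update, `addresses` holds two different values for the same customer and `is_consistent` is false. With a single shared copy of the address, the same update would touch one place only and this inconsistency could not arise.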
Multiple-file redundancy is more of a managerial issue than single-file
redundancy, but it also has technical components. The issue is managerial to the
extent that a company’s management does not encourage data sharing among
departments and their applications. But it is technical when it comes to the reality
of whether the company’s software systems are capable of providing shared access
to the data without compromising performance and data security.
Data Integration and Data Redundancy Within One File
Data redundancy in a single
file results in exactly the same three problems that resulted from data redundancy
in multiple files: wasted storage space, extra time on data update, and the potential
for data integrity problems. To begin developing this scenario, consider Figure 3.7,
which shows two files from the General Hardware Co. information system. General
Hardware is a wholesaler of hardware, tools, and related items. Its customers are
hardware stores, home improvement stores, and department stores, or chains of
such stores. Figure 3.7a shows the Salesperson file, which has one record for each
of General Hardware’s salespersons. Salesperson Number is the unique identifying
‘‘key’’ field and as such is underlined in the figure. Clearly, there is no data
redundancy in this file. There is one record for each salesperson and each individual
fact about a salesperson is listed once in the salesperson’s record.
FIGURE 3.7
General Hardware Company files
(a) Salesperson file
Salesperson  Salesperson  Commission  Year
Number       Name         Percentage  of Hire
137          Baker        10          1995
186          Adams        15          2001
204          Dickens      10          1998
361          Carlyle      20          2001
(b) Customer file
Customer  Customer              Salesperson
Number    Name                  Number       HQ City
0121      Main St. Hardware     137          New York
0839      Jane’s Stores         186          Chicago
0933      ABC Home Stores       137          Los Angeles
1047      Acme Hardware Store   137          Los Angeles
1525      Fred’s Tool Stores    361          Atlanta
1700      XYZ Stores            361          Washington
1826      City Hardware         137          New York
2198      Western Hardware      204          New York
2267      Central Stores        186          New York
Figure 3.7b shows General Hardware’s Customer file. Customer Number is
the unique key field. Again, there is no data redundancy, but two questions have
to be answered regarding the Salesperson Number field appearing in this file. First,
why is it there? After all, it seems already to have a good home as the unique
identifying field of the Salesperson file. The Salesperson Number field appears in
the Customer file to record which salesperson is responsible for a given customer
account. In fact, there is a one-to-many relationship between salespersons and
customers. A salesperson can and generally does have several customer accounts,
while each customer is serviced by only one General Hardware salesperson. The
second question involves the data in the Salesperson Number field in the Customer
file. For example, salesperson number 137 appears in four of the records (plus once
in the first record of the Salesperson file!). Does this constitute data redundancy?
The answer is no. For data to be redundant (and examples of data redundancy will be
coming up shortly), the same fact about the business environment must be recorded
more than once. The appearance of salesperson number 137 in the first record of
the Salesperson file establishes 137 as the identifier of one of the salespersons.
The appearance of salesperson number 137 in the first record of the Customer file
indicates that salesperson number 137 is responsible for customer number 0121. This
is a different fact about the business environment. The appearance of salesperson
number 137 in the third record of the Customer file indicates that salesperson
number 137 is responsible for customer number 0933. This is yet another distinct
fact about the business environment. And so on through the other appearances of
salesperson number 137 in the Customer file.
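The point that repeated salesperson numbers are not redundant data can be checked with a small Python sketch. The variable names are hypothetical; the data is the Customer file of Figure 3.7b. Each (customer number, salesperson number) pair is treated as one fact about the business environment.

```python
# (customer number, customer name, salesperson number, HQ city), Figure 3.7b.
customer_file = [
    ("0121", "Main St. Hardware", 137, "New York"),
    ("0839", "Jane's Stores", 186, "Chicago"),
    ("0933", "ABC Home Stores", 137, "Los Angeles"),
    ("1047", "Acme Hardware Store", 137, "Los Angeles"),
    ("1525", "Fred's Tool Stores", 361, "Atlanta"),
    ("1700", "XYZ Stores", 361, "Washington"),
    ("1826", "City Hardware", 137, "New York"),
    ("2198", "Western Hardware", 204, "New York"),
    ("2267", "Central Stores", 186, "New York"),
]

# One-to-many: each salesperson number maps to several customer numbers.
customers_by_salesperson = {}
for customer_number, _name, salesperson_number, _city in customer_file:
    customers_by_salesperson.setdefault(salesperson_number, []).append(customer_number)

# Each "salesperson S is responsible for customer C" pair is a distinct fact.
responsibility_facts = {
    (customer_number, salesperson_number)
    for customer_number, _name, salesperson_number, _city in customer_file
}
no_fact_stored_twice = len(responsibility_facts) == len(customer_file)
```

Salesperson number 137 appears in four records, yet `no_fact_stored_twice` is true: every appearance records responsibility for a different customer.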
Retrieving data from each of the files of Figure 3.7 individually is
straightforward and can be done on a direct basis if the files are set up for direct
access. Thus, if there is a requirement to find the name or commission percentage
or year of hire of salesperson number 204, it can be satisfied by retrieving the
record for salesperson number 204 in the Salesperson file. Similarly, if there is a
requirement to find the name or responsible salesperson (by salesperson number!)
or headquarters city of customer number 1525, we simply retrieve the record for
customer number 1525 in the Customer file.
But, what if there is a requirement to find the name of the salesperson
responsible for a particular customer account, say for customer number 1525? Can
this requirement be satisfied by retrieving data from only one of the two files of
Figure 3.7? No, it cannot! The information about which salesperson is responsible
for which customers is recorded only in the Customer file and the salesperson
names are recorded only in the Salesperson file. Thus, finding the salesperson
name will be an exercise in data integration. In order to find the name of the
salesperson responsible for a particular customer, first the record for the customer
in the Customer file would have to be retrieved. Then, using the salesperson number
found in that record, the correct salesperson record can be retrieved from the
Salesperson file to find the salesperson name. For example, if there is a need to
find the name of the salesperson responsible for customer number 1525, the first
operation would be to retrieve the record for customer number 1525 in the Customer
file. As shown in Figure 3.7b, this would yield salesperson number 361 as the
number of the responsible salesperson. Then, accessing the record for salesperson
361 in the Salesperson file in Figure 3.7a determines that the name of the salesperson
responsible for customer 1525 is Carlyle. While it’s true that the data in the record
in the Salesperson file and the data in the record in the Customer file have been
integrated, the data integration process has been awfully laborious.
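The two-step retrieval just described can be sketched in a few lines of Python. This is an illustration only: the names `salesperson_file`, `customer_file`, and `salesperson_name_for` are hypothetical, keyed dictionaries merely stand in for files set up for direct access, and only a few of the records are shown.

```python
# Hypothetical stand-ins for the two files of Figure 3.7. Each dictionary is
# keyed by the file's identifying field, imitating direct access.
salesperson_file = {
    "137": {"name": "Baker", "commission_pct": 10, "year_of_hire": 1995},
    "204": {"name": "Dickens", "commission_pct": 10, "year_of_hire": 1998},
    "361": {"name": "Carlyle", "commission_pct": 20, "year_of_hire": 2001},
}
customer_file = {
    "0121": {"name": "Main St. Hardware", "salesperson_number": "137", "hq_city": "New York"},
    "1525": {"name": "Fred's Tool Stores", "salesperson_number": "361", "hq_city": "Atlanta"},
    "2198": {"name": "Western Hardware", "salesperson_number": "204", "hq_city": "New York"},
}


def salesperson_name_for(customer_number):
    """Find the name of the salesperson responsible for one customer."""
    # Access 1: retrieve the customer record to learn the salesperson number.
    customer = customer_file[customer_number]
    # Access 2: use that number to retrieve the salesperson record.
    salesperson = salesperson_file[customer["salesperson_number"]]
    return salesperson["name"]


print(salesperson_name_for("1525"))  # prints Carlyle
```

Even in this small sketch, the program itself has to know which file to read first and which field links the two files. That knowledge is exactly the custom-made integration logic the text calls laborious.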
This kind of custom-made, multicommand, multifile access (which, by the
way, could easily require more than two files, depending on the query and the files
involved) is clumsy, potentially error prone, and expensive in terms of performance.
While the two files have the benefit of holding data non-redundantly, what is lacking
is a good level of data integration. That is, it is overly difficult to find and retrieve
pieces of data in the two files that are related to each other. For example, customer
number 1525 and salesperson name Carlyle in the two files in Figure 3.7 are related
to each other by virtue of the fact that the two records they are in both include
a reference to salesperson number 361. Yet, as shown above, ultimately finding
the salesperson name Carlyle by starting with the customer number 1525 is an
unacceptably laborious process.
A fair question to ask is, if we knew that data integration was important in
this application environment and if we knew that there would be a frequent need to
find the name of the salesperson responsible for a particular customer, why were
the files structured as in Figure 3.7 in the first place? An alternative arrangement is
shown in Figure 3.8. The single file in Figure 3.8 combines the data in the two files
of Figure 3.7. Also, the Customer Number field values of both are identical.
The file in Figure 3.8 was created by merging the salesperson data from
Figure 3.7a into the records of Figure 3.7b, based on corresponding salesperson
numbers. As a result, notice that the number of records in the file in Figure 3.8
is identical to the number of records in the Customer file of Figure 3.7b. This is
actually a result of the “direction” of the one-to-many relationship in which each
salesperson can be associated with several customers. The data was “integrated”
in this merge operation. Notice, for example, that in Figure 3.7b, the record
for customer number 1525 is associated with salesperson number 361. In turn,
in Figure 3.7a, the record for salesperson number 361 is shown to have the name
Carlyle. Those two records were merged, based on the common salesperson number,
into the record for customer number 1525 in Figure 3.8. (Notice, by the way, that the
Salesperson Number field appears twice in Figure 3.8 because it appeared in each
of the files of Figure 3.7. The field values in each of those two fields are identical
in each record in the file in Figure 3.8, which must be the case since it was on those
identical values that the record merge that created the file in Figure 3.8 was based.
That being the case, certainly one of the two Salesperson Number fields in the file
in Figure 3.8 could be deleted without any loss of information.)
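As a rough sketch of the merge operation, the following Python fragment builds combined records from a few sample salesperson and customer records. The variable names are hypothetical, and the sketch keeps only one Salesperson Number field, in line with the observation that the second copy carries no extra information.

```python
# Hypothetical in-memory copies of a few records from Figure 3.7.
salespersons = {
    137: {"salesperson_name": "Baker", "commission_pct": 10, "year_of_hire": 1995},
    204: {"salesperson_name": "Dickens", "commission_pct": 10, "year_of_hire": 1998},
    361: {"salesperson_name": "Carlyle", "commission_pct": 20, "year_of_hire": 2001},
}
customers = [
    {"customer_number": "0121", "customer_name": "Main St. Hardware",
     "salesperson_number": 137, "hq_city": "New York"},
    {"customer_number": "0933", "customer_name": "ABC Home Stores",
     "salesperson_number": 137, "hq_city": "Los Angeles"},
    {"customer_number": "1525", "customer_name": "Fred's Tool Stores",
     "salesperson_number": 361, "hq_city": "Atlanta"},
    {"customer_number": "2198", "customer_name": "Western Hardware",
     "salesperson_number": 204, "hq_city": "New York"},
]

# One output record per customer record: copy the customer fields, then add
# the fields of the salesperson whose number matches.
combined = [
    {**customer, **salespersons[customer["salesperson_number"]]}
    for customer in customers
]

record_1525 = next(r for r in combined if r["customer_number"] == "1525")
print(len(combined), record_1525["salesperson_name"])  # prints 4 Carlyle
```

The combined list has exactly as many records as the customer list, because the merge is driven from the "many" side of the one-to-many relationship.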
The file in Figure 3.8 is certainly well integrated. Finding the name of
the salesperson who is responsible for customer number 1525 now requires a
single record access of the record for customer number 1525. The salesperson
name, Carlyle, is right there in that record. This appears to be the solution to the
FIGURE 3.8
General Hardware Company combined file

Customer  Customer             Salesperson               Salesperson  Salesperson  Commission  Year
Number    Name                 Number       HQ City      Number       Name         Percentage  of Hire
0121      Main St. Hardware    137          New York     137          Baker        10          1995
0839      Jane’s Stores        186          Chicago      186          Adams        15          2001
0933      ABC Home Stores      137          Los Angeles  137          Baker        10          1995
1047      Acme Hardware Store  137          Los Angeles  137          Baker        10          1995
1525      Fred’s Tool Stores   361          Atlanta      361          Carlyle      20          2001
1700      XYZ Stores           361          Washington   361          Carlyle      20          2001
1826      City Hardware        137          New York     137          Baker        10          1995
2198      Western Hardware     204          New York     204          Dickens      10          1998
2267      Central Stores       186          New York     186          Adams        15          2001
earlier multifile access problem. Unfortunately, integrating the two files caused
another problem: data redundancy. Notice in Figure 3.8 that, for example, the fact
that salesperson number 137 is named Baker is repeated four times, as are his
commission percentage and year of hire. This is, indeed, data redundancy, as it
repeats the same facts about the business environment multiple times within the
one file. If a given salesperson is responsible for several customer accounts, then
the data about the salesperson must appear in several records in the merged or
integrated file. It would make no sense from a logical or a retrieval standpoint to
specify, for example, the salesperson name, commission percentage, and year of
hire for one customer that the salesperson services and not for another. This would
imply a special relationship between the salesperson and that one customer that
does not exist and would remove the linkage between the salesperson and his other
customers. To be complete, the salesperson data must be repeated for every one of
his customers.
The combined file in Figure 3.8 also illustrates what have come to be referred
to as anomalies in poorly structured files. The problems arise when two different
kinds of data, like salesperson and customer data in this example, are merged into
one file. Look at the record in Figure 3.8 for customer number 2198, Western
Hardware. The salesperson for this customer is Dickens, salesperson number 204.
Look over the table and note that Western Hardware happens to be the only
customer that Dickens currently has. If Western Hardware has gone out of business
or General Hardware has stopped selling to it and they decide to delete the record
for Western Hardware from the file, they also lose everything they know about
Dickens: his commission percentage, his year of hire, even his name associated with
his salesperson number, 204. This situation, which is called the deletion anomaly,
occurs because salesperson data doesn’t have its own file, as in Figure 3.7a. The
only place in the combined file of Figure 3.8 that you can store salesperson data is
in the records with the customers. If you delete a customer and that record was the
only one for that salesperson, the salesperson’s data is gone.
Conversely, in the insertion anomaly, General Hardware can’t record data in
the combined file of Figure 3.8 about a new salesperson the company just hired until
she is assigned at least one customer. After all, the identifying field of the records
of the combined file is Customer Number! Finally, the update anomaly notes that
the redundant data of the combined file, such as Baker’s commission percentage of
10 repeated four times, must be updated each place it exists when it changes (for
example, if Baker is rewarded with an increase to a commission percentage of 15).
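The deletion and update anomalies can be reproduced on a small scale. The sketch below uses a hypothetical three-record version of the combined file; the field names and the helper `known_salespersons` are assumptions made for the illustration.

```python
import copy

# Hypothetical three-record excerpt of the combined file of Figure 3.8.
combined = [
    {"customer": "0121", "sp_no": 137, "sp_name": "Baker", "commission_pct": 10},
    {"customer": "0933", "sp_no": 137, "sp_name": "Baker", "commission_pct": 10},
    {"customer": "2198", "sp_no": 204, "sp_name": "Dickens", "commission_pct": 10},
]


def known_salespersons(records):
    """Return the names of every salesperson the file still knows about."""
    return {r["sp_name"] for r in records}


# Deletion anomaly: removing Dickens's only customer removes Dickens too.
after_delete = [r for r in combined if r["customer"] != "2198"]
dickens_lost = "Dickens" not in known_salespersons(after_delete)

# Update anomaly: a careless update changes only the first of Baker's
# records, leaving two contradictory commission percentages on file.
partially_updated = copy.deepcopy(combined)
for record in partially_updated:
    if record["sp_no"] == 137:
        record["commission_pct"] = 15
        break
baker_rates = {r["commission_pct"] for r in partially_updated if r["sp_no"] == 137}

print(dickens_lost, sorted(baker_rates))  # prints True [10, 15]
```

The insertion anomaly is the mirror image of the first case: with Customer Number as the identifying field, there is simply no record in which a salesperson without customers could be stored.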
There appears to be a very significant tradeoff in the data structures between
data integration and data redundancy. The two files of Figure 3.7 are non-redundant
but have poor data integration. Finding the name of the salesperson responsible for
a particular customer account requires a multicommand, multifile access that can be
slow and error-prone. The merged file of Figure 3.8, in which the data is very well
integrated, eliminates the need for a multicommand, multifile access for this query,
but is highly data redundant. Neither of these situations is acceptable. A poor level
of data integration slows down the company’s information systems and, perhaps, its
business! Redundant data can cause data accuracy and other problems. Yet both the
properties of data integration and of non-redundant data are highly desirable. And,
while the above example appears to show that the two are hopelessly incompatible,
over the years a few—very few—ways have been developed to achieve both goals
in a single data management system. In fact, this concept is so important that it is
the primary defining feature of database management systems:
A database management system is a software utility for storing and retrieving
data that gives the end-user the impression that the data is well integrated
even though the data can be stored with no redundancy at all.
Any data storage and retrieval system that does not have this property should
not be called a database management system. Notice a couple of fine points in the
above definition. It says, “data can be stored with no redundancy,” indicating that
non-redundant storage is feasible but not required. In certain situations, particularly
involving performance issues, the database designer may choose to compromise
on the issue of data redundancy. Also, it says, “that gives the end-user the
impression that the data is well integrated.” Depending on the approach to database
management taken by the particular database management system, data can be
physically integrated and stored that way on the disk or it can be integrated at the
time that a data retrieval query is executed. In either case, the DBMS will “give the
end-user the impression that the data is well integrated.” Both of these fine points
will be explored further later in this book.
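To make the second fine point concrete, here is a minimal sketch of integration at query time, using the SQLite engine that ships with Python. The choice of SQLite, the table names, and the column names are all assumptions made for illustration. The data is stored non-redundantly in two tables, yet a single query returns the integrated answer.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesperson (
        sp_no INTEGER PRIMARY KEY,
        sp_name TEXT,
        commission_pct INTEGER,
        year_of_hire INTEGER
    );
    CREATE TABLE customer (
        cust_no TEXT PRIMARY KEY,
        cust_name TEXT,
        sp_no INTEGER REFERENCES salesperson (sp_no),
        hq_city TEXT
    );
""")

# Each salesperson fact is stored exactly once.
conn.executemany(
    "INSERT INTO salesperson VALUES (?, ?, ?, ?)",
    [(137, "Baker", 10, 1995), (361, "Carlyle", 20, 2001)],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("0121", "Main St. Hardware", 137, "New York"),
        ("1525", "Fred's Tool Stores", 361, "Atlanta"),
        ("1700", "XYZ Stores", 361, "Washington"),
    ],
)

# One request; the DBMS matches the salesperson numbers at query time.
row = conn.execute(
    "SELECT s.sp_name FROM customer AS c "
    "JOIN salesperson AS s ON s.sp_no = c.sp_no "
    "WHERE c.cust_no = ?",
    ("1525",),
).fetchone()
print(row[0])  # prints Carlyle
```

From the end-user's point of view the data looks as if it were stored in one merged file, although Carlyle's name appears only once in storage.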
Multiple Relationships
Chapter 2 demonstrated how entities can relate to each other in unary, binary,
and ternary one-to-one, one-to-many, and many-to-many relationships. Clearly,
a database management system must be able to store data about the entities in
a way that reflects and preserves these relationships. Furthermore, this must be
accomplished in such a way that it does not compromise the fundamental properties
of data integration and non-redundant data storage described above. Consider the
following problems with attempting to handle multiple relationships in simple
linear files, using the binary one-to-many relationship between General Hardware
Company’s salespersons and customers as an example.
First, the Customer file of Figure 3.7 does the job with its Salesperson Number
field. The fact that, for example, salesperson number 137 is associated with four
of the customers (it appears in four of the records) while, for example, customer
number 1826 has only one salesperson associated with it demonstrates that the
one-to-many relationship has been achieved. However, as has already been shown,
the two files of this figure lack an efficient data integration mechanism; i.e., trying to
link detailed salesperson data with associated customer data is laborious. (Actually,
as will be seen later in this book, the structures of Figure 3.7 are quite viable in
the relational DBMS environment. In that case, the relational DBMS software will
handle the data integration requirement. But without that relational DBMS software,
these structures are deficient in terms of data integration.) Also, the combined file
of Figure 3.8 supports the one-to-many relationship but, of course, introduces data
redundancy.
Figure 3.9 shows a “horizontal” solution to the problem. The Salesperson
Number field has been removed from the Customer file. Instead, each record in
the Salesperson file lists all the customers, by customer number, that the particular
salesperson is responsible for. This could conceivably be implemented as one
variable-length field of some sort containing all the associated customer numbers
for each salesperson, or it could be implemented as a series of customer number
FIGURE 3.9
General Hardware Company combined files: One-to-many relationship horizontal variation

(a) Salesperson file
Salesperson  Salesperson  Commission  Year     Customer
Number       Name         Percentage  of Hire  Numbers
137          Baker        10          1995     0121, 0933, 1047, 1826
186          Adams        15          2001     0839, 2267
204          Dickens      10          1998     2198
361          Carlyle      20          2001     1525, 1700

(b) Customer file
Customer  Customer
Number    Name                 HQ City
0121      Main St. Hardware    New York
0839      Jane’s Stores        Chicago
0933      ABC Home Stores      Los Angeles
1047      Acme Hardware Store  Los Angeles
1525      Fred’s Tool Stores   Atlanta
1700      XYZ Stores           Washington
1826      City Hardware        New York
2198      Western Hardware     New York
2267      Central Stores       New York
fields. While this arrangement does represent the one-to-many relationship, it is
unacceptable for two reasons. One is that the record length could be highly variable
depending on how many customers a particular salesperson is responsible for. This
can be tricky from a space management point of view. If a new customer is added
to a salesperson’s record, the new larger size of the record may preclude its being
stored in the same place on the disk as it came from, but putting it somewhere else
may cause performance problems in future retrievals. The other reason is that once
a given salesperson record is retrieved, the person or program that retrieved it would
have a difficult time going through all the associated customer numbers looking for
the one desired. With simple files like these, the normal expectation is that there
will be one value of each field type in each record (e.g. one salesperson number,
one salesperson name, and so on). In the arrangement in Figure 3.9, the end-user
or supporting software would have to deal with a list of values, i.e. of customer
numbers, upon retrieving a salesperson record. This would be an unacceptably
complex process.
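The extra work imposed by the horizontal arrangement shows up even in a small sketch. Below, the customer numbers are assumed to be held as one comma-separated, variable-length field, which is only one of the two implementations the text mentions; the names `salesperson_file` and `salesperson_for` are hypothetical.

```python
# Hypothetical rendering of the Salesperson file of Figure 3.9a, with the
# customer numbers packed into a single variable-length field.
salesperson_file = [
    {"sp_no": 137, "name": "Baker", "customer_numbers": "0121, 0933, 1047, 1826"},
    {"sp_no": 186, "name": "Adams", "customer_numbers": "0839, 2267"},
    {"sp_no": 204, "name": "Dickens", "customer_numbers": "2198"},
    {"sp_no": 361, "name": "Carlyle", "customer_numbers": "1525, 1700"},
]


def salesperson_for(customer_number):
    """Scan every salesperson record and pick apart its list of customers."""
    for record in salesperson_file:
        # The program, not the file system, must split and search the list.
        numbers = [n.strip() for n in record["customer_numbers"].split(",")]
        if customer_number in numbers:
            return record["name"]
    return None  # no salesperson lists this customer


print(salesperson_for("1525"))  # prints Carlyle
```

Direct access by customer number no longer helps here: the only way to find a customer's salesperson is to read salesperson records one after another and search inside each list.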
Figure 3.10 shows a “vertical” solution to the problem. In a single file, each
salesperson record is immediately followed by the records for all of the customers
for which the salesperson is responsible. While this does preserve the one-to-many
relationship, the complexities involved in a system that has to manage multiple
record types in a single file make this solution unacceptable, too.
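A sketch of the vertical arrangement hints at where the complexity comes from. Here each record carries a one-letter type tag ("S" for salesperson, "C" for customer); the tag is an assumption added for the illustration, since some such device is needed to tell the two record layouts apart in one file.

```python
# Hypothetical single file holding two record types, in the spirit of
# Figure 3.10. The leading "S" or "C" tag is an assumed convention.
mixed_file = [
    ("S", "137", "Baker", 10, 1995),
    ("C", "0121", "Main St. Hardware", "137", "New York"),
    ("C", "0933", "ABC Home Stores", "137", "Los Angeles"),
    ("S", "204", "Dickens", 10, 1998),
    ("C", "2198", "Western Hardware", "204", "New York"),
]


def customers_by_salesperson(records):
    """Group customer numbers under the salesperson record that precedes them."""
    result = {}
    current = None
    for record in records:
        if record[0] == "S":
            # A salesperson record starts a new group.
            current = record[2]
            result[current] = []
        elif current is None:
            raise ValueError("customer record appears before any salesperson record")
        else:
            # The meaning of a customer record depends on its position.
            result[current].append(record[1])
    return result


print(customers_by_salesperson(mixed_file))
```

Every program that touches the file must understand both layouts and must preserve the physical ordering, because the relationship is expressed only by which records follow which.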
A database management system must be able to handle all of the various
unary, binary, and ternary relationships in a logical and efficient way that does
not introduce data redundancy or interfere with data integration. The database
management system approaches that are in use today all satisfy this requirement. In
FIGURE 3.10
General Hardware Company combined files: One-to-many relationship vertical variation

137   Baker                10   1995
0121  Main St. Hardware    137  New York
0933  ABC Home Stores      137  Los Angeles
1047  Acme Hardware Store  137  Los Angeles
1826  City Hardware        137  New York
186   Adams                15   2001
0839  Jane’s Stores        186  Chicago
2267  Central Stores       186  New York
204   Dickens              10   1998
2198  Western Hardware     204  New York
361   Carlyle              20   2001
1525  Fred’s Tool Stores   361  Atlanta
1700  XYZ Stores           361  Washington
particular, the way that the relational approach to database management handles it
will be explained in detail.
Data Control Issues
The people responsible for managing the data in an information systems environment
must be concerned with several data control issues. This is true regardless of which
database management system approach is in use. It is even true if no database
management system is in use, that is, if the data is merely stored in simple files.
Most prominent among these data control issues are data security, backup and
recovery, and concurrency control (Figure 3.11). These are introduced here and will
be covered in more depth later in this book. The reason for considering these data
control issues in this discussion of the essence of the database management system
FIGURE 3.11
Three data control issues: security, backup and recovery, and concurrency control
concept is that such systems should certainly be expected to handle these issues
frequently for all the data stored in the system’s databases.
Computer security has become a very broad topic with many facets and
concerns. These include protecting the physical hardware environment, defending
against hacker attacks, encrypting data transmitted over networks, educating
employees on the importance of protecting the company’s data, and many more. All
computer security exposures potentially affect a company’s data. Some exposures
represent direct threats to data while others are more indirect. For example, the theft
of transmitted data is a direct threat to data while a computer virus, depending on
its nature, may corrupt programs and systems in such a way that the data is affected
on an incidental or delayed basis. The types of direct threats to data include outright
theft of the data, unauthorized exposure of the data, malicious corruption of the
data, unauthorized updates of the data, and loss of the data. Protecting a company’s
data assets has become a responsibility that is shared by its operating systems,
special security utility software, and its database management systems. All database
management systems incorporate features that are designed to help protect the data
in their databases.
Data can be lost or corrupted in any of a variety of ways, not just from the
data security exposures just mentioned. Entire files, portions of databases, or entire
databases can be lost when a disk drive suffers a massive accidental or deliberate
failure. At the extreme, all of a company’s data can be lost to a disaster such as
a fire, a hurricane, or an earthquake. Hackers, computer viruses, or even poorly
written application programs can corrupt from a few to all of the records of a file
or database. Even an unintentional error in entering data into a single record can
be propagated to other records that use its values as input into the creation of their
values. Clearly, every company (and even every PC user!) must have more than
one copy of every data file and database. Furthermore, some of the copies must be
kept in different buildings, or even different cities, to prevent a catastrophe from
destroying all copies of the data. The process of using this duplicate data, plus
other data, special software, and even specially designed disk devices to recover
lost or corrupted data is known as “backup and recovery.” As a key issue in data
management, backup and recovery must be considered and incorporated within the
database management system environment.
In today’s multi-user environments, it is quite common for two or more users
to attempt to access the same data record simultaneously. If they are merely trying
to read the data without updating it, this does not cause a problem. However, if two
or more users are trying to update a particular record simultaneously, say a bank
account balance or the number of available seats on an airline flight, they run the
risk of generating what is known as a “concurrency problem.” In this situation,
the updates can interfere with each other in such a way that the resulting data values
will be incorrect. This intolerable possibility must be guarded against and, once
again, the database management system must be designed to protect its databases
from such an eventuality.
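A concurrency problem of this kind can be imitated without any real timing luck. The sketch below first replays, step by step, an unlucky interleaving of two seat bookings, and then shows one common safeguard, a lock around the read-and-update sequence. The dictionaries and the function `book_seat` are hypothetical, and a real DBMS uses more elaborate mechanisms than a single lock.

```python
import threading

# Part 1: replay an unlucky interleaving by hand. Both users read the seat
# count before either one writes it back.
seats = {"available": 10}
user_a_read = seats["available"]        # user A reads 10
user_b_read = seats["available"]        # user B also reads 10
seats["available"] = user_a_read - 1    # A writes 9
seats["available"] = user_b_read - 1    # B overwrites it with 9
lost_update_result = seats["available"]  # one booking has vanished

# Part 2: the same two bookings, with the read and the write protected by
# a lock so that they cannot interleave.
seats_locked = {"available": 10}
lock = threading.Lock()


def book_seat():
    with lock:
        current = seats_locked["available"]
        seats_locked["available"] = current - 1


threads = [threading.Thread(target=book_seat) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lost_update_result, seats_locked["available"])  # prints 9 8
```

Two seats were sold in both parts, but only the protected version records that fact correctly.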
A fundamental premise of the database concept is that these three data control
issues—data security, backup and recovery, and concurrency—must be managed
by or coordinated with the database management system. This means that when a
new application program is written for the database environment, the programmers
can concentrate on the details of the application and not have to worry about writing
code to manage these data control issues. It means that there is a good comfort
level that the potential problems caused by these issues are under control since
they are being managed by long-tested components of the DBMS. It means that
the functions are standard for all of the data in the environment, which leads to
easier management and economies of scale in assigning and training personnel to
be responsible for the data. This kind of commonality of control is a hallmark of the
database approach.
Data Independence
In the earlier days of “data processing,” many decisions involving the way that
application programs were written were made in concert with the specific file
designs and the choice of file organization and access method used. The program
logic itself was dependent upon the way in which the data was stored. In fact,
the “data dependence” was often so strong that if for any reason the storage
characteristics of the data had to be changed, the program itself had to be modified,
often extensively. That was a very undesirable characteristic of the data storage
and programming environments because of the time and expense involved in such
efforts. In practice, storage structures sometimes have to change, to reflect improved
storage techniques, application changes, attempts at sharing data, and performance
tuning, to name a few reasons. Thus, it is highly desirable to have a data storage and
programming environment in which as many types of changes in the data structure
as possible would not require changes in the application programs that use them.
This goal of “data independence” is an objective of today’s database management
systems.
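The idea can be sketched as follows, under the assumption that the application reaches its data only through a small, fixed set of operations. The two storage classes and the function `commission_for` are hypothetical names invented for the illustration; they are not features of any particular DBMS.

```python
class KeyedStore:
    """Stores records in a dictionary, imitating direct access."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        return self._rows[key]


class SequentialStore:
    """Stores the same records in a list that is searched sequentially."""

    def __init__(self):
        self._rows = []

    def put(self, key, record):
        # Drop any earlier version of the record, then append the new one.
        self._rows = [(k, r) for k, r in self._rows if k != key]
        self._rows.append((key, record))

    def get(self, key):
        for stored_key, record in self._rows:
            if stored_key == key:
                return record
        raise KeyError(key)


def commission_for(store, salesperson_number):
    """Application logic: it never looks at how the store keeps its data."""
    return store.get(salesperson_number)["commission_pct"]


results = []
for store in (KeyedStore(), SequentialStore()):
    store.put(361, {"name": "Carlyle", "commission_pct": 20})
    results.append(commission_for(store, 361))

print(results)  # prints [20, 20]
```

Because `commission_for` depends only on `get`, the storage structure can be swapped without touching the application code, which is the essence of the goal described above.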
DBMS APPROACHES
We have established a set of principles for the database concept and said that a
database management system is a software utility that embodies those concepts. The
next question concerns the nature of a DBMS in terms of how it organizes data and
how it permits its retrieval. Considering that the database concept is such a crucial
component of the information systems environment and that there must be a huge
profit motive tied up with it, you might think that many people have worked on the
problem over the years and come up with many different approaches to designing
DBMSs. It’s true that many very bright people have worked on this problem for a
long time but, interestingly, you can count the number of different viable approaches
that have emerged on the fingers of one hand. In particular, the central issue of
providing a non-redundant data environment that also looks as though it is integrated
is a very hard nut to crack. Let’s just say that we’re fortunate that even a small
number of practical ways to solve this problem have been discovered.
Basically, there are four major DBMS approaches:
■ Hierarchical
■ Network
■ Relational
■ Object-Oriented
The hierarchical and network approaches to database are both called
“navigational” approaches because of the way that programs have to “navigate”
through hierarchies and networks of data to find the data they need. Both
CONCEPTS IN ACTION
3-B LANDAU UNIFORMS
[Photo courtesy of Landau Uniforms]
Landau Uniforms is a premier supplier of professional apparel to the healthcare
community, offering a comprehensive line of healthcare uniforms and related
apparel. Headquartered in Olive Branch, MS, the company, which dates back to
1938, has continuously expanded its operations both domestically and
internationally and today includes corporate apparel among its products.
Landau sells its apparel through authorized dealers throughout the U.S. and abroad.
Controlling Landau’s product flow in its warehouse is a sophisticated
information system that is anchored in database management. Their order filling
system, implemented in 2001, is called the Garment Sortation System. It begins
with taking orders that are then queued in preparation for “waves” of as many as
80 orders to be filled simultaneously. Each order is assigned a bin at the end
of a highly automated conveyor line. The garments for the orders are picked
from the shelves and placed onto the beginning of the conveyor line. Scanning
devices then automatically direct the bar-coded garments into the correct bin.
When an order is completed, it is boxed and sealed. The box then goes on another
conveyor where it is automatically weighed, a shipping label is printed and
attached to it, and it is routed to one
of several shipping docks, depending on which shipper is
being used. In addition, a bill is automatically generated
and sent to the customer. In fact, Landau bills its more
sophisticated customers electronically using an electronic
data interchange (EDI) system.
There are two underlying relational databases. The
initial order processing is handled using a DB2 database
running on an IBM “i” series computer. The orders are
passed on to the Garment Sortation System’s Oracle
database running on PCs. The shipping is once again
under the control of the DB2/“i” series system. The
relational tables include an order table, a customer table,
a style master table, and, of course, a garment table with
2.4 million records.
of these technologies were developed in the 1960s and, relative to the other
approaches, are somewhat similar in structure. IBM’s Information Management
System (IMS), a DBMS based on the hierarchical approach, was released in 1969.
It was followed in the early 1970s by several network-based DBMSs developed
by such computer manufacturers of the time as UNIVAC, Honeywell, Burroughs,
and Control Data. There was also a network-based DBMS called Integrated Database
Management System (IDMS) produced by an independent software vendor originally
called Cullinane Systems, which was eventually absorbed into Computer Associates.
These navigational DBMSs, which were suitable only for mainframe computers,
were an elegant solution to the redundancy/integration problem at the time that
they were developed. But they were complex, difficult to work with in many
respects, and, as we said, required a mainframe computer. Now often called “legacy
systems,” some of them interestingly have survived to this very day for certain
applications that require a lot of data and fast data response times.
The relational database approach became commercially viable in about 1980.
After several years of user experimentation, it became the preferred DBMS approach
and has remained so ever since. Chapters 4–8 of this book, as well as portions of later
chapters, are devoted to the relational approach. The object-oriented approach has
proven useful for a variety of niche applications and will be discussed in Chapter 9.
It is interesting to note that some key object-oriented database concepts have found
YOUR TURN
3.2 INTEGRATING DATA
The need to integrate data is all
around us, even in our personal lives. We integrate data
many times each day without realizing that that’s what
we’re doing. When we compare the ingredients needed
for a recipe with the food “inventory” in our cupboards,
we are integrating data. When we think about buying
something and relate its price to the money we have in our
wallets or in our bank accounts or to the credit remaining
on our credit cards, we are integrating data. When we
compare our schedules with our children’s schedules and
perhaps those of others with whom we carpool, we are
integrating data. Can you think of other ways in which
you integrate data on a daily basis?
QUESTION:
Consider a medical condition for which you or someone
you know is being treated. Describe the different ways
that you integrate data in taking care of that condition.
Hints: Consider your schedule, your doctors’ schedules,
the amount of prescription medication you have on
hand, the inventory of medication at the pharmacy you
use, and so on.
their way into some of the mainstream relational DBMSs and some are described
as taking a hybrid “object/relational” approach to database.
SUMMARY
There are five major components in the database concept. One is the development of
a datacentric environment that promotes the idea of data being a significant corporate
resource and encourages the sharing of data. Another, which is really the central
premise of database management, is the ability to achieve data integration while
at the same time storing data in a non-redundant fashion. The third, which at the
structural level is actually closely related to the integration/redundancy paradigm,
is the ability to store data representing entities involved in multiple relationships
without introducing redundancy. Another component is the presence of a set of
data controls that address such issues as data security, backup and recovery, and
concurrency control. The final component is that of data independence, the ability
to modify data structures without having to modify programs that access them.
There are basically four approaches to database management: the early
hierarchical and network approaches, the current standard relational approach, and
the specialty object-oriented approach, many features of which are incorporated
into today’s expanded relational database management systems.
KEY TERMS
Attribute
Backup and recovery
Computer security
Concurrency control
Concurrency problem
Corporate resource
Data control issues
Data dependence
Data independence
Data integration
Data integrity problem
Data redundancy
Data retrieval
Data security
Datacentric environment
Direct access
Enterprise resource planning (ERP)
system
Entity
Entity set
Fact
Field
File
Logical sequential access
Manageable resource
Multiple relationships
Physical sequential access
Record
Sequential access
Software utility
Well integrated
QUESTIONS
1. What is data? Do you think the word “data” should
be treated as a singular or plural word? Why?
2. Name some entities and their attributes in a
university environment.
3. Name some entities and attributes in an insurance
company environment.
4. Name some entities and attributes in a furniture store
environment.
5. What is the relationship between:
a. An entity and a record?
b. An attribute and a field?
c. An entity set and a file?
6. What is the difference between a record type and an
occurrence of that record? Give some examples.
7. Name the four basic operations on stored data. In
what important way is one in particular different
from the other three?
8. What is sequential access? What is direct access?
Which of the two is more important in today’s
business environment? Why?
9. Give an example of and describe an application that
would require sequential access in:
a. The university environment.
b. The insurance company environment.
c. The furniture store environment.
10. Give an example of and describe an application that
would require direct access in:
a. The university environment.
b. The insurance company environment.
c. The furniture store environment.
11. Should data be considered a true corporate resource?
Why or why not? Compare and contrast data to other
corporate resources (capital, plant and equipment,
personnel, etc.) in terms of importance, intrinsic
value, and modes of use.
12. Defend or refute the following statement: “Data is
the most important corporate resource because it
describes all of the others.”
13. What are the two kinds of data redundancy, and
what are the three types of problems that they cause
in the information systems environment?
14. What factors might lead to redundant data across
multiple files? Is the problem managerial or technical
in nature?
15. Describe the apparent tradeoff between data redundancy
and data integration in simple linear files.
16. In your own words, describe the key quality of a
DBMS that sets it apart from other data handling
systems.
17. Do you think that the single-file redundancy problem
is more serious, less serious, or about the same as
the multifile redundancy problem? Why?
18. What are the two defining goals of a database
management system?
19. What expectation should there be for a database
management system with regard to handling multi-
ple relationships? Why?
20. What are the problems with the ‘‘horizontal’’ and
‘‘vertical’’ solutions to the handling of multiple
relationships as described in the chapter?
21. What expectation should there be for a database
management system with regard to handling data
control issues such as data security, backup and
recovery, and concurrency control? Why?
22. What would the alternative be if database man-
agement systems were not designed to handle data
control issues such as data security, backup and
recovery, and concurrency control?
23. What is data independence? Why is it desirable?
24. What expectation should there be for a database
management system with regard to data indepen-
dence? Why?
25. What are the four major DBMS approaches? Which
approaches are used the most and least today?
EXERCISES
1. Consider a hospital in which each doctor is
responsible for many patients while each patient
is cared for by just one doctor. Each doctor has a
unique employee number, name, telephone number,
and office number. Each patient has a unique patient
number, name, home address, and home telephone
number.
a. What kind of relationship is there between
doctors and patients?
b. Develop sample doctor and patient data and
construct two files in the style of Figure 3.5
in which to store your sample data.
c. Do any fields have to be added to one or the
other of the two files to record the relationship
between doctors and patients? Explain.
d. Merge these two files into one, in the style of
Figure 3.6. Does this create any problems with
the data? Explain.
2. The Dynamic Chemicals Corp. keeps track of its
customers and its orders. Customers typically have
several outstanding orders while each order was
generated by a single customer. Each customer has a
unique customer number, a customer name, address,
and telephone number. An order has a unique order
number, a date, and a total cost.
a. What kind of relationship is there between
customers and orders?
b. Develop sample customer and order data and
construct two files in the style of Figure 3.5 in
which to store your sample data.
c. Do any fields have to be added to one or the
other of the two files to record the relationship
between customers and orders? Explain.
d. Merge these two files into one, in the style of
Figure 3.6. Does this create any problems with
the data? Explain.
MINICASES
1. Answer the following questions based on the following
Happy Cruise Lines’ data.
(a) Ship table
Ship Number  Ship Name      Year Built  Weight (Tons)
005          Sea Joy        1999        80,000
009          Ocean IV       2003        75,000
012          Prince Al      2004        90,000
020          Queen Shirley  1999        80,000
(b) Crew Member table
Sailor Number  Sailor Name      Ship Number  Home Country  Job Title
00536          John Smith       009          USA           Purser
00732          Ling Chang       012          China         Engineer
06988          Maria Gonzalez   020          Mexico        Purser
16490          Prashant Kumar   005          India         Navigator
18535          Alan Jones       009          UK            Cruise Director
20254          Jane Adams       012          USA           Captain
23981          Rene Lopez       020          Philippines   Captain
27467          Fred Jones       020          UK            Waiter
27941          Alain DuMont     009          France        Captain
28184          Susan Moore      009          Canada        Wine Steward
31775          James Collins    012          USA           Waiter
32856          Sarah McLachlan  012          Ireland       Cabin Steward
a. Regarding the Happy Cruise Lines Crew Member
file.
i. Describe the file’s record type.
ii. Show a record occurrence.
iii. Describe the set or range of values that the Ship
Number field can take.
iv. Describe the set or range of values that the
Home Country field can take.
b. Assume that the records of the Crew Member file
are physically stored in the order shown.
i. Retrieve all of the records of the file physically
sequentially.
ii. Retrieve all of the records of the file logically
sequentially based on the Sailor Name field.
iii. Retrieve all of the records of the file logi-
cally sequentially based on the Sailor Number
field.
iv. Retrieve all of the records of the file logi-
cally sequentially based on the Ship Number
field.
v. Perform a direct retrieval of the records with a
Sailor Number field value of 27467.
vi. Perform a direct retrieval of the records with a
Ship Number field value of 020.
vii. Perform a direct retrieval of the records with a
Job Title field value of Captain.
c. The value 009 appears as a ship number once in the
Ship file and four times in the Crew Member file.
Does this constitute data redundancy? Explain.
d. Merge the Ship and Crew Member files based on
the common ship number field (in a manner similar
to Figure 3.8 for the General Hardware database).
Is the merged file an improvement over the two
separate files in terms of:
i. Data redundancy? Explain.
ii. Data integration? Explain.
e. Explain why the Ship Number field is in the Crew
Member file.
f. Explain why ship number 012 appears three times
in the Crew Member file.
g. How many files must be accessed to find:
i. The year that ship number 012 was built?
ii. The home country of sailor number 27941?
iii. The name of the ship on which sailor number
18535 is employed?
h. Describe the procedure for finding the weight of the
ship on which sailor number 00536 is employed.
i. What is the mechanism for recording the one-to-
many relationship between crew members and ships
in the Happy Cruise Lines database above?
2. Answer the following questions based on the following
Super Baseball League data.
(a) TEAM file.
Team Number  Team Name  City         Manager
137          Eagles     Orlando      Smith
275          Cowboys    San Jose     Jones
294          Statesmen  Springfield  Edwards
368          Pandas     El Paso      Adams
422          Sharks     Jackson      Vega
(b) PLAYER file.
Player Number  Player Name   Age  Position        Team Number
1209           Steve Marks   24   Catcher         294
1254           Roscoe Gomez  19   Pitcher         422
1536           Mark Norton   32   First Baseman   368
1953           Alan Randall  24   Pitcher         137
2753           John Harbor   22   Shortstop       294
2843           John Yancy    27   Center Fielder  137
3002           Stuart Clark  20   Catcher         422
3274           Lefty Smith   31   Third Baseman   137
3388           Kevin Taylor  25   Shortstop       294
3740           Juan Vidora   25   Catcher         368
a. Regarding the Super Baseball League Player file
shown above.
i. Describe the file’s record type.
ii. Show a record occurrence.
iii. Describe the set or range of values that the
Player Number field can take.
b. Assume that the records of the Player file are
physically stored in the order shown.
i. Retrieve all of the records of the file physically
sequentially.
ii. Retrieve all of the records of the file logically
sequentially based on the Player Name field.
iii. Retrieve all of the records of the file logically
sequentially based on the Player Number field.
iv. Retrieve all of the records of the file logically
sequentially based on the Team Number field.
v. Perform a direct retrieval of the records with a
Player Number field value of 3834.
vi. Perform a direct retrieval of the records with a
Team Number field value of 20.
vii. Perform a direct retrieval of the records with an
Age field value of 24.
c. The value 294 appears as a team number once in the
Team file and three times in the Player file. Does
this constitute data redundancy? Explain.
d. Merge the Team and Player files based on the
common Team Number field (in a manner similar
to Figure 3.8 for the General Hardware database).
Is the merged file an improvement over the two
separate tables in terms of:
i. Data redundancy? Explain.
ii. Data integration? Explain.
e. Explain why the Team Number field is in the Player
file.
f. Explain why team number 422 appears twice in the
Player file.
g. How many files must be accessed to find:
i. The age of player number 1953?
ii. The name of the team on which player number
2288 plays?
iii. The number of the team on which player number
2288 plays?
h. Describe the procedure for finding the name of the
city in which player number 3002 is based.
i. What is the mechanism for recording the one-to-
many relationship between players and teams in the
Super Baseball League database, above?
CHAPTER 4
RELATIONAL DATA
RETRIEVAL: SQL
As we move forward into the discussion of database management systems, we
will cover a wide range of topics and skills including how to design databases,
how to modify database designs to improve performance, how to organize corporate
departments to manage databases, and others. But first, to whet your appetites for what
is to come, we’re going to dive right into one of the most intriguing aspects of database
management: retrieving data from relational databases using the industry-standard SQL
database management language.
Note: Some instructors may prefer to cover relational data retrieval with SQL
after logical database design, Chapter 7, or after physical database design,
Chapter 8. This chapter, Chapter 4 on relational data retrieval with SQL, is
designed to work just as well in one of those positions as it is here.
OBJECTIVES
■ Write SQL SELECT commands to retrieve relational data using a variety of
operators including GROUP BY, ORDER BY, and the built-in functions AVG,
SUM, MAX, MIN, COUNT.
■ Write SQL SELECT commands that join relational tables.
■ Write SQL SELECT subqueries.
■ Describe a strategy for writing SQL SELECT statements.
■ Describe the principles of a relational query optimizer.
CHAPTER OUTLINE
Introduction
Data Retrieval with the SQL SELECT
Command
Introduction to the SQL SELECT
Command
Basic Functions
Built-In Functions
Grouping Rows
The Join
Subqueries
A Strategy for Writing SQL
SELECT Commands
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Relational Query Optimizer
Relational DBMS Performance
Relational Query Optimizer Concepts
Summary
INTRODUCTION
There are two aspects of data management: data definition and data manipulation.
Data definition, which is operationalized with a data definition language (DDL),
involves instructing the DBMS software on what tables will be in the database,
what attributes will be in the tables, which attributes will be indexed, and so
forth. Data manipulation refers to the four basic operations that can and must be
performed on data stored in any DBMS (or in any other data storage arrangement,
for that matter): data retrieval, data update, insertion of new records, and deletion
of existing records. Data manipulation requires a special language with which users
can communicate data manipulation commands to the DBMS. Indeed, as a class,
these are known as data manipulation languages (DMLs).
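To make the distinction concrete, here is a minimal sketch of data definition followed by the four data manipulation operations. It assumes Python's built-in sqlite3 module as a stand-in DBMS and borrows two rows of the Happy Cruise Lines Ship data from the Chapter 3 minicases. The table name, column names, index name, and the changed weight are illustrative assumptions, not part of the original data.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in DBMS.
conn = sqlite3.connect(":memory:")

# Data definition (DDL): declare a table, its attributes, and an index.
# All names below are hypothetical, chosen for this sketch.
conn.execute(
    "CREATE TABLE ship ("
    " ship_number TEXT PRIMARY KEY,"
    " ship_name TEXT,"
    " year_built INTEGER,"
    " weight_tons INTEGER)"
)
conn.execute("CREATE INDEX ship_year_idx ON ship (year_built)")

# Data manipulation (DML): insertion of new records.
conn.executemany(
    "INSERT INTO ship VALUES (?, ?, ?, ?)",
    [("005", "Sea Joy", 1999, 80000), ("009", "Ocean IV", 2003, 75000)],
)

# DML: data update (the new weight is an arbitrary illustrative value).
conn.execute("UPDATE ship SET weight_tons = 76000 WHERE ship_number = '009'")

# DML: deletion of existing records.
conn.execute("DELETE FROM ship WHERE ship_number = '005'")

# DML: data retrieval.
remaining_ships = conn.execute(
    "SELECT ship_number, ship_name, weight_tons FROM ship"
).fetchall()
print(remaining_ships)
```

The two CREATE statements tell the DBMS what to store. The INSERT, UPDATE, DELETE, and SELECT statements operate on the stored data.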
A standard language for data management in relational databases, known as
Structured Query Language or SQL, was developed in the 1970s and first standardized in 1986. SQL
incorporates both DDL and DML features. It was derived from an early IBM
research project in relational databases called ‘‘System R.’’ SQL has long since
been declared a standard by the American National Standards Institute (ANSI) and
by the International Organization for Standardization (ISO). Indeed, several versions of the
standards have been issued over the years. Using the standards, many manufacturers
have produced versions of SQL that are all quite similar, at least at the level at which
we will look at SQL in this book. These SQL versions are found in such mainstream
DBMSs as DB2, Oracle, MS Access, Informix, and others. SQL in its various imple-
mentations is used very heavily in practice today by companies and organizations
of every description, Advance Auto Parts being one of countless examples.
SQL is a comprehensive database management language. The most interesting
aspect of SQL and the aspect that we want to explore in this chapter is its rich
data retrieval capability. The other SQL data manipulation features, as well as the
SQL data definition features, will be considered in the database design chapters that
come later in this book.
DATA RETRIEVAL WITH THE SQL SELECT COMMAND
Introduction to the SQL SELECT Command
Data retrieval in SQL is accomplished with the SELECT command. There are a few
fundamental ideas about the SELECT command that you should understand before
looking into the details of using it. The first point is that the SQL SELECT command
CONCEPTS
IN ACTION
4-A ADVANCE AUTO PARTS
Advance Auto Parts is the second
largest retailer of automotive parts and accessories in the
U.S. The company was founded in 1932 with three stores
in Roanoke, VA, where it is still headquartered today. In
the 1980s, with fewer than 175 stores, the company
developed an expansion plan that brought it to over 350
stores by the end of 1993. It has rapidly accelerated its
expansion since then and, with mergers and acquisitions,
now has more than 2,400 stores and over 32,000
employees throughout the United States. Advance Auto
Parts sells over 250,000 automotive components. Its
innovative ‘‘Parts Delivered Quickly’’ (PDQ) system, which
was introduced in 1982, allows its customers access to
this inventory within 24 hours.
One of Advance Auto Parts’ key database appli-
cations, its Electronic Parts Catalog, gives the company
an important competitive advantage. Introduced in the
early 1990s and continually upgraded since then, this
system allows store personnel to look up products they
sell based on the customer’s vehicle type. The system’s
records include part descriptions, images, and drawings.
Once an item is identified, store personnel pull it from the
store’s shelves if it’s in stock. If it’s not in stock, then
using the system they send out a real-time request for the
part to the home office to check on the part’s warehouse
availability. Within minutes the part is picked at a regional
warehouse and it’s on its way. In addition to its in-store
use, the system is used by the company’s purchasing and
other departments.
The system runs on an IBM mid-range system at
company headquarters and is built on the SQL Server
DBMS. Parts catalog data, in the form of updates,
is downloaded weekly from this system to a small
server located in each store. Additional data retrieval
at headquarters is accomplished with SQL. The 35-table
database includes a Parts table with 2.5 million rows
that accounts not only for all of the items in inventory
but for different brands of the same item. There is also
a Vehicle table with 31,000 records. These two lead to
a 45-million-record Parts Application table that describes
which parts can be used in which vehicles.
is not the same thing as the relational algebra Select operator discussed in
Chapter 5. It’s a bit unfortunate that the same word is used to mean two different
things, but that’s the way it is. The fact is that the SQL SELECT command is
capable of performing relational Select, Project, and Join operations singly or in
combination, and much more.
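As a rough illustration of how one command covers all three relational operations, the sketch below uses Python's built-in sqlite3 module and a few rows of the Happy Cruise Lines data from the Chapter 3 minicases. The table and column names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ship (ship_number TEXT PRIMARY KEY, ship_name TEXT);
    CREATE TABLE crew_member (
        sailor_number TEXT PRIMARY KEY,
        sailor_name TEXT,
        ship_number TEXT,
        job_title TEXT);
    INSERT INTO ship VALUES ('009', 'Ocean IV'), ('012', 'Prince Al');
    INSERT INTO crew_member VALUES
        ('00536', 'John Smith', '009', 'Purser'),
        ('20254', 'Jane Adams', '012', 'Captain'),
        ('27941', 'Alain DuMont', '009', 'Captain');
""")

# Relational Select: keep only the rows that satisfy a condition (WHERE).
captains = conn.execute(
    "SELECT * FROM crew_member WHERE job_title = 'Captain'"
    " ORDER BY sailor_number"
).fetchall()

# Relational Project: keep only certain columns (the SELECT list).
# Unlike the algebra operator, SQL keeps duplicate rows unless DISTINCT is used.
names = conn.execute(
    "SELECT sailor_name FROM crew_member ORDER BY sailor_number"
).fetchall()

# Join, Select, and Project combined in a single SELECT command.
captain_ships = conn.execute(
    "SELECT crew_member.sailor_name, ship.ship_name"
    " FROM crew_member, ship"
    " WHERE crew_member.ship_number = ship.ship_number"
    " AND crew_member.job_title = 'Captain'"
    " ORDER BY crew_member.sailor_number"
).fetchall()

print(captains)
print(names)
print(captain_ships)
```

The last query matches rows of the two tables on the common ship number, keeps only the captains, and returns only two columns, all in one statement.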
SQL SELECT commands are considered, for the most part, to be ‘‘declarative’’
rather than ‘‘procedural’’ in nature. This means that you specify what data you are
looking for rather than provide a logical sequence of steps that guide the system in
how to find the data. Indeed, as we will see later in this chapter, the relational DBMS
analyzes the declarative SQL SELECT statement and creates an access path, a
plan for what steps to take to respond to the query. The exception to this, and the
reason for the qualifier ‘‘for the most part’’ at the beginning of this paragraph, is
that a feature of the SELECT command known as ‘‘subqueries’’ permits the user to
specify a certain amount of logical control over the data retrieval process.
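The declarative idea can be sketched as follows, again assuming Python's built-in sqlite3 module and a few Crew Member rows from the Chapter 3 minicases. The query states only what is wanted, and the DBMS decides how to get it. EXPLAIN QUERY PLAN is a SQLite-specific command, used here as one way to look at the chosen access path. Other DBMSs offer their own equivalents, and the wording of the plan varies by version. The table, column, and index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crew_member (
        sailor_number TEXT PRIMARY KEY,
        sailor_name TEXT,
        job_title TEXT);
    CREATE INDEX crew_job_idx ON crew_member (job_title);
    INSERT INTO crew_member VALUES
        ('00536', 'John Smith', 'Purser'),
        ('20254', 'Jane Adams', 'Captain'),
        ('23981', 'Rene Lopez', 'Captain');
""")

query = (
    "SELECT sailor_name FROM crew_member"
    " WHERE job_title = 'Captain' ORDER BY sailor_name"
)

# Declarative: the query says what data is wanted, not how to find it.
captain_names = [row[0] for row in conn.execute(query)]

# SQLite-specific: ask the DBMS to describe the access path it chose.
# The last column of each plan row is a human-readable description.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(captain_names)
for step in plan_steps:
    print(step)
```

Nothing in the query mentions the index. Whether to use it is the DBMS's decision, and the plan output reports that decision.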
Another point is that SQL SELECT commands can be run in either a ‘‘query’’
or an ‘‘embedded’’ mode. In the query mode, the user types the command at a
workstation and presses the Enter key. The command goes directly to the relational
DBMS, which evaluates the query and processes it against the database. The
result is then returned to the user at the workstation. Commands entered this way
can normally also be stored and retrieved at a later time for repetitive use. In
the embedded mode, the SELECT command is embedded within the lines of a
higher-level language program and functions as an input or ‘‘read’’ statement for
the program. When the program is run and the program logic reaches the SELECT
command, the program executes the SELECT. The SELECT command is sent to
the DBMS which, as in the query-mode case, processes it against the database and
returns the results, this time to the program that issued it. The program can then use
and further process the returned data. The only tricky part to this is that traditional
higher-level language programs are designed to retrieve one record at a time. The
result of a relational retrieval command is itself a relation. A relation, if it consists
of a single row, can resemble a record, but a relation of several rows resembles, if
anything, several records. In the embedded mode, the program that issues the SQL
SELECT command and receives the resulting relation back must treat the rows of
the relation as a list of records and process them one at a time.
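A minimal sketch of the embedded mode follows, with Python standing in for the higher-level language and its built-in sqlite3 module standing in for the DBMS interface. The Ship rows come from the Chapter 3 minicases. The table and column names, and the 80,000-ton threshold, are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ship (
        ship_number TEXT PRIMARY KEY,
        ship_name TEXT,
        weight_tons INTEGER);
    INSERT INTO ship VALUES
        ('005', 'Sea Joy', 80000),
        ('009', 'Ocean IV', 75000),
        ('012', 'Prince Al', 90000);
""")

# The SELECT acts as the program's "read" statement.
cursor = conn.execute(
    "SELECT ship_name, weight_tons FROM ship WHERE weight_tons >= ?"
    " ORDER BY ship_number",
    (80000,),
)

# The result is a relation that may hold many rows. The host program
# walks through it one row at a time, as if reading records from a file.
heavy_ship_names = []
total_weight = 0
for ship_name, weight_tons in cursor:
    heavy_ship_names.append(ship_name)
    total_weight += weight_tons

print(heavy_ship_names, total_weight)
```

The loop over the cursor is where the multi-row relation is turned into the one-record-at-a-time processing that the host program expects.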
SQL SELECT commands can be issued against either the actual, physical
database tables or against a ‘‘logical view’’ of one table or of several joined tables.
Good business practice dictates that in the commercial environment, SQL SELECT
commands should be issued against such logical views rather than directly against
the base tables. As we will see later in this book, this is a simple but effective
security precaution.
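As a sketch of the idea, the example below defines a logical view over a base table and queries the view instead of the table. It assumes Python's built-in sqlite3 module and three Crew Member rows from the Chapter 3 minicases. The view name and column names are hypothetical. SQLite has no user permissions of its own, so the security benefit is only suggested here. In a multiuser DBMS, users would be granted access to the view rather than to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crew_member (
        sailor_number TEXT PRIMARY KEY,
        sailor_name TEXT,
        ship_number TEXT,
        home_country TEXT,
        job_title TEXT);
    INSERT INTO crew_member VALUES
        ('00536', 'John Smith', '009', 'USA', 'Purser'),
        ('00732', 'Ling Chang', '012', 'China', 'Engineer'),
        ('18535', 'Alan Jones', '009', 'UK', 'Cruise Director');

    -- A logical view exposing only some columns and rows of the base table.
    CREATE VIEW ship_009_roster AS
        SELECT sailor_number, sailor_name, job_title
        FROM crew_member
        WHERE ship_number = '009';
""")

# The query is issued against the view, not the base table.
cursor = conn.execute("SELECT * FROM ship_009_roster ORDER BY sailor_number")
view_columns = [column[0] for column in cursor.description]
roster = cursor.fetchall()

print(view_columns)
print(roster)
```

A user limited to this view sees neither the Home Country column nor the crew of any other ship.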
Finally, the SQL SELECT command has a broad array of features and options
and we will only cover some of them at this introductory level. But what is also
very important is that our discussion of the SELECT command and the features that
we will cover will work in all of the major SQL implementations, such as Oracle,
MS Access, SQL Server, DB2, Informix, and so on, possibly with minor syntax
variations in some cases.
Basic Functions
The Basic SELECT Format In the simplest SELECT command, we will indicate from
which table of the database we want to retrieve data, which rows of that table we
are interested in, and which attributes of those rows we want to retrieve. The basic
format of such a SELECT statement is:
SELECT