Running Head: BASIC CONCEPTS AND TECHNIQUES 1

Basic Concepts and Techniques

Classification

Name

Institution

Course

Tutor

Date

Basic Concepts in Data Classification

Data classification is the process of organizing data into categories so that it can be used effectively. Classifying data makes it easier to locate and retrieve. It also reduces duplication of data, which lowers storage and backup costs. The three main types of data classification are content-based, context-based, and user-based (Ghaddar & Naoum-Sawaya, 2018). Content-based classification inspects the data itself for sensitive information. Context-based classification looks for indirect indicators of sensitive information. User-based classification relies on a person's judgment to label each item.

General framework for classification

Data classification consists of grouping data according to its relevance. Data is classified on the basis of the content it carries and the knowledge it involves. One necessity in data classification is a framework, which provides the structure for the classification effort. The framework is particularly significant to enterprise organisations that benefit from big data.

What is a decision tree and decision tree modifier?

A decision tree is a supervised machine learning model in which the data is split according to specific parameters. A decision tree consists of nodes, edges (branches), and leaf nodes. Each internal node tests the value of a specific attribute, each branch corresponds to an outcome of that test, and each leaf node predicts an outcome. A decision tree modifier, on the other hand, refers to the discriminator class that separates the training set so that each portion consists entirely of one class.
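The node, branch, and leaf roles described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names Node and Leaf, the helper functions, and the toy "outlook" and "windy" records are all assumptions invented for the example, not part of any cited source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Leaf:
    """Leaf node: holds the predicted outcome."""
    prediction: str


@dataclass
class Node:
    """Internal node: tests one attribute; each branch is one test outcome."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree: Union[Node, Leaf], record: Dict[str, str]) -> str:
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.prediction


def split_by(records: List[Dict[str, str]], attribute: str) -> Dict[str, list]:
    """Partition records by the value they hold for one attribute."""
    groups: Dict[str, list] = {}
    for rec in records:
        groups.setdefault(rec[attribute], []).append(rec)
    return groups


def is_pure(records, target: str = "label") -> bool:
    """True when every record in the portion belongs to a single class."""
    return len({rec[target] for rec in records}) <= 1


# Hypothetical toy data, invented for this sketch.
records = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "stay in"},
    {"outlook": "rainy", "windy": "no", "label": "stay in"},
    {"outlook": "rainy", "windy": "yes", "label": "stay in"},
]

# Hand-built tree: the "sunny" portion is mixed, so it is split again on "windy".
toy_tree = Node("outlook", {
    "sunny": Node("windy", {"no": Leaf("play"), "yes": Leaf("stay in")}),
    "rainy": Leaf("stay in"),
})

portions = split_by(records, "outlook")
print({value: is_pure(group) for value, group in portions.items()})
print(predict(toy_tree, records[0]))
```

Splitting continues only where a portion still mixes classes, which is why the tree ends in leaves whose training records all share one class.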

What is a hyperparameter?

A hyperparameter is a configuration external to the model whose value cannot be estimated from the data. Hyperparameters are specified by the practitioner and are mostly used to help estimate the model parameters. They are adjustable settings that are tuned to obtain a model with optimal performance (Wu et al., 2019).

Hyperparameter optimization is a challenge, especially when selecting a set of optimal hyperparameters. A hyperparameter is used to regulate the learning process.
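As a rough illustration of choosing a hyperparameter, the sketch below tries several values of k for a tiny k-nearest-neighbour classifier and keeps the one that scores best on held-out validation data. The classifier, the data points, and the candidate values are assumptions made up for this example. Note that this is a plain grid search, not the Bayesian optimization discussed by Wu et al. (2019).

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def grid_search(train, validation, candidates):
    """Score every candidate k on the validation set and return the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical one-dimensional data: (feature value, class label).
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.2, "b"),
         (3.0, "a"), (6.0, "b"), (6.5, "b"), (7.0, "b")]
validation = [(1.2, "a"), (2.5, "a"), (5.0, "b"), (6.2, "b")]

# k is set by the practitioner; it is not learned from the training data.
best_k, scores = grid_search(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

The value of k is never fitted by the model itself, which is what distinguishes it from a model parameter.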

Model evaluation is a significant part of the model development process. It helps in finding the model that best represents the data and in judging how well the selected model performs. Common cross-validation pitfalls when choosing and assessing models include using the same cross-validation results both to select a model and to report its performance, selecting variables outside the cross-validation loop, and relying on a single cross-validation run.
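A minimal sketch of k-fold cross-validation is given below, assuming a deliberately trivial majority-class model so that the evaluation loop stays in focus. The function names, the data, and the seeds are hypothetical. Repeating the procedure with different shuffles is one simple way to avoid relying on a single cross-validation run.

```python
import random
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, k, fit, score, seed=0):
    """Return one score per fold; each model is fitted without its held-out fold."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    results = []
    for held_out in k_fold_indices(len(shuffled), k):
        held = set(held_out)
        train = [d for i, d in enumerate(shuffled) if i not in held]
        test = [shuffled[i] for i in held_out]
        model = fit(train)  # fitted on the training folds only
        results.append(score(model, test))
    return results


def fit_majority(train):
    """Trivial stand-in model: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def score_accuracy(model, test):
    """Accuracy of the constant prediction on the held-out fold."""
    return sum(label == model for _, label in test) / len(test)


# Hypothetical labelled data: (feature, label).
data = [(i, "pos" if i % 3 else "neg") for i in range(12)]

# Repeat with several shuffles instead of trusting one run.
for seed in (0, 1, 2):
    fold_scores = cross_validate(data, 4, fit_majority, score_accuracy, seed=seed)
    print(seed, sum(fold_scores) / len(fold_scores))
```

Any variable selection or hyperparameter tuning would belong inside fit, so that it never sees the held-out fold.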

References

Ghaddar, B., & Naoum-Sawaya, J. (2018). High dimensional data classification and feature selection using support vector machines. European Journal of Operational Research, 265(3), 993-1004.

Wu, J., Chen, X. Y., Zhang, H., Xiong, L. D., Lei, H., & Deng, S. H. (2019). Hyperparameter optimization for machine learning models based on Bayesian optimization. Journal of Electronic Science and Technology, 17(1), 26-40.

Data collection methods

Student’s name

Affiliation

Course

Professor

Date of submission

This study source was downloaded by 100000802314458 from CourseHero.com on 01-24-2022 19:10:04 GMT -06:00

https://www.coursehero.com/file/107950179/w4downloaddocx/

Introduction

Data collection is an essential part of our daily routine activities. It can be seen as the guiding engine that drives quality improvement in the area under investigation. For instance, Capri's (2015) reading on manual data collection has helped many readers understand how data is collected, and it shows how systems actually operate once one takes an interest in the data collection process. Through data collection, we can investigate a wide range of occurrences related to the research questions.

What were the traditional methods of data collection in the transit system?

Several methods of data collection were traditionally used in the transit system. The main traditional methods were invasive techniques, which used a piezo sensor or a magnet as a local data collection method. Human surveyors also collected data, either in the field or from video (Lai et al., 2020). This was done mainly through direct and indirect personal interviews, and the surveyors mostly relied on questionnaires. Radars and other simple techniques, or forms of image analysis based on machine vision, were also used. The traditional methods were used primarily because they offered a workable platform for collecting data.

Why are the traditional methods insufficient in satisfying the requirement of data collection?

The traditional methods of data collection were not very effective because they had several limitations. The data collected did not meet the required standards, since those standards were given little consideration, and this affected the comparability of the data. The context of data collection was also given little consideration; for instance, those who collected the data were not much valued. The process was complex, with many risks and dangers (Baumfeld Andre et al., 2020). These barriers made it difficult to attain the quality requirements of the data. Finally, the data collected through the traditional methods fell short of the standards because collectors lacked training. These challenges made it difficult for the conventional approaches to satisfy the requirements of data collection.

Give a synopsis of the Capri (2015) case study and your thoughts regarding the requirements of the optimization and performance measurement requirements and the impact to expensive and labor-intensive nature

After reading the chapter by Capri (2015) on manual data collection, I found that it offers a clear breakdown of the subject. The chapter is mainly about equipping researchers with the best techniques to apply in the data collection process, and it has chiefly helped the engineering sector become more conversant with data collection methods. In my view, those who take the time to read this chapter will be better equipped with the best data collection methods. With this knowledge, the optimization and performance measurement requirements can be emphasized, and the expensive and labor-intensive nature of data collection can be managed more effectively.

The Capri (2015) chapter has helped several individuals minimize the expense they would otherwise incur in data collection.

The chapter has also taught us how to carry out data collection without experiencing many challenges. There is an overall improvement because the labor-intensive nature of the work is well outlined. The techniques and alternatives offered are likely to affect the engineering sector as a whole, as evidenced by better optimization and improved performance against requirements.

In conclusion, the engineering department has improved in general because there are now guidelines and processes pertaining to its data collection methods. The present situation cannot be compared with the past, when several challenges hampered the data collection process. In the current situation, we are encouraged, as there is a clear picture of where we will be in the future.

References

Baumfeld Andre, E., Reynolds, R., Caubel, P., Azoulay, L., & Dreyer, N. A. (2020). Trial designs using real-world data: The changing landscape of the regulatory approval process. Pharmacoepidemiology and Drug Safety, 29(10), 1201-1212.

Lai, X., Teng, J., & Ling, L. (2020, September). Evaluating Public Transportation Service in a Transit Hub based on Passengers Energy Cost. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-7). IEEE.

Dr. Oner Celepcikay

ITS 632

Week 4

Classification

Machine Learning Methods – Classification
Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
A test set is used to estimate the accuracy of the model.
Goal: previously unseen records (test set) should be assigned a class as accurately as possible.
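
The workflow on this slide (learn a model from the training set, then estimate its accuracy on the test set) can be sketched as follows. The one-attribute rule learner and all records are assumptions invented for illustration; only the attribute name "refund" echoes the slides, and its values here are made up.

```python
from collections import Counter, defaultdict


def learn_one_rule(training_set, attribute, class_attr):
    """Learn a model: map each value of one attribute to its majority class."""
    counts = defaultdict(Counter)
    for rec in training_set:
        counts[rec[attribute]][rec[class_attr]] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority class for values never seen in training.
    default = Counter(r[class_attr] for r in training_set).most_common(1)[0][0]
    return lambda rec: rule.get(rec[attribute], default)


def accuracy(model, test_set, class_attr):
    """Fraction of test records assigned their true class."""
    correct = sum(model(rec) == rec[class_attr] for rec in test_set)
    return correct / len(test_set)


# Hypothetical records; "cheat" plays the role of the class attribute.
training_set = [
    {"refund": "yes", "cheat": "no"},
    {"refund": "no", "cheat": "yes"},
    {"refund": "no", "cheat": "yes"},
    {"refund": "yes", "cheat": "no"},
    {"refund": "no", "cheat": "no"},
]
test_set = [
    {"refund": "yes", "cheat": "no"},
    {"refund": "no", "cheat": "yes"},
    {"refund": "no", "cheat": "no"},
]

model = learn_one_rule(training_set, "refund", "cheat")
test_accuracy = accuracy(model, test_set, "cheat")
print(test_accuracy)
```

The test records are never shown to learn_one_rule, so the printed figure estimates how the model would fare on previously unseen records.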

Machine Learning – Classification Example

[Figure: a training set whose columns are annotated categorical, categorical, continuous, and class is used to learn a classifier (the model), which is then applied to a test set.]

Machine Learning – Classification Example

[Figure: decision tree over the splitting attributes Refund (Yes / No), MarSt (Married / Single, Divorced), and TaxInc (< 80K / > 80K), with leaves labeled YES or NO.]
Model: Decision Tree

Another Example of Decision Tree

[Figure: a second decision tree over the same attributes, listed as MarSt, Refund, TaxInc, with leaves labeled YES or NO.]
There could be more than one tree that fits the same data!
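
The point that more than one tree can fit the same data can be checked with a short sketch. The two functions below test the slide attributes in different orders (Refund first versus MarSt first). The exact branch layout is an assumption reconstructed from the slide labels, the handling of an income of exactly 80K is an arbitrary choice, and the records are hypothetical.

```python
def tree_refund_first(rec):
    """Assumed layout: test Refund, then MarSt, then TaxInc."""
    if rec["refund"] == "Yes":
        return "NO"
    if rec["marst"] == "Married":
        return "NO"
    # Boundary choice (assumption): exactly 80K is treated as the upper branch.
    return "NO" if rec["taxinc_k"] < 80 else "YES"


def tree_marst_first(rec):
    """Assumed layout: test MarSt, then Refund, then TaxInc."""
    if rec["marst"] == "Married":
        return "NO"
    if rec["refund"] == "Yes":
        return "NO"
    return "NO" if rec["taxinc_k"] < 80 else "YES"


# Hypothetical records; "label" is the class attribute, income is in thousands.
records = [
    {"refund": "Yes", "marst": "Single", "taxinc_k": 125, "label": "NO"},
    {"refund": "No", "marst": "Married", "taxinc_k": 100, "label": "NO"},
    {"refund": "No", "marst": "Single", "taxinc_k": 70, "label": "NO"},
    {"refund": "No", "marst": "Divorced", "taxinc_k": 95, "label": "YES"},
    {"refund": "No", "marst": "Single", "taxinc_k": 85, "label": "YES"},
]

both_fit = all(
    tree_refund_first(r) == tree_marst_first(r) == r["label"] for r in records
)
print(both_fit)
```

Both trees classify every record correctly even though they differ in shape, so the data alone does not single out one tree.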