Questions:
a. The effective evaluation of loan applications for creditworthiness
Creditworthiness measures a loan applicant's ability to pay back the instalments. A person with the fewest failures to repay previous loans receives the highest priority (Bharadwaj et al. 2013). Banks check this record before processing a loan for a customer: when a person has a good creditworthiness report, the bank grants the requested credit. The report thereby serves as an assurance that the amount will be paid back (Friedman et al. 2013). Creditworthiness is used to populate an individual's scorecard, which brings together all the checkpoints the bank measures before granting the loan. Banks follow fixed rules and regulations when processing a loan; the process is complicated, and the bank completely analyses the details of the person requesting the loan.
A loan is granted to a person only after all the paperwork is complete. In this process the credit score is the most vital factor: if the score is high enough, the requested loan is granted; otherwise it is refused (Rautenstrauch, Seelmann-Eggebert and Turowski 2012). Under the loan policy, if a person is not able to pay the instalments of the due loan amount, the bank can take over the guaranteed property. The factors identified for populating the scorecard are given underneath.
- If a person makes many loan or credit-card requests, this demonstrates extreme credit-seeking conduct and promotes a low score.
- Unsecured loans, such as credit-card or personal loans, push the score towards the low end, as they carry a more negative weight.
- Credit and advances must be maintained in a legitimate mix.
- A history of paying dues late, or a great deal of amount still pending, lowers the score (Reich and Benbasat 2013).
- Over-utilisation of the credit card, or using it up to its limit every time, contributes to a low score.
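The checkpoints above can be sketched as a small scoring function. This is a minimal illustration only: all the weights, the cut-off and the field names (`late_payments`, `recent_applications`, `unsecured_loans`, `utilisation`) are invented for the example and are not a real bank's model.

```python
# Hypothetical scorecard sketch: each checkpoint contributes points, and the
# bank grants the loan only when the total clears a cut-off.  Weights and
# field names are illustrative assumptions, not a real scoring model.

def credit_score(applicant):
    score = 0
    # A clean repayment history is the strongest positive signal.
    score += 40 if applicant["late_payments"] == 0 else -20
    # Many recent loan/credit-card requests suggest extreme credit conduct.
    score -= 10 * applicant["recent_applications"]
    # Several unsecured loans (credit cards, personal loans) pull the score down.
    score -= 15 if applicant["unsecured_loans"] > 2 else 0
    # Using the card close to its limit every time lowers the score.
    score -= 10 if applicant["utilisation"] > 0.8 else 0
    return score

def grant_loan(applicant, cutoff=30):
    # The loan is granted only when the scorecard total reaches the cut-off.
    return credit_score(applicant) >= cutoff

applicant = {"late_payments": 0, "recent_applications": 1,
             "unsecured_loans": 1, "utilisation": 0.4}
print(grant_loan(applicant))  # a low-risk profile clears the cut-off
```

A profile with late payments, many applications and high utilisation would score far below the cut-off and be refused.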
b. Exploratory analysis
The exploratory analysis above identifies five key variables, which are listed underneath.
- Property
- Duration of the loan in months
- Status of existing checking account
- Number of people being liable to provide maintenance
- Present employment since
Property: Property is an asset of the borrower, which can increase a person's credit rating. The exploratory analysis identifies property as one of the top five variables (Reich and Benbasat 2013). When a customer applies for a loan, the bank checks the borrower's assets, and the borrower's credit score depends partly on them.
Duration of the loan in months: This variable gives the number of monthly instalments over which the total amount is repaid (Rautenstrauch, Seelmann-Eggebert and Turowski 2012). It is fixed when the loan is granted, and the bank's system can then check how many instalments are left or due.
Status of an existing checking account: This variable records whether the borrower already holds an account with the bank (Friedman et al. 2013). In the exploratory analysis it is the first checkpoint, examined for every customer at the initial stage of the application.
Number of people being liable to provide maintenance: The proportion for this variable is easily distinguished from the other variables through the exploratory investigation (Bharadwaj et al. 2013). It is used to check the extent of the applicant's maintenance obligations against the credit account, and in the decision tree it is another significant attribute of a potential loan candidate.
Present employment since: The credit-ratio identifier uses this variable to gauge the good or bad status of an individual's employment (Reich and Benbasat 2013). In the decision tree it is used to check the employment status, for example whether the applicant has been employed for less than one year, under four years, or above seven years, or is unemployed.
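The decision tree over these five variables can be sketched with scikit-learn. The rows below are synthetic stand-ins, not the real credit data; the column order and the 0/1 encodings are assumptions made for the illustration.

```python
# Sketch: fitting a small decision tree on the five variables identified by
# the exploratory analysis.  The data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# columns: checking-account status (1 = has account), duration in months,
# property owned (1 = yes), people liable for maintenance,
# years in present employment
X = [
    [1, 12, 1, 1, 5],   # existing account, short loan, employed
    [0, 48, 0, 2, 0],   # no account, long loan, unemployed
    [1, 24, 1, 1, 8],
    [0, 36, 0, 2, 1],
]
y = [1, 0, 1, 0]        # 1 = creditworthy, 0 = not

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[1, 18, 1, 1, 6]]))  # classify a new applicant
```

On real data the tree would surface split points such as "employment under one year" or "duration above 36 months" as the checkpoints discussed above.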
a. A data warehouse is a collection of databases containing a large amount of various critical data and pieces of information. Data warehouses are generally used by business organizations for storing business data in categorized databases, and designing a suitable warehouse requires a data warehouse architecture. There are two data warehouse architectures, suggested by Kimball and Inmon and commonly known as Kimball's methodology and Inmon's methodology (Sagiroglu and Sinanc 2013). Either of the two can be used; however, there are some basic differences between them. In practice, the most popular architecture in most business organizations is the Kimball Architecture Model.
The Kimball Architecture Model, also known as the Kimball Bus Architecture, follows three approaches (Kimball and Ross 2013). There are three intermediate stages through which the three data inputs are provided. The three data inputs are order transactions, inventory snapshots and payment transactions; any of these can be used for data mining in the business organization. The data input first goes to the staging area for the 3NF data warehouse. From there the data is loaded into the normalized data warehouse, whose output in turn feeds the staging area for the dimensional data warehouse. Finally, the data is sent to its respective category within the data warehouse.
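The staging flow just described can be sketched as a tiny pipeline. All table and field names (`raw_orders`, `stage_3nf`, and so on) are invented for illustration; a real warehouse would use an ETL tool rather than plain Python.

```python
# Minimal sketch of the staging flow: raw order transactions pass through
# 3NF staging into the normalized warehouse, and from there through
# dimensional staging into star-schema tables.  Names are illustrative.

raw_orders = [
    {"order_id": 1, "customer": "Acme Ltd", "amount": 250.0},
    {"order_id": 2, "customer": "acme ltd", "amount": 120.0},
]

def stage_3nf(rows):
    # Cleansing step: normalise keys and formats before loading.
    return [dict(r, customer=r["customer"].title()) for r in rows]

def load_normalized(rows):
    # Normalized DW: split customers and orders into separate relations.
    customers = sorted({r["customer"] for r in rows})
    orders = [{"order_id": r["order_id"],
               "customer_key": customers.index(r["customer"]),
               "amount": r["amount"]} for r in rows]
    return customers, orders

def stage_dimensional(customers, orders):
    # Dimensional staging: build a fact table plus a customer dimension.
    fact_sales = [{"customer_key": o["customer_key"], "amount": o["amount"]}
                  for o in orders]
    dim_customer = [{"key": i, "name": c} for i, c in enumerate(customers)]
    return dim_customer, fact_sales

dim_customer, fact_sales = stage_dimensional(*load_normalized(stage_3nf(raw_orders)))
print(dim_customer)  # one conformed customer row for both spellings
```

Note how the cleansing stage conforms the two spellings of the same customer into one dimension row, which is the point of staging before the dimensional load.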
The Kimball Architecture is generally a dimensional approach. There are several aspects to be applied for implementing Kimball’s approach.
- Various sources of the data are to be collected and integrated in a real-time approach. Due to this, the users can access a lot of data within a compact space. Moreover, it saves the retrieval time of the data from different sources.
- The large volume of data from multiple sources should be kept in separate relevant categories by using general formats, keys, data models and access methods.
- A transaction history should also be kept for reference purposes.
- The data as well as the tables and fields can be restructured for the convenience of the user.
- Master data management should be used to integrate all relevant customer tables into one table.
- Daily running reports are to be updated daily to avoid operational or performance errors. Moreover, optimization should be done to allow convenient access for reading and writing, as well as faster operations and report generation.
- After the data is imported into the warehouse, it should be cleaned up from the draft to increase the quality of the data. In addition, this enables the user to provide consistent codes and details of data.
- Consistency of production reports from various departments should be balanced and maintained. This will enable the user to match similarity and verify accuracy of reports.
- Data warehouses help in creating suitable business intelligence. Hence, the architecture should be suitably designed to create a proper business intelligence.
According to Ralph Kimball, the creator of the bus architecture, a dimensional model must be used to design a data warehouse. On the other hand, Bill Inmon popularized the 3NF model due to its normalized version.
As proposed by Kimball, data is partitioned and verified for the accuracy of the numeric transactional information. The major advantage of Kimball's dimensional approach is that the data warehouse is easy to operate and data can be accessed easily; the retrieval process can likewise be operated easily (Kimball and Ross 2013). However, there are several disadvantages to this approach. The main disadvantage is that, to maintain accuracy and integration of the data across the whole data warehouse, the retrieval and loading of data from the warehouse databases become complicated and take a huge amount of time. Moreover, if the normalized approach is integrated with the dimensional one, the operations can take a while to integrate completely with the system.
Finally, it can be said that the most popular architecture is Kimball's bus architecture. The main reason is that this approach is better suited to business needs, as business operations can be carried out easily and efficiently (Sagiroglu and Sinanc 2013). However, there is a risk of losing all the data, since no specific master plan is defined; Inmon's architecture, by contrast, provides master-plan support for the data warehouse.
b. Kimball's architecture design can be efficiently used to incorporate business operations within the data warehouse system. Business operations consist of several steps of data mining:
Capture – This step is used to capture or intake the required piece of data or information. Generally, this data includes transactional and operational information. This data is analyzed and verified for checking of accuracy and errors.
Processing – The captured data is processed in a processing centre. Here, the data is cross-verified and compared with existing records for verifying accuracy. Moreover, the transactional data from different departments are compared and balanced for checking accuracy and maintaining a suitable balance. Then the data is processed for division and categorization.
Storage – The processed data is divided into separate categories according to relevance and sent to the data warehouse for storage. The search engine is optimized in order to access the necessary data from the database easily. The retrieval process of the data can also be operated easily using the Kimball’s dimensional approach.
Presentation – This process is used to retrieve the necessary data from the data warehouse and present it to the user according to the user’s request.
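The four steps above can be sketched as one small pipeline. The stage names follow the text; the record fields and the sample ledger are invented for the illustration.

```python
# Sketch of the capture -> processing -> storage -> presentation flow.
# Record contents are illustrative assumptions.

def capture(feed):
    # Capture: take in raw transactional records, dropping obvious errors.
    return [r for r in feed if r.get("amount") is not None]

def process(records, ledger):
    # Processing: cross-verify each record against existing ledger entries.
    known = {r["id"] for r in ledger}
    return [dict(r, verified=r["id"] in known) for r in records]

def store(records, warehouse):
    # Storage: file each record under its category in the warehouse.
    for r in records:
        warehouse.setdefault(r["category"], []).append(r)
    return warehouse

def present(warehouse, category):
    # Presentation: retrieve the requested category for the user.
    return warehouse.get(category, [])

feed = [{"id": 1, "category": "orders", "amount": 99.0},
        {"id": 2, "category": "orders", "amount": None}]   # bad record
ledger = [{"id": 1}]
warehouse = store(process(capture(feed), ledger), {})
print(present(warehouse, "orders"))
```

The malformed record is rejected at capture, and only verified, categorized data reaches the presentation step, mirroring the division of labour described above.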
In the case of Big Data, a huge amount of data is stored; generally, a number of databases are held within the Big Data store. Ordinary data-access and search engines cannot be used for data stored in Big Data, so special access tools are required. For a business organization to implement Big Data tools, a data architecture is necessary (Sagiroglu and Sinanc 2013), and for this purpose a data warehouse is needed. As described in the previous question, a data warehouse is a collection of databases containing a large amount of various critical data and pieces of information. These data warehouses are units of Big Data used for easy access to the data, and the existing architecture models help in setting them up. As discussed in the previous part, there are two data warehouse architectures, suggested by Kimball and Inmon and commonly known as Kimball's methodology and Inmon's methodology. Either of the two can be used, although there are some basic differences between them (Kimball and Ross 2013). In practice, the most popular architecture in most business organizations is the Kimball Architecture Model. According to Kimball's model, data is partitioned and verified for the accuracy of the numeric transactional information. The major advantage of Kimball's dimensional approach is that the data warehouse is easy to operate and data can be accessed easily. These processes serve both the capture and the processing of the data; because Kimball's model provides easy modes of capture and processing, most business organizations prefer to implement it. The retrieval process can likewise be operated easily, and the dimensional approach lets the user employ the provided search tool to look up the required information within the Big Data store (Sagiroglu and Sinanc 2013).
Moreover, Kimball's model provides an easy approach for retrieving the required data from the Big Data unit of the data warehouse. However, there are several disadvantages to this approach for the processing of Big Data. The main disadvantage is that, to maintain accuracy and integration of the data in each Big Data unit, the retrieval and loading of data from the warehouse databases become complicated and take a huge amount of time. As each Big Data unit holds a huge amount of data, serving a user's request also requires a great deal of processing time. Moreover, if the normalized approach is integrated with the dimensional one, the operations can take a while to integrate completely with the system.
Although there are several drawbacks to Kimball's model, it offers many advantages for the operation of Big Data, and these advantages have earned it its popularity. Firstly, numerous sources of data can be collected and integrated within the Big Data unit; the users can therefore access a large amount of data from the Big Data unit within a compact space, using a simple Big Data search tool (Sagiroglu and Sinanc 2013). It also saves retrieval time, as all similar data are kept in a single location. Secondly, the large volume of data from multiple sources can be kept in separate relevant categories according to general formats, keys, data models and access methods, which helps the user find the necessary piece of information easily. Thirdly, when the data is imported into the Big Data unit, the residual data is cleaned up from the draft to increase the quality of the data; this enables the user to provide consistent codes and details of data on a regular basis without difficulty. Finally, Kimball's model verifies the data from different departments and matches them to produce a balanced data sheet (Kimball and Ross 2013). Hence, Kimball's model, as discussed in the previous question, can efficiently integrate all the data operations in Big Data.
a. The graph for Sales by Product Category over the years 2009 to 2012
The above graph shows the sales of each product category over the years 2009 to 2012. Three product categories are chosen to chart their sales growth starting from 2009, and each category in the graph is arranged in ascending order. The first slot shows the sales report for furniture: analysing it, the researcher finds that furniture sales reached their highest point in 2009 and their lowest in 2012. The second slot uses the office-supplies data to generate its sales report; here the researcher identifies 2009 and 2011 as the highest- and lowest-selling years respectively. The last slot shows the sales graph for technology, where, as before, the researcher finds the highest sales in 2009 and the lowest in 2011. After reviewing the entire trend analysis, the researcher finds that the maximum profit came in 2009.
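The comparison behind this chart can be sketched as a simple aggregation: total sales per category per year, then the best year for each category. The figures below are made up; the real report uses the Tableau sample sales data.

```python
# Sketch of the trend comparison: yearly sales totals per product
# category, then the peak year for each.  Figures are illustrative.
from collections import defaultdict

sales = [  # (year, category, amount)
    (2009, "Furniture", 900), (2010, "Furniture", 700),
    (2011, "Furniture", 650), (2012, "Furniture", 500),
    (2009, "Office Supplies", 800), (2011, "Office Supplies", 300),
    (2009, "Technology", 950), (2011, "Technology", 400),
]

totals = defaultdict(float)
for year, category, amount in sales:
    totals[(category, year)] += amount

best = {}
for (category, year), amount in totals.items():
    # Keep the year with the highest total for each category.
    if category not in best or amount > totals[(category, best[category])]:
        best[category] = year

print(best)  # every category peaks in 2009, matching the analysis
```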
b. The graph for Product Category Average Profit and Total Sales for each month over the years 2009 to 2012
The above graph shows the average profit and total sales for each month from 2009 to 2012. The researcher splits each year into two six-month periods, and every period shows the company's average profit and total sales. The first period of 2009 shows that sales were very high at the beginning and decreased over time. Likewise, the profit and total sales for 2010 to 2012 are all plotted on the graph. This pattern analysis reveals the peak times for sales, which helps in planning the production line. If an industry uses this kind of data analysis, it can find its opportunity to achieve its objectives.
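The two aggregations behind the chart, total sales and average profit per month, can be sketched in a few lines. The records below are invented; the field layout simply mirrors the report.

```python
# Sketch of the monthly aggregation: total sales and average profit.
# Records are illustrative stand-ins for the report data.
from collections import defaultdict

rows = [  # (month, sales, profit)
    ("2009-01", 500.0, 60.0),
    ("2009-01", 300.0, 20.0),
    ("2009-02", 200.0, 10.0),
]

sales_by_month = defaultdict(float)
profits = defaultdict(list)
for month, sales, profit in rows:
    sales_by_month[month] += sales      # total sales per month
    profits[month].append(profit)       # collect profits for averaging

avg_profit = {m: sum(p) / len(p) for m, p in profits.items()}
print(sales_by_month["2009-01"], avg_profit["2009-01"])  # 800.0 40.0
```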
c. The graph for relative sizes by City within each state, Product Sales for the year 2010
The above geographical map shows the locations (cities) where the products were sold. The researcher uses the address of each market where products are sold to place these locations on the map, and also filters the data according to the highest and lowest sales. The item subcategories are subdivided into three separate aspects: profit, sales and unit price. The chart shows each city of a state where the item is sold. This graphical presentation improves the industry's marketing insight by finding the market for its items, and because it is built from past data it also reflects future trends. The company can use this analysis to see how much its organizational growth increases each year.
d. The graph for Product Sub-Categories that are technology based: Unit Prices, Sales and Profit for each month over the years 2009 to 2012
The above graph shows the sales, profit and unit price for 2009 to 2012. The researcher uses the sales, unit-price and profit data to build it: the X-axis carries the months and the Y-axis the values of the three different measures, profit, unit cost and sales. Starting from 2009, the chart proceeds up to 2012, showing the trend of these three variables for each month. Every month shows three bars, indicating unit price, profit and sales, and the sales bar for each month is higher than the other two. From this graph the company knows its peak month for sales, the month in which the highest profit was achieved, and the months in which the company made no profit.
References
Bharadwaj, A., El Sawy, O.A., Pavlou, P.A. and Venkatraman, N., 2013. Digital business strategy: toward a next generation of insights. MIS Quarterly, 37(2), pp.471-482.
Friedman, B., Kahn Jr, P.H., Borning, A. and Huldtgren, A., 2013. Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory (pp. 55-95). Springer Netherlands.
Kimball, R. and Ross, M., 2013. The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.
Rautenstrauch, C., Seelmann-Eggebert, R. and Turowski, K. eds., 2012. Moving into Mass Customization: Information Systems and Management Principles. Springer Science & Business Media.
Reich, B.H. and Benbasat, I., 2013. Measuring the Information Systems–Business Strategy Relationship. In Strategic Information Management, p.265.
Sagiroglu, S. and Sinanc, D., 2013, May. Big data: A review. In Collaboration Technologies and Systems (CTS), 2013 International Conference on (pp. 42-47). IEEE.