Dataset 1
In Australia, many people are required to lodge a tax return after the end of the financial year. Australians can prepare and lodge their own tax return or pay a registered tax agent to do it for them. Using a subset of the sample file from the Australian Taxation Office (ATO), we summarise and analyse several facts about lodgement method.
The report is built on two datasets. The first dataset has five variables: Gender, age_range, Lodgement_method, Tot_inc_amt and Tot_ded_amt. We want to know the proportion of people who lodge a tax return through a tax agent, and how the age groups differ by lodgement method. We also investigate whether total income is associated with lodgement method and, lastly, whether total income is related to the deduction amount.
The second dataset is built around whether respondents would use a tax agent to lodge a tax return in the future. It records the tax return lodgement method preferred by international students.
The samples of dataset 1 are secondary in nature; we collected the data from internet sources. In dataset 1, Gender is a qualitative variable stored as a factor, age_range is an ordinal variable, Lodgement_method is a nominal variable, and Tot_inc_amt and Tot_ded_amt are numeric variables.
The samples of dataset 2 are primary in nature; we gathered the data by survey. In dataset 2 we analyse only one qualitative variable, the lodgement method. We targeted 200 students, of whom 30 did not respond. The remaining 170 students answered the survey question. The sampling method is simple random sampling without replacement, which is unbiased. However, we eliminated the missing data, so bias may arise. The dataset contains two variables, country_name and Lodgment_method. Lodgment_method is the nominal variable needed for this analysis.
Lodgment | Frequency | Proportion
Agent | 732 | 0.732
Self | 268 | 0.268
Total | 1000 | 1
The frequency table for dataset 1 shows that 732 of the 1000 people (73.2%) lodge through an agent. Only 268 people (26.8%) lodge by self-preparation.
The pie chart shows the shares of the two lodgment methods, agent and self, in dataset 1.
One sample proportional Z-test
proportion (p) = | 0.732
(1-p) = | 0.268
total sample = | 1000
standard error = | 0.014006
confidence level = | 95%
z-value at 0.05 critical region = | 1.96
Confidence Intervals
upper confidence limit = | 0.759452
lower confidence limit = | 0.704548
We use the one-sample z procedure for the proportion of people who lodge through an agent. The sample proportion is 0.732. The calculated 95% confidence limits for this proportion are 0.704548 and 0.759452. That means we are 95% confident that the population proportion lodging through an agent lies between these two limits.
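The interval above can be reproduced from the frequency table alone. The following is a minimal sketch, assuming the normal-approximation (Wald) interval with z = 1.96, which is what the tabulated standard error and limits appear to use; the function name `proportion_ci` is only illustrative.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p, se, p - z * se, p + z * se

# Dataset 1: 732 of 1000 people lodge through an agent.
p, se, lower, upper = proportion_ci(732, 1000)
print(f"p = {p:.3f}, SE = {se:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```

Calling `proportion_ci(118, 170)` gives the corresponding interval for dataset 2.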
Dataset 2
Lodgment | Frequency | Proportion
Agent | 118 | 0.694117647
Self | 52 | 0.305882353
Total | 170 | 1
The frequency table for dataset 2 shows that 118 of the 170 people (69.4%) lodge through an agent. Only 52 people (30.6%) lodge by self-preparation.
The pie chart shows the shares of the two lodgment methods, agent and self, in dataset 2.
One sample proportional Z-test
proportion (p) = | 0.694117647
(1-p) = | 0.305882353
total sample = | 170
standard error = | 0.035340224
confidence level = | 95%
z-value at 0.05 critical region = | 1.959963985
Confidence Intervals
upper confidence limit = | 0.763383213
lower confidence limit = | 0.624852081
We use the one-sample z procedure for the proportion of people who lodge through an agent in dataset 2. The sample proportion is 0.694117647. The calculated 95% confidence limits for this proportion are 0.624852081 and 0.763383213. That means we are 95% confident that the population proportion lodging through an agent lies between these two limits in dataset 2.
The first dataset has a larger sample than the surveyed dataset (1000 > 170). In the first dataset, 732 of 1000 people (73.2%) lodge through an agent, whereas in the second dataset 118 of 170 people (69.4%) do. The survey therefore gives a lower percentage lodging through an agent. We apply a two-sample z-test for the equality of proportions.
Null hypothesis: | The proportions are equal for both the datasets.
Alternative hypothesis: | The proportions are unequal for both the datasets.
Two sample z-test
dataset 1 | total sample | 1000
dataset 1 | lodging by agent | 732
dataset 1 | proportion (p1bar) | 0.732
dataset 2 | total sample | 170
dataset 2 | lodging by agent | 118
dataset 2 | proportion (p2bar) | 0.694117647
total sample | 1170
total lodging by agent | 850
total proportion (p-bar) | 0.726495726
numerator of z-statistic | (p1bar - p2bar) | 0.037882353
pbar*(1-pbar) | 0.198699686
pbar*(1-pbar)*(1/n1 + 1/n2) | 0.001367521
denominator of z-statistic | SQRT(pbar*(1-pbar)*(1/n1 + 1/n2)) | 0.036980013
z-statistic | 1.024400745
p-value | 0.15386
Decision-making | Null hypothesis not rejected
We apply a two-sample z-test for the equality of proportions. The calculated z-statistic is 1.024400745. The calculated p-value (0.15386) is greater than 0.05, so we do not reject the null hypothesis at the 5% significance level. The data therefore give no evidence that the proportions of the two datasets differ.
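The z-statistic in the table can be checked with a few lines of code. This is a sketch under the assumption that the pooled-proportion form of the test is intended, as the tabulated intermediate values suggest; the helper name `two_proportion_z` is hypothetical. It returns both the one-sided and the two-sided p-value, since an "unequal" alternative corresponds to the two-sided one.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for the equality of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # p-bar in the table
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal, via the
    # complementary error function from the standard library.
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_sided, 2 * p_one_sided

z, p_one, p_two = two_proportion_z(732, 1000, 118, 170)
print(f"z = {z:.6f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

Both p-values are above 0.05, so the decision not to reject the null hypothesis is the same either way.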
Correlation Coefficient
Variable | age_range | Lodgment_method
age_range | 1 |
Lodgment_method | 0.108090763 | 1
The Pearson correlation coefficient is 0.108090763. The correlation between age range and lodgment method is therefore very weak and negligible in practical terms. To calculate the coefficient, we coded “agent” as 1 and “self” as 2, turning the qualitative variable into a quantitative one.
Lodging Method is Self:
Numerical Summary
Statistic | age_range
Mean | 6.395522388
Standard Error | 0.21041402
Median | 7
Mode | 9
Standard Deviation | 3.444625965
Sample Variance | 11.86544804
Kurtosis | -1.00656719
Skewness | -0.490030757
Range | 11
Minimum | 0
Maximum | 11
Sum | 1714
Count | 268
Largest | 11
Smallest | 0
Confidence Level (95.0%) | 0.414281758
Upper confidence limit | 6.807933867
Lower confidence limit | 5.983110909
For the “Self” lodging method, the 95% confidence interval for the mean age-range code runs from 5.983110909 to 6.807933867.
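The two limits follow directly from the mean and standard error in the numerical summary. A minimal sketch, assuming the limits were computed as mean ± 1.96 × standard error (this reproduces the tabulated values; the separate "Confidence Level (95.0%)" row is a slightly wider t-based margin):

```python
def mean_ci(mean, standard_error, z=1.96):
    """Confidence interval for a mean from summary statistics."""
    margin = z * standard_error
    return mean - margin, mean + margin

# Mean and standard error of the age-range code, "Self" group.
self_lower, self_upper = mean_ci(6.395522388, 0.21041402)
print(f"Self: ({self_lower:.9f}, {self_upper:.9f})")
```

The same function applied to the “Agent” summary (mean 5.640710383, standard error 0.108390312) gives the interval reported for that group.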
Table: The frequency distribution table of age group when lodging method is “Self”
age_group | frequency | cumulative frequency | percentage of frequency | cumulative percentage of frequency
0 | 24 | 24 | 8.96% | 8.96%
1 | 9 | 33 | 3.36% | 12.31%
2 | 16 | 49 | 5.97% | 18.28%
3 | 18 | 67 | 6.72% | 25.00%
4 | 13 | 80 | 4.85% | 29.85%
5 | 16 | 96 | 5.97% | 35.82%
6 | 26 | 122 | 9.70% | 45.52%
7 | 18 | 140 | 6.72% | 52.24%
8 | 26 | 166 | 9.70% | 61.94%
9 | 44 | 210 | 16.42% | 78.36%
10 | 37 | 247 | 13.81% | 92.16%
11 | 21 | 268 | 7.84% | 100.00%
total | 268 | | 100.00% |
Both the frequency and its percentage are highest for age group “9” and lowest for age group “1”.
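The derived columns of the frequency table can be rebuilt from the raw counts. The sketch below takes the twelve “Self” frequencies from the table and recomputes the cumulative counts, the percentages, and the most and least frequent age groups; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequencies for age groups 0..11 when the lodging method is "Self".
self_counts = [24, 9, 16, 18, 13, 16, 26, 18, 26, 44, 37, 21]

total = sum(self_counts)
cumulative = list(accumulate(self_counts))
percent = [100 * c / total for c in self_counts]
cumulative_percent = [100 * c / total for c in cumulative]

# Age groups with the highest and lowest frequency.
most_common = max(range(len(self_counts)), key=lambda g: self_counts[g])
least_common = min(range(len(self_counts)), key=lambda g: self_counts[g])

# Mean of the age-group code, weighted by frequency.
mean_code = sum(g * c for g, c in enumerate(self_counts)) / total

for g, c in enumerate(self_counts):
    print(f"{g:>2} | {c:>3} | {cumulative[g]:>3} | "
          f"{percent[g]:6.2f}% | {cumulative_percent[g]:6.2f}%")
```

The weighted mean computed here agrees with the mean reported in the numerical summary for the same group.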
Lodging Method is Agent:
Numerical Summary
Statistic | age_range
Mean | 5.640710383
Standard Error | 0.108390312
Median | 6
Mode | 7
Standard Deviation | 2.932553929
Sample Variance | 8.599872545
Kurtosis | -0.923068312
Skewness | -0.150394186
Range | 11
Minimum | 0
Maximum | 11
Sum | 4129
Count | 732
Largest | 11
Smallest | 0
Confidence Level (95.0%) | 0.212793429
Upper confidence limit | 5.853155394
Lower confidence limit | 5.428265371
For the “Agent” lodging method, the 95% confidence interval for the mean age-range code runs from 5.428265371 to 5.853155394.
Table: The frequency distribution table of age group when lodging method is “Agent”
age_group | frequency | cumulative frequency | percentage of frequency | cumulative percentage of frequency
0 | 38 | 38 | 5.19% | 5.19%
1 | 31 | 69 | 4.23% | 9.43%
2 | 49 | 118 | 6.69% | 16.12%
3 | 77 | 195 | 10.52% | 26.64%
4 | 76 | 271 | 10.38% | 37.02%
5 | 74 | 345 | 10.11% | 47.13%
6 | 75 | 420 | 10.25% | 57.38%
7 | 88 | 508 | 12.02% | 69.40%
8 | 78 | 586 | 10.66% | 80.05%
9 | 71 | 657 | 9.70% | 89.75%
10 | 59 | 716 | 8.06% | 97.81%
11 | 16 | 732 | 2.19% | 100.00%
total | 732 | | 100.00% |
Both the frequency and its percentage are highest for age group “7” and lowest for age group “11”.
95% confidence interval of correlation coefficient
Pearson Correlation Coefficient (r) | 0.108090763
Z’ | 0.108514702
Number of samples (N) | 1000
Standard Error [1/SQRT(N-3)] | 0.031670318
Z(0.05,997) | 1.96
Confidence intervals
Lower limit of Z’ | 0.046440879
Upper limit of Z’ | 0.170588525
Lower limit of r | 0.046407521
Upper limit of r | 0.168952828
The coded age group and lodging method have a correlation coefficient of 0.108090763. The lower and upper 95% confidence limits of the correlation coefficient are 0.046407521 and 0.168952828 respectively. The interval excludes zero, so the correlation is distinguishable from zero, but it is very weak across the whole interval.
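The limits in the table come from the Fisher z-transformation: r is mapped to Z’ = atanh(r), an interval of ± 1.96 / sqrt(N - 3) is built around Z’, and the limits are mapped back with tanh. A minimal sketch of that calculation, with a hypothetical function name:

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Confidence interval for a Pearson r via the Fisher z-transformation."""
    z_prime = math.atanh(r)          # Z' in the table
    se = 1 / math.sqrt(n - 3)        # standard error of Z'
    z_lower = z_prime - z_crit * se
    z_upper = z_prime + z_crit * se
    # Transform the limits back to the correlation scale.
    return math.tanh(z_lower), math.tanh(z_upper)

r_lower, r_upper = fisher_ci(0.108090763, 1000)
print(f"95% CI for r: ({r_lower:.9f}, {r_upper:.9f})")
```

Working on the Z’ scale keeps the interval inside (-1, 1) and makes it slightly asymmetric around r, which is why the limits of r differ a little from the limits of Z’.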
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.108090763
R Square | 0.011683613
Adjusted R Square | 0.010693316
Standard Error | 0.440763543
Observations | 1000
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 2.292044456 | 2.292044 | 11.79809 | 0.000617314
Residual | 998 | 193.8839555 | 0.194273 | |
Total | 999 | 196.176 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 1.177557148 | 0.029792573 | 39.52519 | 2.2E-206 | 1.119093878 | 1.236020418
age_range | 0.015478838 | 0.004506429 | 3.434835 | 0.000617 | 0.006635676 | 0.024322001
The value of R-square is 0.011683613, so age range explains only about 1.2% of the variation in the coded lodgment method. The calculated F-statistic is 11.79809 with a p-value of 0.000617314. Because the p-value is less than 0.05, the relationship is statistically significant at the 5% level, but the two variables are far from highly associated and the relationship is too weak to be of practical importance.
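In a regression with a single predictor, the F-statistic is fully determined by R-square and the sample size, and the slope's t-statistic is its square root in magnitude. The sketch below checks the tabulated values with that identity; it is an illustration of the relationship, not the procedure the spreadsheet itself uses.

```python
import math

def f_from_r_squared(r_squared, n):
    """F-statistic of a simple linear regression (one predictor, n rows)."""
    # F = (R^2 / 1) / ((1 - R^2) / (n - 2))
    return r_squared * (n - 2) / (1 - r_squared)

f_stat = f_from_r_squared(0.011683613, 1000)
t_stat = math.sqrt(f_stat)  # magnitude of the slope's t-statistic
print(f"F = {f_stat:.5f}, |t| = {t_stat:.6f}")
```

This makes the point of the paragraph concrete: with 1000 observations even an R-square near 1% produces a large F, so a small p-value here signals a detectable relationship, not a strong one.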
The calculations of “part-a” and “part-b” indicate that the age range and lodgment method of the 1000 people are at most very weakly correlated. However, the histograms and frequency tables suggest that the age-group distribution is closer to a normal distribution for the agent lodgment method than for the self-preparation lodgment method.
Variable | Lodgment_method | Tot_inc_amt
Lodgment_method | 1 |
Tot_inc_amt | -0.071848524 | 1
The Pearson correlation coefficient is -0.071848524. The correlation between total income and lodgment method is therefore very weak and negligible in practical terms. To calculate the coefficient, we coded “agent” as 1 and “self” as 2, turning the qualitative variable into a quantitative one.
Statistic | Tot_inc_amt (lodgment method: agent)
Mean | 66423.89
Standard Error | 4119.671
Median | 48222.5
Mode | 0
Standard Deviation | 111459.8
Sample Variance | 1.24E+10
Kurtosis | 114.1994
Skewness | 9.494428
Range | 1693122
Minimum | -37
Maximum | 1693085
Sum | 48622288
Count | 732
Largest | 1693085
Smallest | -37
Confidence Level (95.0%) | 8087.799
Upper confidence limit | 74498.45
Lower confidence limit | 58349.33
For the lodgment method “agent”, the average total income is $66423.89. The 95% confidence interval for the mean total income runs from $58349.33 to $74498.45.
Line plot of total income amount for the people whose lodgment method is agent
Statistic | Tot_inc_amt (lodgment method: self)
Mean | 49783.85448
Standard Error | 4402.27677
Median | 35804
Mode | 0
Standard Deviation | 72068.37673
Sample Variance | 5193850924
Kurtosis | 83.48626709
Skewness | 7.591961655
Range | 924342
Minimum | 0
Maximum | 924342
Sum | 13342073
Count | 268
Largest | 924342
Smallest | 0
Confidence Level (95.0%) | 8667.592386
Upper confidence limit | 58412.31695
Lower confidence limit | 41155.39201
For the lodgment method “self”, the average total income is $49783.85448. The 95% confidence interval for the mean total income runs from $41155.39201 to $58412.31695.
Line plot of total income amount for the people whose lodgment method is self
95% confidence interval of correlation coefficient
Pearson Correlation Coefficient (r) | -0.071848524
Z’ | -0.071972541
Number of samples (N) | 1000
Standard Error [1/SQRT(N-3)] | 0.031670318
Z(0.05,997) | 1.96
Confidence intervals
Lower limit of Z’ | -0.134046363
Upper limit of Z’ | -0.009898718
Lower limit of r | -0.133249225
Upper limit of r | -0.009898394
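The Pearson correlation coefficient between total income and lodgment method is -0.071848524, with 95% confidence limits of -0.133249225 and -0.009898394. The interval only narrowly excludes zero, so any negative association between these two factors is very weak.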
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.071848524
R Square | 0.00516221
Adjusted R Square | 0.004165379
Standard Error | 0.44221534
Observations | 1000
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 1.012701781 | 1.012702 | 5.178619 | 0.023077745
Residual | 998 | 195.1632982 | 0.195554 | |
Total | 999 | 196.176 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 1.287223099 | 0.016337405 | 78.78994 | 0 | 1.255163494 | 1.319282704
Tot_inc_amt | -3.10228E-07 | 1.36325E-07 | -2.27566 | 0.023078 | -5.77744E-07 | -4.27124E-08
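The value of R-square (0.00516221) indicates that total income explains only about 0.5% of the variation in the coded lodgment method. The association of these two factors is therefore negligible in practice, even though the p-value (0.023077745) is below 0.05.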
The box plot indicates the distribution of total amount of income. Its spread is very high. However, the quartiles are below $200000.
Statistic | Tot_amt_inc via agent
Minimum | -37
Maximum | 1693085
1st Quartile | 26618.25
2nd Quartile (Median) | 48222.5
3rd Quartile | 76060.75
Bottom | 26618.25
2q Box | 21604.25
3q Box | 27838.25
Whisker- | 26655.25
Whisker+ | 1617024.25
IQR | 49442.5
Upper bound | 150224.5
Lower bound | -47545.5
The five-number summary describes the distribution of total income for lodgment via agent. The minimum, first quartile, second quartile, third quartile and maximum of income are $(-37), $26618.25, $48222.5, $76060.75 and $1693085.
Statistic | Tot_amt_inc by self preparation
Minimum | 0
Maximum | 924342
1st Quartile | 15774
2nd Quartile (Median) | 35804
3rd Quartile | 65176.5
Bottom | 15774
2q Box | 20030
3q Box | 29372.5
Whisker- | 15774
Whisker+ | 859165.5
IQR | 49402.5
Upper bound | 139280.25
Lower bound | -58329.75
The five-number summary describes the distribution of total income for lodgment by self-preparation. The minimum, first quartile, second quartile, third quartile and maximum of income are $0, $15774, $35804, $65176.5 and $924342. The inter-quartile range ($49402.5) is marginally lower for self-preparation than for lodgment via agent ($49442.5).
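The IQR and the upper and lower bounds in the two box-plot tables follow the usual 1.5 × IQR rule, which can be verified from the quartiles alone. A minimal sketch (the helper name `iqr_bounds` is hypothetical):

```python
def iqr_bounds(q1, q3, k=1.5):
    """Inter-quartile range and the k*IQR bounds used to flag outliers."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles of total income taken from the two box-plot tables.
agent_iqr, agent_lower, agent_upper = iqr_bounds(26618.25, 76060.75)
self_iqr, self_lower, self_upper = iqr_bounds(15774, 65176.5)

print("agent:", agent_iqr, agent_lower, agent_upper)
print("self: ", self_iqr, self_lower, self_upper)
```

Any income above the upper bound or below the lower bound is counted as an outlier; because income cannot fall far below zero here, the outliers in both groups lie on the high side.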
Statistic | Tot_amt_inc
Minimum | -37
Maximum | 1693085
1st Quartile | 23521.75
2nd Quartile (Median) | 45840
3rd Quartile | 74404
Bottom | 23521.75
2q Box | 22318.25
3q Box | 28564
Whisker- | 23558.75
Whisker+ | 1618681
IQR | 50882.25
Upper bound | 150727.375
Lower bound | -52801.625
The five-number summary describes the distribution of total income for all people. The minimum, first quartile, second quartile, third quartile and maximum of income are $(-37), $23521.75, $45840, $74404 and $1693085.
The box plot of total income shows a very wide spread, from a small negative value to a very large positive value. However, all three quartiles lie between $0 and $200000.
The grouped box plot of total income shows that the range and spread are larger for lodgment via agent than for lodgment by self-preparation. The quartiles (first, second and third) are all greater for lodgment via agent than for self-preparation. The maximum income is also much greater for agent than for self-preparation.
The calculated number of outliers in total income is 49 for both lodging methods combined, 39 for lodging via agent and 12 for lodging by self-preparation. The corresponding percentages of outliers are 4.9%, 5.32787% and 4.4776%. The percentage of outliers is highest for lodging via agent.
For both the self and agent lodging method:
Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.433918351 | 1
Linear regression between total income amount and total deduction amount for all people (both lodging methods):
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.433918351
R Square | 0.188285136
Adjusted R Square | 0.187471794
Standard Error | 7680.455019
Observations | 1000
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 13655794661 | 1.37E+10 | 231.4958 | 3.59212E-47
Residual | 998 | 58871410515 | 58989389 | |
Total | 999 | 72527205176 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 415.2418564 | 283.7502247 | 1.463406 | 0.143671 | -141.5736382 | 972.057351
Tot_inc_amt | 0.036024597 | 0.002367705 | 15.21499 | 3.59E-47 | 0.031378346 | 0.040670848
The residual scatter plot of total income amount vs. total deduction amount for all people (both lodgment methods)
Correlation Coefficient
Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.428452411 | 1
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.428452411
R Square | 0.183571469
Adjusted R Square | 0.180502188
Standard Error | 9173.929557
Observations | 268
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 5033608656 | 5033608656 | 59.80929 | 2.16458E-13
Residual | 266 | 22386821614 | 84160983.51 | |
Total | 267 | 27420430271 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -1240.313957 | 681.5035713 | -1.819966922 | 0.069888 | -2582.141512 | 101.5135984
Tot_inc_amt | 0.060247545 | 0.007790315 | 7.733646607 | 2.16E-13 | 0.04490902 | 0.07558607
The residual scatter plot of total income amount vs. total deduction amount for the people whose lodgment method is self
Correlation Coefficient
Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.457006824 | 1
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.457006824
R Square | 0.208855238
Adjusted R Square | 0.207771478
Standard Error | 6969.341502
Observations | 732
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 9360429247 | 9.36E+09 | 192.7136 | 4.6926E-39
Residual | 730 | 35457356312 | 48571721 | |
Total | 731 | 44817785560 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 840.2284205 | 299.9216382 | 2.801493 | 0.005221 | 251.416582 | 1429.040259
Tot_inc_amt | 0.032104882 | 0.002312677 | 13.88213 | 4.69E-39 | 0.02756459 | 0.036645172
The residual scatter plot of total income amount vs. total deduction amount for the people whose lodgment method is agent
- All people: The Pearson correlation coefficient (r) between total income amount and total deduction amount for all 1000 samples is 0.433918351. Therefore, there is a moderate positive correlation between these two factors.
- Self lodgment method: The Pearson correlation coefficient between total income amount and total deduction amount for the people whose lodgment method is self is 0.428452411. There is a moderate positive correlation between these factors for this lodgment method.
- Agent lodgment method: The Pearson correlation coefficient between total income amount and total deduction amount for the people whose lodgment method is agent is 0.457006824. There is a moderate positive correlation between these factors for this lodgment method. The correlation is highest for lodgment via agent.
- All people: The regression model takes total deduction amount as the response (dependent) variable and total income amount as the single predictor (independent) variable.
The linear regression equation of the model is Y = β0 + β1 * X.
We can write Tot_ded_amt = 415.2418564 + 0.036024597 * Tot_inc_amt.
The value of R-square (coefficient of determination) is 0.188285136. Therefore, only about 18.8% of the variability in total deduction amount is explained by total income amount (Faraway 2016), so the linear association is moderate at best. The value of the F-statistic is 231.4958 and the p-value of the model is 3.59212E-47, which is less than 0.05. Therefore, we reject the null hypothesis of no linear relationship at the 5% significance level. The slope of the predictor (β1 = 0.036024597) is positive, so total income is positively associated with total deduction amount: each additional dollar of income is associated with about $0.036 more in deductions. The graph also shows that the fit is not good.
- People whose lodgment method is self-preparation: The regression model takes total deduction amount as the response (dependent) variable and total income amount as the single predictor (independent) variable.
The linear regression equation of the model is Y = β0 + β1 * X.
We can write Tot_ded_amt = -1240.313957 + 0.060247545 * Tot_inc_amt.
The value of R-square (coefficient of determination) is 0.183571469. Therefore, only about 18.4% of the variability in total deduction amount is explained by total income amount (Faraway 2016), so the linear association is moderate at best. The value of the F-statistic is 59.80929 and the p-value of the model is 2.16458E-13, which is less than 0.05. Therefore, we reject the null hypothesis of no linear relationship at the 5% significance level. The slope of the predictor (β1 = 0.060247545) is positive, so total income is positively associated with total deduction amount: each additional dollar of income is associated with about $0.060 more in deductions. The graph also shows that the fit is not good.
- People whose lodgment method is agent: The regression model takes total deduction amount as the response (dependent) variable and total income amount as the single predictor (independent) variable.
The linear regression equation of the model is Y = β0 + β1 * X.
We can write Tot_ded_amt = 840.2284205 + 0.032104882 * Tot_inc_amt.
The value of R-square (coefficient of determination) is 0.208855238. Therefore, only about 20.9% of the variability in total deduction amount is explained by total income amount (Faraway 2016), so the linear association is moderate at best. The value of the F-statistic is 192.7136 and the p-value of the model is 4.6926E-39, which is less than 0.05. Therefore, we reject the null hypothesis of no linear relationship at the 5% significance level. The slope of the predictor (β1 = 0.032104882) is positive, so total income is positively associated with total deduction amount: each additional dollar of income is associated with about $0.032 more in deductions. The graph also shows that the fit is not good.
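To make the three fitted lines easier to compare, the sketch below evaluates each one at the same income. The intercepts and slopes are copied from the regression output tables, and each is matched to its group by the number of observations in that table (1000 for all people, 268 for self, 732 for agent). The income of $50000 is an arbitrary illustrative value, and the dictionary and function names are hypothetical.

```python
# (intercept, slope) for Tot_ded_amt = intercept + slope * Tot_inc_amt,
# taken from the regression output with 1000, 268 and 732 observations.
FITS = {
    "all": (415.2418564, 0.036024597),
    "self": (-1240.313957, 0.060247545),
    "agent": (840.2284205, 0.032104882),
}

def predict_deduction(income, intercept, slope):
    """Predicted total deduction for a given total income."""
    return intercept + slope * income

income = 50_000  # illustrative income, not a value from the dataset
for group, (b0, b1) in FITS.items():
    print(f"{group:>5}: {predict_deduction(income, b0, b1):.2f}")
```

Given that R-square is only around 0.2 in every model, these predictions carry wide uncertainty and are best read as a comparison of the fitted slopes, not as precise estimates.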
Conclusion
The conclusion we can draw from the previous results is that lodging method is at most very weakly related to total income and age range. Total income amount is only moderately associated with total deduction amount. More people lodge through agents than by self-preparation. Those who lodge through agents have a higher maximum income and higher income quartiles. The spread of total income is also greater for lodgment via agent than for self-preparation.
We could add many other factors, including qualitative and quantitative variables. The regression models might involve more predictors for the response, which would make the model predictions easier to interpret. A larger sample could give more accurate results. A logistic model may provide further explanation of the dataset. The choice of lodgment method could also be described using gender and total income.
References:
Bedeian, Arthur G. ““More than meets the eye”: A guide to interpreting the descriptive statistics and correlation matrices reported in management research.” Academy of Management Learning & Education 13.1 (2014): 121-135.
Chen, Zhongxue, and Saralees Nadarajah. “On the optimally weighted z-test for combining probabilities from independent studies.” Computational Statistics & Data Analysis 70 (2014): 387-394.
Cleophas, Ton J., and Aeilko H. Zwinderman. “Data Spread: Standard Deviations, One Sample Z-Test, One Sample Binomial Test.” Clinical Data Analysis on a Pocket Calculator. Springer International Publishing, 2016. 201-205.
Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Vol. 124. CRC press, 2016.
Fayers, Peter M., and David Machin. Quality of life: the assessment, analysis and interpretation of patient-reported outcomes. John Wiley & Sons, 2013.
Puth, Marie-Therese, Markus Neuhäuser, and Graeme D. Ruxton. “Effective use of Pearson’s product–moment correlation coefficient.” Animal Behaviour 93 (2014): 183-189.
Spitzer, Michaela, et al. “BoxPlotR: a web tool for generation of box plots.” Nature methods 11.2 (2014): 121-122.
Tyner, Bryan C., and Daniel M. Fienup. “A comparison of video modeling, text-based instruction, and no instruction for creating multiple baseline graphs in Microsoft Excel.” Journal of applied behavior analysis 48.3 (2015): 701-706.