Dataset 1
In Australia, many people are required to lodge a tax return after the end of the financial year. Australians can prepare and lodge their own tax return or pay a registered tax agent to do it for them. Using a subset of the sample file published by the Australian Taxation Office (ATO), we summarise and analyse several features of the lodgement method.
The report analyses two datasets. The first dataset has five variables: Gender, age_range, Lodgement_method, Tot_inc_amt and Tot_ded_amt. We estimate the proportion of people who lodge a tax return through a tax agent and compare the age distributions of the two lodgement groups. We also investigate whether total income is associated with lodgement method. Lastly, we test whether total income is related to the total deduction amount.
The second dataset records international students' preferred method for lodging a tax return in the future, in particular whether they would use a tax agent.
The samples in dataset 1 are secondary data, collected from internet sources. In dataset 1, Gender is a qualitative (factor) variable, age_range is an ordinal variable, Lodgement_method is a nominal variable, and Tot_inc_amt and Tot_ded_amt are numeric variables.
The samples in dataset 2 are primary data, gathered by a survey. The dataset contains two variables, country_name and Lodgment_method; only the lodgement method, a nominal (qualitative) variable, is used in this analysis. We targeted 200 students, of whom 30 did not respond; the remaining 170 students answered the survey question. The sampling method is simple random sampling without replacement, which is unbiased by design. However, because we excluded the missing (non-response) data, non-response bias may arise.
Lodgment | Frequency | Proportion
Agent | 732 | 0.732
Self | 268 | 0.268
Total | 1000 | 1
The frequency table for dataset 1 shows that, of 1000 people, 732 (73.2%) lodge through an agent and only 268 (26.8%) prepare and lodge their own return.
The pie chart shows the shares of the two lodgment methods, agent and self, in dataset 1.
One-sample proportion z-test
proportion (p) = 0.732
(1 - p) = 0.268
total sample (n) = 1000
standard error = 0.014006
confidence level = 95%
z-value (two-tailed, 0.05 significance level) = 1.96

Confidence interval
upper confidence limit = 0.759452
lower confidence limit = 0.704548
We use the one-sample proportion z procedure (normal approximation) to estimate the proportion of people who lodge through an agent. The sample proportion is 0.732, and the 95% confidence interval is 0.704548 to 0.759452. That is, we are 95% confident that the true proportion of people lodging through an agent lies between these two limits.
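As a check, the interval can be reproduced from the counts alone. The sketch below is a minimal Python version of the normal-approximation (Wald) interval p ± z·√(p(1 - p)/n), using only the standard library; the function name `wald_interval` is our own, and z = 1.96 matches the table.

```python
import math

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
    return p_hat - z * se, p_hat + z * se

# Dataset 1: 732 of 1000 people lodged through an agent.
lower, upper = wald_interval(732, 1000)
print(f"{lower:.6f} {upper:.6f}")  # 0.704548 0.759452
```

Calling `wald_interval(118, 170, z=1.959963985)` gives the dataset 2 interval (about 0.624852 to 0.763383) in the same way.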
Lodgment | Frequency | Proportion
Agent | 118 | 0.694117647
Self | 52 | 0.305882353
Total | 170 | 1
Dataset 2
The frequency table for dataset 2 shows that, of 170 students, 118 (69.4%) lodge through an agent and only 52 (30.6%) by self-preparation.
The pie chart shows the shares of the two lodgment methods, agent and self, in dataset 2.
One-sample proportion z-test
proportion (p) = 0.694117647
(1 - p) = 0.305882353
total sample (n) = 170
standard error = 0.035340224
confidence level = 95%
z-value (two-tailed, 0.05 significance level) = 1.959963985

Confidence interval
upper confidence limit = 0.763383213
lower confidence limit = 0.624852081
We apply the same procedure to dataset 2. The sample proportion of agent lodgement is 0.694117647, and the 95% confidence interval is 0.624852081 to 0.763383213. That is, we are 95% confident that the true proportion of agent lodgement in the population surveyed for dataset 2 lies between these two limits.
The first dataset has a larger sample than the survey dataset (1000 > 170). In the first dataset, 732 of 1000 people (73.2%) lodge through an agent, whereas in the second dataset 118 of 170 students (69.4%) do so. The survey therefore gives a lower percentage of agent lodgement. To test whether this difference is statistically significant, we apply a two-sample z-test for the equality of proportions.
Null hypothesis (H0): The proportions of agent lodgement are equal in both datasets.
Alternative hypothesis (H1): The proportions are unequal.

Two-sample z-test
Dataset 1: total sample (n1) = 1000; lodging by agent = 732; proportion (p1bar) = 0.732
Dataset 2: total sample (n2) = 170; lodging by agent = 118; proportion (p2bar) = 0.694117647
Combined: total sample = 1170; total lodging by agent = 850; pooled proportion (pbar) = 0.726495726
Numerator of z-statistic (p1bar - p2bar) = 0.037882353
pbar × (1 - pbar) = 0.198699686
pbar × (1 - pbar) × (1/n1 + 1/n2) = 0.001367521
Denominator of z-statistic, SQRT[pbar × (1 - pbar) × (1/n1 + 1/n2)] = 0.036980013
z-statistic = 1.024400745
p-value = 0.15386
Decision: Fail to reject the null hypothesis
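We apply a two-sample z-test for the equality of proportions. The calculated z-statistic is 1.024400745. The tabulated p-value (0.15386) appears to be a one-tailed value; for the two-sided alternative stated above, the p-value is about 0.31. In either case it exceeds 0.05, so we fail to reject the null hypothesis at the 5% significance level. The data therefore provide no significant evidence that the proportion of agent lodgement differs between the two datasets.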
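The test statistic, together with the two-sided p-value that matches the "unequal" alternative, can be computed with a short Python sketch. It uses the pooled proportion exactly as in the table above and obtains the normal tail area from `math.erfc`, since 2(1 - Φ(|z|)) = erfc(|z|/√2); the function name is hypothetical.

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test for equal proportions; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                             # pbar
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # denominator of z
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))             # 2 * (1 - Phi(|z|))
    return z, p_two_sided

# Dataset 1: 732 of 1000 lodge through an agent; dataset 2: 118 of 170.
z, p = two_proportion_z_test(732, 1000, 118, 170)
print(f"z = {z:.4f}, two-sided p = {p:.3f}")  # z = 1.0244, two-sided p = 0.306
```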
Correlation Coefficient
Variable | age_range | Lodgment_method
age_range | 1 |
Lodgment_method | 0.108090763 | 1
The Pearson correlation coefficient between age range and lodgment method is 0.108090763. This is a very weak positive correlation (r² ≈ 0.012), so age range explains only about 1.2% of the variation in lodgment method, and the association has little practical importance. To calculate the correlation coefficient, we coded the qualitative variable numerically, with “agent” as 1 and “self” as 2; the positive sign therefore means that people in higher age-range groups are slightly more likely to self-prepare.
Lodging Method is Self:
Numerical Summary
age_range
Mean: 6.395522388
Standard Error: 0.21041402
Median: 7
Mode: 9
Standard Deviation: 3.444625965
Sample Variance: 11.86544804
Kurtosis: -1.00656719
Skewness: -0.490030757
Range: 11
Minimum: 0
Maximum: 11
Sum: 1714
Count: 268
Largest: 11
Smallest: 0
Confidence Level (95.0%): 0.414281758
Upper confidence limit: 6.807933867
Lower confidence limit: 5.983110909
The 95% confidence interval for the mean age-range code of people who lodge by self-preparation is 5.983110909 to 6.807933867.
Table: The frequency distribution table of age group when lodging method is “Self”
age_group | frequency | cumulative frequency | percentage of frequency | cumulative percentage of frequency
0 | 24 | 24 | 8.96% | 8.96%
1 | 9 | 33 | 3.36% | 12.31%
2 | 16 | 49 | 5.97% | 18.28%
3 | 18 | 67 | 6.72% | 25.00%
4 | 13 | 80 | 4.85% | 29.85%
5 | 16 | 96 | 5.97% | 35.82%
6 | 26 | 122 | 9.70% | 45.52%
7 | 18 | 140 | 6.72% | 52.24%
8 | 26 | 166 | 9.70% | 61.94%
9 | 44 | 210 | 16.42% | 78.36%
10 | 37 | 247 | 13.81% | 92.16%
11 | 21 | 268 | 7.84% | 100.00%
total | 268 | | 100.00% |
Both the frequency and the percentage of frequency are highest for age group 9 (44 people, 16.42%) and lowest for age group 1 (9 people, 3.36%).
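The numerical summary for this group can be reproduced from the frequency table. The Python sketch below (standard library only) computes the mean, sample standard deviation and standard error from the counts, then forms mean ± 1.96 × SE, which is the rule that reproduces the upper and lower limits in the summary. The Excel "Confidence Level (95.0%)" row (0.414281758) appears instead to use a t critical value of about 1.969, so the two intervals differ slightly.

```python
import math

# Frequency table for lodging method "Self": age-range code -> number of people.
freq = {0: 24, 1: 9, 2: 16, 3: 18, 4: 13, 5: 16,
        6: 26, 7: 18, 8: 26, 9: 44, 10: 37, 11: 21}

n = sum(freq.values())                                  # count = 268
mean = sum(x * f for x, f in freq.items()) / n          # 1714 / 268
ss = sum(f * (x - mean) ** 2 for x, f in freq.items())  # sum of squared deviations
sd = math.sqrt(ss / (n - 1))                            # sample standard deviation
se = sd / math.sqrt(n)                                  # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se       # normal-approximation 95% CI
print(f"n={n} mean={mean:.4f} sd={sd:.4f} CI=({lower:.3f}, {upper:.3f})")
# n=268 mean=6.3955 sd=3.4446 CI=(5.983, 6.808)
```

Replacing `freq` with the counts from the agent table gives the corresponding figures for that group.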
Lodging Method is Agent:
Numerical Summary
age_range
Mean: 5.640710383
Standard Error: 0.108390312
Median: 6
Mode: 7
Standard Deviation: 2.932553929
Sample Variance: 8.599872545
Kurtosis: -0.923068312
Skewness: -0.150394186
Range: 11
Minimum: 0
Maximum: 11
Sum: 4129
Count: 732
Largest: 11
Smallest: 0
Confidence Level (95.0%): 0.212793429
Upper confidence limit: 5.853155394
Lower confidence limit: 5.428265371
The 95% confidence interval for the mean age-range code of people who lodge through an agent is 5.428265371 to 5.853155394.
Table: The frequency distribution table of age group when lodging method is “Agent”
age_group | frequency | cumulative frequency | percentage of frequency | cumulative percentage of frequency
0 | 38 | 38 | 5.19% | 5.19%
1 | 31 | 69 | 4.23% | 9.43%
2 | 49 | 118 | 6.69% | 16.12%
3 | 77 | 195 | 10.52% | 26.64%
4 | 76 | 271 | 10.38% | 37.02%
5 | 74 | 345 | 10.11% | 47.13%
6 | 75 | 420 | 10.25% | 57.38%
7 | 88 | 508 | 12.02% | 69.40%
8 | 78 | 586 | 10.66% | 80.05%
9 | 71 | 657 | 9.70% | 89.75%
10 | 59 | 716 | 8.06% | 97.81%
11 | 16 | 732 | 2.19% | 100.00%
total | 732 | | 100.00% |
Both the frequency and the percentage of frequency are highest for age group 7 (88 people, 12.02%) and lowest for age group 11 (16 people, 2.19%).
95% confidence interval of correlation coefficient
Pearson correlation coefficient (r): 0.108090763
Z': 0.108514702
Number of samples (N): 1000
Standard error [1/SQRT(N - 3)]: 0.031670318
z critical value (two-tailed, 0.05): 1.96

Confidence intervals
Lower limit of Z': 0.046440879
Upper limit of Z': 0.170588525
Lower limit of r: 0.046407521
Upper limit of r: 0.168952828
The coded age group and lodging method have a correlation coefficient of 0.108090763. Its 95% confidence interval, 0.046407521 to 0.168952828, lies entirely above zero, so the correlation is statistically significant at the 5% level, although it is weak.
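The Fisher z-transformation steps in the table (Z' = atanh r, standard error 1/√(N - 3), back-transform with tanh) fit in a few lines of Python. The sketch below uses only the standard library and a function name of our own.

```python
import math

def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for Pearson's r via Fisher's z-transformation."""
    z_prime = math.atanh(r)                   # Z' = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)                 # standard error of Z'
    lo_z, hi_z = z_prime - z_crit * se, z_prime + z_crit * se
    return math.tanh(lo_z), math.tanh(hi_z)   # back-transform limits to the r scale

lo, hi = correlation_ci(0.108090763, 1000)
print(f"({lo:.4f}, {hi:.4f})")  # (0.0464, 0.1690)
```

The same function with r = -0.071848524 (total income vs. lodgment method) gives about (-0.1332, -0.0099). An interval that excludes zero indicates a correlation significantly different from zero at the 5% level.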
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.108090763
R Square: 0.011683613
Adjusted R Square: 0.010693316
Standard Error: 0.440763543
Observations: 1000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 2.292044456 | 2.292044 | 11.79809 | 0.000617314
Residual | 998 | 193.8839555 | 0.194273 | |
Total | 999 | 196.176 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 1.177557148 | 0.029792573 | 39.52519 | 2.2E-206 | 1.119093878 | 1.236020418
age_range | 0.015478838 | 0.004506429 | 3.434835 | 0.000617 | 0.006635676 | 0.024322001
R² is 0.011683613, so age range explains only about 1.2% of the variation in the coded lodgment method, and the calculated F-statistic is 11.79809. The two variables are therefore not strongly associated. However, the p-value (0.000617314) is less than 0.05, so the relationship between age range and lodgment method is statistically significant at the 5% level, even though it is weak.
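With a single predictor, R², the F-statistic and the slope's t-statistic all follow from r and the sample size: R² = r², F = r²(n - 2)/(1 - r²) and t² = F. The regression F-test and the test of r = 0 therefore give the same p-value. A small Python sketch (function name hypothetical) reproduces the table:

```python
import math

def simple_regression_stats(r: float, n: int) -> tuple[float, float, float]:
    """R^2, F (df = 1, n - 2) and slope t-statistic of a one-predictor OLS fit, from r."""
    r2 = r * r
    f_stat = r2 * (n - 2) / (1 - r2)
    t_stat = math.copysign(math.sqrt(f_stat), r)  # t^2 = F; t carries the sign of r
    return r2, f_stat, t_stat

r2, f_stat, t_stat = simple_regression_stats(0.108090763, 1000)
print(f"R^2 = {r2:.6f}, F = {f_stat:.3f}, t = {t_stat:.3f}")
# R^2 = 0.011684, F = 11.798, t = 3.435
```

For r = -0.071848524 the same function gives F ≈ 5.179 and t ≈ -2.276, matching the regression of lodgment method on total income.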
The calculations in part (a) and part (b) indicate that age range and lodgment method are at most weakly related for the 1000 people. However, the histograms and frequency tables suggest that the age-group distribution is closer to normal (less skewed) for agent lodgement than for self-preparation.

Variable | Lodgment_method | Tot_inc_amt
Lodgment_method | 1 |
Tot_inc_amt | -0.071848524 | 1
The Pearson correlation coefficient between total income and lodgment method is -0.071848524, a very weak negative correlation. Because “agent” was coded as 1 and “self” as 2, the negative sign means that people with higher total income are slightly more likely to lodge through an agent. With r² ≈ 0.005, total income explains only about 0.5% of the variation in lodgment method, so the association has little practical importance.
Tot_inc_amt (lodging method: Agent)
Mean: 66423.89
Standard Error: 4119.671
Median: 48222.5
Mode: 0
Standard Deviation: 111459.8
Sample Variance: 1.24E+10
Kurtosis: 114.1994
Skewness: 9.494428
Range: 1693122
Minimum: -37
Maximum: 1693085
Sum: 48622288
Count: 732
Largest: 1693085
Smallest: -37
Confidence Level (95.0%): 8087.799
Upper confidence limit: 74498.45
Lower confidence limit: 58349.33
For the lodgment method “agent”, the mean total income is $66,423.89, and its 95% confidence interval ranges from $58,349.33 to $74,498.45.
The line plot of total income for people whose lodgment method is agent
Tot_inc_amt (lodging method: Self)
Mean: 49783.85448
Standard Error: 4402.27677
Median: 35804
Mode: 0
Standard Deviation: 72068.37673
Sample Variance: 5193850924
Kurtosis: 83.48626709
Skewness: 7.591961655
Range: 924342
Minimum: 0
Maximum: 924342
Sum: 13342073
Count: 268
Largest: 924342
Smallest: 0
Confidence Level (95.0%): 8667.592386
Upper confidence limit: 58412.31695
Lower confidence limit: 41155.39201
For the lodgment method “self”, the mean total income is $49,783.85, and its 95% confidence interval ranges from $41,155.39 to $58,412.32.
The line plot of total income for people whose lodgment method is self
95% confidence interval of correlation coefficient
Pearson correlation coefficient (r): -0.071848524
Z': -0.071972541
Number of samples (N): 1000
Standard error [1/SQRT(N - 3)]: 0.031670318
z critical value (two-tailed, 0.05): 1.96

Confidence intervals
Lower limit of Z': -0.134046363
Upper limit of Z': -0.009898718
Lower limit of r: -0.133249225
Upper limit of r: -0.009898394
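The Pearson correlation coefficient between total income and lodgment method is -0.071848524, and its 95% confidence interval is -0.133249225 to -0.009898394. Because the interval excludes zero, the correlation is statistically significant at the 5% level, but it is very weak.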
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.071848524
R Square: 0.00516221
Adjusted R Square: 0.004165379
Standard Error: 0.44221534
Observations: 1000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 1.012701781 | 1.012702 | 5.178619 | 0.023077745
Residual | 998 | 195.1632982 | 0.195554 | |
Total | 999 | 196.176 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 1.287223099 | 0.016337405 | 78.78994 | 0 | 1.255163494 | 1.319282704
Tot_inc_amt | -3.10228E-07 | 1.36325E-07 | -2.27566 | 0.023078 | -5.77744E-07 | -4.27124E-08
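R² (0.00516221) shows that total income explains only about 0.5% of the variation in the coded lodgment method. The p-value of the F-test (0.023077745) is below 0.05, so the association is statistically significant at the 5% level, but it is negligible in practical terms.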
The box plot shows the distribution of total income. Its spread is very large; however, all three quartiles are below $200,000.
Tot_inc_amt via agent
Minimum: -37
Maximum: 1693085
1st Quartile: 26618.25
2nd Quartile (Median): 48222.5
3rd Quartile: 76060.75
Bottom: 26618.25
2q Box: 21604.25
3q Box: 27838.25
Whisker-: 26655.25
Whisker+: 1617024.25
IQR: 49442.5
Upper bound: 150224.5
Lower bound: -47545.5
The five-number summary describes the distribution of total income for people who lodge through an agent. The minimum, first quartile, median, third quartile and maximum are -$37, $26,618.25, $48,222.50, $76,060.75 and $1,693,085.
Tot_inc_amt by self-preparation
Minimum: 0
Maximum: 924342
1st Quartile: 15774
2nd Quartile (Median): 35804
3rd Quartile: 65176.5
Bottom: 15774
2q Box: 20030
3q Box: 29372.5
Whisker-: 15774
Whisker+: 859165.5
IQR: 49402.5
Upper bound: 139280.25
Lower bound: -58329.75
The five-number summary describes the distribution of total income for people who lodge by self-preparation. The minimum, first quartile, median, third quartile and maximum are $0, $15,774, $35,804, $65,176.50 and $924,342. The interquartile ranges of the two groups are almost equal ($49,402.50 for self-preparation and $49,442.50 for agent lodgement).
Tot_inc_amt (all people)
Minimum: -37
Maximum: 1693085
1st Quartile: 23521.75
2nd Quartile (Median): 45840
3rd Quartile: 74404
Bottom: 23521.75
2q Box: 22318.25
3q Box: 28564
Whisker-: 23558.75
Whisker+: 1618681
IQR: 50882.25
Upper bound: 150727.375
Lower bound: -52801.625
The five-number summary describes the distribution of total income for all 1000 people. The minimum, first quartile, median, third quartile and maximum are -$37, $23,521.75, $45,840, $74,404 and $1,693,085.
The box plot of total income shows a very large spread, from a small negative value (-$37) to a very large positive value ($1,693,085). However, all three quartiles lie between $0 and $200,000.
The grouped box plot of total income shows that the range is larger for agent lodgement than for self-preparation. All three quartiles (first, second and third) of total income are higher for agent lodgement, and the maximum income is considerably greater for agent lodgement ($1,693,085) than for self-preparation ($924,342).
Counting the values outside the lower and upper bounds (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR), the number of outliers in total income is 49 for both lodging methods combined, 39 for agent lodgement and 12 for self-preparation, that is, 4.9%, 5.33% and 4.48% of the respective groups. The proportion of outliers is therefore highest for agent lodgement.
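The upper and lower bounds in the box-plot tables are consistent with Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Assuming those are the bounds used to count outliers, the procedure can be sketched in Python as below. The toy incomes are hypothetical, and `method="inclusive"` is chosen on the assumption that the quartiles were computed with Excel's QUARTILE.INC convention.

```python
from statistics import quantiles

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier bounds Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(data: list[float], k: float = 1.5) -> int:
    """Number of values outside the Tukey fences of `data`."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # QUARTILE.INC-style quartiles
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for x in data if x < lower or x > upper)

# Bounds for the agent group, from its quartiles in the table above.
print(tukey_fences(26618.25, 76060.75))  # (-47545.5, 150224.5)

# Hypothetical toy incomes (not the ATO sample): only the last value is an outlier.
incomes = [-37, 12000, 25000, 40000, 48000, 61000, 75000, 1693085]
print(count_outliers(incomes))  # 1
```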
For both the self and agent lodging method:

Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.433918351 | 1
Linear regression between total income and total deduction for all people (both lodging methods):
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.433918351
R Square: 0.188285136
Adjusted R Square: 0.187471794
Standard Error: 7680.455019
Observations: 1000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 13655794661 | 1.37E+10 | 231.4958 | 3.59212E-47
Residual | 998 | 58871410515 | 58989389 | |
Total | 999 | 72527205176 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 415.2418564 | 283.7502247 | 1.463406 | 0.143671 | -141.5736382 | 972.057351
Tot_inc_amt | 0.036024597 | 0.002367705 | 15.21499 | 3.59E-47 | 0.031378346 | 0.040670848
The residual scatter plot of total income amount vs. total deduction amount for the people whose lodgment method is agent
Correlation Coefficient (self-preparation)
Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.428452411 | 1
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.428452411
R Square: 0.183571469
Adjusted R Square: 0.180502188
Standard Error: 9173.929557
Observations: 268

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 5033608656 | 5033608656 | 59.80929 | 2.16458E-13
Residual | 266 | 22386821614 | 84160983.51 | |
Total | 267 | 27420430271 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -1240.313957 | 681.5035713 | -1.819966922 | 0.069888 | -2582.141512 | 101.5135984
Tot_inc_amt | 0.060247545 | 0.007790315 | 7.733646607 | 2.16E-13 | 0.04490902 | 0.07558607
The residual scatter plot of total income amount vs. total deduction amount for the people whose lodgment method is self
Correlation Coefficient (agent)
Variable | Tot_inc_amt | Tot_ded_amt
Tot_inc_amt | 1 |
Tot_ded_amt | 0.457006824 | 1
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.457006824
R Square: 0.208855238
Adjusted R Square: 0.207771478
Standard Error: 6969.341502
Observations: 732

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 9360429247 | 9.36E+09 | 192.7136 | 4.6926E-39
Residual | 730 | 35457356312 | 48571721 | |
Total | 731 | 44817785560 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 840.2284205 | 299.9216382 | 2.801493 | 0.005221 | 251.416582 | 1429.040259
Tot_inc_amt | 0.032104882 | 0.002312677 | 13.88213 | 4.69E-39 | 0.02756459 | 0.036645172

The residual scatter plot of total income amount vs. total deduction amount for both kinds of lodgment method
All people: The Pearson correlation coefficient (r) between total income and total deduction for all 1000 samples is 0.433918351, so there is a moderate positive correlation between these two variables.
Self-preparation lodgment method: The Pearson correlation coefficient between total income and total deduction for people who lodge by self-preparation is 0.428452411, again a moderate positive correlation.
Agent lodgment method: The Pearson correlation coefficient between total income and total deduction for people who lodge through an agent is 0.457006824, also a moderate positive correlation. The correlation is highest for agent lodgement.
All people: The regression model treats total deduction amount as the response (dependent) variable and total income as the single predictor (independent) variable.
The linear regression equation of the model is $Y = \beta_0 + \beta_1 X$.
We can write Tot_ded_amt = 415.2418564 + 0.036024597 × Tot_inc_amt.
R² (the coefficient of determination) is 0.188285136, so total income explains only about 18.8% of the variability in total deduction amount (Faraway 2016). The F-statistic is 231.4958, and the p-value of the model is 3.59212E-47, which is less than 0.05. Therefore, we reject the null hypothesis that the slope is zero at the 5% significance level: the linear association is statistically significant, although moderate. The positive slope ($\beta_1$ = 0.036024597) indicates that total deduction increases by about 3.6 cents, on average, for each additional dollar of income. The residual plot suggests that the fit is poor.
People whose lodgment method is self-preparation: The regression model treats total deduction amount as the response (dependent) variable and total income as the single predictor (independent) variable.
The linear regression equation of the model is $Y = \beta_0 + \beta_1 X$.
We can write Tot_ded_amt = -1240.313957 + 0.060247545 × Tot_inc_amt.
R² (the coefficient of determination) is 0.183571469, so total income explains only about 18.4% of the variability in total deduction amount (Faraway 2016). The F-statistic is 59.80929, and the p-value of the model is 2.16458E-13, which is less than 0.05. Therefore, we reject the null hypothesis that the slope is zero at the 5% significance level: the linear association is statistically significant, although moderate. The positive slope ($\beta_1$ = 0.060247545) indicates that total deduction increases by about 6.0 cents, on average, for each additional dollar of income. The residual plot suggests that the fit is poor.
People whose lodgment method is agent: The regression model treats total deduction amount as the response (dependent) variable and total income as the single predictor (independent) variable.
The linear regression equation of the model is $Y = \beta_0 + \beta_1 X$.
We can write Tot_ded_amt = 840.2284205 + 0.032104882 × Tot_inc_amt.
R² (the coefficient of determination) is 0.208855238, so total income explains only about 20.9% of the variability in total deduction amount (Faraway 2016). The F-statistic is 192.7136, and the p-value of the model is 4.6926E-39, which is less than 0.05. Therefore, we reject the null hypothesis that the slope is zero at the 5% significance level: the linear association is statistically significant, although moderate. The positive slope ($\beta_1$ = 0.032104882) indicates that total deduction increases by about 3.2 cents, on average, for each additional dollar of income. The residual plot suggests that the fit is poor.
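To make the slopes concrete, the fitted equations can be evaluated at an illustrative income. The coefficients below are read from the three regression tables, identified by their numbers of observations (1000, 268 and 732). The income level of $50,000 is an arbitrary choice, and the self-preparation intercept is taken as negative (-1240.313957), as implied by its standard error and 95% limits.

```python
# (intercept b0, slope b1) from the regression tables, keyed by group.
models = {
    "all people (n = 1000)": (415.2418564, 0.036024597),
    "self-preparation (n = 268)": (-1240.313957, 0.060247545),
    "agent (n = 732)": (840.2284205, 0.032104882),
}

income = 50_000  # illustrative total income in dollars (arbitrary choice)
for group, (b0, b1) in models.items():
    predicted = b0 + b1 * income  # fitted total deduction at this income
    # b1 * 100 = extra cents of deduction per extra dollar of income
    print(f"{group}: ${predicted:,.2f} predicted; {b1 * 100:.1f} cents per extra $1")
```

At $50,000 this gives about $2,216, $1,772 and $2,445 respectively. The self-preparation line starts lower but rises fastest.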
Conclusion
From the previous results, we conclude that lodgment method is only weakly related to total income and age range: both associations are statistically significant at the 5% level, but each explains less than 2% of the variation in lodgment method. Total income is moderately and positively associated with total deduction amount (r = 0.43 for all people). More people lodge through agents, and agent lodgers have a higher maximum income and higher income quartiles. The range of total income is also greater for agent lodgement than for self-preparation.
The analysis could be extended by adding other qualitative and quantitative variables. The regression models might include more predictors of the response, which would make the model's predictions more informative. A larger sample could give more accurate results. A logistic regression model, with lodgment method as a binary response, may provide further insight into the dataset; for example, lodgment behaviour could be modelled using gender and total income.
References:
Bedeian, Arthur G. “‘More than meets the eye’: A guide to interpreting the descriptive statistics and correlation matrices reported in management research.” Academy of Management Learning & Education 13.1 (2014): 121–135.
Chen, Zhongxue, and Saralees Nadarajah. “On the optimally weighted z-test for combining probabilities from independent studies.” Computational Statistics & Data Analysis 70 (2014): 387–394.
Cleophas, Ton J., and Aeilko H. Zwinderman. “Data Spread: Standard Deviations, One Sample Z-Test, One Sample Binomial Test.” Clinical Data Analysis on a Pocket Calculator. Springer International Publishing, 2016. 201–205.
Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Vol. 124. CRC Press, 2016.
Fayers, Peter M., and David Machin. Quality of life: the assessment, analysis and interpretation of patient-reported outcomes. John Wiley & Sons, 2013.
Puth, Marie-Therese, Markus Neuhäuser, and Graeme D. Ruxton. “Effective use of Pearson’s product–moment correlation coefficient.” Animal Behaviour 93 (2014): 183–189.
Spitzer, Michaela, et al. “BoxPlotR: a web tool for generation of box plots.” Nature Methods 11.2 (2014): 121–122.
Tyner, Bryan C., and Daniel M. Fienup. “A comparison of video modeling, text-based instruction, and no instruction for creating multiple baseline graphs in Microsoft Excel.” Journal of Applied Behavior Analysis 48.3 (2015): 701–706.