For each of the problems below (except in cases where you are asked to discuss your interpretation), write R code blocks that will compute appropriate solutions. A good rule of thumb for judging whether your solution is appropriately “computable” is to ask yourself “If I added or changed observations to this data set, would my code still compute the right solution?”
When completed, submit your R Markdown document (the file with the extension .Rmd) and the knit HTML via Sakai.
A study by Whitman et al. (2004) showed that the amount of black coloring on the nose of male lions increases with age, and suggested that this might be used to estimate the age of unknown lions. To establish the relationship between these variables they measured the black coloring on the noses of male lions of known age (represented as a proportion).
Data from this study is available in CSV format at: https://github.com/bio304-class/bio304-course-notes/raw/master/datasets/ABD-lion-noses.csv.
The variables in this data file are proportionBlack and ageInYears, giving the proportion of black pigmentation on the nose of each lion used in the study and the corresponding age of each lion.
Generate a bivariate plot (with marginal histograms) showing the relationship between nose pigmentation (the predictor variable) and age (the dependent variable) [2 pts]
Using vector operations, calculate the regression coefficients for the regression of age on nose pigmentation. [2 pts]
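As a reminder of what "vector operations" means here, the following is a minimal sketch on invented toy data (not the lion data). It assumes the usual least-squares formulas: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of the predictor, and the intercept follows from the fact that the fitted line passes through the point of means. The variable names `x`, `y`, `a`, and `b` are placeholders.

```r
# Toy data, invented for illustration only
x <- c(0.1, 0.2, 0.3, 0.5, 0.7)
y <- c(1.2, 2.1, 2.9, 5.2, 6.8)

# Slope: sum of cross-products of deviations over the
# sum of squared deviations of the predictor
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
a <- mean(y) - b * mean(x)

c(intercept = a, slope = b)
```

Because nothing in the sketch depends on the number of observations, the same lines keep working if rows are added to or removed from the data.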
Modify your bivariate plot for problem 1a to illustrate the estimated regression. [2 pts]
Based on the regression of age on proportion of black pigmentation, if you observed a male lion with no black pigmentation on its nose, how old would you predict it to be? [1 pt]
What is the predicted increase in age for a 10% increase in black pigmentation? [1 pt]
What is the coefficient of determination for your fit model? [1 pt]
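For orientation, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses invented toy data and uses `lm()` only as a convenient way to obtain residuals; the names `ss.resid`, `ss.total`, and `r.squared` are placeholders.

```r
# Toy data, invented for illustration only
x <- c(0.1, 0.2, 0.3, 0.5, 0.7)
y <- c(1.2, 2.1, 2.9, 5.2, 6.8)

fit <- lm(y ~ x)

# Variation left unexplained by the model
ss.resid <- sum(residuals(fit)^2)
# Total variation of the response around its mean
ss.total <- sum((y - mean(y))^2)

r.squared <- 1 - ss.resid / ss.total
r.squared
```

For a bivariate regression with an intercept, this value should agree with the squared correlation between `x` and `y`.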
Generate a residual plot to examine the residuals from the estimated regression. Are the residuals approximately normal? Are they evenly distributed around zero? Are there any signs of trends in the residuals? [2 pts]
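One possible way to look at residuals, sketched with base R graphics on simulated data (the data, the seed, and the variable names are all assumptions made for illustration): plot the residuals against the predictor with a reference line at zero, and use a normal quantile-quantile plot as a rough check of normality.

```r
# Simulated data, for illustration only
set.seed(1)
x <- runif(30)
y <- 1 + 10 * x + rnorm(30)

fit <- lm(y ~ x)
res <- residuals(fit)

# Two panels side by side
old.par <- par(mfrow = c(1, 2))

# Residuals against the predictor, with a dashed reference line at zero
plot(x, res, xlab = "x", ylab = "Residual", main = "Residuals vs. predictor")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(res)
qqline(res)

par(old.par)
```

A ggplot2 version would work equally well; base graphics are used here only to keep the sketch free of package dependencies.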
[4 pts] Write your own function bivariate.regression(X, Y) that takes as input two vectors, X and Y, and which calculates the regression of Y on X using vector operations. Your function should return a list object that includes two elements:
a list that includes the slope and intercept of the fitted model
a data frame that includes the X and Y values, the fitted (predicted) values of Y, and the residual values.
Name the returned object reg.model.
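The shape of such a return value can be sketched as follows. This is only an illustration of the structure: the function name `sketch.regression` and the element names `coefficients` and `data` are hypothetical, and the coefficients are taken from `lm()` as a stand-in rather than computed with vector operations as the problem requires.

```r
# Hypothetical helper showing the structure of the returned object only
sketch.regression <- function(X, Y) {
  # Stand-in: the assignment asks for vector operations here instead of lm()
  fit <- lm(Y ~ X)
  coefs <- list(intercept = unname(coef(fit)[1]),
                slope = unname(coef(fit)[2]))

  # Fitted values and residuals derived from the coefficients
  fitted.Y <- coefs$intercept + coefs$slope * X
  values <- data.frame(X = X, Y = Y,
                       fitted = fitted.Y,
                       residuals = Y - fitted.Y)

  # A list with two elements: a list of coefficients and a data frame
  list(coefficients = coefs, data = values)
}

# Toy input, invented for illustration
reg.model <- sketch.regression(c(1, 2, 3, 4), c(2.1, 3.9, 6.2, 7.8))
str(reg.model)
```

Individual pieces can then be pulled out by name, for example `reg.model$coefficients$slope` or `reg.model$data$residuals`.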
[4 pts] Write a function regression.plots(reg.model) that takes as input the output object of your function in problem 1, and which returns one nicely formatted plot with two subfigures:
a figure illustrating the bivariate distribution of X and Y and a line representing the regression model. Include in the title of this subplot the values of coefficients of the regression model
a figure illustrating the residuals from the regression model as a function of X (i.e. a residual plot)
The HistData package is required to have access to the data for this assignment. Install HistData via one of the standard mechanisms.
The GaltonFamilies data set in the HistData package lists observations for heights of parents and their adult offspring for 934 children in 205 families, a famous data set collected by Francis Galton (Galton, F. (1886). Regression Towards Mediocrity in Hereditary Stature. Journal of the Anthropological Institute, 15, 246-263).
Load the HistData package and examine the GaltonFamilies data set. Using cowplot, create a single figure with three sub-figures: A) the distribution of height for all offspring; B) overlapping density plots giving the height distributions for male and female offspring separately; C) boxplots of height for male and female children separately. [3 pts]
Create two 3D scatter plots, one each for male and female offspring, showing the relationship between offspring height and mother’s and father’s heights. [2 pts]
For male offspring, using the lm() function, fit a multiple regression of offspring height on father’s and mother’s heights. Write out the predicted model with the corresponding coefficients in the form \(\widehat{O}_{\mbox{male}} = a + b_1 F + b_2 M\) where \(O\), \(F\) and \(M\) are offspring, father’s, and mother’s height. What fraction of the variation in offspring height does the model capture? [2 pts]
For female offspring, using the lm() function, fit a multiple regression of offspring height on father’s and mother’s heights. Write out the predicted model with the corresponding coefficients in the form \(\widehat{O}_{\mbox{female}} = a + b_1 F + b_2 M\) where \(O\), \(F\) and \(M\) are offspring, father’s, and mother’s height. What fraction of the variation in offspring height does the model capture? [2 pts]
What is the predicted height of a male child, if the mother was 5 ft tall and the father was 6 ft tall? What is the predicted height of a female child from the same parents? [2 pts]
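Heights in GaltonFamilies are recorded in inches, so parental heights given in feet need converting before prediction. The sketch below shows the general pattern with `predict()` on a small invented data frame (the values are made up; only the column names `father`, `mother`, and `childHeight` mirror those in GaltonFamilies). The names `toy`, `new.parents`, and `predicted` are placeholders.

```r
# Toy data in inches, invented for illustration only
toy <- data.frame(
  father      = c(65, 68, 70, 72, 74, 69),
  mother      = c(60, 63, 64, 66, 62, 65),
  childHeight = c(64.5, 67.0, 69.5, 71.0, 70.0, 68.5)
)

fit <- lm(childHeight ~ father + mother, data = toy)

# Convert feet to inches: father 6 ft, mother 5 ft
new.parents <- data.frame(father = 6 * 12, mother = 5 * 12)

# The column names in newdata must match the predictor names in the model
predicted <- predict(fit, newdata = new.parents)
predicted
```

The same call pattern applies to a model fitted on a subset of the real data, for example the rows for one sex only.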