1. Open the attached Excel workbook.
2. Sample 50 rows following the instructions in the video (https://toptipbio.com/random-sampling-excel/), starting at the 3-minute mark and using Method 2. Sample across all columns, then save the Excel workbook as your own unique, personal dataset (name.xlsx).
3. Document the context. Before you start exploring the data, seek to understand it at a high level using the 5 W's and 1 H (who, what, when, where, why, and how). Identify how many cases/individuals (rows) and how many features/variables (columns) are in your dataset.
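The sampling and counting in steps 2 and 3 are done in Excel, but the idea can be sanity-checked with a short Python sketch. It draws 50 whole rows without replacement and then reports the number of cases and variables. It is not the video's Method 2, and the data, the column names, and the helper `sample_rows` are hypothetical stand-ins for the attached workbook.

```python
import random

def sample_rows(rows, n=50, seed=None):
    """Draw n rows without replacement; each sampled row keeps all of its columns."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(rows, n)

# Hypothetical stand-in for the attached workbook: 200 rows, 3 columns.
header = ["id", "height_cm", "region"]
population = [
    (i, 150 + i % 40, "North" if i % 2 else "South") for i in range(1, 201)
]

sample = sample_rows(population, n=50, seed=42)

# Step 3: cases are rows, variables are columns.
n_rows, n_cols = len(sample), len(header)
print(f"{n_rows} cases (rows), {n_cols} variables (columns)")
```

Sampling whole rows, rather than each column separately, keeps every case's values together, which is what "sample across all columns" asks for.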
Explore
Check for missing data. Before doing anything else, try to understand why any variables have missing data points. For this activity, simply make note of the missing values. Missing data is a feature in its own right and should be treated as such.
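As a cross-check on the Excel work, noting missing values amounts to counting empty cells per variable. The sketch below assumes each row is a dictionary and treats `None` or a blank string as missing. The rows, column names, and the helper `count_missing` are hypothetical.

```python
def count_missing(rows, columns):
    """Count empty cells (None or blank string) in each column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return counts

# Hypothetical rows; None and "" both stand for an empty cell.
rows = [
    {"age": 34, "income": 52000, "region": "North"},
    {"age": None, "income": 61000, "region": "South"},
    {"age": 45, "income": "", "region": "South"},
    {"age": 29, "income": 48000, "region": None},
]

missing = count_missing(rows, ["age", "income", "region"])
print(missing)
```

The counts are only recorded here. Nothing is dropped or filled in, in line with the instruction to simply make note of the missing values.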
Classify all the variables by type:
• Quantitative (Continuous or Discrete)
• Categorical (Qualitative)
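A first-pass classification can be sketched as a simple rule: non-numeric values suggest a categorical variable, whole numbers suggest a discrete one, and anything else numeric suggests a continuous one. This is only a heuristic under those assumptions, and the helper `classify_column` and the sample columns are hypothetical. Numeric codes such as postal codes or ID numbers are categorical despite looking like numbers, so the final call still needs judgment.

```python
def classify_column(values):
    """Heuristically label a column; missing values (None) are ignored."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numeric) < len(present):
        return "categorical"
    if all(float(v).is_integer() for v in numeric):
        return "quantitative (discrete)"
    return "quantitative (continuous)"

# Hypothetical columns for illustration.
print(classify_column([2, 3, 1, None, 4]))       # counts of something
print(classify_column([1.72, 1.65, 1.80]))       # measurements
print(classify_column(["North", "South", None])) # labels
```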
Select one (1) categorical and two (2) quantitative variables. Visualize and describe the distribution of each variable using appropriate methods in Excel. For each variable, separately provide:
• one visual (e.g. graph), remembering the principles of good graphing;
• appropriate descriptive statistics;
• a one-sentence description of the distribution of the variable. (See examples attached.)
Spot and describe outliers in the dataset. Do not remove them!
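For a quantitative variable, the descriptive statistics and an outlier check can be sketched as follows. The assignment does not say how to spot outliers, so this example assumes the common 1.5 × IQR rule. The data and the helper `describe` are hypothetical. The `"inclusive"` quartile method is chosen on the assumption that it corresponds to Excel's QUARTILE.INC, so the numbers should be comparable with a worksheet calculation.

```python
import statistics

def describe(values):
    """Summary statistics plus outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        # Outliers are reported, not removed.
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical sample with one unusually large value.
data = [12, 15, 14, 10, 13, 15, 11, 14, 13, 40]
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With an outlier present, the mean is pulled toward it while the median barely moves, which is worth mentioning in the one-sentence description of the distribution.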
Analyze the relationship between the two quantitative variables using the regression options in Excel. Identify which variable is the dependent variable (response) and which variable is the independent variable (explanatory). Provide:
• scatterplot – remember the principles of good graphing;
• appropriate statistics including the line of best fit (regression equation);
• description of the relationship between the two variables that includes the interpretation of correlation, slope and intercept, and the coefficient of determination; and
• a recommendation as to whether or not your model should be used for predictions.
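The quantities requested above (slope, intercept, correlation, and coefficient of determination) all come from the same least-squares sums. The sketch below computes them directly so the Excel regression output can be checked by hand. The data and the helper `simple_linear_regression` are hypothetical, with `x` as the explanatory variable and `y` as the response.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with r and R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return slope, intercept, r, r ** 2

# Hypothetical data: x is explanatory, y is the response.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r, r_squared = simple_linear_regression(x, y)
print(f"y = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}, R^2 = {r_squared:.3f}")
```

The slope is the predicted change in the response for a one-unit increase in the explanatory variable, the intercept is the predicted response when the explanatory variable is zero, and R² is the share of the response's variation that the line accounts for. These results should agree with Excel's SLOPE, INTERCEPT, CORREL, and RSQ functions on the same data.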
Write a two-page Data Report based on your exploration using the following sections:
• Introduction that briefly states the purpose and contents of the report.
• Data section describing the dataset.
• Methods section describing the choice of variables, visuals created, and summary statistics calculated.
• Analysis section that contains the required analyses from 5(a-c).
• Conclusion giving the most relevant observations and interesting findings.