Linear regression is one of the simplest and most common supervised machine learning algorithms that data scientists use for predictive modeling. In this post, we’ll use linear regression to build a model that predicts cherry tree volume from metrics that are much easier for folks who study trees to measure.

We’ll use R in this blog post to explore this data set and learn the basics of linear regression. If you’re new to learning the R language, we recommend our R Fundamentals and R Programming: Intermediate courses from our R Data Analyst path. It will also help to have some very basic statistics knowledge, but if you know what a mean and standard deviation are, you’ll be able to follow along.

If you want to practice building the models and visualizations yourself, we’ll be using the following R packages:

- datasets: This package contains a wide variety of practice data sets. We’ll be using one of them, “trees”, to learn about building linear regression models.
- ggplot2: We’ll use this popular data visualization package to build plots of our models.
- GGally: This package extends the functionality of ggplot2. We’ll be using it to create a plot matrix as part of our initial exploratory data visualization.
- scatterplot3d: We’ll use this package for visualizing more complex linear regression models with multiple predictors.

The trees data set is included in base R’s datasets package, and it’s going to help us answer this question. Since we’re working with an existing (clean) data set, steps 1 and 2 above are already done, so we can skip right to some preliminary exploratory analysis in step 3.

What does this data set look like?

`data(trees)  # access the data from R’s datasets package`

This data set consists of 31 observations of 3 numeric variables describing black cherry trees: girth, height, and volume. These metrics are useful information for foresters and scientists who study the ecology of trees. It’s fairly simple to measure tree height and girth using basic forestry tools, but measuring tree volume is a lot harder. If you don’t want to actually cut down and dismantle the tree, you have to resort to some technically challenging and time-consuming activities like climbing the tree and making precise measurements. It would be useful to be able to accurately predict tree volume from height and/or girth.

To decide whether we can make a predictive model, the first step is to see if there appears to be a relationship between our predictor and response variables (in this case girth, height, and volume). Let’s do some exploratory data visualization. We’ll use the ggpairs() function from the GGally package to create a plot matrix to see how the variables relate to one another.
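As a minimal sketch of the exploratory step, the code below loads the trees data, inspects it, and draws a plot matrix. The `cor()` summary and the base R `pairs()` fallback are additions for illustration and are not part of the original walkthrough. The fallback simply assumes GGally may not be installed on your machine.

```r
# Load the built-in trees data (black cherry trees: Girth, Height, Volume)
data(trees)

# Inspect the structure and the first few rows
str(trees)
head(trees)

# Pairwise Pearson correlations: a numeric companion to the plot matrix
# (an illustrative addition, not a step from the original post)
cor_matrix <- cor(trees)
print(round(cor_matrix, 2))

# Draw the plot matrix with GGally if it is installed;
# otherwise fall back to base R's pairs() (assumed fallback)
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(trees))
} else {
  pairs(trees)
}
```

In the resulting matrix, look for panels where the points fall roughly along a line. Those pairs of variables are the most promising candidates for a linear model.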