Question
I have a data frame ("data") with 7 columns (2 factors, 5 numeric). The first column contains the names of 7 different countries, and the following columns hold data for different parameters (like population, GDP, etc.) characterizing each country. The last column is a factor indicating which continent each country belongs to.
The data looks like this:
structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L,
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany",
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57,
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54,
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91,
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91,
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91,
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L,
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
I need some sort of loop that plots (e.g., as a scatter plot) every column against every other column, so that in the end every column (except the first and the last, i.e., the two factor variables) has been plotted against all the other columns, each in its own chart (not all plots in one). Ideally, all these plots would be saved to a folder on my local machine.
It would also be great if the x and y axes were already labeled with the names of the two columns being plotted against each other. Moreover, it would be convenient to have a label next to each point showing the respective country name. Lastly, it would be nice to color the points in three different colors according to the three continents.
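The labeling and coloring described above can be sketched in base R for a single pair of columns. This is a minimal sketch under stated assumptions, not the asker's code: it rebuilds the data frame from the dput values, and the palette `pal` and the example pair GDP/Income are arbitrary choices.

```r
# Rebuild the question's data frame (same values as the dput above).
data <- data.frame(
  Country = factor(c("Germany", "France", "Netherlands", "Chile",
                     "Brazil", "India", "China")),
  GDP = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent = factor(c("Europe", "Europe", "Europe", "South America",
                       "South America", "Asia", "Asia"))
)

# Assumed palette: one colour per continent level (Asia, Europe, South America).
pal <- c("tomato", "steelblue", "darkgreen")
# The factor's integer codes index into the palette, giving one colour per point.
point_col <- pal[as.integer(data$Continent)]

# Hypothetical example pair; any two numeric column names work here.
xvar <- "GDP"
yvar <- "Income"

# Axis labels come straight from the column names.
plot(data[[xvar]], data[[yvar]], xlab = xvar, ylab = yvar,
     col = point_col, pch = 19)
# Country names to the right of each point (pos = 4).
text(data[[xvar]], data[[yvar]], labels = as.character(data$Country),
     pos = 4, cex = 0.7)
legend("topleft", legend = levels(data$Continent), col = pal, pch = 19)
```

Indexing a palette by `as.integer()` of the factor keeps the colour-to-continent mapping tied to the factor levels, so the legend built from `levels()` always matches the points.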
So far, I only have a piece of code that goes like this:
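for (i in 2:(ncol(data) - 1)) {  # numeric columns only (skip Country and Continent)
  plot(data[, i], ylab = names(data)[i], xlab = "Country")
  # text() has to be a separate call, not an argument of plot()
  text(data[, i], labels = data$Country, pos = 4, cex = 0.5)
}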
As you can see, it only plots each column against the countries (the first column), which is not what I ultimately want.
Do you have any idea how I could achieve this? Thank you!
Answer 1:
You can use pairs() directly from base R. Note that dt represents your dataset.
dt <- structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L,
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany",
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57,
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54,
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91,
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91,
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91,
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L,
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
pairs(dt)  # factor columns (Country, Continent) are drawn using their integer codes
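pairs() draws every combination in a single figure. If you want one file per pair instead, a base-R loop over combn() of the numeric column names is one option. This is a sketch under assumptions: it rebuilds the same data frame, and the output folder (a subfolder of tempdir()), the PDF format and the colour palette are placeholders to adjust; png() can be used in place of pdf() the same way.

```r
# Rebuild the question's data frame (same values as the dput above).
data <- data.frame(
  Country = factor(c("Germany", "France", "Netherlands", "Chile",
                     "Brazil", "India", "China")),
  GDP = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent = factor(c("Europe", "Europe", "Europe", "South America",
                       "South America", "Asia", "Asia"))
)

# Numeric columns only (factors are not numeric): GDP ... Savings.
num_cols <- names(data)[sapply(data, is.numeric)]
# 2 x 10 character matrix: each column is one unordered pair of names.
pairs_to_plot <- combn(num_cols, 2)

# Assumed output folder; replace with any local path.
out_dir <- file.path(tempdir(), "pair_plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Assumed palette: one colour per continent level.
pal <- c("tomato", "steelblue", "darkgreen")
point_col <- pal[as.integer(data$Continent)]

for (k in seq_len(ncol(pairs_to_plot))) {
  xvar <- pairs_to_plot[1, k]
  yvar <- pairs_to_plot[2, k]
  # One file per pair, named after the two columns.
  pdf(file.path(out_dir, paste0(yvar, "_vs_", xvar, ".pdf")))
  plot(data[[xvar]], data[[yvar]], xlab = xvar, ylab = yvar,
       col = point_col, pch = 19)
  text(data[[xvar]], data[[yvar]], labels = as.character(data$Country),
       pos = 4, cex = 0.7)
  legend("topleft", legend = levels(data$Continent), col = pal, pch = 19)
  dev.off()  # close the file before starting the next pair
}

list.files(out_dir)
```

combn() yields each unordered pair once (10 files for 5 numeric columns); the mirrored plot with swapped axes would show the same information, so it is skipped.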
Answer 2:
I've always thought that the splom function in the 'lattice' package is quite useful for this sort of exploratory analysis. The iris example below is admittedly not a great one, since it obscures the group memberships, but it shows each combination of variables as points plus a nonparametric (loess) regression line in the "pairs" layout:
library(lattice)

png()  # by default writes Rplot001.png to the working directory
print(splom(~iris[1:4], groups = Species, data = iris,
            panel = function(x, y, i, j, ...) {
              panel.points(x, y, ...)
              panel.loess(x, y, ...)
            }))
dev.off()
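Applied to the question's own data, a sketch might keep splom()'s default panel, which (unlike the custom panel above) draws each group in its own colour, and pass groups = Continent plus a key. The rebuilt data frame and the output file name are assumptions; country labels are not drawn here.

```r
library(lattice)

# Rebuild the question's data frame (same values as the dput above).
dt <- data.frame(
  Country = factor(c("Germany", "France", "Netherlands", "Chile",
                     "Brazil", "India", "China")),
  GDP = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent = factor(c("Europe", "Europe", "Europe", "South America",
                       "South America", "Asia", "Asia"))
)

# Numeric columns 2:6 only; the default panel colours points by Continent.
p <- splom(~dt[2:6], groups = Continent, data = dt,
           auto.key = list(columns = 3))

# Assumed output file; lattice plots must be print()ed to reach the device.
out_file <- file.path(tempdir(), "splom_continent.pdf")
pdf(out_file)
print(p)
dev.off()
```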
Source: https://stackoverflow.com/questions/36582772/r-plotting-each-column-against-each-column