I am following Chapter 1 of Wickham and Grolemund's "R for Data Science" on visualization.
I have tried:
ggplot(data = mpg) + geom_point(mapping
I remember how completely confused I was by this when I started using ggplot.
To build on @Mauicio Calvao's answer: use `color` inside the `aes` to split the colours in the plot by a variable of the data.frame you are plotting, e.g.:
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))
So when `color` (or `size`, `linetype`, or similar aesthetics) is inside the `aes`, it is really asking which object/variable should determine the groups that get different colours (or sizes, or line types). If this is a string (e.g. `"blue"`), then all points are put into one group, and the name of that group is not related to the actual colour they are drawn in.
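To get points that are actually blue, the colour has to be set outside `aes()`, as a parameter of the geom, rather than mapped inside it. A minimal sketch contrasting the two, assuming ggplot2 (3.3 or later) and its bundled `mpg` data; `p_mapped` and `p_set` are just illustrative names:

```r
library(ggplot2)

# Mapping: color is inside aes(), so "blue" is treated as a data value
# (a one-level category) and receives a colour from the default scale.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Setting: color is outside aes() but inside the geom, so the value is
# used literally as the colour of every point and no legend is drawn.
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

print(p_set)
```

Printing `p_mapped` instead shows the default-palette colour together with a legend entry labelled "blue".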
To assign specific colours once the points are grouped by `color` inside the `aes`, use `scale_color_manual`:
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))+
scale_colour_manual(values = c("black","blue","orange"))
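With an unnamed `values` vector, colours are matched to the levels of `drv` by position, so the result depends on the level order. Passing a named vector ties each colour to a specific level instead. A sketch, assuming the levels of `mpg$drv` are "4", "f" and "r" as in the dataset bundled with ggplot2; `p_manual` is an illustrative name:

```r
library(ggplot2)

# Each name in `values` is a level of drv; each value is the colour used
# for that level, regardless of the order in which levels are sorted.
p_manual <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
  scale_colour_manual(values = c("4" = "black", "f" = "blue", "r" = "orange"))

print(p_manual)
```

`scale_colour_manual` and `scale_color_manual` are aliases, so either spelling works.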
This issue, and more specifically the difference in output between the two commands mentioned, is dealt with explicitly in Section 5.4.2 of the 2nd edition of "ggplot2: Elegant Graphics for Data Analysis", by Hadley Wickham himself:
Either:
- map (inside `aes`) a variable of your data to an aesthetic, e.g. `aes(..., color = VarX)`, or
- set (outside `aes`, but inside a geom element) an aesthetic to a constant value, e.g. `"blue"`.

In the first case, of mapping an aesthetic such as `color`, ggplot2 picks colours from its default discrete scale, which spaces hues evenly around the colour wheel. Because the values of the mapped variable are all the same constant, they form a single group and simply get the first colour of that palette. Why should that colour coincide with the constant value you happened to map from? More explicitly, if you try the command:
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))
you get the same point colour as in the first command of the original question; only the label in the legend changes (it now reads "foo").
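Rather than comparing the plots by eye, one way to check this is to inspect the data ggplot2 builds for the layer with `layer_data()`, which includes the colour the scale actually assigned to each point. A minimal sketch, assuming ggplot2 and its bundled `mpg` data; the variable names are illustrative:

```r
library(ggplot2)

# Two plots that differ only in the constant string mapped inside aes()
p_foo  <- ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))
p_blue <- ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Distinct colours actually drawn in the first (and only) layer of each plot
colours_foo  <- unique(layer_data(p_foo)$colour)
colours_blue <- unique(layer_data(p_blue)$colour)

print(colours_foo)                          # a single colour from the default palette
print(identical(colours_foo, colours_blue)) # both constants get the same colour
```

Both plots end up with one identical colour that is not "blue", which is exactly the behaviour described above.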