Wordcloud showing colour based on continuous metadata in R

Question

I'm creating a wordcloud in which the size of each word is based on its frequency, but I want the colour of the words to be mapped to a third variable (stress, a numerical, continuous variable giving the amount of stress associated with each word).

I tried the following, which gave me only two different colours (yellow and purple), while I want something smoother. I would like a colour range, like a palette that goes from green to red, for example.

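library(dplyr)      # for as_tibble()
library(wordcloud)  # for wordcloud()

df = data.frame(word = c("calling", "meeting", "conference", "contract", "negotiation", "email"),
                n = c(20, 12, 4, 8, 10, 43),
                stress = c(23, 30, 15, 40, 35, 15))
df = as_tibble(df)  # replaces the deprecated tbl_df()
# Attempt: pass the raw stress values as the colours
wordcloud(words = df$word, freq = df$n, col = df$stress)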

Does anyone know how to deal with this continuous metadata and get a smoothly changing colour for the words as stress goes up? Thanks!

Answer 1:

Here is a potential solution. I suggest using the wordcloud2 package for this task. Since I do not know your real data, I created sample data to demonstrate a prototype.

If you have many words, I am not sure that coloring them by a continuous variable (stress) is a good idea. One thing you could do is create a new grouping variable with cut(), which reduces the number of colors used in the graphic. Here, I created a new column called color with five colors from the viridis palette.

When you use wordcloud2(), you only need to supply two things: the data and the colors. Font size reflects word frequency without any extra specification.

mydf = data.frame(word = c("calling", "meeting", "conference", "contract", "negotiation",
                           "email", "friends", "chat", "text", "deal",
                           "business", "promotion", "discount", "users", "family"),
                  n = c(20, 12, 4, 8, 10, 43, 33, 5, 47, 28, 12, 9, 50, 31, 22),
                  stress = c(23, 30, 15, 40, 35, 15, 30, 18, 10, 5, 29, 38, 45, 8, 3))


          word  n stress
1      calling 20     23
2      meeting 12     30
3   conference  4     15
4     contract  8     40
5  negotiation 10     35
6        email 43     15
7      friends 33     30
8         chat  5     18
9         text 47     10
10        deal 28      5
11    business 12     29
12   promotion  9     38
13    discount 50     45
14       users 31      8
15      family 22      3

library(dplyr)
library(wordcloud2)
library(viridis)

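# Bin stress into five groups and label each bin with a viridis hex color
# (lowest stress = yellow, highest stress = purple).
temp <- mutate(mydf, color = cut(stress, breaks = c(0, 10, 20, 30, 40, Inf),
                                 labels = c("#FDE725FF", "#73D055FF", "#1F968BFF",
                                            "#2D708EFF", "#481567FF"),
                                 include.lowest = TRUE))

wordcloud2(data = temp, color = temp$color)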
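
The same binning idea can be adapted to the green-to-red range asked for in the question. The following is only a sketch under a few assumptions: it reuses the breaks from the code above, builds the five colors with base R's colorRampPalette() instead of hard-coded hex codes, and uses the six-word data from the question. The names pal and bins are introduced here for illustration.

```r
# Sketch: bin stress into five groups and color them from green to red.
mydf <- data.frame(
  word   = c("calling", "meeting", "conference", "contract", "negotiation", "email"),
  n      = c(20, 12, 4, 8, 10, 43),
  stress = c(23, 30, 15, 40, 35, 15)
)

# Five colors interpolated between green and red (base R, grDevices).
pal <- colorRampPalette(c("green", "red"))(5)

# cut() returns a factor; its integer codes (1 to 5) index into the palette.
bins <- cut(mydf$stress, breaks = c(0, 10, 20, 30, 40, Inf), include.lowest = TRUE)
mydf$color <- pal[as.integer(bins)]

# Draw the cloud only in an interactive session with wordcloud2 installed.
if (interactive() && requireNamespace("wordcloud2", quietly = TRUE)) {
  wordcloud2::wordcloud2(data = mydf[, c("word", "n")], color = mydf$color)
}
```

Words that fall into the same bin (for example "calling" and "meeting") share a color, so the number of distinct colors never exceeds the number of bins.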

Answer 2:

Alternatively, here is something a bit more automatic that avoids specifying the exact threshold values and colors:

library(RColorBrewer)
library(wordcloud2)

mydf = data.frame(word = c("calling", "meeting", "conference", "contract", "negotiation",
                       "email", "friends", "chat", "text", "deal",
                       "business", "promotion", "discount", "users", "family"),
              n = c(20, 12, 4, 8, 10, 43, 33, 5, 47, 28, 12, 9, 50, 31, 22),
              stress = c(23, 30, 15, 40, 35, 15, 30, 18, 10, 5, 29, 38, 45, 8, 3))

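# One shade per distinct stress value
color_range_number <- length(unique(mydf$stress))
# Indexing by factor(mydf$stress) uses the factor's integer codes, so each word
# gets the shade matching the rank of its stress value (lowest = lightest)
color <- colorRampPalette(brewer.pal(9,"Blues")[3:7])(color_range_number)[factor(mydf$stress)]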

wordcloud2(mydf, color=color)

This way, the size is determined by 'n' and the shade of color is determined by 'stress'.

The [3:7] subset adjusts the range of the color scale: brewer.pal(9, "Blues") returns nine shades, where 1 is the lightest and 9 is the darkest.
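
Note that indexing by factor(mydf$stress) assigns shades by the rank of each distinct stress value, not by its magnitude. If the color should instead change in proportion to stress, one possible variant is sketched below. It assumes a green-yellow-red ramp (as requested in the question) and the six-word data from the question; the names scaled and ramp are introduced here for illustration. It relies only on base R's colorRamp() and rgb().

```r
# Sketch: map stress to a color in proportion to its value.
mydf <- data.frame(
  word   = c("calling", "meeting", "conference", "contract", "negotiation", "email"),
  n      = c(20, 12, 4, 8, 10, 43),
  stress = c(23, 30, 15, 40, 35, 15)
)

# Rescale stress to [0, 1] so equal differences in stress give equal color steps.
# (This assumes stress is not constant; otherwise the denominator is zero.)
rng    <- range(mydf$stress)
scaled <- (mydf$stress - rng[1]) / (rng[2] - rng[1])

# colorRamp() returns a function mapping values in [0, 1] to rows of RGB (0-255).
ramp <- colorRamp(c("green", "yellow", "red"))
mydf$color <- rgb(ramp(scaled), maxColorValue = 255)

# Draw the cloud only in an interactive session with wordcloud2 installed.
if (interactive() && requireNamespace("wordcloud2", quietly = TRUE)) {
  wordcloud2::wordcloud2(data = mydf[, c("word", "n")], color = mydf$color)
}
```

With this mapping, the least stressful words come out pure green and the most stressful word pure red, with everything else interpolated in between.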

You may check the other color palette options by: