Question
I am trying to plot two columns of a data frame that I created, using ggplot.
I am plotting a date against a numeric value. I used the dplyr library to create the data frame:
is_china <- confirmed_cases_worldwide %>%
filter(country == "China", type=='confirmed') %>%
mutate(cumu_cases = cumsum(cases))
I suspect the problem I describe below is caused by the y value being a column produced by the cumsum function, but I am not sure.
The table looks something like this, with the last column being the intended y value:
        date province country     lat     long      type cases cumu_cases
1 2020-01-22       NA   China 31.8257 117.2264 confirmed     1          1
2 2020-01-23       NA   China 31.8257 117.2264 confirmed     8          9
3 2020-01-24       NA   China 31.8257 117.2264 confirmed     6         15
4 2020-01-25       NA   China 31.8257 117.2264 confirmed    24         39
5 2020-01-26       NA   China 31.8257 117.2264 confirmed    21         60
6 2020-01-27       NA   China 31.8257 117.2264 confirmed    10         70
7 2020-01-28       NA   China 31.8257 117.2264 confirmed    36        106
8 2020-01-29       NA   China 31.8257 117.2264 confirmed    46        152
When I plot the cases column (second to last in the table), the graph looks fine, but when I plot the cumulative cases, the graph is very volatile.
I am unsure why.
Answer 1:
Here is one approach:
library(ggplot2)
ggplot(is_china,aes(x = as.Date(date),y = cumu_cases)) +
geom_line()
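A likely reason for wrapping date in as.Date() is that read.csv() returns date strings as text (character, or factor in older R versions), not as Date values. The following is a minimal sketch of that conversion on made-up values; raw_dates and parsed are hypothetical names, not part of the original data.

```r
# Hypothetical date strings, as they would arrive from a CSV file.
raw_dates <- c("2020-01-22", "2020-01-23", "2020-01-24")

# Convert ISO-formatted text (YYYY-MM-DD) to the Date class.
parsed <- as.Date(raw_dates)

# Date values support arithmetic, which is what a continuous date axis relies on.
gaps <- diff(parsed)

print(class(raw_dates))
print(class(parsed))
print(gaps)
```

With a Date column, ggplot can use a continuous date scale instead of treating every distinct string as a separate category.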
Answer 2:
You're attempting to group by country, but there is just one country.
library(dplyr)
is_china <- confirmed_cases_worldwide %>%
filter(country == "China", type=='confirmed') %>%
mutate(date = as.Date(date))
unique(is_china$country)
# [1] "China"
However, the lat and long variables, with 33 distinct values each, indicate that we have panel data. If you ignore the panel structure, cumsum gives you strange values; besides, the variable is already there and we don't need to calculate it again. Altogether this explains the strange lines you're getting.
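To illustrate the effect of ignoring the panel structure, here is a small base R sketch on made-up numbers (the toy data frame and the cumu_all and cumu_loc columns are hypothetical, not taken from the real file). It compares a running total over all rows with one that restarts for each location.

```r
# Toy panel data (made-up numbers): two locations, three dates each.
toy <- data.frame(
  date  = as.Date(rep(c("2020-01-22", "2020-01-23", "2020-01-24"), times = 2)),
  lat   = rep(c(31.8257, 40.1824), each = 3),
  long  = rep(c(117.2264, 116.4142), each = 3),
  cases = c(1, 8, 6, 14, 22, 36)
)

# More than one row per date is a hint that the data is a panel.
rows_per_date <- table(toy$date)

# Ungrouped running total: keeps accumulating across locations.
toy$cumu_all <- cumsum(toy$cases)

# Running total restarted for each location, keyed on "lat long".
# A dplyr equivalent would be:
#   toy %>% group_by(lat, long) %>% mutate(cumu_loc = cumsum(cases))
toy$key <- paste(toy$lat, toy$long)
toy$cumu_loc <- ave(toy$cases, toy$key, FUN = cumsum)

print(rows_per_date)
print(toy[, c("date", "key", "cases", "cumu_all", "cumu_loc")])
```

In this toy data every date carries two different cumu_all values, so a single line drawn through all rows has to jump between them, which would look like the volatile graph from the question.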
Since the province variable is empty, we could use lat and long to generate a new gps variable for grouping.
unique(is_china$lat)
# [1] 31.8257 40.1824 30.0572 26.0789 ... [33] 29.1832
unique(is_china$long)
# [1] 117.2264 116.4142 107.8740 117.9874 ... [33] 120.0934
is_china$gps <- apply(is_china[4:5], 1, function(x) Reduce(paste, x))
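As a possible alternative to the row-wise apply() call, the same kind of key can be built with a single vectorized paste(). The sketch below uses a made-up three-row data frame (toy, gps_apply and gps_paste are hypothetical names) and only checks that both approaches agree on these values.

```r
# Toy coordinates (illustrative values only).
toy <- data.frame(
  lat  = c(31.8257, 40.1824, 22.3),
  long = c(117.2264, 116.4142, 114.2)
)

# Row-wise approach, mirroring the apply() call above.
gps_apply <- apply(toy[1:2], 1, function(x) Reduce(paste, x))

# Vectorized approach: paste() works element-wise on whole columns.
gps_paste <- paste(toy$lat, toy$long)

print(gps_paste)
print(all(unname(gps_apply) == gps_paste))
```

On the real data this would read is_china$gps <- paste(is_china$lat, is_china$long), assuming both columns are numeric as in the output shown above.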
Now we can plot the data using gps as a factor.
library(ggplot2)
ggplot(is_china, aes(x=date, y=cumu_cases, color=factor(gps))) +
geom_line()
To select only specific coordinates you may subset your data, e.g.:
ggplot(is_china[is_china$gps %in% c("30.9756 112.2707", "22.3 114.2"), ],
aes(x=date, y=cumu_cases, color=factor(gps))) +
geom_line()
Data:
confirmed_cases_worldwide <-
read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv")
Source: https://stackoverflow.com/questions/61979426/ggplot-not-properly-displaying