ggplot not properly displaying

て烟熏妆下的殇ゞ submitted on 2021-02-11 15:02:33

Question


I am trying to plot two columns of a data frame with ggplot.

I am plotting a date against a numeric value. I used the dplyr library to create the data frame:

is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type=='confirmed') %>%
  mutate(cumu_cases = cumsum(cases))

I suspect the problem is that the y value comes from a column produced by the cumsum function, but I am not sure.

The table looks something like this, the last column being the targeted y value:


  date       province country lat     long     type      cases cumu_cases
1 2020-01-22 NA       China   31.8257 117.2264 confirmed     1          1
2 2020-01-23 NA       China   31.8257 117.2264 confirmed     8          9
3 2020-01-24 NA       China   31.8257 117.2264 confirmed     6         15
4 2020-01-25 NA       China   31.8257 117.2264 confirmed    24         39
5 2020-01-26 NA       China   31.8257 117.2264 confirmed    21         60
6 2020-01-27 NA       China   31.8257 117.2264 confirmed    10         70
7 2020-01-28 NA       China   31.8257 117.2264 confirmed    36        106
8 2020-01-29 NA       China   31.8257 117.2264 confirmed    46        152

When I plot the cases column (second to last in the table), the graph looks fine, but when I plot the cumulative cases, the line is very volatile.

I am unsure why.


Answer 1:


Here is one approach:

library(ggplot2)
ggplot(is_china,aes(x = as.Date(date),y = cumu_cases)) +
   geom_line()




Answer 2:


You're attempting to group by country, but there is just one country.

library(dplyr)
is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type=='confirmed') %>%
  mutate(date = as.Date(date))

unique(is_china$country)
# [1] "China"

However, the lat and long variables each have 33 distinct values, which indicates panel data: one time series per location. If you ignore the panel structure, cumsum runs across all locations and produces strange values; besides, the cumu_cases variable is already in the data, so there is no need to calculate it again. Together, this explains the strange lines you're getting.
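
To illustrate the point about panel structure, here is a minimal base R sketch on made-up data. The toy data frame, its two locations "A" and "B", and the column names cumu_all and cumu_by_gps are hypothetical and not part of the original dataset; ave() is used here only as one possible way to compute a per-group running total.

```r
# Toy panel: two hypothetical locations ("A", "B"), three days each.
toy <- data.frame(
  date  = rep(as.Date("2020-01-22") + 0:2, times = 2),
  gps   = rep(c("A", "B"), each = 3),
  cases = c(1, 8, 6, 2, 3, 5)
)

# Ungrouped: the running total carries over from location A into location B.
toy$cumu_all <- cumsum(toy$cases)

# Grouped: ave() applies cumsum separately within each gps value.
toy$cumu_by_gps <- ave(toy$cases, toy$gps, FUN = cumsum)

print(toy)
```

In the ungrouped column, each date ends up with values from different points of one long running total, so a single line drawn over date jumps back and forth. The grouped column restarts at each location, which is the shape a per-location line plot expects.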

Since the province variable is empty, we could use lat and long to generate a new gps variable for grouping.

unique(is_china$lat)
# [1] 31.8257 40.1824 30.0572 26.0789 ...  [33] 29.1832
unique(is_china$long)
# [1] 117.2264 116.4142 107.8740 117.9874 ... [33] 120.0934

is_china$gps <- apply(is_china[4:5], 1, function(x) Reduce(paste, x))
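
As a possible alternative to the apply() call, paste() is vectorised and can build the same kind of key directly from the two columns. The sketch below uses a small made-up data frame (toy, with two coordinate pairs taken from this thread) rather than the real data, so treat it as an illustration only.

```r
# Hypothetical two-row stand-in for the lat/long columns of is_china.
toy <- data.frame(
  lat  = c(31.8257, 22.3),
  long = c(117.2264, 114.2)
)

# paste() works element-wise, so no row-wise apply() is needed.
toy$gps <- paste(toy$lat, toy$long)

print(toy$gps)
```

On the real data this would presumably read is_china$gps <- paste(is_china$lat, is_china$long), assuming lat and long are numeric columns as the output above suggests.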

Now we can plot the data using gps as a factor.

library(ggplot2)
ggplot(is_china, aes(x=date, y=cumu_cases, color=factor(gps))) +
  geom_line()

To select only specific coordinates you may subset your data, e.g.:

ggplot(is_china[is_china$gps %in% c("30.9756 112.2707", "22.3 114.2"), ],
       aes(x=date, y=cumu_cases, color=factor(gps))) +
  geom_line()


Data:

confirmed_cases_worldwide <- 
  read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv")


Source: https://stackoverflow.com/questions/61979426/ggplot-not-properly-displaying
