问题
I'm trying to make a connection to other columns within a function given a conditional.
Essentially, I want to make a dataframe go from long to wide given a conditional, where those values in one column are NA
relative to another column that has values in that same row, turn the NAs
into a specific numeric.
Although the values assigned have to be column specific. So if 2010
has NAs
whilst 2019
has a value, then return 16
otherwise, if 2019
has NAs when in that same row, 2010
has values, return 16
.
what I have tried:
# A tibble: 26 x 4
year locality_id landcover pland
<chr> <chr> <int> <dbl>
1 2010 L2228604 10 0.0645
2 2010 L2228604 13 0.935
3 2010 L452817 8 0.0968
4 2010 L452817 9 0.0323
5 2010 L452817 12 0.613
6 2010 L452817 13 0.194
7 2010 L452817 14 0.0645
8 2010 L596267 0 0.194
9 2010 L596267 9 0.0323
10 2010 L596267 11 0.0645
# ... with 16 more rows
p <- function(x){
if(x == 'NA') {
return(16)
} else {
17
}
}
#wrong outcome
# A tibble: 26 x 4
locality_id pland `2010` `2019`
<chr> <dbl> <dbl> <dbl>
1 L2228604 0.0645 17 NA
2 L2228604 0.935 17 NA
3 L452817 0.0968 17 NA
4 L452817 0.0323 17 NA
5 L452817 0.613 17 NA
6 L452817 0.194 17 NA
7 L452817 0.0645 17 NA
8 L596267 0.194 17 NA
9 L596267 0.0323 17 NA
10 L596267 0.0645 17 NA
# ... with 16 more rows
What I am expecting:
# A tibble: 26 x 4
locality_id pland `2010` `2019`
<chr> <dbl> <int> <int>
1 L2228604 0.0645 10 17
2 L2228604 0.935 13 17
3 L452817 0.0968 8 17
4 L452817 0.0323 9 17
5 L9185766 0.54838710 16 8
6 L9185766 0.19354839 16 9
7 L9185766 0.03225806 16 13
8 L9185766 0.16129032 16 14
9 L9234578 1.00000000 16 12
reproducible code:
structure(list(year = c("2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2019", "2019", "2019", "2019", "2019", "2019",
"2019", "2019", "2019", "2019", "2019"), locality_id = c("L2228604",
"L2228604", "L452817", "L452817", "L452817", "L452817", "L452817",
"L596267", "L596267", "L596267", "L596267", "L152650", "L910180",
"L910180", "L910180", "L4791597", "L4791597", "L9149985", "L9149985",
"L9149985", "L9185766", "L9185766", "L9185766", "L9185766", "L9185766",
"L9234578"), landcover = c(10L, 13L, 8L, 9L, 12L, 13L, 14L, 0L,
9L, 11L, 13L, 13L, 0L, 8L, 9L, 5L, 8L, 10L, 11L, 12L, 4L, 8L,
9L, 13L, 14L, 12L), pland = c(0.0645161290322581, 0.935483870967742,
0.0967741935483871, 0.032258064516129, 0.612903225806452, 0.193548387096774,
0.0645161290322581, 0.193548387096774, 0.032258064516129, 0.0645161290322581,
0.709677419354839, 1, 0.4375, 0.34375, 0.03125, 0.566666666666667,
0.0333333333333333, 0.1, 0.0333333333333333, 0.866666666666667,
0.0645161290322581, 0.548387096774194, 0.193548387096774, 0.032258064516129,
0.161290322580645, 1)), row.names = c(NA, -26L), class = c("tbl_df",
"tbl", "data.frame"))
回答1:
You can get the data in wide format and based on column name replace the NA
value.
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = year, values_from = landcover) %>%
mutate(across(`2010`:`2019`, ~if(cur_column() == '2019')
replace_na(.x, 17) else replace_na(.x, 16)))
# locality_id pland `2010` `2019`
# <chr> <dbl> <dbl> <dbl>
# 1 L2228604 0.0645 10 17
# 2 L2228604 0.935 13 17
# 3 L452817 0.0968 8 17
# 4 L452817 0.0323 9 17
# 5 L452817 0.613 12 17
# 6 L452817 0.194 13 17
# 7 L452817 0.0645 14 17
# 8 L596267 0.194 0 17
# 9 L596267 0.0323 9 17
#10 L596267 0.0645 11 17
# … with 16 more rows
回答2:
There is a package called libr with a datastep()
function that can do this easily. The datastep()
function allows you to step through the data row by row, and make decisions based on the values of the columns. You can also create new columns on the fly.
Here is some code for your specific problem. I put your sample data in a tibble called dat
:
library(libr)
final <- datastep(dat,
keep = c("locality_id", "pland", "2010", "2019"),
{
if (year == "2010") {
"2010" <- landcover
"2019" <- 17
} else if (year == "2019") {
"2010" <- 16
"2019" <- landcover
}
})
Above tibble final
contains the following results:
# locality_id pland 2010 2019
# 1 L2228604 0.06451613 10 17
# 2 L2228604 0.93548387 13 17
# 3 L452817 0.09677419 8 17
# 4 L452817 0.03225806 9 17
# 5 L452817 0.61290323 12 17
# 6 L452817 0.19354839 13 17
# 7 L452817 0.06451613 14 17
# 8 L596267 0.19354839 0 17
# 9 L596267 0.03225806 9 17
# 10 L596267 0.06451613 11 17
# 11 L596267 0.70967742 13 17
# 12 L152650 1.00000000 13 17
# 13 L910180 0.43750000 0 17
# 14 L910180 0.34375000 8 17
# 15 L910180 0.03125000 9 17
# 16 L4791597 0.56666667 16 5
# 17 L4791597 0.03333333 16 8
# 18 L9149985 0.10000000 16 10
# 19 L9149985 0.03333333 16 11
# 20 L9149985 0.86666667 16 12
# 21 L9185766 0.06451613 16 4
# 22 L9185766 0.54838710 16 8
# 23 L9185766 0.19354839 16 9
# 24 L9185766 0.03225806 16 13
# 25 L9185766 0.16129032 16 14
# 26 L9234578 1.00000000 16 12
来源:https://stackoverflow.com/questions/65865839/control-flow-for-calling-other-columns-in-function