Question
I want to fill a new data frame called hd5 based on conditions from an old data frame called dfnew1. Can I do it without a nested for loop?
for (j in 2:length(hd6)) {
  for (i in 1:length(hd5$DATE)) {
    abcd = dfnew1 %>%
      filter((Date == hd5$DATE[i]), (StrikePrice == hd6[j]), (OptionType == "CE")) %>%
      arrange(dte)
    hd5[i, j] = abcd[1, 9]
  }
}
hd6 = c(13900, 14000, 14100, 14200)
dfnew1 looks like this:
Date expiry optiontype strikeprice closeprice dte
1/1/2019 31/1/2019 ce 13900 700 30
1/1/2019 31/1/2019 ce 14000 650 30
1/1/2019 31/1/2019 ce 14100 600 30
1/1/2019 31/2/2019 ce 14100 900 58
1/2/2019 31/1/2019 ce 13900 800 29
1/2/2019 31/1/2019 ce 14000 750 29
1/2/2019 31/1/2019 ce 14100 700 29
I want to fill my new data frame hd5 from this dfnew1 data frame by matching the date, strike price, and option type.
The filled hd5 should look like this:
Date 13900 14000 14100 14200
1/1/2019 700 650 600 550
1/2/2019 800 750 700 650
Answer 1:
Here's a tidyverse option:
library(dplyr)
# library(tidyr)
dat %>%
  group_by(Date, strikeprice) %>%
  summarize(closeprice = min(closeprice)) %>%
  ungroup() %>%
  tidyr::pivot_wider(names_from = "strikeprice", values_from = "closeprice")
# # A tibble: 2 x 4
# Date `13900` `14000` `14100`
# <chr> <int> <int> <int>
# 1 1/1/2019 700 650 600
# 2 1/2/2019 800 750 700
(You might see online tutorials referencing tidyr::spread. It does effectively the same thing here, but has been retired along with tidyr::gather (source: https://tidyr.tidyverse.org/reference/spread.html), so it is generally recommended that new code use the pivot_* functions.)
Note: based on your expected output, it looks like you took the minimum for
1/1/2019 31/1/2019 ce 14100 600 30
1/1/2019 31/2/2019 ce 14100 900 58
I might be more inclined (when "price" is involved) to use sum instead, but it depends heavily on your actual intent and use. Replace min with your aggregation of choice, be it max, sum, or something else.
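If the intent is what the loop in the question does (sort by dte and take the first row), one alternative to aggregating closeprice is to keep the row with the smallest dte in each group. The following is only a sketch under that assumption; it uses dplyr::slice_min (dplyr 1.0.0 or later) and rebuilds a subset of the sample columns so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample rows from the question (only the columns needed here).
dat <- data.frame(
  Date        = c("1/1/2019", "1/1/2019", "1/1/2019", "1/1/2019",
                  "1/2/2019", "1/2/2019", "1/2/2019"),
  strikeprice = c(13900L, 14000L, 14100L, 14100L, 13900L, 14000L, 14100L),
  closeprice  = c(700L, 650L, 600L, 900L, 800L, 750L, 700L),
  dte         = c(30L, 30L, 30L, 58L, 29L, 29L, 29L),
  stringsAsFactors = FALSE
)

nearest <- dat %>%
  group_by(Date, strikeprice) %>%
  # keep exactly one row per group: the one with the smallest dte
  slice_min(dte, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Date, strikeprice, closeprice) %>%
  pivot_wider(names_from = strikeprice, values_from = closeprice)

print(nearest)
```

On this sample the result happens to match the min(closeprice) version, because the row with the nearest expiry also has the lowest close price; on other data the two can differ.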
I'll note that having numeric column names is a little non-standard and can cause confusion: dat[,14100] will fail, while dat[,"14100"] or dat$`14100` should generally work.
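To make the column-access point concrete, here is a small base R sketch. The data frame wide is a hand-built stand-in for the pivoted result (an assumption for illustration, not output copied from above).

```r
# Hand-built stand-in for the widened result; check.names = FALSE keeps
# the numeric-looking column names as they are.
wide <- data.frame(
  Date    = c("1/1/2019", "1/2/2019"),
  `13900` = c(700L, 800L),
  `14000` = c(650L, 750L),
  `14100` = c(600L, 700L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

by_name   <- wide[, "14100"]   # character name: selects the column
by_dollar <- wide$`14100`      # backticks with $: selects the column

# A bare number is read as a column position, and there is no column
# number 14100, so this raises an error that we catch here.
by_number <- tryCatch(wide[, 14100], error = function(e) "error")
```

The first two forms return the same integer vector, while the positional form ends up in the error handler.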
You may find that having numeric column headers makes sense for some comparisons and for depicting a table, but if you plan on plotting things (e.g., using ggplot2), often a longer version (your original layout, summarizing notwithstanding) might be preferred.
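If you already have the wide table and later need the longer layout back, tidyr::pivot_longer can reverse the reshaping. This is a sketch only; wide is a hand-built stand-in for the pivoted result, and the integer conversion of the strike prices is an assumption about what you would want downstream.

```r
library(dplyr)
library(tidyr)

# Hand-built stand-in for the widened result.
wide <- data.frame(
  Date    = c("1/1/2019", "1/2/2019"),
  `13900` = c(700L, 800L),
  `14000` = c(650L, 750L),
  `14100` = c(600L, 700L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

long <- wide %>%
  # every column except Date becomes a (strikeprice, closeprice) pair
  pivot_longer(-Date, names_to = "strikeprice", values_to = "closeprice") %>%
  # column names arrive as character, so convert them back to integers
  mutate(strikeprice = as.integer(strikeprice))

print(long)
```

Each Date then appears once per strike price, which is the shape ggplot2 aesthetics usually expect.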
Data:
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date expiry optiontype strikeprice closeprice dte
1/1/2019 31/1/2019 ce 13900 700 30
1/1/2019 31/1/2019 ce 14000 650 30
1/1/2019 31/1/2019 ce 14100 600 30
1/1/2019 31/2/2019 ce 14100 900 58
1/2/2019 31/1/2019 ce 13900 800 29
1/2/2019 31/1/2019 ce 14000 750 29
1/2/2019 31/1/2019 ce 14100 700 29")
Answer 2:
We can also use spread after keeping, within each 'Date' and 'strikeprice' group, the row with the minimum 'dte', and then selecting only the columns of interest
library(dplyr)
library(tidyr)
dat %>%
  group_by(Date, strikeprice) %>%
  slice(which.min(dte)) %>%
  ungroup() %>%
  select(Date, strikeprice, closeprice) %>%
  spread(strikeprice, closeprice)
# A tibble: 2 x 4
# Date `13900` `14000` `14100`
# <chr> <int> <int> <int>
#1 1/1/2019 700 650 600
#2 1/2/2019 800 750 700
Or using pivot_wider by making use of values_fn to pass a function. Here, we select only the columns of interest
dat %>%
  select(Date, strikeprice, closeprice) %>%
  pivot_wider(names_from = strikeprice, values_from = closeprice,
              values_fn = list(closeprice = min))
# A tibble: 2 x 4
# Date `13900` `14000` `14100`
# <chr> <int> <int> <int>
#1 1/1/2019 700 650 600
#2 1/2/2019 800 750 700
Or another option is dcast
library(data.table)
dcast(setDT(dat), Date ~ strikeprice, min, value.var = 'closeprice')
# Date 13900 14000 14100
#1: 1/1/2019 700 650 600
#2: 1/2/2019 800 750 700
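Since the question asks about the apply family, the same table can also be sketched in base R with tapply. This is not part of the approaches above, just an illustration; it assumes min is the aggregation you want and rebuilds the sample columns so that it runs on its own.

```r
# Sample rows from the question (only the columns needed here).
dat <- data.frame(
  Date        = c("1/1/2019", "1/1/2019", "1/1/2019", "1/1/2019",
                  "1/2/2019", "1/2/2019", "1/2/2019"),
  strikeprice = c(13900L, 14000L, 14100L, 14100L, 13900L, 14000L, 14100L),
  closeprice  = c(700L, 650L, 600L, 900L, 800L, 750L, 700L),
  stringsAsFactors = FALSE
)

# tapply applies min to closeprice within every Date x strikeprice cell
# and returns a matrix with dates as row names and strikes as column names.
wide_mat <- with(dat, tapply(closeprice, list(Date, strikeprice), min))

# Turn the matrix into a data frame with Date as a regular column.
hd5 <- data.frame(Date = rownames(wide_mat), wide_mat, check.names = FALSE)
rownames(hd5) <- NULL

print(hd5)
```

Combinations that do not occur in the data (none in this sample) would show up as NA in the matrix.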
Data:
dat <- structure(list(Date = c("1/1/2019", "1/1/2019", "1/1/2019", "1/1/2019",
"1/2/2019", "1/2/2019", "1/2/2019"), expiry = c("31/1/2019",
"31/1/2019", "31/1/2019", "31/2/2019", "31/1/2019", "31/1/2019",
"31/1/2019"), optiontype = c("ce", "ce", "ce", "ce", "ce", "ce",
"ce"), strikeprice = c(13900L, 14000L, 14100L, 14100L, 13900L,
14000L, 14100L), closeprice = c(700L, 650L, 600L, 900L, 800L,
750L, 700L), dte = c(30L, 30L, 30L, 58L, 29L, 29L, 29L)),
class = "data.frame", row.names = c(NA,
-7L))
Source: https://stackoverflow.com/questions/61857346/how-to-use-apply-family-instead-of-nested-for-loop-for-my-problem