How to use the apply family instead of a nested for loop for my problem

守給你的承諾、 posted on 2021-01-29 16:34:39

Question


I want to fill a new data frame called hd5 based on conditions applied to an existing data frame called dfnew1.

Can I do it without a nested for loop?

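for (j in 2:length(hd6)) {
  for (i in 1:length(hd5$DATE)) {
    # rows matching this date, strike price and option type, nearest expiry first
    abcd <- dfnew1 %>%
      filter(Date == hd5$DATE[i], StrikePrice == hd6[j], OptionType == "CE") %>%
      arrange(dte)
    # copy the value in column 9 of the first matching row
    hd5[i, j] <- abcd[1, 9]
  }
}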

hd6 <- c(13900, 14000, 14100, 14200)

dfnew1 looks like this:

Date     expiry     optiontype strikeprice closeprice  dte
1/1/2019  31/1/2019  ce          13900      700        30
1/1/2019  31/1/2019  ce          14000      650        30
1/1/2019  31/1/2019  ce          14100      600        30
1/1/2019  31/2/2019  ce          14100      900        58
1/2/2019  31/1/2019  ce          13900      800        29
1/2/2019  31/1/2019  ce          14000      750        29
1/2/2019  31/1/2019  ce          14100      700        29

I want to fill my new data frame hd5 from dfnew1 by matching on the date, strike price and option type.

The filled hd5 should look like this:

Date         13900  14000 14100 14200
1/1/2019     700     650   600   550
1/2/2019     800     750   700   650

Answer 1:


Here's a tidyverse option:

library(dplyr)
# library(tidyr)
dat %>%
  group_by(Date, strikeprice) %>%
  summarize(closeprice = min(closeprice)) %>%
  ungroup() %>%
  tidyr::pivot_wider(names_from = "strikeprice", values_from = "closeprice")
# # A tibble: 2 x 4
#   Date     `13900` `14000` `14100`
#   <chr>      <int>   <int>   <int>
# 1 1/1/2019     700     650     600
# 2 1/2/2019     800     750     700

(You might see online tutorials referencing tidyr::spread. It does effectively the same thing here, but it has been superseded, along with tidyr::gather (source: https://tidyr.tidyverse.org/reference/spread.html), so new code should generally use the pivot_* functions.)

Note: based on your expected output, it looks like you took the minimum closeprice for these two rows:

1/1/2019  31/1/2019  ce          14100      600        30
1/1/2019  31/2/2019  ce          14100      900        58

I might be more inclined (when "price" is involved) to use sum instead, but it depends heavily on your actual intent and use. Replace min with your aggregation of choice, be it max, sum, or something else.
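
As a rough illustration of how the choice of aggregation matters, the sketch below applies three candidates to the two 14100 rows for 1/1/2019. It uses base R only. The names two_rows, price_min, price_nearest and price_sum are made up for this example, and taking the price at the smallest dte is an assumption about the intent, based on the arrange(dte) step in the question's loop.

```r
# The two rows that share Date 1/1/2019 and strikeprice 14100
two_rows <- data.frame(
  closeprice = c(600L, 900L),
  dte        = c(30L, 58L)
)

# Candidate aggregations (illustrative names, not from the original answer)
price_min     <- min(two_rows$closeprice)                      # smallest price
price_nearest <- two_rows$closeprice[which.min(two_rows$dte)]  # price at the nearest expiry
price_sum     <- sum(two_rows$closeprice)                      # total of both prices
```

In this sample the minimum and the nearest-expiry price are both 600, so either reading reproduces the expected output. On real data the two can differ.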

I'll note that having numeric column names is a little non-standard and can cause confusion: dat[,14100] will fail, while dat[,"14100"] or dat$`14100` should generally work.
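
A minimal base R sketch of that pitfall follows. The data frame wide is a hand-built stand-in for the pivoted result (an assumption for illustration, not the output of the code above), and by_name, by_dollar and by_position are hypothetical names.

```r
# Hand-built stand-in for the wide result; check.names = FALSE keeps the numeric names
wide <- data.frame(
  Date    = c("1/1/2019", "1/2/2019"),
  `13900` = c(700L, 800L),
  `14100` = c(600L, 700L),
  check.names = FALSE
)

# Selecting by quoted name, or with $ and backticks, returns the column
by_name   <- wide[, "14100"]
by_dollar <- wide$`14100`

# An unquoted number is read as a column position, so 14100 is out of range
by_position <- tryCatch(wide[, 14100], error = function(e) "error")
```

Here by_position ends up holding the string "error", because a plain data frame has no column at position 14100.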

You may find that having numeric column headers makes sense for some comparisons and for depicting a table, but if you plan on plotting things (e.g., using ggplot2), often a longer version (your original layout, summarizing notwithstanding) might be preferred.


Data:

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date     expiry     optiontype strikeprice closeprice  dte
1/1/2019  31/1/2019  ce          13900      700        30
1/1/2019  31/1/2019  ce          14000      650        30
1/1/2019  31/1/2019  ce          14100      600        30
1/1/2019  31/2/2019  ce          14100      900        58
1/2/2019  31/1/2019  ce          13900      800        29
1/2/2019  31/1/2019  ce          14000      750        29
1/2/2019  31/1/2019  ce          14100      700        29")



Answer 2:


We can also use spread after grouping by 'Date' and 'strikeprice' and keeping, within each group, the row with the smallest 'dte'

library(dplyr)
library(tidyr)
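dat %>%
  group_by(Date, strikeprice) %>%
  slice(which.min(dte)) %>%
  ungroup() %>%
  # keep only the columns that should appear in the wide result
  select(Date, strikeprice, closeprice) %>%
  spread(strikeprice, closeprice)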
# A tibble: 2 x 4
#  Date     `13900` `14000` `14100`
#  <chr>      <int>   <int>   <int>
#1 1/1/2019     700     650     600
#2 1/2/2019     800     750     700

Or use pivot_wider with values_fn to pass an aggregation function. Here, we first select only the columns of interest

dat %>%
  select(Date, strikeprice, closeprice) %>%     
  pivot_wider(names_from = strikeprice, values_from = closeprice,
       values_fn = list(closeprice = min))
# A tibble: 2 x 4   
#  Date     `13900` `14000` `14100`
#  <chr>      <int>   <int>   <int>
#1 1/1/2019     700     650     600
#2 1/2/2019     800     750     700

Another option is dcast from data.table

library(data.table)
dcast(setDT(dat), Date  ~ strikeprice, min, value.var = 'closeprice')
#       Date 13900 14000 14100
#1: 1/1/2019   700   650   600
#2: 1/2/2019   800   750   700
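
Since the question title asks about the apply family, here is a hedged base R sketch using tapply. It is not part of the original answers. It assumes the intent is the closeprice at the smallest dte for each Date and strikeprice, mirroring the arrange(dte) step in the question's loop, and the names dat_small, ord, wide and hd5_sketch are made up for illustration.

```r
# Same sample data as above, reduced to the columns this sketch needs
dat_small <- data.frame(
  Date        = c("1/1/2019", "1/1/2019", "1/1/2019", "1/1/2019",
                  "1/2/2019", "1/2/2019", "1/2/2019"),
  strikeprice = c(13900L, 14000L, 14100L, 14100L, 13900L, 14000L, 14100L),
  closeprice  = c(700L, 650L, 600L, 900L, 800L, 750L, 700L),
  dte         = c(30L, 30L, 30L, 58L, 29L, 29L, 29L)
)

# Sort by dte so the first row of each group is the nearest expiry
ord <- dat_small[order(dat_small$dte), ]

# tapply splits closeprice by Date and strikeprice and keeps the first value per cell
wide <- tapply(ord$closeprice, list(ord$Date, ord$strikeprice), function(x) x[1])

# Turn the matrix into a data frame shaped like the desired hd5
hd5_sketch <- data.frame(Date = rownames(wide), wide,
                         check.names = FALSE, row.names = NULL)
```

The result has one row per Date and one column per strike price that occurs in the data. Any Date and strike combination with no rows would come out as NA.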

data

dat <- structure(list(Date = c("1/1/2019", "1/1/2019", "1/1/2019", "1/1/2019", 
"1/2/2019", "1/2/2019", "1/2/2019"), expiry = c("31/1/2019", 
"31/1/2019", "31/1/2019", "31/2/2019", "31/1/2019", "31/1/2019", 
"31/1/2019"), optiontype = c("ce", "ce", "ce", "ce", "ce", "ce", 
"ce"), strikeprice = c(13900L, 14000L, 14100L, 14100L, 13900L, 
14000L, 14100L), closeprice = c(700L, 650L, 600L, 900L, 800L, 
750L, 700L), dte = c(30L, 30L, 30L, 58L, 29L, 29L, 29L)),
class = "data.frame", row.names = c(NA, 
-7L))


Source: https://stackoverflow.com/questions/61857346/how-to-use-apply-family-instead-of-nested-for-loop-for-my-problem
