Converting text to a data.frame based on headers

不知归路 2021-01-21 12:44

I uploaded a .txt file into R as follows: Election_Parties <- readr::read_lines("Election_Parties.txt"). Let's say the following te

1 answer
  • 2021-01-21 13:27

    If the separator between countries is always an empty line (""), read the text into a character vector, treat the empty strings as delimiters, and take cumsum() over them to assign every line a group number.

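    TXT <- readr::read_lines("Election_Parties.txt")
    # Prepend a separator so the first country also starts its own group
    TXT <- c("", TXT)
    # Each empty line increments the running count, giving one id per block
    idx <- cumsum(TXT == "")
    # If the separator lines are not exactly "" (e.g. stray whitespace),
    # use idx <- cumsum(!grepl("^[A-Z]", TXT)) instead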
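
    To see why cumsum() produces group ids, here is a minimal sketch on a toy vector. The sample lines and the names is_sep, is_sep_ws and idx_ws are made up for illustration and are not from the original file; the trimws() variant is one possible way to tolerate whitespace-only separator lines, assuming that is what goes wrong in your data.

    ```r
    # Hypothetical sample lines standing in for the file contents
    lines <- c("", "BOLIVIA", "P17-Party A", "P19-Party B",
               "", "COLOMBIA", "P1-Party C")

    # TRUE marks a separator line; cumsum() counts the separators seen so far,
    # so every line after the n-th separator gets group id n
    is_sep <- lines == ""
    idx <- cumsum(is_sep)
    print(idx)
    # [1] 1 1 1 1 2 2 2

    # Variant that also treats whitespace-only lines as separators
    is_sep_ws <- !nzchar(trimws(lines))
    idx_ws <- cumsum(is_sep_ws)
    ```

    On this toy input both versions give the same ids, since the separators are exactly "".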
    

    You can see that BOLIVIA and its parties fall into group 1, and COLOMBIA and its parties into group 2:

    tibble::tibble(TXT,idx)
    # A tibble: 10 x 2
       TXT                                                                       idx
       <chr>                                                                   <int>
     1 ""                                                                          1
     2 BOLIVIA                                                                     1
     3 "P17-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimie…     1
     4 P19-Liberty and Justice (Libertad y Justicia [LJ])                          1
     5 P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tup…     1
     6 ""                                                                          2
     7 COLOMBIA                                                                    2
     8 P1-Democratic Aliance M-19 (Alianza Democratica M-19 [AD-M19])              2
     9 P2-National Popular Alliance (Alianza Nacional Popular [ANAPO])             2
    10 P3-Indigenous Authorities of Colombia (Autoridades Indígenas de Colomb…     2
    

    Then apply a function to each group and bind the results into a single data frame:

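    # x is one group: x[1] is the separator, x[2] the country, the rest are parties
    func <- function(x) {
      data.frame(Country = x[2], Parties = x[3:length(x)])
    }
    # by() splits TXT by idx and applies func; rbind stacks the pieces
    do.call(rbind, by(TXT, idx, func))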
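
    If you want to try the whole pipeline without the file, the following is a self-contained sketch under some assumptions: the sample vector is made up, and it swaps by() for split() plus lapply() after dropping the separator lines, so the country becomes the first element of each group. The names keep, groups, to_df and result are illustrative only.

    ```r
    # Hypothetical sample standing in for readr::read_lines("Election_Parties.txt")
    TXT <- c("BOLIVIA", "P17-Party A", "P19-Party B",
             "", "COLOMBIA", "P1-Party C", "P2-Party D")

    # Prepend a separator so the first country also starts a group
    TXT <- c("", TXT)
    idx <- cumsum(TXT == "")

    # Drop the separator lines, then split the remaining lines by group id
    keep <- TXT != ""
    groups <- split(TXT[keep], idx[keep])

    # First element of each group is the country; the rest are its parties
    to_df <- function(x) {
      data.frame(Country = x[1], Parties = x[-1], stringsAsFactors = FALSE)
    }

    # Stack the per-country data frames and reset the row names
    result <- do.call(rbind, lapply(groups, to_df))
    rownames(result) <- NULL
    print(result)
    ```

    This sketch assumes every country is followed by at least one party line; a country with no parties would need separate handling.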
    