Question
I've been asked to create a data frame in R using information copied from a website; the data is not contained in a file. The full data list is at:
https://www.npr.org/2012/12/07/166400760/hollywood-heights-the-ups-downs-and-in-betweens
Here is a portion of the data:
Leading Men (Average American male: 5 feet 9.5 inches)
Dolph Lundgren — 6 feet 5 inches
John Cleese — 6 feet 5 inches
Michael Clarke Duncan — 6 feet 5 inches
Vince Vaughn — 6 feet 5 inches
Clint Eastwood — 6 feet 4 inches
Jimmy Stewart — 6 feet 3 inches
Bill Murray — 6 feet 1.5 inches
Leading Ladies (Average American female: 5 feet 4 inches)
Uma Thurman — 6 feet 0 inches
Brooke Shields — 6 feet 0 inches
Jane Lynch — 6 feet 0 inches
I am supposed to use R to create the data frame, where one column is Name, the second is Height (in cm), and the third column is Gender.
I have copied and pasted all the data into Notepad, manually split it into three columns, and converted the heights to cm by hand. But that amounts to building the data frame manually.
Is there a way to make a data frame in R using the data as given?
Answer 1:
You can copy the whole list and then use read.delim to read the text on your clipboard into R. Then, using regex, you can extract the gender from the header of each section, fill it down to the rows below, and finally separate the first column into name and height. See below:
# read the copied text from the clipboard, one line per row in column V1
# ("clipboard" works on Windows; on macOS, pipe("pbpaste") is the usual substitute)
web.lines <- read.delim("clipboard", header = FALSE)
library(tidyverse)
web.lines %>%
  mutate(gender = str_extract(V1, "Leading\\s+\\b(\\w+)\\b")) %>% # extracting gender from headers
  fill(gender, .direction = "down") %>% # filling the gender for all rows
  group_by(gender) %>%
  slice(-1) %>% # removing the headers (first row of each group)
  separate(V1, into = c("Name", "Height"), sep = " — ") # separating name and height
#> # A tibble: 59 x 3
#> # Groups:   gender [2]
#>    Name                  Height             gender
#>    <chr>                 <chr>              <chr>
#>  1 Uma Thurman           6 feet 0 inches    Leading Ladies
#>  2 Brooke Shields        6 feet 0 inches    Leading Ladies
#>  3 Jane Lynch            6 feet 0 inches    Leading Ladies
#>  4 Nicole Kidman         5 feet 11 inches   Leading Ladies
#>  5 Tilda Swinton         5 feet 10.5 inches Leading Ladies
#> ...
#> 28 Dolph Lundgren        6 feet 5 inches    Leading Men
#> 29 John Cleese           6 feet 5 inches    Leading Men
#> 30 Michael Clarke Duncan 6 feet 5 inches    Leading Men
#> 31 Vince Vaughn          6 feet 5 inches    Leading Men
#> 32 Clint Eastwood        6 feet 4 inches    Leading Men
#> ...
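The pipeline above leaves Height as text and the gender column as "Leading Men" / "Leading Ladies", while the question asks for heights in cm and a Gender column. Below is a minimal base R sketch of that remaining step. It runs on a few lines taken from the question rather than the clipboard, so it is self-contained. The object names (raw, heights) and the "Male" / "Female" labels are my own choices for illustration, not part of the original answer, and the code assumes every data line has the form "Name — F feet I inches".

```r
# A few lines copied from the question; the full list could be pasted the same way.
raw <- c(
  "Leading Men (Average American male: 5 feet 9.5 inches)",
  "Dolph Lundgren — 6 feet 5 inches",
  "Bill Murray — 6 feet 1.5 inches",
  "Leading Ladies (Average American female: 5 feet 4 inches)",
  "Uma Thurman — 6 feet 0 inches"
)

# Header lines start with "Leading"; each one opens a new section.
is_header <- grepl("^Leading", raw)
section   <- cumsum(is_header)   # 1 for rows in the first section, 2 for the second

# Take the word after "Leading" ("Men" or "Ladies") and map it to a label.
# The "Male"/"Female" wording is an assumption, not something the source prescribes.
label  <- sub("^Leading +([A-Za-z]+).*$", "\\1", raw[is_header])
gender <- ifelse(label == "Men", "Male", "Female")

# Split the remaining lines into name and height text at the dash.
body  <- raw[!is_header]
parts <- strsplit(body, " — ", fixed = TRUE)
height_txt <- vapply(parts, `[`, character(1), 2)

# Pull the two numbers out of e.g. "6 feet 1.5 inches".
feet   <- as.numeric(sub("^([0-9]+) feet.*$", "\\1", height_txt))
inches <- as.numeric(sub("^.* feet ([0-9.]+) inches$", "\\1", height_txt))

heights <- data.frame(
  Name   = vapply(parts, `[`, character(1), 1),
  Height = round((feet * 12 + inches) * 2.54, 1),  # 1 inch = 2.54 cm
  Gender = gender[section[!is_header]],
  stringsAsFactors = FALSE
)

print(heights)
```

To use this on the full list, raw could be replaced with the V1 column read from the clipboard (for example as.character(web.lines$V1)); the rest should work unchanged as long as the lines keep the same format.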
Source: https://stackoverflow.com/questions/64376566/creating-a-dataframe-with-text-from-a-website