Question
I am trying to log in to Stack Overflow and then use the search bar to search for the tidyverse package.
The main problem is the URL: the page I get back does not give me the form to fill in with my email and password.
Setting url <- "https://stackoverflow.com" doesn't work. I also tried
url <- "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"
which is the URL I land on when I click the Log in button, but with html_form I still can't find the form to fill in with my email and password. This is my code:
library(rvest)
url <- "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"
(session <- html_session(url))
(form <- html_form(read_html(url))[[1]])
(filled_form <- set_values(form, email = "myemail@gmail.com", pass = "mypassword"))
(form_submitted <- submit_form(session, filled_form))
(submitted_url <- form_submitted$url)
after_filled_html <- jump_to(session, submitted_url)
After that, I would like to search for the tag [tidyverse] and start scraping the results.
I think I can manage that second part once the login form problem in the code above is fixed.
Any help is appreciated.
Answer 1:
You can put the search term directly in the URL, with no need to log in to Stack Overflow:
library(rvest)

# Return the titles and links of the newest questions carrying the given tag.
getStackQuestions <- function(search) {
  stackoverflow <- read_html(paste0('https://stackoverflow.com/questions/tagged/', search, '?tab=Newest'))
  # Question title links on the listing page
  questions <- stackoverflow %>% html_nodes(".question-hyperlink:not(.mb0)")
  question.href <- questions %>% html_attr('href')
  question.text <- questions %>% html_text()
  # The hrefs are site-relative, so prefix the host
  questions <- data.frame(text = question.text, href = paste0("https://stackoverflow.com", question.href))
  questions
}

tidyverse_questions <- getStackQuestions('tidyverse')
head(tidyverse_questions$text)
[1] "Python/Pandas equivalent of across and weighted average"
[2] "Transforming columns based off separate dataframe - R solution"
[3] "Group by summarize in between dates with dplyr"
[4] "Transpose complex data.frame with tidyR"
[5] "Create 1 composite variable derived from different combinations of values of 2nd variable that are separated by specific levels of 3rd variable"
[6] "extracting a cv.glmnet object from Tune_results"
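
As a rough sketch, the same idea can be split into a URL builder and a parser, so the parsing step can be tried on a small inline snippet without touching the network. The helper names build_tag_url and extract_questions and the sample markup are made up for illustration. The selector is the one used above and is assumed to match the listing page's markup, which may have changed since this was written.

```r
library(rvest)

# Hypothetical helper: build the tag-listing URL, percent-encoding the tag
# so that tags such as "c++" remain valid in the URL path.
build_tag_url <- function(tag) {
  paste0("https://stackoverflow.com/questions/tagged/",
         utils::URLencode(tag, reserved = TRUE),
         "?tab=Newest")
}

# Hypothetical helper: pull titles and absolute links out of a parsed page.
extract_questions <- function(page) {
  links <- html_nodes(page, ".question-hyperlink:not(.mb0)")
  href <- html_attr(links, "href")
  # paste0() returns a length-1 result for zero-length input, so guard the
  # "no matches" case to keep both columns the same length.
  full_href <- if (length(href) > 0) {
    paste0("https://stackoverflow.com", href)
  } else {
    character(0)
  }
  data.frame(
    text = html_text(links, trim = TRUE),
    href = full_href,
    stringsAsFactors = FALSE
  )
}

# Made-up markup that imitates the structure the selector expects.
sample_page <- read_html('
  <html><body>
    <a class="question-hyperlink" href="/questions/1/first">First question</a>
    <a class="question-hyperlink mb0" href="/questions/2/skipped">Skipped</a>
    <a class="question-hyperlink" href="/questions/3/third">Third question</a>
  </body></html>')

sample_questions <- extract_questions(sample_page)
```

Against the live site this would be called as extract_questions(read_html(build_tag_url("tidyverse"))). If the result has zero rows, the selector probably no longer matches the page and needs to be checked in the browser's inspector.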
Source: https://stackoverflow.com/questions/65864142/navigating-and-scraping-with-r-rvest