Check whether a website provides photo or video based on a pattern in its URL

…衆ロ難τιáo~ submitted on 2021-02-17 05:15:09

Question


I am wondering how I can figure out whether a website provides a photo or a video by checking its URL. I investigated the website I am interested in and found that most of the links I have are in this form (I am not sure whether I can actually name the website, so for now I have written it as an example):

http://www.example.com/abcdef

where example is the main domain and abcdef is a number such as 69964. The interesting pattern I found is that after I enter this URL, if the page actually has a video, the URL automatically changes to https://www.example.com/abcdef#mode=tour, whereas if it only has a photo, it changes to https://www.example.com/abcdef#mode=0

Now I have a list of URLs from this website, and I just want to check whether each one has a photo or a video, or is not working (invalid URL). Is there any way to do that?
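If you can capture the final URL a browser ends up on (for example with a headless browser, which is an assumption and not something covered in this thread), classifying it is plain string matching. A simple HTTP request is unlikely to show the change, because URL fragments such as #mode=tour are never sent to the server. `mode_from_final_url` is a hypothetical helper, not part of any package; it only knows the two values described above.

```r
# Hypothetical helper: classify a final URL by its "#mode=" fragment.
mode_from_final_url <- function(url) {
  # Value after "#mode=" (letters/digits only); NA when there is no such fragment
  value <- ifelse(grepl("#mode=", url, fixed = TRUE),
                  sub(".*#mode=([A-Za-z0-9]*).*", "\\1", url),
                  NA_character_)
  # Known values from the question: tour -> video, 0 -> picture
  labels <- c(tour = "video", "0" = "picture")
  out <- unname(labels[value])
  # Anything missing or unrecognised is treated as invalid
  out[is.na(out)] <- "invalid"
  out
}

mode_from_final_url(c("https://www.example.com/69964#mode=tour",
                      "https://www.example.com/93313#mode=0",
                      "https://www.example.com/abcdef"))
# returns c("video", "picture", "invalid")
```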


Answer 1:


So I have a rather simple solution for this.

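Inspecting the URLs provided by the OP (e.g., https://www.pixilink.com/93313) shows that the default #mode= value comes from the variable initial_mode, which is assigned in a JavaScript block embedded in the page. Since URL fragments are never sent to the server and this one is chosen by the page's script, simply requesting the URL will not reveal the change. Instead, whether a URL defaults to "picture" (#mode=0) or "video" (#mode=tour) can be established by reading the value assigned to this variable in the page source.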
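A slightly more defensive way to read that value is to extract whatever is assigned to initial_mode, rather than searching the whole line for any "0" or "tour". This is a sketch: `extract_initial_mode` is a hypothetical helper, and it assumes the assignment looks like `initial_mode = 0` or `initial_mode = "tour"` (optionally quoted), which has not been checked against the live page.

```r
# Hypothetical helper (not from any package): return the value assigned to
# initial_mode in a page's source lines, or NA if there is no assignment.
# Assumes a form like  initial_mode = 0  or  initial_mode = "tour".
extract_initial_mode <- function(lines) {
  hits <- grep("initial_mode[[:space:]]*=", lines, value = TRUE)
  if (length(hits) == 0) return(NA_character_)
  # Capture the first run of letters/digits after "=", skipping an optional quote
  sub(".*initial_mode[[:space:]]*=[[:space:]]*['\"]?([A-Za-z0-9]+).*", "\\1", hits[1])
}

extract_initial_mode(c("<script>", "var initial_mode = \"tour\";", "</script>"))
# [1] "tour"
extract_initial_mode("var initial_mode = 0;")
# [1] "0"
extract_initial_mode("<p>no script here</p>")
# [1] NA
```

On a live page you would pass it the downloaded source, e.g. `extract_initial_mode(readLines(myurl, warn = FALSE))`, where `myurl` is any of your URLs.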

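# Function to get the value of initial_mode from the page behind a URL
urlmode <- function(x){
  # A URL that cannot be read (e.g. 404 or no connection) yields NULL
  mycontent <- tryCatch(readLines(x), error = function(e) NULL)
  mypos <- grep("initial_mode = ", mycontent)
  # Inspect the first match only; "" when the page has no initial_mode
  myline <- if (length(mypos) > 0) mycontent[mypos[1]] else ""

  # Match the value right after "initial_mode = " (optionally quoted),
  # not just any "0" or "tour" elsewhere on the line
  if(grepl("initial_mode = ['\"]?0", myline)){
    cat("\n", x, "has default initial_mode picture: #mode=0 \n")
    return("picture")
  } else if(grepl("initial_mode = ['\"]?tour", myline)){
    cat("\n", x, "has default initial_mode video: #mode=tour \n")
    return("video")
  } else{
    cat("\n", x, "is an invalid URL. \n")
    return("invalid")
  }
}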


#Example URLs to demonstrate functionality
myurl1 <- "https://www.pixilink.com/93313"
myurl2 <- "https://www.pixilink.com/69964"


urlmode(myurl1)
#
# https://www.pixilink.com/93313 has default initial_mode picture: #mode=0 
#[1] "picture"
#Warning message:
#In readLines(x) :
#  incomplete final line found on 'https://www.pixilink.com/93313'
#

urlmode(myurl2)
#
# https://www.pixilink.com/69964 has default initial_mode video: #mode=tour 
#[1] "video"
#Warning message:
#In readLines(x) :
#  incomplete final line found on 'https://www.pixilink.com/69964'

Needless to say, this is an extremely simplistic function that will most likely fail on all but the ideal (sub)set of cases. But it's a start.
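Since the OP has a whole list of URLs, one way to run a classifier such as `urlmode()` over all of them is a small wrapper. `classify_urls` and its `pause` argument are hypothetical, not from any package; the per-request delay and the choice to record any error as "invalid" are assumptions about what is reasonable here.

```r
# Hypothetical wrapper: run a classifier such as urlmode() over many URLs
# and collect the results in a data frame. Any error is recorded as "invalid".
classify_urls <- function(urls, classifier, pause = 1) {
  modes <- vapply(urls, function(u) {
    Sys.sleep(pause)  # wait between requests to go easy on the server
    tryCatch(classifier(u), error = function(e) "invalid")
  }, character(1), USE.NAMES = FALSE)
  data.frame(url = urls, mode = modes, stringsAsFactors = FALSE)
}

# Usage with urlmode() (makes live requests, so results depend on the site):
# results <- classify_urls(c(myurl1, myurl2), urlmode)
```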



Source: https://stackoverflow.com/questions/64597035/check-whether-a-website-provides-photo-or-video-based-on-a-pattern-in-its-url
