How to Extract Instagram Data

傲寒 2021-02-01 11:12

I'm attempting to construct a Microsoft Access database of Instagram accounts, and I want to extract the following data, among other things:

  • Account name
  • N
4 Answers
  •  面向向阳花
    2021-02-01 11:36

    You should definitely check out Instagram's API, which can provide all the public information you want to scrape. You'll just need to write a script that makes the proper API calls (an example is provided below).

    From Instagram's website:

    We do our best to have all our URLs be RESTful. Every endpoint (URL) may support one of four different http verbs. GET requests fetch information about an object, POST requests create objects, PUT requests update objects, and finally DELETE requests will delete objects.
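As a rough illustration of the four verbs in the quoted passage, the sketch below builds (but never sends) one request per verb using Python's standard library. The URL `https://api.example.com/items/1` and the names `RESOURCE_URL`, `VERBS`, and `requests_by_verb` are hypothetical and are not taken from Instagram's documentation.

```python
from urllib.request import Request

# Hypothetical resource URL, used only to illustrate the four verbs.
RESOURCE_URL = "https://api.example.com/items/1"

# Verb -> what the quoted passage says it does.
VERBS = {
    "GET": "fetch information about an object",
    "POST": "create an object",
    "PUT": "update an object",
    "DELETE": "delete an object",
}

# Build (but do not send) one request object per verb.
requests_by_verb = {verb: Request(RESOURCE_URL, method=verb) for verb in VERBS}

for verb, req in requests_by_verb.items():
    print("{0} {1} -> {2}".format(req.get_method(), req.full_url, VERBS[verb]))
```

For pulling account data you would normally only need GET; the other three verbs modify data on the server.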

    You'll just need to have the ACCESS-TOKEN value for the relevant account ready when you use the URL in your code, and be able to unpack the JSON that Instagram returns for each GET request. If a piece of data isn't directly available, you can usually derive it indirectly. The fields in question are:

      • Account name
      • Number of followers
      • Number of people followed
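A minimal sketch of those two steps, putting the access token into the URL and unpacking the returned JSON, might look like the following. The base URL `https://api.instagram.com/v1`, the `users/self` path, and the field names `username`, `counts.followed_by`, and `counts.follows` are assumptions modeled on the legacy API that the link in this answer points to, so check them against the current documentation before relying on them. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: base URL of the legacy Instagram API referenced in the answer.
API_BASE = "https://api.instagram.com/v1"


def build_user_url(access_token, user_id="self"):
    """Build the GET URL for a user's profile, with the token as a query parameter."""
    query = urlencode({"access_token": access_token})
    return "{0}/users/{1}/?{2}".format(API_BASE, user_id, query)


def extract_account_fields(body):
    """Pull the three requested fields out of a JSON response body (str or bytes)."""
    data = json.loads(body)["data"]
    counts = data.get("counts", {})
    return {
        "account_name": data.get("username"),    # assumed field name
        "followers": counts.get("followed_by"),  # assumed field name
        "following": counts.get("follows"),      # assumed field name
    }


# Made-up sample payload for illustration; not real account data.
sample_body = (
    '{"data": {"username": "example_user", '
    '"counts": {"followed_by": 120, "follows": 45}}}'
)

print(build_user_url("ACCESS-TOKEN"))
print(extract_account_fields(sample_body))
```

Returning a flat dict per account keeps each result in a shape that is easy to write out as one row, for example to a CSV file that Access can import.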

    Here's a great starting point: https://www.instagram.com/developer/endpoints/users/#get_users

    And here's how you would make a call to an API in Python:

    # Python 3
    # RestfulClient.py

    import getpass

    import requests
    from requests.auth import HTTPDigestAuth

    # Replace with the correct URL
    url = "http://api_url"

    # It is good practice not to hardcode credentials, so ask for them at runtime
    username = input("Username: ")
    password = getpass.getpass("Password: ")
    myResponse = requests.get(url, auth=HTTPDigestAuth(username, password), verify=True)
    # print(myResponse.status_code)

    # For a successful API call the status code is 200 (OK); .ok is True for any status below 400
    if myResponse.ok:
        # Parse the JSON body into a Python data structure (dict or list, depending on the JSON)
        jData = myResponse.json()

        print("The response contains {0} properties".format(len(jData)))
        print("\n")
        # This loop assumes the top-level JSON value is an object (dict)
        for key in jData:
            print("{0} : {1}".format(key, jData[key]))
    else:
        # If the response is not OK, raise an HTTPError carrying the status code and reason
        myResponse.raise_for_status()
    
