Webscraping Instagram follower count BeautifulSoup


暖寄归人 2021-01-18 21:12

I'm just starting to learn how to web scrape using BeautifulSoup and want to write a simple program that will get the follower count for a given Instagram

5 Answers

走了就别回头了 (original poster)

2021-01-18 22:01
Instagram embeds profile metadata as JSON in its responses, so it is usually cleaner to read the metadata from that JSON than to parse the HTML response with BeautifulSoup. Provided that using BeautifulSoup is not a hard requirement, there are at least two clean options to get the follower count of an Instagram profile:
1. Fetch the profile page, locate the embedded JSON, and parse it:
```
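import json
import re
import requests

PROFILE = 'instagram'  # placeholder: the username to look up

response = requests.get('https://www.instagram.com/' + PROFILE)
# The profile page embeds its metadata as JSON assigned to window._sharedData.
json_match = re.search(r'window\._sharedData = (.*);', response.text)
if json_match is None:
    raise RuntimeError('window._sharedData not found in the response')
profile_json = json.loads(json_match.group(1))['entry_data']['ProfilePage'][0]['graphql']['user']

print(profile_json['edge_followed_by']['count'])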
```
  The profile_json variable then contains all of the profile's metadata, not only the follower count.
2. Use a library, which makes changes in Instagram's responses the upstream maintainers' problem. One such library is Instaloader, which can be used like this:
```
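from instaloader import Instaloader, Profile

PROFILE = 'instagram'  # placeholder: the username to look up

L = Instaloader()
# Look up the profile through the Instaloader context.
profile = Profile.from_username(L.context, PROFILE)

print(profile.followers)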
```
  It also supports logging in, which allows access to private profiles as well.
  
  (Disclaimer: I am the author of this tool.)
Either way, you obtain a structure containing the profile's metadata without having to do strange things to the HTML response.
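The extraction step of option 1 can also be wrapped in a small function and tried offline. The following is only a sketch: the function name `follower_count_from_html` and the sample page are made up for illustration, and it assumes the page still exposes `window._sharedData` with the same key path as in option 1. Instead of a greedy regex it uses `json.JSONDecoder().raw_decode`, which parses a single JSON value and ignores whatever follows it.

```python
import json
import re

# Matches the assignment prefix; the JSON value starts right after it.
MARKER = re.compile(r'window\._sharedData\s*=\s*')


def follower_count_from_html(html):
    """Return the follower count embedded in a profile page, or None.

    Hypothetical helper: assumes the key path used in option 1.
    """
    match = MARKER.search(html)
    if match is None:
        return None  # marker missing, e.g. the page layout changed
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores the trailing ';</script>...' text.
        data, _ = json.JSONDecoder().raw_decode(html, match.end())
        user = data['entry_data']['ProfilePage'][0]['graphql']['user']
        return user['edge_followed_by']['count']
    except (ValueError, KeyError, IndexError, TypeError):
        return None  # malformed JSON or an unexpected structure


# Synthetic page for illustration only; not a real Instagram response.
sample = {'entry_data': {'ProfilePage': [
    {'graphql': {'user': {'edge_followed_by': {'count': 1234}}}}
]}}
sample_html = '<script>window._sharedData = ' + json.dumps(sample) + ';</script>'

print(follower_count_from_html(sample_html))  # prints 1234
```

Returning `None` rather than raising keeps the caller simple when the marker is missing or the structure differs from what is assumed here.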