Scraping All Comments on a Bilibili Video with Python

wshinng

Last modified: 2023-09-11 16:13:26

License: CC 4.0 BY-SA

Category: Python crawlers. Tags: python, crawler

First published: 2021-09-06 21:22:54

Article link: https://blog.csdn.net/github_39611284/article/details/120144169

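This article shows how to use Python 3 to scrape all comments on a Bilibili video and save them to a CSV file. Given the video's Bvid, the program walks through every page of comments and stores them. Note that the reported total may count both comments and replies, while this code fetches only the comments.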

Scraping All Comments on a Bilibili Video with Python 3

- 1. Why this article exists
- 2. The code

1. Why this article exists

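A friend recently asked me to scrape the comments under a Bilibili video as data for his analysis. The tutorials and articles I found online did not fetch all of the comments, which did not meet his needs, so I wrote my own. I am sharing the code here for others to study and discuss.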

2. The code

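Given a Bilibili video's Bvid, the script saves all comments under the video to a CSV file. (Note: the output shows a total comment count greater than the current comment count. This is not a bug: the total counts both comments and replies, while the script fetches only comments.)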
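To make the note about the two counts concrete, here is a minimal offline sketch. It assumes that each top-level comment carries a count of the replies nested under it; the field name `rcount`, the helper `count_breakdown`, and all sample values are assumptions made for illustration and are not taken from the script.

```python
# Hypothetical page of top-level comments. 'rcount' (number of replies
# nested under a comment) is an assumed field name; values are made up.
sample_replies = [
    {'content': {'message': 'a'}, 'rcount': 3},
    {'content': {'message': 'b'}, 'rcount': 0},
    {'content': {'message': 'c'}, 'rcount': 5},
]

def count_breakdown(replies):
    """Return (top_level, nested, total) for a list of top-level comments."""
    top_level = len(replies)
    nested = sum(r.get('rcount', 0) for r in replies)
    return top_level, nested, top_level + nested

top_level, nested, total = count_breakdown(sample_replies)
print('rows the script would write:', top_level)
print('total a combined counter would report:', total)
```

Under these assumptions the script would write 3 rows while a combined counter would report 11, which is the kind of gap described above. The complete script follows.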

import requests
import re
import time
import csv

# Request headers
header={'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        }

# Comment API endpoint; the placeholders are filled with the page number and the oid
original_url = 'https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&next={}&type=1&oid={}&mode=3'

# Convert a Unix timestamp to a date string (local time)
def get_time(ctime):
    timeArray = time.localtime(ctime)
    otherStyleTime = time.strftime("%Y.%m.%d", timeArray)
    return str(otherStyleTime)

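# Look up the video's aid (passed to the API as oid) in the video page source
def get_oid(bvid):
    video_url = 'https://www.bilibili.com/video/' + bvid
    page = requests.get(video_url, headers=header).text
    match = re.search(r'"aid":([0-9]+)', page)
    if match is None:
        # Fail with a clear message instead of an AttributeError on None
        raise ValueError('Could not find an aid for ' + bvid)
    return match.group(1)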

# Fetch comments page by page, writing each page to the CSV file as it arrives
def online_save(Bvid):
    all_count = 0
    count = None
    oid = get_oid(Bvid)
    page = 0
    fname = Bvid + '_comments.csv'
    # utf_8_sig writes a BOM so that spreadsheet software detects the encoding
    with open(fname, 'w', newline='', encoding='utf_8_sig') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(["Time", "Likes", "Comment"])
        while count is None or all_count < count:
            page += 1
            url = original_url.format(page, oid)
            try:
                data = requests.get(url, headers=header, timeout=10).json()
                if count is None:
                    # The reported total includes replies as well as comments
                    count = int(data['data']['cursor']['all_count'])
                replies = data['data']['replies']
            except (requests.RequestException, ValueError, KeyError, TypeError):
                # Network failure, non-JSON body, or unexpected response shape
                print('Request failed on page {}, stopping.'.format(page))
                break
            if not replies:
                # An empty page means there are no more comments to fetch
                break
            for i in replies:
                # Remove all whitespace, including line breaks, from the text
                message = re.sub(r'\s+', '', i['content']['message'])
                ctime = get_time(i['ctime'])
                like = i['like']
                csv_writer.writerow([ctime, str(like), message])
                all_count = all_count + 1
            print('Total comments: {}, fetched so far: {}, page {} done.'.format(count, all_count, page))
            # Pause between requests to avoid hammering the server
            time.sleep(5)

if __name__ == '__main__':
    Bvid = input('Enter the video Bvid: ')
    online_save(Bvid)
    print('Done!')
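
The two parsing steps in the script, pulling the aid out of the page source and turning one comment into a CSV row, can be checked without touching the network. The sketch below is only an illustration under stated assumptions: `extract_aid`, `reply_to_row`, `SAMPLE_HTML`, and `SAMPLE_REPLY` are hypothetical names, and the sample data is made up to mirror the fields the script reads (`ctime`, `like`, `content.message`).

```python
import re
import time

# Hypothetical sample data shaped like what the script reads;
# every value here is made up for illustration.
SAMPLE_HTML = '<script>window.__STATE__={"aid":123456,"bvid":"BVxxxxxxxxxx"}</script>'
SAMPLE_REPLY = {
    'ctime': 1630934574,
    'like': 12,
    'content': {'message': 'first line\nsecond  line'},
}

def extract_aid(page):
    """Return the digits that follow "aid": in the page source, or None."""
    match = re.search(r'"aid":([0-9]+)', page)
    return match.group(1) if match else None

def reply_to_row(reply):
    """Build one CSV row [date, likes, message] from a comment dict."""
    # The date is rendered in local time, as in the script's get_time
    date = time.strftime('%Y.%m.%d', time.localtime(reply['ctime']))
    # Remove all whitespace, including line breaks, from the message
    message = re.sub(r'\s+', '', reply['content']['message'])
    return [date, str(reply['like']), message]

print(extract_aid(SAMPLE_HTML))
print(reply_to_row(SAMPLE_REPLY))
```

Keeping these steps as small pure functions makes it easier to notice when the page markup or the response shape changes, since a failing check points at the parsing rather than at the network.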