Using a Python crawler to increase page views

Latest recommended article published 2022-11-13 13:24:42

小由之


Licensed under CC 4.0 BY-SA

Tags: crawler

Article link: https://blog.csdn.net/weixin_42486139/article/details/102687538

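This article describes how to refresh a web page automatically with the Selenium library and how to use several IP proxies to get around the limit on frequent visits from the same IP, in order to raise a blog's view count. It also shares the technical details of checking whether an IP is usable.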


This is purely casual tinkering; if anything here can be improved, suggestions are welcome~

Version 1

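Just a few simple lines of code. Repeated refreshes from the same IP within one minute do not count, so the interval is set to a little over 60 seconds. This is slow: the view count goes up only once every minute or so.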

Prerequisite: install the selenium library

Version 1 code: (for reference only; not recommended)

import random
import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
# driver.implicitly_wait(6)

driver.get("https://blog.csdn.net/weixin_42486139/article/details/102687538")  # blog post URL

for i in range(1000):
    print('Refresh #%d' % i)
    try:
        driver.refresh()  # reload the current page
        print('test pass: refresh successful')
    except Exception as e:
        print("Exception found", format(e))
        driver.quit()
        break  # the browser is closed, so further refreshes would fail
    time.sleep(60 + random.randint(1, 10))  # wait a little over a minute between refreshes
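
To put "slow" into numbers, here is a small back-of-the-envelope sketch. The function name `expected_total_hours` and its parameters are made up for this illustration; it only averages the sleep interval used above and ignores page load time.

```python
def expected_total_hours(refreshes, base_delay=60, jitter_low=1, jitter_high=10):
    """Rough total running time, in hours, for a loop that sleeps
    base_delay + randint(jitter_low, jitter_high) seconds per refresh."""
    mean_jitter = (jitter_low + jitter_high) / 2  # average of a uniform integer range
    return refreshes * (base_delay + mean_jitter) / 3600


hours = expected_total_hours(1000)
print("1000 refreshes take about %.1f hours" % hours)
```

With the values from the version 1 loop, the 1000 refreshes need roughly 18 hours of sleeping alone.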

Version 2

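In that case, what about using several IPs? Plenty of other blog posts cover this, all based on the urllib library. For reasons I have not figured out, urllib can fetch the page content for me, but the view count does not go up, so I kept working with selenium.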

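Free IP proxy site: https://www.xicidaili.com/ Pick a few usable IPs from there (not every IP works). If you do not know how to find usable IPs, scroll to the end of the article, where the code for checking whether an IP is usable is attached.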

Version 2 code: (10 IPs are configured, so one visit can be made every 6 s)

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 22 17:01:54 2019

@author: mandy
"""
import time
from selenium import webdriver

proxys = ["--proxy-server=http://221.178.232.130:8080",
          "--proxy-server=http://61.131.160.177:9006",
          "--proxy-server=http://122.194.209.187:61234",
          "--proxy-server=http://59.37.18.243:3128",
          "--proxy-server=http://218.64.69.79:8080",
          "--proxy-server=http://222.90.110.194:8080",
          "--proxy-server=http://114.249.230.208:8000",
          "--proxy-server=http://222.184.59.8:808",
          "--proxy-server=http://27.128.187.22:3128",
          "--proxy-server=http://113.109.249.32:808",
          ]


def make_options(proxy):
    # Build fresh options for every launch so proxy flags do not pile up
    option = webdriver.ChromeOptions()
    # Run selenium in headless mode to avoid a browser window opening and closing each time
    option.add_argument('--headless')
    option.add_argument('--no-sandbox')
    option.add_argument('--start-maximized')
    option.add_argument(proxy)
    return option


j = 0
for i in range(1000):
    for proxy in proxys:
        driver = None
        try:    # try..except keeps the loop running when a TimeoutException is raised
            driver = webdriver.Chrome(options=make_options(proxy))
            driver.get("https://blog.csdn.net/weixin_42486139/article/details/102687538")  # blog post URL
            j += 1
            print('Visit #%d' % j)
            time.sleep(6)
        except Exception as e:
            print(e)
        finally:
            if driver is not None:
                driver.quit()  # always close the browser, even after an error
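
The 6 s figure follows from the one-minute rule mentioned for version 1: with 10 proxies visited in turn, each one is reused only about every 60 s. A minimal sketch of that arithmetic, assuming the per-IP window really is 60 s (the helper name `min_pause_seconds` is hypothetical and browser start-up time is ignored):

```python
import math


def min_pause_seconds(num_proxies, per_ip_window=60):
    """Smallest whole-second pause between visits so that, when cycling
    through num_proxies in turn, no proxy is reused within per_ip_window."""
    return math.ceil(per_ip_window / num_proxies)


print(min_pause_seconds(10))  # 10 proxies, as in the list above
print(min_pause_seconds(16))  # 16 proxies, as in the checker list at the end
```

More working proxies allow a shorter pause, but a proxy that fails simply wastes its slot, which is why checking the list first matters.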

Although headless mode is set, every call to webdriver.Chrome() still pops up a DOS console window.

Fix: edit line 76 of service.py in the selenium package, changing it to:
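
The replacement line itself is not preserved in this text. As a hedged illustration only: one general way to stop a child process from opening a console window on Windows is the `creationflags` argument of `subprocess.Popen`. Whether this is exactly what the author changed in service.py is an assumption, and the helper name below is made up for the sketch.

```python
import subprocess
import sys


def popen_kwargs_without_console():
    """Extra Popen keyword arguments that suppress the console window.
    The CREATE_NO_WINDOW flag only exists on Windows; elsewhere nothing is needed."""
    if sys.platform == "win32":
        return {"creationflags": subprocess.CREATE_NO_WINDOW}
    return {}


# Demonstration with a harmless child process instead of chromedriver
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    **popen_kwargs_without_console()
)
out, _ = proc.communicate()
print(out.strip())
```

Editing an installed package in place is lost on every upgrade, so the change has to be reapplied after updating selenium.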

OK, all done. The script now refreshes quietly in the background and does not get in the way of whatever else you are doing.

Appendix

Code for checking whether an IP is usable (usable IP addresses and port numbers are saved to the generated IP.txt file):

# -*- coding: utf-8 -*-
"""
Created on Mon Nov  4 16:17:01 2019

@author: mandy
"""
import requests

# Put the collected IP addresses into proxys
proxys = [
        {'https':'221.178.232.130:8080'},
        {'https':'61.131.160.177:9006'},
        {'https':'122.194.209.187:61234'},
        {'https':'59.37.18.243:3128'},
        {'https':'218.64.69.79:8080'},
        {'https':'222.90.110.194:8080'},
        {'https':'222.184.59.8:808'},
        {'https':'218.249.69.214:1081'},
        {'https':'114.249.230.208:8000'},
        {'https':'27.128.187.22:3128'},
        {'https':'114.220.115.180:8118'},
        {'https':'116.228.44.9:8085'},
        {'https':'113.109.249.32:808'},
        {'https':'183.129.207.78:18118'},
        {'https':'14.20.235.117:808'},
        {'https':'122.136.212.132:53281'}
        ]
# Use Baidu to check whether the proxy works
url = 'https://www.baidu.com/'
# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
# Send a GET request through each proxy
for proxy in proxys:
    try:
        # timeout=10 is an added safeguard so that a dead proxy cannot hang the loop
        response = requests.get(url=url, headers=headers, proxies=proxy, timeout=10)
        response.raise_for_status()  # treat HTTP error statuses as failures too
        with open('IP.txt', 'a') as f:
            f.write(str(proxy) + '\n')
        #print(response.text)
        ## Save the returned page locally for inspection
        #with open('ip.html','w',encoding='utf-8') as f:
        #    f.write(response.text)

    except requests.RequestException:
        print(proxy, 'invalid IP!')
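
The checker writes each working proxy to IP.txt as the text form of a dict, while the version 2 script expects `--proxy-server=...` strings. A minimal sketch of bridging the two, assuming each line looks like `{'https': 'ip:port'}` and that the proxies speak plain HTTP (the function name `proxy_flags_from_lines` is hypothetical):

```python
import ast


def proxy_flags_from_lines(lines):
    """Turn lines such as "{'https': '1.2.3.4:8080'}" into Chrome
    --proxy-server flags, skipping blank lines."""
    flags = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = ast.literal_eval(line)  # safely parse the dict literal
        for address in entry.values():
            flags.append("--proxy-server=http://" + address)
    return flags


sample = ["{'https': '221.178.232.130:8080'}\n", "\n", "{'https': '61.131.160.177:9006'}\n"]
flags = proxy_flags_from_lines(sample)
print(flags)
```

To use the real file, pass `open('IP.txt')` instead of `sample`; a file object yields its lines one by one.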