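This is purely tinkering for fun. If you see anything worth improving, suggestions are welcome~
Version 1
Just a few very simple lines of code. Repeated refreshes from the same IP within one minute do not count, so the interval is set to a little over 60 seconds. It is slow: the view count goes up only once every minute or so.
Prerequisite: install the selenium library
Version 1 code (for reference only, not recommended):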
import random
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
# driver.implicitly_wait(6)
driver.get("https://blog.csdn.net/weixin_42486139/article/details/102687538")  # blog post URL
for i in range(1000):
    print('Refresh #%d' % i)
    try:
        driver.refresh()  # reload the page
        print('test pass: refresh successful')
    except Exception as e:
        print("Exception found", format(e))
        driver.quit()
        break  # the browser is closed, so stop instead of refreshing a dead session
    time.sleep(60 + random.randint(1, 10))  # wait a little over a minute between refreshes
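The loop above mixes the waiting logic with the browser calls, which makes it hard to try out without opening Chrome. Here is a minimal sketch of the same idea with the refresh action and the sleep function passed in as parameters. The name `refresh_loop` and its parameters are made up for illustration and are not part of selenium.

```python
import random
import time


def refresh_loop(refresh, times, sleep=time.sleep, base=60, jitter=(1, 10)):
    """Call refresh() up to `times` times, pausing base + random jitter seconds
    between calls. Stops at the first failure and returns the number of
    successful refreshes."""
    done = 0
    for i in range(times):
        try:
            refresh()
        except Exception as e:
            print("Exception found", e)
            break  # give up after the first error
        done += 1
        if i < times - 1:
            # no need to wait after the final refresh
            sleep(base + random.randint(*jitter))
    return done
```

With a real browser this would be called as `refresh_loop(driver.refresh, 1000)`, followed by `driver.quit()` once it returns.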
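Version 2
In that case, what if I use several IPs? Plenty of other blog posts cover this, and they are all based on the urllib library. For some reason, when I used that library I could fetch the page content but the view count did not go up, so I kept working with selenium.
Free proxy IP site: https://www.xicidaili.com/ Pick a few usable IPs from there (not all of them work). If you do not know how to find usable IPs, scroll down to the end of this post, where the code for checking whether an IP is usable is attached.
Version 2 code (10 IPs configured, so it can load the page once every 6 seconds):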
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 22 17:01:54 2019
@author: mandy
"""
import time
from selenium import webdriver

proxys = ["--proxy-server=http://221.178.232.130:8080",
          "--proxy-server=http://61.131.160.177:9006",
          "--proxy-server=http://122.194.209.187:61234",
          "--proxy-server=http://59.37.18.243:3128",
          "--proxy-server=http://218.64.69.79:8080",
          "--proxy-server=http://222.90.110.194:8080",
          "--proxy-server=http://114.249.230.208:8000",
          "--proxy-server=http://222.184.59.8:808",
          "--proxy-server=http://27.128.187.22:3128",
          "--proxy-server=http://113.109.249.32:808",
          ]
j = 0
for i in range(1000):
    for proxy in proxys:
        driver = None
        try:  # try/except keeps the loop running when an error such as TimeoutException occurs
            # Build fresh options for every launch so --proxy-server flags do not pile up
            option = webdriver.ChromeOptions()
            # Run selenium headless so a browser window does not open and close on every visit
            option.add_argument('--headless')
            option.add_argument('--no-sandbox')
            option.add_argument('--start-maximized')
            option.add_argument(proxy)
            driver = webdriver.Chrome(options=option)  # chrome_options= is the deprecated spelling
            driver.get("https://blog.csdn.net/weixin_42486139/article/details/102687538")  # blog post URL
            j += 1
            print('Visit #%d' % j)
            time.sleep(6)
        except Exception as e:
            print(e)
        finally:
            if driver is not None:
                driver.quit()  # always close the browser, even after an error
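One thing to watch in the loop above: if a single options object is shared across iterations, every `add_argument(proxy)` call appends one more `--proxy-server` flag to it. A small sketch of building a fresh argument list per launch is below. `BASE_ARGS` and `chrome_args_for` are hypothetical names used only for this illustration, and the `http` scheme is assumed from the proxy list in this post.

```python
# Flags used for every launch, taken from the Version 2 code.
BASE_ARGS = ("--headless", "--no-sandbox", "--start-maximized")


def chrome_args_for(proxy_address, scheme="http"):
    """Return a new list of Chrome arguments containing exactly one proxy flag.

    proxy_address is "ip:port", for example "221.178.232.130:8080".
    """
    return list(BASE_ARGS) + ["--proxy-server=%s://%s" % (scheme, proxy_address)]


# Intended use with selenium (not executed here):
#   option = webdriver.ChromeOptions()
#   for arg in chrome_args_for("221.178.232.130:8080"):
#       option.add_argument(arg)
```

Because each call returns a new list, the arguments for one proxy never leak into the next launch.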
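Although headless mode is set, every call to webdriver.Chrome() still pops up a DOS (console) window.
Fix: edit line 76 of service.py in the selenium package, changing it to:
OK, all done. It now runs quietly in the background and will not get in the way of whatever else you are doing.
Appendix
Code for checking whether an IP is usable (usable IP addresses and port numbers are saved to the generated IP.txt file):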
# -*- coding: utf-8 -*-
"""
Created on Mon Nov 4 16:17:01 2019
@author: mandy
"""
import requests

# Put the proxy IP addresses you collected into proxys
proxys = [
    {'https': '221.178.232.130:8080'},
    {'https': '61.131.160.177:9006'},
    {'https': '122.194.209.187:61234'},
    {'https': '59.37.18.243:3128'},
    {'https': '218.64.69.79:8080'},
    {'https': '222.90.110.194:8080'},
    {'https': '222.184.59.8:808'},
    {'https': '218.249.69.214:1081'},
    {'https': '114.249.230.208:8000'},
    {'https': '27.128.187.22:3128'},
    {'https': '114.220.115.180:8118'},
    {'https': '116.228.44.9:8085'},
    {'https': '113.109.249.32:808'},
    {'https': '183.129.207.78:18118'},
    {'https': '14.20.235.117:808'},
    {'https': '122.136.212.132:53281'},
]
# Use Baidu to test whether a proxy works
url = 'https://www.baidu.com/'
# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
# Send a GET request through each proxy
for proxy in proxys:
    try:
        # The 10-second timeout is an arbitrary choice; without one a dead proxy can hang the loop
        response = requests.get(url=url, headers=headers, proxies=proxy, timeout=10)
        with open('IP.txt', 'a') as f:
            f.write(str(proxy) + '\n')
        # print(response.text)
        # # Save the returned page locally so it is easy to inspect
        # with open('ip.html', 'w', encoding='utf-8') as f:
        #     f.write(response.text)
    except Exception:
        print(proxy, 'invalid IP!')
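The checker writes each usable proxy to IP.txt as the text form of a Python dict, one per line, while the Version 2 script expects `--proxy-server=...` strings. A rough sketch of converting one into the other is below. `load_proxy_args` is a hypothetical helper name, and the `http://` prefix is assumed from the proxy list used in Version 2.

```python
import ast


def load_proxy_args(path="IP.txt"):
    """Read lines such as {'https': '1.2.3.4:80'} and return Chrome proxy flags.

    Duplicates are skipped, since the checker appends to the file on every run.
    """
    args = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = ast.literal_eval(line)  # safely parse the dict literal
            for address in entry.values():
                arg = "--proxy-server=http://" + address
                if arg not in args:
                    args.append(arg)
    return args
```

The returned list could then stand in for the hand-written `proxys` list in the Version 2 code.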