# PSpider
A simple web spider framework written in Python. It requires Python 3.8+.
### Features of PSpider
1. Supports multi-threaded crawling (using threading)
2. Supports crawling through proxies (using threading and queue)
3. Provides utility functions and classes, for example: UrlFilter, get_string_num, etc.
4. Has few lines of code, so it is easy to read, understand, and extend
### Modules of PSpider
1. utilities module: defines utility functions and classes for the multi-threaded spider
2. instances module: defines the Fetcher, Parser, and Saver classes for the multi-threaded spider
3. concurrent module: defines the WebSpiderFrame of the multi-threaded spider
### Procedure of PSpider

①: Fetchers get a TaskFetch from QueueFetch and make a request based on that task
②: The result of ① (a TaskParse) is put into QueueParse, so that the Parser can get tasks from it
③: The Parser gets a task from QueueParse and parses the content to produce new TaskFetch items and a TaskSave
④: The new TaskFetch items are put into QueueFetch, so that Fetchers can get tasks from it again
⑤: The TaskSave is put into QueueSave, so that the Saver can get tasks from it
⑥: The Saver gets a TaskSave from QueueSave and saves items to the filesystem or a database
⑦: The Proxieser gets proxies from the web or a database and puts them into QueueProxies
⑧: A Fetcher gets a proxy from QueueProxies if needed and makes requests through that proxy
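
The flow in steps ① to ⑥ can be illustrated with a minimal sketch built only on the standard `threading` and `queue` modules. This is not PSpider's actual implementation or API: `FAKE_WEB`, `Pending`, and `crawl` are hypothetical names invented for this example, the "web" is an in-memory dict instead of real HTTP requests, and the proxy steps ⑦ and ⑧ are left out.

```python
import queue
import threading

# Hypothetical in-memory "web" used instead of real HTTP requests:
# url -> (content, links found on that page).
FAKE_WEB = {
    "/a": ("page a", ["/b", "/c"]),
    "/b": ("page b", ["/c"]),
    "/c": ("page c", ["/a"]),
}


class Pending:
    """Counts tasks that are queued or in progress, across all queues."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def add(self):
        with self._cond:
            self._count += 1

    def done(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._count == 0)


def crawl(start_url, fetcher_num=2):
    queue_fetch, queue_parse, queue_save = queue.Queue(), queue.Queue(), queue.Queue()
    pending = Pending()
    seen = {start_url}  # after start-up, only the single parser thread touches this
    saved = {}          # only the single saver thread writes to this

    def put(q, task):
        pending.add()
        q.put(task)

    def worker(q, handle):
        while True:
            task = q.get()
            if task is None:  # sentinel: stop this thread
                break
            handle(task)      # enqueue follow-up tasks first ...
            pending.done()    # ... then mark this task as finished

    def fetch(url):           # steps 1 and 2
        content, links = FAKE_WEB.get(url, ("", []))
        put(queue_parse, (url, content, links))

    def parse(task):          # steps 3, 4 and 5
        url, content, links = task
        for link in links:
            if link not in seen:
                seen.add(link)
                put(queue_fetch, link)
        put(queue_save, (url, content.upper()))

    def save(task):           # step 6
        url, item = task
        saved[url] = item

    threads = [threading.Thread(target=worker, args=(queue_fetch, fetch))
               for _ in range(fetcher_num)]
    threads.append(threading.Thread(target=worker, args=(queue_parse, parse)))
    threads.append(threading.Thread(target=worker, args=(queue_save, save)))
    for t in threads:
        t.start()

    put(queue_fetch, start_url)
    pending.wait()            # returns once no task is queued or running
    for _ in range(fetcher_num):
        queue_fetch.put(None)
    queue_parse.put(None)
    queue_save.put(None)
    for t in threads:
        t.join()
    return saved
```

Calling `crawl("/a")` visits every page reachable from `/a` exactly once and returns the saved items keyed by URL. Because each worker enqueues its follow-up tasks before marking its own task as done, the shared counter cannot reach zero while work is still outstanding, which is one simple way to decide when a multi-queue pipeline has finished.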
### Tutorials of PSpider
**Installation: the first method is recommended**
(1) Copy the "spider" directory to your project directory, and `import spider`
(2) Install spider into your Python environment using `python3 setup.py install`
**For a usage example, see test.py**
### TodoList
1. More Demos
2. Distributed Spider
3. Execute JavaScript
### If you have any questions or suggestions, you can open an issue or a pull request



