Installing and using scrapy-redis on Windows

1. Installing the Redis service on Windows

Download the Windows installer (Redis 3.0.504, x64 MSI):
https://github.com/microsoftarchive/redis/releases/download/win-3.0.504/Redis-x64-3.0.504.msi

Redis GUI client (Redis Desktop Manager 0.9.3):
https://github.com/uglide/RedisDesktopManager/releases/download/0.9.3/redis-desktop-manager-0.9.3.817.exe

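If anything about the installation or configuration is unclear, the guides below may help. The first blog post is recommended, as it covers installation and configuration in detail: https://www.cnblogs.com/jaign/articles/7920588.html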

https://jingyan.baidu.com/article/0f5fb099045b056d8334ea97.html
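Once the service is installed, it is worth checking that it actually answers before pointing Scrapy at it. `redis-cli ping`, which ships with the installer, should print `PONG`. The sketch below performs the same check from Python with only the standard library by speaking the Redis wire protocol (RESP) directly. The helper names `encode_command`, `parse_simple_reply` and `ping` are made up for this example, and the default port 6379 is assumed.

```python
import socket


def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_simple_reply(raw):
    """Decode a simple-string (+) reply; raise on an error (-) reply."""
    line = raw.split(b"\r\n", 1)[0]
    if line.startswith(b"+"):
        return line[1:].decode("utf-8")
    if line.startswith(b"-"):
        raise RuntimeError(line[1:].decode("utf-8"))
    raise ValueError("unexpected reply: %r" % (raw,))


def ping(host="127.0.0.1", port=6379, password=None, timeout=3.0):
    """Return 'PONG' if the Redis server at host:port answers PING."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if password is not None:
            sock.sendall(encode_command("AUTH", password))
            parse_simple_reply(sock.recv(1024))  # '+OK', or raises on a wrong password
        sock.sendall(encode_command("PING"))
        # A short simple-string reply fits in one recv() for this sketch
        return parse_simple_reply(sock.recv(1024))
```

Against a running local service, `ping()` should return `'PONG'`; pass `password='123456'` if `requirepass` is set in the Redis configuration file.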

2. Installing and configuring scrapy-redis on Windows

2.1 Installation:

pip install scrapy-redis
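To confirm what pip actually installed, `pip show scrapy-redis` works, or the same information can be read from Python. The `redis` client package should appear too, since scrapy-redis depends on it. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("scrapy", "scrapy-redis", "redis"):
    print(name, installed_version(name) or "not installed")
```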

2.2 settings.py configuration:

1. Whether to obey robots.txt (usually disabled):
ROBOTSTXT_OBEY = False

2. Log level:
# LOG_LEVEL = 'DEBUG'  # verbose output, useful while debugging
LOG_LEVEL = 'WARNING'  # only show warnings and errors
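Scrapy's `LOG_LEVEL` uses the standard Python logging levels, so `WARNING` hides `DEBUG` and `INFO` messages (including the per-request "Crawled (200)" lines) and keeps warnings and errors. The stdlib sketch below shows that filtering on a throwaway logger; `ListHandler` and `emitted_at` are illustrative names, not Scrapy APIs, and the log messages are made up.

```python
import logging


class ListHandler(logging.Handler):
    """Collect 'LEVEL: message' strings instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


def emitted_at(level_name):
    """Log one message per level and return those that pass `level_name`."""
    logger = logging.getLogger(f"loglevel-demo.{level_name}")
    logger.setLevel(level_name)  # accepts the same names as LOG_LEVEL
    logger.propagate = False     # keep the demo out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        logger.debug("request details")
        logger.info("crawl statistics")
        logger.warning("a retried request")
        logger.error("a failed callback")
    finally:
        logger.removeHandler(handler)
    return handler.messages
```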

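3. scrapy-redis settings:
# redis
# Deduplicate requests with a request-fingerprint set stored in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Keep the request queue in Redis so several spider processes can share it
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Keep the queue and the fingerprint set in Redis after the spider closes (allows pause/resume)
SCHEDULER_PERSIST = True
# Delay (in seconds) between requests to the same website
DOWNLOAD_DELAY = 3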
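These two classes are what make the crawl distributed: every spider process adds request fingerprints to one shared Redis set (key `<spider name>:dupefilter` by default) and takes requests from one shared queue (`<spider name>:requests`), and `SCHEDULER_PERSIST = True` keeps both in Redis when the spider stops. The sketch below models only the dedupe step, with an in-memory stand-in for Redis. The fingerprint is simplified (Scrapy also canonicalizes the URL), and `FakeRedis`, `SharedDupeFilter` and `fingerprint` are hypothetical names.

```python
import hashlib


class FakeRedis:
    """In-memory stand-in for the Redis set commands used below."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, member):
        # Like Redis SADD: return 1 if the member was added, 0 if already present
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def delete(self, key):
        self.sets.pop(key, None)


def fingerprint(method, url, body=b""):
    """Simplified request fingerprint: SHA-1 over method, URL and body."""
    digest = hashlib.sha1()
    digest.update(method.upper().encode("utf-8"))
    digest.update(url.encode("utf-8"))
    digest.update(body)
    return digest.hexdigest()


class SharedDupeFilter:
    """Dupe filter whose 'seen' set lives in (fake) Redis, shared by all workers."""

    def __init__(self, server, spider_name, persist=True):
        self.server = server
        self.key = f"{spider_name}:dupefilter"  # mirrors scrapy-redis' default key pattern
        self.persist = persist  # plays the role of SCHEDULER_PERSIST

    def request_seen(self, method, url, body=b""):
        # SADD returns 0 when the fingerprint is already stored, i.e. a duplicate
        return self.server.sadd(self.key, fingerprint(method, url, body)) == 0

    def close(self):
        # Without persistence the fingerprints are wiped when the spider closes
        if not self.persist:
            self.server.delete(self.key)
```

Two filters sharing one server behave like two worker processes: a URL the first one has seen is reported as a duplicate to the second.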

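# Choose ONE of the connection options below. If REDIS_URL is set,
# it takes precedence over REDIS_HOST / REDIS_PORT.

# Option 1: Redis without a password (here database 8 on 192.168.12.209)
REDIS_URL = 'redis://192.168.12.209:6379/8'
# REDIS_URL = 'redis://30.6.252.40:6379/1'

# Option 2: Redis with a password, given as host, port and extra parameters
# REDIS_HOST = '30.6.252.40'
# REDIS_PORT = 6379
# REDIS_PARAMS = {
#     'password': '123456',
# }

# Option 3: Redis with a password, given in the URL. The part before ':' is a
# Redis (6+) ACL username, not a Windows account name; leave it empty otherwise.
# REDIS_URL = 'redis://:123456@127.0.0.1:6379/10'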
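A Redis URL packs all connection parameters into one string: `redis://[username:password@]host:port/db`. The username slot is for Redis 6+ ACL users and has nothing to do with Windows accounts; for a password-only server it can be left empty, as in `redis://:123456@127.0.0.1:6379/10`. The hypothetical helper below splits such a URL with the standard library to show which part is which; it is an illustration, not redis-py's own parser.

```python
from urllib.parse import urlsplit


def parse_redis_url(url):
    """Split a redis:// URL into the connection parameters it encodes."""
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL")
    db = parts.path.lstrip("/")
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 6379,
        "db": int(db) if db else 0,
        # Redis 6+ ACL user name; empty for password-only servers
        "username": parts.username or None,
        "password": parts.password,
    }
```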

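# Item pipelines (lower numbers run first).
# Uncomment RedisPipeline to also push each item, JSON-encoded, into Redis
ITEM_PIPELINES = {
    'ddbooks.pipelines.InfoPipeline': 100,
    'ddbooks.pipelines.DdbooksPipeline': 300,
    # 'scrapy_redis.pipelines.RedisPipeline': 400,
}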
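With `scrapy_redis.pipelines.RedisPipeline` enabled, each item is serialized to JSON and appended (RPUSH) to a Redis list, `<spider name>:items` by default, where a separate process can pick it up. The sketch below imitates that flow in memory. `InMemoryRedis`, `ItemsToRedis` and `drain` are hypothetical names, the spider name `dd` is made up, and the real pipeline receives the spider object rather than its name.

```python
import json


class InMemoryRedis:
    """Stand-in for the two Redis list commands used below (RPUSH, LPOP)."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def lpop(self, key):
        items = self.lists.get(key)
        return items.pop(0) if items else None


class ItemsToRedis:
    """Imitates RedisPipeline: JSON-encode each item and RPUSH it to a list."""

    def __init__(self, server, key_pattern="%(spider)s:items"):
        self.server = server
        self.key_pattern = key_pattern

    def process_item(self, item, spider_name):
        key = self.key_pattern % {"spider": spider_name}
        self.server.rpush(key, json.dumps(dict(item), ensure_ascii=False))
        return item  # hand the item on to the next pipeline


def drain(server, key):
    """Consumer side: pop items oldest-first until the list is empty."""
    while True:
        raw = server.lpop(key)
        if raw is None:
            return
        yield json.loads(raw)
```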

3. Three spider templates for scrapy-redis

References:
Official documentation: https://scrapy-redis.readthedocs.io/en/stable/
Source code: https://github.com/rmax/scrapy-redis

I. Subclassing and configuring CrawlSpider (plain Scrapy):

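	from scrapy.linkextractors import LinkExtractor
	from scrapy.spiders import CrawlSpider, Rule
	
	class DmozSpider(CrawlSpider):
	    name = 'dmoz'
	    allowed_domains = ['dmoztools.net']
	    start_urls = ['http://www.dmoztools.net/']
	
	    rules = [
	        # Put XPaths of the page regions whose links should be followed here;
	        # an empty tuple means links anywhere on the page
	        Rule(LinkExtractor(restrict_xpaths=()), callback='parse_directory', follow=True),
	    ]
	
	    def parse_directory(self, response):
	        # Extract the fields you need from each followed page
	        yield {'url': response.url}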
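`allowed_domains` limits which extracted links are actually followed: requests to hosts outside those domains (subdomains included) are dropped by Scrapy's offsite filtering. A standard-library sketch of that host check, similar in spirit to the regex Scrapy builds; `offsite_regex` and `is_allowed` are illustrative names:

```python
import re
from urllib.parse import urlsplit


def offsite_regex(allowed_domains):
    """Regex matching a host equal to, or a subdomain of, an allowed domain."""
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({domains})$")


def is_allowed(url, allowed_domains):
    """True if the URL's host falls inside allowed_domains."""
    host = urlsplit(url).hostname or ""
    return bool(offsite_regex(allowed_domains).match(host))
```

Anchoring on a literal dot before the domain means a look-alike host such as `evil-dmoztools.net` is rejected while `www.dmoztools.net` passes.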

II. Subclassing and configuring RedisSpider:

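	from scrapy_redis.spiders import RedisSpider
	
	class MySpider(RedisSpider):
	    name = 'myspider_redis'
	    # Redis key (a list by default) from which start URLs are read
	    redis_key = 'myspider:start_urls'
	
	    def parse(self, response):
	        # Parse each page fetched from a URL taken from redis_key
	        yield {'url': response.url}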
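A `RedisSpider` has no `start_urls`; it idles until URLs appear under `redis_key`, for example after `redis-cli lpush myspider:start_urls http://www.example.com/`. Assuming the default list mode, the spider takes URLs from the left end (head) of the list, so URLs added with LPUSH come out newest-first while RPUSH preserves insertion order. The in-memory sketch below shows that ordering; `StartUrlList` and `next_start_urls` are hypothetical names, and the deque only stands in for a Redis list.

```python
from collections import deque


class StartUrlList:
    """Stand-in for a Redis list: index 0 is the left end (head)."""

    def __init__(self):
        self.data = deque()

    def lpush(self, *values):
        # Like Redis LPUSH: each value is inserted at the head in turn
        for value in values:
            self.data.appendleft(value)

    def rpush(self, *values):
        # Like Redis RPUSH: values are appended at the tail
        self.data.extend(values)

    def lpop(self):
        return self.data.popleft() if self.data else None


def next_start_urls(queue, batch_size):
    """Take up to batch_size URLs from the head, as in the default list mode."""
    urls = []
    while len(urls) < batch_size:
        url = queue.lpop()
        if url is None:
            break
        urls.append(url)
    return urls
```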

III. Subclassing and configuring RedisCrawlSpider:

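	from scrapy.linkextractors import LinkExtractor
	from scrapy.spiders import Rule
	from scrapy_redis.spiders import RedisCrawlSpider
	
	class MyCrawler(RedisCrawlSpider):
	    name = 'mycrawler_redis'
	    # Start URLs are read from this Redis key instead of start_urls
	    redis_key = 'mycrawler:start_urls'
	
	    rules = (
	        # Put XPaths of the page regions whose links should be followed here;
	        # an empty tuple means links anywhere on the page
	        Rule(LinkExtractor(restrict_xpaths=()), callback='parse_page', follow=True),
	    )
	
	    def parse_page(self, response):
	        # Extract the fields you need from each followed page
	        yield {'url': response.url}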
©️2020 CSDN