Make use of Scrapy’s standard HttpProxyMiddleware by setting the proxy meta value and the Proxy-Authorization header on a Scrapy Request, for example:
If you want to specify a proxy for each request
[sourcecode language=”python” wraplines=”false” collapse=”false”]
import scrapy
from w3lib.http import basic_auth_header

# Inside a spider method such as start_requests() or parse():
yield scrapy.Request(
    url=url,
    callback=self.parse,
    # HttpProxyMiddleware reads this meta key to route the request.
    meta={'proxy': 'https://<PROXY_IP_OR_URL>:<PROXY_PORT>'},
    headers={
        # HTTP Basic credentials for the proxy.
        'Proxy-Authorization': basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>'),
    },
)
[/sourcecode]
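For orientation, the value that `basic_auth_header` puts in `Proxy-Authorization` follows the HTTP Basic scheme: the word `Basic`, a space, and the Base64 encoding of `username:password`. The sketch below reproduces that with the standard library only, assuming w3lib's default ISO-8859-1 encoding. The function name `proxy_auth_value` is a hypothetical helper for illustration and is not part of Scrapy or w3lib.

```python
import base64


def proxy_auth_value(username, password, encoding="ISO-8859-1"):
    """Build an HTTP Basic credential value as bytes (illustrative helper)."""
    credentials = f"{username}:{password}".encode(encoding)
    return b"Basic " + base64.b64encode(credentials)


# Sample credentials, for illustration only.
value = proxy_auth_value("someuser", "somepass")
print(value)
```

Printing the value this way can help when checking what is actually sent to the proxy.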
If you want to specify a proxy for all requests
To route all of the spider’s requests through the proxy automatically, isolate the proxy details in a middleware by adding this example class to the project’s middlewares.py file:
[sourcecode language=”python” wraplines=”false” collapse=”false”]
from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        # Route every outgoing request through the proxy.
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        # Attach HTTP Basic credentials for the proxy.
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
[/sourcecode]
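One possible variation, shown only as a sketch: read the proxy details from the project settings instead of hard-coding them, and leave alone any request that already carries its own `proxy` meta value, so the per-request approach still works. The setting names `PROXY_URL`, `PROXY_USERNAME` and `PROXY_PASSWORD`, the class name `SettingsProxyMiddleware`, and the sample values are all assumptions made for this example. The stand-in objects at the bottom only mimic the attributes used here, so the snippet runs without Scrapy installed.

```python
import base64
from types import SimpleNamespace


class SettingsProxyMiddleware:
    """Sketch: apply a default proxy unless the request already has one."""

    def __init__(self, proxy_url, username, password):
        self.proxy_url = proxy_url
        token = base64.b64encode(f"{username}:{password}".encode("ISO-8859-1"))
        self.auth_header = b"Basic " + token

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical setting names; define them in settings.py.
        settings = crawler.settings
        return cls(
            settings.get("PROXY_URL"),
            settings.get("PROXY_USERNAME"),
            settings.get("PROXY_PASSWORD"),
        )

    def process_request(self, request, spider):
        # Respect a proxy that was chosen for this specific request.
        if "proxy" in request.meta:
            return None
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = self.auth_header
        return None


# Stand-in objects for a quick check (not real Scrapy classes).
crawler = SimpleNamespace(settings={
    "PROXY_URL": "http://proxy.example:8080",
    "PROXY_USERNAME": "someuser",
    "PROXY_PASSWORD": "somepass",
})
middleware = SettingsProxyMiddleware.from_crawler(crawler)
plain = SimpleNamespace(meta={}, headers={})
preset = SimpleNamespace(meta={"proxy": "http://other.example:3128"}, headers={})
middleware.process_request(plain, spider=None)
middleware.process_request(preset, spider=None)
```

Keeping credentials in settings rather than in middlewares.py also makes it easier to swap proxies per environment.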
Then reference it in the downloader middlewares section of the project’s settings.py, putting it before the standard HttpProxyMiddleware:
[sourcecode language=”python” wraplines=”false” collapse=”false”]
DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
[/sourcecode]
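Scrapy calls each downloader middleware’s `process_request` in increasing order of these numbers, which is why 350 places the custom class ahead of the one at 400. A small sketch that makes the resulting order visible (the variable `order` exists only for this illustration):

```python
DOWNLOADER_MIDDLEWARES = {
    "<PROJECT_NAME>.middlewares.CustomProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 400,
}

# Sort by priority value: lower numbers handle the request first.
order = [
    path
    for path, priority in sorted(
        DOWNLOADER_MIDDLEWARES.items(), key=lambda item: item[1]
    )
]

for position, path in enumerate(order, start=1):
    print(position, path)
```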
I am a freelance programmer specializing in Python scripting, web scraping, and web automation, with 10+ years of experience.