How to avoid getting my IP blocked when scraping Alibaba products?

December 01, 2017, at 09:13 AM

I have a Python script that scrapes Alibaba products. After it has visited several product pages, the Alibaba website starts blocking my IP, because the script requests a page every 3 seconds.

I found a way to avoid being blocked: a delay of 20 seconds between product pages. The problem is that scraping then becomes too slow, and Alibaba has millions of products per category.

Can you suggest how to improve scraping speed without getting my IP blocked? Is using proxy IPs the only solution?

Answer 1

You can buy proxy IPs cheaply. They usually come as a .txt file. You can even get a premium plan from some proxy providers if you Google hard enough.

You can then send each request with a random header and a random IP. Combining automatic header rotation with random proxy IPs should get you what you want.

As far as I know, there is no ready-made implementation of this.

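from fake_useragent import UserAgent
import requests

query = "python"  # example search term (assumed); the original left `query` undefined
url = "https://www.google.com/search?tbm=bks&q=" + query
# Send a randomly chosen User-Agent header with the request
headers = {
    'User-Agent': UserAgent().random
}
# Placeholder: replace None with a mapping for a proxy picked from your list,
# e.g. {"http": proxy_url, "https": proxy_url}
proxies = None
response = requests.get(url, headers=headers, proxies=proxies)
r = response.content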

The code will look something like the above. You just need to work out how to pick a random IP from the proxy list and pass it in through proxies.
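One possible way to do the random pick is sketched below. It assumes the purchased .txt file holds one proxy per line in ip:port form and that the proxies speak plain HTTP; the file format, the function names (load_proxies, build_proxies, fetch) and the 10-second timeout are illustrative assumptions, not part of the original answer.

```python
import random


def load_proxies(path):
    """Read one proxy per line (assumed "ip:port" format), skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_proxies(proxy_list):
    """Pick one proxy at random and return the mapping that requests expects."""
    address = random.choice(proxy_list)
    proxy_url = "http://" + address  # assumes an HTTP proxy without credentials
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxy_list):
    """Send one GET request with a random User-Agent through a random proxy."""
    # Third-party packages, imported here so the helpers above work without them.
    import requests
    from fake_useragent import UserAgent

    headers = {"User-Agent": UserAgent().random}
    return requests.get(
        url, headers=headers, proxies=build_proxies(proxy_list), timeout=10
    )
```

You would call load_proxies once at startup and then fetch(url, proxy_list) for each product page, so that every request gets a fresh header and proxy combination.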

Also keep in mind that the kind of proxy you use matters. HTTP proxies are easily traced by the website; HTTPS proxies are what you need, but they are harder to get and they stop working quickly.
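Because proxies die quickly, it may help to retry with a different one when a request fails. The following is only a rough sketch of that idea: fetch_with_rotation and max_attempts are hypothetical names, the pool entries are assumed to be ip:port strings, and the get argument is expected to behave like requests.get (in real use you would probably catch requests.RequestException rather than every exception).

```python
import random


def fetch_with_rotation(url, proxy_pool, get, max_attempts=3):
    """Try up to max_attempts random proxies, dropping each one that fails."""
    pool = list(proxy_pool)  # work on a copy so the caller's list is untouched
    for _ in range(max_attempts):
        if not pool:
            break
        address = random.choice(pool)
        proxy_url = "http://" + address  # assumes "ip:port" entries
        try:
            return get(
                url,
                proxies={"http": proxy_url, "https": proxy_url},
                timeout=10,
            )
        except Exception:
            # Treat any failure as a dead proxy and do not pick it again.
            pool.remove(address)
    raise RuntimeError("no working proxy found")
```

With requests installed, a call would look like fetch_with_rotation(url, proxy_list, requests.get).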

Hope this helps. :)
