Python is a relatively easy-to-learn language, yet powerful enough to be used in applications ranging from AI and machine learning all the way to something as simple as web scraping bots.
That said, random bugs and glitches are still the order of the day in a Python programmer’s life. In this article, we’re talking about the “urllib.error.HTTPError: HTTP Error 403: Forbidden” error that can appear when scraping sites with Python, and what you can do to fix the problem.
Why does this happen?
While the error can be triggered by anything from a mistake in the script to issues on the website’s end, the most likely cause is a server-side security feature (such as mod_security) designed to stop bots and spiders from crawling the site. These features often block requests that identify themselves with urllib’s default User-Agent string (something like “Python-urllib/3.x”) rather than a browser’s.
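You can see this for yourself by catching and inspecting the error urllib raises. A minimal sketch, using the same placeholder URL style as the examples below; the point is that a plain urlopen() call sends urllib’s default User-Agent, which is exactly what many filters key on:
import urllib.request
from urllib.error import HTTPError

# A plain urlopen() call identifies itself as "Python-urllib/3.x",
# which many bot filters reject outright
try:
    webpage = urllib.request.urlopen('enter request URL here').read()
except HTTPError as e:
    print(e.code, e.reason)  # prints "403 Forbidden" on sites that block the default agent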
How to fix this?
Here are two fixes you can try out.
Get past mod_security or equivalent security features
As mentioned before, server-side security features such as mod_security can block web scrapers. Since you can’t disable security on a server you don’t control, the practical workaround is to make your request look like it comes from a browser. Try setting a browser-like User-Agent header as follows to see if you can get around the issue.
from urllib.request import Request, urlopen

# Build the request with a browser-like User-Agent header
req = Request(
    url='enter request URL here',
    headers={'User-Agent': 'Mozilla/5.0'}
)
webpage = urlopen(req).read()
A browser-like User-Agent will get you past many basic bot filters, though sites with stricter protections may still block the request.
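Some filters check more than the User-Agent string. If the single header above isn’t enough, you can try sending a fuller set of browser-like headers; a sketch, where the exact header values are illustrative rather than required:
from urllib.request import Request, urlopen

# Header names are standard HTTP; the values here are illustrative
req = Request(
    url='enter request URL here',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
    }
)
webpage = urlopen(req).read()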
Set a timeout
If you aren’t getting a response, try setting a timeout so your script gives up on a stalled request instead of hanging indefinitely. Pacing your requests also helps keep the server from mistaking your bot for a DDoS attack and blocking it altogether.
from urllib.request import Request, urlopen

req = Request('enter request URL here', headers={'User-Agent': 'Mozilla/5.0'})
# Abandon the request if the server doesn't respond within 10 seconds
webpage = urlopen(req, timeout=10).read()
Note that the timeout=10 argument tells urlopen() to abandon a request that hasn’t completed within 10 seconds; it does not add a delay between requests. To avoid overloading the server, you’ll want an explicit pause between consecutive requests instead.
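If your scraper fetches several pages, the usual approach is an explicit pause with time.sleep(). A minimal sketch, assuming a hypothetical list of URLs to scrape:
import time
from urllib.request import Request, urlopen

urls = ['enter request URLs here']  # hypothetical list of pages to scrape
for url in urls:
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req, timeout=10).read()  # give up on a stalled request after 10 seconds
    time.sleep(2)  # pause 2 seconds between requests to keep the load on the server light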