How do I get the HTML of a website using Python 3?

December 10, 2016, at 11:50 AM

I've been trying to do this with repl.it and have tried several solutions on this site, but none of them work. Right now, my code looks like this:

import urllib
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
print (urllib.urlopen(url).read())

but it just says "AttributeError: module 'urllib' has no attribute 'urlopen'".

If I add import urllib.urlopen, it tells me there's no module named that. How can I fix my problem?

Answer 1

The syntax you are using for the urllib library is from Python 2. In Python 3, urllib was reorganized into submodules (urllib.request, urllib.parse, urllib.error, among others), and urlopen now lives in urllib.request. The new notation looks like this:

import urllib.request
response = urllib.request.urlopen("http://www.google.com")
html = response.read()

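The html object holds the raw HTML of the site as bytes, not as a str. Call html.decode() with the page's character encoding (for example, html.decode("utf-8")) if you need text. Much like the original urllib library, you should not expect images or other data files to be included in this returned object.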
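Because read() returns bytes in Python 3, you usually want to decode it before treating it as text. Here is a minimal sketch of one way to do that. The helper name fetch_html is hypothetical, and falling back to UTF-8 when the server sends no charset (or an unknown one) is an assumption, not something urllib does for you.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0, default_charset="utf-8"):
    """Download `url` and return its body as a str (hypothetical helper).

    The charset is taken from the Content-Type response header when the
    server sends one; otherwise we *assume* `default_charset`.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3, not str
        # get_content_charset() returns None if the header has no charset
        charset = response.headers.get_content_charset() or default_charset
    try:
        # errors="replace" keeps going if the page mislabels its encoding
        return raw.decode(charset, errors="replace")
    except LookupError:  # the header named an encoding Python doesn't know
        return raw.decode(default_charset, errors="replace")
```

Usage with the URL from the question: html = fetch_html("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"), then print(html). Using the with statement closes the connection as soon as the body has been read.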

The confusing part here is that, in Python 3, this would fail if you did:

import urllib
response = urllib.request.urlopen("http://www.google.com")
html = response.read()

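This module-importing behavior is intentional: in Python 3, urllib is a package, and import urllib runs only its (empty) __init__.py, so urllib.request does not exist as an attribute until something imports urllib.request. BUT it is non-intuitive and awkward. More importantly, for you, it makes the situation harder to debug, because the second snippet can appear to work whenever some other module in the same program has already imported urllib.request. Enjoy.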
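The rule behind this is general: importing a package does not import its submodules, and a submodule only becomes an attribute of its package once something imports it. The sketch below shows this with a throwaway package created in a temporary directory; the names demo_pkg and sub are hypothetical and stand in for urllib and request.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_submodule_import():
    """Show that `import pkg` does not load `pkg.sub`, but `import pkg.sub` does."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"          # hypothetical package name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # empty __init__, like urllib's
        (pkg_dir / "sub.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()             # we created modules at runtime
        try:
            pkg = importlib.import_module("demo_pkg")  # like `import urllib`
            before = hasattr(pkg, "sub")               # submodule not loaded yet
            importlib.import_module("demo_pkg.sub")    # like `import urllib.request`
            after = hasattr(pkg, "sub")                # now bound on the package
        finally:
            sys.path.remove(tmp)
            for name in ("demo_pkg.sub", "demo_pkg"):
                sys.modules.pop(name, None)
    return before, after


print(demo_submodule_import())  # (False, True)
```

In practice, the fix is simply to import the submodule you use, either with import urllib.request or with from urllib.request import urlopen.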
