I need to get the title for a news article using Selenium

January 29, 2018, at 09:53 AM

I'm using Selenium WebDriver to get the HTML from www.cnn.com. Currently, I'm able to get the headlines from CNN, but I was wondering if I could save the content into a text file and then search for specific headlines to print out.

My Python code:

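from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://www.cnn.com')
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which later Selenium 4 releases removed
content = driver.find_element(By.XPATH, "html").text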

Can anyone help me?

Thank you.

Answer 1

Selenium won't help once you are reading from a text file, because its APIs operate on a live browser session through the WebDriver protocol.

If you want to save the whole HTML content into a text file and then read the headlines from it, you can use the BeautifulSoup module. Here's an example.

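from bs4 import BeautifulSoup

# Read the file as one string; readlines() returns a list, which BeautifulSoup cannot parse.
# The encoding assumes the page was saved as UTF-8.
with open("htmlcontent.txt", encoding="utf-8") as f:
    html_data = f.read()
soup = BeautifulSoup(html_data, "html.parser")
# Print the text of every <h1> element in the saved page
for elem in soup.select("h1"):
    print(elem.get_text())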
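
The example above assumes that htmlcontent.txt already exists. A minimal sketch of the saving step could look like the following; save_page_source is a hypothetical helper name, and writing the file as UTF-8 is an assumption rather than something the question specifies.

```python
from pathlib import Path


def save_page_source(driver, path="htmlcontent.txt"):
    """Write the browser's current HTML to a text file and return its path.

    `driver` is any object exposing a `page_source` string, such as a
    Selenium WebDriver. The helper name and the UTF-8 encoding are
    assumptions made for this sketch.
    """
    target = Path(path)
    target.write_text(driver.page_source, encoding="utf-8")
    return target
```

With the setup from the question, this would be called as save_page_source(driver) after driver.get('http://www.cnn.com'), since page_source is the WebDriver attribute that holds the page's HTML.
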

Answer 2

BeautifulSoup is definitely the best fit for your case. But if you want to use Selenium, you could loop through the headlines and extract their text with the Selenium driver (from the live page, not from the file).

Looking at CNN's website, cd__headline-text is the class name applied to headlines, so you could get them like this:

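from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://www.cnn.com')
# find_elements(By.CLASS_NAME, ...) is the current form of find_elements_by_class_name
for headline in driver.find_elements(By.CLASS_NAME, 'cd__headline-text'):
    print(headline.text)
driver.quit()  # close the browser when done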

Output:

Asia's strongmen adopt 'fake news' to slam media
Fitness tracking app reveals info on remote military bases
Seven rescued, 43 missing week after ferry sinks in Pacific
...
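
The question also asks about printing only specific headlines. One possible approach, sketched here under assumptions, is to collect the headline strings into a list and filter them by keyword. The function name matching_headlines and the sample keywords are hypothetical and not part of either answer.

```python
def matching_headlines(headlines, keywords):
    """Return the headlines that contain at least one keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        headline
        for headline in headlines
        if any(keyword in headline.lower() for keyword in lowered)
    ]


# Sample data copied from the output shown above
headlines = [
    "Asia's strongmen adopt 'fake news' to slam media",
    "Fitness tracking app reveals info on remote military bases",
    "Seven rescued, 43 missing week after ferry sinks in Pacific",
]

# Print only the headlines that mention one of the (hypothetical) keywords
for headline in matching_headlines(headlines, ["military", "ferry"]):
    print(headline)
```

In practice the list would be built from the headline.text values gathered in the Selenium loop, or from the get_text() results when parsing the saved file with BeautifulSoup.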
