Scraping Real Estate Website using Python

January 15, 2022, at 05:50 AM

I am trying to scrape the MLS Number, Price, and Address of real estate listings from a website using BeautifulSoup.

import requests
from bs4 import BeautifulSoup
# string url
str_url = 'https://www.utahrealestate.com/search/map.search'
# get response
response = requests.get(str_url)
# get html
soup = BeautifulSoup(response.text, 'html.parser')
# get the number of listings (I can't get this to work; it returns NoneType)
int_n_pages = soup.find('li', {'class': 'view-results'})
# split and get n pages (this does not work because the previous line returns None;
# note that even when find() succeeds it returns a Tag, so .text is needed before splitting)
int_n_pages = int(int_n_pages.text.split(' ')[2])

Next, my plan is to iterate through all pages and extract the information from each listing.

Something like...

# pandas is needed for the DataFrame step at the end
import pandas as pd

# empty list
list_dict_cards = []
# iterate through pages
for int_page in range(1, int_n_pages+1):
    # get url
    str_url = f'https://www.utahrealestate.com/search/map.search/page/{int_page}/vtype/map'
    # get response
    response = requests.get(str_url)
    # get html
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # get property cards
    property_cards = soup.find_all(class_='property___card')
    # iterate through property cards
    for card in property_cards:
        # empty dict
        dict_card = {}
        # get mls number
        int_mls = card.find(class_='mls___number').text.split(' ')[1]
        # put into dict_card
        dict_card['mls'] = int_mls
        # I would get other info here as well and put into dict_card
        # append dict_card to list_cards
        list_dict_cards.append(dict_card)
# make df
df_cards = pd.DataFrame(list_dict_cards)
# save
df_cards.to_csv('./output/df_dict_cards.csv', index=False)

I am pretty sure the site is attempting to prevent programmatically accessing much of the info it displays.
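A quick way to confirm this is to check whether the class names the selectors rely on appear anywhere in the raw HTML at all; if they are absent, the listings are injected client-side by JavaScript and BeautifulSoup will never see them. A small sketch of that check (the helper name is mine):

```python
def find_missing_markers(html, markers=('view-results', 'property___card')):
    """Return the markers that do NOT occur in the raw HTML string."""
    return [m for m in markers if m not in html]

# Example on a fragment: the pagination class is present, the card class is not.
fragment = '<li class="view-results">1 of 30 results</li>'
missing = find_missing_markers(fragment)
```

Running this against `response.text` from the code above shows which pieces of the page are actually server-rendered.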

Is there a way around this, and if so, how?

Answer 1

There is an endpoint that can be scraped effectively if you make a POST request to it with the right headers after you've visited the home page (probably so your session picks up the right cookies). The example below seems to do the trick. Note that the site itself is very slow; it's not the script.

import requests

s = requests.Session()
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
home = 'https://www.utahrealestate.com/search/map.search'
step = s.get(home, headers=headers)
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Host': 'www.utahrealestate.com',
    'Origin': 'https://www.utahrealestate.com',
    'Referer': 'https://www.utahrealestate.com/search/map.search',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
for page in range(1, 5):
    url = f'https://www.utahrealestate.com/search/map.inline.results/pg/{page}/sort/entry_date_desc/paging/0/dh/862'
    data = s.post(url, headers=headers).json()
    results = len(data['listing_data'])
    print(f'Scraped {results} results from page {page}')
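From there, the fields the question asks for (MLS number, price, address) can be pulled out of each listing dict and loaded into a DataFrame, much as the question's loop intended. The key names below (`mls_number`, `list_price`, `address`) are guesses for illustration; substitute whatever keys the real `listing_data` entries actually contain:

```python
import pandas as pd

# Hypothetical entries mimicking data['listing_data'] -- the real key names
# must be checked against the actual JSON response.
sample_listings = [
    {'mls_number': '1790231', 'list_price': 450000, 'address': '123 Main St'},
    {'mls_number': '1790232', 'list_price': 615000, 'address': '456 Oak Ave'},
]

def listings_to_df(listings):
    """Keep only the fields the question asks for: MLS number, price, address."""
    rows = [
        {
            'mls': listing.get('mls_number'),
            'price': listing.get('list_price'),
            'address': listing.get('address'),
        }
        for listing in listings
    ]
    return pd.DataFrame(rows)

df_cards = listings_to_df(sample_listings)
```

Saving with `df_cards.to_csv('./output/df_dict_cards.csv', index=False)` then produces the CSV from the question.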