Amazon webscraper

September 22, 2023 (1y ago)

Like so many others, I browse Amazon mindlessly, so I thought I'd make an Amazon web scraper that does the browsing for you and extracts the results into a nice format.

Ideally, this can be optimized to run on a VPS with a daily cron job that emails you the results or, better yet, notifies you only when something you like is on sale.

In scraper.py, the get_page_content method now retries, so it can still make a valid connection to Amazon's servers even if a connection request is denied.

function -> [get_request]: returns None when requests.exceptions.ConnectionError occurs. The None ripples down through the calling functions so the thread terminates normally instead of abruptly calling sys.exit(). Calling sys.exit() would certainly end the thread, but the concern was that a thread stopped abruptly while holding a shared resource such as the GIL could leave the program deadlocked.
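A minimal sketch of that idea, assuming get_request wraps requests.get. The timeout parameter and the scrape_one caller are hypothetical names added for illustration, not the project's actual code:

```python
from typing import Optional

import requests


def get_request(url: str, timeout: float = 10.0) -> Optional[requests.Response]:
    """Fetch url; return None on a connection failure instead of exiting."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectionError:
        # Report the failure to the caller. The worker thread can then
        # return normally rather than being stopped with sys.exit().
        return None


def scrape_one(url: str) -> Optional[str]:
    """Hypothetical caller: passes the None along to its own caller."""
    response = get_request(url)
    if response is None:
        return None
    return response.text
```

Every function in the call chain only has to check for None and return early, so the thread unwinds through ordinary returns.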

function -> [get_page_content]: returns None if no valid page is found even after the retries, in addition to returning None when get_request itself returns None.
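The retry behaviour could look roughly like the sketch below. It is an illustration under assumptions, not the code in scraper.py: the fetch and is_valid parameters stand in for get_request and check_page_validity, and max_retries and delay_seconds are hypothetical settings.

```python
import time
from typing import Callable, Optional


def get_page_content(
    url: str,
    fetch: Callable[[str], Optional[str]],
    is_valid: Callable[[str], bool],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
) -> Optional[str]:
    """Return a valid page, or None if the request fails or retries run out."""
    for attempt in range(max_retries):
        page = fetch(url)
        if page is None:
            # The request itself failed (for example a ConnectionError).
            return None
        if is_valid(page):
            return page
        # Got a response, but not a usable page: wait, then try again.
        if attempt < max_retries - 1:
            time.sleep(delay_seconds)
    # No valid page after all attempts.
    return None
```

Both failure modes end in the same None, so callers need only one check.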

Decisions 2 and 3 (the None returns from [get_request] and [get_page_content]) were made with multithreading in mind. Multiple threads work simultaneously, and 1 or 2 out of 10 or 20 of them may not get a valid response (see the [check_page_validity] and [get_request] functions for documentation and more). In that case only those threads are terminated safely, while the others carry on and produce valid output.
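To illustrate how a few failed threads can be dropped without affecting the rest, here is a small sketch using the standard library's ThreadPoolExecutor. The scrape_all function and its scrape_page parameter are hypothetical. The project may manage its threads differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def scrape_all(
    urls: List[str],
    scrape_page: Callable[[str], Optional[str]],
    max_workers: int = 10,
) -> Dict[str, str]:
    """Scrape every URL in parallel and keep only the successful results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as urls.
        pages = list(pool.map(scrape_page, urls))
    # A worker that returned None simply finished early; skip its result.
    return {url: page for url, page in zip(urls, pages) if page is not None}
```

Because a failed worker returns None instead of raising or exiting, the pool shuts down cleanly and the caller still gets every page that did load.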

In the future, this would be improved by adding a database to store the results, using multiple proxies to scrape the data, and sending you an email when there's a price drop on an item you like.