I don’t normally talk about personal life, but I have been in the market for a new ski boat, as my family and I are avid skiers and wakeboarders. In that pursuit I found that while Craigslist is an awesome tool for finding a boat, spending hours manually searching the listings is not my cup of tea. So I went looking for a way to automate the process, and Python seemed to fit the bill. While I am certainly not a Python expert, the many examples I found online helped me quickly put together a script that automates my entire search: it crawls Craigslist for listings that match my criteria, stores them in a database, and emails me the results. Below is what I came up with; hopefully it will be as useful to others as it has been for me. To run the script on Debian or Ubuntu, you will need the Python modules installed by the command below.
# Install required python modules
apt-get install python-mysqldb python-feedparser python-mailer
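If you are not on Debian or Ubuntu, the same modules should also be available from PyPI; the package names below are my best understanding of the PyPI equivalents, so double-check them for your setup.

# Equivalent install via pip (package names assumed, verify for your system)
pip install MySQL-python feedparser mailer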
Now that the required prerequisites are installed, let's get the script configured. First, edit the ### Database Initialization ### section with your MySQL or MariaDB server info. Next, edit the ### Search Parameters ### section with the sites, sections, and search terms you are looking for. Lastly, edit the ### Email Parameters ### section so you are not sending me your search results. Note that after running the script the first time, you can safely remove the database creation area annotated by ### Database Creation, remove after first run ###; if you do, keep the USE `clscraper` statement (or add db="clscraper" to the db.connect() call) so the script still selects the right database.
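For example, after editing, the three sections might look something like this; every value below is a placeholder, not a working credential or server.

# Example placeholder values only - substitute your own
import MySQLdb as db
from mailer import Mailer, Message

### Database Initialization ###
con = db.connect(host="localhost", user="clscraper", passwd="secret")

### Search Parameters ###
sections = ["boo"]                      # Craigslist section code used in the script
sites = ["seattle", "portland"]         # Craigslist subdomains to search
terms = ["malibu", "mastercraft"]       # search terms

### Email Parameters ###
message = Message(From="Search Automation <you@example.com>",
                  To="you@example.com")
sender = Mailer('192.0.2.10')           # your SMTP server's IP or hostname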
#!/usr/bin/env python
'''
clscraper-mysql - Craigslist Scraper w/MySQL

Copyright (c) 2014 Matthew Marable
Email: mmarable@oitibs.com
Web:   https://oitibs.com

clscraper-mysql is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option)
any later version.

clscraper-mysql is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with clscraper-mysql. If not, see <http://www.gnu.org/licenses/>.
'''

import sys
import time
import urllib
import datetime
import feedparser
import MySQLdb as db
from mailer import Mailer
from mailer import Message

### Database Initialization ###
con = None
con = db.connect(host="[Host]", user="[Username]", passwd="[Password]")
cur = con.cursor()

### Database Creation, remove after first run ###
try:
    cur.execute("""CREATE DATABASE IF NOT EXISTS `clscraper`""")
    cur.execute("""USE `clscraper`""")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS `LISTINGS` (
            `SAID` INT(11) NOT NULL AUTO_INCREMENT,
            `URL` TEXT NOT NULL,
            `TITLE` TEXT NOT NULL,
            `MODIFIED` TEXT NOT NULL,
            `NEW` INT(1) NOT NULL DEFAULT '0',
            PRIMARY KEY (`SAID`))""")
except db.Error, e:
    print "Error %s:" % e.args[0]
    sys.exit(1)
### End Database Creation, remove after first run ###

### Search Parameters ###
sections = ["boo"]
sites = ["smd", "annapolis", "baltimore", "washingtondc",
         "frederick", "fredericksburg", "westmd", "delaware",
         "easternshore", "martinsburg"]
terms = ["malibu", "moomba", "nautique", "supra", "mastercraft",
         "centurion", "american skier", "tige", "brendella", "axis",
         "ski supreme"]

### Craigslist RSS Search URL ###
rss_generic_link = "http://%s.craigslist.org/search/%s?query=%s"

### Generate the RSS links ###
rss_links = []
modified = datetime.datetime.now()
html_text = ""
count = 0

for section in sections:
    for site in sites:
        for term in terms:
            term = urllib.quote(term)
            rss_link = rss_generic_link % (site, section, term)
            rss_link = rss_link + "&format=rss"
            rss_links.append(rss_link)

### Parse each feed and record listings we have not seen before ###
for rss_link in rss_links:
    feed = feedparser.parse(rss_link)
    for entry in feed.entries:
        title = entry["title"]
        url = entry["link"]
        cur.execute("""
            SELECT MODIFIED FROM LISTINGS
            WHERE TITLE = %s""", (title,))
        if cur.fetchone() == None:
            count = count + 1
            html_text = html_text + "<a href='" + url + "' >" + title + "</a><br><br>"
            cur.execute("""
                INSERT INTO LISTINGS (URL, TITLE, MODIFIED, NEW)
                VALUES (%s, %s, %s, %s)""", (url, title, modified, 1,))

con.commit()
con.close()

### Email the new listings, if any ###
if count > 0:
    ### Email Parameters ###
    message = Message(From="Search Automation <tech@oitibs.com>",
                      To="mmarable@oitibs.com")
    message.Subject = "New Listings (" + str(count) + ")"
    message.Html = html_text.encode('utf-8')
    sender = Mailer('[SMTP Server IP]')
    sender.send(message)
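If you want to sanity-check the feed side before touching the database or email pieces, a minimal sketch like the one below prints the first few entries of a single RSS feed; the site, section, and search term are only examples.

# Quick standalone feed check - site, section, and term are examples only
import urllib
import feedparser

rss_link = "http://baltimore.craigslist.org/search/boo?query=%s&format=rss" % urllib.quote("malibu")
feed = feedparser.parse(rss_link)
for entry in feed.entries[:5]:
    print entry["title"], "-", entry["link"]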
Once you have edited the script to your liking, you can schedule the script to run at intervals using your crontab.
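For example, a crontab entry like the one below would run the search every 30 minutes; the script path is just a placeholder, so adjust it to wherever you saved the file.

# m   h  dom mon dow  command
*/30  *  *   *   *    /usr/bin/python /usr/local/bin/clscraper-mysql.py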