I need a web scraper for the site [login to view URL]. I want all of its data every day, as tab-delimited text with the field names in the first record. There are 12 fields for each property, and about 2 million properties total.
The first step is to enter a zip code (try "92705") and click "GO".
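As a rough illustration of the output format, here is a minimal Python sketch that writes a header record followed by one tab-delimited line per property. The column spellings in `FIELDS` and the function name `write_records` are assumptions made for this example, and the sample record is made up.

```python
import csv
import io

# Assumed column names for the 12 fields requested in this posting;
# the exact header spellings are illustrative, not prescribed.
FIELDS = ["Address", "City", "St", "Zip", "MLS ID#", "Price",
          "BR", "BA", "Sqft", "Property type", "Lot Size", "Garages"]


def write_records(out, records):
    """Write the field names first, then one tab-delimited line per property."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for rec in records:
        # Fields missing from a record are written as empty strings (restval).
        writer.writerow(rec)


# Made-up sample record, for demonstration only.
demo = io.StringIO()
write_records(demo, [{"Address": "1 Test Way", "Zip": "92705"}])
print(demo.getvalue())
```

Writing missing fields as empty strings keeps every line at exactly 12 columns, which makes the files easy to load into a spreadsheet or database.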
Next, click "Show Properties". Then, for each property in the list, grab the following fields: Address, City, St, Zip, MLS ID#, Price, BR, BA, Sqft, and Property type (the first words in the description under the picture: e.g. Rental, Single Family Property, Multi-Family Property, Farm, etc.).
Next, I want to know the Lot Size (in sqft, e.g. 4000), and the Garages (e.g. "2", for a two-car garage). These are sometimes under the picture, and sometimes you have to click "view details". Other times they are missing.
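The fallback described above (look under the picture first, then on the details page, otherwise leave the field blank) could be sketched as follows. The regular expressions assume wording such as "4,000 sqft", "Garages: 2", or "2 car garage"; the site's actual wording is not known here, so the patterns and the helper names `parse_lot_size`, `parse_garages`, and `lookup` are hypothetical.

```python
import re


def parse_lot_size(text):
    """Return the lot size in square feet as an int, or None if not found."""
    if not text:
        return None
    # Assumed wording: a number followed by "sqft", "sq ft", or "sq. ft".
    m = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*sq\.?\s*ft", text, re.IGNORECASE)
    if not m:
        return None
    return int(float(m.group(1).replace(",", "")))


def parse_garages(text):
    """Return the garage count as an int, or None if not found."""
    if not text:
        return None
    # Assumed wordings: "Garages: 2" or "2 car garage" / "2-car garage".
    for pattern in (r"garages?\s*:?\s*(\d+)", r"(\d+)[\s-]*car\s+garage"):
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None


def lookup(parse, summary_text, fetch_details):
    """Try the summary text first; call fetch_details() only when needed."""
    value = parse(summary_text)
    if value is None:
        value = parse(fetch_details())
    return value
```

Passing `fetch_details` as a callable means the extra "view details" request is made only when the summary lacks the field, which reduces the number of hits on the site.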
The program should read a file of zip codes (or allow me to input and reuse a list) and process them in the order I specify, like: 92705,94608,...
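A minimal sketch of reading the zip code list while preserving the given order might look like this. It assumes 5-digit zip codes separated by commas and/or whitespace; `parse_zip_list` and `load_zip_file` are illustrative names.

```python
import re


def parse_zip_list(text):
    """Return the zip codes in the order given; reject malformed entries."""
    ordered = []
    for token in re.split(r"[,\s]+", text):
        if not token:
            continue  # skip empty pieces from leading/trailing separators
        if not re.fullmatch(r"\d{5}", token):
            raise ValueError(f"not a 5-digit zip code: {token!r}")
        ordered.append(token)
    return ordered


def load_zip_file(path):
    """Read a zip code list from a text file."""
    with open(path, encoding="ascii") as fh:
        return parse_zip_list(fh.read())


print(parse_zip_list("92705,94608"))
```

Raising an error on a malformed entry, rather than silently skipping it, is a design choice for this sketch so that a typo in the list does not quietly drop a zip code from the daily run.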
To complete the scrape in 4 hours, I want to run about 6 instances at once on different zip code lists, to create 6 output files. I want to be able to specify a delay between hits to slow it down, in case the webmaster gets mad.
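For a sense of scale: 2,000,000 properties in 4 hours is about 139 properties per second overall, or roughly 23 per second for each of 6 instances. How many hits that requires depends on how many properties each page returns, which is not known here. The sketch below shows that arithmetic, one way to split a zip code list into 6 ordered sub-lists, and a simple configurable delay. All names (`required_rate`, `split_evenly`, `Throttle`) are hypothetical.

```python
import time


def required_rate(total_properties, hours, instances):
    """Properties per second each instance must average to finish in time."""
    return total_properties / (hours * 3600) / instances


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


class Throttle:
    """Enforce a minimum delay between consecutive hits."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self._delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Call before each request; sleeps only for the time still owed."""
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


print(round(required_rate(2_000_000, 4, 6), 1))
```

Contiguous chunks keep each instance's zip codes in the order originally specified. Note that a longer delay trades directly against the 4-hour target, so the two settings would need to be tuned together.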
Finally, I want it to talk HTTP to a socket and run in the background, NOT piggyback on Internet Explorer.
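To illustrate what talking HTTP over a socket without a browser can look like, here is a minimal Python sketch using only the standard `socket` module. The host and path passed in are placeholders (the real site URL is not shown in this posting), and the default `User-Agent` string is made up.

```python
import socket


def build_get_request(host, path, user_agent="zip-scraper-sketch/0.1"):
    """Build the bytes of a plain HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: text/html",
        "Connection: close",  # ask the server to close after one response
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80, timeout=30.0):
    """Send one GET over a TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

The bytes returned by `fetch` include the status line and headers, and a real scraper would also have to handle chunked transfer encoding, redirects, and possibly cookies. Python's standard `http.client` module covers most of that and likewise runs without any browser.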
The first phase of this project ($100) is to get something that does what I specified above. The second phase ($100) is to tweak it with you so that it meets my performance and usability requirements. And then the third phase ($100) is a turnkey installation process that takes less than 3 minutes to install and configure on a new machine, and gives me the ability to run as many copies as I want simultaneously without any unexpected problems.
This is the first of a long series of similar projects. If you do a great job on this and I like the program, I would like to do a whole bunch of them.
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Windows XP