Crawler for a real estate web site ([login to view URL]). There are about 200,000 properties which should be downloaded. They are all spread across about 20 specific URLs, each representing a different category of real estate. Each results page contains 15 properties, and the results pages have the same layout for all categories.
For each property the following data should be stored:
- address (street + number, zipcode, place)
- price
- space
- nr rooms
- category
Parsed data should be stored in a csv file.
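A minimal sketch of the CSV output for the fields listed above, assuming Python and its standard library. The column names (`street`, `number`, `zipcode`, `place`, `price`, `space`, `nr_rooms`, `category`) are an assumption derived from the field list, not names taken from the site:

```python
import csv

# Hypothetical column names, derived from the required field list above.
FIELDS = ["street", "number", "zipcode", "place",
          "price", "space", "nr_rooms", "category"]

def write_properties(path, properties):
    """Write a list of property dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(properties)

# Example row with made-up values, only to show the expected shape.
write_properties("properties.csv", [
    {"street": "Main Street", "number": "12", "zipcode": "1234 AB",
     "place": "Amsterdam", "price": "250000", "space": "85",
     "nr_rooms": "3", "category": "apartment"},
])
```

`csv.DictWriter` keeps the column order fixed even if the parser produces the fields in a different order per property.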
## Deliverables
Some examples of specific url's:
<[login to view URL]>
<[login to view URL]>
<[login to view URL]>
<[login to view URL]>
As mentioned before, there are about 200,000 properties on this site, with at most 15 properties per results page. This means the total number of pages is more than 13,300 (200,000 / 15 ≈ 13,334).
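The page count and time estimate can be checked with a quick calculation (assuming one page downloaded every 2 seconds, as stated below):

```python
total_properties = 200_000
per_page = 15

# Ceiling division: the last page may hold fewer than 15 properties.
pages = -(-total_properties // per_page)   # → 13334

seconds_sequential = pages * 2             # one page every 2 seconds → 26668 s
hours = seconds_sequential / 3600          # ≈ 7.4 hours, i.e. about half a day
```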
Downloading the pages one by one would take about half a day (if a page is downloaded every 2 seconds). Look at the possibility of downloading more than one page at a time, and if possible make this adjustable by the user (an input box where the number of pages to be downloaded simultaneously can be filled in).
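A minimal sketch of the concurrent download with a user-adjustable degree of parallelism, assuming Python's standard library. The `workers` parameter stands in for the value from the requested input box, and `fetch_fn` is injectable only so the pool logic can be exercised without network access; both names are assumptions, not part of the spec:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    """Download a single results page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def fetch_pages(urls, workers, fetch_fn=fetch):
    """Download results pages concurrently.

    `workers` is the user-supplied number of simultaneous downloads
    (the value from the input box the spec asks for). Results come
    back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_fn, urls))
```

With `workers=10` the rough half-day of sequential downloading would shrink toward an hour or less, though the site's rate limits (and basic politeness) should cap how high the user sets it.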