We want to scrape hotel data from this site.
Please take a look at these pages:
[login to view URL]
&
[login to view URL]
&
[login to view URL]
&
[login to view URL]
&
[login to view URL]
These are 4 different formats I have encountered so far.
The first one only shows the hotel name:
<div class="BEHeader">Zekes Test Hotel - Check Availability</div>
Store "Zekes Test Hotel" and the variable Hotel=1
The second one shows much more information:
<title>The Colonnade Hotel - Reservations</title>
<p class="footer"><strong>120 Huntington Avenue · Boston · 02116 · Phone (617) 424-7000 · Guest Fax (617) 424-1717 · Reservations Fax (617) 425-3222</strong></p>
<a href="[login to view URL]">Home</a>
Store the variable Hotel=2 as well as the other information shown here (name, address, phone and fax numbers, and the Home link). You can strip out the HTML, as I don't need it.
The third case shows the hotel name in the title:
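A minimal sketch of how the footer line could be split into fields. It is written in Python purely for illustration (the deliverable is still Perl), and it assumes the footer always sits in a `<p class="footer">` element with fields separated by "·", as in the sample above. The function name `parse_footer` is hypothetical.

```python
import re
from html import unescape

def parse_footer(html):
    """Return the '·'-separated footer fields with all HTML removed."""
    # Assumption: the footer is a single <p class="footer">...</p> element.
    match = re.search(r'<p class="footer">(.*?)</p>', html, re.S)
    if not match:
        return []
    text = re.sub(r"<[^>]+>", "", match.group(1))  # strip inner tags such as <strong>
    text = unescape(text)                          # turn entities like &middot; into characters
    return [part.strip() for part in text.split("·") if part.strip()]

sample = (
    '<p class="footer"><strong>120 Huntington Avenue · Boston · 02116 · '
    'Phone (617) 424-7000 · Guest Fax (617) 424-1717 · '
    'Reservations Fax (617) 425-3222</strong></p>'
)
fields = parse_footer(sample)
```

Each element of `fields` can then be stored as its own column, or joined back into one details column.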
<title>Luxury Boston Hotels - Boston Harbor Hotel - Reservations</title>
But it also has the keyword "inactivehotel" on the page. I need that info and the Hotel=3 variable.
The 4th case shows nothing.
Just store the Hotel=4 variable and N/A for Hotel name.
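The four cases above could be handled by one routine along these lines. This is a sketch in Python for illustration only (the deliverable is still Perl). The function name `extract_hotel` is hypothetical, and taking the last " - " segment of the title as the hotel name is an assumption based on the third sample.

```python
import re

def extract_hotel(html):
    """Return (name, inactive) for one page; name is 'N/A' when none is found."""
    # Case 3 marker: the keyword "inactivehotel" anywhere on the page.
    inactive = "inactivehotel" in html.lower()

    # Case 1: name in the BEHeader div, followed by " - Check Availability".
    m = re.search(r'<div class="BEHeader">(.*?)</div>', html, re.S)
    if m:
        name = re.sub(r"\s*-\s*Check Availability\s*$", "", m.group(1).strip())
        return name or "N/A", inactive

    # Cases 2 and 3: name in the <title>, followed by " - Reservations".
    m = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
    if m:
        title = re.sub(r"\s*-\s*Reservations\s*$", "", m.group(1).strip())
        # Assumption: any prefix such as "Luxury Boston Hotels - " is not part of the name.
        name = title.split(" - ")[-1].strip()
        return name or "N/A", inactive

    # Case 4: nothing usable on the page.
    return "N/A", inactive
```

For example, `extract_hotel('<div class="BEHeader">Zekes Test Hotel - Check Availability</div>')` yields the name "Zekes Test Hotel" with the inactive flag unset.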
For the 5th case:
[login to view URL]
You need to find the link: [login to view URL]. Maybe, with all your experience, you can find a consistent way to do this.
Store the link along with the name, which is usually found in the title: <title>Historic Hotels of America - Reservations</title>
Always store the Hotel=70 variable with each record.
I am not certain how many exceptions there are. I have checked quite a few of these pages and I think I have covered most of the formats.
There don't seem to be any hotels past Hotel=29000, so we can stop there.
Please retrieve the data and deliver it as a pipe-separated (|) text file, along with the Perl code you used.
Also, please let me know if you have any questions regarding this project. Thanks.
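As a rough illustration of the output format, here is a sketch in Python (the deliverable is still Perl). The column layout and the names `write_records` and `MAX_HOTEL_ID` are assumptions; the brief only fixes the "|" separator and the stopping point of 29000.

```python
import csv
import io

MAX_HOTEL_ID = 29000                      # stopping point given in the brief
hotel_ids = range(1, MAX_HOTEL_ID + 1)    # Hotel=1 .. Hotel=29000

def write_records(records, out):
    """Write one pipe-separated line per record to the file-like object `out`."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    # Hypothetical column layout; adjust to whatever fields are finally agreed on.
    writer.writerow(["Hotel", "Name", "Inactive", "Details"])
    for rec in records:
        writer.writerow([rec["hotel"], rec["name"], rec["inactive"], rec["details"]])

# Two hand-written records standing in for scraped pages.
records = [
    {"hotel": 1, "name": "Zekes Test Hotel", "inactive": "", "details": ""},
    {"hotel": 4, "name": "N/A", "inactive": "", "details": ""},
]
buf = io.StringIO()
write_records(records, buf)
output = buf.getvalue()
```

Using a CSV writer with "|" as the delimiter means any field that itself contains a "|" gets quoted instead of silently breaking the column count.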