I am looking for help finding and setting up a fully hosted service (similar to [login to view URL], [login to view URL]) that will scrape data from a website and save it into CSV files or a database. I should be able to access these CSV files or the database when I log into the hosted service.
****What do I need?****
I need to save data (already available in an API-generated format) from 10 links:
[login to view URL]
The remaining 9 links are for these currency pairs:
curr1= curr2=
USD PLN
GBP PLN
CHF PLN
EUR USD
GBP EUR
EUR CHF
GBP USD
USD CHF
GBP CHF
for example:
[login to view URL]
etc...
which comes in the following format:
{"transactions":[{"rate":"4.2205","amount":"4.73","date":"1440884371","dateText":"4 minuty temu"},{"rate":"4.2205","amount":"195.25","date":"1440883443","dateText":"20 minut temu"},...,{"rate":"4.2202","amount":"120.00","date":"1440879931","dateText":"1 godzina temu"} ]}
From this I only need to extract 3 columns x 10 rows of data: rate, amount, and date, i.e.
4.2205, 4.73, 1440884371
4.2205, 195.25, 1440883443
(....)
4.2202, 120.00, 1440879931
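As a sketch of the extraction step (Python here purely for illustration, assuming the payload matches the sample above):

```python
import json

# Shortened version of the sample API response from the posting
sample = ('{"transactions":['
          '{"rate":"4.2205","amount":"4.73","date":"1440884371","dateText":"4 minuty temu"},'
          '{"rate":"4.2202","amount":"120.00","date":"1440879931","dateText":"1 godzina temu"}]}')

# Keep only the three wanted columns: rate, amount, date
rows = [(t["rate"], t["amount"], t["date"])
        for t in json.loads(sample)["transactions"]]
print(rows)  # [('4.2205', '4.73', '1440884371'), ('4.2202', '120.00', '1440879931')]
```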
I need to save this data into 10 separate CSV files (EUR_PLN, USD_PLN, etc.) or a database.
Since the files can get large, the file name can roll over monthly, e.g. EUR_PLN_201508, EUR_PLN_201509.
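A minimal sketch of that naming scheme (the `_YYYYMM` suffix is my reading of the EUR_PLN_201508 examples):

```python
from datetime import datetime, timezone

def monthly_filename(curr1, curr2, now=None):
    """Build an EUR_PLN_201508-style file name: pair plus year and month."""
    now = now or datetime.now(timezone.utc)
    return f"{curr1}_{curr2}_{now:%Y%m}.csv"

print(monthly_filename("EUR", "PLN", datetime(2015, 8, 1)))  # EUR_PLN_201508.csv
```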
These CSV files (or the data from a DB) should be available for me to download from the hosted service.
****How often does the data need to be scraped?****
All 10 links should be scraped and the data saved to the CSV files EVERY MINUTE.
****Does the data need to be processed?****
Ideally yes. Please note that the 3rd column is a timestamp, and the data between scraping instances might not change, or only the newest row(s) may have been added.
Hence I only need unique entries to be saved into the CSV, appended at the bottom of the file.
For example, let's assume there are only 3 columns x 3 rows of data:
4.2205, 4.73, 1440884371
4.2205, 195.25, 1440883443
4.2202, 120.00, 1440879931
At time t this data should be saved into the CSV in REVERSED order:
4.2202, 120.00, 1440879931
4.2205, 195.25, 1440883443
4.2205, 4.73, 1440884371
Now at time t + 1 MINUTE there are only 2 rows of new data (note that the last 2 rows from before are no longer visible, as only the 3 most recent rows are shown):
4.2211, 4.81, 1440884673
4.2210, 4.80, 1440884572
4.2205, 4.73, 1440884371
As the only 2 new rows of data are 4.2211, 4.81, 1440884673 and 4.2210, 4.80, 1440884572, only those 2 should be appended to the CSV file - again in REVERSE order.
After 2 instances of scraping, the following 5 rows should be saved in the file:
4.2202, 120.00, 1440879931
4.2205, 195.25, 1440883443
4.2205, 4.73, 1440884371
4.2210, 4.80, 1440884572
4.2211, 4.81, 1440884673
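The dedup-and-append step above could be sketched like this (Python for illustration; it assumes, per the description, that the timestamp column uniquely identifies a transaction):

```python
import csv
import os

def append_new_rows(csv_path, scraped_rows):
    """Append only rows whose timestamp is not yet in the file.

    scraped_rows come newest-first (as in the API response); they are
    written oldest-first, matching the REVERSED order in the posting.
    """
    seen = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            for row in csv.reader(f):
                seen.add(row[2])  # 3rd column is the timestamp
    new = [r for r in scraped_rows if r[2] not in seen]
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerows(reversed(new))  # oldest first
```

Running it on the two example scrapes reproduces the 5-row file shown above.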
The same process should be applied to the remaining 9 links.
****Is the data processing described above required or optional?****
If the data cannot be processed as described above, I am also OK with all the data being saved into the CSV file (3 columns x 10 rows) every minute, without distinguishing new data.
****Which solutions are acceptable?****
I am looking for a FREE hosted service (probably [login to view URL] or [login to view URL] will work) that can do the above: scrape the 10 links EVERY MINUTE and save the data to CSV files that I can download myself.
Google Docs doesn't refresh data correctly every minute - so it is NOT acceptable.
A small VBScript or JavaScript program that I can run on my Windows server every minute via Task Scheduler is also acceptable (the code has to be open source, i.e. not compiled).
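A once-per-run script along these lines could be scheduled every minute via Task Scheduler (Python used here only as a sketch; the actual API URL is redacted in the posting, so the template below is a placeholder that must be filled in):

```python
import csv
import json
from datetime import datetime, timezone
from urllib.request import urlopen

# PLACEHOLDER - the real endpoint is hidden behind [login to view URL]
API_URL = "https://example.invalid/api?curr1={0}&curr2={1}"

PAIRS = [("EUR", "PLN"), ("USD", "PLN"), ("GBP", "PLN"), ("CHF", "PLN"),
         ("EUR", "USD"), ("GBP", "EUR"), ("EUR", "CHF"), ("GBP", "USD"),
         ("USD", "CHF"), ("GBP", "CHF")]

def parse_rows(payload):
    """Extract (rate, amount, date) tuples from the API JSON, newest first."""
    return [(t["rate"], t["amount"], t["date"])
            for t in json.loads(payload)["transactions"]]

def run_once():
    """One scrape of all 10 pairs; Task Scheduler calls this every minute."""
    for a, b in PAIRS:
        payload = urlopen(API_URL.format(a, b)).read().decode("utf-8")
        fname = f"{a}_{b}_{datetime.now(timezone.utc):%Y%m}.csv"  # monthly file
        with open(fname, "a", newline="") as f:
            csv.writer(f).writerows(reversed(parse_rows(payload)))  # oldest first
```

This version appends all 10 rows each run (the simpler fallback option above); the dedup step would slot in before the write.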
****What am I expecting?****
In the case of a hosted service, I am looking for step-by-step instructions on how to configure it.
In the case of a script, I am looking for the script itself to be sent to me.