Webcrawler / Spider - Data Extraction
We need a webcrawler / spider that can collect the technical specifications of a particular product.
•In essence, we want to input the name and/or model number of a product, and the spider should extract its technical specifications from multiple websites (10-20). You may want to query Google first for the top 10-20 results and then crawl those sites. The number of products could range from 100 to several thousand at a time, and we should be able to upload the list as a CSV or similar file.
•The next step in the process is some level of “fuzzy logic” that compares the specification names/fields across the different results, identifies a tolerable level of similarity between them, and uses the result as the field label for that particular feature/specification. For example, certain key technical specifications are almost always mentioned for a given type of product, such as megapixels for digital cameras.
•The next step is to apply similar fuzzy logic to the actual specification values, as webmasters often don’t post data accurately or completely and leave some specs out.
•All the data should then be stored in a database that is searchable. The data should be presented in a tabular format.
•Where possible, the application should supply a URL to the PDFs containing the technical specifications and/or user manuals of the product. The source URLs of the data should be included as well.
•Our preference is for a web-based solution using open source technologies such as PHP and MySQL. The application must be secure and scalable.
•We will require a web-based front end to display the results to users, so integration into a CMS such as WordPress or Joomla would be preferable.
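As a rough illustration of the two fuzzy-matching steps described above (grouping similar field labels, then settling on a value when sites disagree or leave specs out), here is a minimal sketch. It is written in Python purely for illustration; the function names, the 0.8 similarity threshold, and the use of the standard library's difflib.SequenceMatcher as the similarity measure are all assumptions, not requirements of this project.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and keep only letters, digits and single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in label.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two labels after normalizing."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_labels(labels, threshold=0.8):
    """Greedily cluster labels: a label joins the first group whose first
    member is at least `threshold` similar, otherwise it starts a new group.
    The threshold is an assumed "tolerable level of similarity"."""
    groups = []
    for label in labels:
        for group in groups:
            if similarity(label, group[0]) >= threshold:
                group.append(label)
                break
        else:
            groups.append([label])
    return groups


def canonical_label(group):
    """Use the most frequent normalized spelling in a group as the field label."""
    return Counter(normalize(item) for item in group).most_common(1)[0][0]


def consensus_value(values):
    """Return the most common non-empty value and the share of sites that agree.
    Values are only lowercased and whitespace-collapsed, so "16 MP" == "16 mp"."""
    cleaned = [" ".join(v.lower().split()) for v in values if v and v.strip()]
    if not cleaned:
        return None, 0.0
    value, count = Counter(cleaned).most_common(1)[0]
    return value, count / len(cleaned)


if __name__ == "__main__":
    # Hypothetical labels and values as they might be scraped from several sites.
    scraped_labels = ["Megapixels", "Mega Pixels", "megapixels", "Weight", "Optical Zoom"]
    for group in group_labels(scraped_labels):
        print(canonical_label(group), "<-", group)
    print(consensus_value(["16 MP", "16 mp", "", "12 MP"]))
```

A low agreement share from consensus_value could be used to flag a specification for manual review instead of storing it automatically. A production version would likely need unit-aware value comparison and a stronger similarity measure.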
We have many ideas about the logical flow for achieving the above, as well as the bigger picture for the entire project; however, these will be shared only with those shortlisted as potential suppliers. The code must belong to us and you must be prepared to sign an NDA.
This is the initial project, and based on its success there will be ongoing enhancements and features required. Please make sure to read the above properly and send through any questions you have as well as constructive responses.
Hi,
We have designed and built websites for various types of businesses. We work with each client individually to coordinate and keep track of requirements and scope, so that the finished website matches your vision and delivers the desired output. We offer reliable, long-term solutions that meet our clients' exact requirements. Our integrated team of web professionals and creative designers will get your website done on time. We can provide references as evidence of our quality. We look forward to hearing from you.
Best Regards
phpMaestro