I need a data extraction bot. A good part of the functionality I need already exists in the open source software irobotsoft. A customization of it, a more appropriate script, or one built from scratch would be welcome. The bot should be able to do the following:
1. Extract all applicable fields and entries given a list of search keywords
2. Extract all entries plus sub-entries given a list of links
3. Save content in a local MySQL database and a CSV file
4. Remove duplicates from the final results
5. Identify items that are missing when extracting by keyword search compared with extracting from the given links
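To make requirements 3 to 5 more concrete, here is a minimal Python sketch of how I picture the post-processing. It is only an illustration under my own assumptions: the function names, the `url` and `title` fields, and the sample records are all hypothetical, and it assumes each extracted entry arrives as a dictionary. The MySQL step is left out because it depends on the driver chosen; only the CSV output is shown.

```python
import csv
import io


def record_key(record, key_fields):
    """Build a normalized comparison key from the chosen fields."""
    return tuple((record.get(f) or "").strip().lower() for f in key_fields)


def dedupe(records, key_fields):
    """Keep the first record seen for each key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record, key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def missing_from(reference, candidate, key_fields):
    """Return records in `reference` whose key does not occur in `candidate`."""
    candidate_keys = {record_key(r, key_fields) for r in candidate}
    return [r for r in reference if record_key(r, key_fields) not in candidate_keys]


def to_csv_text(records, columns):
    """Serialize records to CSV text, ignoring fields outside `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data: results of the two extraction modes.
by_keyword = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/a ", "title": "A"},  # duplicate, stray space
    {"url": "https://example.com/b", "title": "B"},
]
by_link = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
    {"url": "https://example.com/c", "title": "C"},
]

unique_keyword = dedupe(by_keyword, ["url"])
not_found_by_keyword = missing_from(by_link, unique_keyword, ["url"])
csv_text = to_csv_text(unique_keyword, ["url", "title"])
```

Which fields identify a duplicate (here just `url`) would need to be configurable per website.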
Fields to be scraped can be selected by auto-record and adjusted manually to improve accuracy. The fields to be scraped will differ per website. During field selection, the user will indicate which column in the dataset each field belongs to. The keywords to be used for the search will be provided as a list.
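As a rough illustration of the field-to-column assignment, the mapping could be stored per website along these lines. This is a sketch only: the site names, the selectors, and the `name` and `price` columns are made up for the example, and a real tool may record the selected fields in a different form.

```python
# Hypothetical per-site mapping: recorded field selector -> dataset column.
FIELD_MAP = {
    "example.com": {"h1.title": "name", "span.price": "price"},
    "example.org": {"div.product-name": "name", "td.cost": "price"},
}


def to_dataset_row(site, raw_fields):
    """Rename the fields scraped from `site` to the shared dataset columns.

    Fields that were not found on the page become empty strings.
    """
    mapping = FIELD_MAP[site]
    return {column: raw_fields.get(selector, "") for selector, column in mapping.items()}


row = to_dataset_row("example.org", {"div.product-name": "Widget", "td.cost": "9.99"})
```

With a mapping like this, entries from different websites end up in the same columns even though the scraped fields differ per site.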
A good part of what I need exists in irobotsoft or other extraction tools. I need someone who knows these tools, or who can build one, to customize it for my use. Please let me know if you have any questions or suggestions, or if you need further information.