Develop algorithm to remove repeating text

€30-250 EUR

Closed

Posted

more than 7 years ago

Paid on delivery

Every website has text that repeats on each page: for example, the header, the footer and perhaps a sidebar. Usually the important text on a page is the text that is unique to it. For example, if you look at these two websites, you can see there is duplicate text on both pages (mostly at the top and bottom) that is not important: [login to view URL] [login to view URL] The important text is mostly the unique job description. I need you to develop an algorithm in Python (you can use a library; it doesn't need to be original code) that can detect duplicate text. So, if we merged the HTML from the two links above into a single document, your code would remove the header and footer (and perhaps some other text) because that text is duplicated within the document. If you have any questions, just ask. I am not interested in a WordPress website. Thanks.
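
The duplicate-removal idea described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the client's or any bidder's implementation: it assumes a "text block" is each run of visible text between HTML tags, and that a block is boilerplate when it appears verbatim on both pages. The names `text_blocks` and `unique_blocks` are hypothetical, and the HTML snippets in the usage example are invented because the original links are not shown.

```python
import re
from html.parser import HTMLParser


class TextBlockExtractor(HTMLParser):
    """Collect visible text chunks from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace so indentation differences don't prevent a match.
        text = re.sub(r"\s+", " ", data).strip()
        if text and not self._skip_depth:
            self.blocks.append(text)


def text_blocks(html):
    """Return the visible text blocks of an HTML document, in order."""
    parser = TextBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def unique_blocks(html_a, html_b):
    """Return, for each page, the text blocks that do not appear on the other page."""
    blocks_a = text_blocks(html_a)
    blocks_b = text_blocks(html_b)
    shared = set(blocks_a) & set(blocks_b)  # header, footer, sidebar, ...
    return ([b for b in blocks_a if b not in shared],
            [b for b in blocks_b if b not in shared])


if __name__ == "__main__":
    page_a = "<header>Acme Jobs</header><p>Senior Python developer</p><footer>Contact us</footer>"
    page_b = "<header>Acme Jobs</header><p>Junior data analyst</p><footer>Contact us</footer>"
    print(unique_blocks(page_a, page_b))
    # (['Senior Python developer'], ['Junior data analyst'])
```

Exact matching is the simplest possible rule; it will miss boilerplate that differs slightly between pages (a date, a counter), which is where fuzzy or multi-page approaches come in.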

Python

Web Scraping

Project ID: 12155198

About the project

20 proposals

Remote project

Active 7 years ago

20 freelancers are bidding an average of €145 EUR for this job

@DanielVizcaya91

Hello there, my name is Daniel and I would love to help you out with this project. I have a lot of experience parsing text to extract useful information, and I think that can be applied here to identify duplicate text. I also have experience with Python, so that will be no problem. Finally, as you can see on my profile, I am a very reliable freelancer with excellent reviews on both service and work quality, so I really hope to hear back from you soon.

€198 EUR in 5 days

5.0

(111 reviews)

8.7

@lkhelladi

Hello, I'd be glad to implement the desired Python tool for you. Looking forward to chatting with you soon about the details. Best regards,

€94 EUR in 2 days

5.0

(205 reviews)

8.0

@flashsaiful

Hi, I can do this for you. Please send a message via the PMB for details. Best regards, flashsaiful

€155 EUR in 3 days

4.8

(134 reviews)

6.7

@some235one

Hi, I can do this using Python. I have done something similar for Wikipedia. The exact solution will depend on how many pages you need.

€277 EUR in 3 days

5.0

(14 reviews)

5.2
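
The bid above from @some235one notes that the solution depends on how many pages are available. With more than two pages, one common generalization (assumed here, not described in the bid) is to count how many pages each text block occurs on and drop blocks that occur on a large share of them. A minimal sketch, assuming each page has already been split into a list of text blocks; `remove_boilerplate` and `min_share` are hypothetical names.

```python
import math
from collections import Counter


def remove_boilerplate(pages, min_share=0.5):
    """Drop blocks that occur on at least `min_share` of the pages (and on 2+ pages).

    `pages` is a list of pages, each a list of text blocks (hypothetical input
    format, e.g. the output of an HTML text extractor).
    """
    # Document frequency: count each block once per page it appears on,
    # so a block repeated within one page is not mistaken for boilerplate.
    doc_freq = Counter()
    for blocks in pages:
        doc_freq.update(set(blocks))
    # A block that appears on only one page is never treated as boilerplate.
    min_pages = max(2, math.ceil(min_share * len(pages)))
    return [[b for b in blocks if doc_freq[b] < min_pages] for blocks in pages]


if __name__ == "__main__":
    pages = [
        ["Acme Jobs", "Python developer", "Contact us"],
        ["Acme Jobs", "Data analyst", "Contact us"],
        ["Acme Jobs", "QA engineer", "Privacy policy"],
    ]
    print(remove_boilerplate(pages))
    # [['Python developer'], ['Data analyst'], ['QA engineer', 'Privacy policy']]
```

With exactly two pages this reduces to "drop what both pages share"; with many pages, `min_share` controls how widespread a block must be before it counts as site chrome.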

@adilhussain0411

Hello! My name is Mehnaz Bashir, and I am writing in response to your project. After carefully reviewing the required experience and skills, I feel that I am a suitable match for the job. I have good experience with Python, Django, NumPy, Pandas, Selenium and JSON, and I am ready to perform your task quickly and efficiently. I am an expert with 10 years of experience in the IT sector, including extensive experience in Python, MATLAB, C++ and C, and I am ready to start on your tasks immediately. My main areas of expertise are: Python web scraping; C++ and C; C#; MATLAB; JSON; Java; programming; bot development; SQL/MySQL. I have scraped more than 200 different websites using different tools, mostly Python scripts (Selenium WebDriver, regex...), with or without JavaScript, including extracting data from maps, and I have delivered the data in different output formats (CSV, XLS, MySQL database, JSON...). I have also created various applications using Python. I hope to be invited for an interview soon. I am confident that I can offer you the quality of service you have been searching for. I look forward to your response; thank you for your time! Kind regards, Mehnaz Bashir

€30 EUR in 3 days

4.7

(18 reviews)

5.3

@cracken

Hi, I am well suited to this kind of task and can take good care of this project. In fact, I have done work related to this before. We can use regex and the difflib module to compare the two pages. Let me know when suits you best so we can discuss your requirements further and move on to the next step. Thanks, Joseph C Ocero

€249 EUR in 5 days

4.8

(23 reviews)

4.9
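
The regex-plus-difflib comparison mentioned in @cracken's bid above can be sketched as follows. This is an illustration under assumptions, not the bidder's code: each page is assumed to be reduced to a list of text lines already, `re` normalizes whitespace, and `difflib.SequenceMatcher.get_matching_blocks()` finds runs of consecutive lines that appear in the same order on both pages. The function names and the `min_run` parameter are hypothetical; requiring a run of at least two lines is one way to avoid deleting a single short line that matches by chance.

```python
import difflib
import re


def normalize(line):
    """Collapse whitespace runs so formatting differences don't hide a match."""
    return re.sub(r"\s+", " ", line).strip()


def strip_common_runs(lines_a, lines_b, min_run=2):
    """Drop runs of >= min_run consecutive lines that occur, in the same order, on both pages."""
    a = [n for n in map(normalize, lines_a) if n]
    b = [n for n in map(normalize, lines_b) if n]
    # autojunk=False: don't let difflib treat frequent lines as junk on long pages.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    drop_a, drop_b = set(), set()
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            drop_a.update(range(block.a, block.a + block.size))
            drop_b.update(range(block.b, block.b + block.size))
    unique_a = [line for i, line in enumerate(a) if i not in drop_a]
    unique_b = [line for i, line in enumerate(b) if i not in drop_b]
    return unique_a, unique_b


if __name__ == "__main__":
    page_a = ["Acme Jobs", "Home | Search", "Senior Python developer",
              "Remote, full time", "Contact us", "© Acme"]
    page_b = ["Acme  Jobs", "Home | Search", "Junior data analyst",
              "Contact us", "© Acme"]
    print(strip_common_runs(page_a, page_b))
    # (['Senior Python developer', 'Remote, full time'], ['Junior data analyst'])
```

Because `SequenceMatcher` only reports matches in the same relative order, this suits headers and footers that sit at consistent positions; blocks that move around between pages are better handled with a set-based comparison.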

@Gnus

Hey, I can write such code by scraping the links, splitting the text into tokens and then comparing them. But would you be interested in navigating the HTML DOM instead? In your example link, that would mean scraping everything inside the <article class="large-block accordion"> tag and a few other locations. Regards, Georgi

€70 EUR in 3 days

5.0

(4 reviews)

3.6
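
The DOM-based alternative proposed in @Gnus's bid above can be sketched with the standard-library `html.parser`. A minimal sketch, assuming the job text lives inside an `<article>` whose `class` attribute contains both `large-block` and `accordion`, as the bid describes for the example page; `ArticleTextExtractor` and `article_text` are hypothetical names, and a real page may need other selectors as well.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect text inside <article> elements whose class list includes all required classes."""

    def __init__(self, required_classes=("large-block", "accordion")):
        super().__init__()
        self.required = set(required_classes)
        self.depth = 0  # > 0 while inside a matching <article>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "article":
            return
        if self.depth:
            self.depth += 1  # nested <article> inside the matched one
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def article_text(html):
    """Return the text of the matching <article> element(s), space-joined."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    html = ('<header>Acme Jobs</header>'
            '<article class="large-block accordion"><h1>Python developer</h1>'
            '<p>Remote role.</p></article><footer>Contact us</footer>')
    print(article_text(html))  # Python developer Remote role.
```

This avoids duplicate detection entirely but is tied to one site's markup, whereas the comparison approach works across sites without knowing their structure.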

@MacJeremy

To whom it may concern: if I understand you correctly, I take both pages and compare them, everything that is the same is deleted, and the rest is merged into one page? I am at your disposal for further questions. Regards, Daniel

€250 EUR in 10 days

5.0

(2 reviews)

3.4

@phourxx

Greetings, You're looking for a Python programmer to develop a web scraping tool that scrapes the details of a job from the website mentioned in the project details. As for a perfect match: I am a core Python programmer with 5 years of working experience, and I have scraped data from popular websites like FedEx, so I can surely deliver more than you expect. The tool should be developed so that other information about a job is scraped along with its description, and then all scraped data should be exported to a CSV file for easier access. I can deliver this in 48 hours without disappointment. I'll keep this brief; we can go into more detail when we chat. I'd love to chat with you over WhatsApp or here on Freelancer to discuss the tool and the terms and conditions, and to get started immediately. My WhatsApp contact is: + 234 80 625 92 413. I look forward to our chat. Regards, Folayemi.

€90 EUR in 2 days

5.0

(1 review)

1.8
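
@phourxx's bid above proposes exporting the scraped fields to a CSV file. If that were wanted, the standard-library `csv` module is enough. A minimal sketch; the field names `title`, `url` and `description` and the function name `jobs_to_csv` are hypothetical placeholders, not fields taken from the example pages.

```python
import csv
import io


def jobs_to_csv(jobs, fieldnames=("title", "url", "description")):
    """Serialize a list of job dicts to CSV text (field names are hypothetical)."""
    buffer = io.StringIO()
    # extrasaction="ignore": silently skip dict keys not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


if __name__ == "__main__":
    jobs = [{"title": "Python developer", "url": "https://example.com/1",
             "description": "Remote, full time"}]
    print(jobs_to_csv(jobs))
```

The csv module quotes values that contain commas, such as the description above. To write straight to disk instead, open the file with `newline=""` and pass the file object to `csv.DictWriter`.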

@VishalGupta8897

I have been learning and using Python for the past two years and have completed the Python Specialization on Coursera by the University of Michigan. I've used multiple Python libraries to create a variety of programs. I have considerable experience with BeautifulSoup, HTML Parser and other HTML parsing libraries, which I believe will be needed for this project and which I have used myself to build crawlers in the past.

€55 EUR in 5 days

0.0

(0 reviews)

0.0

@jlmurphysa

Hi, I have a lot of experience building web scrapers and have done plenty of work in Python. I can build the required algorithm (have done a duplicate text detector in the past). Also, if you provide me with the context in which you will use the algorithm, I can implement it for you and make sure it handles slightly differing text the way you need it to. Budget will be adjusted according to this. Feel free to contact me if you have any questions - your messages will reach me on my mobile. Regards, Jared

€88 EUR in 2 days

0.0

(0 reviews)

2.5
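
@jlmurphysa's bid above mentions handling slightly differing text. Exact matching misses a footer whose year changed, for example. One option, assumed here rather than taken from the bid, is a similarity score from `difflib.SequenceMatcher.ratio()` (0 to 1). The function names and the 0.9 threshold are hypothetical and would need tuning on real pages.

```python
import difflib


def is_near_duplicate(a, b, threshold=0.9):
    """True if two text blocks are at least `threshold` similar, ignoring case and spacing."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return difflib.SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


def drop_near_duplicates(blocks_a, blocks_b, threshold=0.9):
    """Keep only the blocks of page A that have no near-duplicate anywhere on page B."""
    return [a for a in blocks_a
            if not any(is_near_duplicate(a, b, threshold) for b in blocks_b)]


if __name__ == "__main__":
    page_a = ["© 2017 Acme Jobs. All rights reserved.", "Senior Python developer"]
    page_b = ["© 2018 Acme Jobs. All rights reserved.", "Junior data analyst"]
    print(drop_near_duplicates(page_a, page_b))  # ['Senior Python developer']
```

This compares every block of one page with every block of the other, so it is quadratic in the number of blocks; that is fine for a pair of pages but would call for a cheaper pre-filter on large batches.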

@halilceliksu

I am a Python programmer who has written lots of projects. Python is the best language for your project. I will be pleased if you accept. Thank you.

€44 EUR in 3 days

0.0

(0 reviews)

0.0

@ngemzinou

I think I understand what you want. I'm still not sure what output format you are looking for: do you want HTML output or just text files? An algorithm for this task may not be perfect if the pages' layout or tags are not consistent. And if the page content is well formatted, you can just extract the job details directly without worrying about duplicates and so on. I can start on this any time. Thank you

€222 EUR in 3 days

0.0

(0 reviews)

0.0

@jsbot

Can we go with Selenium in Java? (You'll get a better robot with Selenium if the language is not a concern.) We've scraped many websites with Selenium, including the e-commerce giants Amazon and Flipkart. For any queries about automation or scraping, just ping me on Skype: nishchit14. Thank you!

€222 EUR in 53 days

0.0

(0 reviews)

0.0

@yuvalkainan

A proposal has not yet been provided

€133 EUR in 3 days

0.0

(0 reviews)

0.0

@DevoirTechsoft

Hello, We have studied the requirements and found that they match our skills. We have an enthusiastic team with years of experience in HTML, CSS, UI design, PHP+MySQL, JavaScript, jQuery, AJAX, e-commerce, Magento, WordPress, Shopify, OpenCart, API integration, XML, Java, .NET, BigCommerce, cURL, Zoey Commerce, WooCommerce, nopCommerce, SEO, SMM, SMO, etc. We have delivered big and complex projects with quality, and most of our business comes from repeat hires because of that, so we look forward to building a long-term working relationship with you. Let's have a chat session! Our expert team will give you very good support. Best regards, Shikha

€155 EUR in 3 days