Text processing system, search engine queries. Java/Perl

Closed. Posted Mar 2, 2007. Paid on delivery.

Hello!

I am looking for a team of Java programmers to create a very special application for me. Since this is a big project and I need it done quite fast, I am only looking for a team of coders, not for individual freelancers.

In your bid you MUST provide UML diagrams, the design and architecture of this application, and some of YOUR ideas on how you will develop this program. The bidder with the best development proposal will be selected.

The task in short:

I need a program that takes a text document as input (Microsoft Word, RTF, TXT) and, after some processing, reports how much of the text in the document is directly or partially copied from websites on the internet. The main engine will be written in Java, and the search/request engine will use the open-source Perl library WWW::Search. The front end will be in PHP, but it is not part of this bid request.
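The Java/Perl split could be bridged in several ways. Below is a minimal sketch of one option, assuming a hypothetical Perl helper script (called `search.pl` here) built on WWW::Search that takes a backend name and a query as command-line arguments and prints one `url<TAB>title` line per result. `PerlSearchBridge` and `SearchHit` are illustrative names, not part of any existing library, and `perl` is assumed to be on the `PATH`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** One search result reported by the (hypothetical) Perl side. */
class SearchHit {
    final String url;
    final String title;

    SearchHit(String url, String title) {
        this.url = url;
        this.title = title;
    }
}

/**
 * Hypothetical bridge: runs "perl search.pl <backend> <query>", where search.pl is an
 * assumed WWW::Search wrapper printing one "url<TAB>title" line per result.
 */
class PerlSearchBridge {
    private final String scriptPath;

    PerlSearchBridge(String scriptPath) {
        this.scriptPath = scriptPath;
    }

    List<SearchHit> search(String backend, String query) throws IOException, InterruptedException {
        // Each argument is passed separately, so the query needs no shell quoting.
        ProcessBuilder pb = new ProcessBuilder("perl", scriptPath, backend, query);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // Perl warnings go to our stderr
        Process process = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("search script exited with status " + status);
        }
        return parseHits(out.toString());
    }

    /** Parses "url<TAB>title" lines; blank lines are skipped, a missing title becomes "". */
    static List<SearchHit> parseHits(String output) {
        List<SearchHit> hits = new ArrayList<>();
        for (String raw : output.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                hits.add(new SearchHit(line, ""));
            } else {
                hits.add(new SearchHit(line.substring(0, tab), line.substring(tab + 1).trim()));
            }
        }
        return hits;
    }
}
```

Keeping the Perl side to "print plain lines" makes the parser testable without Perl installed and lets the Perl script be swapped for another search backend later.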

A demo of working sites that do what this application should do will be provided. They are fully working and available for testing. However, their source code is not available.

## Deliverables

Long version:

Take a text document as input. Convert it to an internal format, stripping all formatting and leaving only the text. Split the document into smaller chunks of 5-6 sentences. “Google” each chunk sentence by sentence using popular search engines. Collect the most relevant results, store them, then sort and filter them using some kind of rating/sorting system. The main idea here is to separate uninteresting sites returned by the search engines from interesting ones. Interesting sites are web pages that contain a complete sentence exactly as it is written in the provided document. If a page contains 2 sentences from our chunk, it gets a higher rating, and so on. After this filtering and processing of the search engine results, we will end up with information like this:
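The first steps (convert, strip formatting, split into chunks, prepare the searches) can be sketched with the JDK alone. This sketch assumes TXT input is UTF-8, uses Swing's `RTFEditorKit` for RTF, and leaves Word files out (they would need an extra library such as Apache POI). Sentence splitting uses `java.text.BreakIterator`, which will not handle every abbreviation correctly. The chunk size, the exact-phrase quoting, and the `ChunkPreparer` name are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

/** Hypothetical helper for the first pipeline steps. */
class ChunkPreparer {

    /** Plain text from TXT (assumed UTF-8) or RTF bytes; Word files are not handled here. */
    static String toPlainText(String fileName, byte[] content) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".txt")) {
            return new String(content, StandardCharsets.UTF_8);
        }
        if (lower.endsWith(".rtf")) {
            try {
                RTFEditorKit kit = new RTFEditorKit();
                Document doc = kit.createDefaultDocument();
                kit.read(new ByteArrayInputStream(content), doc, 0); // drops RTF formatting
                return doc.getText(0, doc.getLength());
            } catch (IOException | BadLocationException e) {
                throw new IllegalArgumentException("cannot read RTF: " + fileName, e);
            }
        }
        throw new IllegalArgumentException("unsupported format: " + fileName);
    }

    /** Splits text into trimmed, whitespace-normalized sentences (English rules). */
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim().replaceAll("\\s+", " ");
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** Groups consecutive sentences into chunks of at most chunkSize sentences. */
    static List<List<String>> chunk(List<String> sentences, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, sentences.size());
            chunks.add(new ArrayList<>(sentences.subList(i, end)));
        }
        return chunks;
    }

    /** Wraps a sentence in double quotes so search engines treat it as an exact phrase. */
    static String phraseQuery(String sentence) {
        return "\"" + sentence.replace("\"", "") + "\"";
    }
}
```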

Input chunk: I was walking around the house today. I love my house. My house is so great when its pink. I painted it pink yesterday.

Original: I was walking around the house today.

Online: I was walking around the house today. (100%)

URL: [url removed]

Original: I love my house.

Online: I love my big house. (80%)

URL: [url removed]

Original: My house is so great when its pink.

Online: My house is so great when its pink. (100%)

URL: [url removed]
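The posting does not say how the percentages in this example are computed. One assumption that reproduces them is a word-level longest common subsequence divided by the word count of the longer sentence: “I love my house.” shares 4 words, in order, with “I love my big house.” (5 words), giving 80%. The sketch also implements the page rating described above (one point per chunk sentence found verbatim in the page, after collapsing whitespace). `MatchScorer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical scoring helpers for sentences and candidate pages. */
class MatchScorer {

    /** Lower-cased word tokens; punctuation is dropped. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Assumed similarity: 100 * LCS(words) / max(word count), rounded to an int. */
    static int similarity(String original, String online) {
        List<String> a = words(original);
        List<String> b = words(online);
        int longest = Math.max(a.size(), b.size());
        if (longest == 0) {
            return 0;
        }
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return (int) Math.round(100.0 * dp[a.size()][b.size()] / longest);
    }

    /** Page rating: number of chunk sentences that appear verbatim in the page text. */
    static int exactSentenceCount(List<String> chunkSentences, String pageText) {
        String page = normalize(pageText);
        int count = 0;
        for (String s : chunkSentences) {
            if (page.contains(normalize(s))) {
                count++;
            }
        }
        return count;
    }

    /** URLs of pages with at least one exact sentence, highest rating first. */
    static List<String> rankPages(List<String> chunkSentences, Map<String, String> pageTextByUrl) {
        Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, String> e : pageTextByUrl.entrySet()) {
            int n = exactSentenceCount(chunkSentences, e.getValue());
            if (n > 0) {
                score.put(e.getKey(), n);
            }
        }
        List<String> urls = new ArrayList<>(score.keySet());
        urls.sort((u, v) -> Integer.compare(score.get(v), score.get(u)));
        return urls;
    }

    private static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }
}
```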

So the idea is to process the document and produce a report showing where parts of the document were copied from, what percentage is original (not found online) and what percentage is not (found online).
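A minimal sketch of the summary figures, assuming a sentence counts as “found online” when its best match reaches a chosen threshold (80% in the check below, an arbitrary choice) and that percentages are counted per sentence rather than per word. `OriginalityReport` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical summary of one checked document. */
class OriginalityReport {
    final int totalSentences;
    final int foundOnline;

    OriginalityReport(int totalSentences, int foundOnline) {
        this.totalSentences = totalSentences;
        this.foundOnline = foundOnline;
    }

    /**
     * bestMatchPercents holds, per sentence, the best similarity found online (0 if none);
     * a sentence counts as found online when it reaches the assumed threshold.
     */
    static OriginalityReport fromBestMatches(List<Integer> bestMatchPercents, int threshold) {
        int found = 0;
        for (int percent : bestMatchPercents) {
            if (percent >= threshold) {
                found++;
            }
        }
        return new OriginalityReport(bestMatchPercents.size(), found);
    }

    /** Share of sentences found online; an empty document reports 0. */
    double percentOnline() {
        return totalSentences == 0 ? 0.0 : 100.0 * foundOnline / totalSentences;
    }

    /** Share of sentences not found online; an empty document reports 0. */
    double percentOriginal() {
        return totalSentences == 0 ? 0.0 : 100.0 - percentOnline();
    }
}
```

Counting per sentence is the simplest reading of the requirement; weighting each sentence by its word count would stop a short copied sentence from counting as much as a long one.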

Some VERY important things:

This application must be done strictly by the book. UML and use cases at university level must be provided. A complete and full description of the architecture and all necessary diagrams must be provided. If you are not used to that kind of detailed documentation, please do not bid.

----

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Java Perl Linux

Article Rewriting, Writing, Engineering, Java, MySQL, Odd Jobs, Perl, PHP, Research Writing, Software Architecture, Software Testing, Technical Writing

Project ID: #2864998

About the project

3 proposals. Remote project. Active Mar 23, 2007

3 freelancers are bidding an average of $567 for this job

juventustech

See private message.

$637.5 USD in 30 days
(8 reviews)
4.7
adriantarau

See private message.

$637.5 USD in 30 days
(0 reviews)
3.5
indigodavid

See private message.

$425 USD in 30 days
(0 reviews)
0.0