Cleanup names - repost

Completed · Posted Jan 22, 2014 · Paid on delivery

Round 2 pending; Round 1 already complete.

I have a list of 50,000 names, each with an associated ID. This is the output of a computer program that tried to assign IDs to names. The program grouped similar names together, but there are also substantial errors.

I have sorted the file in Excel by last name, then by first name, and then by ID. I need someone to go over this sorted file once and:

1. Assign a status of 1 to name spellings that are dissimilar but have the same ID. This status goes in a new column, "Status".

Example before processing:

ExistingID  LastName  FirstName
32          WAY       JAMES CREIGHTON
32          WESEMAN   JAMES C
32          WILSON    JAMES C
32          WRAY      J C

After processing, the names should look like:

ExistingID  LastName  FirstName        ManualID  Status
32          WAY       JAMES CREIGHTON
32          WESEMAN   JAMES C                    1
32          WILSON    JAMES C                    1
32          WRAY      J C

This indicates that lines 1 and 4 can keep the same ID (32), but lines 2 and 3 need to be assigned fresh IDs.

2. Assign a status of 2 to name spellings that clearly belong to another, similar ID near the current row. Also record the new ID that the row should be changed to (in the "ManualID" column):

Example before processing (below are two consecutive entries in the file; I expect the person working on this to remember at least the last 50 lines and match against them):

ExistingID  LastName     FirstName
647         AAGAARD      ERIC J
4154        AAGAARD ESQ  ERIC J

As can be seen, the two names are practically the same (recognizing this requires some knowledge of American names), so I expect the larger ID to be reassigned to the smaller ID. After processing, the entries should look like this:

ExistingID  LastName     FirstName  ManualID  Status
647         AAGAARD      ERIC J
4154        AAGAARD ESQ  ERIC J     647       2

This indicates that a similar name already exists (Status = 2) and that 4154 should be reassigned to 647.

Another example (this time, different IDs due to a misspelling) before processing:

ExistingID  LastName  FirstName
3685        ACKERMAN  JEOL G
3052        ACKERMAN  JOEL C
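The two status rules above could be given a rough automated first pass before the manual review. The Python sketch below is not part of the posting. It assumes rows are read as (ExistingID, LastName, FirstName) tuples in file order, and it uses difflib's SequenceMatcher ratio as a crude stand-in for human judgment. The cutoffs (0.6 and 0.85), the ignored-suffix list, and the function names are hypothetical, chosen so that the posting's three examples come out as shown. Only the 50-line window comes from the text.

```python
from collections import deque
from difflib import SequenceMatcher
from itertools import groupby

# Hypothetical thresholds and suffix list (not from the posting); tune by hand.
DISSIMILAR_CUTOFF = 0.6   # status 1: last names scoring below this differ
MATCH_CUTOFF = 0.85       # status 2: full names scoring at/above this match
WINDOW = 50               # "remember at least the last 50 lines"
IGNORED_TOKENS = {"ESQ", "JR", "SR", "II", "III"}


def similarity(a: str, b: str) -> float:
    """0..1 similarity between two name strings (difflib ratio)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def normalize(last: str, first: str) -> str:
    """Last name + first name with ignorable suffix tokens dropped."""
    tokens = f"{last} {first}".upper().split()
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def assign_statuses(rows):
    """rows: (existing_id, last_name, first_name) tuples in file order.

    Returns one [manual_id, status] pair per row; [None, None] = unchanged.
    """
    out = [[None, None] for _ in rows]

    # Status 1: same ID, but the last name is dissimilar from the first
    # row seen with that ID. sorted() is stable, so file order is kept.
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])
    for _, group in groupby(order, key=lambda i: rows[i][0]):
        ref, *others = list(group)
        for i in others:
            if similarity(rows[i][1], rows[ref][1]) < DISSIMILAR_CUTOFF:
                out[i][1] = 1

    # Status 2: different IDs, near-identical names within the last WINDOW rows.
    recent = deque(maxlen=WINDOW)  # (row index, ID, normalized name)
    for i, (row_id, last, first_name) in enumerate(rows):
        name = normalize(last, first_name)
        for j, prev_id, prev_name in recent:
            if prev_id != row_id and similarity(prev_name, name) >= MATCH_CUTOFF:
                # The row with the larger ID is reassigned to the smaller ID.
                loser = i if row_id > prev_id else j
                out[loser] = [min(row_id, prev_id), 2]
                break
        recent.append((i, row_id, name))
    return out
```

For example, `assign_statuses([(647, "AAGAARD", "ERIC J"), (4154, "AAGAARD ESQ", "ERIC J")])` flags the second row as `[647, 2]`. The sketch does not resolve chains (an ID reassigned to one that is itself reassigned). It also only compares each name to the first row of its ID, so its output would still need the manual pass the posting asks for.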

The vast majority of errors are expected to be of the second type. As a test of skill, I expect anyone taking up the work to submit a sample: the processed output of the 1,000 lines in the attached file. After the work is complete, I will also check a random sample of another 1,000 lines to make sure there are no major errors (fewer than 50 in the sample of 1,000) before releasing payment. The work needs to be done within the next 2 weeks. If interested, please respond with the completed sample work.
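The acceptance check described above amounts to an error ceiling of 5% (fewer than 50 errors in 1,000 sampled lines). A minimal Python sketch of drawing such a sample follows. It assumes rows are addressed by their index in the 50,000-line file. The function names are hypothetical, and counting the errors in the sample remains a manual step.

```python
import random

SAMPLE_SIZE = 1000
MAX_ERRORS = 50  # fewer than 50 errors per 1,000 lines, i.e. under 5%


def draw_audit_sample(n_rows: int, seed=None) -> list:
    """Pick SAMPLE_SIZE distinct row indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), SAMPLE_SIZE))


def passes_audit(error_count: int) -> bool:
    """True if the sampled lines contain fewer than MAX_ERRORS errors."""
    return error_count < MAX_ERRORS


indices = draw_audit_sample(50_000, seed=2014)
print(len(indices), passes_audit(49), passes_audit(50))  # 1000 True False
```

Sampling without replacement (`random.sample`) ensures that no line is checked twice. Fixing the seed makes the audit sample reproducible if it needs to be rechecked.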

Data Processing

Project ID: #5346306

About the project

3 proposals · Remote project · Active Jan 22, 2014

Awarded to:

USCgrad

Hello. As you know, I worked on part 1 of this task, and I believe it would be best if I did part 2 as well. I proved myself in Round 1 and will do a good job on part 2.

$50 USD in 10 days
(3 reviews)
3.7

3 freelancers are bidding an average of $62 for this job

Kalpanasekhar

Hi there, we are ready to start working on this project! We clearly see your requirement of cleansing the data of duplicates and errors. We are 100% sure we can do this job. Please send the first sample, so…

$105 USD in 1 day
(14 reviews)
4.4

kamal20

Sir, <My goal is your 100% satisfaction. Minimum rate, but quality guaranteed.> I've read your full description. It is very clear. I'm ready with my 20-member special data processing team. Fast and accurate service gu…

$30 USD in 1 day
(4 reviews)
2.3