Database cleaning/merging/deduplication & fuzzy matching - repost

Closed | Posted Jan 23, 2014 | Paid on delivery

I'm building a database in Excel that contains records from many different sources: 77k rows and 50+ columns in total.

I would like to condense it to one row per unique address while keeping all the other unique data cells from the merged rows.

This will require some kind of fuzzy matching, because the duplicate addresses are not all exact matches, e.g.:

300 Water Street suite #3 | Portland | Oregon

300 Water Street | Portland | Oregon

300 Water St | Portland | Oregon

The above examples should all be treated as the same record. Each row may carry different data in its other columns, and all of it needs to be condensed into one row.
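As an illustration of how the three variants above could collapse into one record, here is a minimal Python sketch of rule-based normalization (lowercase, strip a trailing unit designator such as "suite #3", expand abbreviations like "St"), with a `difflib` similarity score as a fuzzy fallback for leftover typos. It assumes the street, city, and state are separate fields; the abbreviation table, the unit regex, and the 0.9 threshold are hypothetical starting points, not tuned values.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it with the variants found in the data.
STREET_ABBREVIATIONS = {
    "st": "street", "st.": "street",
    "ave": "avenue", "ave.": "avenue",
    "rd": "road", "rd.": "road",
}

# A trailing unit designator such as "suite #3", "ste 3", "apt 12b" or "#3".
# The \b stops it from eating real street words such as "Stevens".
UNIT_SUFFIX = re.compile(r"\s+(?:suite|ste|apt|unit)\b\.?\s*#?\s*\w+$|\s+#\s*\w+$")


def normalize_street(street: str) -> str:
    """Lowercase, drop a trailing unit designator, and expand abbreviations."""
    s = UNIT_SUFFIX.sub("", street.strip().lower())
    return " ".join(STREET_ABBREVIATIONS.get(word, word) for word in s.split())


def address_key(street: str, city: str, state: str) -> tuple[str, str, str]:
    """Key under which duplicate addresses should collide."""
    return (normalize_street(street), city.strip().lower(), state.strip().lower())


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy fallback for near-misses (e.g. typos) that the rules above miss."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


examples = [
    ("300 Water Street suite #3", "Portland", "Oregon"),
    ("300 Water Street", "Portland", "Oregon"),
    ("300 Water St", "Portland", "Oregon"),
]
for street, city, state in examples:
    # each line prints ('300 water street', 'portland', 'oregon')
    print(address_key(street, city, state))
```

Comparing all 77k rows pairwise with `similar()` would mean roughly three billion comparisons, so in practice the fuzzy check would only run within a small block of candidates, for example rows that share the same city and state.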

I have normalized the data as much as I can with my limited Excel skills and PowerGREP, making sure the states, cities, and abbreviations are consistent so that duplicates are easier to recognize.

I estimate there are roughly 20k actual unique addresses, so the data should condense to about 20k rows while keeping all the unique cells, producing a very rich data set at the end.
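The condensing step described above (one row per address, every distinct cell kept) could look something like this minimal Python sketch. The column names, the sample rows, and `simple_key` are hypothetical placeholders; a real key would come from an address normalizer. Where a column ends up with several distinct values for one address, they are joined with "; " so they fit in a single cell.

```python
from collections import defaultdict
from typing import Callable, Hashable

Row = dict[str, str]


def condense(rows: list[Row], key_func: Callable[[Row], Hashable]) -> list[Row]:
    """Merge rows that share a key, keeping every distinct non-empty value per column."""
    groups: dict[Hashable, defaultdict[str, list[str]]] = {}
    for row in rows:
        group = groups.setdefault(key_func(row), defaultdict(list))
        for col, value in row.items():
            value = value.strip()
            # Skip blanks and values already collected for this column.
            if value and value not in group[col]:
                group[col].append(value)
    # One output row per key; multiple distinct values are joined with "; ".
    return [
        {col: "; ".join(values) for col, values in group.items()}
        for group in groups.values()
    ]


# Hypothetical sample rows; real column names would come from the spreadsheet header.
SAMPLE_ROWS = [
    {"street": "300 Water Street", "city": "Portland", "state": "Oregon", "phone": "555-0100", "email": ""},
    {"street": "300 Water Street", "city": "Portland", "state": "Oregon", "phone": "555-0101", "email": "info@example.com"},
    {"street": "12 Main St", "city": "Salem", "state": "Oregon", "phone": "555-0199", "email": ""},
]


def simple_key(row: Row) -> tuple[str, str, str]:
    # Placeholder key; a real run would plug in a proper address normalizer here.
    return (row["street"].lower(), row["city"].lower(), row["state"].lower())


condensed = condense(SAMPLE_ROWS, simple_key)
for merged_row in condensed:
    print(merged_row)
```

Because a merged row can lack a column entirely (no row in its group had a value there), writing the result out with `csv.DictWriter(f, fieldnames=..., restval="")` keeps every column present in a file Excel can open.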

I'm not sure Excel can handle this type of project; perhaps you have a better solution using SQL, VBA/Access, or some other database manipulation or deduplication tool.
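If the data were moved into a database, the condensing step maps naturally onto a SQL `GROUP BY`. A minimal sketch using Python's built-in `sqlite3` module, assuming each row already carries a precomputed normalized `addr_key` column; the table name, column names, and rows are hypothetical. SQLite's `group_concat(DISTINCT ...)` reduces each column to its distinct non-NULL values, comma-separated, in no guaranteed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: addr_key is assumed to come from an earlier normalization pass.
conn.execute("CREATE TABLE records (addr_key TEXT, street TEXT, phone TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("300 water street|portland|oregon", "300 Water Street suite #3", "555-0100", None),
        ("300 water street|portland|oregon", "300 Water St", None, "info@example.com"),
        ("12 main street|salem|oregon", "12 Main St", "555-0199", None),
    ],
)

# One output row per address key; NULLs are ignored by group_concat.
rows = conn.execute(
    """
    SELECT addr_key,
           group_concat(DISTINCT street),
           group_concat(DISTINCT phone),
           group_concat(DISTINCT email)
    FROM records
    GROUP BY addr_key
    ORDER BY addr_key
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

With 50+ columns the SELECT list gets long, but it can be generated from the column names rather than typed by hand.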

Let me know via PM how you would best tackle this.

Skills: Typing, Data Entry, Data Processing, Excel, Word

Project ID: #5352074

About the project

3 proposals | Remote project | Active Mar 1, 2014

3 freelancers are bidding an average of $12/hour for this job

vlagrome

I have done this exact type of exercise with clients in the past. Usually, they'd have dozens of different spreadsheets and the columns weren't all in the same positions. I know of a few quick tricks in Excel to cl...

$14 USD / hour (0 reviews, 0.0 rating)

LiquidAnswer

Hello. My name is Jason and I've been working in the IT department of a large US company for the last 20 years. During that time I've created many in-house applications to automate and customize Excel data. With you...

$12 USD / hour (0 reviews, 0.0 rating)