Dear PatRoach01,
I think the strategy is to use script to look for key address words as human would. Excel is not powerful enough to do lookup of such inconsistencies. Other than wrong columns, uppercase (easy to solve), your data have problems such as "UNIT OF3."
I will show you that my approach should yield the highest hits.
1. Correct the columns. Currently, the specific address file is mixing up Address3, Address4, and Post code (town info in wrong column sometimes). This can be easily done by working the columns backward.
2. Concatenate Address1+Address2+Address3 and AddressLIne1+AddressLIne2+Locality.
3. Eliminate common words, such as Street, Road, Unit, and duplicates, etc. Set to lower case. Now we have "cleaned" and relevant working address phrases A1s and A2s to compare.
5. Now the tough part. Use script to start from each A1. For each word, look into all of A2, filter by the same Post Code (targets). For each word it finds, score 1 point. Divide points by number of words in A1 for accuracy. Rank all "target" scores for this A1. Accept the top score, but only if score is >40% (top ranking, plus at least 2/5 words match). We can change this threshold.
I've worked with voter's rolls checking for possible frauds, errors, and duplication, as well as Web scripts. I'm a financial analyst strong in Excel formula, but I think Excel alone is inadequate in this case. I say 5 days to ensure enough time for quality control.
Thanks for the opportunity,
Regards,
Jason