This project involves devising a method to match inconsistently coded data. We have a dataset of construction site inspections. Entries describing the location of the same work site are often recorded differently. For example, one entry might have its "address" cell recorded as "123 Elm Rd" while another has it recorded as "123 Elm Road". In other cases, the same company's "company name" cell might be recorded differently. For example, "Acme Inc." might be misspelled as "Amce Inc." in one entry. We would like to devise a program that matches inconsistently coded entries by comparing each entry to every other entry. A successful match occurs when there is a high probability that the two entries are actually one and the same. This must be an automated process because our dataset contains around 1.5 million observations.
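To make the kind of matching described above concrete, here is a minimal sketch in Python using only the standard library. It is one possible approach, not a specification of the required solution: the abbreviation table, the function names, and the 0.85 threshold are all assumptions chosen for illustration and would need tuning against the real data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table (an assumption); extend it to suit the real data.
ABBREVIATIONS = {
    "rd": "road",
    "st": "street",
    "ave": "avenue",
    "inc": "incorporated",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a, b, threshold=0.85):
    """Treat two values as the same entity when similarity reaches the threshold.

    The default threshold is an illustrative guess, not a validated cutoff.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [
        ("123 Elm Rd", "123 Elm Road"),
        ("Acme Inc.", "Amce Inc."),
        ("123 Elm Rd", "987 Oak Avenue"),
    ]:
        print(left, "|", right, "|", round(similarity(left, right), 3))
```

Comparing every entry to every other entry in a dataset of about 1.5 million observations means on the order of a trillion pairs, so in practice a sketch like this would likely be combined with some form of grouping (for example, only comparing entries that share a city or postal code, if such a field exists) before scoring pairs.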
Thank you for all of the responses. I am going to provide a limited sample of the data which I hope will allow for some clarification.
18 freelancers are bidding an average of $235 for this job
We will do everything that you want (and more... :-))). Quick, professional, quality work: that is our answer to you and your organization. We have been working for more than 10 years. Do you have any questions?