I need an iterative MapReduce program that implements the k-means algorithm. The code must be in Java and split across 3 separate files: a Mapper class file, a Reducer class file, and a main file that runs the job. The code must be explained line by line. The data file is attached.
Step1: The initial centroids are selected at random from the data. Our implementation uses 3 centroids.
Step2: The input file contains the initial centroids and the data.
Step3: In the Mapper class, the "configure" function first opens the file, reads the centroids, and stores them in a data structure (use an ArrayList).
Step4: The Mapper reads the data file and, for each point, emits the nearest centroid together with the point to the Reducer.
Step5: The Reducer collects the points for each centroid, calculates the corresponding new centroid, and emits it.
Step6: In the job configuration, we read both files and check:
if the difference between each old and new centroid is less than 0.1, then
convergence is reached;
otherwise, repeat from Step2 with the new centroids.
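
To make Steps 4 to 6 concrete, here is a minimal sketch of the same loop in plain Java, with no Hadoop dependencies, so the logic can be checked before it is split into the Mapper, Reducer, and job files. The class name KMeansSketch and the methods nearest, recompute, converged, and run are hypothetical names chosen for illustration. The sketch also makes assumptions the steps above do not specify: distance is Euclidean, a centroid with no assigned points keeps its old position, and a maxIter cap stops the loop if convergence is never reached. In the real job, nearest would roughly correspond to the Mapper, recompute's averaging to the Reducer, and run to the driver loop.

```java
import java.util.ArrayList;
import java.util.List;

class KMeansSketch {
    // Step 4 (Mapper logic): index of the centroid closest to the point.
    // Assumption: squared Euclidean distance is used for the comparison.
    static int nearest(double[] point, List<double[]> centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.size(); i++) {
            double[] c = centroids.get(i);
            double d = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - c[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    // Step 5 (Reducer logic): each new centroid is the mean of its points.
    static List<double[]> recompute(List<double[]> points, List<double[]> centroids) {
        int k = centroids.size();
        int dim = centroids.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += p[j];
            }
        }
        List<double[]> next = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            if (counts[i] == 0) {
                // Assumption: an empty cluster keeps its previous centroid.
                next.add(centroids.get(i).clone());
                continue;
            }
            double[] mean = new double[dim];
            for (int j = 0; j < dim; j++) {
                mean[j] = sums[i][j] / counts[i];
            }
            next.add(mean);
        }
        return next;
    }

    // Step 6: converged when every centroid moved less than the threshold.
    static boolean converged(List<double[]> oldC, List<double[]> newC, double threshold) {
        for (int i = 0; i < oldC.size(); i++) {
            double d = 0.0;
            for (int j = 0; j < oldC.get(i).length; j++) {
                double diff = oldC.get(i)[j] - newC.get(i)[j];
                d += diff * diff;
            }
            if (Math.sqrt(d) >= threshold) {
                return false;
            }
        }
        return true;
    }

    // Driver loop: repeat assignment and update until convergence.
    // Assumption: maxIter is a safety cap not mentioned in the steps above.
    static List<double[]> run(List<double[]> points, List<double[]> initial,
                              double threshold, int maxIter) {
        List<double[]> current = initial;
        for (int iter = 0; iter < maxIter; iter++) {
            List<double[]> next = recompute(points, current);
            boolean done = converged(current, next, threshold);
            current = next;
            if (done) {
                break;
            }
        }
        return current;
    }
}
```

Calling KMeansSketch.run(points, initialCentroids, 0.1, 20) with 3 initial centroids mirrors the setup described above. In the MapReduce version each pass of the loop in run would be one job, with the centroids written to a file between jobs instead of being held in memory.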