Precondition: MySQL table populated with yesterday's data
Postcondition: MySQL table populated with today's data
There are many duplicate entries between yesterday and today (millions). The script I've used to perform part of this update no longer runs on my dedicated box b/c of memory issues. It is attached as an example.
1) A script/app that runs on FreeBSD to take a flat text file of 65,000,000 rows, remove duplicate lines and produce a file similar to what diff would create: a list of additions and subtractions from the data set.
2) A script to make the most optimal INSERTs and DELETEs on a MySQL table given the above list of additions and subtractions.
3) A MySQL table structure capable of efficiently handling 5+ million rows in a way that would allow for querying using AJAX in near realtime.
I will specify fields, send a sample data set and additional requirements for step 2 (minimal additional database flags), I'm mostly concerned with the low memory footprint for part 1 and the indexing of the data structure in step 3.