The job is to write a script that processes a long file of concatenated email messages and extracts information from each one.
The script will convert a set of emails from a payment processor into a matrix. Two sample emails are in the attached image. The needed fields are highlighted in yellow and orange.
INPUT:
Concatenated emails. (Samples follow in next email.)
We will provide a sample file of approx 400 emails.
STEP 1: Parse the set of emails (from a flat file), and extract the following fields:
* Transaction Order ID
* Price each
* Quantity
* Description
* Billing name
* Billing address (everything except email)
* Billing email
* Attendee names
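As a rough illustration of Step 1, the sketch below pulls single-line fields out of one message with a regular expression per field. It assumes each field appears as a "Label: value" line. That layout, the label spellings, and the sample message are all hypothetical placeholders, since the real emails are only shown in the attached image. The billing address is left out because it probably spans several lines and would need its own rule. Python is used here only for illustration.

```python
import re

# Hypothetical labels: placeholders to adjust once the real sample
# emails are available. The keys are illustrative column names.
FIELD_LABELS = {
    "order_id": "Transaction Order ID",
    "price_each": "Price each",
    "quantity": "Quantity",
    "description": "Description",
    "billing_name": "Billing name",
    "billing_email": "Billing email",
    "attendee_names": "Attendee names",
}

def extract_fields(message: str) -> dict:
    """Return the first 'Label: value' match for each known label.

    Fields that are not found come back as empty strings so that
    every record has the same keys.
    """
    fields = {}
    for key, label in FIELD_LABELS.items():
        match = re.search(
            rf"^{re.escape(label)}\s*:\s*(.*)$",
            message,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        fields[key] = match.group(1).strip() if match else ""
    return fields

# Made-up message, used only to exercise the function.
sample = """Transaction Order ID: 12345
Price each: 25.00
Quantity: 2
Description: Workshop ticket
Billing name: Jane Doe
Billing email: jane@example.com
Attendee names: Jane Doe jane@example.com, John Roe john@example.com
"""
record = extract_fields(sample)
```

Returning an empty string for a missing field keeps every record the same shape, which makes it easy to spot incomplete emails in the final file.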
STEP 2: Separate attendee names & emails
* Use the quantity from step 1 as the expected number of attendees
* Use various natural language processing methods & patterns to parse the "attendee names" field into: attendee name, and attendee email. Note that this field is NOT standardized. Each user specified this differently.
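One simple starting point for Step 2 is to treat each email address as an anchor and take the free text before it as the attendee name, then compare the count against the quantity. The sketch below does only that. The function name, the email pattern, and the sample strings are assumptions for illustration, and real data will likely need more patterns (names listed with no email, one email shared by several names, and so on). Python is used here only for illustration.

```python
import re

# Deliberately loose email pattern; good enough to locate addresses
# in free text, not a full address validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Separator characters people commonly put around names and emails.
SEPARATORS = " \t\r\n,;:()<>-/&"

def split_attendees(raw: str, quantity: int):
    """Pair each email address with the free text that precedes it.

    Returns (attendees, needs_review). attendees is a list of
    (name, email) tuples; needs_review is True when the number of
    attendees found does not match the expected quantity.
    """
    attendees = []
    pos = 0
    for match in EMAIL_RE.finditer(raw):
        name = raw[pos:match.start()].strip(SEPARATORS)
        name = re.sub(r"\s+", " ", name)  # collapse runs of whitespace
        attendees.append((name, match.group(0)))
        pos = match.end()

    # Text after the last email (or the whole field if no email was
    # found) is kept as a name with no address so nothing is lost.
    trailing = re.sub(r"\s+", " ", raw[pos:].strip(SEPARATORS))
    if trailing:
        attendees.append((trailing, ""))

    return attendees, len(attendees) != quantity

pairs, needs_review = split_attendees(
    "Jane Doe <jane@example.com>; John Roe (john@example.org)", 2
)
```

Flagging mismatches instead of guessing keeps the automated pass conservative: rows where the count disagrees with the quantity can be reviewed by hand or routed to a more specific pattern.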
OUTPUT:
CSV file with all fields.
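A minimal sketch of the output step, using Python's standard csv module so that commas and quotes inside fields such as the billing address are escaped correctly. The column names and the one-row-per-attendee layout are assumptions; the spec only says the file should contain all fields. The sample row is made up. Python is used here only for illustration.

```python
import csv
import io

# Hypothetical column names; one row per attendee, with the order
# fields repeated on each row (an assumption, not part of the spec).
COLUMNS = [
    "order_id", "price_each", "quantity", "description",
    "billing_name", "billing_address", "billing_email",
    "attendee_name", "attendee_email",
]

def write_rows(rows, out):
    """Write a header plus one CSV line per row dict to 'out'.

    Keys missing from a row are written as empty cells; keys not in
    COLUMNS are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Made-up row written to an in-memory buffer. For a real file, use
# open(path, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_rows(
    [{
        "order_id": "12345",
        "price_each": "25.00",
        "quantity": "2",
        "description": "Workshop ticket",
        "billing_name": "Jane Doe",
        "billing_address": "1 Main St, Springfield",
        "billing_email": "jane@example.com",
        "attendee_name": "John Roe",
        "attendee_email": "john@example.org",
    }],
    buffer,
)
csv_text = buffer.getvalue()
```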
Perl is probably the best language for this.