Capture metadata for a list of URLs from a tab-delimited input file and create an output file that contains the fields of the original input file plus the additional metadata fields. Both input and output files have to be tab/none delimited.
Details are given in the description below. A sample input file is attached.
The tab/none delimited input file has records with 7 fields (a, b, c, d, e, f, g). Field "f" holds the URL whose page title and meta description I want to capture.
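As a rough illustration of the input layout, the records could be read along these lines. This is only a sketch: the function name `read_records` is hypothetical, and skipping rows that do not have exactly 7 fields is an assumption, since the posting does not say how malformed rows should be handled.

```python
import csv
import io

# Field names as used in the description above.
FIELD_NAMES = ["a", "b", "c", "d", "e", "f", "g"]


def read_records(stream):
    """Yield one dict per 7-field, tab-delimited row of `stream`."""
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELD_NAMES):
            continue  # assumption: rows without exactly 7 fields are skipped
        yield dict(zip(FIELD_NAMES, row))


# Example with an in-memory file; a real run would use
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("1\t2\t3\t4\t5\texample.com\t7\nshort\trow\n")
for record in read_records(sample):
    print(record["f"])
```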
The URL could be in any of these forms:
1. [http:///[url removed, login to view]]
2. [url removed, login to view]
3. [url removed, login to view]
Some input records will not have a URL in column "f"; these records should be discarded and not written to the tab/none delimited output file.
If the URL from column "f" cannot be found on the web, the record should be discarded and not written to the output file. An error output file should be created that lists each URL from column "f" that could not be found.
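The two discard rules could be sketched roughly as follows. The names `fetch_html` and `split_records` are hypothetical, and two things here are assumptions rather than requirements from the posting: a URL without a scheme gets `http://` prepended, and any network or URL error counts as "cannot be found on the web".

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its text (assumes UTF-8, replaces bad bytes)."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare host names are plain HTTP
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def split_records(records, fetch=fetch_html):
    """Return (kept, failed_urls).

    kept is a list of (record, html) pairs for reachable URLs;
    failed_urls lists the URLs that belong in the error file.
    """
    kept, failed_urls = [], []
    for record in records:
        url = record["f"].strip()
        if not url:
            continue  # no URL in column f: discard, nothing to report
        try:
            html = fetch(url)
        except (OSError, ValueError):
            # URLError and HTTPError are OSError subclasses;
            # ValueError covers malformed URLs.
            failed_urls.append(url)
            continue
        kept.append((record, html))
    return kept, failed_urls
```

Passing the fetch function as a parameter keeps the filtering logic testable without network access; the error file can then be written with one failed URL per line.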
The tab/none delimited output file will have all of the input fields (a, b, c, d, e, f, g) and two additional columns (h and i). Column "h" is for the meta page title and column "i" is for the meta page description. The maximum length of the meta title is 200 bytes.
The maximum length of the meta description is 300 bytes.
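One possible way to fill columns "h" and "i" is sketched below using Python's standard `html.parser`. The names `MetaParser`, `truncate_bytes` and `extract_meta` are hypothetical. The sketch assumes the byte limits are measured in UTF-8, and it collapses tabs and newlines inside the values to single spaces so they cannot break the tab-delimited output; neither point is stated in the posting.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect the <title> text and the first <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and not self.description:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def truncate_bytes(text, limit):
    """Cut text to at most `limit` UTF-8 bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def extract_meta(html):
    """Return (title, description), limited to 200 and 300 bytes."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())  # collapse tabs and newlines
    description = " ".join(parser.description.split())
    return truncate_bytes(title, 200), truncate_bytes(description, 300)
```

Each output row would then be the seven original fields followed by these two values, joined with tabs.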