This project is to create a simple Excel VB macro that fetches and extracts doctors surgery details from a website for a given postcode area and saves this information to the spreadsheet.
The macro must provide the means to specify a seed postcode. It will then request all the data from the site (multiple page requests), parse the results and save the data.
The surgery information is available from the following site:
[url removed, login to view]
The search must be done on UK postcodes. Postcodes are of the form:
"xn" (where x is a char and n is 1-9 e.g. "B1")
"xnn" (where x is a char and nn is 10-99 e.g. "B11")
"xxn" (where xx is 2 chars and n is 1-9 e.g. "PO1")
"xxnn" (where xx is 2 chars and nn is 10-99 e.g. "PO22")
"xny" (where x is a char, n is 1-9, y is N/S/E/W e.g. "B1N")
"xxny" (where xx is 2 chars, n is 1-9, y is N/S/E/W e.g. "EC4S")
The site will not allow a search on the main postcode area (e.g. "PO"). It must be on postcode sub areas (e.g. "PO22"). Not all sub-areas are valid. e.g. for "PO" only postcode areas PO1-PO22 & PO30-PO40 are valid.
If an invalid postcode format is supplied, the site prompts to correct (e.g. try searching on postcode "DEF"). The macro does not need to validate seed postcode format. We can assume it will be specified correctly. If there is no data for a postcode, the site returns a page indicating there is no data (e.g. try searching on postcode "PO99").
To search for a particular postcode the URL is:
[url removed, login to view][postcode]
The site returns a mximum of 5 results per page. If there are more than 5 results, page/next links are included at the bottom of the page (e.g. try searching on postcode "PO2"). A simple scheme to capture all data for a given postcode is to add a page number to the URL e.g.
[url removed, login to view][postcode]&page=1
The script should include the facility to specifiy the maximum number of pages retrieved for each postcode e.g. 3 pages would usually be more than sufficient to capture all surgeries in a given postcode area.
Rather than specify a range for the number part of the postcode, the macro should request all possible numbers e.g. if postcode seed "PO" is requested, it should issue requests for all postcodes "PO1"-"PO99". A simple check for the text "was not recognised" in the returned page will indicate there are no surgery details to parse.
In addition, the app/script must provide the option to use the "xny" or "xxny" formats. Only a small subset of UK postcodes use this format. If specified, the app/script should try all combinations. So if postcode seed is "EC" and this option is selected (call it the NSEW option) then the macro must issue requests for all combinations e.g. request "EC1"-"EC9" and "EC1x"-EC9x" (where x is N,S,E and W).
The information required from each page (up to 5 names/addresses per page) can be identified with the following HTML:
Surgery Name is inside <span class="ResultName"><a... >Surgery Name</a></span>
Addess is inside <span class="ResultAddress">Address </span>
Phone number(s) immediately follow <span class="resultdatalabel">. Note that there may be 2 numbers, comma separated.
All extracted information is to be saved wihtout HTML i.e. plain text. The macro must extract up to 5 entries from each page and save them in the following format:
Surgery Name, Address, Town/City, County, Postcode, Phone-1, Phone-2 (optional).
This is a short-term requirement so this does not need to be anything fancy. Just enough to complete the job is sufficient. Only developers able to complete this within the next 2 weeks (preferably sooner) need apply.