This is an interesting long-term development job. The task involves submitting data to a search box and then scraping the results to produce an XML feed. We need the system to be as generic as possible, but we understand the problems associated with scraping. We will accept a reasonable bid and expect a fast turnaround. This job is long-term, with an expected project duration of two weeks.

The number of sites to scrape is as follows:
39 search engine results for affiliates dealing with flights
10 search engine results for affiliates dealing with car rental
24 search engine results for affiliates dealing with hotel rooms

The website is structured on a modular basis. We will therefore tell you which search module to develop, and it will then be beta tested on our development server. This process will be repeated for all the other search boxes. We expect that once a generic module has been built, the rest of the process will be reasonably simple, as all you have to do is write the scraping script.

We have written permission from all of the website owners, making this web scraping perfectly legal.

The server technology is PHP, and we prefer the scripts to be built using Perl or Python. Please send us a description of how you intend to build the script and in which language, and justify your reasons for doing so.

By bidding on this job, you understand that we will need to test the scripts on our development server during the expected two-week duration. A contract and non-disclosure agreement will have to be completed in order to protect both parties. If you don't feel comfortable testing with us on our development server, please send us a message explaining why and suggesting a better method.

If you PM me, I'll send you more information and explanatory code. If successful, we promise further work, as this forms part of a major project. So if you want to specialise in web scraping for us, please bid now!
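To illustrate the "generic module plus one scraping script per site" idea described above, here is a minimal Python sketch. It is only one possible layout, not our existing code: the names PARSERS, register, scrape and the "example-airline" site key are all hypothetical, and the sample parser reads a made-up "FLIGHT|carrier|number" line format rather than any real site's markup. Fetching the page is left out.

```python
from typing import Callable, Dict, List

# All names here are hypothetical and for illustration only.
FlightRecord = Dict[str, str]
Parser = Callable[[str], List[FlightRecord]]

# Generic part: a registry mapping a site key to its scraping function.
PARSERS: Dict[str, Parser] = {}


def register(site: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a site-specific parser under a site key."""
    def decorator(func: Parser) -> Parser:
        PARSERS[site] = func
        return func
    return decorator


# Site-specific part: one small function per affiliate site.
@register("example-airline")
def parse_example_airline(page: str) -> List[FlightRecord]:
    # Placeholder logic on an invented line format; a real parser
    # would walk the site's actual result markup instead.
    records = []
    for line in page.splitlines():
        if line.startswith("FLIGHT|"):
            _, carrier, number = line.strip().split("|")
            records.append({"CarrierName": carrier, "FlightNo": number})
    return records


def scrape(site: str, page: str) -> List[FlightRecord]:
    """Generic entry point: look up the site's parser and apply it."""
    if site not in PARSERS:
        raise ValueError(f"no parser registered for site {site!r}")
    return PARSERS[site](page)


sample_page = "header text\nFLIGHT|Example Air|EX123\n"
results = scrape("example-airline", sample_page)
```

With a layout like this, adding another site would mean writing one more registered function while the generic entry point stays unchanged.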
1) Scrape the websites using the WWW::Mechanize CPAN module. Below are some examples: [url removed, login to view] [url removed, login to view] [url removed, login to view] [url removed, login to view]

2) The results of those search boxes need to be transferred into XML feeds. I've attached a document detailing the XML format that has to be output. Please ignore it being SOAP. As long as it's XML created by Perl/Python, it's up to you how it's output. The most important tags are (everything else can be defaulted):
CarrierName - The airline name.
FlightNo - The flight number.
DepartureAirport - Where we're flying from.
DestinationAirport - Where we're flying to.
DepartureDate - Date the plane takes off.
DepartureTime - Time the plane takes off.
ArrivalDate - Date the plane lands.
ArrivalTime - Time the plane lands.
SeatClass - Class of seat: 1=first, 2=business, 3=economy.
Currency - Currency, e.g. USD, CSD, GBP, EUR.
TotalCosts - Total cost, e.g. for 2 adults.
CostsPerAdult - Price per adult.
CostsPerChild - Price per child.
CostsPerBaby - Price per baby.
Taxes - Any airport taxes incurred.
Other - Any other relevant costs (ask me first).

3) The XML feeds are collated into one, so that we can display the results on a webpage.
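As a rough illustration of steps 2) and 3) above, the Python sketch below builds an XML feed from scraped records and collates several feeds into one, using the standard xml.etree.ElementTree module. The tag names come from the list above. The "Flights" and "Flight" wrapper elements, the empty-string default for missing values, and the function names are assumptions, since the attached format document is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Tag names are taken from the list above. The wrapper element names
# ("Flights", "Flight") and the empty default are assumptions.
FLIGHT_TAGS = [
    "CarrierName", "FlightNo", "DepartureAirport", "DestinationAirport",
    "DepartureDate", "DepartureTime", "ArrivalDate", "ArrivalTime",
    "SeatClass", "Currency", "TotalCosts", "CostsPerAdult",
    "CostsPerChild", "CostsPerBaby", "Taxes", "Other",
]


def build_feed(records):
    """Turn a list of dicts (one per flight) into a <Flights> element."""
    root = ET.Element("Flights")
    for record in records:
        flight = ET.SubElement(root, "Flight")
        for tag in FLIGHT_TAGS:
            # Missing values are defaulted to an empty string.
            ET.SubElement(flight, tag).text = str(record.get(tag, ""))
    return root


def collate(feeds):
    """Merge several <Flights> feeds into a single <Flights> element."""
    combined = ET.Element("Flights")
    for feed in feeds:
        combined.extend(list(feed))
    return combined


# Two feeds, as if produced by two different site scrapers (made-up data).
feed_a = build_feed([{"CarrierName": "Example Air", "FlightNo": "EX123",
                      "SeatClass": 3, "Currency": "GBP"}])
feed_b = build_feed([{"CarrierName": "Sample Jet", "FlightNo": "SJ456",
                      "SeatClass": 2, "Currency": "EUR"}])

merged = collate([feed_a, feed_b])
xml_text = ET.tostring(merged, encoding="unicode")
```

The resulting xml_text string is what a web page (for example a PHP front end) would consume. The real element names and defaults should follow the attached format document.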
The platform is Linux, with Perl or Python available.