Cancelled

URL Detection System

We require a simple but effective URL detection website/service, built in PHP.

The service will take a pre-formatted CSV file that contains a listing of businesses with the following base-level information:

Business Name, Category, Street Address, Town/City Name, Postcode(ZIP), State, Phone Number
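As a minimal sketch of the input step, the CSV could be read as follows. This is written in Python for illustration only and would need porting to PHP. It assumes the file has no header row and that columns arrive in the order listed above; the key names in `FIELDS` and the sample record are hypothetical.

```python
import csv
import io

# Assumed column order, taken from the field list in the posting.
# The key names themselves are hypothetical.
FIELDS = ["name", "category", "street", "city", "postcode", "state", "phone"]

def read_businesses(handle):
    """Read business records from an open CSV file handle (no header row assumed)."""
    reader = csv.DictReader(handle, fieldnames=FIELDS)
    # Trim whitespace, treat missing columns as empty, ignore any extra columns.
    return [
        {key: (value or "").strip() for key, value in row.items() if key in FIELDS}
        for row in reader
    ]

# Hypothetical sample record, for illustration only.
sample = io.StringIO(
    "Seaford Florists Limited,Florists,12 Station St,Seaford,3198,VIC,(03) 9555 0100\n"
)
businesses = read_businesses(sample)
```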

"Category" will be a basic description of the business's line of trade (like Hairdressers), which might be a useful additional piece of information to help limit detection results.

The service will take this list and then for each entry in the list it will create a list of up to 30 'candidate' URLs that belong to this business:

1. Attempt to directly 'guess' the most likely domain name based on the business name (using a couple of simple rules such as removing plurals, e.g. 'florists' > 'florist', removing common words like 'limited', etc.)

2. Connect to the top 3 search engines (Google, Bing, Yahoo) and conduct a couple of searches (using the name on its own and then combined with the phone number, city name, and other potential combinations) to capture the top 5-10 URLs returned that most likely match this business
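The domain-guessing rule in step 1 could be sketched roughly as below. This is a Python illustration, not a prescribed implementation: the stop-word list, the naive plural rule, and the default TLDs are all assumptions (the posting only names 'limited' and 'florists' as examples), and the business name used is hypothetical.

```python
import re

# Assumed stop words; the posting only gives 'limited' as an example.
STOP_WORDS = {"limited", "ltd", "pty", "inc", "the", "and", "co"}

def singularize(word):
    """Naive plural stripping, e.g. 'florists' -> 'florist'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def guess_domains(business_name, tlds=(".com", ".com.au")):
    """Return candidate domain names guessed from a business name.

    The TLD list is an assumption; adjust it to the target market.
    """
    words = re.findall(r"[a-z0-9]+", business_name.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    if not kept:
        return []
    stems = []
    # Try the name as given first, then with plurals stripped.
    for stem in ("".join(kept), "".join(singularize(w) for w in kept)):
        if stem not in stems:
            stems.append(stem)
    return [stem + tld for stem in stems for tld in tlds]

guesses = guess_domains("Seaford Florists Limited")
```

Each guess would still need to be checked (for example, by attempting to fetch it) before it is added to the candidate list, since a guessed name may not resolve or may belong to someone else.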

Then, based on this candidate list of URLs for the business, the service must connect to and scrape each detected website and try to match the provided fields (Name, Address, Postcode, Phone Number, etc.) against the text appearing anywhere on the candidate website. The number of values detected on the candidate URL contributes to a 'confidence' score indicating how likely it is that the website belongs to the business in question.
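One possible shape for the confidence score is a weighted fraction of the fields found in the page text. The sketch below is in Python for illustration; the field names and the weights in `WEIGHTS` are assumptions, not part of the brief, and the sample record and page text are hypothetical. Fetching and HTML-stripping the page are left out.

```python
import re

# Hypothetical weights: name and phone are treated as stronger evidence.
WEIGHTS = {"name": 3, "address": 2, "city": 1, "postcode": 2, "phone": 3}

def normalize(text):
    """Lowercase and collapse everything except letters and digits to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def confidence(business, page_text):
    """Return a score in [0, 1]: weighted share of supplied fields found on the page."""
    page = " " + normalize(page_text) + " "
    page_digits = re.sub(r"\D", "", page_text)
    score = 0
    total = 0
    for field, weight in WEIGHTS.items():
        value = business.get(field, "")
        if not value:
            continue  # fields missing from the input do not count against the site
        total += weight
        if field == "phone":
            # Compare digits only, so spacing and brackets do not matter.
            digits = re.sub(r"\D", "", value)
            found = bool(digits) and digits in page_digits
        else:
            needle = normalize(value)
            found = bool(needle) and f" {needle} " in page
        if found:
            score += weight
    return score / total if total else 0.0

business = {
    "name": "Seaford Florist",
    "address": "12 Station St",
    "city": "Seaford",
    "postcode": "3198",
    "phone": "(03) 9555 0100",
}
page = "Welcome to Seaford Florist! Visit us in Seaford 3198 or call 03 9555 0100."
result = confidence(business, page)
```

Here the name, city, postcode, and phone number match but the street address does not, so the score is 9 out of a possible 11. Matching digits across the whole page is deliberately loose and can produce false positives; a stricter matcher would be a reasonable refinement.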

Intelligent parsing of both the incoming data and the scrape results will yield the most effective outcomes. For example, we may be able to supply both Post Office and physical address details for multi-address checks/matching, and phone numbers will be provided with bracketed area codes that can optionally be stripped for additional phone number checks.
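The optional area-code stripping could look something like the following Python sketch. It assumes the area code is the only bracketed part of the number, as the brief describes; the function name and the sample number are hypothetical.

```python
import re

def phone_variants(raw):
    """Return digit-only forms of a phone number: with and without the bracketed area code."""
    variants = []
    full = re.sub(r"\D", "", raw)
    if full:
        variants.append(full)
    # Drop any bracketed group (assumed to be the area code), then keep digits only.
    local = re.sub(r"\D", "", re.sub(r"\([^)]*\)", "", raw))
    if local and local not in variants:
        variants.append(local)
    return variants

variants = phone_variants("(03) 9555 0100")
```

Each variant can then be searched for in the digit-only form of the scraped page, so that a site listing the number without its area code still counts as a match.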

The results (returned or saved to disk) will then be a list of the businesses, along with all the candidate URLs and the confidence rating attributed to each.

Based on the results of your diligent work, you may subsequently win a follow-on project to expand this into a much larger ($3k-$5k), comprehensive interface that allows interactive viewing of the returned candidate URL data, selection of the site to scrape, and then complex parsing of the scraped data (images, text blocks, external feeds, etc.). Progression to that phase depends on how well this first stage is built and the accuracy/depth of results it achieves.

FYI, there will be a simple 'blacklist' of URLs that the detection (through search engines) should ignore. We will load this with a list of common internet directories, which can easily be mistaken for a business's home page because they list the business details as part of their directory.
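A minimal sketch of the blacklist check, in Python for illustration, might compare each candidate URL's host against the blacklisted domains. It assumes the blacklist holds bare domain names and that candidate URLs include a scheme; the domains shown are placeholders, not real directory sites.

```python
from urllib.parse import urlsplit

def is_blacklisted(url, blacklist):
    """True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)

def filter_candidates(urls, blacklist):
    """Drop candidate URLs that point at blacklisted directory sites."""
    return [url for url in urls if not is_blacklisted(url, blacklist)]

# Placeholder domains, for illustration only.
blacklist = {"directory.example", "listings.example"}
candidates = [
    "https://www.directory.example/vic/seaford-florist",
    "http://seafordflorist.example/",
    "https://au.listings.example/florists",
]
kept = filter_candidates(candidates, blacklist)
```

Matching on the host rather than the full URL means a single blacklist entry covers every listing page on that directory.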

You will deliver this project in your own hosting environment and allow us to upload some sample (50-100) record sets to see the tool's efficiency prior to commitment and payment (at which time you will then release the full source code for us to host in our own environment).

Happy bidding!

Skills: PHP, Web Scraping

About the Employer:
(38 reviews) Seaford, Australia

Project ID: #1040367

3 freelancers are bidding an average of $800 for this job

priboy

Please check PMB

$1200 USD in 10 days
(9 Reviews)
5.6
balagesoft

please check your pmb.

$499 USD in 15 days
(6 Reviews)
4.0
temkin

our team ready for this job

$700 USD in 12 days
(0 Reviews)
0.0