Closed

Speedy high volume web page scraper

I have a software product that reads online text and creates a detailed profile. Each profile is then compared with other profiles so that recommendations can be served.

The profiling engine is a single-server Java application that is served off Tomcat. It has a REST API.

Up till now, the profiles have reached my server via full-text RSS feeds or XML files (for which I then write a custom parser in Java).

I now have a project where I will receive a high volume of URLs (around 80,000 arriving during the course of the day) and will need to 'scrape' the text off these pages before passing it to the profiling engine.
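
As a rough sizing check, 80,000 URLs spread evenly over 24 hours averages a little under one URL per second. The sketch below works that out and estimates how many concurrent fetches would be needed. The class and method names are hypothetical, and the 2-second per-page cost is an assumption for illustration, not a measured figure.

```java
import java.time.Duration;

// Hypothetical sizing helper; not part of the existing profiling engine.
final class ThroughputEstimate {

    /** Average URLs per second if urlsPerDay arrive evenly over the given window. */
    static double urlsPerSecond(long urlsPerDay, Duration window) {
        return (double) urlsPerDay / window.getSeconds();
    }

    /** Concurrent fetches needed to sustain a rate when each page takes secondsPerPage. */
    static int workersNeeded(double ratePerSecond, double secondsPerPage) {
        return (int) Math.ceil(ratePerSecond * secondsPerPage);
    }

    public static void main(String[] args) {
        double rate = urlsPerSecond(80_000, Duration.ofHours(24));
        System.out.println("average URLs per second: " + rate);
        // Assumption: fetching and extracting one page takes about 2 seconds.
        System.out.println("workers needed: " + workersNeeded(rate, 2.0));
    }
}
```

If the URLs arrive in bursts rather than evenly, the peak rate is what matters, so the window passed to `urlsPerSecond` would need to be shortened accordingly.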

For this project, operational speed is very important: the 'scraper' needs to be fast enough to handle the expected volume, but also accurate enough that most of the page 'junk' does not adversely affect the profile that is made.
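
One common way to get this kind of speed is to fetch pages in parallel with a fixed-size thread pool. The following is a minimal sketch under stated assumptions: `ScrapePool`, `fetchAll`, and the pluggable `fetcher` function are hypothetical names, and the real fetcher would be an HTTP call with timeouts. Pages whose fetch fails are skipped so that one bad URL does not stall the batch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: fetches many URLs concurrently with a bounded pool.
final class ScrapePool {

    /** Returns url -> page content for every URL whose fetch succeeded. */
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit every fetch up front so that they run in parallel.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.put(url, pool.submit(task));
            }
            // Collect results in submission order, dropping failed fetches.
            Map<String, String> pages = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                try {
                    pages.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    // The fetcher threw for this URL; skip it.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while fetching", e);
                }
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the fetcher as a parameter means the pool logic can be exercised with a stub before any network code exists.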

Ideally the web scraper will take the page 'title' and 'article' text and use these for profiling.

However, there will not be a standard format for these pages and so the web scraper needs to be fairly generic too.
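
To illustrate what a generic 'title plus article' extractor might look like, here is a minimal sketch using only the Java standard library. It is a heuristic, not a definitive method: it assumes article text lives in `<p>` elements of at least eight words and that `script`, `style`, `nav`, `header`, `footer` and `aside` blocks are junk. The class name, method names, and threshold are all hypothetical. Regex-based HTML handling is brittle and HTML entities are not decoded here, so a real scraper would more likely build on a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic extractor for pages with no standard format.
final class ArticleExtractor {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", FLAGS);
    // Blocks assumed to hold page 'junk' rather than article text.
    private static final Pattern NOISE = Pattern.compile(
            "<(script|style|nav|header|footer|aside)\\b[^>]*>.*?</\\1>", FLAGS);
    private static final Pattern PARAGRAPH =
            Pattern.compile("<p\\b[^>]*>(.*?)</p>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Assumed threshold: shorter paragraphs are treated as captions or menus.
    private static final int MIN_WORDS = 8;

    /** Returns the text of the first title element, or an empty string. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? clean(m.group(1)) : "";
    }

    /** Returns the long paragraphs outside junk blocks, one per line. */
    static String extractArticle(String html) {
        String body = NOISE.matcher(html).replaceAll(" ");
        List<String> kept = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(body);
        while (m.find()) {
            String text = clean(m.group(1));
            if (text.split("\\s+").length >= MIN_WORDS) {
                kept.add(text);
            }
        }
        return String.join("\n", kept);
    }

    /** Strips inline tags and collapses whitespace. */
    private static String clean(String fragment) {
        return TAG.matcher(fragment).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }
}
```

The word-count threshold trades accuracy for generality: raising it removes more junk but may drop short genuine paragraphs.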

Get in contact if you feel you can achieve this, but please note that you must have experience in this field!

Skills: HTML, Java, PHP


About the Employer:
(0 reviews) London, United Kingdom

Project ID: #1010422

11 freelancers are bidding on average $1095 for this job

IMSeriousBidder

Hello, please check your inbox. Thanks.

$1380 USD in 15 days
(112 reviews)
7.4
$750 USD in 5 days
(48 reviews)
6.3
clearware

Hello, we have extensive experience in web scraping. Detailed information about our experience will be sent by PM. We can handle between 100K and 150K web sources (URLs) per day (we have had a few servers doing this for years). Looking...

$960 USD in 25 days
(2 reviews)
5.8
priboy

Please check PMB

$1200 USD in 15 days
(10 reviews)
5.6
SamvitInfotech1

Hello, please check PMB.

$1500 USD in 12 days
(1 review)
4.5
akhter1987

Can we discuss? Refer to PMB.

$750 USD in 7 days
(9 reviews)
3.6
KiPa

I can help you really quickly! Check your inbox.

$750 USD in 3 days
(3 reviews)
3.5
lenzai

See PM for details.

$1500 USD in 20 days
(6 reviews)
3.5
opensourcesoft

Hi, I have 4+ exp. I believe in quality output in programming, have done several projects related to yours, and can easily do this job. Please check my PMB and be a part of my services forever. I also have good clie...

$1450 USD in 10 days
(0 reviews)
0.0
kashinath01

Please see PM.

$800 USD in 20 days
(0 reviews)
0.0
asiletto

I have long experience in J2EE and have written many scrapers in Java using HtmlUnit or Jakarta Commons.

$1000 USD in 10 days
(0 reviews)
0.0