Scrape text only from huge sites and store it in MS Word or RTF format
$30-250 USD
Closed
Posted about 9 years ago
Paid on delivery
I need a PHP or Python program that crawls and scrapes a whole website (all its pages) and puts the text of the pages into a file in MS Word or RTF format.
A solution for simple sites is already done (with GAS), but huge sites need a better solution.
Examples of huge sites:
[login to view URL] (~2M words)
[login to view URL] (larger)
[login to view URL]
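For sites of this size, one possible approach is a breadth-first crawl that yields one page at a time, so the caller can append each page's text to the output file instead of holding the whole site in memory. The sketch below is an assumption about how such a tool could be structured, not a finished solution: `crawl`, `LinkCollector`, `fetch_http`, and the `max_pages` limit are hypothetical names, and it leaves out robots.txt handling, rate limiting, and retries, which a real crawler would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_http(url, timeout=10):
    """Download a page as text; return None on network errors."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, fetch=fetch_http, max_pages=1000):
    """Yield (url, html) for same-host pages in breadth-first order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # only URLs are kept in memory, not page bodies
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched += 1
        yield url, html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
```

Passing `fetch` as a parameter keeps the crawl logic testable without network access. For example, `crawl("http://example.test/", fetch=pages.get)` walks a dictionary `pages` that maps URLs to HTML strings.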
I need both the code and a service solution that work together seamlessly.
If you would like to propose a different language for this tool, feel free to do so.
When applying please provide:
- an outline, sketch, or description of the workflow (what are you going to do, and how are you going to do it?)
- an overview of resource consumption