We want to scrape the text and images from a 1,100-page website. This will require:
1) mapping all of the pages on the site;
2) creating a directory for each page, with a subdirectory for each page nested within another page;
3) taking a screenshot of each entire page;
4) creating a text file of all the text on each page;
5) copying all of the images on each page.
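As a rough illustration of step 2, the directory for a page could be derived from its URL path, so that nested pages land in nested directories. This is only a sketch: the function name `page_directory`, the `scrape` root folder, and the `home` folder for the site's front page are all hypothetical choices, not part of the brief.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def page_directory(url: str, root: str = "scrape") -> PurePosixPath:
    """Map a page URL to a nested directory that mirrors its URL path."""
    path = urlsplit(url).path
    # Drop empty segments so "/about/" and "/about" map to the same directory.
    segments = [segment for segment in path.split("/") if segment]
    # The front page has no path segments; "home" is an assumed placeholder name.
    if not segments:
        segments = ["home"]
    return PurePosixPath(root, *segments)


print(page_directory("https://example.com/about/team/"))
print(page_directory("https://example.com/"))
```

A page nested under another page (here `team` under `about`) ends up in a subdirectory of its parent. If the site has a real page at `/home`, the placeholder name would need to change to avoid a clash.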
This needs to be undertaken carefully and fully QA'd: no missing text, and every image copied across. We estimate that the tasks above will take one minute per page. Someone else will need to QA the work and confirm it is all correct. We will then want each scraped page to be recorded in a table with the URL, the name of the directory where the data is saved, and the status (e.g. Scraped, QA'd). We will provide this table.
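For illustration only, the tracking table might look like the CSV sketch below. The client will supply the real table, so the column names (`url`, `directory`, `status`), the extra `Pending` status, and both helper functions are assumptions.

```python
import csv
import io

# Assumed column names; the real table will be provided by the client.
FIELDS = ["url", "directory", "status"]
# "Scraped" and "QA'd" come from the brief; "Pending" is an assumed extra state.
STATUSES = ("Pending", "Scraped", "QA'd")


def write_tracking_table(rows, out):
    """Write one row per page, rejecting any status outside STATUSES."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        if row["status"] not in STATUSES:
            raise ValueError(f"unknown status: {row['status']!r}")
        writer.writerow(row)


def pages_awaiting_qa(rows):
    """Return the URLs that have been scraped but not yet QA'd."""
    return [row["url"] for row in rows if row["status"] == "Scraped"]


rows = [
    {"url": "https://example.com/", "directory": "scrape/home", "status": "QA'd"},
    {"url": "https://example.com/about/", "directory": "scrape/about", "status": "Scraped"},
]
buffer = io.StringIO()
write_tracking_table(rows, buffer)
print(buffer.getvalue())
print(pages_awaiting_qa(rows))
```

Keeping the status values to a fixed set makes it easy for the QA reviewer to filter for pages that still need checking.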
We will be building a new WordPress site to rebuild the existing site on, and will want each page reassembled there. We estimate it will take about 3 minutes per page to reassemble. This will be phase two; we would be happy to award the contract for both phases to the same firm, or they could be done separately.
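Taking the two per-page estimates in this brief at face value, the total effort for each phase can be worked out as below. This is a back-of-the-envelope sketch: QA time is left out because the brief gives no estimate for it, and the helper name `total_hours` is hypothetical.

```python
PAGES = 1100                  # site size stated in the brief
SCRAPE_MINUTES_PER_PAGE = 1   # phase one estimate from the brief
REBUILD_MINUTES_PER_PAGE = 3  # phase two estimate from the brief


def total_hours(pages: int, minutes_per_page: float) -> float:
    """Convert a per-page estimate in minutes into total hours."""
    return pages * minutes_per_page / 60


scrape_hours = total_hours(PAGES, SCRAPE_MINUTES_PER_PAGE)
rebuild_hours = total_hours(PAGES, REBUILD_MINUTES_PER_PAGE)

print(f"Phase one (scraping): {scrape_hours:.1f} hours")      # 18.3 hours
print(f"Phase two (reassembly): {rebuild_hours:.1f} hours")   # 55.0 hours
```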
We can show you the site once you have been shortlisted.
12 freelancers are bidding an average of $475 for this job
Hi there, I need to know more about the complete job description. I have good experience in all types of back-end work and spreadsheet tasks. Let's discuss and work together. Thanks, Prakash