I am looking for Java developers who can write Java code* that screen-scrapes a specific site. I am only interested in collecting the comments found on that site, for academic research purposes. It is an Arabic news site with major sections such as Political News, Financial News, Sports News, Technology News, and so on. Each major section has subsections; for example, Political News has the subsections Middle East News, Global News, and others. Each subsection contains news items, and each news item may or may not have comments. The comments are paged.
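To illustrate what "paged" implies for the scraper, here is a minimal sketch of a loop that walks comment pages for one news item. It assumes, hypothetically, that pages are numbered from 1 and that an empty page marks the end; the real site may use a "next" link instead. The class name PagedComments, the fetchPage callback, and the maxPages cap are illustrative names, not part of the site or any library; the actual HTTP request and HTML parsing would live inside fetchPage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class PagedComments {
    /**
     * Collects comments from pages 1, 2, 3, ... of a single news item.
     * fetchPage is a hypothetical callback that returns the comments on
     * the given page number. The loop stops at the first empty (or null)
     * page, or after maxPages pages as a safety cap.
     */
    static List<String> collectAll(IntFunction<List<String>> fetchPage, int maxPages) {
        List<String> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            List<String> comments = fetchPage.apply(page);
            if (comments == null || comments.isEmpty()) {
                break; // assumption: an empty page means there are no more comments
            }
            all.addAll(comments);
        }
        return all;
    }
}
```

The cap guards against a site that keeps returning the last page for any page number, which would otherwise loop forever.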
I need the scraping code to collect the comments and put them in the following structure:
ID | Section Title | Subsection Title | News Item Title | URL | Comment | Comment Author | Timestamp
1 | Political News | Global News | Mission impossible diplomacy in Beijing | [url removed, login to view] | this is a comment | someone | 2012-03-01 12:00:00
2 | Political News | Global News | Mission impossible diplomacy in Beijing | [url removed, login to view] | this is 2nd comment | som2 | 2012-03-01 12:00:00
3 | Financial News | Banking | IMF to support Some country | [url removed, login to view] | this is a comment | someone | 2012-05-11 12:00:00
I will run the script daily, and the output should be a CSV file.
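As a rough sketch of the CSV output step, the following uses only the Java standard library. The column names come from the structure above; the class name CommentCsv and its methods are hypothetical. It assumes comma-separated output with RFC 4180-style quoting, since comments can themselves contain commas, quotes, or line breaks, and it writes UTF-8 so that Arabic text survives intact.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CommentCsv {
    // Column order taken from the structure requested in the posting.
    static final String[] HEADER = {
        "ID", "Section Title", "Subsection Title", "News Item Title",
        "URL", "Comment", "Comment Author", "Timestamp"
    };

    /** Quotes a field if it contains a comma, quote, or line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\""); // double embedded quotes
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Joins the escaped fields into one CSV line (no line terminator). */
    static String toLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    /** Writes the header and all rows to the file as UTF-8, replacing any existing file. */
    static void write(Path file, List<String[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(toLine(HEADER));
            out.write("\r\n");
            for (String[] row : rows) {
                out.write(toLine(row));
                out.write("\r\n");
            }
        }
    }
}
```

For a daily run, one option is to pass a dated file name (for example comments-2012-03-01.csv) so that each run produces its own file rather than overwriting the previous one.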
The code must be provided.
* If you prefer to code in a different language, such as Python, you may still bid on this project, but keep in mind that you must then deliver annotated and explained code that can be used and run by someone who only knows Java.
18 freelancers are bidding an average of $156 for this job
Expert scraper here. I am a perfect fit for this job, since I also know only Java. I have coded all my scrapers in Java only. I am confident I can handle this job and provide the required output (CSV). Please check your PMB.