We are looking for a company, group, or individual with experience in Talend and HDP (Hortonworks Data Platform).
This is a prototype project to verify the technical issues before the real project begins.
Therefore, if the result of this project is good, we will also hire you for the real project.
If you have experience with Talend, HDP, and Java, this project is NOT hard to perform.
Only experienced candidates are welcome.
1. Using user-defined keywords, extract specific data (mainly, area) from a specified website and store it on my server as a text file or in a DB format.
2. Using a scheduled job, load the extracted data into HDP. The job must be designed with Talend Open Studio.
- The processing conditions for the job will be defined through discussion as you design the job.
- Create 3~5 tables in a MySQL DB for the results.
- The transfer from HDP to the MySQL DB is performed by Talend.
- The design of the tables will be discussed when you reach that step.
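
As a rough illustration of step 1 above, the sketch below filters crawled records by user-defined keywords and writes the matches to a tab-delimited text file, a format that a Talend delimited-file input can read. The field names (including `url`), the function names, and the tab delimiter are all assumptions made for illustration; the actual layout is to be agreed in discussion.

```python
import csv

# Hypothetical column layout; the real one is to be agreed in discussion.
FIELDS = ["url", "title", "description", "keywords", "text",
          "page_size", "last_modified"]


def matches_keywords(record, keywords):
    """Return True if any keyword occurs (case-insensitively) in the text fields."""
    searched = ("title", "description", "keywords", "text")
    haystack = " ".join(str(record.get(f, "")) for f in searched).lower()
    return any(k.lower() in haystack for k in keywords)


def write_records(path, records, keywords):
    """Write matching records to a tab-delimited file and return how many were kept."""
    kept = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                                extrasaction="ignore")
        writer.writeheader()
        for record in records:
            if matches_keywords(record, keywords):
                writer.writerow(record)  # missing fields are written as ""
                kept += 1
    return kept
```

A call such as `write_records("extracted.tsv", records, ["area"])` would keep only the records that mention "area"; the file name here is likewise only an example.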
** Reference: The configuration of the prototype system is attached.
1. First, you should install all the components on my servers remotely (TeamViewer will be used).
2. You should work on my servers from the beginning.
3. You must walk my staff through all the installation steps as you perform them (Skype will be used for this communication).
: This is the key condition for payment.
1. Linux Web Crawler: you should prepare a suitable crawler that satisfies the needs below.
: This crawler must be provided to us, and we may ask for additional customization for the purposes of the project (discussion needed).
- Search the specified website by keywords (including subdirectories)
- Extraction should repeat on a configurable schedule.
- Items to extract
b. meta tags (title, description, keywords)
c. plain text between to tag
d. page size
e. last modified date value
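
The items listed above could be pulled from a single fetched page roughly as follows. This is a minimal sketch using only the Python standard library, not a full crawler: the names `PageItemParser` and `extract_items` are hypothetical, "plain text" is assumed to mean all visible text outside `script` and `style` elements, page size is assumed to be the byte length of the response body, and the last modified date is assumed to come from the HTTP `Last-Modified` response header.

```python
from html.parser import HTMLParser


class PageItemParser(HTMLParser):
    """Collects the title, selected meta tags, and visible text of one page."""

    SKIPPED = {"script", "style"}  # element content that is not visible text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_items(html_bytes, headers, encoding="utf-8"):
    """Return the extracted items for one page as a dict.

    html_bytes is the raw response body; headers is a mapping of HTTP
    response headers (assumed to use the key "Last-Modified").
    """
    parser = PageItemParser()
    parser.feed(html_bytes.decode(encoding, errors="replace"))
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "text": " ".join(parser.text_parts),
        "page_size": len(html_bytes),
        "last_modified": headers.get("Last-Modified", ""),
    }
```

Fetching pages, following links into subdirectories, and repeating on a timer are left out of the sketch; any of those could be layered on top once the exact extraction rules are agreed.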
2. Talend Open Studio for Big Data (free version)
- Already installed on my machine, but you can use your own if you need to.
- However, the resulting Talend project files must be provided to us.
3. HDP (free version)
- Already installed on my machine, and you should work on it during this project.
- You can reinstall it or change its configuration if you need to.
- The full history of changes and modifications must be handed over to us.
** This project will start this Wednesday (22nd June) at 9:30am (GMT+9).
** Bidding will close this Tuesday (21st June) at 11pm (GMT+9).
** The desired due date for completing this project is this Sunday at 2pm (GMT+9).