Need offline data sourcing of images containing Chinese text
$120000-150000 USD
Closed
Posted about 3 years ago
Payment on delivery
Service Type: Sourcing of Images with Simplified Chinese Text
Language: Chinese (Simplified Chinese)
Data to be sourced:
Data Type: Sourcing of 1 set of images with Simplified Chinese text.
Requirements:
Type: Images (PNG, JPEG) containing text
Text Language: Chinese
Required Words per Image (Average): 25
Amount of Images: 100,000 total
Categories and Category Amounts (stay within 2% of these amounts):
i. Quotes/Greetings/Wishes – ~6000 (6%)
ii. Cartoons – ~6666 (6.67%)
iii. Apparel – ~12446 (12.45%)
iv. Billboards/Banners/Posters – ~12444 (12.44%)
v. Product Packaging – ~12444 (12.44%)
vi. Book Covers – ~12000 (12%)
vii. License Plates – ~8000 (8%)
viii. Street Signs – ~4000 (4%)
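The quota rule can be checked mechanically once images are sorted. Below is a minimal Python sketch, assuming "within 2%" means within 2% of each category's own target count (the posting does not say whether it is relative or in percentage points); the category keys are hypothetical labels. The listed targets add up to 74,000 rather than 100,000, so the sketch checks only the categories that are listed.

```python
# Target counts copied from the category list above.
TARGETS = {
    "Quotes/Greetings/Wishes": 6000,
    "Cartoons": 6666,
    "Apparel": 12446,
    "Billboards/Banners/Posters": 12444,
    "Product Packaging": 12444,
    "Book Covers": 12000,
    "License Plates": 8000,
    "Street Signs": 4000,
}
# Assumption: tolerance is relative to each category's target.
TOLERANCE_PCT = 2


def allowed_range(target: int, pct: int = TOLERANCE_PCT) -> tuple[int, int]:
    """Inclusive [low, high] count range, using integer math to avoid float rounding."""
    low = -(-target * (100 - pct) // 100)  # ceiling division
    high = target * (100 + pct) // 100     # floor division
    return low, high


def check_counts(actual: dict[str, int]) -> dict[str, bool]:
    """Map each listed category to True if its actual count is within tolerance."""
    result = {}
    for name, target in TARGETS.items():
        low, high = allowed_range(target)
        result[name] = low <= actual.get(name, 0) <= high
    return result


print(sum(TARGETS.values()))  # 74000
print(allowed_range(6000))    # (5880, 6120)
```

Integer arithmetic is used so that borderline counts are not accepted or rejected because of floating-point rounding.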
Other Instructions:
Text should not be occluded/redacted
Image resolution should be higher than 500 x 500 pixels
For scanned images, the resolution must be at least 150-300 DPI
Blurry images should be avoided
Duplicate images (the same image appearing more than once) should be rejected
Sourced images should be sorted into separate folders according to the categories listed above
Mathematical formulas, symbols, chemical formulas, and superscripts/subscripts of any kind should be avoided
Text in any language other than Simplified Chinese should be avoided
Images showing multiple pages at once should be avoided
Images with explicit content should be avoided
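Several of these instructions (minimum resolution, scan DPI, duplicates, per-category folders) can be screened automatically before manual review. The following is a rough Python sketch using only the standard library; the `Candidate` record and its fields are hypothetical, "higher than 500 x 500" is read as both sides exceeding 500 px, and 150 DPI (the lower end of the stated range) is used as the cut-off. Hashing raw bytes only catches byte-identical copies; resized or re-encoded duplicates would need a perceptual hash, which is not shown. Blur, occlusion, language, and content checks are left to other tools or manual review.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional

MIN_SIDE = 500      # assumption: both width and height must exceed 500 px
MIN_SCAN_DPI = 150  # lower end of the 150-300 DPI range


@dataclass
class Candidate:
    """Hypothetical record for one sourced image."""
    path: str
    category: str
    width: int
    height: int
    data: bytes
    scanned: bool = False
    dpi: Optional[int] = None


def rejection_reason(img: Candidate, seen: set[str]) -> Optional[str]:
    """Return why an image fails the checks, or None if it passes."""
    if img.width <= MIN_SIDE or img.height <= MIN_SIDE:
        return "resolution too low"
    if img.scanned and (img.dpi is None or img.dpi < MIN_SCAN_DPI):
        return "scan DPI too low"
    digest = hashlib.sha256(img.data).hexdigest()  # exact-duplicate check only
    if digest in seen:
        return "duplicate"
    seen.add(digest)
    return None


def triage(images: list[Candidate]):
    """Group accepted image paths by category folder; list rejects with reasons."""
    accepted: dict[str, list[str]] = {}
    rejected: list[tuple[str, str]] = []
    seen: set[str] = set()
    for img in images:
        reason = rejection_reason(img, seen)
        if reason is None:
            accepted.setdefault(img.category, []).append(img.path)
        else:
            rejected.append((img.path, reason))
    return accepted, rejected
```

The `accepted` mapping mirrors the requested layout of one folder per category, so its keys can be used directly as folder names.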
Sample Requirement: Share 5 sample images for each category, along with a cost quote for this request.
I'm interested in this job, and here's why you should choose me:
- Majored in Management Information Systems
- Working in the IT industry
- Good at gathering information from the Internet, both manually and automatically
Simplified Chinese is my mother tongue, and I have published 2 books translated from English.
As a professional data analyst, I'm confident I can source these images with high quality.
I'm an experienced data scraper and have used the Scrapy framework and other tools to crawl training data for NLP and CV.
I can also apply preliminary data cleaning to the image dataset with OpenCV and other tools to improve its quality.
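One common preliminary cleaning step for the "avoid blurry images" rule is the variance-of-Laplacian heuristic: sharp images have strong edges, so the Laplacian response varies widely, while blurry ones give a low variance. In OpenCV this is usually written as `cv2.Laplacian(gray, cv2.CV_64F).var()`. The sketch below is a dependency-free Python version on a grayscale image given as a list of rows; it uses the same 3x3 kernel as OpenCV's `ksize=1` but only over interior pixels, so values will differ somewhat from OpenCV's. The `BLUR_THRESHOLD` value is a hypothetical placeholder that would need tuning on real samples.

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian ([[0,1,0],[1,-4,1],[0,1,0]]) over interior pixels."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


BLUR_THRESHOLD = 100.0  # hypothetical cut-off; tune on real samples


def is_blurry(gray: list[list[float]], threshold: float = BLUR_THRESHOLD) -> bool:
    """Flag an image as blurry when its Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold


flat = [[128] * 10 for _ in range(10)]
checker = [[255 if (x + y) % 2 else 0 for x in range(10)] for y in range(10)]
print(is_blurry(flat), is_blurry(checker))  # True False
```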
P.S. Some requirements are hard to meet for objective reasons. For example, the average of 25 words per image is difficult, since street signs and license plates always contain fewer than 25 words.
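Because the 25-word figure is an average, short-text categories could in principle be offset by longer-text ones. The small calculation below shows what the remaining images would then need to average. It assumes 100,000 images in total, and the per-image word counts for license plates (2) and street signs (5) are hypothetical values chosen only for illustration.

```python
def required_average_elsewhere(total_images: int, target_avg: float,
                               fixed: dict[str, tuple[int, float]]) -> float:
    """Average words per image needed in all other categories.

    fixed maps category -> (image_count, assumed_words_per_image).
    """
    fixed_images = sum(n for n, _ in fixed.values())
    fixed_words = sum(n * w for n, w in fixed.values())
    remaining = total_images - fixed_images
    return (total_images * target_avg - fixed_words) / remaining


# Hypothetical word counts for the two short-text categories.
need = required_average_elsewhere(
    100_000, 25, {"License Plates": (8000, 2), "Street Signs": (4000, 5)}
)
print(need)  # 28.0
```

Under these assumptions, the other 88,000 images would need about 28 words each on average: (2,500,000 - 16,000 - 20,000) / 88,000 = 28.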
Best wishes.