
Completed
Paid on delivery
I have a working Python pipeline that already pulls clean text from scanned voter lists in Tamil and English by combining custom image pre-processing, a light AI layer, and Tesseract OCR. The next milestone is to make the very same code read Telugu with comparable performance. My target is 99% character-level accuracy across the entire page, not just names or voter IDs. Once Telugu is solid, we will roll the same approach out to the rest of the major Indian scripts (Hindi, Bengali, Marathi, Malayalam, Kannada, Assamese, Gujarati, Punjabi and Odia), but this job is strictly about nailing Telugu first.

What you'll work with
• Current codebase (Python, OpenCV, pytesseract, a few custom TensorFlow helpers)
• A curated set of high-resolution scanned PDFs and images of Telangana and Andhra Pradesh voter rolls for training / validation
• My existing language-agnostic pre- and post-processing modules, which you are free to tweak

Key responsibilities
1. Train or fine-tune a Tesseract language data set (or an alternative open-source OCR engine if it yields better accuracy) for printed Telugu voter-list fonts.
2. Integrate the new language file into the existing code, keeping the same API and CLI behaviours.
3. Validate against my test suite and push accuracy to ≥99% on a per-character basis; document any edge-case failures and patches.
4. Hand over updated code, trained data files, and a concise technical note explaining changes and future-language scaling steps.

Acceptance criteria
• ≥99% per-character accuracy on the provided blind test batch
• Same or faster processing speed than the current Tamil run
• The Telugu code will be a separate version; the same code need not read Tamil

House number accuracy is extremely important.

I will prioritise freelancers who can point me to prior OCR/Tesseract projects in Indian scripts and explain, in a few lines, how they usually drive accuracy past the 95% mark.
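The brief does not say how per-character accuracy is scored, so the following is only a minimal sketch of one common convention: one minus the character error rate, where the error count is the Levenshtein edit distance between the ground-truth text and the OCR output. The function names `levenshtein` and `char_accuracy` are hypothetical and not part of the client's codebase. It counts Unicode code points after NFC normalization, which is an assumption; a Telugu syllable spans several code points, so a grapheme-level score would differ.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion from ref
                            curr[j - 1] + 1,     # insertion into ref
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Return 1 - CER, clamped at 0, counting Unicode code points (assumption)."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


if __name__ == "__main__":
    # One dropped vowel sign in a six-code-point word.
    print(char_accuracy("తెలుగు", "తెలగు"))
```

Under this definition a 99% target allows at most one edit per 100 ground-truth characters, so a single wrong digit in a short house number is easy to hide in a page-level average; reporting the score per field as well as per page would expose that.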
Project ID: 40187457
7 proposals
Remote project
Active 4 mos ago

This is ZenByte, an AI & OCR-focused agency with hands-on experience training Tesseract and custom OCR pipelines for Indian scripts, including Telugu, Kannada, Hindi, and Tamil government documents. We're confident about hitting your ≥99% character-level accuracy target for Telugu voter rolls, including house numbers, which we know are usually the hardest.
₹2,000 INR in 4 days
0.0
0.0
7 freelancers are bidding on average ₹39,526 INR for this job

Hi there, Your "Extend OCR for Telugu Voter Lists" job looks interesting and matches the kind of work I usually do with Python, Machine Learning (ML), OCR, Artificial Intelligence, Image Processing, OpenCV, Natural Language Processing, Text Recognition. I can help you get a clean result and keep you updated at each step. You can see similar projects here: https://www.freelancer.com/u/msaadarshadkhan When would you like to start and do you have any examples of styles you like?
₹1,500 INR in 2 days
2.9
2.9

Hi, I'm a professional web developer with 3 years of experience, and I also work in various programming languages. I have read your project description and can deliver high-quality results on time. I'm excited to work with you and ensure your satisfaction. Best regards, Vivek Pratap Singh
₹7,000 INR in 7 days
0.0
0.0

Hi There, I understand you're looking to enhance your Python pipeline for extracting text from scanned voter lists in Telugu, aiming for a remarkable 99% character-level accuracy. My approach includes fine-tuning Tesseract and leveraging my experience with OCR, ensuring optimal performance for Indian scripts. I am Raja Hunain, with over 2 years of expertise in Python, Machine Learning, OCR, Artificial Intelligence, Image Processing, OpenCV, and Natural Language Processing. I have successfully worked on OCR-related projects, focusing on accuracy improvements in multilingual environments. Here’s my portfolio: https://www.freelancer.com/u/rajahunainweb I would love to discuss how my skills can contribute to achieving your goals for this project. Thank you, Regards, Raja Hunain
₹248,184 INR in 12 days
0.0
0.0

Hi there, I am excited about the opportunity to work on enhancing your Python pipeline to incorporate Telugu text extraction with high accuracy. With my expertise in OCR and Indian scripts, I am confident in delivering exceptional results for your project.

Having successfully completed similar OCR projects in Indian scripts, I understand the complexities involved in achieving accuracy above the 95% mark. My approach involves meticulous training and fine-tuning of Tesseract language data sets, coupled with rigorous validation against diverse test suites to ensure thorough coverage of edge cases.

To achieve the target of ≥99% per-character accuracy in Telugu, I will carefully integrate the new language file into the existing codebase while maintaining consistent API and CLI behaviors. I will leverage my experience in working with Python, OpenCV, pytesseract, and custom TensorFlow helpers to optimize processing speed without compromising accuracy.

I am eager to collaborate with you on this milestone and provide you with updated code, trained data files, and a detailed technical note outlining the implemented changes and scalability for future language expansions. Let's work together to elevate your pipeline's capabilities to read Telugu text efficiently and accurately. Looking forward to the opportunity to contribute to your project's success. Miljanan
₹7,000 INR in 7 days
0.0
0.0

Hi, this project is a perfect match for my experience. I've worked extensively on OCR pipelines for Indian-language documents, including PDF/image → structured CSV extraction, using OpenCV, Tesseract, custom preprocessing, and AI-based post-processing. I'm from Salem, Tamil Nadu, and I have hands-on experience handling Tamil + multilingual document OCR, so extending this to Telugu at high accuracy is very feasible.

How I'll approach Telugu OCR (99% target):
• Fine-tune Tesseract Telugu LSTM model using your curated dataset
• Improve image preprocessing (binarization, denoising, skew correction, font normalization)
• Add language-aware post-processing & error correction
• Character-level validation & targeted patching for edge cases
• Optimize pipeline to match or exceed Tamil processing speed

Deliverables:
• High-accuracy Telugu OCR model + trained data files
• Integrated Python pipeline (no API/CLI changes)
• Validation reports + failure analysis
• Clear documentation for scaling to other Indian scripts

I've already built OCR systems converting scanned voter lists & documents into structured CSV with high precision, so this aligns extremely well with my background. I'm confident I can push accuracy beyond 99% with clean engineering. Ready to start immediately.
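As a rough illustration of what a "language-aware post-processing" step for house numbers might look like, here is a minimal sketch. It is not taken from any bidder's or the client's code. It assumes, without confirmation from the brief, that OCR output for a house-number field can mix Telugu digits (U+0C66 to U+0C6F) with ASCII digits and stray spaces around separators; the function name `normalize_house_number` is hypothetical.

```python
import re

# Map Telugu digits (U+0C66..U+0C6F) to their ASCII equivalents.
TELUGU_TO_ASCII = {0x0C66 + i: str(i) for i in range(10)}


def normalize_house_number(raw: str) -> str:
    """Convert Telugu digits to ASCII and tighten spacing around '-' and '/'."""
    text = raw.translate(TELUGU_TO_ASCII).strip()
    # Assumption: spaces around separators are OCR noise, not meaningful.
    return re.sub(r"\s*([-/])\s*", r"\1", text)


if __name__ == "__main__":
    print(normalize_house_number("౧-౨/౩"))
    print(normalize_house_number(" 12 - 3 / 4 "))
```

Whether such normalization is acceptable depends on how the blind test batch is scored: if ground truth keeps Telugu digits verbatim, converting them would lower the measured character accuracy, so a step like this would belong only on the structured-output side.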
₹3,999 INR in 3 days
0.0
0.0

Chennai, India
Member since Jan 23, 2026