
Closed
Posted
Paid on delivery
I need a small, self-contained application that can take one or more image-based PDF files of a weekly grocery sale ad, run OCR, find every sale item, and export a clean, standard CSV. It should be able to load several PDFs and process them in batches (I have approximately 250 to do, each 5-30 MB in size). Each CSV output file should have the same name as the corresponding PDF, and its rows should also be written to a "master" CSV file. So if I processed 5 PDFs, there would be 6 CSV files: 5 individual CSVs, each named after its PDF, and the master CSV that contains the data from all 5 individual CSVs.
The CSV file should contain the following columns: "Store Number", "Start Date", "Item Name", "Item Description", "Sale Type", "Sale Price", "Savings Amount". "Start Date" should equal the first day the prices are valid. If the PDF is valid from 01/01/2026 to 01/07/2026, the start date would be 01/01/2026.
Add an additional column at the end for "Sale Price Per Unit". This should normally be the sale price of the item. If it is a buy 1 get 1 free, assume the "Save Up To" price is the price of each item: a buy 1 get 1 free, save up to $3 would assume the normal price is $3 each, but you are getting 2 for $3, so the per-unit price would be $1.50. If it is a buy 2 get 1 free and the price is $3 each, the per-unit price would be $2.00, i.e. ($3 + $3) / 3. If a sale item doesn't fit the above, for example buy Product X and receive a free Product Y, the sale type should be labeled "custom".
The PDFs contain little or no embedded text, so the workflow has to start with reliable OCR: Tesseract, PaddleOCR, AWS Textract, or another engine you trust is fine as long as the accuracy is high. The ads come in different layouts, so the logic that pairs text regions with the right price blocks needs to be flexible (OpenCV or similar image-analysis libraries will probably help). I will supply several sample PDFs that reflect the typical variety.
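The "Sale Price Per Unit" rules above can be sketched roughly as follows. This is only an illustration of the arithmetic, not a required implementation: the function name, the `buy_qty`/`free_qty` parameters, and the sale type labels other than "custom" are assumptions, and the quantities are assumed to have been parsed from the OCR text beforehand.

```python
from decimal import Decimal, ROUND_HALF_UP


def sale_price_per_unit(sale_type, price, buy_qty=1, free_qty=0):
    """Return the per-unit price, or None for "custom" deals.

    price    -- the price paid per purchased item; for a buy 1 get 1 free
                deal this is the "Save Up To" amount, per the rules above
    buy_qty  -- number of items that must be paid for (hypothetical parameter)
    free_qty -- number of items received free (hypothetical parameter)
    """
    if sale_type == "custom":
        # Deals such as "buy Product X, get Product Y free" have no
        # single per-unit price under the stated rules.
        return None
    price = Decimal(str(price))
    total_units = buy_qty + free_qty
    per_unit = price * buy_qty / total_units
    return per_unit.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(sale_price_per_unit("BOGO", "3.00", buy_qty=1, free_qty=1))  # 1.50
print(sale_price_per_unit("B2G1", "3.00", buy_qty=2, free_qty=1))  # 2.00
print(sale_price_per_unit("Sale", "2.49"))                         # 2.49
```

`Decimal` is used here instead of `float` so that prices round to whole cents predictably.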
Deliverables
• Fully-working source code and any helper scripts
• A brief README with setup steps and command-line usage
• A sample run that produces the requested CSV in standard comma-separated format
Acceptance criteria
When I run the tool on the provided samples, the output must list every visible item, with at least 95% field-level accuracy and no missing rows. Feel free to build in Python, Java, or C#, whatever lets you meet the accuracy target quickly and keeps dependencies easy to install. Attached are 3 of the files.
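For illustration, the requested output layout (one CSV per PDF plus a master CSV) could be written along these lines. This is a minimal sketch under assumptions: `write_outputs`, `rows_by_pdf`, and the default file name `master.csv` are hypothetical, and the rows are assumed to have already been extracted by the OCR step as dictionaries keyed by column name.

```python
import csv
from pathlib import Path

# Column order taken from the project description.
COLUMNS = [
    "Store Number", "Start Date", "Item Name", "Item Description",
    "Sale Type", "Sale Price", "Savings Amount", "Sale Price Per Unit",
]


def write_outputs(rows_by_pdf, out_dir, master_name="master.csv"):
    """Write one CSV per PDF (same base name) and a combined master CSV.

    rows_by_pdf maps a PDF path to a list of row dicts keyed by COLUMNS.
    Returns the path of the master CSV.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    master_path = out_dir / master_name
    with master_path.open("w", newline="", encoding="utf-8") as mf:
        master = csv.DictWriter(mf, fieldnames=COLUMNS)
        master.writeheader()
        for pdf_path, rows in rows_by_pdf.items():
            # e.g. "ads/week01.pdf" -> "<out_dir>/week01.csv"
            csv_path = out_dir / (Path(pdf_path).stem + ".csv")
            with csv_path.open("w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=COLUMNS)
                writer.writeheader()
                writer.writerows(rows)
            # The same rows are appended to the master file.
            master.writerows(rows)
    return master_path
```

With 5 PDFs in `rows_by_pdf`, this produces 6 files: 5 individual CSVs and the master.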
Project ID: 40196592
31 proposals
Remote project
Active 2 mos ago
31 freelancers are bidding on average $21 USD for this job

Hi, I am very interested in this project and would like to offer you my services. I am an expert in typing and PDF/image conversion. I can type data and convert PDF/image files into Word/Excel/PowerPoint. I will provide fully satisfactory results. I have years of experience in data entry. I am ready to start now. I can provide a sample to assure you that I have completely understood the project. Thank you.
$25 USD in 1 day
7.2

Hello, I can deliver a self-contained, production-ready OCR pipeline that batch-processes image-based grocery ad PDFs, accurately extracts every visible sale item, and exports both per-PDF CSVs and a consolidated master CSV exactly as specified. Using Python with high-accuracy OCR (Tesseract/PaddleOCR or Textract as needed) plus OpenCV-based layout analysis, I’ll reliably pair item text with price blocks across varying ad layouts and compute Sale Price Per Unit correctly for BOGO, multi-buy, and custom promotions. The tool will handle 250+ PDFs efficiently, name outputs to match source files, enforce schema validation, and include robust error handling and logs. You’ll receive clean source code, a clear README, CLI usage, and a sample run, all built with a high level of professionalism to meet the 95%+ field-level accuracy acceptance criteria.
$20 USD in 1 day
5.3

Hi, I understand you need a robust application to extract data from image-based PDFs for grocery sales ads and create clean CSV outputs. My approach will utilize advanced OCR technologies like Tesseract, paired with image-processing libraries to ensure high accuracy in capturing sale items, prices, and other required data. I will build a self-contained solution that processes multiple PDFs in batches and exports both individual and a master CSV file, as specified. My goal is to achieve at least 95% field-level accuracy by thoroughly testing with your provided samples and implementing the necessary logic for different layouts. Regards, Davide
$30 USD in 1 day
4.9

Hi, I am an expert in image-PDF-to-text data extraction. I have more than 10 years of experience in this skill. I can do this project with 100% accuracy. I am ready to start right now. Thank you.
$10 USD in 1 day
5.2

Hello, I've just reviewed your project description for Grocery Ads Data Extraction from image-based (not text-based) PDFs, and I'm confident in my ability to meet your expectations. With over 7 years of experience as a Senior Graphic Designer, I possess a strong skill set in Data Processing, Data Extraction, Image Processing, Excel, OCR, Data Entry, Web Scraping and Python. I kindly request you to take a moment from your busy schedule to explore my portfolio, where you can see the quality of my work and read feedback from previous clients: https://www.freelancer.com/u/afshan2176 Could you please specify the final file formats you'll require? Feel free to award me the project so that we can discuss it further. Looking forward to connecting with you. Best regards, Afshan Z.
$10 USD in 1 day
4.4

Dedicated freelancer ready to elevate your project, Grocery Ads Data Extraction from image-based (not text-based) PDFs. With a solid background in Web Scraping, Python, Data Processing, Image Processing, Excel, Data Extraction, OCR and Data Entry, I bring valuable expertise to your project. I have successfully completed many projects with 100% client satisfaction. Clear and timely communication is my priority. I believe in keeping you informed throughout the project lifecycle. I am available for a discussion at your earliest convenience. Please feel free to contact me to further discuss your project details. Thank you for considering my bid. I am excited about the opportunity to contribute to the success of your project. Please visit my portfolio to check my previous work samples here: https://www.freelancer.com/u/GraphicsHub2k24?page=portfolio&w=f&ngsw-bypass= Best regards, Muhammad Asim Khan
$10 USD in 1 day
4.4

Hello sir, With my extensive experience in Python and data science, I am well-equipped to tackle the specific requirements of your project. The ability to perform reliable OCR, extract structured data from varying PDF layouts, and generate customized CSV outputs aligns perfectly with my skillset. I have successfully delivered similar projects involving batch processing of image-based documents, leveraging tools like Tesseract, OpenCV, and pandas to automate data extraction and transformation. My familiarity with handling large data volumes and producing maintainable, well-documented code ensures I can deliver a high-quality, tailored solution for your needs. Let's discuss the project in detail. Best regards, Anil
$30 USD in 3 days
4.3

Hello there, I’m available to start immediately and can process batches without delay. I’ll run high-accuracy OCR on image-based PDFs, extract every sale item, normalize pricing logic (BOGO, multi-buy, custom), and output per-PDF CSVs plus a master CSV with correct naming. I’m comfortable using Adobe Acrobat OCR for clean scans and flexible image parsing to handle varied layouts. I’ve handled large, repetitive OCR-to-CSV workflows where speed, 100% accuracy, and same-day correction matter; if images aren’t readable enough, I can manually retype to ensure zero missing rows. If you’d like, share the samples and we can align quickly on accuracy checks and turnaround. I'm offering $10 per hour if manual retyping is needed. Or $5 per pdf if OCR works fine. Regards, Md Laden Islam
$10 USD in 1 day
4.1

As a versatile web developer with expertise in Python and web scraping, I have transformed data across countless projects, and I'm eager to bring this skill set to your Grocery Ads Data Extraction project. Despite the lack of embedded text, I can leverage Tesseract, PaddleOCR, or AWS Textract, ensuring high accuracy and extracting every sale item your business needs. The task of cleaning and organizing data is one of my specialties. Regardless of how varied the ad layouts are, I feel comfortable implementing OpenCV or similar image-analysis libraries to ensure seamless pairing of relevant text regions with price blocks. The end result will be a well-organized standard CSV file containing all the necessary columns like 'Store Number', 'Item Name', 'Sale Price', and 'Savings Amount'. Moreover, my proficiency in Python aligns comfortably with your preference for a codebase in either Python, Java, or C#. So why hire me? You can expect not just working source code but also an easy-to-understand README file, because your convenience matters to me as much as accurate outputs. Let's team up to bring the power of OCR and data manipulation to streamline your grocery operations!
$30 USD in 2 days
3.6

Hi there, I’ll deliver a compact, self-contained OCR workflow that converts image-based grocery ads into precise CSVs, handling 250 PDFs in batches with per-PDF outputs named after the source file plus a master CSV. The solution uses robust OCR (Tesseract/Paddle/AWS Textract) and flexible OpenCV-based region matching to adapt to varied layouts; Best regards,
$10 USD in 3 days
3.4

Welcome to professional Python development services! Hi there, I'm Alema, a Python expert programmer who strives for clear code in atmospheric, numerical weather prediction, physics, and all other seminal fields. I'm ready to provide you with high-quality services. I have completed 350+ projects with a 100% positive rating. If you are looking for quality work, look no further. Also, we are a team of professional workers, and we are always available 24/7 to help employers without limitations, and delivery is guaranteed on time. Yours faithfully, Eng. Alema Akter
$15 USD in 1 day
3.6

As a seasoned FULL STACK SOFTWARE ENGINEER with over 12 years of experience, I am well-versed in project management and meeting strict deadlines. The task at hand of extracting grocery ads data from image-based PDFs aligns perfectly with my expertise in Python, Web Scraping, Data Processing, and Excel. With an extensive knowledge of OCR engines such as Tesseract, PaddleOCR, AWS Textract, etc., I will guarantee an accuracy level exceeding your expectations, providing you with at least 95% field-level accuracy and no missing rows. Flexibility is key for this project due to varying layouts of the PDF files. Fortunately, my proficiency in image analysis libraries like OpenCV will come in handy to tailor-make a solution fitting each unique layout. Take advantage of my capability to batch-process the high number of PDF files (up to 250) efficiently and effectively. My track record speaks volumes about my ability to deliver fully-working source code along with helper scripts, a comprehensive README file for hassle-free setup and command-line usage, and sample runs that match your precise requirements.
$15 USD in 7 days
3.5

Hey there, I am a Software engineer with over 3 years of experience building document-processing and data extraction pipelines. I can develop a self-contained application that batch-processes image-based grocery ad PDFs using high-accuracy OCR and flexible layout analysis to extract every sale item into clean per-PDF and master CSV files. My expertise includes Python, OpenCV, Tesseract/PaddleOCR/Textract, PDF image processing, rule-based pricing logic, and robust CSV/data pipeline design. With my experience, I’m sure I can meet the 95%+ accuracy target quickly and deliver clean, well-documented code ready to run end-to-end. Feel free to check my profile and contact me for more details. Regards,
$20 USD in 2 days
1.6

Hi there, I’ve reviewed your Grocery Ads Data Extraction project and can deliver a self-contained app that OCRs image-based PDFs, extracts every sale item, and outputs per-PDF CSVs plus a master CSV. I’ve built Python OCR pipelines (Tesseract, PaddleOCR) with OpenCV-based layout analysis to handle varied ad formats, batch-loading 250 PDFs (5–30 MB each), and parallel processing. Output CSV columns will be: Store Number, Start Date, Item Name, Item Description, Sale Type, Sale Price, Savings Amount, Sale Price Per Unit. Start Date is the first valid date; per-unit price is computed from Buy offers per your rules; non-standard combos are labeled "custom". Deliverables: source code, README with setup and CLI usage, and a sample run that produces the requested CSVs. Timeline: MVP in about 14 days; initial 3-PDF proof within a few days. Bid: 28 USD. Next steps: If you confirm OCR engine preference and share a couple of representative PDFs, I’ll tailor the layout pairing logic and deliver a working MVP for review. Best regards,
$25 USD in 3 days
0.0

Hello, With a strong background in data entry, processing and web scraping, I am confident in my ability to tackle your project of Grocery Ads Data Extraction from image based PDFs. I've worked extensively with OCR technologies such as Tesseract, PaddleOCR, and AWS Textract and have a keen eye for employing image analysis libraries like OpenCV to deliver highly accurate data extraction results such as your project requires. In fact, this project seems tailor-made for my skillset. In addition to my skills specifically related to this project, I also bring a wealth of knowledge in AI and machine learning that could elevate the accuracy and efficiency of your project even more. My Full-Stack Development expertise can ensure clean architecture and efficient performance for your application, while my experience in Mobile App Development can provide you with additional avenues to access and utilize your grocery ad data. In conclusion, I believe that my technical skills, problem-solving acumen and commitment to timely delivery make me an ideal fit for this project. Moreover, I pledge to ensure at least 95% field-level accuracy, no missing rows and deliver all the acceptance criteria in line with your expectations. I eagerly anticipate discussing your project further and embarking on a collaborative journey that will result in the seamless extraction of vital data from your PDFs. Thanks!
$25 USD in 3 days
0.0

I want to say this honestly and from my heart. Since you came into my life, everything feels more meaningful and calm. Your smile, your words, and the way you make me feel have slowly become very important to me. I enjoy every moment we talk or spend together, and I find myself caring about you more each day. I may not be perfect, but my feelings for you are real and sincere. I want to grow with you, support you, and share both happiness and challenges together. Would you give me a chance to be a special part of your life and start this beautiful journey with me?
$20 USD in 7 days
0.0

You need a small app that takes image-based weekly grocery ad PDFs, runs OCR, extracts every sale item, and outputs one CSV per PDF plus a master CSV, with the exact columns you listed including Sale Price Per Unit and the sale type rules for BOGO and buy 2 get 1. Success is you run it on your samples and it finds every visible item with at least 95 percent field-level accuracy and no missing rows, even when layouts differ. In the first hour I will run OCR on one of your attached PDFs to benchmark engines, then build the extraction pipeline that groups OCR text into item blocks and maps each block into Store Number, Start Date, name, description, sale type, price, savings, and per-unit price. Do your ads follow one store per PDF, and are the Store Number and date range always printed in a consistent header area? For pricing, do you see patterns like 2 for 5, buy one get one, save up to, or percent off that you want supported beyond custom? Pitfalls are OCR errors on small fonts, prices separated from item names, multi-item blocks in one box, inconsistent date formats, and missing rows if grouping is too strict. I have built OCR-to-CSV pipelines using OpenCV plus OCR with rule-based parsers to keep recall high and batch runs stable for hundreds of PDFs. I can deliver a Python CLI with README, batch folder input, per-file CSV plus master CSV, and a sample run output from your three PDFs. Danylo Podolskyi
$20 USD in 7 days
0.0

I am highly proficient in data entry and data extraction tasks. I can accurately extract sale items from your grocery ad PDFs and export them into a clean, standard CSV file as requested. I am familiar with OCR processes and can ensure all columns like 'Item Name', 'Sale Price', and 'Start Date' are correctly filled. I am detail-oriented and ready to deliver high-quality work within your deadline.
$25 USD in 3 days
0.0

I am writing to express my strong interest in your PDF to Excel conversion project. Having reviewed your specific requirements, I am confident that my team and I possess the exact blend of technical skill, historical context, and meticulous attention to detail necessary to deliver a flawless product. With over 20 years of experience in the data processing industry, we have witnessed the evolution of document technology—from simple flat-file entries to complex, encrypted, and multi-layered PDF structures. This deep-rooted experience means we don’t just "convert" files; we architect data solutions that maintain integrity, logic, and usability.
$60 USD in 30 days
1.6

Hello, I can build a Python tool to process grocery ad PDFs, extract all products using Optical Character Recognition (OCR), and create clean CSV files (individual files and a master file). I have experience with Python, OCR (Tesseract/PaddleOCR), batch processing, and CSV work. I will provide you with a reliable and well-documented program with at least 95% accuracy for all fields.
$25 USD in 2 days
0.0

Toronto, United Kingdom
Member since Feb 1, 2026