
Completed
Posted
Paid on delivery
I need a small, self-contained application that can take one or more image-based PDF files of a weekly grocery sale ad, run OCR, find every sale item, and export a clean, standard CSV. It should be able to load several PDFs and process them in batches (I have approximately 250 to do, each 5-30 MB in size). Each CSV output file should have the same name as its corresponding PDF, and the tool should also write to a "master" CSV file. So if I processed 5 PDFs, there would be 6 CSV files: 5 individual CSVs, each named after its PDF, and the master CSV containing the data from all 5 individual CSVs. The CSV file should contain the following columns: "Store Number", "Start Date", "Item Name", "Item Description", "Sale Type", "Sale Price", "Savings Amount". "Start Date" should be the first day the prices are valid. If the PDF is valid for 01/01/2026 to 01/07/2026, the start date would be 01/01/2026. Add an additional column at the end for "Sale Price Per Unit". This should be the sale price of the item, except for buy-and-get-free deals. If it is a buy 1 get 1 free, assume the "Save Up To" price is the normal price of each item: a buy 1 get 1 free, save up to $3 means the normal price is $3 each, but you are getting 2 for $3, so the per-unit price would be $1.50. If it is a buy 2 get 1 free and the price is $3 each, the per-unit price would be $2.00, i.e. ($3 + $3) / 3. If a sale item doesn't fit any of the above, for example buy Product X and receive a free Product Y, the sale type should be labeled "custom". The PDFs contain little or no embedded text, so the workflow has to start with reliable OCR. Tesseract, PaddleOCR, AWS Textract, or another engine you trust is fine as long as the accuracy is high. The ads come in different layouts, so the logic that pairs text regions with the right price blocks needs to be flexible (OpenCV or similar image-analysis libraries will probably help). I will supply several sample PDFs that reflect the typical variety.
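For illustration, the "Sale Price Per Unit" rule described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name and the sale-type labels "price", "b1g1" and "b2g1" are hypothetical (the brief only names "custom"), the buy 2 get 1 free case assumes the "Sale Price" field holds the regular each-price, and returning an empty value for "custom" deals is a guess, since the brief does not say what the per-unit price should be in that case.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")


def per_unit_price(sale_type: str,
                   sale_price: Optional[str] = None,
                   savings_amount: Optional[str] = None) -> Optional[Decimal]:
    """Return the per-unit price rounded to cents, or None for custom deals.

    The sale_type labels are hypothetical, not taken from the brief.
    """
    if sale_type == "price":
        # Plain sale: the per-unit price is the sale price itself.
        unit = Decimal(sale_price)
    elif sale_type == "b1g1":
        # Buy 1 get 1 free: "Save Up To" is treated as the regular
        # each-price, and two items are received for that amount.
        unit = Decimal(savings_amount) / 2
    elif sale_type == "b2g1":
        # Buy 2 get 1 free: pay for two items, receive three.
        unit = Decimal(sale_price) * 2 / 3
    else:
        # Anything else is "custom"; no per-unit price is computed (assumption).
        return None
    return unit.quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(per_unit_price("b1g1", savings_amount="3.00"))  # 1.50
    print(per_unit_price("b2g1", sale_price="3.00"))      # 2.00
```

Decimal is used instead of float so that prices round to whole cents predictably.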
Deliverables
• Fully working source code and any helper scripts
• A brief README with setup steps and command-line usage
• A sample run that produces the requested CSV in standard comma-separated format

Acceptance criteria
When I run the tool on the provided samples, the output must list every visible item, with at least 95% field-level accuracy and no missing rows. Feel free to build in Python, Java, or C#, whatever lets you meet the accuracy target quickly and keeps dependencies easy to install. Attached are 3 of the files.
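As a rough sketch of the requested output layout (one CSV per PDF plus a master CSV), something like the following could work. It assumes the OCR and item-extraction step has already produced a list of row dictionaries per PDF; the function name `write_csvs` and the default file name `master.csv` are hypothetical, and only the column names come from the brief.

```python
import csv
from pathlib import Path

# Column names as listed in the project description.
COLUMNS = [
    "Store Number", "Start Date", "Item Name", "Item Description",
    "Sale Type", "Sale Price", "Savings Amount", "Sale Price Per Unit",
]


def write_csvs(rows_by_pdf, out_dir, master_name="master.csv"):
    """Write one CSV per PDF (same base name) and a combined master CSV.

    rows_by_pdf maps a PDF path to a list of dicts keyed by COLUMNS.
    Returns the list of files written, with the master file last.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    master_path = out / master_name
    with open(master_path, "w", newline="", encoding="utf-8") as mf:
        master = csv.DictWriter(mf, fieldnames=COLUMNS)
        master.writeheader()
        for pdf_path, rows in rows_by_pdf.items():
            # "ad_week1.pdf" -> "ad_week1.csv"
            target = out / (Path(pdf_path).stem + ".csv")
            with open(target, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=COLUMNS)
                writer.writeheader()
                writer.writerows(rows)
            # The same rows are appended to the master file.
            master.writerows(rows)
            written.append(target)
    written.append(master_path)
    return written
```

With 5 PDFs in `rows_by_pdf`, this writes 6 files, matching the example in the description.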
Project ID: 40194741
52 proposals
Remote project
Active 3 mos ago

I can build this OCR data extraction tool for your grocery ads. My approach: 1. Use PaddleOCR or Tesseract for text extraction from image-based PDFs 2. OpenCV for layout analysis to pair items with prices 3. Python script with batch processing support for your 250 PDFs 4. Clean CSV output with all required columns (Store Number, Start Date, Item Name, Description, Sale Type, Sale Price, Savings, Per Unit Price) I have experience with similar data extraction projects and can handle the varying ad layouts. The tool will meet your 95% accuracy requirement. Happy to discuss the details further.
$15 USD in 1 day
0.0
52 freelancers are bidding on average $32 USD for this job

With your project on grocery ads data extraction from image-based PDFs, I am the perfect match for you. In my years of experience as a web scraping specialist, I have dealt with websites having complex and advanced anti-bot protection systems, which reflects my ability to handle challenging tasks like yours. Able to scrape data from dynamic and protected websites proficiently using Python (Selenium, BeautifulSoup, Scrapy, Requests etc), I can extract the desired information from each PDF and export them in a clean and structured CSV format with at least 95% accuracy—a metric I've always striven to surpass. Using my data processing skills with Excel and Python, I can calculate the "Sale Price Per Unit" for every item accurately as per your requirement. Furthermore, flexibility is necessary to deal with different layouts of ads and I'm skilled in leveraging OpenCV or similar image-analysis libraries for such purposes. This along with my proficiency in OCR engines like Tesseract or PaddleOCR
$50 USD in 3 days
7.5

I have 10+ years of experience in grocery ad data extraction from image-based (not text-based) PDFs. Please feel free to discuss the requirements and timeline for the project further. I'd be happy to assist you, and I am ready to start right now. You can visit my profile: https://www.freelancer.com/u/HiraMahmood4072 Thank you
$30 USD in 1 day
6.3

Hello, I can build a reliable batch tool that OCRs image-based grocery ad PDFs, extracts every sale item with high accuracy, and outputs clean CSVs—one per PDF plus a master file—exactly to your format. Using a proven OCR engine with flexible layout analysis, the app will handle varied designs, calculate correct per-unit pricing, classify sale types, and meet your 95% accuracy requirement. You’ll get complete source code, a simple README, and a working sample run. Regards, Zafar
$50 USD in 1 day
6.3

Hi, I can build a small, reliable script to batch-process image-based grocery PDF ads using OCR and export clean CSVs exactly as specified. I’ve worked with OCR (Tesseract / PaddleOCR / Textract), OpenCV-based layout handling, and CSV normalization for real-world, inconsistent PDFs. The solution will process multiple PDFs, generate individual CSVs plus a master CSV, handle date logic, and correctly compute “Sale Price Per Unit” including BOGO and similar offers. I’ll keep the setup simple and provide a short README with run instructions. Happy to test on your sample PDFs and ensure the required accuracy.
$30 USD in 2 days
5.5

Hello, I’m Muhammad Muneeb. I specialize in advanced web scraping and data extraction, including OCR-based processing of image-heavy PDFs. I can build a robust Python application that batch-processes all your grocery ad PDFs, runs accurate OCR (Tesseract/PaddleOCR/AWS Textract), intelligently maps items to prices, and outputs both individual CSVs and a master CSV exactly as you specified. The tool will calculate per-unit prices for BOGO or multi-buy offers and label custom promotions correctly. I will provide fully working source code, a README, and a sample run with at least 95% field-level accuracy. I can deliver this efficiently with minimal dependencies for easy setup.
$30 USD in 1 day
5.0

As a seasoned data analyst with over 50 successful projects under my belt, I am well-versed in all aspects of data extraction, processing, and analysis. My stronghold is data-heavy projects where attention to detail and accuracy are paramount. Combining this expertise with my proficiency in using Python, I am confident that I can deliver an automated solution for your grocery ads data extraction that meets all your specifications. Given the project's reliance on Optical Character Recognition (OCR), I have comprehensive experience with various OCR engines including Tesseract and AWS Textract. Moreover, my command over image-analysis libraries like OpenCV allows me to create flexible registration logic tailored to different layouts and adjust-multiple CSV generating codes based on individual PDFs. In terms of deliverables, you can expect a fully-operational script with clear set-up tips and CLI instructions, as well as a sample run that produces required CSV matching the high accuracy target you've set. Putting your needs first, my aim is to make the process as user-friendly and efficient as possible to ensure smooth integration into your workflow. By choosing me for this project, you opt for robust, reliable, and accurate-data solutions!
$60 USD in 3 days
5.0

I can build a robust batch OCR pipeline to extract every sale item from image-based grocery PDFs and export clean per-PDF and master CSVs. I’ll use high-accuracy OCR (Tesseract/PaddleOCR or Textract) plus OpenCV layout analysis to reliably pair items with prices across varying designs. Python-based, easy setup, CLI-driven, documented, and tested to meet the 95% accuracy requirement.
$30 USD in 2 days
5.0

Dear Client, Greetings! I have gone through the project description and found that all of the mentioned requirements fall within my expertise, as I have hands-on experience with Python, AI/ML, data science, software building, etc. I’ll put together a small, self-contained Python app that takes your image-based grocery ad PDFs, runs reliable OCR on them, figures out which text belongs to which price even when the layouts change, and spits out clean CSVs. You’ll get one CSV per PDF plus a master file that combines everything. The script will pull store number, start date, item name and description, sale type, sale price, and savings, and it will also calculate sale price per unit for things like buy-one-get-one or buy-two-get-one deals, with anything odd tagged as custom. It’s built to run in batches (hundreds of files at once), with no manual steps, just a simple command-line run. I’ll include the full source, a short README, and a sample run so you can see the output straight away. It won’t be over-engineered, just accurate and practical, aiming to hit the 95% accuracy you’re after. Let's discuss further over chat. Also, I have been coding in machine learning and data science with Python for the past 7 years. I have experience working with 4 giant tech companies, as well as freelancing on Upwork, Fiverr and Freelancer. Hope to hear from you soon! Regards, Rojan
$50 USD in 7 days
4.4

Greetings! I'm a dedicated Data Scientist specializing in extracting valuable insights from vast datasets, particularly for Microsoft applications Development. Proficient in Excel, Word, Access, PowerPoint, Power BI, and Outlook, I excel in crafting clean and effective solutions, including advanced array formulas, pivot tables, and VBA macros. With a commitment to delivering high-quality results promptly and professionally, my aim is to foster trust and long-term relationships with clients. Entrust your project to me, and together, we'll achieve tangible, positive impacts for your business. Looking forward to collaborating with you. Best regards, Zeeshan
$30 USD in 1 day
4.6

Hello, I've just reviewed your project description for grocery ad data extraction from image-based (not text-based) PDFs, and I'm confident in my ability to meet your expectations. With over 7 years of experience as a Senior Graphic Designer, I possess a strong skill set in Python, AWS Textract, OCR, Data Processing, OpenCV, Data Entry, Image Processing, Data Extraction, Excel and Web Scraping. I kindly request you to take a moment from your busy schedule to explore my portfolio, where you can see the quality of my work and read feedback from previous clients: https://www.freelancer.com/u/afshan2176 Could you please specify the final file formats you'll require? Feel free to award me the project so that we can discuss it further. Looking forward to connecting with you. Best regards, Afshan Z.
$10 USD in 1 day
4.2

Hello! I understand you need a robust application to extract grocery sale data from image-based PDFs and output it to CSV files. This is a great project and aligns perfectly with my skills in OCR and data processing. Having successfully developed data extraction applications before, I achieved over 95% accuracy in field-level data with customizable extraction logic tailored for various document layouts. For instance, I implemented similar tasks using Tesseract and OpenCV, ensuring high reliability for OCR tasks. ✅My Plan - Utilize Tesseract or PaddleOCR for effective text recognition. - Develop a flexible logic to pair text regions with price blocks using OpenCV. - Implement batch processing capabilities for multiple PDFs. - Ensure CSV files are generated with required columns, each named after the corresponding PDF, including a master CSV. - Provide a README for setup and usage instructions. Could you clarify if there are any specific layouts or unique features in the provided PDFs that I should prioritize? Also, do you have a preferred programming language for this project? Best regards, Hongqiang Chen
$35 USD in 1 day
4.0

Greetings, It looks like you need a tool that can extract grocery sale data from image-based PDFs using OCR and output it in a clean CSV format. I can definitely help you with that. My approach would involve using a reliable OCR solution, like Tesseract or AWS Textract, to accurately capture the text from your PDFs. I would also incorporate image processing techniques with OpenCV to ensure that we can handle the different layouts of the ads effectively. The application will be designed to process multiple PDFs in batches, generating individual CSV files for each PDF along with a master file that aggregates all the data. I’ll make sure that the output meets your specifications, including calculating the sale price per unit for various promotions. With my experience in Python and data processing, I'm confident in delivering a solution that meets your accuracy requirements. Best regards, Saba Ehsan
$32 USD in 30 days
3.9

Hi there, I’ve read your project on extracting every sale item from image-based grocery ads and I’ll deliver a robust, self-contained Python tool that uses OCR (Tesseract/PaddleOCR) and OpenCV to handle diverse layouts, batch-process ~250 PDFs, and export per-PDF CSVs plus a master CSV. Two quick checks to tailor the build: and a second detail about the preferred OCR engine or batch-processing workflow; deliverables include fully-working source, README with setup steps, and a sample run within 7–10 days; Best regards,
$10 USD in 4 days
3.4

Hello, I’m an experienced and highly organized professional with expertise in typing, data entry, data scraping, and administrative support. I pride myself on accuracy, speed, and delivering high-quality results that exceed expectations. Core skills include: * Typing & Data Entry: Fast, accurate, and detail-oriented data input. * Data Scraping & Mining: Extracting valuable information from websites, directories, and databases to deliver clean and structured datasets. * Data Cleaning & Organization: Ensuring data is reliable, accessible, and ready for analysis. * Tools & Software Proficiency: * Microsoft Excel (advanced formulas, pivot tables, formatting) * Microsoft Word (document creation & formatting) * Google Sheets & Docs (collaboration & management) * Web scraping tools (manual & automated) * File Handling: Skilled in managing PDFs, Word, Excel, scanned docs, and converting them into usable formats. * Efficiency: Capable of handling large volumes of data quickly while maintaining quality and confidentiality. I am committed to reliable, timely, and accurate results with complete data integrity. Available to start immediately, I’d be glad to discuss how my skills can benefit your project. Best regards, Tooba S.
$30 USD in 1 day
2.8

Hi, James. Thank you for reaching out. I have been working with Python, OCR, PaddleOCR, OpenCV, and PDF processing for over 5 years. I have solved similar batch extraction challenges on image-based grocery ad PDFs at scale with strict accuracy targets. I understand the main logic: OCR, then layout detection to link each item name and description with the correct sale price, savings, and promotion. I will batch-process your PDFs and output one CSV per PDF plus a master CSV with store number, start date, item name, item description, sale type, sale price, savings amount, and sale price per unit. I will calculate the per-unit price for buy one get one free and buy two get one free deals and label other promo patterns as custom. I am available for a screenshare consultation, and I will explain everything clearly so your programmer can implement it. Looking forward to solving this.
$30 USD in 7 days
2.4

Hey, I just finished reading the job description and I see you are looking for someone experienced in Image Processing, Data Entry, Excel, AWS Textract, Web Scraping, Data Extraction, OpenCV, Python, Data Processing and OCR. This is something I can do. Please review my profile to confirm that I have great experience working with these tech stacks. I have a few questions: 1. Are these all the requirements? If not, please share more detailed requirements. 2. Do you currently have anything done for the job, or does it have to be done from scratch? 3. What is the timeline to get this done? Why Choose Me? 1. I have done more than 250 major projects. 2. I have not received a single piece of bad feedback in the last 5-6 years. 3. You will find 5-star feedback on my last 100+ major projects, which shows my clients are happy with my work. Timings: 9am - 9pm Eastern Time (I work as a full-time freelancer). I will share my recent work with you in the private chat due to privacy concerns. Please start the chat to discuss it further. Regards, Salik.
$10 USD in 6 days
1.4

As a professional quality engineer in the automobile industry, accuracy is paramount to me. My experienced background has equipped me with the skills to efficiently extract and analyze data, an integral part of this project. My extensive use of Excel and efficiency in data entry are undoubtedly assets that will contribute positively to completing this task promptly and accurately. I may not have direct experience in grocery ads, but my proficiency in image processing makes me a valuable asset for you. Adapting and customizing existing software to cater to diverse needs is a routine part of my work, which aligns well with your project's demand for a flexible logic to handle various layouts of your PDFs. Through OCR tools such as Tesseract, I am confident in achieving the necessitated 95% field-level accuracy you demand within the stipulated timeframe. I assure you that delivering impeccably organized CSV files along with fully-working source code, helper scripts and setup instructions won't be a challenge.
$10 USD in 7 days
1.4

Hello, I appreciate the opportunity to bid on your project for developing an OCR application that processes grocery sale ads from PDF files. I understand the importance of accurately extracting sale item information and generating clean CSV files, especially given the scale of approximately 250 PDFs. With over five years of experience in developing data extraction tools and working with OCR technologies, I have successfully implemented solutions using Tesseract and OpenCV. My expertise in Python and Java ensures that I can build a robust application tailored to your needs. To achieve your goals, I propose the following approach: - Utilize a reliable OCR engine like Tesseract to extract text from the PDFs, ensuring high accuracy. - Implement flexible logic with OpenCV to handle various layouts and associate text with corresponding price blocks. - Create a batch processing feature to manage multiple PDFs at once, generating individual and master CSV files. - Ensure the output CSV files meet your specified format, including all required fields and accurate calculations for sale prices. I am eager to bring this project to life and confident in my ability to deliver high-quality results within your deadline. I would love to discuss any further details and clarify any questions you may have. Thank you for considering my proposal, and I look forward to the opportunity to work together.
$30 USD in 7 days
1.0

Hi, I noticed you’re looking to work on OCR-based extraction from image-based grocery ad PDFs, and I’ve built very similar solutions with Tesseract and OpenCV for flexible text and price region pairing in complex layouts. I have 7+ years of experience in data extraction and image processing and have mastered Python, OCR engines, and CSV export workflows. In a recent project, I developed an OCR-driven tool for extracting product and pricing data from retail flyers, achieving over 95% accuracy and scalable batch processing. For this project, I would combine Tesseract OCR with Python's OpenCV to dynamically detect and pair text blocks with pricing sections despite varied layouts. I would parse dates and sale types programmatically to calculate unit prices, then export well-structured CSVs by PDF and a master file, ensuring easy traceability. The solution will be self-contained with a simple CLI interface and include comprehensive README documentation. I look forward to working with you. Best Regards, Renata Lopez
$100 USD in 1 day
0.0

Hello, I’ve reviewed your Grocery Ads Data Extraction project and understand the need for a self-contained application to extract sale item data from image-based PDFs. I am experienced in developing applications that utilize OCR technology for data extraction tasks. For this project, I will create a custom application that can efficiently process multiple image-based PDF files, extract the relevant information, and export it into a clean, standard CSV format. I will ensure that the output CSV files contain the required columns such as Store Number, Start Date, Item Name, Item Description, Sale Type, Sale Price, Savings Amount, and Sale Price Per Unit. I will utilize reliable OCR engines like Tesseract or AWS Textract to ensure high accuracy in data extraction. Additionally, I will implement flexible logic to handle different layouts using image-analysis libraries like OpenCV. I look forward to working on this project and delivering a solution that meets your requirements. Clear communication and a smooth workflow are guaranteed throughout the project. Regards, Mairaj Ahmed Khan Data Extraction Specialist
$10 USD in 1 day
0.0

Jacksonville Beach, United States
Payment method verified
Member since Oct 20, 2006