
Completed
Posted on
Paid on delivery
I need an end-to-end ML experiment to predict duplicate customer records in a financial dataset from Kaggle. The goal is to build a proactive classification model that flags likely duplicates before they reach reporting, analytics, or risk pipelines. The workflow should include data loading, EDA, synthetic duplicate labelling (since labels won’t exist), feature engineering, model training, and evaluation. Duplicate pairs will be created using techniques like exact duplication, small perturbations, and formatting inconsistencies. Features should include exact matches, numeric differences (age, income, spending), and similarity measures. Models to test include Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar, but deliver one final tuned model. Evaluation should focus on F1-score (target ≥0.85), with a balance between precision and recall. Deliverables: reproducible notebook, clean code, short report, and README.
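The synthetic labelling step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the brief does not name the Kaggle dataset, so the `name`, `age`, and `income` fields, the 2% perturbation size, and the helper names `perturb` and `make_pairs` are all hypothetical.

```python
import random

def perturb(record, rng):
    """Return a copy of `record` with one kind of synthetic corruption applied."""
    dup = dict(record)
    kind = rng.choice(["exact", "numeric", "format"])
    if kind == "numeric":
        # Small perturbation: shift income by up to +/-2% (assumed magnitude).
        dup["income"] = round(dup["income"] * (1 + rng.uniform(-0.02, 0.02)), 2)
    elif kind == "format":
        # Formatting inconsistency: change case and pad with whitespace.
        dup["name"] = "  " + dup["name"].upper() + " "
    # kind == "exact" leaves the copy untouched.
    return dup

def make_pairs(records, n_pairs, seed=0):
    """Build labelled pairs: 1 = synthetic duplicate, 0 = two distinct records."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    pairs = []
    for _ in range(n_pairs):
        base = rng.choice(records)
        pairs.append((base, perturb(base, rng), 1))
        a, b = rng.sample(records, 2)  # two different source rows
        pairs.append((a, b, 0))
    return pairs

# Hypothetical records standing in for rows of the real dataset.
records = [
    {"name": "Alice Smith", "age": 34, "income": 52000.0},
    {"name": "Bob Jones", "age": 45, "income": 61000.0},
    {"name": "Carla Diaz", "age": 29, "income": 48000.0},
]
pairs = make_pairs(records, n_pairs=4)
```

Generating one negative pair per positive pair keeps the classes balanced, which makes the F1 target easier to interpret; a real experiment might deliberately use a more skewed ratio to mimic production data.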
Project ID: 40330729
5 proposals
Remote project
Active 17 days ago

Hey, I have 6+ years of experience in the fintech domain as an Applied ML Engineer and Data Scientist. I can complete your task and provide a report in less than 1 day.
$70 USD in 1 day
2.9
5 freelancers are bidding an average of $44 USD for this job

Hello, With over 7 years of experience in Excel, Data Science, Data Visualization, Statistical Analysis, and Statistics, I have the expertise to handle your project efficiently. I have carefully reviewed the requirements for the project. To address the predictive data quality modeling for financial customer data using machine learning, I will begin by loading the dataset from Kaggle and performing exploratory data analysis (EDA). Synthetic duplicate labeling will be implemented due to the absence of labels. Feature engineering will involve creating features based on exact matches, numeric differences, and similarity measures. The workflow will include model training and evaluation using techniques like Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or similar algorithms to develop a tuned model. Evaluation will focus on achieving an F1-score of ≥0.85, balancing precision and recall. The deliverables will include a reproducible notebook, clean code, a concise report, and a README file detailing the project setup. I would like to discuss this project further with you. Please connect with me via chat for a detailed conversation. You can visit my profile at https://www.freelancer.com/u/HiraMahmood4072 Thank you.
$36 USD in 2 days
6.4

Hi there, I am A.R.M. MASUD, with a strong Data Science background. As a Python developer, I have extensive experience building robust, scalable, and efficient solutions that address various business needs. I understand the importance of delivering high-quality, well-architected code, and I am committed to working closely with you to ensure the success of this project. I implement core functionality using Python, utilizing relevant libraries and frameworks such as Pandas, NumPy, GUI, SciPy, Matplotlib, Seaborn, Plotly, Scikit-learn, TensorFlow, Keras, PyTorch, spaCy, Flask, Django, FastAPI, OpenCV, and Jupyter. I am a professional responsible for extracting actionable insights and knowledge from large volumes of data through Machine Learning models, including CNNs, RNNs, LSTMs, GANs, Transformers, FNNs, ANNs, and DNNs. I conduct comprehensive unit, integration, and performance testing to ensure the solution is error-free and optimized. https://www.freelancer.com/u/MZITSERVICES I appreciate the opportunity to submit this proposal and am excited about the possibility of working with you to bring your project to life. Thanks A.R.M MASUD
$40 USD in 7 days
4.7

Your duplicate detection challenge needs synthetic labeling since real financial datasets won't have duplicate flags. I'd start by loading your Kaggle dataset, creating controlled duplicates through exact matches and small perturbations (typos, formatting changes), then engineer similarity features like Levenshtein distance, numeric differences, and exact match indicators. XGBoost typically performs well for this type of classification with proper hyperparameter tuning. I built a price aggregation engine that tracks 800+ products across multiple stores, handling fuzzy matching and duplicate detection for similar products with slight naming variations. The pattern recognition work translates directly to customer record deduplication. You can see my automation projects at ffulb.com. Can deliver the complete notebook, tuned model hitting your F1≥0.85 target, and documentation within a week. Ready to start immediately.
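The similarity features this bid mentions (Levenshtein distance, numeric differences, exact-match indicators) might look like the sketch below. It is not the bidder's code: the field names `name`, `age`, and `income` and the helpers `name_similarity` and `pair_features` are assumptions made for illustration, and the edit distance is written in plain Python rather than taken from a library.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def name_similarity(a, b):
    """Normalised similarity in [0, 1], ignoring case and surrounding whitespace."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def pair_features(r1, r2):
    """Turn a pair of records into a feature dict for a pair classifier."""
    scale = max(abs(r1["income"]), abs(r2["income"]), 1.0)
    return {
        "name_exact": int(r1["name"] == r2["name"]),
        "name_sim": name_similarity(r1["name"], r2["name"]),
        "age_diff": abs(r1["age"] - r2["age"]),
        "income_rel_diff": abs(r1["income"] - r2["income"]) / scale,
    }

# Hypothetical pair: same customer with a formatting change and a small income shift.
features = pair_features(
    {"name": "Alice Smith", "age": 34, "income": 52000.0},
    {"name": " ALICE SMITH ", "age": 34, "income": 52520.0},
)
```

Keeping both the raw exact-match flag and the normalised similarity lets a tree-based model such as XGBoost separate formatting-only differences from genuinely different names.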
$28 USD in 2 days
1.5

South Africa
Payment method verified
Member since Mar 20, 2026