Indiamart, being a product listing platform, gives its sellers the freedom to upload products themselves. Indiamart restricts a few types of product categories from being listed on the platform. We call these banned categories.
Whenever a seller lists a product, it goes through a product approval process that checks whether the listed product falls into a banned category.
We have an ENSEMBLE model that predicts whether a product should fall into a banned category. It uses the output of three models (CBS, SOA & CAT), each of which predicts whether a product should fall into a banned category. By combining the predictions of these three models, the ENSEMBLE model makes the final prediction of whether a product should be banned.
The output of the ENSEMBLE model is then audited manually. Many products that were predicted as banned by the ENSEMBLE model get approved after the manual audit, and some products that were predicted as banned by the ENSEMBLE model get banned.
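The gap between the ENSEMBLE prediction and the audit outcome can be quantified before any modelling. The following is a minimal sketch, not the platform's actual process: it uses the predicted_ban, rejection_code and audit_ban columns described in the data set, the helper names are hypothetical, and the sample rows (including the rejection code 0) are made up for illustration.

```python
def is_audit_banned(row):
    # Per the column description: a product is banned after manual audit
    # when rejection_code is 132 and audit_ban is 1.
    return row["rejection_code"] == 132 and row["audit_ban"] == 1


def audit_confirmation_rate(rows):
    """Share of ENSEMBLE-predicted bans that the manual audit confirmed."""
    predicted = [r for r in rows if r["predicted_ban"] == 1]
    if not predicted:
        return None  # nothing was predicted as banned
    confirmed = sum(is_audit_banned(r) for r in predicted)
    return confirmed / len(predicted)


# Made-up rows for illustration only; they are not taken from the data set.
sample_rows = [
    {"predicted_ban": 1, "rejection_code": 132, "audit_ban": 1},
    {"predicted_ban": 1, "rejection_code": 0, "audit_ban": 0},
    {"predicted_ban": 0, "rejection_code": 0, "audit_ban": 0},
    {"predicted_ban": -1, "rejection_code": 0, "audit_ban": 0},
]

rate = audit_confirmation_rate(sample_rows)
print(rate)
```

A low confirmation rate on the real data would indicate that many predicted bans are overturned by auditors, which is where reducing manual effort has the most room.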
Meaning of the columns in the data set:
- fk_process_id: Process ID = 1
- predicted_ban: 1 means the product was predicted as a banned product by the ENSEMBLE model (0 & -1 mean predicted as not banned)
Probability of product being banned
(higher probability means a lower chance of the product being banned)
(lower probability means a higher chance of the product being banned)
- SUSPECT_BY_SOA_MODEL: whether the product was finally marked as suspected or not
(1: Suspected, 0: Not suspected)
- SUSPECT_BY_CBS_MODEL: 1 means the product was suspected by the CBS model, 0 means it was approved
- CAT_Name: Category name in which product was listed
- SOA_MODEL_banned_keyword: Keywords that caused the SOA model to mark a product as suspected
- cbs_model_Suspected_Keyword: Keywords that caused the CBS model to mark a product as suspected
- rejection_code & audit_ban: When rejection_code = 132 and audit_ban = 1, the product is banned after the manual audit too
- We want to minimize the manual effort of auditing products before they are finally placed in the banned or not banned category.
- We need a statistical or machine learning approach that considers different factors (i.e. the probabilities of the different models, whether the product was marked as suspected by the models, the product category in which it was listed, and the banned keyword due to which the product got banned) and reduces the manual auditing task as much as possible.
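
One simple statistical baseline for this goal is sketched below, under stated assumptions. It groups previously audited products by the two model flags and the listing category, estimates the historical ban rate per group, and sends a new product to manual audit only when its group is too small or its ban rate is not decisive. The function names, the thresholds (low, high, min_support) and the sample rows are hypothetical choices for illustration. A trained classifier over the model probabilities and keywords could replace the group lookup.

```python
from collections import defaultdict


def group_key(row):
    # Hypothetical grouping: both model flags plus the listing category.
    return (row["SUSPECT_BY_SOA_MODEL"], row["SUSPECT_BY_CBS_MODEL"], row["CAT_Name"])


def fit_ban_rates(history):
    """Return {group: (ban_rate, n_rows)} computed from audited rows."""
    counts = defaultdict(lambda: [0, 0])  # group -> [banned, total]
    for row in history:
        # Audit outcome as defined by rejection_code & audit_ban.
        banned = row["rejection_code"] == 132 and row["audit_ban"] == 1
        entry = counts[group_key(row)]
        entry[0] += int(banned)
        entry[1] += 1
    return {key: (b / n, n) for key, (b, n) in counts.items()}


def triage(row, rates, low=0.05, high=0.95, min_support=50):
    """Decide automatically only when the historical evidence is decisive."""
    rate, n = rates.get(group_key(row), (None, 0))
    if n < min_support:
        return "manual_audit"  # unseen or rarely seen group
    if rate >= high:
        return "auto_ban"
    if rate <= low:
        return "auto_approve"
    return "manual_audit"


def make_row(soa, cbs, cat, banned):
    # Builds a made-up audited row; rejection code 0 is a placeholder value.
    return {
        "SUSPECT_BY_SOA_MODEL": soa,
        "SUSPECT_BY_CBS_MODEL": cbs,
        "CAT_Name": cat,
        "rejection_code": 132 if banned else 0,
        "audit_ban": 1 if banned else 0,
    }


# Made-up audit history; the category names are placeholders.
history = (
    [make_row(1, 1, "category_a", True) for _ in range(3)]
    + [make_row(0, 0, "category_b", False) for _ in range(3)]
    + [make_row(1, 0, "category_c", True), make_row(1, 0, "category_c", False)]
)
rates = fit_ban_rates(history)

decisions = [
    triage(make_row(1, 1, "category_a", False), rates, min_support=2),
    triage(make_row(0, 0, "category_b", False), rates, min_support=2),
    triage(make_row(1, 0, "category_c", False), rates, min_support=2),
    triage(make_row(0, 1, "category_d", False), rates, min_support=2),
]
print(decisions)
```

The thresholds control the trade-off: tighter values keep more products in manual audit but lower the risk of a wrong automatic decision, so they would need to be tuned on held-out audited data.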