
Data Mining 1

$10-50 USD

Cancelled
Posted more than 5 years ago

Payment on delivery
Data Mining (word count: 2000 words)

The ‘Hepatitis’ data set (provided in ARFF format; it will be sent separately) contains information about patients affected by hepatitis. The task is to predict whether or not these patients have hepatitis (Histology: Yes or No). You should use the Weka data mining package, which is installed on the university computers and is also available to download from: [login to view URL]~ml/weka/

You should hand in a report covering the following:

a) Select a suitable tree-building algorithm and build a model. Describe how you split the data for training and testing purposes, and explain the splitting method. (9 points)
b) Interpret the output results (12 points, with greater attention to the interpretation of the accuracy rate):
- The accuracy rates;
- Which attributes were used to make the predictions;
- How many nodes and leaves you obtained;
- Include a visual tree diagram showing the structure of the model that you built.
c) Give a detailed technical description of the classification model (20 points):
- Which tree induction method is used;
- Which attribute selection criterion is used;
- Give an example of how the attributes were selected for growing the tree.
d) Change the confidence factor to 35%, report any change in the model accuracy and explain the reasons behind the change. (5 points)
e) Set the ‘REP’ parameter (Reduced Error Pruning) to ‘TRUE’. Explain the meaning of this operation. Report and explain any change in the model accuracy. (7 points)
f) Set the parameter ‘unpruned’ to ‘TRUE’. Report and explain any change in the model accuracy and in the tree structure. Explain which pruning method this algorithm uses, and carefully explain how pruning was performed. (11 points)
g) Report on the model’s ability to predict the class variable compared with two other models of your choice (for example, neural networks, SVM or a Bayesian network). Which model classified the data most accurately, and what are the possible reasons for its superior performance? (20 points)
h) Show a confusion matrix for the model and interpret it. Show a ROC curve and a Lift chart for the decision tree and interpret them. (6 points)
i) Generate a set of rules along the subtree path: Ascites – Class – Spiders – Bilirubin – Sex – Class ‘No’. What would you recommend to reduce the number of rules in the set? Hint: speculate about Support and Confidence. (10 points)

Note: the allocated points are given as tentative benchmarks only. The report will also be assessed on overall understanding of the data mining process, i.e. relating it to the problem domain, cross-referencing between the points in the questions, noting anything interesting, etc. The report structure and the quality of references will also be evaluated.
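Parts (b) and (h) ask for the accuracy rate, the confusion matrix and the ROC and Lift charts to be interpreted. The sketch below (Python, standard library only) is one way to check by hand how those figures follow from the four cells of a 2×2 confusion matrix. It assumes ‘Yes’ is the positive class and uses one common definition of lift (precision divided by the base rate of ‘Yes’); the class and method names are hypothetical, and the counts in the demo are made up for illustration, not results from the Hepatitis data. Weka reports these statistics itself, so this is only a cross-check.

```python
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no 'Yes' cases).
    return num / den if den else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 confusion matrix; 'Yes' is assumed to be the positive class."""
    tp: int  # actual Yes, predicted Yes
    fn: int  # actual Yes, predicted No
    fp: int  # actual No, predicted Yes
    tn: int  # actual No, predicted No

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Share of all instances that were classified correctly.
        return _ratio(self.tp + self.tn, self.total)

    def baseline_accuracy(self) -> float:
        # Accuracy of always predicting the more frequent actual class
        # (similar in spirit to Weka's ZeroR baseline).
        return _ratio(max(self.tp + self.fn, self.fp + self.tn), self.total)

    def tp_rate(self) -> float:
        # Sensitivity / recall for 'Yes'; the y-axis of a ROC curve.
        return _ratio(self.tp, self.tp + self.fn)

    def fp_rate(self) -> float:
        # Share of actual 'No' cases wrongly labelled 'Yes'; the x-axis of a ROC curve.
        return _ratio(self.fp, self.fp + self.tn)

    def precision(self) -> float:
        # Share of predicted 'Yes' cases that really are 'Yes'.
        return _ratio(self.tp, self.tp + self.fp)

    def lift(self) -> float:
        # Precision relative to the base rate of 'Yes' in the evaluated data.
        base_rate = _ratio(self.tp + self.fn, self.total)
        return _ratio(self.precision(), base_rate)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not Hepatitis results).
    cm = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
    print(f"accuracy={cm.accuracy():.2f} baseline={cm.baseline_accuracy():.2f}")
    print(f"TP rate={cm.tp_rate():.2f} FP rate={cm.fp_rate():.2f} lift={cm.lift():.2f}")
```

Comparing the accuracy against the majority-class baseline is one way to judge whether an accuracy rate is actually good when the two classes are unequally represented.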
Project ID: 18337138

About the project

1 proposal
Remote project
Active 5 years ago

1 freelancer is bidding on average $150 USD for this job
I have done an MS in Software Engineering. I took courses in Data Engineering, ML and Artificial Intelligence. I know all data mining techniques, deep learning and data analysis techniques. I have worked on k-means, ID3, Bayes' theorem, confusion matrices, the Hungarian algorithm and so on. My research was on Rough Set Theory. Tools I use are Matlab, Weka, Python, RStudio and RapidMiner. Please see my profile and reviews as well. Thanks
$150 USD in 3 days
4.9 (28 reviews)

About the client

Southampton, United Kingdom
5.0
44
Payment method verified
Member since Mar 20, 2017

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)