Closed

Data Preprocessing Java Code

This project was just awarded to dobreiiita for $100 USD.

Project budget
$10 - $30 USD
Project description

In this project, the students are to implement data pre-processing techniques and apply them to a gene expression dataset.

The dataset contains 62 samples collected from colon-cancer patients: 40 are labeled "negative" and 22 are labeled "positive." Each tuple (row) is a sample containing the gene readings, followed by the sample's class in the last column. Each gene is an attribute. Columns are separated by commas (","), a format commonly used in data mining. We refer to the genes as G0, ..., GN, numbered left to right as they appear in the original file.
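The row layout described above could be loaded along these lines. This is only a sketch: the class name GeneDataset is made up for illustration, and it assumes the file has no header row, that every gene reading parses as a number, and that the class label is kept as plain text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the parsed dataset (name chosen for illustration).
final class GeneDataset {
    final double[][] values;   // values[i][j] = reading of gene Gj in sample i
    final String[] labels;     // labels[i] = class of sample i (the last column)

    GeneDataset(double[][] values, String[] labels) {
        this.values = values;
        this.labels = labels;
    }

    // Parses comma-separated lines: numeric gene readings followed by a class label.
    // Assumes no header row; blank lines are skipped.
    static GeneDataset parse(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        List<String> classes = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] cells = line.trim().split(",");
            double[] row = new double[cells.length - 1];
            for (int j = 0; j < row.length; j++) {
                row[j] = Double.parseDouble(cells[j].trim());
            }
            rows.add(row);
            classes.add(cells[cells.length - 1].trim());
        }
        return new GeneDataset(rows.toArray(new double[0][]),
                               classes.toArray(new String[0]));
    }
}
```

The lines themselves could come from java.nio.file.Files.readAllLines applied to wherever the data file is stored.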

You will write a C++ or Java program to handle the following two tasks:

Task 1. Discretize the data using equi-density binning with 3 bins for each of the first k attributes.
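One possible reading of equi-density binning is equal-frequency binning: each bin receives roughly the same number of samples. The sketch below takes that reading. The class and method names are invented for illustration, and the tie-handling rule (equal values always land in the same bin, so bin sizes can be uneven when readings repeat) is an assumption the brief does not specify.

```java
import java.util.Arrays;

// Hypothetical helper for equal-frequency (equi-density) binning.
final class EquiDensityBinner {

    // Cut points taken from the sorted column at positions n/bins, 2n/bins, ...
    static double[] cutPoints(double[] column, int bins) {
        double[] sorted = column.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[bins - 1];
        for (int b = 1; b < bins; b++) {
            cuts[b - 1] = sorted[(int) ((long) b * sorted.length / bins)];
        }
        return cuts;
    }

    // Bin index of a value: the number of cut points that are <= value.
    static int binOf(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value >= cuts[bin]) {
            bin++;
        }
        return bin;
    }

    // Discretizes one attribute into bin indices 0 .. bins-1.
    static int[] discretize(double[] column, int bins) {
        double[] cuts = cutPoints(column, bins);
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = binOf(column[i], cuts);
        }
        return out;
    }

    // Discretizes the first k attributes of data[sample][gene].
    static int[][] discretizeFirstK(double[][] data, int k, int bins) {
        int[][] out = new int[data.length][k];
        for (int j = 0; j < k; j++) {
            double[] column = new double[data.length];
            for (int i = 0; i < data.length; i++) {
                column[i] = data[i][j];
            }
            int[] binned = discretize(column, bins);
            for (int i = 0; i < data.length; i++) {
                out[i][j] = binned[i];
            }
        }
        return out;
    }
}
```

With 62 samples and distinct readings, this rule would put 20 samples in the first bin and 21 in each of the other two.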

Task 2. Use the entropy-based binning method to discretize all genes and select the top-k genes, ranked in decreasing order of information gain. Use 3 bins for each gene. Information gain for three bins generalizes the two-bin case and is based on size-weighted entropy. To get three bins, first divide the range of a given attribute into two bins, then divide one of those two bins into two more. The two splits should maximize the size-weighted entropy gain over the three resulting intervals. (For the second split, choose between the two candidates, one for the left interval and one for the right, based on size-weighted entropy gain.)
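The two-step procedure above might be sketched as follows. This is a greedy reading of the brief, not a confirmed reference solution: the first cut is the best two-bin split, and the second cut is the better of the best left-interval and best right-interval splits. All names are hypothetical. Cuts are only placed between distinct values, entropy is measured in bits, and the code returns the gain and the ranking rather than the cut values themselves.

```java
import java.util.Arrays;

// Hypothetical sketch of greedy 3-bin entropy-based discretization.
final class EntropyBinner {

    // Entropy in bits of a set with pos positives among n samples.
    static double entropy(int pos, int n) {
        if (n == 0 || pos == 0 || pos == n) {
            return 0.0;
        }
        double p = (double) pos / n;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2);
    }

    // Size-weighted entropy |S| * H(S) of the sorted range [lo, hi).
    static double cost(int[] prefixPos, int lo, int hi) {
        return (hi - lo) * entropy(prefixPos[hi] - prefixPos[lo], hi - lo);
    }

    // Best cut s in (lo, hi), giving [lo, s) and [s, hi); -1 if no cut is possible.
    static int bestSplit(double[] v, int[] prefixPos, int lo, int hi) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int s = lo + 1; s < hi; s++) {
            if (v[s] == v[s - 1]) {
                continue;   // equal values cannot be separated
            }
            double c = cost(prefixPos, lo, s) + cost(prefixPos, s, hi);
            if (c < bestCost) {
                bestCost = c;
                best = s;
            }
        }
        return best;
    }

    // Information gain of the greedy 3-bin discretization of one gene.
    static double gain3(double[] column, boolean[] positive) {
        int n = column.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(column[a], column[b]));
        double[] v = new double[n];
        int[] prefixPos = new int[n + 1];   // prefixPos[r] = positives among first r sorted samples
        for (int r = 0; r < n; r++) {
            v[r] = column[order[r]];
            prefixPos[r + 1] = prefixPos[r] + (positive[order[r]] ? 1 : 0);
        }
        int s1 = bestSplit(v, prefixPos, 0, n);
        if (s1 < 0) {
            return 0.0;     // constant attribute: nothing to split
        }
        double left = cost(prefixPos, 0, s1);
        double right = cost(prefixPos, s1, n);
        double best = left + right;         // falls back to two bins if no second cut exists
        int sl = bestSplit(v, prefixPos, 0, s1);
        if (sl >= 0) {
            best = Math.min(best, cost(prefixPos, 0, sl) + cost(prefixPos, sl, s1) + right);
        }
        int sr = bestSplit(v, prefixPos, s1, n);
        if (sr >= 0) {
            best = Math.min(best, left + cost(prefixPos, s1, sr) + cost(prefixPos, sr, n));
        }
        return (cost(prefixPos, 0, n) - best) / n;
    }

    // Indices of the k genes with the highest gain, in decreasing gain order.
    static int[] topK(double[][] data, boolean[] positive, int k) {
        int genes = data[0].length;
        double[] gains = new double[genes];
        Integer[] idx = new Integer[genes];
        for (int j = 0; j < genes; j++) {
            double[] column = new double[data.length];
            for (int i = 0; i < data.length; i++) {
                column[i] = data[i][j];
            }
            gains[j] = gain3(column, positive);
            idx[j] = j;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(gains[b], gains[a]));
        int[] top = new int[Math.min(k, genes)];
        for (int r = 0; r < top.length; r++) {
            top[r] = idx[r];
        }
        return top;
    }
}
```

Because the sort is stable, genes with equal gain keep their original left-to-right order. That tie-breaking rule is a choice made here, not something the brief specifies.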
