Using Python Machine Learning for Spam Filter

$10-30 USD

Closed

Posted about 4 years ago

Paid on delivery

Making use of the email module, write a function load_tokens(email_path) that reads the email at the specified path, extracts the tokens from its message, and returns them as a list. Specifically, you should use the email.message_from_file(file_obj) function to create a message object from the contents of the file, and the email.iterators.body_line_iterator(message) function to iterate over the lines in the message. Here, tokens are considered to be contiguous substrings of non-whitespace characters.

Write a function log_probs(email_paths, smoothing) that returns a dictionary from the words contained in the given emails to their Laplace-smoothed log-probabilities. Specifically, if the set V denotes the vocabulary of words in the emails, then the probabilities should be computed by taking the logarithms of

[equation: hw5-eqn1]

where w is a word in the vocabulary V, α is the smoothing constant (typically in the range 0 < α ≤ 1), and <UNK> denotes a special word that will be substituted for unknown tokens at test time.

Write an initialization method __init__(self, spam_dir, ham_dir, smoothing) in the SpamFilter class that creates two log-probability dictionaries corresponding to the emails in the provided spam and ham directories, then stores them internally for future use. Also compute the class probabilities P(spam) and P(¬spam) based on the number of files in the input directories.

[25 points] Write a method is_spam(self, email_path) in the SpamFilter class that returns a Boolean value indicating whether the email at the given file path is predicted to be spam. Tokens which were not encountered during the training process should be converted into the special word "<UNK>" in order to avoid zero probabilities. Recall from the lecture slides that for a given class c ∈ {spam, ¬spam},

[equation: hw5-eqn2]

where the normalization constant 1 / P(document) is the same for both classes and can therefore be ignored. Here, the count of a word is computed over the input document to be classified.
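The tokenizing and training functions described above could be sketched roughly as follows. The equation image hw5-eqn1 is not reproduced in this listing, so the sketch assumes the usual Laplace-smoothed estimate, P(w) = (count(w) + α) / (N + α(|V| + 1)) and P(<UNK>) = α / (N + α(|V| + 1)), where N is the total number of tokens in the given emails. The file encoding and the error handling on open() are also assumptions.

```python
import collections
import email
import email.iterators
import math


def load_tokens(email_path):
    """Return the whitespace-separated tokens found in the email's body lines."""
    # Assumption: the files are readable as UTF-8; undecodable bytes are skipped.
    with open(email_path, encoding="utf-8", errors="ignore") as file_obj:
        message = email.message_from_file(file_obj)
    tokens = []
    for line in email.iterators.body_line_iterator(message):
        # str.split() with no arguments splits on runs of whitespace,
        # which matches "contiguous substrings of non-whitespace characters".
        tokens.extend(line.split())
    return tokens


def log_probs(email_paths, smoothing):
    """Map each word (plus "<UNK>") to its Laplace-smoothed log-probability.

    Assumed formula (the original equation image is missing):
        P(w)     = (count(w) + a) / (N + a * (|V| + 1))
        P(<UNK>) = a / (N + a * (|V| + 1))
    smoothing must be > 0, otherwise log(0) is taken for <UNK>.
    """
    counts = collections.Counter()
    for path in email_paths:
        counts.update(load_tokens(path))
    denominator = sum(counts.values()) + smoothing * (len(counts) + 1)
    result = {
        word: math.log((count + smoothing) / denominator)
        for word, count in counts.items()
    }
    result["<UNK>"] = math.log(smoothing / denominator)
    return result
```

Under this assumed formula the probabilities of all vocabulary words and <UNK> sum to 1, which is a quick sanity check for any implementation.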
These computations should be carried out in log-space to avoid underflow.

Suppose we define the spam indication value of a word w to be the quantity

[equation: hw5-eqn3]

Similarly, define the ham indication value of a word w to be

[equation: hw5-eqn4]

Write a pair of methods most_indicative_spam(self, n) and most_indicative_ham(self, n) in the SpamFilter class which return the n most indicative words for each category, sorted in descending order based on their indication values. You should restrict the set of words considered for each method to those which appear in at least one spam email and one ham email. The probabilities computed within the __init__(self, spam_dir, ham_dir, smoothing) method are sufficient to calculate these quantities.
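A minimal sketch of the classification and indication-value logic, written as standalone helpers that take the log-probability dictionaries directly so that it runs without any email files. The helper names (class_log_score, predict_spam, indication_values, top_n) are hypothetical and are not part of the assignment. The equation images are missing from this listing, so two formulas are assumed here: the class score is log P(c) plus the sum of log P(w | c) over every token occurrence, and the indication value of w for a class c is log(P(w | c) / P(w)) with P(w) = P(w | spam) P(spam) + P(w | ¬spam) P(¬spam).

```python
import math


def class_log_score(tokens, log_prior, word_log_probs):
    """log P(c) + sum of log P(w | c) over token occurrences (assumed form)."""
    unk = word_log_probs["<UNK>"]
    # Summing per occurrence equals weighting each word by its document count.
    return log_prior + sum(word_log_probs.get(tok, unk) for tok in tokens)


def predict_spam(tokens, p_spam, spam_log_probs, ham_log_probs):
    """Return True if the spam score exceeds the ham score."""
    spam_score = class_log_score(tokens, math.log(p_spam), spam_log_probs)
    ham_score = class_log_score(tokens, math.log(1.0 - p_spam), ham_log_probs)
    # Assumption: ties are classified as ham.
    return spam_score > ham_score


def indication_values(p_spam, spam_log_probs, ham_log_probs):
    """Return (spam_values, ham_values) for words present in both classes."""
    shared = (set(spam_log_probs) & set(ham_log_probs)) - {"<UNK>"}
    spam_values, ham_values = {}, {}
    for word in shared:
        p_w = (math.exp(spam_log_probs[word]) * p_spam
               + math.exp(ham_log_probs[word]) * (1.0 - p_spam))
        log_p_w = math.log(p_w)
        # Assumed definition: log(P(w | c) / P(w)).
        spam_values[word] = spam_log_probs[word] - log_p_w
        ham_values[word] = ham_log_probs[word] - log_p_w
    return spam_values, ham_values


def top_n(values, n):
    """Return the n words with the largest values, in descending order."""
    return sorted(values, key=values.get, reverse=True)[:n]
```

In a SpamFilter class, is_spam would pass the tokens of the email to something like predict_spam, and most_indicative_spam(n) and most_indicative_ham(n) would apply top_n to the two dictionaries returned by indication_values.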

Python

Software Architecture

Machine Learning (ML)

Project ID: 24450738

About the project

7 proposals

Remote project

Active 4 years ago

7 freelancers are bidding an average of $205 USD for this job

@letsstartcoding

I am a Python scripter, and on the basis of the project description I will deliver the Python script within 24 hours at most.

$20 USD in 1 day

4.8

(17 reviews)

5.1

@fabienbenoit1984

Hello! I'd like to deliver a fraud assessment model for incoming letters. I'm familiar with probability theory and computer science. I'll do the job blazingly fast. Please give me a try!

$896 USD in 1 day

5.0

(8 reviews)

4.7

@Aliascorp

Greetings, I am an experienced developer/programmer with an on-time delivery target. Kindly feel free to get in touch to initiate the project. Thanks & Regards.

$250 USD in 3 days

4.7

(21 reviews)

4.5

@shaheryartariq90

Hi! I have read your project description. I have 2 years of experience in Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing (NLP). I have worked on the following projects: Face Recognition, Price Prediction, Stock Market Prediction, Image Classification, Spam Filtering, Voice Recognition, Object Detection, Object Classification using CNN, and Object Detection using R-CNN, Fast R-CNN, and Faster R-CNN. Tools I use: Python, OpenCV, R, MATLAB, NumPy/pandas/matplotlib, scikit-learn, TensorFlow, Keras, YOLO API, PyTorch.

$20 USD in 7 days

5.0

(11 reviews)

4.3

@braincenter

I am an expert in Python and machine learning.

$200 USD in 7 days

4.5

(9 reviews)

4.2

@DjSalman

I am a Python, software architecture, and machine learning (ML) expert with plenty of experience in this kind of work. I have done the same sort of project before and I am available from now. Looking forward to a positive response.

$30 USD in 7 days

0.0

(0 reviews)

0.0

@RahulBadhan

A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those messages from getting to a user's inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases judgments.

$20 USD in 7 days