Using Python Machine Learning for Spam Filter

$10-30 USD

Closed
Posted about 4 years ago

Paid on delivery
Making use of the email module, write a function load_tokens(email_path) that reads the email at the specified path, extracts the tokens from its message, and returns them as a list. Specifically, you should use the email.message_from_file(file_obj) function to create a message object from the contents of the file, and the email.iterators.body_line_iterator(message) function to iterate over the lines in the message. Here, tokens are considered to be contiguous substrings of non-whitespace characters.

Write a function log_probs(email_paths, smoothing) that returns a dictionary mapping the words contained in the given emails to their Laplace-smoothed log-probabilities. Specifically, if the set V denotes the vocabulary of words in the emails, then the probabilities should be computed by taking the logarithms of [equation hw5-eqn1], where w is a word in the vocabulary V, α is the smoothing constant (typically in the range 0 < α ≤ 1), and <UNK> denotes a special word that will be substituted for unknown tokens at test time.

Write an initialization method __init__(self, spam_dir, ham_dir, smoothing) in the SpamFilter class that creates two log-probability dictionaries corresponding to the emails in the provided spam and ham directories, then stores them internally for future use. Also compute the class probabilities P(spam) and P(¬spam) based on the number of files in the input directories.

[25 points] Write a method is_spam(self, email_path) in the SpamFilter class that returns a Boolean value indicating whether the email at the given file path is predicted to be spam. Tokens which were not encountered during the training process should be converted into the special word "<UNK>" in order to avoid zero probabilities. Recall from the lecture slides that for a given class c ∈ {spam, ¬spam}, [equation hw5-eqn2], where the normalization constant 1 / P(document) is the same for both classes and can therefore be ignored. Here, the count of a word is computed over the input document to be classified.
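The four tasks above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the equation images (hw5-eqn1, hw5-eqn2) are missing from the posting, so the code assumes the usual Laplace-smoothed estimate P(w) = (count(w) + α) / (Σ count(w') + α(|V| + 1)), with P(<UNK>) = α over the same denominator, and a naive Bayes score of log P(c) + Σ count(w) · log P(w | c). The attribute names, the _list_paths and _log_score helpers, the file encoding, and the assumption that both directories are non-empty and contain only email files are illustrative choices that do not come from the posting.

```python
import email
import email.iterators
import math
import os
from collections import Counter

UNK = "<UNK>"


def load_tokens(email_path):
    """Return the whitespace-delimited tokens of the email body as a list."""
    # Encoding and error handling are assumptions; the posting does not specify them.
    with open(email_path, encoding="utf-8", errors="ignore") as file_obj:
        message = email.message_from_file(file_obj)
    tokens = []
    for line in email.iterators.body_line_iterator(message):
        tokens.extend(line.split())
    return tokens


def log_probs(email_paths, smoothing):
    """Map each word (plus <UNK>) to its Laplace-smoothed log-probability."""
    counts = Counter()
    for path in email_paths:
        counts.update(load_tokens(path))
    # Assumed form of hw5-eqn1: the +1 in (|V| + 1) reserves mass for <UNK>.
    denominator = sum(counts.values()) + smoothing * (len(counts) + 1)
    table = {w: math.log((c + smoothing) / denominator) for w, c in counts.items()}
    table[UNK] = math.log(smoothing / denominator)
    return table


def _list_paths(directory):
    """Hypothetical helper: full paths of every file in a directory."""
    return [os.path.join(directory, name) for name in sorted(os.listdir(directory))]


class SpamFilter:
    def __init__(self, spam_dir, ham_dir, smoothing):
        spam_paths = _list_paths(spam_dir)
        ham_paths = _list_paths(ham_dir)
        self.spam_log_probs = log_probs(spam_paths, smoothing)
        self.ham_log_probs = log_probs(ham_paths, smoothing)
        # Class priors from file counts; assumes both directories are non-empty.
        total = len(spam_paths) + len(ham_paths)
        self.log_p_spam = math.log(len(spam_paths) / total)
        self.log_p_ham = math.log(len(ham_paths) / total)

    @staticmethod
    def _log_score(tokens, table, log_prior):
        # Unseen tokens fall back to the <UNK> entry of the class dictionary.
        unk = table[UNK]
        return log_prior + sum(table.get(token, unk) for token in tokens)

    def is_spam(self, email_path):
        tokens = load_tokens(email_path)
        spam = self._log_score(tokens, self.spam_log_probs, self.log_p_spam)
        ham = self._log_score(tokens, self.ham_log_probs, self.log_p_ham)
        return spam > ham
```

Summing log-probabilities over the token list adds log P(w | c) once per occurrence, which is equivalent to raising P(w | c) to the power count(w) in probability space. Ties are resolved as "not spam" here, which is another arbitrary choice.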
The is_spam computations should be performed in log-space to avoid underflow.

Suppose we define the spam indication value of a word w to be the quantity [equation hw5-eqn3]. Similarly, define the ham indication value of a word w to be [equation hw5-eqn4].

Write a pair of methods most_indicative_spam(self, n) and most_indicative_ham(self, n) in the SpamFilter class which return the n most indicative words for each category, sorted in descending order based on their indication values. You should restrict the set of words considered for each method to those which appear in at least one spam email and one ham email. The probabilities computed within the __init__(self, spam_dir, ham_dir, smoothing) method are sufficient to calculate these quantities.
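One way to compute the indication values is sketched below. Because the images hw5-eqn3 and hw5-eqn4 are missing from the posting, the sketch assumes the spam indication value of w is log(P(w | spam) / P(w)), with P(w) = P(w | spam) P(spam) + P(w | ¬spam) P(¬spam), and symmetrically for ham. The standalone function most_indicative and its parameter names are hypothetical; inside SpamFilter, the two methods would presumably pass the stored dictionaries and class probabilities in the appropriate order. Skipping "<UNK>" and breaking ties alphabetically are also assumptions.

```python
import math

UNK = "<UNK>"


def most_indicative(target_log_probs, other_log_probs, p_target, n):
    """Return the n words most indicative of the target class.

    target_log_probs / other_log_probs: word -> log P(w | class) dictionaries.
    p_target: prior probability of the target class, e.g. P(spam).
    """
    p_other = 1.0 - p_target
    # Only words seen in both classes qualify; <UNK> is not a real word.
    shared = (set(target_log_probs) & set(other_log_probs)) - {UNK}

    def indication(word):
        # Assumed definition: log(P(w | target) / P(w)), P(w) by total probability.
        p_word = (math.exp(target_log_probs[word]) * p_target
                  + math.exp(other_log_probs[word]) * p_other)
        return target_log_probs[word] - math.log(p_word)

    # Descending by indication value, ties broken alphabetically.
    return sorted(shared, key=lambda w: (-indication(w), w))[:n]
```

With this shape, most_indicative_spam(n) would call most_indicative with the spam dictionary first and P(spam) as the prior, and most_indicative_ham(n) would swap the dictionaries and use P(¬spam).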
Project ID: 24450738

About the project

7 proposals
Remote project
Active 4 years ago

7 freelancers are bidding an average of $205 USD for this job

I write Python scripts, and based on the project description I will deliver the Python script within 24 hours at most.
$20 USD in 1 day
4.8 (17 reviews)
5.1

Hello! I'd like to deliver a fraud assessment model for incoming letters. I'm familiar with probability theory and computer science. I'll do the job blazingly fast. Please give me a try!
$896 USD in 1 day
5.0 (8 reviews)
4.7

Greetings, I am an experienced developer/programmer with an on-time delivery target. Kindly feel free to get in touch to initiate the project. Thanks & Regards.
$250 USD in 3 days
4.7 (21 reviews)
4.5

Hi! I have read your project description. I have 2 years of experience in machine learning, deep learning, computer vision, and natural language processing (NLP).
I have worked on the following projects:
- Face recognition
- Price prediction
- Stock market prediction
- Image classification
- Spam filtering
- Voice recognition
- Object detection
- Object classification using CNN
- Object detection using R-CNN, Fast R-CNN, Faster R-CNN
Tools I use:
- Python
- OpenCV
- R
- MATLAB
- NumPy / pandas / Matplotlib
- scikit-learn
- TensorFlow
- Keras
- YOLO API
- PyTorch
$20 USD in 7 days
5.0 (11 reviews)
4.3

I am an expert in Python and machine learning.
$200 USD in 7 days
4.5 (9 reviews)
4.2

I am a Python, software architecture, and machine learning (ML) expert with plenty of experience in this kind of work. I have done the same sort of project before, and I am available now. Looking forward to a positive response.
$30 USD in 7 days
0.0 (0 reviews)
0.0

A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those messages from getting to a user's inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases judgments.
$20 USD in 7 days
0.0 (0 reviews)
0.0

About the client

State College, United States
5.0
4
Payment method verified
Member since Feb 4, 2020

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)