Making use of the email module, write a function load_tokens(email_path) that reads the email at the specified path, extracts the tokens from its message, and returns them as a list.
Specifically, you should use the email.message_from_file(file_obj) function to create a message object from the contents of the file, and the email.iterators.body_line_iterator(message) function to iterate over the lines in the message. Here, tokens are considered to be contiguous substrings of non-whitespace characters.
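A minimal sketch of how load_tokens could look, using only the two email calls named above. The explicit encoding and errors="ignore" arguments are assumptions (raw spam corpora often contain undecodable bytes); they are not part of the task statement.

```python
import email
import email.iterators


def load_tokens(email_path):
    """Return the whitespace-delimited tokens in the body of one email."""
    # Assumption: decode as UTF-8 and silently drop undecodable bytes.
    with open(email_path, "r", encoding="utf-8", errors="ignore") as file_obj:
        message = email.message_from_file(file_obj)
    tokens = []
    for line in email.iterators.body_line_iterator(message):
        # str.split() with no arguments splits on runs of whitespace
        # and never yields empty strings.
        tokens.extend(line.split())
    return tokens
```

Header lines such as Subject: are not part of the body, so they do not contribute tokens.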
Write a function log_probs(email_paths, smoothing) that returns a dictionary from the words contained in the given emails to their Laplace-smoothed log-probabilities. Specifically, if the set V denotes the vocabulary of words in the emails, then the probabilities should be computed by taking the logarithms of
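$$P(w) = \frac{\operatorname{count}(w) + \alpha}{\left(\sum_{w' \in V} \operatorname{count}(w')\right) + \alpha\,(|V| + 1)}, \qquad P(\langle\text{UNK}\rangle) = \frac{\alpha}{\left(\sum_{w' \in V} \operatorname{count}(w')\right) + \alpha\,(|V| + 1)}$$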
where w is a word in the vocabulary V, α is the smoothing constant (typically in the range 0 < α ≤ 1), and <UNK> denotes a special word that will be substituted for unknown tokens at test time.
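The smoothing step can be sketched on already-tokenized emails. This assumes the standard Laplace form, in which every count is increased by α and the denominator grows by α(|V| + 1) to make room for <UNK>. The helper name smoothed_log_probs is hypothetical; log_probs itself would first call load_tokens on each path and then proceed the same way.

```python
import math
from collections import Counter


def smoothed_log_probs(token_lists, smoothing):
    """Map each word (plus "<UNK>") to its Laplace-smoothed log-probability.

    Hypothetical helper: takes lists of tokens rather than file paths.
    Assumes smoothing > 0, otherwise log(0) is undefined for "<UNK>".
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    total = sum(counts.values())
    # The vocabulary gets one extra slot for "<UNK>", hence |V| + 1.
    denominator = total + smoothing * (len(counts) + 1)
    result = {word: math.log((count + smoothing) / denominator)
              for word, count in counts.items()}
    result["<UNK>"] = math.log(smoothing / denominator)
    return result
```

With this form the probabilities of all vocabulary words and <UNK> sum to 1, which is a convenient sanity check.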
Write an initialization method __init__(self, spam_dir, ham_dir, smoothing) in the SpamFilter class that creates two log-probability dictionaries corresponding to the emails in the provided spam and ham directories, then stores them internally for future use. Also compute the class probabilities P(spam) and P(¬spam) based on the number of files in the input directories.
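The class probabilities can be estimated from file counts alone. The sketch below uses a hypothetical helper, class_priors, and assumes each directory contains only email files with no subdirectories or hidden files.

```python
import os


def class_priors(spam_dir, ham_dir):
    """Estimate P(spam) and P(not spam) from the number of training files.

    Hypothetical helper; assumes every directory entry is one email.
    """
    num_spam = len(os.listdir(spam_dir))
    num_ham = len(os.listdir(ham_dir))
    total = num_spam + num_ham
    return num_spam / total, num_ham / total
```

Inside __init__, the two returned values would be stored alongside the two log-probability dictionaries.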
[25 points] Write a method is_spam(self, email_path) in the SpamFilter class that returns a Boolean value indicating whether the email at the given file path is predicted to be spam. Tokens which were not encountered during the training process should be converted into the special word "<UNK>" in order to avoid zero probabilities.
Recall from the lecture slides that for a given class c ∈ {spam, ¬spam},
$$P(c \mid \text{document}) \propto P(c) \prod_{w \in V} P(w \mid c)^{\operatorname{count}(w)}$$
where the normalization constant 1 / P(document) is the same for both classes and can therefore be ignored. Here, the count of a word is computed over the input document to be classified.
These computations should be performed in log-space to avoid underflow.
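In log-space the product over words becomes a sum, so the comparison between classes can be sketched as follows. The helper names log_score and predict_spam are hypothetical, and the dictionaries are assumed to have the shape that log_probs returns, including a "<UNK>" entry.

```python
import math


def log_score(tokens, log_probs, prior):
    """Return log P(c) plus the sum of log P(w | c) over the tokens.

    This is the log-posterior up to the shared constant log P(document).
    """
    unk = log_probs["<UNK>"]
    # Adding one term per token occurrence equals count(w) * log P(w | c).
    return math.log(prior) + sum(log_probs.get(tok, unk) for tok in tokens)


def predict_spam(tokens, spam_lp, ham_lp, p_spam):
    """True if the spam class has the higher unnormalized log-posterior."""
    spam_score = log_score(tokens, spam_lp, p_spam)
    ham_score = log_score(tokens, ham_lp, 1.0 - p_spam)
    return spam_score > ham_score
```

is_spam would load the tokens from email_path and apply the same comparison using the dictionaries and priors stored by __init__. How to break an exact tie is not specified in the task; this sketch predicts ham in that case.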
Suppose we define the spam indication value of a word w to be the quantity
$$\log \frac{P(w \mid \text{spam})}{P(w)}$$
Similarly, define the ham indication value of a word w to be
$$\log \frac{P(w \mid \neg\text{spam})}{P(w)}$$
Write a pair of methods most_indicative_spam(self, n) and most_indicative_ham(self, n) in the SpamFilter class which return the n most indicative words for each category, sorted in descending order based on their indication values. You should restrict the set of words considered for each method to those which appear in at least one spam email and one ham email. The probabilities computed within the __init__(self, spam_dir, ham_dir, smoothing) method are sufficient to calculate these quantities.
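A sketch of the ranking step follows, under two assumptions that the text above does not spell out: the indication value of w for a class c is log(P(w | c) / P(w)), and P(w) is obtained by total probability as P(w | spam) P(spam) + P(w | ¬spam) P(¬spam). The helper name most_indicative is hypothetical; both methods could delegate to it with the roles of the two dictionaries swapped.

```python
import math


def most_indicative(target_lp, other_lp, p_target, n):
    """Top-n words by log(P(w | target) / P(w)), restricted to shared words.

    Hypothetical helper. target_lp and other_lp are log-probability
    dictionaries for the two classes; p_target is the prior of the target.
    """
    # Only words seen in both classes qualify; "<UNK>" is not a real word.
    shared = (set(target_lp) & set(other_lp)) - {"<UNK>"}

    def indication(word):
        # Assumed: P(w) = P(w | target) P(target) + P(w | other) P(other)
        p_word = (math.exp(target_lp[word]) * p_target
                  + math.exp(other_lp[word]) * (1.0 - p_target))
        return target_lp[word] - math.log(p_word)

    return sorted(shared, key=indication, reverse=True)[:n]
```

For example, most_indicative_spam(n) would pass the spam dictionary as target_lp with P(spam) as p_target, and most_indicative_ham(n) would pass the ham dictionary with P(¬spam).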
Hello! I'd like to deliver a fraud assessment model for incoming letters. I'm familiar with probability theory and computer science. I'll do the job blazingly fast. Please give me a try!
Greetings,
I am an experienced developer/programmer with an on-time delivery target.
Kindly feel free to get in touch to initiate the project.
Thanks & Regards.
Hi!
I have read your project description. I have 2 years of experience in Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing (NLP). I have worked on the following projects:
- Face Recognition
- Price Predictions
- Stock market Prediction
- Image classifications
- Spam filtering
- Voice Recognition
- Object Detection
- Object Classification using CNN
- Object Detection using R-CNN, Fast R-CNN, Faster R-CNN
Tools I use:
- Python
- opencv
- R
- Matlab
- NumPy/pandas/matplotlib
- scikit learn
- tensorflow
- keras
- yolo api
- pytorch
I am an expert in Python and machine learning.
I am a Python, software architecture, and machine learning (ML) expert with plenty of experience in this kind of work.
I have done the same sort of project before and I am available from now.
Looking forward to a positive response.
A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those messages from getting to a user's inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases judgments.