Create NLTK corpus reader customized for structured .txt file from Proquest

$10-30 USD

Completed

Posted: about 5 years ago

Paid on delivery

I need a customized corpus reader that will take a plain .txt file downloaded from ProQuest and process it so that I can use it in further analysis. The .txt file is structured consistently, with standardized tags identifying the data fields. These tags are words or phrases that start with a capital letter and end with ': '. I need two types of data extracted from the file. First, I need the 'Full text: ' field (i.e., everything after this tag and before the next tag, which is 'Subject: ') extracted together with a unique identifier (this could be the existing unique identifier in the file, indicated by the tag 'ProQuest document ID: '), so that NLTK can parse it at the word, phrase, sentence, and paragraph level. Second, I need all of the fields EXCEPT the 'Full text: ' field extracted and placed into a pandas DataFrame.
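A minimal sketch of both extraction steps follows. It rests on a few assumptions that should be checked against a real export: records are separated by a line of underscores, and every tag begins a line. Inside 'Full text', only 'Subject: ' ends the field, as the brief specifies, so body lines such as 'Note: ...' are not mistaken for tags. The names `parse_export`, `parse_record` and `write_corpus`, the 60-character tag-length cap, and the one-file-per-document layout are hypothetical choices, not part of the request.

```python
from __future__ import annotations

import re
from pathlib import Path

import pandas as pd

# ASSUMPTION: records are separated by a line of 10+ underscores; adjust to your export.
RECORD_SEPARATOR = re.compile(r"^_{10,}[ \t]*$", re.MULTILINE)
# Tag rule from the brief: a capitalized word or phrase followed by ": " at line start.
# The 60-character cap is a hypothetical guard against matching ordinary sentences.
TAG_LINE = re.compile(r"^([A-Z][^:\n]{0,60}):(?: (.*))?$")

FULL_TEXT_TAG = "Full text"
END_OF_FULL_TEXT_TAG = "Subject"
ID_TAG = "ProQuest document ID"


def parse_record(block: str) -> dict[str, str]:
    """Split one record into {tag: value}; multi-line values keep their line breaks."""
    fields: dict[str, list[str]] = {}
    current: str | None = None
    for line in block.splitlines():
        match = TAG_LINE.match(line)
        # Inside "Full text", only the "Subject" tag closes the field, so body
        # lines such as "Note: ..." stay part of the article text.
        in_full_text = current == FULL_TEXT_TAG
        if match and (not in_full_text or match.group(1) == END_OF_FULL_TEXT_TAG):
            current = match.group(1)
            fields.setdefault(current, []).append(match.group(2) or "")
        elif current is not None:
            # Continuation line; blank lines are kept because they mark paragraphs.
            fields[current].append(line)
    return {tag: "\n".join(lines).strip() for tag, lines in fields.items()}


def parse_export(text: str) -> tuple[dict[str, str], pd.DataFrame]:
    """Return ({document ID: full text}, DataFrame of every other field)."""
    full_texts: dict[str, str] = {}
    rows: list[dict[str, str]] = []
    for block in RECORD_SEPARATOR.split(text):
        record = parse_record(block)
        doc_id = record.get(ID_TAG)
        if not doc_id:
            continue  # skip preamble text or blocks without a document ID
        full_texts[doc_id] = record.pop(FULL_TEXT_TAG, "")
        rows.append(record)  # the ID column links each row to its full text
    return full_texts, pd.DataFrame(rows)


def write_corpus(full_texts: dict[str, str], directory: str) -> None:
    """Write one UTF-8 file per document, named after its ProQuest document ID."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for doc_id, body in full_texts.items():
        (out / f"{doc_id}.txt").write_text(body + "\n", encoding="utf-8")
```

The directory written by `write_corpus` can then be opened with NLTK's `PlaintextCorpusReader` (in `nltk.corpus.reader`), for example `PlaintextCorpusReader(out_dir, r".*\.txt")`. Its `words()`, `sents()` and `paras()` methods accept a fileid such as `"1001.txt"`. By default, paragraphs are split on blank lines, and `sents()` needs the Punkt sentence-tokenizer data installed through `nltk.download`. Naming each file after its ProQuest document ID keeps the link to the matching DataFrame row.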

Data Mining

Data Processing

Machine Learning (ML)

Python

Software Architecture

Project ID: 19149534

About the project

3 proposals

Remote project

Active 5 years ago

Awarded to:

@kojecka

Hello. I have extensive experience in data processing, including parsing text files like this one. I'm very confident I can deliver exactly what you need. Looking forward to working with you.

$25 USD in 1 day

4.8

(1 review)

0.8

3 freelancers bid an average of $38 USD for this job

@BWebscraper

Hi, I will be able to deliver this task within the deadline. Please let me know if you need help. Thanks, Bharath.k

$55 USD in 1 day

0.0

(0 reviews)

0.0

@mohitgarg67

Hey, I can help you extract words and phrases from the .txt file based on your conditions, which can then be processed further by the NLTK word tokenizer. Contact me for further discussion.

$35 USD in 1 day