Create NLTK corpus reader customized for structured .txt file from Proquest
$10-30 USD
Completed
Posted about 5 years ago
Paid on delivery
I need a customized corpus reader that will take a plain .txt file that I have downloaded from ProQuest and process it so that I can use it in further analysis. The .txt file is structured in a consistent way, with standardized tags identifying the data fields. These tags are words or phrases that start with a capital letter and end with ': '. I need two types of data extracted from the plain .txt file. First, I need the 'Full text: ' field (i.e., everything that comes after this tag and before the next tag, which is 'Subject: ') extracted together with a unique identifier (this could be the existing unique identifier in the file, which is indicated by the tag 'ProQuest document ID: ') so that it can be parsed at the word, phrase, sentence and paragraph level by NLTK. Second, I need all of the fields EXCEPT the 'Full text: ' field extracted and placed into a Pandas dataframe.
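To make the request concrete, here is a minimal sketch of the splitting step in plain Python. It is not a finished NLTK corpus reader. It assumes that records are separated by a line of underscores and that every tag begins a line; neither detail is stated above, so both are assumptions. The names `parse_record` and `split_export` are hypothetical.

```python
import re

# Assumption: a tag begins a line, starts with a capital letter, and ends with ": ".
TAG_RE = re.compile(r"^([A-Z][A-Za-z ]*): (.*)$")
# Assumption: records are separated by a line made of underscores.
SEPARATOR_RE = re.compile(r"^_{5,}[ \t]*$", re.MULTILINE)


def parse_record(block):
    """Turn the text of one record into a {tag: value} dict."""
    fields = {}
    current = None
    for line in block.splitlines():
        match = TAG_RE.match(line)
        # Inside "Full text", only "Subject: " closes the field, so a body
        # line such as "Note: ..." is not mistaken for a new tag.
        if match and current == "Full text" and match.group(1) != "Subject":
            match = None
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Continuation line: append it to the field currently open.
            fields[current] += "\n" + line
    return {tag: value.strip() for tag, value in fields.items()}


def split_export(raw):
    """Return (texts keyed by document ID, list of metadata dicts)."""
    texts = {}
    metadata = []
    for block in SEPARATOR_RE.split(raw):
        fields = parse_record(block)
        doc_id = fields.get("ProQuest document ID")
        if doc_id is None:
            continue  # skip empty blocks or blocks without an identifier
        texts[doc_id] = fields.pop("Full text", "")
        metadata.append(fields)  # every field except "Full text"
    return texts, metadata
```

From there, `pandas.DataFrame(metadata)` should give one row per document, and each entry of `texts` can be passed to NLTK tokenizers such as `nltk.sent_tokenize` or `nltk.word_tokenize`. Multi-line fields other than 'Full text: ' could still be split wrongly if one of their lines happens to look like a tag.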
Hello.
I have extensive experience in data processing, including parsing text files like this one. I'm very confident I can deliver exactly what you need.
Looking forward to working with you.
$25 USD in 1 day
4.8 (1 review)
0.8
0.8
3 freelancers are bidding an average of $38 USD for this job
Hey, I can help you extract words and phrases from the .txt file based on your conditions, which can then be further processed by the NLTK word tokenizer.
Contact me for further discussion.