Closed

Fine-tuning the pre-trained XLSR-Wav2Vec 2.0 model for Turkish and Hungarian

[login to view URL]%97_Transformers.ipynb#scrollTo=LBSYoWbi-45k

This script can be used for Turkish, but it needs a few changes and additional visualizations, and it should be possible to upload the model output and the script to my Drive.

facebook/wav2vec2-large-xlsr-53 will be the pre-trained model.

• Mozilla Common Voice dataset should be used to train the models

• The models must be trained using the wav2vec2 architecture [login to view URL]

Fine-tuning 2 pre-trained models is enough:

o wav2vec2-xlsr-53 ([login to view URL])

o wav2vec2-xls-r-300m ([login to view URL])

3) Please pay extra attention to this subsection:

You should follow this script:

[login to view URL]

This script covers dataset download and model training in detail.

The dataset is loaded in this part of the script:

3.1) Here, instead of the “common_voice” dataset, you should write “mozilla-foundation/common_voice_9_0” or another version (7 or 8).
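A minimal sketch of the change described in 3.1, assuming the script loads data with the Hugging Face `datasets` library. The helper names `dataset_id` and `load_common_voice` are hypothetical, and `"tr"` / `"hu"` are assumed to be the language configs for Turkish and Hungarian. The Common Voice repositories on the Hub are gated, so this also assumes you have accepted the terms and are logged in; depending on the `datasets` version, extra arguments may be needed.

```python
def dataset_id(version: int = 9) -> str:
    """Build the Hub dataset name for a Common Voice release (7, 8 or 9)."""
    if version not in (7, 8, 9):
        raise ValueError("this sketch only covers Common Voice 7, 8 and 9")
    return f"mozilla-foundation/common_voice_{version}_0"


def load_common_voice(lang: str, version: int = 9):
    """Load the train+validation and test splits for one language code."""
    # Imported lazily so dataset_id() works without `datasets` installed.
    from datasets import load_dataset

    name = dataset_id(version)
    train = load_dataset(name, lang, split="train+validation")
    test = load_dataset(name, lang, split="test")
    return train, test
```

For example, `load_common_voice("tr")` would replace the original `"common_voice"` call for Turkish; the cleaning steps that follow stay as they are in the script.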

All other cleaning and pre-processing steps should be the same as in script.

3.2) Here in the script you can define the pre-trained model that you want to fine-tune.

In the picture above, the pre-trained model “facebook/wav2vec2-large-xlsr-53” is given.
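As a rough sketch of step 3.2, switching between the two checkpoints could look like the following. It assumes the `transformers` library and a `processor` object built earlier in the script; the `CHECKPOINTS` mapping and `load_model_for_ctc` are hypothetical names, and `facebook/wav2vec2-xls-r-300m` is the assumed Hub ID for the second model. Only a few of the `from_pretrained` arguments are shown; dropout and masking settings should follow the script.

```python
# Hub IDs of the two checkpoints to fine-tune.
CHECKPOINTS = {
    "xlsr-53": "facebook/wav2vec2-large-xlsr-53",
    "xls-r-300m": "facebook/wav2vec2-xls-r-300m",  # assumed Hub ID
}


def load_model_for_ctc(key: str, processor):
    """Load one checkpoint with a CTC head sized to the tokenizer vocabulary."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained(
        CHECKPOINTS[key],
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer),
    )
    # The convolutional feature encoder is usually kept frozen during fine-tuning.
    model.freeze_feature_encoder()
    return model
```

Running the same training loop once per key in `CHECKPOINTS` keeps the two experiments comparable.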

3.3) After you finish the training, the last thing you need to do is boost the final models with an n-gram language model (either 4-gram or 5-gram). Here is the script for it:

[login to view URL]

This script is intended for Swedish. For Turkish, you can use the Turkish Wikipedia dump. You can find the link below:

[login to view URL]

You will follow the given script, but you need to use the Turkish data given above. This is the part you need to change:

Alternatively, you can generate the .arpa file directly with this extractor:

[login to view URL]
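Whichever route is used to build the .arpa file, the Turkish text should be normalized the same way as the training transcripts. The sketch below shows one way to do that; the allowed character set and the function names are assumptions and should be matched to the vocabulary the fine-tuned tokenizer actually uses. Note that plain `str.lower()` mishandles the Turkish dotted and dotless I, so those two letters are mapped explicitly first.

```python
import re

# Assumed character set: lowercase Latin letters, Turkish letters, apostrophe, space.
_NOT_ALLOWED = re.compile(r"[^a-zçğıöşü' ]+")


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def normalize_line(text: str) -> str:
    """Lowercase, replace disallowed characters with spaces, collapse whitespace."""
    text = turkish_lower(text)
    text = _NOT_ALLOWED.sub(" ", text)
    return " ".join(text.split())


def normalize_corpus(lines):
    """Yield one cleaned sentence per line, skipping lines that end up empty."""
    for line in lines:
        cleaned = normalize_line(line)
        if cleaned:
            yield cleaned


# The cleaned file can then be passed to an n-gram toolkit. With KenLM
# (an assumption, the brief does not name the toolkit) a 5-gram model is built with:
#   lmplz -o 5 < corpus.txt > 5gram.arpa
```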

To sum up, you need to run the given Colab script and boost the final models with an n-gram language model.
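For orientation, attaching a finished .arpa file to a fine-tuned model could be sketched as follows. This assumes the `pyctcdecode` and `transformers` packages and an existing `processor`; `labels_from_vocab` and `build_processor_with_lm` are hypothetical helper names, and the actual script linked above may differ in details such as lowercasing the labels.

```python
def labels_from_vocab(vocab: dict) -> list:
    """Return tokenizer tokens ordered by token id, as the CTC decoder expects."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def build_processor_with_lm(processor, arpa_path: str):
    """Wrap an existing processor with an n-gram boosted CTC decoder."""
    # Imported lazily: both packages are optional, heavy dependencies.
    from pyctcdecode import build_ctcdecoder
    from transformers import Wav2Vec2ProcessorWithLM

    labels = labels_from_vocab(processor.tokenizer.get_vocab())
    decoder = build_ctcdecoder(labels=labels, kenlm_model_path=arpa_path)
    return Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
```

The returned processor decodes model logits with beam search plus the language model, so the same test set can be scored with and without the n-gram boost.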

That covers the experiments.

4) At the end, you need to write up the results of the trained models and compare them against each other using charts, graphs, or tables.

The models should be evaluated on 4 metrics:

word error rate (WER)

character error rate (CER)

real-time factor (RTF) = time needed to recognize the full test set / total audio length of the full test set

memory requirement = peak GPU memory load (during testing)
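The four metrics can be sketched in plain Python as below. This is an illustrative version, not necessarily what the script uses: WER and CER are computed at corpus level with a Levenshtein distance, `recognize` stands for any hypothetical callable that transcribes one clip, the 16 kHz sampling rate is an assumption, and the peak-memory helper assumes PyTorch on a CUDA device.

```python
import time


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(refs, hyps) -> float:
    """Corpus-level WER: total word edits / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def cer(refs, hyps) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def real_time_factor(recognize, clips, sampling_rate: int = 16_000):
    """Return (RTF, hypotheses), where RTF = recognition time / audio duration."""
    audio_seconds = sum(len(clip) for clip in clips) / sampling_rate
    start = time.perf_counter()
    hyps = [recognize(clip) for clip in clips]
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds, hyps


def peak_gpu_memory_mb() -> float:
    """Peak GPU memory allocated by PyTorch since the last reset, in MiB."""
    import torch  # lazy import; requires a CUDA-enabled PyTorch install

    # Call torch.cuda.reset_peak_memory_stats() right before the test run.
    return torch.cuda.max_memory_allocated() / 2**20
```

An RTF below 1 means the test set is recognized faster than real time. Measuring all four numbers in the same test run keeps them comparable across models.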

Additionally, compare the final Turkish models with the Hungarian models (minimum 2 comparative graphs). You don't need to train the model for Hungarian. I provide already trained ones below:

[login to view URL]

[login to view URL]

Check this for the dataset for the Hungarian n-gram model (it also includes a helpful script).
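One possible way to produce the Turkish vs. Hungarian comparison graphs is a grouped bar chart per metric. The sketch below assumes `matplotlib` is available and that results are collected in a dictionary keyed by `(language, model)`; that layout and both function names are hypothetical, and the values must come from your own evaluation runs.

```python
def metric_series(results: dict, metric: str):
    """Turn {(language, model): {metric: value}} into per-language bar heights."""
    models = sorted({model for _, model in results})
    languages = sorted({lang for lang, _ in results})
    heights = {
        lang: [results[(lang, model)][metric] for model in models]
        for lang in languages
    }
    return models, heights


def plot_comparison(results: dict, metric: str, out_path: str) -> None:
    """Save a grouped bar chart: one group per model, one bar per language."""
    import matplotlib.pyplot as plt  # lazy import; optional dependency

    models, heights = metric_series(results, metric)
    width = 0.8 / len(heights)
    fig, ax = plt.subplots()
    for k, (lang, values) in enumerate(heights.items()):
        positions = [i + k * width for i in range(len(models))]
        ax.bar(positions, values, width=width, label=lang)
    # Center each tick under its group of bars.
    offset = width * (len(heights) - 1) / 2
    ax.set_xticks([i + offset for i in range(len(models))])
    ax.set_xticklabels(models)
    ax.set_ylabel(metric)
    ax.legend()
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

Calling `plot_comparison` once for WER and once for CER would already give the two comparative graphs asked for; RTF and peak memory can be plotted the same way.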

Skills: Python, Deep Learning, Machine Learning (ML), Data Visualization, Data Processing

About the client:
(1 review) Budapest, Hungary

Project ID: #33751920

7 freelancers are bidding an average of $168 for this job


I have done similar projects. Please send me a message right away and let's get started. I'm a senior engineer with rich experience in Python, Data Processing, Machine Learning (ML), Data Visualization, Deep Learn...

$160 USD in 5 days
(7 reviews)

I am familiar with the wav2vec2 model and its applications. I'm really interested in the project and I am bidding the lowest amount to start working on it. Kindly start the chat to discuss the project.

$140 USD in 7 days
(2 reviews)

Hi @hmdv002. I read your document and looked at all the links. I have experience with this kind of project. I am a senior Python programmer with 5+ years of extensive experience. You can read my reviews to check my work. I read your job...

$200 USD in 7 days
(2 reviews)
(1 review)

Hi, I am a very talented software programmer with 13+ years of development experience (5+ years professional work experience). I am a results-oriented professional and possess experience using cutting-edge development...

$140 USD in 3 days
(4 reviews)

Hi, I have been an academic at a top-ranked engineering university since 2013. Currently, I am on a sabbatical, residing in the UK, as a stay-at-home dad. I have adequate knowledge of the breadth of ML algorithms with a...

$200 USD in 7 days
(1 review)
(1 review)