Web Scraping from ~20 websites: retrieve simple information from catalogs of biological data

  • Status: Closed
  • Prize: $300
  • Entries received: 4
  • Winner: rkfcccccc

Contest Brief

Dear all,

I am looking for a specialist in data scraping/Python who can help me retrieve information from a few websites, using either a list of keywords or the uniform structure of these websites' pages.

In brief, there are ~20 websites that contain catalogs of germs and their optimal growth temperatures. I need to extract the names of these germs and the values of their optimal growth temperatures, then merge these data into one .csv file, remove duplicates, and make a few minor changes to the file so that it is ready for others to use. This way, we can create the largest unified list of germs with experimentally determined growth conditions (instead of having 20 different and difficult-to-use websites). The rest are details.
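As a rough illustration of the merge-and-deduplicate step, here is a minimal Python sketch. It assumes every per-website file already uses the two columns "Organism" and "Temperature (°C)", and that two rows count as duplicates when the organism name (ignoring case and extra spaces) and the temperature both match. The function names `merge_catalogs` and `to_csv` are hypothetical, chosen only for this example.

```python
import csv
import io


def merge_catalogs(csv_texts):
    """Merge several two-column CSV texts and drop duplicate rows.

    Assumption: each text has the header "Organism,Temperature (°C)".
    """
    seen = set()
    merged = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            # Collapse repeated whitespace so "A  flavus" equals "A flavus".
            organism = " ".join(row["Organism"].split())
            temperature = row["Temperature (°C)"].strip()
            key = (organism.lower(), temperature)
            if key not in seen:
                seen.add(key)
                merged.append((organism, temperature))
    return merged


def to_csv(rows):
    """Serialize merged rows back into one CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Organism", "Temperature (°C)"])
    writer.writerows(rows)
    return out.getvalue()
```

Keeping the first occurrence of each (name, temperature) pair preserves the order in which the websites were processed; whether rows with the same germ but different temperatures should be kept apart is a decision for the project itself.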

How our work will be organized:

Step 1
I use this contest to choose a person who can do the work. To win this contest, you need to retrieve the names of germs and their growth temperatures from these two databases and show me a screenshot (or any other evidence) of the output files:

Database A:

You will see a table of germs (Trametes versicolor, Aspergillus brasiliensis, etc.) in which each germ has a link to its unique web page. Here is the page that describes the first germ in the list, Trametes versicolor: https://catalog.bcrc.firdi.org.tw/BcrcContent?bid=36525
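Collecting the per-germ links from such a table could look roughly like the Python sketch below. It uses only the standard library and assumes the table's links are ordinary `<a href="...">` elements whose address contains `BcrcContent?bid=`, as in the example URL above; the real markup of the catalog page has not been checked, and the names `DetailLinkParser` and `collect_detail_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DetailLinkParser(HTMLParser):
    """Collect links whose href contains a given marker string."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Turn relative links into absolute ones and skip repeats.
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def collect_detail_links(html, base_url, marker="BcrcContent?bid="):
    """Return the absolute URLs of all per-germ pages found in `html`."""
    parser = DetailLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links
```

Each returned URL can then be downloaded and passed to whatever routine extracts the name and temperature from a single page.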

Each page of this database is organized uniformly, with the germ’s name following the word “Organism:” and the optimal growth temperature value following the words “Growth conditions:”.

The expected output file describing this database will be a simple .csv file with two columns:

Organism,Temperature (°C)
Trametes versicolor,26
Aspergillus brasiliensis,24
Aspergillus flavus,24
and so on.
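Because every Database A page places the name after “Organism:” and the temperature after “Growth conditions:”, one possible way to pull both values out of the page text is a pair of regular expressions. This is only a sketch: it assumes the page has already been downloaded and reduced to plain text, and that the temperature is written as a number followed by "°C" somewhere after the label. The function name `extract_record_a` is hypothetical.

```python
import re

# The name is taken as the rest of the line after "Organism:".
ORGANISM_RE = re.compile(r"Organism:\s*(.+)")
# Assumption: the temperature is the first number followed by "°C"
# that appears after "Growth conditions:".
TEMPERATURE_RE = re.compile(
    r"Growth conditions:.*?(\d+(?:\.\d+)?)\s*°\s*C", re.DOTALL
)


def extract_record_a(page_text):
    """Return (organism, temperature) or None if a field is missing."""
    organism = ORGANISM_RE.search(page_text)
    temperature = TEMPERATURE_RE.search(page_text)
    if organism is None or temperature is None:
        return None
    return organism.group(1).strip(), temperature.group(1)
```

Returning `None` for pages that lack either field makes it easy to log the pages that need a manual check instead of writing incomplete rows to the .csv file.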

Database B:

Here you can see a full catalog where you need to click a certain letter or number, then pick a “microorganism” (= germ) to get to a page that contains the germ’s name (the first two words on the page) and its optimal growth temperature (the last value on the page).

For instance, go to the Alphabetical List of Species,
then pick the letter “A”,
and then pick the first germ in the list;
this brings you to the page that contains the germ’s name, “Granulicatella adiacens”, and its growth temperature, “37 °C”.

As in the previous case, the output file for this database will be a simple .csv file with two columns:

Organism,Temperature (°C)
Granulicatella adiacens,37
Abiotrophia defectiva,37
etc.
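For Database B, the rule described above (name = first two words on the page, temperature = last value on the page) can be sketched in Python as follows. The sketch assumes the page has already been converted to plain text and that "last value" means the last number appearing in that text; the function name `extract_record_b` is hypothetical.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_record_b(page_text):
    """Return (organism, temperature) from the plain text of one page.

    Assumptions: the first two whitespace-separated words are the germ's
    name, and the last number in the text is the growth temperature.
    """
    words = page_text.split()
    numbers = NUMBER_RE.findall(page_text)
    if len(words) < 2 or not numbers:
        return None
    return " ".join(words[:2]), numbers[-1]
```

If a page ends with footer text that contains digits (a year, a phone number), the last number will not be the temperature, so the text passed in should be limited to the main content of the page.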

Step 2
When I see that you can deliver, we will have a brief Zoom/Skype meeting to discuss the rest of the project. I will award you the project, and we will work for a couple of days to finish everything up before I release the payment and we call it a deal.

Attached is the list of the remaining databases, so that you understand the scale and the complexity of the project. I am expecting to work together with you, brainstorming, suggesting shortcuts and solutions and accepting that some websites will be too hard to tackle.

Employer Feedback

“Egor is exceptional and highly prolific! Pleasure working with him and will definitely consider for my future projects.”

sergeyvmelnikov, United States

Public Clarification Board

  • ashikmohann
    • 2 months ago

    hi, is the contest closed?

  • bsharp101
    • 2 months ago

    Are entries still accepted? Thanks in advance

    1. sergeyvmelnikov
      Contest Holder
      • 2 months ago

      I will clarify this point by today's 12 pm UK time as I am still figuring out if I can proceed with entry #3.

    2. bsharp101
      • 2 months ago

      Ok, thanks for clarifying and waiting for your confirmation
