Call for contributions to build a benchmark Q&A dataset for evaluating LLM performance in environment and sustainability

Large Language Models (LLMs) are developing rapidly, and they have many prospective applications in the environment and sustainability fields. However, there is no standard benchmark for testing how well different LLMs and LLM applications perform in these fields.

We, the TianGong Team at Tsinghua University, are excited to invite you to work with us to put together a benchmark Q&A dataset to evaluate the applications of LLMs in the environment and sustainability fields. Your contributions would be greatly appreciated as we work together to improve AI’s role in environmental management.

Example of a Q&A pair:

  • Q: If the primary pollutant in Beijing's air on a given day is PM2.5, with a 24-hour average concentration of 100 μg/m³, how would you calculate the Air Quality Index (AQI) for that day?
  • A: According to the Technical Regulation on Ambient Air Quality Index (HJ 633-2012) issued by China's Ministry of Environmental Protection, we can calculate the Air Quality Index (AQI) from the 24-hour average concentration of PM2.5. First, we identify the PM2.5 concentration range. According to the regulation, 100 μg/m³ falls within the third-level range of 75-115 μg/m³. The formula for calculating the Individual Air Quality Index (IAQI) for PM2.5 is: IAQI = (IAQIH - IAQIL) / (BPH - BPL) × (C - BPL) + IAQIL, where C is the measured concentration, BPH and BPL are the upper and lower limits of the concentration range that contains C, and IAQIH and IAQIL are the index values corresponding to BPH and BPL.
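
For contributors who want to check a numerical answer of this kind, the interpolation formula in the example can be sketched in a few lines of Python. This is only an illustration, not part of the template: the function name `pm25_iaqi` is hypothetical, and the breakpoint table is our assumption of the 24-hour PM2.5 values in HJ 633-2012 (only the 75-115 μg/m³ range is quoted in the example above), so please verify it against the regulation before relying on it. The sketch returns the unrounded value and does not apply any rounding convention.

```python
# (BP, IAQI) breakpoint pairs for 24-hour average PM2.5 in μg/m³.
# ASSUMPTION: values recalled from HJ 633-2012; verify against the regulation.
PM25_24H_BREAKPOINTS = [
    (0, 0), (35, 50), (75, 100), (115, 150),
    (150, 200), (250, 300), (350, 400), (500, 500),
]


def pm25_iaqi(concentration):
    """Return the unrounded IAQI for a 24-hour average PM2.5 concentration.

    Implements IAQI = (IAQIH - IAQIL) / (BPH - BPL) * (C - BPL) + IAQIL.
    """
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # Walk through adjacent breakpoint pairs to find the range containing C.
    segments = zip(PM25_24H_BREAKPOINTS, PM25_24H_BREAKPOINTS[1:])
    for (bp_lo, iaqi_lo), (bp_hi, iaqi_hi) in segments:
        if bp_lo <= concentration <= bp_hi:
            slope = (iaqi_hi - iaqi_lo) / (bp_hi - bp_lo)
            return slope * (concentration - bp_lo) + iaqi_lo
    raise ValueError("concentration exceeds the highest breakpoint")


# The example question: C = 100 μg/m³ lies in the 75-115 range.
print(pm25_iaqi(100))  # 131.25
```

Under these assumed breakpoints, the example concentration of 100 μg/m³ gives (150 - 100) / (115 - 75) × (100 - 75) + 100 = 131.25 before rounding.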

You can help in this endeavor by following three simple steps:

  1. Download the “Q&A Template and Instructions.docx”;
  2. Fill out the Q&A pairs and save your document;
  3. Click the link to submit your responses: Submit Your Q&A Pairs

Thank you for your support!