
AI large model training data package (Advanced Mathematics 1986G) (This series includes 4 sets of data packages to choose from)
HK$49,999.00 - HK$149,999.00
HK$299,999.00
Payment and Currency Settlement:
Accepted payment methods: VISA, Alipay, etc.
Other payment methods available upon contacting customer service.
Settlement currency: Hong Kong Dollars (HKD).
Automatic currency conversion to local currency at current exchange rates.
Delivery and Service:
All listed products are in stock.
Automatic delivery via email upon successful payment.
Detailed service and after-sales policies available in our Terms of Service and Privacy Policy.
AI Large Model Training Data Pack (Advanced Mathematics 1986G) Introduction:
- On March 22, 2025, the latest advanced mathematics data package (1986G) was released. Its overlap with the mathematics data package (516G) does not exceed 1%.
- The data consists of questions, answers, materials, etc. at the university level or above, obtained from the relevant information channels of world-renowned universities. Solutions are generated with large-model techniques, using a mixture of text reasoning and code blocks executed by the Python interpreter.
- The universities covered in the data package include Yale University, Harvard University, Oxford University, New York University, University of Chicago, Cambridge University, and others. The content includes teaching problem-solving data, teaching materials, etc. The data set is divided into training and validation subsets that we use in the ablation experiment.
- The LLM large model training data package (Advanced Mathematics class) contains the following fields:
Question: Advanced math questions from prestigious schools and other sources.
generated_solution: The solution generated using a mix of textual reasoning and code blocks.
expected_answer: The true answer provided in the original dataset.
predict_answer: The answer predicted by the Mixtral model in the corresponding solution (from which \boxed{} is extracted).
error_message: <not_executed> if the code was not used. Otherwise empty or contains a Python exception from the corresponding code block. The string timeout indicates that the code block took more than 10 seconds to execute. In the current dataset version, we always stop generating after any error or timeout.
is_correct: Whether our scoring script considers the final answer correct.
Dataset: neuronicx_math_high or neuronicxLLM-math_high.
generation_type: without_reference_solution or masked_reference_solution.
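As an illustration of how the fields above might be consumed, here is a minimal Python sketch. It assumes each processed record is a JSON object whose keys are spelled exactly as in the field list, and that a timeout is recorded as the exact string timeout. The sample record, the helper names (extract_boxed, execution_status, keep_for_training) and the filtering rule are hypothetical and are not taken from the data package user manual.

```python
import json
from typing import Optional

# Hypothetical sample record: the values are illustrative only.
SAMPLE_LINE = json.dumps({
    "Question": "What is 2 + 3?",
    "generated_solution": "Adding the two numbers gives \\boxed{5}.",
    "expected_answer": "5",
    "predict_answer": "5",
    "error_message": "<not_executed>",
    "is_correct": True,
    "Dataset": "neuronicx_math_high",
    "generation_type": "without_reference_solution",
})


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None."""
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def execution_status(error_message: str) -> str:
    """Classify error_message following the field description (assumed exact strings)."""
    msg = error_message.strip()
    if msg == "<not_executed>":
        return "not_executed"
    if msg == "":
        return "ok"
    if msg == "timeout":
        return "timeout"
    return "exception"


def keep_for_training(record: dict) -> bool:
    """One possible filter: keep correct solutions whose code did not fail."""
    status = execution_status(record["error_message"])
    return bool(record["is_correct"]) and status in ("ok", "not_executed")


record = json.loads(SAMPLE_LINE)
print(extract_boxed(record["generated_solution"]), keep_for_training(record))
```

Whether to keep or drop solutions with a non-empty error_message is a design choice that depends on the training objective; the filter here is only one option.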
Other supplements: Due to the large amount of data and the numerous formats, in addition to organizing the data into the above format suitable for LLM training, we have provided additional data explanations and supplements for more complex questions. Some of the data that need to be supplemented will have relevant data features added (such as adding more fields and formats).
Original data set: Because some mathematical symbols are too complex, such data is generally used for training directly in its original form, so no conversion to JSON format is performed.
(LLM training format example: JSON format data example)
- The data package contains original data, processed data, a data package user manual, etc., and can be used directly for large model training. The total volume is about 2000G and contains about 100 million high-level mathematical data items, most of which are in formats such as documents, text, JSON, LaTeX, pictures, and videos.
- Due to the large amount of data, it is divided into 4 data packages. Each package is 49,000 Hong Kong dollars and contains about 30 million advanced mathematical data items. Buy all 4 packages together to enjoy a 31% discount.
Release date: March 22, 2025 (This data package is updated with additional data every 3 months. Users who have purchased it can get the latest data for free via the download link.)
(LLM large model raw data example: LaTeX format example)
(Example of the effect of using the LLM data package after training)
When purchasing multiple data packages on the official website, you can use the following discount codes to get discounts.
- 10% discount code: LLM10 (use when purchasing 2 Chegg data packages to get a 10% discount)
- 20% discount code: LLM20 (use when purchasing 4 Chegg data packages to get a 20% discount)
- 30% discount code: LLM30 (use when purchasing 6 Chegg data packages to get a 30% discount)
- 40% discount code: LLM40 (use when purchasing 8 Chegg data packages to get a 40% discount)
- 50% discount code: LLM50 (use when purchasing 10 Chegg data packages to get a 50% discount)
Note: If a self-service order placed on the official website is for a large amount, online payment may not go through. In that case, contact customer service to arrange a payment method for large amounts.