LLM Training Data (Macau Comprehensive Social Data, 229 GB)
HK$19,999.00
This platform accepts a variety of payment methods, including VISA, PayPal, Alipay, and WeChat Pay. (If the Alipay option is not displayed on the payment page, please refresh the page or reopen it.)
This store uses Hong Kong dollars (HKD) as the settlement currency. When users pay with a tool such as Alipay, the system automatically converts the HKD amount to RMB at the current exchange rate.
Delivery and Service:
All products available for ordering are in stock. After successful payment, the system automatically delivers the product to your email address.
For more information about our services and after-sales policies, please refer to our Terms of Service and Privacy Policy.
Macau Comprehensive Social Dataset (1850–2024)
Overview:
The dataset covers a broad range of Macau society, including local news, notable industry figures, current social systems and laws, academic research, culture and the humanities, and financial center data. It spans 1850 to 2024 and is suitable for training large language models (LLMs) and other AI algorithms. It supports a variety of natural language processing tasks, such as text generation, knowledge question answering, and sentiment analysis.
Data Format:
- Text files: Data is provided in .txt, .csv, and .json formats, covering both structured and unstructured text, for easy import into LLM training frameworks.
- Metadata: Detailed metadata, such as source, time, and author, is provided in .csv and .json formats.
- Annotated data: Some subsets include pre-made annotations, such as entity recognition and text classification labels, in .json or .xml format.
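As an illustration of how the three text formats might be read into a common structure before training, here is a minimal Python sketch. The function name `load_records`, the UTF-8 encoding, and the record layout are assumptions made for this example; the listing does not specify the actual file schema.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load one file into a list of dicts, dispatching on its extension.

    Hypothetical helper: the real schema of the data package is not documented here.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Unstructured text: treat the whole file as a single document.
        return [{"text": path.read_text(encoding="utf-8")}]
    if suffix == ".csv":
        # Structured text: one record per row, keyed by the header row.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # Accept either a top-level list of records or a single object.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {suffix}")
```

Calling `load_records` on each file in the package would give a uniform list of dictionaries that can then be passed to whatever tokenization step the chosen training framework expects.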
Data collection and source:
The datasets are sourced from various authoritative resources in Macau, including:
- News Archives: A collection of local Macau newspapers and news reports from 1850 to 2024, covering major events in the political, social and economic fields.
- Industry Figures: Biographical data on notable people in Macau from all walks of life, including important figures in finance, culture, and politics.
- Legal Documents: Current laws and regulations, government announcements, and social systems in Macau, providing data support for legal and social research.
- Academic Literature: Academic papers and research results from Macau, covering multiple disciplines.
- Cultural and Humanities Data: Macau’s cultural heritage, art reviews, and social changes, reflecting Macau’s distinctive cultural landscape.
- Financial Data: Data on the Macau financial center, such as economic reports and market indices, providing a foundation for financial research.
Data preprocessing and training methods:
- Preprocessing: The dataset undergoes standardized processing, including text cleaning, deduplication, and sensitive information filtering, to support data quality and compliance.
- Training methods: Optimized for mainstream Transformer-based LLMs, such as GPT-style models. The data package comes with a fine-tuning guide to support specific applications such as chatbots and summary generation.
- Data augmentation: The dataset is augmented through techniques such as text paraphrasing, synonym replacement, and random sentence reordering to increase diversity in model training.
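To make the cleaning, deduplication, and sentence-reordering steps above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the vendor's actual pipeline: the function names are hypothetical, deduplication is exact-match only, and the sentence splitter handles only ASCII `.`, `!`, and `?`, so Chinese-language text would need a different splitter.

```python
import hashlib
import random
import re


def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(texts):
    """Drop empty entries and exact duplicates (after cleaning), keeping first occurrences."""
    seen = set()
    unique = []
    for text in texts:
        cleaned = clean_text(text)
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if cleaned and digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique


def shuffle_sentences(text, seed=None):
    """Return the text with its sentences in a random order (assumed augmentation step)."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", clean_text(text))
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    rng.shuffle(sentences)
    return " ".join(sentences)
```

Passing a fixed `seed` keeps augmented samples reproducible across runs, which helps when comparing fine-tuning experiments.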
Update:
- Data current to 2024: The dataset includes data through 2024, so that trained models can reflect recent social, legal, and economic developments in Macau.
- Continuous update support: The dataset is updated regularly. Purchasers can subscribe to receive the latest patch packages and keep their data current.
Delivery process:
- Purchase: Users select and purchase data packages on the platform.
- Payment: After completing the payment, the user will receive a download link or data transfer instructions.
- Data delivery: Users can download data to a local storage device to complete data acquisition.
Release date:
September 19, 2024
Update Package:
- Version control: Dataset versions are clearly tracked, and incremental update packages are provided as new data becomes available.
- Update frequency: Twice a year, or a customized update service based on user needs.