LLM training data (Singapore comprehensive social category, 500 GB)
HK$29,999.00
Payment and currency settlement:
This platform accepts a variety of payment methods, including VISA, PayPal, Alipay, and WeChat Pay. (If the Alipay option does not appear on the payment page, refresh or reopen the page.)
This store uses Hong Kong dollars (HKD) as the settlement currency. When you pay with a tool such as Alipay, the system automatically converts the HKD amount to RMB at the current exchange rate.
Delivery and Service:
All orderable products are in stock. After successful payment, the system automatically delivers the product to your email address.
For more information about our services and after-sales policies, please refer to our Terms of Service and Privacy Policy.
Product Name:
Singapore Social Comprehensive Dataset (1850–2024.9)
Overview:
The dataset covers multiple areas of Singaporean society, including local news, profiles of industry figures, social systems and laws, academic research, culture and the humanities, and financial data. It spans 1850 to 2024 and is intended as training data for large language models (LLMs) and AI algorithms, for natural language processing tasks such as text generation, sentiment analysis, and knowledge retrieval.
Data Format:
- Text files: Data is provided as .txt, .csv, and .json files, covering both structured and unstructured text for easy integration into LLM training frameworks.
- Metadata: Contains detailed metadata, such as source, author, time, etc., in .csv and .json formats.
- Annotated data: Some datasets provide pre-annotated text for tasks such as entity recognition and topic classification in .json or .xml format.
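As an illustration of how the three text formats listed above might be read into one common structure, here is a minimal Python sketch. The function name `load_records` and the choice to treat a .txt file as a single document are assumptions made for this example; the actual file layout and field names of the dataset are not specified here.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Return a list of dict records from a .txt, .csv, or .json file.

    Hypothetical helper: the real dataset layout may differ.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumption: one unstructured document per .txt file.
        return [{"text": path.read_text(encoding="utf-8")}]
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header row.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported file type: {suffix}")
```

Returning a list of dicts for every format keeps downstream tokenization or filtering code independent of the file type.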
Data collection and source:
This dataset comes from multiple authoritative resources in Singapore, including:
- News Archives: Contains local news reports from Singapore from 1850 to 2024, covering important social, political, economic and other events.
- Industry Figures: Biographical data on important figures across Singapore's industries, including business, technology, finance, and culture.
- Legal Documents: Current laws, regulations, and government announcements, providing up-to-date social and legal background information.
- Academic Literature: Academic papers collected from universities and research institutions in Singapore, covering multidisciplinary research fields.
- Humanities and Cultural Data: Covers Singapore’s cultural heritage, art criticism, and social change, reflecting its multicultural character.
- Financial Data: Includes Singapore’s financial data, market reports and economic trend analysis, providing valuable data support for financial research.
Data preprocessing and training methods:
- Preprocessing: The data undergoes rigorous cleaning and standardization to ensure data integrity and compliance, and to meet privacy protection and data security standards.
- Training methods: The data is prepared for use with mainstream Transformer-based LLM architectures such as GPT, and fine-tuning guides are provided for specific tasks such as automated text generation and question answering.
- Data augmentation: The dataset is expanded through techniques such as synonym replacement and sentence rearrangement to increase diversity and improve the robustness of model training.
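The two augmentation techniques named above, synonym replacement and sentence rearrangement, can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline used to build the dataset: the `SYNONYMS` table, the function names, and the regex-based sentence splitting are all assumptions for this example.

```python
import random
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"big": ["large"], "quick": ["fast"]}


def replace_synonyms(text, synonyms, rng):
    """Swap each word that has an entry in `synonyms` for a random alternative.

    Lookup is case-insensitive; the replacement is inserted as written in
    the table, so the original capitalization is not preserved.
    """
    def swap(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        return rng.choice(options) if options else word

    return re.sub(r"[A-Za-z]+", swap, text)


def shuffle_sentences(text, rng):
    """Split after '.', '!' or '?' and return the sentences in random order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)


rng = random.Random(0)  # seeded so repeated runs give the same result
augmented = replace_synonyms("A big quick fox.", SYNONYMS, rng)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.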
Update:
- 2024 Update: The dataset contains the latest data from 2024 to ensure that the model can reflect Singapore’s latest social, economic and legal environment.
- Continuous update support: The dataset supports continuous updates to ensure that it keeps pace with the latest developments in Singapore society. Users can obtain the latest update package by subscribing.
Delivery process:
- Purchase: Users can select and purchase datasets on the platform.
- Payment: After completing the payment, the user will receive a download link or data transfer instructions.
- Data delivery: Users download the data to local storage, which completes delivery.
Release date:
September 19, 2024
Update Package:
- Version control: Each release is clearly versioned. Every update carries a version number so that users can identify and obtain the latest data.
- Update frequency: Updates are released twice a year, or more often based on user demand.
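To show how version numbers could be used to check whether an update package is newer than a local copy, here is a minimal Python sketch. The version number format is not stated on this page; the dotted numeric form (for example `2024.2`) and the function names below are assumptions for illustration.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '2024.2' into a tuple of ints.

    Assumption: versions contain only digits separated by dots.
    """
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate, installed):
    """Return True if `candidate` is a later version than `installed`."""
    return parse_version(candidate) > parse_version(installed)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `2024.10` as older than `2024.9`.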