LLM Training Data: Hong Kong Social Comprehensive Dataset (900 GB)
HK$39,999.00
Payment and Currency Settlement:
This platform accepts a variety of payment methods, including VISA, PayPal, Alipay, and WeChat Pay. (If the Alipay option does not appear on the payment page, please refresh or reopen the page.)
This store uses Hong Kong dollars (HKD) as its settlement currency. When you pay with a tool such as Alipay, the system automatically converts the HKD amount into RMB at the current exchange rate.
Delivery and Service:
All products available for order are in stock. After successful payment, the system automatically delivers the goods to your email.
For more information about our services and after-sales policies, please refer to our Terms of Service and Privacy Policy.
Product Name:
Hong Kong Social Comprehensive Dataset (1850–2024)
Overview:
The dataset is a carefully curated collection covering multiple areas of Hong Kong society, including local news, industry figures, the legal system, academia, the humanities, and financial data, spanning more than 170 years (1850–2024). It provides rich resources for training large language models (LLMs) and other AI algorithms, and is suitable for tasks such as text generation, sentiment analysis, and knowledge retrieval.
Data Format:
- Text Files: Structured and unstructured text in .txt, .csv, and .json formats for easy integration into LLM training frameworks.
- Metadata: Publication date, author information, and source details, in .csv and .json formats.
- Annotations: Pre-annotated data for natural language processing tasks, including named-entity recognition and topic classification (in .json or .xml format).
Data Collection and Sources:
This dataset is collected from authoritative sources, including:
- News Archive: Local newspapers and media covering political, social and economic events from 1850 to 2024.
- Industry Figures: Biographical data on key figures in various industries in Hong Kong, including business, finance and politics.
- Legal Documents: The latest Hong Kong laws, regulations and government announcements, providing legal and social background information.
- Academic Collection: Academic articles and research reports from Hong Kong universities and think tanks.
- Humanities and Cultural Data: Humanities texts, art reviews, and social trends reflecting the cultural development of Hong Kong.
- Financial Data: Historical and real-time data from Hong Kong's financial markets, including stock market indices and economic reports.
Data Preprocessing and Training Methods:
- Preprocessing: The data undergoes rigorous cleansing, normalization, and tokenization, and sensitive information is filtered out to comply with privacy regulations.
- Training Methods: Optimized for current LLM architectures, such as Transformer-based GPT models. The dataset includes fine-tuning instructions for specific use cases such as chatbot development, summarization, and sentiment analysis.
- Augmentation Techniques: To improve robustness, the dataset also incorporates augmentation techniques such as paraphrasing, synonym replacement, and sentence rearrangement.
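The listing does not document how its augmentation is performed. As a rough illustration only, the Python sketch below shows what two of the named techniques, synonym replacement and sentence rearrangement, commonly look like. The synonym table, the function names, and the period-based sentence splitting are all hypothetical assumptions, not this dataset's actual pipeline.

```python
import random

# Hypothetical synonym table for illustration only; real pipelines typically
# draw on a thesaurus or embedding lookup, which this listing does not specify.
SYNONYMS = {
    "rose": ["climbed", "increased"],
    "market": ["exchange"],
}


def synonym_replace(sentence: str, rng: random.Random) -> str:
    """Swap each whitespace token that exactly matches a SYNONYMS key
    for a randomly chosen synonym; all other tokens are kept unchanged."""
    words = sentence.split()
    out = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]
    return " ".join(out)


def rearrange_sentences(text: str, rng: random.Random) -> str:
    """Split text on periods (a simplifying assumption), shuffle the
    sentence order, and rejoin with '. ' plus a final period."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    print(synonym_replace("The market rose sharply", rng))
    print(rearrange_sentences("Stocks fell. Bonds rose. Gold was flat.", rng))
```

Seeding a `random.Random` instance keeps augmented outputs reproducible across runs, which matters when augmented text is mixed into a training set.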
Updates:
- 2024 Update: The dataset includes data through 2024, so models trained on it can reflect the latest legal, economic, and social environment in Hong Kong.
- Continuous Update Support: Regular updates keep the dataset in step with Hong Kong's evolving social landscape. Purchasers can obtain updates by subscription or direct download.
Delivery Process:
- Purchase: Select this dataset on the platform.
- Payment: Complete the transaction through the secure checkout process.
- Delivery: Once payment is confirmed, you will receive a download link or data transfer instructions; the delivery method is tailored to your storage device.
Release Date:
September 19, 2024
Update Packages:
- Version Control: The dataset is versioned, and new data is delivered as update packages.
- Update Frequency: Update packages are released every six months, or on request for premium subscribers.