LLM Training Data (Taiwan Social Comprehensive Dataset, 400 GB)
HK$29,999.00
This platform accepts a variety of payment methods, including: VISA, PayPal, Alipay, WeChat Pay, etc. (If the Alipay option is not displayed on the payment page, please refresh or restart the webpage.)
This mall uses Hong Kong dollars (HKD) as the settlement currency. When users use payment tools such as Alipay to pay, the system will automatically convert the Hong Kong dollar amount into RMB according to the current exchange rate for payment.
Delivery and Service:
All products available for order are in stock. After successful payment, the system automatically delivers the product to your email address.
For more information about our services and after-sales policies, please refer to our Terms of Service and Privacy Policy.
Product Name:
Taiwan Social Comprehensive Dataset
Overview:
The dataset covers many areas of Taiwanese society, including local news, industry figures, current laws and social systems, academic research, culture and the humanities, and financial data. It spans the period from 1850 to 2024 and provides training material for large language models (LLMs) and other AI algorithms. It is suited to natural language processing tasks such as text generation, sentiment analysis, and knowledge-based question answering.
Data Format:
- Text files: Provided in .txt, .csv, and .json formats, covering both structured and unstructured text, for straightforward import into LLM training frameworks.
- Metadata: Detailed metadata files (source, time, author, and other information) are provided in .csv and .json formats.
- Annotated data: Some subsets are annotated for tasks such as entity recognition and sentiment analysis, in .json or .xml format.
- Other media: A small number of image, video, and audio files are included.
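As an illustration of how the text formats listed above might be read into a single stream of records, here is a minimal Python sketch. The folder layout, the `load_records` helper, and the `source_file` key are assumptions made for this example; the listing does not document the actual file structure or field names, so adjust them to match the delivered files.

```python
import csv
import json
from pathlib import Path


def load_records(root):
    """Yield one dict per text record found under `root`.

    Hypothetical helper: the directory layout and the keys used here are
    assumptions for this sketch, not a documented part of the dataset.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # Unstructured text: treat each file as one record.
            yield {"source_file": path.name, "text": path.read_text(encoding="utf-8")}
        elif suffix == ".csv":
            # Structured text: one record per row, keyed by the header row.
            with path.open(newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    yield {"source_file": path.name, **row}
        elif suffix == ".json":
            # Assumed to hold either a single object or a list of objects.
            data = json.loads(path.read_text(encoding="utf-8"))
            items = data if isinstance(data, list) else [data]
            for item in items:
                if isinstance(item, dict):
                    yield {"source_file": path.name, **item}
```

Because the function is a generator, records are read one file at a time rather than all at once, which matters for a collection of this size.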
Data Collection and Sources:
This dataset draws on multiple authoritative sources across Taiwanese society, including:
- News Archives: Local news reports from Taiwan from 1850 to 2024, covering major political, social, and economic events.
- Industry Figures: Data on prominent figures across Taiwan’s industries, including leaders in business, finance, culture, and technology.
- Legal Documents: Taiwan’s latest laws, social systems, and government announcements, supporting legal and social science research.
- Academic Literature: Academic papers and research reports from universities and research institutions in Taiwan, covering multiple academic fields.
- Humanities and Cultural Data: Material on Taiwan’s cultural heritage, art criticism, and social change.
- Financial Data: Taiwan’s economic data, market indexes, and financial reports, supporting financial research.
Data Preprocessing and Training Methods:
- Pre-processing: Data is rigorously cleaned and standardized before import to ensure high data quality and compliance with privacy and security regulations.
- Training methods: The data is optimized for training Transformer-based LLM architectures such as GPT, and specific fine-tuning methods are provided to support multiple tasks, including text generation and knowledge extraction.
- Data augmentation: Text augmentation techniques such as synonym replacement, text transformation, and sentence reordering are included to improve the diversity and robustness of model training.
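To make the augmentation techniques named above concrete, here is a minimal Python sketch of synonym replacement and sentence reordering. It is not the vendor's implementation: the `SYNONYMS` table and both function names are hypothetical, and the word matching only handles Latin-script tokens. Chinese text would need a word segmenter, which is outside the scope of this sketch.

```python
import random
import re

# Hypothetical synonym table for illustration; not part of the dataset.
SYNONYMS = {
    "big": ["large"],
    "quick": ["fast", "rapid"],
}


def replace_synonyms(text, synonyms, rng):
    """Swap each word that has an entry in `synonyms` for a random alternative."""
    def swap(match):
        word = match.group(0)
        options = synonyms.get(word.lower())
        return rng.choice(options) if options else word

    return re.sub(r"[A-Za-z]+", swap, text)


def shuffle_sentences(text, rng):
    """Split after sentence-ending punctuation and return the sentences in random order."""
    # Splitting on a possibly empty match requires Python 3.7 or later.
    sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", text) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)
```

Passing a seeded `random.Random` instance as `rng` keeps the augmented output reproducible between runs, which helps when comparing training experiments.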
Update:
- 2024 Update: The dataset contains the latest data from 2024, helping the model reflect the latest social, economic, and legal developments in Taiwan.
- Continuous update support: Regular updates give buyers access to the latest social and legal data, keeping the dataset consistent with recent changes in Taiwanese society.
Delivery Process:
- Purchase: Users can select and purchase the dataset on the platform.
- Payment: After completing the payment, the user will be notified of the download link or data delivery method.
- Data delivery: Users can download the data to local storage devices. The dataset is delivered in full.
Release Date:
September 19, 2024
Update Package:
- Version control: The dataset uses version control and provides regular update packages.
- Update frequency: Updates are released twice a year, or on a custom schedule based on the needs of premium subscribers.