chh-crawler

10 Commits 3 Branches 0 Tags

Author	SHA1	Message	Date
Ching L	6d1fffb63d	feat(crawler): add main category support for better classification - Add CATEGORY_MAPPING dictionary to map sub-categories to main categories - Implement get_main_category function to find parent category - Include main_category field in article data structure - Update toot function to display both main and sub categories intelligently - Avoid duplication when main category is the same as sub category	2025-12-09 10:58:01 +08:00
Ching L	3bbe483c64	feat(crawler): add cloudscraper to bypass Cloudflare protection - Replace requests with cloudscraper for image downloading - Update log file path to use home directory logs - Add timeout parameter for image requests to prevent hanging	2025-12-05 21:07:40 +08:00
Ching L	da1969b103	fix(crawler): replace print statements with logger for better logging. Updated the crawler to use the logger for outputting article information and toot notifications, enhancing the logging mechanism for improved monitoring and debugging.	2025-04-07 11:33:28 +08:00
Ching L	15addaba24	feat(crawler): update crawler to use RSS feed for article retrieval. Replaced HTML scraping with RSS feed parsing to fetch article details including title, URL, author, date, category, content, and image link. This improves reliability and efficiency in gathering articles from the source.	2025-04-07 11:32:00 +08:00
Ching	c5bf60858c	refactor: update short-link retrieval logic	2024-04-17 10:18:17 +08:00
Ching	a4c7f76216	refactor: Add URL shortening functionality	2024-04-16 14:12:58 +08:00
Ching	c0533a4772	feat(crawler): update toot format Signed-off-by: Ching <loooching@gmail.com>	2023-07-17 14:34:03 +08:00
Ching	129df366ed	feat(crawler): add logger and update sending logic Signed-off-by: Ching <loooching@gmail.com>	2023-07-17 11:49:05 +08:00
Ching	5dfbfa5c57	feat(crawler): add chh crawler function and toot-posting function Signed-off-by: Ching <loooching@gmail.com>	2023-07-16 21:20:04 +08:00
ching	a8de1b5643	Initial commit	2023-07-16 18:47:14 +08:00
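
The main-category change in commit 6d1fffb63d could look roughly like the sketch below. Only the names CATEGORY_MAPPING and get_main_category come from the commit message. The mapping contents, the main-to-sub-list layout, the " / " separator, and the format_category helper are assumptions made for illustration and are not taken from the repository.

```python
from typing import Optional

# Hypothetical contents: the real CATEGORY_MAPPING is not shown in the log.
# Assumed layout: main category -> list of its sub-categories.
CATEGORY_MAPPING: dict[str, list[str]] = {
    "Hardware": ["CPU", "GPU", "Motherboard"],
    "Mobile": ["Phone", "Tablet"],
}


def get_main_category(sub_category: str) -> Optional[str]:
    """Return the parent category of sub_category, or None if it is unmapped."""
    for main, subs in CATEGORY_MAPPING.items():
        # A main category counts as its own parent.
        if sub_category == main or sub_category in subs:
            return main
    return None


def format_category(sub_category: str) -> str:
    """Hypothetical helper: build the category label shown in a toot.

    Shows "main / sub" when both are known and differ, and only the
    sub-category when there is no parent or the two are identical.
    """
    main = get_main_category(sub_category)
    if main is None or main == sub_category:
        return sub_category
    return f"{main} / {sub_category}"
```

With this layout, an article tagged "GPU" would be labelled with both levels, while one tagged "Hardware" or an unmapped category would show a single name, which matches the "avoid duplication" point in the commit message.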