Custom Scraper Registration

Register a custom scraper in core/scrapers/__init__.py by importing it and adding it to the customer_scrapers dictionary, for example:

from .mp_scraper import mp_scraper

customer_scrapers = {'mp.weixin.qq.com': mp_scraper}

Note that the key must be the site's domain name (the netloc part of its URL), which can be obtained with urllib.parse:

from urllib.parse import urlparse

parsed_url = urlparse("site's url")
domain = parsed_url.netloc
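
As a rough illustration of how a domain-keyed registry like this can be used, the sketch below looks up a scraper by the netloc of a URL. The names example_scraper and pick_scraper are hypothetical and are not part of wiseflow; the real scraper signature is not shown in this README, so a single url argument is assumed here.

```python
from urllib.parse import urlparse


def example_scraper(url: str) -> str:
    # Hypothetical stand-in for a real scraper; the actual signature
    # expected by wiseflow is not documented here and is assumed.
    return f"scraped {url}"


# Same shape as the registry above: domain name -> scraper callable.
customer_scrapers = {"mp.weixin.qq.com": example_scraper}


def pick_scraper(url: str):
    """Return the scraper registered for the URL's domain, or None."""
    domain = urlparse(url).netloc
    return customer_scrapers.get(domain)


scraper = pick_scraper("https://mp.weixin.qq.com/s/abc123")
if scraper is not None:
    print(scraper("https://mp.weixin.qq.com/s/abc123"))
```

Keep in mind that netloc includes the port (and any user info) when the URL has one, so "https://example.com:8080/x" yields "example.com:8080". If the URLs you register may carry a port, urlparse(url).hostname returns only the host part.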