wiseflow

mirror of https://github.com/TeamWiseFlow/wiseflow.git synced 2025-01-23 10:50:25 +08:00

Custom Scraper Registration

from .mp_scraper import mp_scraper

customer_scrapers = {'mp.weixin.qq.com': mp_scraper}

Note that the key must be the domain name (the URL's netloc), which can be obtained using urllib.parse:

from urllib.parse import urlparse

parsed_url = urlparse("site's url")
domain = parsed_url.netloc
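
As a minimal sketch of how a domain-keyed mapping like the one above might be consulted, the following looks up a scraper by the URL's netloc. The names `example_scraper`, `registry`, and `find_scraper` are hypothetical and used only for illustration; they are not part of wiseflow, and the real scraper signature may differ.

```python
from urllib.parse import urlparse


def example_scraper(url: str) -> str:
    """Hypothetical stand-in for a real scraper function."""
    return f"scraped {url}"


# Hypothetical registry with the same shape as the mapping above:
# the key is the domain name, the value is the scraper callable.
registry = {"mp.weixin.qq.com": example_scraper}


def find_scraper(url: str, scrapers: dict):
    """Return the scraper registered for the URL's domain, or None."""
    domain = urlparse(url).netloc
    return scrapers.get(domain)


url = "https://mp.weixin.qq.com/s/abc123"
scraper = find_scraper(url, registry)
if scraper is not None:
    print(scraper(url))
```

Two details of urlparse are worth keeping in mind when choosing the key: a URL without a scheme (for example "mp.weixin.qq.com/s/abc123") yields an empty netloc, and netloc keeps any port number, so "example.com:8080" and "example.com" are different keys.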