mirror of
https://github.com/TeamWiseFlow/wiseflow.git
synced 2025-01-23 10:50:25 +08:00
Custom Scraper Registration
Register the scraper in core/scrapers/__init__.py, for example:
from .mp_scraper import mp_scraper
customer_scrapers = {'mp.weixin.qq.com': mp_scraper}
Note that the key must be the site's domain name, which can be obtained with urllib.parse:
from urllib.parse import urlparse
parsed_url = urlparse("site's url")
domain = parsed_url.netloc
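
The domain-keyed lookup described above can be sketched as follows. This is a minimal illustration and not the project's actual dispatch code: `example_scraper`, `registry`, and `pick_scraper` are hypothetical names, and a real scraper's signature may differ.

```python
from urllib.parse import urlparse


def example_scraper(url: str) -> str:
    # Hypothetical stand-in for a real scraper; the real signature may differ.
    return f"scraped {url}"


# Hypothetical registry with the same shape as customer_scrapers:
# domain name -> scraper callable.
registry = {"mp.weixin.qq.com": example_scraper}


def pick_scraper(url: str):
    """Return the scraper registered for the URL's domain, or None."""
    domain = urlparse(url).netloc
    return registry.get(domain)


scraper = pick_scraper("https://mp.weixin.qq.com/s/abc123")  # registered domain
missing = pick_scraper("https://example.com/page")           # no entry, so None
```

Keep in mind that `netloc` includes the port when the URL has one (for example `example.com:8080`), so the key has to match exactly what `urlparse` returns for the URLs you expect to process.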