Commit Graph

51 Commits

Author SHA1 Message Date
c469591
c3eef25af3
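Add entry-point py and env for Windows (#202)
* Add Windows-compatible entry-point py and env files

* Rename the entry-point py file to distinguish between operating systems

* Update the Windows entry-point py for V0.3.7; remove personal information from the windows.env file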
2025-01-21 20:38:33 +08:00
bigbrother666sh
3e07d63757 llm wrapper and prompt opz, base url bug, mp scraper opz 2025-01-18 13:47:47 +08:00
bigbrother666sh
dd7d92476e 0.3.7 release 2025-01-17 23:28:22 +08:00
bigbrother666sh
77c3914d12 method to separate links area from content 2025-01-16 10:56:57 +08:00
bigbrother666sh
aa49216acb new deep scraper 2025-01-15 00:33:41 +08:00
bigbrother666sh
3523b126c7 add weixin scrapers 2025-01-14 20:39:28 +08:00
bigbrother666sh
26bf9a573a modify scrapers 2025-01-12 16:22:37 +08:00
Tusik
eec15ba037
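feat: use the openai SDK async client to improve efficiency (#182)
* feat: async tasks for get_more_related_urls

* feat: max LLM concurrent number

* fix(core/agents/get_info.py): lower the default concurrency

* fix(get_info.py): for some models, when response_format is set to json, the JSON format must also be stated explicitly in the prompt

* fix: handle the case where text is too short

* feat: move the concurrency logic into openai_wrapper

* ♻️ refactor(openai_wrapper.py): refactor the async LLM call logic; improve exception handling and logging

- Extract the response into the `resp` variable to avoid duplicated code
- Simplify exception handling so the semaphore is released in the `finally` block
- Move the logging call so debug information is recorded before the result is returned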

* Update openai_wrapper.py

to resolve the error raised by 'logger is None'
(This problem existed in the previous version. It was not caused by your code. I just modified it.)

Signed-off-by: bigbrother666 <96130569+bigbrother666sh@users.noreply.github.com>

---------

Signed-off-by: bigbrother666 <96130569+bigbrother666sh@users.noreply.github.com>
Co-authored-by: bigbrother666 <96130569+bigbrother666sh@users.noreply.github.com>
2025-01-08 09:56:08 +08:00
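
The concurrency pattern described in the commit message above (a cap on concurrent LLM calls, the response kept in `resp`, the semaphore released in `finally`, debug logging before returning) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `fake_llm_call`, `limited_call`, `run_batch`, and the `MAX_CONCURRENCY` value are hypothetical names and values chosen for the example, and the real wrapper would call the SDK's async client instead of the stand-in.

```python
import asyncio
import logging

logger = logging.getLogger("llm_sketch")

# Hypothetical cap; the commit only says the default concurrency was lowered.
MAX_CONCURRENCY = 3


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for an async SDK request (hypothetical, no network access)."""
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError("empty prompt")
    return f"echo: {prompt}"


async def limited_call(semaphore: asyncio.Semaphore, prompt: str) -> str:
    """Run one call under the semaphore; return an empty string on failure."""
    resp = ""
    await semaphore.acquire()
    try:
        resp = await fake_llm_call(prompt)
    except Exception as exc:
        logger.warning("LLM call failed: %s", exc)
    finally:
        # Always release, so a failed call cannot starve later ones.
        semaphore.release()
    # Log once, just before returning, whatever the outcome was.
    logger.debug("result: %s", resp)
    return resp


async def run_batch(prompts: list[str]) -> list[str]:
    """Issue all prompts concurrently, at most MAX_CONCURRENCY at a time."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(limited_call(semaphore, p) for p in prompts))
```

Calling `asyncio.run(run_batch(["a", "", "b"]))` returns the results in input order, with an empty string in place of the failed call.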
bigbrother666sh
1f79cb3c4d v0.3.6fix 2025-01-05 21:54:06 +08:00
bigbrother666sh
35fbff0f27 v0.3.6 release 2025-01-05 18:12:36 +08:00
bigbrother666sh
1f9b6d5d6c v0.3.6 mockup 2025-01-04 23:36:18 +08:00
bigbrother666sh
86cabc4e28 v0.3.6test update 2025-01-04 13:57:12 +08:00
bigbrother666sh
c6d05d0210 update deepscraper for crawl4ai bug 2025-01-03 13:17:24 +08:00
bigbrother666sh
b4da3cc853 v0.3.6test 2025-01-02 22:05:51 +08:00
bigbrother666sh
dc8391c357 deep scraper 2025-01-02 10:14:33 +08:00
bigbrother666sh
ae7b5d7f65 v0.3.6 2024-12-27 14:07:37 +08:00
bigbrother666sh
de6d5cdbb1 test&report 2024-12-24 13:11:17 +08:00
bigbrother666sh
fd9d9f9a4e add test for v0.3.6 2024-12-23 10:12:52 +08:00
bigbrother666sh
1416ab29c8 add test 2024-12-18 22:45:20 +08:00
bigbrother666sh
7752b4b3b4 V0.3.5 2024-12-10 14:18:03 +08:00
bigbrother666sh
cad383b0fe feat:docker file 2024-12-09 18:18:10 +08:00
bigbrother666sh
de549c6334 fix: errors 2024-12-08 21:30:39 +08:00
bigbrother666sh
ec514b49dd refactor(core): the new general crawler 2024-12-08 18:03:34 +08:00
bigbrother666sh
8c64749ba7 feat(core): update pb data sheet structure 2024-12-06 14:14:28 +08:00
bigbrother666sh
3e4454a33b add batch process 2024-12-06 12:16:02 +08:00
bigbrother666sh
b83ca2369a second commit for V0.3.22 2024-12-06 11:42:22 +08:00
bigbrother666sh
f18d9ba084 first commit for V0.3.22 2024-12-05 20:45:39 +08:00
bigbrother666sh
61251547a0 first commit for V0.3.22 2024-12-05 12:11:28 +08:00
bigbrother666sh
2e01ba5ba7 update readme 2024-11-23 15:53:18 +08:00
bigbrother666sh
61e3c042b4 repair openai wrapper 2024-10-10 20:23:58 +08:00
bigbrother666
f87b68e6b1
issues repair (#88)
* issues repair

* improve mp_alblum for #55

* prompt engineering for get info

* update to V0.3.1

* update to V0.3.1
2024-09-03 22:42:29 +08:00
madizm
c309cf7afe
feat: parse WeChat article directories (#55)
* feat: parse WeChat article directories

* fix mp_crawler should return https url
2024-09-02 10:03:14 +08:00
GuanYixuan
a09571d4b3
Fix typo (#28) 2024-08-06 23:11:25 +08:00
bigbrother666
4d0b993bd9 add summary to url-info 2024-08-04 18:26:58 +08:00
bigbrother666
be13ce50e0 fix-support weixin url card info 2024-08-04 17:56:20 +08:00
bigbrother666
a320aca071 support weixin url card info 2024-08-04 12:41:53 +08:00
bigbrother666
7c84fbba60 update mp_crawler 2024-07-30 22:32:48 +08:00
bigbrother666
ee13b7bcdd fix mistake 2024-06-27 09:20:12 +08:00
bigbrother666
4a2ace0e25 fix url-repeat and some img path miss base-url 2024-06-22 16:47:13 +08:00
bigbrother666
c20c4a0a27 little fix 2024-06-21 13:55:25 +08:00
bigbrother666
10cda47778 update README 2024-06-21 10:05:33 +08:00
bigbrother666
6b85b90429 add scripts 2024-06-20 15:01:27 +08:00
bigbrother666
6b358d65e7 docker file add 2024-06-19 20:00:53 +08:00
bigbrother666
82f0041469 scrapers updated 2024-06-19 10:05:10 +08:00
bigbrother666
e8db4fac87 multi-language readme 2024-06-16 20:42:01 +08:00
bigbrother666
06a6ac19e3 0.12 final code 2024-06-16 14:33:21 +08:00
bigbrother666
23b7f76d9e new llm crawler 2024-06-15 23:58:37 +08:00
bigbrother666
b683073fde add start-up script 2024-06-15 20:04:10 +08:00
bigbrother666
31411cd8f4 scrapers updated 2024-06-15 15:41:31 +08:00
bigbrother666
b1dad1533f code review 2024-06-14 09:08:12 +08:00