jack, 3 months ago
parent commit: e2e51537b4
14 changed files with 1031 additions and 189 deletions
  1. .gitignore (+2 −1)
  2. Readme.md (+191 −1)
  3. config.py (+92 −0)
  4. logger.py (+88 −0)
  5. main.py (+184 −82)
  6. performance.py (+81 −0)
  7. realtime_logger.py (+117 −0)
  8. requirements.txt (+25 −22)
  9. start.py (+40 −0)
  10. static/script.js (+129 −24)
  11. step1.py (+28 −20)
  12. step2.py (+25 −22)
  13. templates/index.html (+10 −6)
  14. utils.py (+19 −11)

+ 2 - 1
.gitignore

@@ -63,4 +63,5 @@ target/
 other/split_clash_config/split_config
 ai_news/save_data
 daily/*.txt
-data/
+data/
+eh.tar

+ 191 - 1
Readme.md

@@ -1,4 +1,194 @@
+# EH-Downloader
+
+一个基于 FastAPI 的 E-Hentai 画廊下载工具,支持异步批量下载画廊图片。
+
+## 功能特性
+
+- 🚀 **异步下载**: 基于 asyncio 的高性能异步下载
+- 🌐 **Web界面**: 现代化的 Web 用户界面
+- 🔧 **代理支持**: 支持 HTTP 代理配置
+- 📁 **智能管理**: 自动创建目录结构,按画廊分文件夹存储
+- 🔄 **断点续传**: 支持中断后继续下载
+- 📊 **进度监控**: 实时显示下载进度和状态
+- 🧹 **自动清理**: 一键清理临时文件和日志
+
+## 快速开始
+
+### 环境要求
+
+- Python 3.11+
+- 网络代理(可选,用于访问 E-Hentai)
+
+### 安装依赖
+
+```bash
 pip install -r requirements.txt
+```
+
+### 运行应用
+
+```bash
 python main.py
+```
+
+访问 `http://localhost:8000` 使用 Web 界面。
+
+### Docker 部署
+
+```bash
+# 构建镜像
+docker build -t eh-downloader .
+
+# 运行容器
+docker-compose up -d
+```
+
+## 使用说明
+
+### 1. 配置代理
+
+在项目根目录的 `proxy.txt` 文件中添加代理配置,每行一个:
+
+```
+127.0.0.1:7890
+192.168.1.100:8080
+```
+
+### 2. 添加目标URL
+
+在 `data/targets.txt` 文件中添加要下载的画廊URL,每行一个:
+
+```
+https://e-hentai.org/g/1234567/abcdef123456
+https://e-hentai.org/g/2345678/bcdefg234567
+```
+
+### 3. 开始下载
+
+1. 打开 Web 界面
+2. 选择代理设置
+3. 点击"读取目标URL"加载URL列表
+4. 点击"下载URL"抓取画廊链接
+5. 点击"下载图片"开始下载图片
+
+## 项目结构
+
+```
+ehentai-fastapi/
+├── main.py              # 主应用文件
+├── config.py            # 配置管理
+├── logger.py            # 日志管理
+├── utils.py             # 工具函数
+├── step1.py             # 画廊链接抓取
+├── step2.py             # 图片下载
+├── downloader.py        # 下载器类
+├── templates/           # HTML模板
+├── static/              # 静态资源
+├── data/                # 数据目录
+│   ├── targets.txt      # 目标URL列表
+│   ├── downloads/        # 下载文件存储
+│   └── *.log           # 日志文件
+├── proxy.txt            # 代理配置
+├── requirements.txt     # 依赖列表
+├── Dockerfile           # Docker配置
+└── docker-compose.yaml  # Docker Compose配置
+```
+
+## 配置说明
+
+### 应用配置 (config.py)
+
+- `concurrency`: 并发数,默认20
+- `max_page`: 单专辑最大翻页数,默认100
+- `retry_per_page`: 单页重试次数,默认5
+- `retry_per_image`: 单图重试次数,默认3
+- `timeout`: 请求超时时间,默认10秒
+- `image_timeout`: 图片下载超时时间,默认15秒
+
+### 日志配置
+
+- 日志级别:INFO
+- 日志文件:`data/app.log`, `data/crawl.log`, `data/download.log`
+- 日志格式:`[时间] [级别] 消息`
+
+## API 接口
+
+### GET /
+主页面
+
+### POST /load_urls
+读取目标URL列表
+
+### POST /download_urls
+开始抓取画廊链接
+
+### POST /download_images
+开始下载图片
+
+### POST /check_incomplete
+检查未完成的下载
+
+### POST /clean_files
+清理临时文件
+
+### POST /clear
+清除输出
+
+## 注意事项
+
+1. **网络要求**: 需要稳定的网络连接和合适的代理
+2. **存储空间**: 确保有足够的磁盘空间存储下载的图片
+3. **合规使用**: 请遵守相关法律法规和网站使用条款
+4. **代理配置**: 建议使用稳定的代理服务以确保下载成功率
+
+## 故障排除
+
+### 常见问题
+
+1. **下载失败**: 检查代理配置和网络连接
+2. **文件损坏**: 重新下载或检查存储空间
+3. **权限错误**: 确保应用有读写权限
+4. **内存不足**: 降低并发数或增加系统内存
+
+### 日志查看
+
+```bash
+# 查看应用日志
+tail -f data/app.log
+
+# 查看抓取日志
+tail -f data/crawl.log
+
+# 查看下载日志
+tail -f data/download.log
+```
+
+## 开发说明
+
+### 代码结构
+
+- **main.py**: FastAPI 应用主文件
+- **config.py**: 配置管理模块
+- **logger.py**: 日志管理模块
+- **utils.py**: 工具函数模块
+- **step1.py**: 画廊链接抓取逻辑
+- **step2.py**: 图片下载逻辑
+
+### 扩展功能
+
+1. 添加新的下载源
+2. 支持更多图片格式
+3. 实现下载队列管理
+4. 添加用户认证系统
+
+## 许可证
+
+本项目仅供学习和研究使用,请遵守相关法律法规。
+
+## 更新日志
 
 
-open: ip:8000
+### v1.0.0
+- 初始版本发布
+- 支持基本的画廊下载功能
+- Web界面和API接口
+- Docker支持
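
The endpoints documented in the README above are parameterless POST routes apart from the `proxy` field, so the whole pipeline can also be driven without the web UI. A minimal sketch using `httpx` (already in requirements.txt); the base URL and proxy value are assumptions:

```python
import asyncio
import httpx

BASE_URL = "http://localhost:8000"  # assumed default host/port

async def run_pipeline() -> None:
    async with httpx.AsyncClient(base_url=BASE_URL, timeout=None) as client:
        # Load data/targets.txt
        print((await client.post("/load_urls")).json()["message"])

        # Crawl gallery pages, then fetch the images, through one proxy entry
        payload = {"proxy": "127.0.0.1:7890"}
        for endpoint in ("/download_urls", "/download_images"):
            print((await client.post(endpoint, json=payload)).json()["message"])

        # Report files that are still missing
        print((await client.post("/check_incomplete")).json().get("data"))

asyncio.run(run_pipeline())
```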

+ 92 - 0
config.py

@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+配置管理模块
+"""
+import os
+from pathlib import Path
+from typing import List, Optional
+from pydantic import BaseModel, Field
+
+
+class AppConfig(BaseModel):
+    """应用配置"""
+    # 基础配置
+    app_name: str = "EH-Downloader"
+    app_version: str = "1.0.0"
+    debug: bool = False
+    
+    # 服务器配置
+    host: str = "0.0.0.0"
+    port: int = 8000
+    
+    # 数据目录配置
+    data_dir: str = "data"
+    downloads_dir: str = "data/downloads"
+    targets_file: str = "data/targets.txt"
+    proxy_file: str = "proxy.txt"
+    
+    # 爬虫配置
+    concurrency: int = 20
+    max_page: int = 100
+    retry_per_page: int = 5
+    retry_per_image: int = 3
+    timeout: float = 10.0
+    image_timeout: float = 15.0
+    
+    # 日志配置
+    log_level: str = "INFO"
+    log_format: str = "[%(asctime)s] [%(levelname)s] %(message)s"
+    
+    # 文件清理配置
+    cleanup_patterns: List[str] = ["**/*.log", "**/*.json"]
+    cleanup_exclude: List[str] = ["data/targets.txt"]
+    
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        # 确保目录存在
+        self._ensure_directories()
+    
+    def _ensure_directories(self):
+        """确保必要的目录存在"""
+        Path(self.data_dir).mkdir(exist_ok=True)
+        Path(self.downloads_dir).mkdir(parents=True, exist_ok=True)
+    
+    @property
+    def targets_path(self) -> Path:
+        """获取targets文件路径"""
+        return Path(self.targets_file)
+    
+    @property
+    def proxy_path(self) -> Path:
+        """获取proxy文件路径"""
+        return Path(self.proxy_file)
+    
+    def get_proxies(self) -> List[str]:
+        """读取代理列表"""
+        if not self.proxy_path.exists():
+            return ["127.0.0.1:7890"]
+        
+        try:
+            with open(self.proxy_path, 'r', encoding='utf-8') as f:
+                proxies = [line.strip() for line in f.readlines() if line.strip()]
+            return proxies if proxies else ["127.0.0.1:7890"]
+        except Exception:
+            return ["127.0.0.1:7890"]
+    
+    def get_targets(self) -> List[str]:
+        """读取目标URL列表"""
+        if not self.targets_path.exists():
+            return []
+        
+        try:
+            with open(self.targets_path, 'r', encoding='utf-8') as f:
+                urls = [line.strip() for line in f.readlines() if line.strip()]
+            # 过滤掉注释行
+            return [url for url in urls if url and not url.startswith('#')]
+        except Exception:
+            return []
+
+
+# 全局配置实例
+config = AppConfig()
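
A quick sketch of how the new `config` singleton is meant to be consumed by the other modules (the printed values are the defaults declared above):

```python
from config import config, AppConfig

# Importing the module already created data/ and data/downloads/
print(config.app_name, config.app_version)   # EH-Downloader 1.0.0
print(config.concurrency, config.timeout)    # 20 10.0

# Both readers fall back to safe defaults when the files are missing
print(config.get_proxies())                  # ['127.0.0.1:7890'] if proxy.txt is absent
print(config.get_targets())                  # [] until data/targets.txt has URLs

# Overrides are ordinary pydantic keyword arguments
print(AppConfig(debug=True, port=8080).port) # 8080
```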

+ 88 - 0
logger.py

@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+日志管理模块
+"""
+import logging
+import sys
+from pathlib import Path
+from typing import Optional
+
+from config import config
+
+
+class LoggerManager:
+    """日志管理器"""
+    
+    _loggers = {}
+    
+    @classmethod
+    def get_logger(cls, name: str, log_file: Optional[str] = None) -> logging.Logger:
+        """获取日志记录器"""
+        if name in cls._loggers:
+            return cls._loggers[name]
+        
+        logger = logging.getLogger(name)
+        logger.setLevel(getattr(logging, config.log_level.upper()))
+        
+        # 避免重复添加处理器
+        if logger.handlers:
+            return logger
+        
+        # 控制台处理器
+        console_handler = logging.StreamHandler(sys.stdout)
+        console_handler.setLevel(getattr(logging, config.log_level.upper()))
+        console_formatter = logging.Formatter(config.log_format)
+        console_handler.setFormatter(console_formatter)
+        logger.addHandler(console_handler)
+        
+        # 文件处理器
+        if log_file:
+            log_path = Path(config.data_dir) / log_file
+            file_handler = logging.FileHandler(log_path, encoding='utf-8')
+            file_handler.setLevel(getattr(logging, config.log_level.upper()))
+            file_formatter = logging.Formatter(config.log_format)
+            file_handler.setFormatter(file_formatter)
+            logger.addHandler(file_handler)
+        
+        # WebSocket 实时日志处理器
+        logger.addHandler(WebSocketLogHandler())
+        
+        cls._loggers[name] = logger
+        return logger
+    
+    @classmethod
+    def setup_root_logger(cls):
+        """设置根日志记录器"""
+        logging.basicConfig(
+            level=getattr(logging, config.log_level.upper()),
+            format=config.log_format,
+            handlers=[
+                logging.StreamHandler(sys.stdout),
+                logging.FileHandler(Path(config.data_dir) / "app.log", encoding='utf-8'),
+                WebSocketLogHandler(),
+            ]
+        )
+
+
+# 便捷函数
+def get_logger(name: str, log_file: Optional[str] = None) -> logging.Logger:
+    """获取日志记录器的便捷函数"""
+    return LoggerManager.get_logger(name, log_file)
+
+
+class WebSocketLogHandler(logging.Handler):
+    """将日志通过实时日志器广播到 WebSocket 客户端"""
+    
+    def emit(self, record: logging.LogRecord) -> None:
+        try:
+            message = self.format(record)
+            level = record.levelname
+            source = record.name
+            # 走同步接口,内部会尝试调度到事件循环
+            # 延迟导入,避免循环依赖
+            from realtime_logger import realtime_logger
+            realtime_logger.broadcast_log_sync(message, level, source)
+        except Exception:
+            # 保证日志不因 WebSocket 发送失败而中断
+            pass
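
Usage sketch for the new logger module: each named logger writes to the console, optionally to a file under `data/`, and is mirrored to connected WebSocket clients through the `WebSocketLogHandler` defined above:

```python
from logger import get_logger

log = get_logger("step1", "crawl.log")   # console + data/crawl.log + WebSocket broadcast

log.info("crawl started")                # appears in all three sinks
log.warning("page 3 timed out")
```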

+ 184 - 82
main.py

@@ -1,49 +1,99 @@
-from fastapi import FastAPI, Request
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+EH-Downloader 主应用
+"""
+import glob
+import os
+from pathlib import Path
+from typing import List
+
+from fastapi import FastAPI, Request, HTTPException, WebSocket, WebSocketDisconnect
 from fastapi.staticfiles import StaticFiles
 from fastapi.templating import Jinja2Templates
 from fastapi.responses import JSONResponse, FileResponse
-import uvicorn
-import glob
-import os
 from pydantic import BaseModel
+import uvicorn
+import asyncio
+import threading
+import json
+
+from config import config
+from logger import get_logger
+from realtime_logger import realtime_logger
 import step2
-from utils import *
+from utils import run_step1, run_step2
+
+# 设置日志
+logger = get_logger("main", "app.log")
 
 
-app = FastAPI(title="EH-Downloader", version="1.0.0")
+app = FastAPI(
+    title=config.app_name,
+    version=config.app_version,
+    description="E-Hentai 画廊下载工具"
+)
 
 
-# 在应用启动时检查并创建data文件夹和targets.txt,以及proxy.txt
 @app.on_event("startup")
 async def startup_event():
-    # 检查并创建data文件夹
-    data_dir = "data"
-    if not os.path.exists(data_dir):
-        os.makedirs(data_dir)
-        print(f"创建目录: {data_dir}")
+    """应用启动事件"""
+    logger.info(f"启动 {config.app_name} v{config.app_version}")
+    # 注册事件循环到实时日志器,便于跨线程广播
+    try:
+        realtime_logger.set_loop(asyncio.get_running_loop())
+    except RuntimeError:
+        # 若获取失败则忽略
+        pass
     
     
-    # 检查并创建targets.txt文件
-    targets_file = os.path.join(data_dir, "targets.txt")
-    if not os.path.exists(targets_file):
-        with open(targets_file, 'w', encoding='utf-8') as f:
+    # 确保目录存在
+    config._ensure_directories()
+    
+    # 创建默认targets.txt文件
+    if not config.targets_path.exists():
+        with open(config.targets_path, 'w', encoding='utf-8') as f:
             f.write("# 在这里添加目标URL,每行一个\n")
             f.write("# 示例:\n")
             f.write("https://e-hentai.org/g/3550066/47d6393550\n")
-        print(f"创建文件: {targets_file}")
-    else:
-        print(f"文件已存在: {targets_file}")
+        logger.info(f"创建文件: {config.targets_path}")
     
     
-    # 检查并创建proxy.txt文件
-    proxy_file = "proxy.txt"
-    if not os.path.exists(proxy_file):
-        with open(proxy_file, 'w', encoding='utf-8') as f:
+    # 创建默认proxy.txt文件
+    if not config.proxy_path.exists():
+        with open(config.proxy_path, 'w', encoding='utf-8') as f:
             f.write("127.0.0.1:7890\n")
-        print(f"创建文件: {proxy_file}")
-    else:
-        print(f"文件已存在: {proxy_file}")
+        logger.info(f"创建文件: {config.proxy_path}")
+    
+    logger.info("应用启动完成")
 
 
 # 挂载静态文件和模板
 app.mount("/static", StaticFiles(directory="static"), name="static")
 templates = Jinja2Templates(directory="templates")
 
 
+# WebSocket 路由
+@app.websocket("/ws")
+async def websocket_endpoint(websocket: WebSocket):
+    """WebSocket连接处理"""
+    await websocket.accept()
+    realtime_logger.add_connection(websocket)
+    
+    try:
+        # 发送最近的日志
+        recent_logs = await realtime_logger.get_recent_logs(20)
+        for log_entry in recent_logs:
+            await websocket.send_text(json.dumps(log_entry, ensure_ascii=False))
+        
+        # 保持连接
+        while True:
+            try:
+                # 等待客户端消息(心跳检测)
+                data = await websocket.receive_text()
+                if data == "ping":
+                    await websocket.send_text("pong")
+            except WebSocketDisconnect:
+                break
+    except Exception as e:
+        logger.error(f"WebSocket错误: {e}")
+    finally:
+        realtime_logger.remove_connection(websocket)
+
 # favicon 路由
 @app.get("/favicon.ico", include_in_schema=False)
 async def favicon():
@@ -52,37 +102,22 @@ async def favicon():
 @app.get("/")
 async def home(request: Request):
     """主页面"""
-    # 读取proxy.txt中的代理列表
-    proxies = []
     try:
-        with open("proxy.txt", 'r', encoding='utf-8') as f:
-            proxies = [line.strip() for line in f.readlines() if line.strip()]
+        proxies = config.get_proxies()
+        return templates.TemplateResponse("index.html", {
+            "request": request,
+            "proxies": proxies,
+            "default_proxy": proxies[0] if proxies else "127.0.0.1:7890"
+        })
     except Exception as e:
-        print(f"读取proxy.txt失败: {e}")
-        proxies = ["127.0.0.1:7890"]
-    
-    # 如果没有代理配置,使用默认值
-    if not proxies:
-        proxies = ["127.0.0.1:7890"]
-    
-    return templates.TemplateResponse("index.html", {
-        "request": request,
-        "proxies": proxies,
-        "default_proxy": proxies[0] if proxies else "127.0.0.1:7890"
-    })
+        logger.error(f"渲染主页失败: {e}")
+        raise HTTPException(status_code=500, detail="服务器内部错误")
 
 
 @app.post("/load_urls")
 async def load_urls():
     """读取 targets.txt 文件中的URL"""
     try:
-        file_path = "data/targets.txt"
-        
-        # 读取文件内容
-        with open(file_path, 'r', encoding='utf-8') as f:
-            urls = [line.strip() for line in f.readlines() if line.strip()]
-        
-        # 过滤掉空行和注释行(以#开头的行)
-        urls = [url for url in urls if url and not url.startswith('#')]
+        urls = config.get_targets()
         
         
         if not urls:
             return JSONResponse({
@@ -91,6 +126,7 @@ async def load_urls():
                 "urls": []
             })
         
         
+        logger.info(f"成功读取 {len(urls)} 个URL")
         return JSONResponse({
             "success": True,
             "message": f"成功读取 {len(urls)} 个URL",
@@ -98,6 +134,7 @@ async def load_urls():
         })
         
         
     except Exception as e:
+        logger.error(f"读取URL失败: {e}")
         return JSONResponse({
             "success": False,
             "message": f"读取文件时出错: {str(e)}",
@@ -118,25 +155,77 @@ class ProxyRequest(BaseModel):
 
 
 @app.post("/download_urls")
 async def download_urls(req: ProxyRequest):
-    # 解析proxy字符串为ip和port
-    if ":" in req.proxy:
-        ip, port = req.proxy.split(":", 1)
-        proxy = f"http://{ip}:{port}"
-    else:
-        proxy = None
-    msg = await run_step1(proxy)
-    return JSONResponse({"success": True, "message": msg})
+    """下载画廊链接"""
+    try:
+        # 解析proxy字符串为ip和port
+        if ":" in req.proxy:
+            ip, port = req.proxy.split(":", 1)
+            proxy = f"http://{ip}:{port}"
+        else:
+            proxy = None
+        
+        # 发送实时日志
+        await realtime_logger.broadcast_log(f"开始抓取画廊链接,代理: {proxy}", "INFO", "step1")
+        
+        # 在后台线程中执行,避免阻塞
+        def run_step1_sync():
+            import asyncio
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            try:
+                return loop.run_until_complete(run_step1(proxy))
+            finally:
+                loop.close()
+        
+        # 使用线程池执行
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            future = executor.submit(run_step1_sync)
+            msg = future.result()
+        
+        await realtime_logger.broadcast_log(f"画廊链接抓取完成: {msg}", "SUCCESS", "step1")
+        return JSONResponse({"success": True, "message": msg})
+    except Exception as e:
+        await realtime_logger.broadcast_log(f"抓取画廊链接失败: {e}", "ERROR", "step1")
+        logger.error(f"抓取画廊链接失败: {e}")
+        return JSONResponse({"success": False, "message": f"抓取失败: {str(e)}"})
 
 
 @app.post("/download_images")
 async def download_images(req: ProxyRequest):
-    # 解析proxy字符串为ip和port
-    if ":" in req.proxy:
-        ip, port = req.proxy.split(":", 1)
-        proxy = f"http://{ip}:{port}"
-    else:
-        proxy = None
-    msg = await run_step2(proxy)
-    return JSONResponse({"success": True, "message": msg})
+    """下载图片"""
+    try:
+        # 解析proxy字符串为ip和port
+        if ":" in req.proxy:
+            ip, port = req.proxy.split(":", 1)
+            proxy = f"http://{ip}:{port}"
+        else:
+            proxy = None
+        
+        # 发送实时日志
+        await realtime_logger.broadcast_log(f"开始下载图片,代理: {proxy}", "INFO", "step2")
+        
+        # 在后台线程中执行,避免阻塞
+        def run_step2_sync():
+            import asyncio
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            try:
+                return loop.run_until_complete(run_step2(proxy))
+            finally:
+                loop.close()
+        
+        # 使用线程池执行
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            future = executor.submit(run_step2_sync)
+            msg = future.result()
+        
+        await realtime_logger.broadcast_log(f"图片下载完成: {msg}", "SUCCESS", "step2")
+        return JSONResponse({"success": True, "message": msg})
+    except Exception as e:
+        await realtime_logger.broadcast_log(f"下载图片失败: {e}", "ERROR", "step2")
+        logger.error(f"下载图片失败: {e}")
+        return JSONResponse({"success": False, "message": f"下载失败: {str(e)}"})
 
 
 @app.post("/clean_files")
 async def clean_files():
@@ -145,24 +234,23 @@ async def clean_files():
         deleted_files = []
         error_files = []
         
         
-        # 查找当前目录及所有子目录中的 .log 和 .json 文件
-        patterns = ["**/*.log", "**/*.json"]
-        
-        for pattern in patterns:
+        # 使用配置中的清理模式
+        for pattern in config.cleanup_patterns:
             for file_path in glob.glob(pattern, recursive=True):
                 try:
-                    # 跳过 data/targets.txt 文件,因为这是配置文件
-                    if file_path == "data/targets.txt":
+                    # 跳过排除的文件
+                    if file_path in config.cleanup_exclude:
                         continue
                         continue
                     
                     os.remove(file_path)
                     os.remove(file_path)
                     deleted_files.append(file_path)
+                    logger.info(f"已删除文件: {file_path}")
                 except Exception as e:
                 except Exception as e:
                     error_files.append(f"{file_path}: {str(e)}")
+                    logger.error(f"删除文件失败 {file_path}: {str(e)}")
         
         
         if error_files:
         if error_files:
             return JSONResponse({
             return JSONResponse({
                 "success": False,
                 "message": f"清理完成,但部分文件删除失败",
@@ -172,6 +260,7 @@ async def clean_files():
                 "error_files": error_files
             })
         else:
             return JSONResponse({
             return JSONResponse({
                 "success": True,
                 "message": f"成功清理 {len(deleted_files)} 个文件",
             })
             })
             
     except Exception as e:
     except Exception as e:
         return JSONResponse({
         return JSONResponse({
             "success": False,
             "message": f"清理过程中出错: {str(e)}",
 
 
 @app.post("/check_incomplete")
 @app.post("/check_incomplete")
 async def check_incomplete():
-    result = await step2.scan_tasks()
-
     """检查未完成文件"""
-        "success": True,
-        "message": "检查未完成文件功能已就绪",
-        "data": f"共 {len(result)} 个文件未下载"
-    })
+    try:
+        result = await step2.scan_tasks()
+        logger.info(f"检查未完成文件: {len(result)} 个")
+        return JSONResponse({
+            "success": True,
+            "message": "检查未完成文件功能已就绪",
+            "data": f"共 {len(result)} 个文件未下载"
+        })
+    except Exception as e:
+        logger.error(f"检查未完成文件失败: {e}")
+        return JSONResponse({
+            "success": False,
+            "message": f"检查失败: {str(e)}"
+        })
 
 
 if __name__ == "__main__":
-    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
+    uvicorn.run(
+        "main:app", 
+        host=config.host, 
+        port=config.port, 
+        reload=config.debug
+    )
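
The `/ws` route added above speaks a very small protocol: the server pushes JSON log entries and answers a literal `ping` with `pong`. A throwaway Python client for exercising it, using the `websockets` package from requirements.txt (the URL is an assumption):

```python
import asyncio
import json
import websockets

async def tail_logs() -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send("ping")            # heartbeat; the endpoint replies "pong"
        while True:
            raw = await ws.recv()
            if raw == "pong":
                continue
            entry = json.loads(raw)      # keys: time, level, source, message, timestamp
            print(f"[{entry['time']}] [{entry['level']}] [{entry['source']}] {entry['message']}")

asyncio.run(tail_logs())
```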

+ 81 - 0
performance.py

@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+性能优化模块
+"""
+import asyncio
+import time
+from typing import Dict, Any
+from functools import wraps
+
+from logger import get_logger
+
+logger = get_logger("performance")
+
+
+def monitor_performance(func):
+    """性能监控装饰器"""
+    @wraps(func)
+    async def async_wrapper(*args, **kwargs):
+        start_time = time.time()
+        try:
+            result = await func(*args, **kwargs)
+            execution_time = time.time() - start_time
+            logger.info(f"{func.__name__} 执行完成,耗时: {execution_time:.2f}秒")
+            return result
+        except Exception as e:
+            execution_time = time.time() - start_time
+            logger.error(f"{func.__name__} 执行失败,耗时: {execution_time:.2f}秒,错误: {e}")
+            raise
+    
+    @wraps(func)
+    def sync_wrapper(*args, **kwargs):
+        start_time = time.time()
+        try:
+            result = func(*args, **kwargs)
+            execution_time = time.time() - start_time
+            logger.info(f"{func.__name__} 执行完成,耗时: {execution_time:.2f}秒")
+            return result
+        except Exception as e:
+            execution_time = time.time() - start_time
+            logger.error(f"{func.__name__} 执行失败,耗时: {execution_time:.2f}秒,错误: {e}")
+            raise
+    
+    if asyncio.iscoroutinefunction(func):
+        return async_wrapper
+    else:
+        return sync_wrapper
+
+
+class PerformanceMonitor:
+    """性能监控器"""
+    
+    def __init__(self):
+        self.metrics: Dict[str, Any] = {}
+        self.start_time = time.time()
+    
+    def start_timer(self, name: str):
+        """开始计时"""
+        self.metrics[name] = {"start": time.time()}
+    
+    def end_timer(self, name: str):
+        """结束计时"""
+        if name in self.metrics:
+            self.metrics[name]["end"] = time.time()
+            self.metrics[name]["duration"] = (
+                self.metrics[name]["end"] - self.metrics[name]["start"]
+            )
+            logger.info(f"{name} 耗时: {self.metrics[name]['duration']:.2f}秒")
+    
+    def get_summary(self) -> Dict[str, Any]:
+        """获取性能摘要"""
+        total_time = time.time() - self.start_time
+        return {
+            "total_time": total_time,
+            "metrics": self.metrics,
+            "summary": f"总运行时间: {total_time:.2f}秒"
+        }
+
+
+# 全局性能监控器
+perf_monitor = PerformanceMonitor()
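
performance.py is not imported by the other files in this commit yet; a usage sketch of the decorator and the global monitor as declared above:

```python
import asyncio
from performance import monitor_performance, perf_monitor

@monitor_performance                 # logs the duration on success and on failure
async def fake_download(i: int) -> int:
    await asyncio.sleep(0.1)         # stand-in for real work
    return i

async def main() -> None:
    perf_monitor.start_timer("batch")
    await asyncio.gather(*(fake_download(i) for i in range(5)))
    perf_monitor.end_timer("batch")
    print(perf_monitor.get_summary()["summary"])

asyncio.run(main())
```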

+ 117 - 0
realtime_logger.py

@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+实时日志输出模块
+"""
+import asyncio
+import json
+import time
+import threading
+from typing import List, Dict, Any, Optional
+from pathlib import Path
+
+import logging
+
+logger = logging.getLogger("realtime_logger")
+
+
+class RealtimeLogger:
+    """实时日志记录器"""
+    
+    def __init__(self):
+        self.connections: List[Any] = []
+        self.log_buffer: List[Dict[str, Any]] = []
+        self.max_buffer_size = 1000
+        self._lock = threading.Lock()
+        self._loop: Optional[asyncio.AbstractEventLoop] = None
+    
+    def set_loop(self, loop: asyncio.AbstractEventLoop) -> None:
+        """注册主事件循环,便于跨线程安全调度发送任务"""
+        self._loop = loop
+    
+    def add_connection(self, websocket):
+        """添加WebSocket连接"""
+        with self._lock:
+            self.connections.append(websocket)
+        logger.info(f"新增WebSocket连接,当前连接数: {len(self.connections)}")
+    
+    def remove_connection(self, websocket):
+        """移除WebSocket连接"""
+        with self._lock:
+            if websocket in self.connections:
+                self.connections.remove(websocket)
+        logger.info(f"移除WebSocket连接,当前连接数: {len(self.connections)}")
+    
+    async def broadcast_log(self, message: str, level: str = "INFO", source: str = "system"):
+        """广播日志消息到所有连接的客户端"""
+        log_entry = {
+            "timestamp": time.time(),
+            "time": time.strftime("%H:%M:%S"),
+            "level": level,
+            "source": source,
+            "message": message
+        }
+        
+        # 添加到缓冲区
+        with self._lock:
+            self.log_buffer.append(log_entry)
+            if len(self.log_buffer) > self.max_buffer_size:
+                self.log_buffer = self.log_buffer[-self.max_buffer_size:]
+        
+        # 广播到所有连接
+        if self.connections:
+            message_data = json.dumps(log_entry, ensure_ascii=False)
+            disconnected = []
+            
+            for websocket in self.connections.copy():  # 使用副本避免并发修改
+                try:
+                    await websocket.send_text(message_data)
+                except Exception as e:
+                    logger.warning(f"发送消息失败: {e}")
+                    disconnected.append(websocket)
+            
+            # 清理断开的连接
+            for ws in disconnected:
+                self.remove_connection(ws)
+    
+    def broadcast_log_sync(self, message: str, level: str = "INFO", source: str = "system"):
+        """同步广播日志消息(用于非异步环境)"""
+        log_entry = {
+            "timestamp": time.time(),
+            "time": time.strftime("%H:%M:%S"),
+            "level": level,
+            "source": source,
+            "message": message
+        }
+        
+        # 添加到缓冲区
+        with self._lock:
+            self.log_buffer.append(log_entry)
+            if len(self.log_buffer) > self.max_buffer_size:
+                self.log_buffer = self.log_buffer[-self.max_buffer_size:]
+        
+        # 若已注册事件循环,尝试在线程安全地调度异步广播
+        if self._loop is not None:
+            try:
+                asyncio.run_coroutine_threadsafe(
+                    self.broadcast_log(message=message, level=level, source=source),
+                    self._loop,
+                )
+            except Exception:
+                # 忽略发送失败,缓冲区仍可用于新连接回放
+                pass
+    
+    async def get_recent_logs(self, count: int = 50) -> List[Dict[str, Any]]:
+        """获取最近的日志"""
+        with self._lock:
+            return self.log_buffer[-count:] if self.log_buffer else []
+    
+    def clear_buffer(self):
+        """清空日志缓冲区"""
+        with self._lock:
+            self.log_buffer.clear()
+        logger.info("日志缓冲区已清空")
+
+
+# 全局实时日志记录器
+realtime_logger = RealtimeLogger()
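
The split between `broadcast_log` (async, sends to clients) and `broadcast_log_sync` (thread-safe, schedules onto the loop registered via `set_loop`) is what lets log records produced in worker threads reach the browser. A minimal cross-thread sketch:

```python
import asyncio
import threading
from realtime_logger import realtime_logger

async def main() -> None:
    realtime_logger.set_loop(asyncio.get_running_loop())  # main.py does this at startup

    def worker() -> None:
        # Safe from a plain thread: buffers the entry and schedules the async broadcast
        realtime_logger.broadcast_log_sync("worker finished", "SUCCESS", "demo")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    await asyncio.sleep(0.1)                              # let the scheduled task run
    print(await realtime_logger.get_recent_logs(5))

asyncio.run(main())
```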

+ 25 - 22
requirements.txt

@@ -1,29 +1,32 @@
-aiofile==3.9.0
-aiofiles==24.1.0
-aiopath
-annotated-types==0.7.0
-anyio
-beautifulsoup4==4.14.0
-caio==0.9.24
-certifi==2025.8.3
-click==8.3.0
+# Web框架
 fastapi==0.104.1
-h11==0.16.0
-httpcore==1.0.9
+uvicorn[standard]==0.24.0
+starlette==0.27.0
+websockets==12.0
+
+# HTTP客户端
 httpx==0.25.2
-idna==3.10
-Jinja2==3.1.6
+httpcore==1.0.9
+
+# 异步文件操作
+aiofiles==24.1.0
+
+# HTML解析
+beautifulsoup4==4.14.0
 lxml==6.0.2
+soupsieve==2.8
+
+# 模板引擎
+Jinja2==3.1.6
 MarkupSafe==3.0.3
+
+# 数据验证
 pydantic==2.11.9
 pydantic==2.11.9
 pydantic_core==2.33.2
-setuptools==78.1.1
-sniffio==1.3.1
-soupsieve==2.8
-starlette==0.27.0
+
+# 进度条
 tqdm==4.67.1
-typing-inspection==0.4.1
-typing_extensions==4.15.0
-uvicorn==0.24.0
-wheel==0.45.1
+
+# 其他依赖
+python-multipart==0.0.6
+certifi==2025.8.3

+ 40 - 0
start.py

@@ -0,0 +1,40 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+应用启动脚本
+"""
+import sys
+import os
+from pathlib import Path
+
+# 添加项目根目录到Python路径
+project_root = Path(__file__).parent
+sys.path.insert(0, str(project_root))
+
+from config import config
+from logger import LoggerManager
+import uvicorn
+
+def main():
+    """主函数"""
+    # 设置根日志记录器
+    LoggerManager.setup_root_logger()
+    
+    # 确保数据目录存在
+    config._ensure_directories()
+    
+    print(f"启动 {config.app_name} v{config.app_version}")
+    print(f"服务器地址: http://{config.host}:{config.port}")
+    print(f"调试模式: {'开启' if config.debug else '关闭'}")
+    
+    # 启动服务器
+    uvicorn.run(
+        "main:app",
+        host=config.host,
+        port=config.port,
+        reload=config.debug,
+        log_level=config.log_level.lower()
+    )
+
+if __name__ == "__main__":
+    main()

+ 129 - 24
static/script.js

@@ -11,7 +11,11 @@ class DownloadTool {
         this.clearOutputBtn = document.getElementById('clearOutput');
         this.proxySelect = document.getElementById('proxy');
         
         
+        this.websocket = null;
+        this.isConnected = false;
+        
         this.initEvents();
+        this.connectWebSocket();
     }
     
     
     initEvents() {
@@ -46,9 +50,77 @@ class DownloadTool {
         });
     }
     
     
+    connectWebSocket() {
+        try {
+            const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
+            const wsUrl = `${protocol}//${window.location.host}/ws`;
+            this.websocket = new WebSocket(wsUrl);
+            
+            this.websocket.onopen = () => {
+                this.isConnected = true;
+                this.showOutput('WebSocket连接已建立,可以接收实时日志', 'success');
+                console.log('WebSocket连接已建立');
+            };
+            
+            this.websocket.onmessage = (event) => {
+                try {
+                    const logEntry = JSON.parse(event.data);
+                    this.appendRealtimeLog(logEntry);
+                } catch (e) {
+                    console.error('解析WebSocket消息失败:', e);
+                }
+            };
+            
+            this.websocket.onclose = () => {
+                this.isConnected = false;
+                this.showOutput('WebSocket连接已断开,正在尝试重连...', 'error');
+                console.log('WebSocket连接已断开');
+                // 5秒后尝试重连
+                setTimeout(() => this.connectWebSocket(), 5000);
+            };
+            
+            this.websocket.onerror = (error) => {
+                console.error('WebSocket错误:', error);
+                this.showOutput('WebSocket连接错误', 'error');
+            };
+        } catch (error) {
+            console.error('创建WebSocket连接失败:', error);
+            this.showOutput('WebSocket连接失败', 'error');
+        }
+    }
+    
+    appendRealtimeLog(logEntry) {
+        const timestamp = logEntry.time || new Date().toLocaleTimeString();
+        const level = logEntry.level || 'INFO';
+        const source = logEntry.source || 'system';
+        const message = logEntry.message || '';
+        
+        const logLine = `[${timestamp}] [${level}] [${source}] ${message}`;
+        
+        // 追加到输出框
+        if (this.output.textContent) {
+            this.output.textContent += '\n' + logLine;
+        } else {
+            this.output.textContent = logLine;
+        }
+        
+        // 自动滚动到底部
+        this.output.scrollTop = this.output.scrollHeight;
+        
+        // 根据日志级别设置样式
+        if (level === 'ERROR') {
+            this.output.classList.add('error');
+        } else if (level === 'SUCCESS') {
+            this.output.classList.add('success');
+        } else {
+            this.output.classList.remove('error', 'success');
+        }
+    }
+    
     async loadTargetUrls() {
         try {
-            this.showOutput('正在读取 targets.txt...', '');
+            this.setLoading(true);
+            this.showOutput('正在读取 targets.txt...', 'info');
             
             
             const response = await fetch('/load_urls', {
                 method: 'POST'
@@ -59,12 +131,14 @@ class DownloadTool {
             if (result.success) {
                 // 在URL列表文本框中显示读取的URL
                 this.urlListTextarea.value = result.urls.join('\n');
-                this.showOutput(`成功读取 ${result.urls.length} 个URL`, 'success');
+                this.showOutput(`成功读取 ${result.urls.length} 个URL\n\nURL列表:\n${result.urls.join('\n')}`, 'success');
             } else {
                 this.showOutput(`读取失败: ${result.message}`, 'error');
             }
         } catch (error) {
             this.showOutput(`读取URL时出错: ${error.message}`, 'error');
+        } finally {
+            this.setLoading(false);
         }
     }
     
     
@@ -85,33 +159,60 @@ class DownloadTool {
     }
     
     
     async downloadUrls() {
-        const proxy = this.proxySelect.value;
-    
-        this.showOutput('正在抓取画廊链接...', 'info');
-        const res = await fetch('/download_urls', {
-            method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
-            body: JSON.stringify({ proxy })
-        });
-        const data = await res.json();
-        this.showOutput(data.message, data.success ? 'success' : 'error');
+        try {
+            const proxy = this.proxySelect.value;
+            
+            this.showOutput(`正在抓取画廊链接...\n代理: ${proxy}\n\n注意:此操作可能需要较长时间,请耐心等待...`, 'info');
+            
+            // 使用setTimeout确保UI不被阻塞
+            setTimeout(async () => {
+                try {
+                    const res = await fetch('/download_urls', {
+                        method: 'POST',
+                        headers: { 'Content-Type': 'application/json' },
+                        body: JSON.stringify({ proxy })
+                    });
+                    const data = await res.json();
+                    this.showOutput(data.message, data.success ? 'success' : 'error');
+                } catch (error) {
+                    this.showOutput(`抓取画廊链接时出错: ${error.message}`, 'error');
+                }
+            }, 100);
+            
+        } catch (error) {
+            this.showOutput(`抓取画廊链接时出错: ${error.message}`, 'error');
+        }
     }
 
 
     async downloadImages() {
-        const proxy = this.proxySelect.value;
-    
-        this.showOutput('正在下载图片...', 'info');
-        const res = await fetch('/download_images', {
-            method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
-            body: JSON.stringify({ proxy })
-        });
-        const data = await res.json();
-        this.showOutput(data.message, data.success ? 'success' : 'error');
+        try {
+            const proxy = this.proxySelect.value;
+            
+            this.showOutput(`正在下载图片...\n代理: ${proxy}\n\n注意:此操作可能需要较长时间,请耐心等待...`, 'info');
+            
+            // 使用setTimeout确保UI不被阻塞
+            setTimeout(async () => {
+                try {
+                    const res = await fetch('/download_images', {
+                        method: 'POST',
+                        headers: { 'Content-Type': 'application/json' },
+                        body: JSON.stringify({ proxy })
+                    });
+                    const data = await res.json();
+                    this.showOutput(data.message, data.success ? 'success' : 'error');
+                } catch (error) {
+                    this.showOutput(`下载图片时出错: ${error.message}`, 'error');
+                }
+            }, 100);
+            
+        } catch (error) {
+            this.showOutput(`下载图片时出错: ${error.message}`, 'error');
+        }
     }
 
 
     async checkIncomplete() {
         try {
+            this.setLoading(true);
             this.showOutput('正在检查未完成文件...', 'info');
             
             
             const response = await fetch('/check_incomplete', {
@@ -121,20 +222,22 @@ class DownloadTool {
             const result = await response.json();
             
             
             if (result.success) {
-                // 这里先显示后端返回的测试数据,等您完成后端逻辑后会返回实际数据
                 let message = `检查完成!\n\n`;
-                message += `返回数据: ${JSON.stringify(result.data, null, 2)}`;
+                message += `${result.data}`;
                 this.showOutput(message, 'success');
             } else {
                 this.showOutput(`检查失败: ${result.message}`, 'error');
             }
         } catch (error) {
             this.showOutput(`检查未完成文件时出错: ${error.message}`, 'error');
+        } finally {
+            this.setLoading(false);
         }
         }
     }
 
     async cleanFiles() {
     async cleanFiles() {
         try {
             this.showOutput('正在清理日志和JSON文件...', 'info');
             this.showOutput('正在清理日志和JSON文件...', 'info');
             
             const response = await fetch('/clean_files', {
             const response = await fetch('/clean_files', {
             }
             }
         } catch (error) {
             this.showOutput(`清理文件时出错: ${error.message}`, 'error');
+            this.setLoading(false);
         }
         }
     }
 
 

+ 28 - 20
step1.py

@@ -18,30 +18,28 @@ from typing import Dict, List, Optional
 import httpx
 from bs4 import BeautifulSoup
 from tqdm.asyncio import tqdm_asyncio
-from aiopath import AsyncPath
+from pathlib import Path
 
 
 # -------------------- 可配置常量 --------------------
-CONCURRENCY = 20                 # 并发页数
-MAX_PAGE = 100                   # 单专辑最大翻页
-RETRY_PER_PAGE = 5               # 单页重试
-TIMEOUT = httpx.Timeout(10.0)    # 请求超时
+from config import config
+
+CONCURRENCY = config.concurrency
+MAX_PAGE = config.max_page
+RETRY_PER_PAGE = config.retry_per_page
+TIMEOUT = httpx.Timeout(config.timeout)
 IMG_SELECTOR = "#gdt"            # 图片入口区域
 FAILED_RECORD = "data/failed_keys.json"
-LOG_LEVEL = logging.INFO
+LOG_LEVEL = getattr(logging, config.log_level.upper())
 # ----------------------------------------------------
 
 
+# 确保数据目录存在
 if not os.path.exists("data"):
     os.mkdir("data")
 
 
-logging.basicConfig(
-    level=LOG_LEVEL,
-    format="[%(asctime)s] [%(levelname)s] %(message)s",
-    handlers=[
-        logging.StreamHandler(sys.stdout),
-        logging.FileHandler("data/crawl.log", encoding="utf-8"),
-    ],
-)
-log = logging.getLogger("data/eh_crawler")
+# 使用统一的日志配置
+from logger import get_logger
+from realtime_logger import realtime_logger
+log = get_logger("step1", "crawl.log")
 
 
 # 预编译正则
 ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]')
@@ -106,7 +104,7 @@ async def crawl_single_gallery(
         key = base_url.split("/")[-1]  # 用最后一截当 key
         json_name = f"{key}.json"
 
 
-        folder_path: Optional[AsyncPath] = None
+        folder_path: Optional[Path] = None
         json_data: Dict[str, str] = {}
         img_count = 1
         last_page = False
@@ -122,12 +120,12 @@ async def crawl_single_gallery(
             soup = BeautifulSoup(html, "lxml")
             title = soup.title.string if soup.title else "gallery"
             clean_title = clean_folder_name(title)
-            folder_path = AsyncPath("data/downloads") / clean_title
-            await folder_path.mkdir(parents=True, exist_ok=True)
+            folder_path = Path("data/downloads") / clean_title
+            folder_path.mkdir(parents=True, exist_ok=True)
 
 
             # 如果 json 已存在则跳过整个画廊
             json_path = folder_path / json_name
-            if await json_path.exists():
+            if json_path.exists():
                 log.info(f"{json_name} 已存在,跳过")
                 return True
 
 
@@ -152,13 +150,23 @@ async def crawl_single_gallery(
                 img_count += 1
 
 
         if json_data:
-            await json_path.write_text(
+            json_path.write_text(
                 json.dumps(json_data, ensure_ascii=False, indent=2), encoding="utf-8"
             )
             log.info(f"保存成功 -> {json_path}  ({len(json_data)} 张)")
+            # 发送实时日志
+            try:
+                realtime_logger.broadcast_log_sync(f"画廊 {key} 抓取完成,共 {len(json_data)} 张图片", "SUCCESS", "step1")
+            except Exception as e:
+                log.warning(f"发送实时日志失败: {e}")
             return True
             return True
         else:
             log.warning(f"{key} 未解析到任何图片链接")
+            try:
+                realtime_logger.broadcast_log_sync(f"画廊 {key} 未解析到任何图片链接", "WARNING", "step1")
+            except Exception as e:
+                log.warning(f"发送实时日志失败: {e}")
             return False
             return False
 
 
 

+ 25 - 22
step2.py

@@ -17,29 +17,27 @@ from typing import Dict, List
 
 
 import aiofiles
 import httpx
-from aiopath import AsyncPath
+from pathlib import Path
 from tqdm.asyncio import tqdm_asyncio
 
 
 # -------------------- 可配置常量 --------------------
-CONCURRENCY = 20                 # 并发下载数
-RETRY_PER_IMG = 3                # 单图重试
-TIMEOUT = httpx.Timeout(15.0)    # 请求超时
+from config import config
+
+CONCURRENCY = config.concurrency
+RETRY_PER_IMG = config.retry_per_image
+TIMEOUT = httpx.Timeout(config.image_timeout)
 FAILED_RECORD = "data/failed_downloads.json"
 FAILED_RECORD = "data/failed_downloads.json"
-LOG_LEVEL = logging.INFO
+LOG_LEVEL = getattr(logging, config.log_level.upper())
 # ----------------------------------------------------
 
 
+# 确保数据目录存在
 if not os.path.exists("data"):
     os.mkdir("data")
 
 
-logging.basicConfig(
-    level=LOG_LEVEL,
-    format="[%(asctime)s] [%(levelname)s] %(message)s",
-    handlers=[
-        logging.StreamHandler(sys.stdout),
-        logging.FileHandler("data/download.log", encoding="utf-8"),
-    ],
-)
-log = logging.getLogger("data/img_downloader")
+# 使用统一的日志配置
+from logger import get_logger
+from realtime_logger import realtime_logger
+log = get_logger("step2", "download.log")
 
 
 # 预编译正则
 IMG_URL_RE = re.compile(r'<img id="img" src="(.*?)"', re.S)
@@ -85,18 +83,23 @@ async def download_one(
                 ext = ext_match.group(1).lower() if ext_match else "jpg"
                 final_path = img_path.with_suffix(f".{ext}")
 
 
-                if await AsyncPath(final_path).exists():
+                if final_path.exists():
                     log.info(f"已存在,跳过: {final_path.name}")
                     return True
 
 
                 async with client.stream("GET", real_url) as img_resp:
                     img_resp.raise_for_status()
-                    await AsyncPath(final_path).parent.mkdir(parents=True, exist_ok=True)
+                    final_path.parent.mkdir(parents=True, exist_ok=True)
                     async with aiofiles.open(final_path, "wb") as fp:
                         async for chunk in img_resp.aiter_bytes(chunk_size=65536):
                             await fp.write(chunk)
 
 
                 log.info(f"[OK] {final_path.name}")
+                # 发送实时日志
+                try:
+                    realtime_logger.broadcast_log_sync(f"下载完成: {final_path.name}", "SUCCESS", "step2")
+                except Exception as e:
+                    log.warning(f"发送实时日志失败: {e}")
                 return True
 
 
             except httpx.HTTPStatusError as exc:
@@ -120,24 +123,24 @@ async def download_one(
 async def scan_tasks() -> List[Dict[str, str]]:
     """扫描 downloads/ 下所有 json,返回待下载列表"""
     result = []
-    root = AsyncPath("data/downloads")
-    if not await root.exists():
+    root = Path("data/downloads")
+    if not root.exists():
         return result
 
 
-    async for json_path in root.rglob("*.json"):
+    for json_path in root.rglob("*.json"):
         folder = json_path.parent
         try:
-            data: Dict[str, str] = json.loads(await json_path.read_text(encoding="utf-8"))
+            data: Dict[str, str] = json.loads(json_path.read_text(encoding="utf-8"))
         except Exception as exc:
             log.warning(f"读取 json 失败 {json_path} -> {exc}")
             continue
 
 
         for img_name, img_url in data.items():
             img_path = folder / img_name  # 无后缀
-            # 异步判断任意后缀是否存在
+            # 判断任意后缀是否存在
             exists = False
             for ext in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
-                if await img_path.with_suffix(ext).exists():
+                if img_path.with_suffix(ext).exists():
                     exists = True
                     break
             if not exists:
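
`IMG_URL_RE` above pulls the real image URL out of each viewer page, and the extension for the final file name is sniffed from that URL. A small self-contained illustration (the HTML snippet and the extension regex are assumptions):

```python
import re

IMG_URL_RE = re.compile(r'<img id="img" src="(.*?)"', re.S)   # as defined above
EXT_RE = re.compile(r"\.(\w+)(?:\?|$)")                       # assumed extension sniff

html = '<div><img id="img" src="https://example.org/ab/123-456.png?token=x" /></div>'
m = IMG_URL_RE.search(html)
real_url = m.group(1) if m else None
print(real_url)                                 # https://example.org/ab/123-456.png?token=x

ext_match = EXT_RE.search(real_url or "")
ext = ext_match.group(1).lower() if ext_match else "jpg"
print(ext)                                      # png
```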

+ 10 - 6
templates/index.html

@@ -39,7 +39,7 @@
         </form>
         
         
         <div class="output-section">
-            <h3>以下是一个输出框, 但貌似没啥卵用...</h3>
+            <h3>操作日志</h3>
             <pre id="output" class="output-area"></pre>
         </div>
 
 
@@ -51,17 +51,21 @@
                 <p><strong>工具使用步骤:</strong></p>
                 <ol>
                     <li>从下拉框选择代理设置(代理配置保存在项目根目录的proxy.txt中)</li>
-                    <li>将URL复制到项目目录下的data/targets.txt中, 一个画廊一个URL</li>
-                    <li>在<a href="https://e-hentai.org/" target="_blank">点解这里</a>获取需要下载的画廊URL</li>
-                    <li><del>不要问什么不直接填到页面, 我懒得写</del></li>
+                    <li>将URL复制到项目目录下的data/targets.txt中,一个画廊一个URL</li>
+                    <li>在<a href="https://e-hentai.org/" target="_blank">这里</a>获取需要下载的画廊URL</li>
                     <li>点击"读取目标URL"加载 targets.txt 中的URL列表</li>
                     <li>点击"下载URL"开始抓取画廊链接</li>
                     <li>点击"下载图片"开始下载图片文件</li>
                     <li>使用"检查未完成"查看下载进度</li>
                     <li>使用"清理日志和JSON文件"清理临时文件</li>
                 </ol>
-                <p><strong>注意:</strong>请确保代理设置正确,且 targets.txt 文件中已添加目标URL。</p>
-                <p><strong>代理配置:</strong>在项目根目录的proxy.txt文件中,每行一个代理地址,格式为 IP:端口</p>
+                <p><strong>注意事项:</strong></p>
+                <ul>
+                    <li>请确保代理设置正确,且 targets.txt 文件中已添加目标URL</li>
+                    <li>代理配置:在项目根目录的proxy.txt文件中,每行一个代理地址,格式为 IP:端口</li>
+                    <li>下载的图片会保存在 data/downloads 目录下,按画廊名称分文件夹存储</li>
+                    <li>如果下载中断,可以重新运行"下载图片"继续未完成的下载</li>
+                </ul>
             </div>
         </div>
     </div>

+ 19 - 11
utils.py

@@ -1,27 +1,35 @@
-# utils.py
-from pathlib import Path
-from typing import List
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+工具函数模块
+"""
+from typing import Optional
 
 
-import logging
-
-# 把 1step.py 的主逻辑封装成函数
+from logger import get_logger
 from step1 import main as step1_main
 from step2 import main as step2_main
 
 
-log = logging.getLogger("utils")
+# 设置日志
+logger = get_logger("utils")
 
 
-async def run_step1(proxy: str | None = None) -> str:
+async def run_step1(proxy: Optional[str] = None) -> str:
+    """执行第一步:抓取画廊链接"""
     try:
+        logger.info("开始执行画廊链接抓取")
         await step1_main(proxy)
+        logger.info("画廊链接抓取完成")
         return "画廊链接抓取完成!"
     except Exception as e:
-        log.exception("step1 执行失败")
+        logger.exception("step1 执行失败")
         return f"抓取失败:{e}"
 
 
-async def run_step2(proxy: str | None = None) -> str:
+async def run_step2(proxy: Optional[str] = None) -> str:
+    """执行第二步:下载图片"""
     try:
+        logger.info("开始执行图片下载")
         await step2_main(proxy)
+        logger.info("图片下载完成")
         return "图片下载完成!"
     except Exception as e:
-        log.exception("step2 执行失败")
+        logger.exception("step2 执行失败")
         return f"下载失败:{e}"
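
The two wrappers return plain status strings, so they can also be reused outside FastAPI, for example in a headless run (the proxy value is an assumption):

```python
import asyncio
from typing import Optional

from utils import run_step1, run_step2

async def headless(proxy: Optional[str] = "http://127.0.0.1:7890") -> None:
    print(await run_step1(proxy))   # "画廊链接抓取完成!" or "抓取失败:…"
    print(await run_step2(proxy))   # "图片下载完成!" or "下载失败:…"

asyncio.run(headless())
```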