jack 3 months ago
parent
commit
e2e51537b4
14 files changed, with 1031 additions and 189 deletions
  1. .gitignore (+2 −1)
  2. Readme.md (+191 −1)
  3. config.py (+92 −0)
  4. logger.py (+88 −0)
  5. main.py (+184 −82)
  6. performance.py (+81 −0)
  7. realtime_logger.py (+117 −0)
  8. requirements.txt (+25 −22)
  9. start.py (+40 −0)
  10. static/script.js (+129 −24)
  11. step1.py (+28 −20)
  12. step2.py (+25 −22)
  13. templates/index.html (+10 −6)
  14. utils.py (+19 −11)

+ 2 - 1
.gitignore

@@ -63,4 +63,5 @@ target/
 other/split_clash_config/split_config
 ai_news/save_data
 daily/*.txt
-data/
+data/
+eh.tar

+ 191 - 1
Readme.md

@@ -1,4 +1,194 @@
+# EH-Downloader
+
+A FastAPI-based E-Hentai gallery download tool with asynchronous batch downloading of gallery images.
+
+## Features
+
+- 🚀 **Async downloads**: high-throughput downloading built on asyncio
+- 🌐 **Web UI**: a modern web user interface
+- 🔧 **Proxy support**: HTTP proxy configuration
+- 📁 **Smart file management**: directories are created automatically, with one folder per gallery
+- 🔄 **Resumable downloads**: interrupted downloads can be continued
+- 📊 **Progress monitoring**: real-time display of download progress and status
+- 🧹 **Automatic cleanup**: one-click removal of temporary files and logs
+
+## Quick Start
+
+### Requirements
+
+- Python 3.11+
+- A network proxy (optional, for reaching E-Hentai)
+
+### Install dependencies
+
+```bash
 pip install -r requirements.txt
+```
+
+### Run the application
+
+```bash
 python main.py
+```
+
+Open `http://localhost:8000` to use the web interface.
+
+### Docker deployment
+
+```bash
+# Build the image
+docker build -t eh-downloader .
+
+# Run the container
+docker-compose up -d
+```
+
+## Usage
+
+### 1. Configure proxies
+
+Add proxy entries to `proxy.txt` in the project root, one per line:
+
+```
+127.0.0.1:7890
+192.168.1.100:8080
+```
+
+### 2. Add target URLs
+
+Add the gallery URLs to download to `data/targets.txt`, one per line:
+
+```
+https://e-hentai.org/g/1234567/abcdef123456
+https://e-hentai.org/g/2345678/bcdefg234567
+```
+
+### 3. Start downloading
+
+1. Open the web interface
+2. Choose a proxy
+3. Click "Load target URLs" to load the URL list
+4. Click "Download URLs" to crawl the gallery links
+5. Click "Download images" to start downloading
+
+## Project Structure
+
+```
+ehentai-fastapi/
+├── main.py              # Main application
+├── config.py            # Configuration management
+├── logger.py            # Logging management
+├── utils.py             # Utility functions
+├── step1.py             # Gallery-link crawling
+├── step2.py             # Image downloading
+├── downloader.py        # Downloader classes
+├── templates/           # HTML templates
+├── static/              # Static assets
+├── data/                # Data directory
+│   ├── targets.txt      # Target URL list
+│   ├── downloads/       # Downloaded files
+│   └── *.log            # Log files
+├── proxy.txt            # Proxy configuration
+├── requirements.txt     # Dependency list
+├── Dockerfile           # Docker configuration
+└── docker-compose.yaml  # Docker Compose configuration
+```
+
+## Configuration
+
+### Application settings (config.py)
+
+- `concurrency`: number of concurrent requests, default 20
+- `max_page`: maximum pages per gallery, default 100
+- `retry_per_page`: retries per page, default 5
+- `retry_per_image`: retries per image, default 3
+- `timeout`: request timeout, default 10 seconds
+- `image_timeout`: image download timeout, default 15 seconds
+
+### Logging
+
+- Log level: INFO
+- Log files: `data/app.log`, `data/crawl.log`, `data/download.log`
+- Log format: `[time] [level] message`
+
+## API Endpoints
+
+### GET /
+Main page
+
+### POST /load_urls
+Load the target URL list
+
+### POST /download_urls
+Start crawling gallery links
+
+### POST /download_images
+Start downloading images
+
+### POST /check_incomplete
+Check for incomplete downloads
+
+### POST /clean_files
+Clean up temporary files
+
+### POST /clear
+Clear the output
+
+## Notes
+
+1. **Network**: a stable connection and a suitable proxy are required
+2. **Storage**: make sure there is enough disk space for the downloaded images
+3. **Compliance**: observe applicable laws and the site's terms of service
+4. **Proxies**: a reliable proxy service is recommended for a good download success rate
+
+## Troubleshooting
+
+### Common issues
+
+1. **Download failures**: check the proxy configuration and network connection
+2. **Corrupted files**: re-download, or check the available storage
+3. **Permission errors**: make sure the application has read/write permissions
+4. **Out of memory**: lower the concurrency or add system memory
+
+### Viewing logs
+
+```bash
+# Application log
+tail -f data/app.log
+
+# Crawl log
+tail -f data/crawl.log
+
+# Download log
+tail -f data/download.log
+```
+
+## Development
+
+### Code layout
+
+- **main.py**: FastAPI application entry point
+- **config.py**: configuration management module
+- **logger.py**: logging management module
+- **utils.py**: utility functions
+- **step1.py**: gallery-link crawling logic
+- **step2.py**: image download logic
+
+### Possible extensions
+
+1. Add new download sources
+2. Support more image formats
+3. Implement a download queue
+4. Add user authentication
+
+## License
+
+This project is for learning and research purposes only; please observe applicable laws and regulations.
+
+## Changelog
 
-open: ip:8000
+### v1.0.0
+- Initial release
+- Basic gallery download support
+- Web UI and API endpoints
+- Docker support
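The `data/targets.txt` convention above (one URL per line, lines starting with `#` treated as comments) can be exercised on its own. This is a sketch of the parsing rule only, not the project's actual loader:

```python
def parse_targets(text: str) -> list[str]:
    """Return the non-empty, non-comment lines of a targets.txt payload."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # blank lines and '#' comment lines are skipped
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


sample = """\
# example targets
https://e-hentai.org/g/1234567/abcdef123456

https://e-hentai.org/g/2345678/bcdefg234567
"""
print(parse_targets(sample))  # both gallery URLs; comment and blank line skipped
```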

+ 92 - 0
config.py

@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Configuration management module.
+"""
+from pathlib import Path
+from typing import List
+from pydantic import BaseModel
+
+
+class AppConfig(BaseModel):
+    """Application configuration."""
+    # Basic settings
+    app_name: str = "EH-Downloader"
+    app_version: str = "1.0.0"
+    debug: bool = False
+    
+    # Server settings
+    host: str = "0.0.0.0"
+    port: int = 8000
+    
+    # Data directories
+    data_dir: str = "data"
+    downloads_dir: str = "data/downloads"
+    targets_file: str = "data/targets.txt"
+    proxy_file: str = "proxy.txt"
+    
+    # Crawler settings
+    concurrency: int = 20
+    max_page: int = 100
+    retry_per_page: int = 5
+    retry_per_image: int = 3
+    timeout: float = 10.0
+    image_timeout: float = 15.0
+    
+    # Logging
+    log_level: str = "INFO"
+    log_format: str = "[%(asctime)s] [%(levelname)s] %(message)s"
+    
+    # Cleanup settings
+    cleanup_patterns: List[str] = ["**/*.log", "**/*.json"]
+    cleanup_exclude: List[str] = ["data/targets.txt"]
+    
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        # Make sure the directories exist
+        self._ensure_directories()
+    
+    def _ensure_directories(self):
+        """Create the required directories if they are missing."""
+        Path(self.data_dir).mkdir(exist_ok=True)
+        Path(self.downloads_dir).mkdir(parents=True, exist_ok=True)
+    
+    @property
+    def targets_path(self) -> Path:
+        """Path to the targets file."""
+        return Path(self.targets_file)
+    
+    @property
+    def proxy_path(self) -> Path:
+        """Path to the proxy file."""
+        return Path(self.proxy_file)
+    
+    def get_proxies(self) -> List[str]:
+        """Read the proxy list, falling back to a default on any failure."""
+        if not self.proxy_path.exists():
+            return ["127.0.0.1:7890"]
+        
+        try:
+            with open(self.proxy_path, 'r', encoding='utf-8') as f:
+                proxies = [line.strip() for line in f if line.strip()]
+            return proxies if proxies else ["127.0.0.1:7890"]
+        except Exception:
+            return ["127.0.0.1:7890"]
+    
+    def get_targets(self) -> List[str]:
+        """Read the target URL list, skipping blank and comment lines."""
+        if not self.targets_path.exists():
+            return []
+        
+        try:
+            with open(self.targets_path, 'r', encoding='utf-8') as f:
+                urls = [line.strip() for line in f if line.strip()]
+            # Drop comment lines
+            return [url for url in urls if not url.startswith('#')]
+        except Exception:
+            return []
+
+
+# Global configuration instance
+config = AppConfig()
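`get_proxies` above never raises: a missing, unreadable, or empty `proxy.txt` all collapse to the single default `127.0.0.1:7890`. The same read-with-fallback pattern, isolated from the pydantic model so it can be tried directly:

```python
from pathlib import Path

DEFAULT_PROXIES = ["127.0.0.1:7890"]


def read_proxies(path: Path) -> list[str]:
    """Read one proxy per line; fall back to the default on any failure."""
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
        proxies = [ln.strip() for ln in lines if ln.strip()]
        # an empty file also falls back to the default
        return proxies or list(DEFAULT_PROXIES)
    except OSError:
        return list(DEFAULT_PROXIES)


print(read_proxies(Path("missing-proxy.txt")))  # ['127.0.0.1:7890']
```

Keeping the fallback in one place means callers never have to special-case an empty proxy list.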

+ 88 - 0
logger.py

@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Logging management module.
+"""
+import logging
+import sys
+from pathlib import Path
+from typing import Optional
+
+from config import config
+
+
+class LoggerManager:
+    """Logger manager."""
+    
+    _loggers = {}
+    
+    @classmethod
+    def get_logger(cls, name: str, log_file: Optional[str] = None) -> logging.Logger:
+        """Get (or create) a named logger."""
+        if name in cls._loggers:
+            return cls._loggers[name]
+        
+        logger = logging.getLogger(name)
+        logger.setLevel(getattr(logging, config.log_level.upper()))
+        
+        # Avoid attaching handlers twice
+        if logger.handlers:
+            return logger
+        
+        # Console handler
+        console_handler = logging.StreamHandler(sys.stdout)
+        console_handler.setLevel(getattr(logging, config.log_level.upper()))
+        console_formatter = logging.Formatter(config.log_format)
+        console_handler.setFormatter(console_formatter)
+        logger.addHandler(console_handler)
+        
+        # File handler
+        if log_file:
+            log_path = Path(config.data_dir) / log_file
+            file_handler = logging.FileHandler(log_path, encoding='utf-8')
+            file_handler.setLevel(getattr(logging, config.log_level.upper()))
+            file_formatter = logging.Formatter(config.log_format)
+            file_handler.setFormatter(file_formatter)
+            logger.addHandler(file_handler)
+        
+        # WebSocket real-time log handler
+        logger.addHandler(WebSocketLogHandler())
+        
+        cls._loggers[name] = logger
+        return logger
+    
+    @classmethod
+    def setup_root_logger(cls):
+        """Configure the root logger."""
+        logging.basicConfig(
+            level=getattr(logging, config.log_level.upper()),
+            format=config.log_format,
+            handlers=[
+                logging.StreamHandler(sys.stdout),
+                logging.FileHandler(Path(config.data_dir) / "app.log", encoding='utf-8'),
+                WebSocketLogHandler(),
+            ]
+        )
+
+
+# Convenience function
+def get_logger(name: str, log_file: Optional[str] = None) -> logging.Logger:
+    """Convenience wrapper around LoggerManager.get_logger."""
+    return LoggerManager.get_logger(name, log_file)
+
+
+class WebSocketLogHandler(logging.Handler):
+    """Broadcast log records to WebSocket clients via the real-time logger."""
+    
+    def emit(self, record: logging.LogRecord) -> None:
+        try:
+            message = self.format(record)
+            level = record.levelname
+            source = record.name
+            # Use the sync interface, which schedules onto the event loop internally
+            # Import lazily to avoid a circular dependency
+            from realtime_logger import realtime_logger
+            realtime_logger.broadcast_log_sync(message, level, source)
+        except Exception:
+            # Logging must never break because a WebSocket send failed
+            pass
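`WebSocketLogHandler` swallows every exception in `emit` so that a broken delivery channel can never take logging down with it. A minimal, self-contained handler following the same discipline, buffering records instead of broadcasting them (no `realtime_logger` needed):

```python
import logging


class BufferingHandler(logging.Handler):
    """Collect formatted records; never let a delivery error propagate."""

    def __init__(self):
        super().__init__()
        self.records: list[tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.records.append((record.levelname, self.format(record)))
        except Exception:
            pass  # logging must not fail because delivery failed


log = logging.getLogger("demo")
log.setLevel(logging.INFO)
handler = BufferingHandler()
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
log.addHandler(handler)
log.info("hello")
print(handler.records)  # [('INFO', '[INFO] hello')]
```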

+ 184 - 82
main.py

@@ -1,49 +1,99 @@
-from fastapi import FastAPI, Request
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+EH-Downloader main application
+"""
+import glob
+import os
+from pathlib import Path
+from typing import List
+
+from fastapi import FastAPI, Request, HTTPException, WebSocket, WebSocketDisconnect
 from fastapi.staticfiles import StaticFiles
 from fastapi.templating import Jinja2Templates
 from fastapi.responses import JSONResponse, FileResponse
-import uvicorn
-import glob
-import os
 from pydantic import BaseModel
+import uvicorn
+import asyncio
+import threading
+import json
+
+from config import config
+from logger import get_logger
+from realtime_logger import realtime_logger
 import step2
-from utils import *
+from utils import run_step1, run_step2
+
+# Set up logging
+logger = get_logger("main", "app.log")
 
-app = FastAPI(title="EH-Downloader", version="1.0.0")
+app = FastAPI(
+    title=config.app_name,
+    version=config.app_version,
+    description="E-Hentai gallery downloader"
+)
 
-# On startup, check for and create the data folder, targets.txt, and proxy.txt
 @app.on_event("startup")
 async def startup_event():
-    # Check for and create the data folder
-    data_dir = "data"
-    if not os.path.exists(data_dir):
-        os.makedirs(data_dir)
-        print(f"Created directory: {data_dir}")
+    """Application startup event."""
+    logger.info(f"Starting {config.app_name} v{config.app_version}")
+    # Register the event loop with the real-time logger for cross-thread broadcasting
+    try:
+        realtime_logger.set_loop(asyncio.get_running_loop())
+    except RuntimeError:
+        # Ignore if no running loop is available
+        pass
     
-    # Check for and create targets.txt
-    targets_file = os.path.join(data_dir, "targets.txt")
-    if not os.path.exists(targets_file):
-        with open(targets_file, 'w', encoding='utf-8') as f:
+    # Make sure the directories exist
+    config._ensure_directories()
+    
+    # Create a default targets.txt
+    if not config.targets_path.exists():
+        with open(config.targets_path, 'w', encoding='utf-8') as f:
             f.write("# Add target URLs here, one per line\n")
             f.write("# Example:\n")
             f.write("https://e-hentai.org/g/3550066/47d6393550\n")
-        print(f"Created file: {targets_file}")
-    else:
-        print(f"File already exists: {targets_file}")
+        logger.info(f"Created file: {config.targets_path}")
     
-    # Check for and create proxy.txt
-    proxy_file = "proxy.txt"
-    if not os.path.exists(proxy_file):
-        with open(proxy_file, 'w', encoding='utf-8') as f:
+    # Create a default proxy.txt
+    if not config.proxy_path.exists():
+        with open(config.proxy_path, 'w', encoding='utf-8') as f:
             f.write("127.0.0.1:7890\n")
-        print(f"Created file: {proxy_file}")
-    else:
-        print(f"File already exists: {proxy_file}")
+        logger.info(f"Created file: {config.proxy_path}")
+    
+    logger.info("Application startup complete")
 
 # Mount static files and templates
 app.mount("/static", StaticFiles(directory="static"), name="static")
 templates = Jinja2Templates(directory="templates")
 
+# WebSocket route
+@app.websocket("/ws")
+async def websocket_endpoint(websocket: WebSocket):
+    """Handle a WebSocket connection."""
+    await websocket.accept()
+    realtime_logger.add_connection(websocket)
+    
+    try:
+        # Send the most recent logs
+        recent_logs = await realtime_logger.get_recent_logs(20)
+        for log_entry in recent_logs:
+            await websocket.send_text(json.dumps(log_entry, ensure_ascii=False))
+        
+        # Keep the connection open
+        while True:
+            try:
+                # Wait for client messages (heartbeat)
+                data = await websocket.receive_text()
+                if data == "ping":
+                    await websocket.send_text("pong")
+            except WebSocketDisconnect:
+                break
+    except Exception as e:
+        logger.error(f"WebSocket error: {e}")
+    finally:
+        realtime_logger.remove_connection(websocket)
+
 # favicon route
 @app.get("/favicon.ico", include_in_schema=False)
 async def favicon():
@@ -52,37 +102,22 @@ async def favicon():
 @app.get("/")
 async def home(request: Request):
     """Main page"""
-    # Read the proxy list from proxy.txt
-    proxies = []
     try:
-        with open("proxy.txt", 'r', encoding='utf-8') as f:
-            proxies = [line.strip() for line in f.readlines() if line.strip()]
+        proxies = config.get_proxies()
+        return templates.TemplateResponse("index.html", {
+            "request": request,
+            "proxies": proxies,
+            "default_proxy": proxies[0] if proxies else "127.0.0.1:7890"
+        })
     except Exception as e:
-        print(f"Failed to read proxy.txt: {e}")
-        proxies = ["127.0.0.1:7890"]
-    
-    # Fall back to the default when no proxies are configured
-    if not proxies:
-        proxies = ["127.0.0.1:7890"]
-    
-    return templates.TemplateResponse("index.html", {
-        "request": request,
-        "proxies": proxies,
-        "default_proxy": proxies[0] if proxies else "127.0.0.1:7890"
-    })
+        logger.error(f"Failed to render the home page: {e}")
+        raise HTTPException(status_code=500, detail="Internal server error")
 
 @app.post("/load_urls")
 async def load_urls():
     """Load the URLs from targets.txt."""
     try:
-        file_path = "data/targets.txt"
-        
-        # Read the file contents
-        with open(file_path, 'r', encoding='utf-8') as f:
-            urls = [line.strip() for line in f.readlines() if line.strip()]
-        
-        # Drop blank lines and comment lines (starting with #)
-        urls = [url for url in urls if url and not url.startswith('#')]
+        urls = config.get_targets()
         
         if not urls:
             return JSONResponse({
@@ -91,6 +126,7 @@ async def load_urls():
                 "urls": []
             })
         
+        logger.info(f"Loaded {len(urls)} URLs")
         return JSONResponse({
             "success": True,
             "message": f"Loaded {len(urls)} URLs",
@@ -98,6 +134,7 @@ async def load_urls():
         })
         
     except Exception as e:
+        logger.error(f"Failed to load URLs: {e}")
         return JSONResponse({
             "success": False,
             "message": f"Error reading file: {str(e)}",
@@ -118,25 +155,77 @@ class ProxyRequest(BaseModel):
 
 @app.post("/download_urls")
 async def download_urls(req: ProxyRequest):
-    # Parse the proxy string into ip and port
-    if ":" in req.proxy:
-        ip, port = req.proxy.split(":", 1)
-        proxy = f"http://{ip}:{port}"
-    else:
-        proxy = None
-    msg = await run_step1(proxy)
-    return JSONResponse({"success": True, "message": msg})
+    """Crawl gallery links."""
+    try:
+        # Parse the proxy string into ip and port
+        if ":" in req.proxy:
+            ip, port = req.proxy.split(":", 1)
+            proxy = f"http://{ip}:{port}"
+        else:
+            proxy = None
+        
+        # Emit a real-time log entry
+        await realtime_logger.broadcast_log(f"Starting gallery-link crawl, proxy: {proxy}", "INFO", "step1")
+        
+        # Run in a background thread to avoid blocking
+        def run_step1_sync():
+            import asyncio
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            try:
+                return loop.run_until_complete(run_step1(proxy))
+            finally:
+                loop.close()
+        
+        # Execute via a thread pool
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            future = executor.submit(run_step1_sync)
+            msg = future.result()
+        
+        await realtime_logger.broadcast_log(f"Gallery-link crawl finished: {msg}", "SUCCESS", "step1")
+        return JSONResponse({"success": True, "message": msg})
+    except Exception as e:
+        await realtime_logger.broadcast_log(f"Gallery-link crawl failed: {e}", "ERROR", "step1")
+        logger.error(f"Gallery-link crawl failed: {e}")
+        return JSONResponse({"success": False, "message": f"Crawl failed: {str(e)}"})
 
 @app.post("/download_images")
 async def download_images(req: ProxyRequest):
-    # Parse the proxy string into ip and port
-    if ":" in req.proxy:
-        ip, port = req.proxy.split(":", 1)
-        proxy = f"http://{ip}:{port}"
-    else:
-        proxy = None
-    msg = await run_step2(proxy)
-    return JSONResponse({"success": True, "message": msg})
+    """Download images."""
+    try:
+        # Parse the proxy string into ip and port
+        if ":" in req.proxy:
+            ip, port = req.proxy.split(":", 1)
+            proxy = f"http://{ip}:{port}"
+        else:
+            proxy = None
+        
+        # Emit a real-time log entry
+        await realtime_logger.broadcast_log(f"Starting image download, proxy: {proxy}", "INFO", "step2")
+        
+        # Run in a background thread to avoid blocking
+        def run_step2_sync():
+            import asyncio
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+            try:
+                return loop.run_until_complete(run_step2(proxy))
+            finally:
+                loop.close()
+        
+        # Execute via a thread pool
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor() as executor:
+            future = executor.submit(run_step2_sync)
+            msg = future.result()
+        
+        await realtime_logger.broadcast_log(f"Image download finished: {msg}", "SUCCESS", "step2")
+        return JSONResponse({"success": True, "message": msg})
+    except Exception as e:
+        await realtime_logger.broadcast_log(f"Image download failed: {e}", "ERROR", "step2")
+        logger.error(f"Image download failed: {e}")
+        return JSONResponse({"success": False, "message": f"Download failed: {str(e)}"})
 
 @app.post("/clean_files")
 async def clean_files():
@@ -145,24 +234,23 @@ async def clean_files():
         deleted_files = []
         error_files = []
         
-        # Find .log and .json files in the current directory and all subdirectories
-        patterns = ["**/*.log", "**/*.json"]
-        
-        for pattern in patterns:
+        # Use the cleanup patterns from the config
+        for pattern in config.cleanup_patterns:
             for file_path in glob.glob(pattern, recursive=True):
                 try:
-                    # Skip data/targets.txt, since it is a configuration file
-                    if file_path == "data/targets.txt":
+                    # Skip excluded files
+                    if file_path in config.cleanup_exclude:
                         continue
                     
                     os.remove(file_path)
                     deleted_files.append(file_path)
-                    print(f"Deleted file: {file_path}")
+                    logger.info(f"Deleted file: {file_path}")
                 except Exception as e:
                     error_files.append(f"{file_path}: {str(e)}")
-                    print(f"Failed to delete {file_path}: {str(e)}")
+                    logger.error(f"Failed to delete {file_path}: {str(e)}")
         
         if error_files:
+            logger.warning(f"Cleanup finished, but {len(error_files)} files could not be deleted")
             return JSONResponse({
                 "success": False,
                 "message": "Cleanup finished, but some files could not be deleted",
@@ -172,6 +260,7 @@ async def clean_files():
                 "error_files": error_files
             })
         else:
+            logger.info(f"Cleaned up {len(deleted_files)} files")
             return JSONResponse({
                 "success": True,
                 "message": f"Cleaned up {len(deleted_files)} files",
@@ -181,6 +270,7 @@ async def clean_files():
             })
             
     except Exception as e:
+        logger.error(f"Error during cleanup: {e}")
         return JSONResponse({
             "success": False,
             "message": f"Error during cleanup: {str(e)}",
@@ -190,14 +280,26 @@ async def clean_files():
 
 @app.post("/check_incomplete")
 async def check_incomplete():
-    result = await step2.scan_tasks()
-
     """Check for incomplete files."""
-    return JSONResponse({
-        "success": True,
-        "message": "The incomplete-file check is ready",
-        "data": f"{len(result)} files not yet downloaded"
-    })
+    try:
+        result = await step2.scan_tasks()
+        logger.info(f"Incomplete files: {len(result)}")
+        return JSONResponse({
+            "success": True,
+            "message": "The incomplete-file check is ready",
+            "data": f"{len(result)} files not yet downloaded"
+        })
+    except Exception as e:
+        logger.error(f"Incomplete-file check failed: {e}")
+        return JSONResponse({
+            "success": False,
+            "message": f"Check failed: {str(e)}"
+        })
 
 if __name__ == "__main__":
-    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
+    uvicorn.run(
+        "main:app", 
+        host=config.host, 
+        port=config.port, 
+        reload=config.debug
+    )
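Both POST handlers above offload `run_step1`/`run_step2` by spinning up a fresh event loop inside a worker thread. A self-contained sketch of that pattern, with `slow_job` standing in for the real coroutines:

```python
import asyncio
import concurrent.futures


async def slow_job(tag: str) -> str:
    # placeholder for a long-running coroutine such as run_step1
    await asyncio.sleep(0.01)
    return f"{tag} done"


def run_in_fresh_loop(coro) -> str:
    """Run a coroutine to completion on a brand-new event loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


with concurrent.futures.ThreadPoolExecutor() as pool:
    result = pool.submit(run_in_fresh_loop, slow_job("crawl")).result()
print(result)  # crawl done
```

One caveat about the endpoints as written: `future.result()` is a blocking call, so inside an async handler it stalls the server's own event loop until the job finishes. Awaiting `loop.run_in_executor(...)` instead, or simply awaiting the coroutine directly, would keep the server responsive while the job runs.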

+ 81 - 0
performance.py

@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Performance utilities module.
+"""
+import asyncio
+import time
+from typing import Dict, Any
+from functools import wraps
+
+from logger import get_logger
+
+logger = get_logger("performance")
+
+
+def monitor_performance(func):
+    """Performance-monitoring decorator."""
+    @wraps(func)
+    async def async_wrapper(*args, **kwargs):
+        start_time = time.time()
+        try:
+            result = await func(*args, **kwargs)
+            execution_time = time.time() - start_time
+            logger.info(f"{func.__name__} finished in {execution_time:.2f}s")
+            return result
+        except Exception as e:
+            execution_time = time.time() - start_time
+            logger.error(f"{func.__name__} failed after {execution_time:.2f}s: {e}")
+            raise
+    
+    @wraps(func)
+    def sync_wrapper(*args, **kwargs):
+        start_time = time.time()
+        try:
+            result = func(*args, **kwargs)
+            execution_time = time.time() - start_time
+            logger.info(f"{func.__name__} finished in {execution_time:.2f}s")
+            return result
+        except Exception as e:
+            execution_time = time.time() - start_time
+            logger.error(f"{func.__name__} failed after {execution_time:.2f}s: {e}")
+            raise
+    
+    if asyncio.iscoroutinefunction(func):
+        return async_wrapper
+    else:
+        return sync_wrapper
+
+
+class PerformanceMonitor:
+    """Performance monitor."""
+    
+    def __init__(self):
+        self.metrics: Dict[str, Any] = {}
+        self.start_time = time.time()
+    
+    def start_timer(self, name: str):
+        """Start a named timer."""
+        self.metrics[name] = {"start": time.time()}
+    
+    def end_timer(self, name: str):
+        """Stop a named timer and log its duration."""
+        if name in self.metrics:
+            self.metrics[name]["end"] = time.time()
+            self.metrics[name]["duration"] = (
+                self.metrics[name]["end"] - self.metrics[name]["start"]
+            )
+            logger.info(f"{name} took {self.metrics[name]['duration']:.2f}s")
+    
+    def get_summary(self) -> Dict[str, Any]:
+        """Return a summary of all collected metrics."""
+        total_time = time.time() - self.start_time
+        return {
+            "total_time": total_time,
+            "metrics": self.metrics,
+            "summary": f"Total runtime: {total_time:.2f}s"
+        }
+
+
+# Global performance monitor
+perf_monitor = PerformanceMonitor()
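`monitor_performance` dispatches on `asyncio.iscoroutinefunction`, so one decorator serves both sync and async callables. A trimmed-down version of the same dispatch, runnable on its own (printing instead of logging):

```python
import asyncio
import time
from functools import wraps


def timed(func):
    """Report wall-clock time for sync and async functions alike."""
    @wraps(func)
    async def async_wrapper(*args, **kwargs):
        start = time.time()
        result = await func(*args, **kwargs)
        print(f"{func.__name__}: {time.time() - start:.3f}s")
        return result

    @wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.time() - start:.3f}s")
        return result

    # pick the wrapper that matches the decorated callable
    return async_wrapper if asyncio.iscoroutinefunction(func) else sync_wrapper


@timed
def add(a, b):
    return a + b


@timed
async def mul(a, b):
    return a * b


print(add(2, 3))               # 5
print(asyncio.run(mul(2, 3)))  # 6
```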

+ 117 - 0
realtime_logger.py

@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Real-time log streaming module.
+"""
+import asyncio
+import json
+import logging
+import threading
+import time
+from typing import List, Dict, Any, Optional
+
+logger = logging.getLogger("realtime_logger")
+
+
+class RealtimeLogger:
+    """Real-time logger that broadcasts entries to WebSocket clients."""
+    
+    def __init__(self):
+        self.connections: List[Any] = []
+        self.log_buffer: List[Dict[str, Any]] = []
+        self.max_buffer_size = 1000
+        self._lock = threading.Lock()
+        self._loop: Optional[asyncio.AbstractEventLoop] = None
+    
+    def set_loop(self, loop: asyncio.AbstractEventLoop) -> None:
+        """Register the main event loop so sends can be scheduled safely across threads."""
+        self._loop = loop
+    
+    def add_connection(self, websocket):
+        """Add a WebSocket connection."""
+        with self._lock:
+            self.connections.append(websocket)
+        logger.info(f"WebSocket connection added; active connections: {len(self.connections)}")
+    
+    def remove_connection(self, websocket):
+        """Remove a WebSocket connection."""
+        with self._lock:
+            if websocket in self.connections:
+                self.connections.remove(websocket)
+        logger.info(f"WebSocket connection removed; active connections: {len(self.connections)}")
+    
+    async def broadcast_log(self, message: str, level: str = "INFO", source: str = "system"):
+        """Broadcast a log message to all connected clients."""
+        log_entry = {
+            "timestamp": time.time(),
+            "time": time.strftime("%H:%M:%S"),
+            "level": level,
+            "source": source,
+            "message": message
+        }
+        
+        # Append to the buffer
+        with self._lock:
+            self.log_buffer.append(log_entry)
+            if len(self.log_buffer) > self.max_buffer_size:
+                self.log_buffer = self.log_buffer[-self.max_buffer_size:]
+        
+        # Broadcast to all connections
+        if self.connections:
+            message_data = json.dumps(log_entry, ensure_ascii=False)
+            disconnected = []
+            
+            for websocket in self.connections.copy():  # iterate a copy to avoid concurrent modification
+                try:
+                    await websocket.send_text(message_data)
+                except Exception as e:
+                    logger.warning(f"Failed to send message: {e}")
+                    disconnected.append(websocket)
+            
+            # Drop broken connections
+            for ws in disconnected:
+                self.remove_connection(ws)
+    
+    def broadcast_log_sync(self, message: str, level: str = "INFO", source: str = "system"):
+        """Synchronously broadcast a log message (for non-async contexts)."""
+        # If an event loop is registered, schedule the async broadcast
+        # thread-safely. broadcast_log buffers the entry itself, so do not
+        # buffer it here as well (that would duplicate it).
+        if self._loop is not None:
+            try:
+                asyncio.run_coroutine_threadsafe(
+                    self.broadcast_log(message=message, level=level, source=source),
+                    self._loop,
+                )
+                return
+            except Exception:
+                # Fall through and at least buffer the entry locally
+                pass
+        
+        log_entry = {
+            "timestamp": time.time(),
+            "time": time.strftime("%H:%M:%S"),
+            "level": level,
+            "source": source,
+            "message": message
+        }
+        with self._lock:
+            self.log_buffer.append(log_entry)
+            if len(self.log_buffer) > self.max_buffer_size:
+                self.log_buffer = self.log_buffer[-self.max_buffer_size:]
+    
+    async def get_recent_logs(self, count: int = 50) -> List[Dict[str, Any]]:
+        """Return the most recent log entries."""
+        with self._lock:
+            return self.log_buffer[-count:] if self.log_buffer else []
+    
+    def clear_buffer(self):
+        """Clear the log buffer."""
+        with self._lock:
+            self.log_buffer.clear()
+        logger.info("Log buffer cleared")
+
+
+# Global real-time logger
+realtime_logger = RealtimeLogger()
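`broadcast_log_sync` bridges threads with `asyncio.run_coroutine_threadsafe`, which is the safe way to hand a coroutine from a worker thread to a loop running elsewhere. A self-contained demonstration of that handoff (`deliver` stands in for the WebSocket sends):

```python
import asyncio
import threading

results: list[str] = []


async def deliver(message: str) -> None:
    # stands in for broadcast_log's websocket sends
    results.append(message)


async def main() -> None:
    loop = asyncio.get_running_loop()

    def worker() -> None:
        # schedule the coroutine onto the main loop from another thread
        future = asyncio.run_coroutine_threadsafe(deliver("from worker"), loop)
        future.result(timeout=5)  # block this thread until the loop has run it

    t = threading.Thread(target=worker)
    t.start()
    # yield control so the loop can execute the scheduled coroutine
    await asyncio.sleep(0.1)
    t.join()


asyncio.run(main())
print(results)  # ['from worker']
```

Calling the coroutine directly from the worker thread would not work; it has to be scheduled onto the loop that owns the connections.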

+ 25 - 22
requirements.txt

@@ -1,29 +1,32 @@
-aiofile==3.9.0
-aiofiles==24.1.0
-aiopath
-annotated-types==0.7.0
-anyio
-beautifulsoup4==4.14.0
-caio==0.9.24
-certifi==2025.8.3
-click==8.3.0
+# Web framework
 fastapi==0.104.1
-h11==0.16.0
-httpcore==1.0.9
+uvicorn[standard]==0.24.0
+starlette==0.27.0
+websockets==12.0
+
+# HTTP client
 httpx==0.25.2
-idna==3.10
-Jinja2==3.1.6
+httpcore==1.0.9
+
+# Async file I/O
+aiofiles==24.1.0
+
+# HTML parsing
+beautifulsoup4==4.14.0
 lxml==6.0.2
+soupsieve==2.8
+
+# Templating
+Jinja2==3.1.6
 MarkupSafe==3.0.3
+
+# Data validation
 pydantic==2.11.9
 pydantic_core==2.33.2
-python-multipart==0.0.6
-setuptools==78.1.1
-sniffio==1.3.1
-soupsieve==2.8
-starlette==0.27.0
+
+# Progress bars
 tqdm==4.67.1
-typing-inspection==0.4.1
-typing_extensions==4.15.0
-uvicorn==0.24.0
-wheel==0.45.1
+
+# Other dependencies
+python-multipart==0.0.6
+certifi==2025.8.3

+ 40 - 0
start.py

@@ -0,0 +1,40 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Application launcher.
+"""
+import sys
+from pathlib import Path
+
+# Add the project root to the Python path
+project_root = Path(__file__).parent
+sys.path.insert(0, str(project_root))
+
+from config import config
+from logger import LoggerManager
+import uvicorn
+
+def main():
+    """Entry point."""
+    # Configure the root logger
+    LoggerManager.setup_root_logger()
+    
+    # Make sure the data directories exist
+    config._ensure_directories()
+    
+    print(f"Starting {config.app_name} v{config.app_version}")
+    print(f"Server address: http://{config.host}:{config.port}")
+    print(f"Debug mode: {'on' if config.debug else 'off'}")
+    
+    # Start the server
+    uvicorn.run(
+        "main:app",
+        host=config.host,
+        port=config.port,
+        reload=config.debug,
+        log_level=config.log_level.lower()
+    )
+
+if __name__ == "__main__":
+    main()

+ 129 - 24
static/script.js

@@ -11,7 +11,11 @@ class DownloadTool {
         this.clearOutputBtn = document.getElementById('clearOutput');
         this.proxySelect = document.getElementById('proxy');
         
+        this.websocket = null;
+        this.isConnected = false;
+        
         this.initEvents();
+        this.connectWebSocket();
     }
     
     initEvents() {
@@ -46,9 +50,77 @@ class DownloadTool {
         });
     }
     
+    connectWebSocket() {
+        try {
+            const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
+            const wsUrl = `${protocol}//${window.location.host}/ws`;
+            this.websocket = new WebSocket(wsUrl);
+            
+            this.websocket.onopen = () => {
+                this.isConnected = true;
+                this.showOutput('WebSocket connected; real-time logs are available', 'success');
+                console.log('WebSocket connection established');
+            };
+            
+            this.websocket.onmessage = (event) => {
+                try {
+                    const logEntry = JSON.parse(event.data);
+                    this.appendRealtimeLog(logEntry);
+                } catch (e) {
+                    console.error('Failed to parse WebSocket message:', e);
+                }
+            };
+            
+            this.websocket.onclose = () => {
+                this.isConnected = false;
+                this.showOutput('WebSocket disconnected; attempting to reconnect...', 'error');
+                console.log('WebSocket connection closed');
+                // retry after 5 seconds
+                setTimeout(() => this.connectWebSocket(), 5000);
+            };
+            
+            this.websocket.onerror = (error) => {
+                console.error('WebSocket error:', error);
+                this.showOutput('WebSocket connection error', 'error');
+            };
+        } catch (error) {
+            console.error('Failed to create WebSocket connection:', error);
+            this.showOutput('WebSocket connection failed', 'error');
+        }
+    }
+    
+    appendRealtimeLog(logEntry) {
+        const timestamp = logEntry.time || new Date().toLocaleTimeString();
+        const level = logEntry.level || 'INFO';
+        const source = logEntry.source || 'system';
+        const message = logEntry.message || '';
+        
+        const logLine = `[${timestamp}] [${level}] [${source}] ${message}`;
+        
+        // Append to the output box
+        if (this.output.textContent) {
+            this.output.textContent += '\n' + logLine;
+        } else {
+            this.output.textContent = logLine;
+        }
+        
+        // Auto-scroll to the bottom
+        this.output.scrollTop = this.output.scrollHeight;
+        
+        // Style according to the log level
+        if (level === 'ERROR') {
+            this.output.classList.add('error');
+        } else if (level === 'SUCCESS') {
+            this.output.classList.add('success');
+        } else {
+            this.output.classList.remove('error', 'success');
+        }
+    }
+    
     async loadTargetUrls() {
         try {
-            this.showOutput('Reading targets.txt...', '');
+            this.setLoading(true);
+            this.showOutput('Reading targets.txt...', 'info');
             
             const response = await fetch('/load_urls', {
                 method: 'POST'
@@ -59,12 +131,14 @@ class DownloadTool {
             if (result.success) {
                 // Show the loaded URLs in the URL-list textarea
                 this.urlListTextarea.value = result.urls.join('\n');
-                this.showOutput(`Loaded ${result.urls.length} URLs`, 'success');
+                this.showOutput(`Loaded ${result.urls.length} URLs\n\nURL list:\n${result.urls.join('\n')}`, 'success');
             } else {
                 this.showOutput(`Load failed: ${result.message}`, 'error');
             }
         } catch (error) {
             this.showOutput(`Error loading URLs: ${error.message}`, 'error');
+        } finally {
+            this.setLoading(false);
         }
     }
     
@@ -85,33 +159,60 @@ class DownloadTool {
     }
     
     async downloadUrls() {
-        const proxy = this.proxySelect.value;
-    
-        this.showOutput('正在抓取画廊链接...', 'info');
-        const res = await fetch('/download_urls', {
-            method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
-            body: JSON.stringify({ proxy })
-        });
-        const data = await res.json();
-        this.showOutput(data.message, data.success ? 'success' : 'error');
+        try {
+            const proxy = this.proxySelect.value;
+            
+            this.showOutput(`正在抓取画廊链接...\n代理: ${proxy}\n\n注意:此操作可能需要较长时间,请耐心等待...`, 'info');
+            
+            // 用 setTimeout 延后请求,让上面的提示信息先渲染(fetch 本身不会阻塞 UI)
+            setTimeout(async () => {
+                try {
+                    const res = await fetch('/download_urls', {
+                        method: 'POST',
+                        headers: { 'Content-Type': 'application/json' },
+                        body: JSON.stringify({ proxy })
+                    });
+                    const data = await res.json();
+                    this.showOutput(data.message, data.success ? 'success' : 'error');
+                } catch (error) {
+                    this.showOutput(`抓取画廊链接时出错: ${error.message}`, 'error');
+                }
+            }, 100);
+            
+        } catch (error) {
+            this.showOutput(`抓取画廊链接时出错: ${error.message}`, 'error');
+        }
     }
 
     async downloadImages() {
-        const proxy = this.proxySelect.value;
-    
-        this.showOutput('正在下载图片...', 'info');
-        const res = await fetch('/download_images', {
-            method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
-            body: JSON.stringify({ proxy })
-        });
-        const data = await res.json();
-        this.showOutput(data.message, data.success ? 'success' : 'error');
+        try {
+            const proxy = this.proxySelect.value;
+            
+            this.showOutput(`正在下载图片...\n代理: ${proxy}\n\n注意:此操作可能需要较长时间,请耐心等待...`, 'info');
+            
+            // 用 setTimeout 延后请求,让上面的提示信息先渲染(fetch 本身不会阻塞 UI)
+            setTimeout(async () => {
+                try {
+                    const res = await fetch('/download_images', {
+                        method: 'POST',
+                        headers: { 'Content-Type': 'application/json' },
+                        body: JSON.stringify({ proxy })
+                    });
+                    const data = await res.json();
+                    this.showOutput(data.message, data.success ? 'success' : 'error');
+                } catch (error) {
+                    this.showOutput(`下载图片时出错: ${error.message}`, 'error');
+                }
+            }, 100);
+            
+        } catch (error) {
+            this.showOutput(`下载图片时出错: ${error.message}`, 'error');
+        }
     }
 
     async checkIncomplete() {
         try {
+            this.setLoading(true);
             this.showOutput('正在检查未完成文件...', 'info');
             
             const response = await fetch('/check_incomplete', {
@@ -121,20 +222,22 @@ class DownloadTool {
             const result = await response.json();
             
             if (result.success) {
-                // 这里先显示后端返回的测试数据,等您完成后端逻辑后会返回实际数据
                 let message = `检查完成!\n\n`;
-                message += `返回数据: ${JSON.stringify(result.data, null, 2)}`;
+                message += `${result.data}`;
                 this.showOutput(message, 'success');
             } else {
                 this.showOutput(`检查失败: ${result.message}`, 'error');
             }
         } catch (error) {
             this.showOutput(`检查未完成文件时出错: ${error.message}`, 'error');
+        } finally {
+            this.setLoading(false);
         }
     }
 
     async cleanFiles() {
         try {
+            this.setLoading(true);
             this.showOutput('正在清理日志和JSON文件...', 'info');
             
             const response = await fetch('/clean_files', {
@@ -161,6 +264,8 @@ class DownloadTool {
             }
         } catch (error) {
             this.showOutput(`清理文件时出错: ${error.message}`, 'error');
+        } finally {
+            this.setLoading(false);
         }
     }
 

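The WebSocket handler in the script.js hunk above parses each message as a JSON log entry with `time`, `level`, `source`, and `message` fields and renders it as `[time] [level] [source] message` in `appendRealtimeLog`. A minimal Python sketch of the matching server-side serialization (field names taken from the client code; the helper names themselves are hypothetical):

```python
import json
from datetime import datetime

def make_log_entry(message: str, level: str = "INFO", source: str = "system") -> str:
    """Serialize one log entry in the shape static/script.js expects."""
    entry = {
        "time": datetime.now().strftime("%H:%M:%S"),
        "level": level,
        "source": source,
        "message": message,
    }
    return json.dumps(entry, ensure_ascii=False)

def format_log_line(raw: str) -> str:
    """Mirror of appendRealtimeLog: how the client renders one entry."""
    e = json.loads(raw)
    return (f"[{e.get('time', '')}] [{e.get('level', 'INFO')}] "
            f"[{e.get('source', 'system')}] {e.get('message', '')}")
```

Keeping the same defaults (`INFO` / `system`) on both ends means a partially filled entry still renders without `undefined` leaking into the log line.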
+ 28 - 20
step1.py

@@ -18,30 +18,28 @@ from typing import Dict, List, Optional
 import httpx
 from bs4 import BeautifulSoup
 from tqdm.asyncio import tqdm_asyncio
-from aiopath import AsyncPath
+from pathlib import Path
 
 # -------------------- 可配置常量 --------------------
-CONCURRENCY = 20                 # 并发页数
-MAX_PAGE = 100                   # 单专辑最大翻页
-RETRY_PER_PAGE = 5               # 单页重试
-TIMEOUT = httpx.Timeout(10.0)    # 请求超时
+from config import config
+
+CONCURRENCY = config.concurrency
+MAX_PAGE = config.max_page
+RETRY_PER_PAGE = config.retry_per_page
+TIMEOUT = httpx.Timeout(config.timeout)
 IMG_SELECTOR = "#gdt"            # 图片入口区域
 FAILED_RECORD = "data/failed_keys.json"
-LOG_LEVEL = logging.INFO
+LOG_LEVEL = getattr(logging, config.log_level.upper())
 # ----------------------------------------------------
 
+# 确保数据目录存在
 if not os.path.exists("data"):
     os.mkdir("data")
 
-logging.basicConfig(
-    level=LOG_LEVEL,
-    format="[%(asctime)s] [%(levelname)s] %(message)s",
-    handlers=[
-        logging.StreamHandler(sys.stdout),
-        logging.FileHandler("data/crawl.log", encoding="utf-8"),
-    ],
-)
-log = logging.getLogger("data/eh_crawler")
+# 使用统一的日志配置
+from logger import get_logger
+from realtime_logger import realtime_logger
+log = get_logger("step1", "crawl.log")
 
 # 预编译正则
 ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]')
@@ -106,7 +104,7 @@ async def crawl_single_gallery(
         key = base_url.split("/")[-1]  # 用最后一截当 key
         json_name = f"{key}.json"
 
-        folder_path: Optional[AsyncPath] = None
+        folder_path: Optional[Path] = None
         json_data: Dict[str, str] = {}
         img_count = 1
         last_page = False
@@ -122,12 +120,12 @@ async def crawl_single_gallery(
             soup = BeautifulSoup(html, "lxml")
             title = soup.title.string if soup.title else "gallery"
             clean_title = clean_folder_name(title)
-            folder_path = AsyncPath("data/downloads") / clean_title
-            await folder_path.mkdir(parents=True, exist_ok=True)
+            folder_path = Path("data/downloads") / clean_title
+            folder_path.mkdir(parents=True, exist_ok=True)
 
             # 如果 json 已存在则跳过整个画廊
             json_path = folder_path / json_name
-            if await json_path.exists():
+            if json_path.exists():
                 log.info(f"{json_name} 已存在,跳过")
                 return True
 
@@ -152,13 +150,23 @@ async def crawl_single_gallery(
                 img_count += 1
 
         if json_data:
-            await json_path.write_text(
+            json_path.write_text(
                 json.dumps(json_data, ensure_ascii=False, indent=2), encoding="utf-8"
             )
             log.info(f"保存成功 -> {json_path}  ({len(json_data)} 张)")
+            # 发送实时日志
+            try:
+                realtime_logger.broadcast_log_sync(f"画廊 {key} 抓取完成,共 {len(json_data)} 张图片", "SUCCESS", "step1")
+            except Exception as e:
+                log.warning(f"发送实时日志失败: {e}")
             return True
         else:
             log.warning(f"{key} 未解析到任何图片链接")
+            # 发送实时日志
+            try:
+                realtime_logger.broadcast_log_sync(f"画廊 {key} 未解析到任何图片链接", "WARNING", "step1")
+            except Exception as e:
+                log.warning(f"发送实时日志失败: {e}")
             return False
 
 
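step1.py derives the download folder from the page title via `clean_folder_name`, using the `ILLEGAL_CHARS` pattern compiled in the hunk above. A plausible stdlib-only sketch of that helper (the exact trimming rules and length cap are assumptions, not taken from the source):

```python
import re

# Same pattern step1.py compiles for Windows/Unix-unsafe filename characters
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]')

def clean_folder_name(title: str, max_len: int = 120) -> str:
    """Replace illegal filename characters, trim whitespace and trailing dots."""
    name = ILLEGAL_CHARS.sub("_", title).strip().rstrip(".")
    return name[:max_len] or "gallery"
```

The `or "gallery"` fallback matches the diff's default title (`title = soup.title.string if soup.title else "gallery"`), so an all-illegal title still yields a usable folder name.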

+ 25 - 22
step2.py

@@ -17,29 +17,27 @@ from typing import Dict, List
 
 import aiofiles
 import httpx
-from aiopath import AsyncPath
+from pathlib import Path
 from tqdm.asyncio import tqdm_asyncio
 
 # -------------------- 可配置常量 --------------------
-CONCURRENCY = 20                 # 并发下载数
-RETRY_PER_IMG = 3                # 单图重试
-TIMEOUT = httpx.Timeout(15.0)    # 请求超时
+from config import config
+
+CONCURRENCY = config.concurrency
+RETRY_PER_IMG = config.retry_per_image
+TIMEOUT = httpx.Timeout(config.image_timeout)
 FAILED_RECORD = "data/failed_downloads.json"
-LOG_LEVEL = logging.INFO
+LOG_LEVEL = getattr(logging, config.log_level.upper())
 # ----------------------------------------------------
 
+# 确保数据目录存在
 if not os.path.exists("data"):
     os.mkdir("data")
 
-logging.basicConfig(
-    level=LOG_LEVEL,
-    format="[%(asctime)s] [%(levelname)s] %(message)s",
-    handlers=[
-        logging.StreamHandler(sys.stdout),
-        logging.FileHandler("data/download.log", encoding="utf-8"),
-    ],
-)
-log = logging.getLogger("data/img_downloader")
+# 使用统一的日志配置
+from logger import get_logger
+from realtime_logger import realtime_logger
+log = get_logger("step2", "download.log")
 
 # 预编译正则
 IMG_URL_RE = re.compile(r'<img id="img" src="(.*?)"', re.S)
@@ -85,18 +83,23 @@ async def download_one(
                 ext = ext_match.group(1).lower() if ext_match else "jpg"
                 final_path = img_path.with_suffix(f".{ext}")
 
-                if await AsyncPath(final_path).exists():
+                if final_path.exists():
                     log.info(f"已存在,跳过: {final_path.name}")
                     return True
 
                 async with client.stream("GET", real_url) as img_resp:
                     img_resp.raise_for_status()
-                    await AsyncPath(final_path).parent.mkdir(parents=True, exist_ok=True)
+                    final_path.parent.mkdir(parents=True, exist_ok=True)
                     async with aiofiles.open(final_path, "wb") as fp:
                         async for chunk in img_resp.aiter_bytes(chunk_size=65536):
                             await fp.write(chunk)
 
                 log.info(f"[OK] {final_path.name}")
+                # 发送实时日志
+                try:
+                    realtime_logger.broadcast_log_sync(f"下载完成: {final_path.name}", "SUCCESS", "step2")
+                except Exception as e:
+                    log.warning(f"发送实时日志失败: {e}")
                 return True
 
             except httpx.HTTPStatusError as exc:
@@ -120,24 +123,24 @@ async def download_one(
 async def scan_tasks() -> List[Dict[str, str]]:
     """扫描 downloads/ 下所有 json,返回待下载列表"""
     result = []
-    root = AsyncPath("data/downloads")
-    if not await root.exists():
+    root = Path("data/downloads")
+    if not root.exists():
         return result
 
-    async for json_path in root.rglob("*.json"):
+    for json_path in root.rglob("*.json"):
         folder = json_path.parent
         try:
-            data: Dict[str, str] = json.loads(await json_path.read_text(encoding="utf-8"))
+            data: Dict[str, str] = json.loads(json_path.read_text(encoding="utf-8"))
         except Exception as exc:
             log.warning(f"读取 json 失败 {json_path} -> {exc}")
             continue
 
         for img_name, img_url in data.items():
             img_path = folder / img_name  # 无后缀
-            # 异步判断任意后缀是否存在
+            # 判断任意后缀是否存在
             exists = False
             for ext in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
-                if await img_path.with_suffix(ext).exists():
+                if img_path.with_suffix(ext).exists():
                     exists = True
                     break
             if not exists:

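The `scan_tasks` change above probes a fixed set of image suffixes to decide whether a file was already downloaded (the JSON keys carry no extension, so resume has to check them all). That inline loop can be factored into a small helper, sketched here; step2.py keeps it inlined:

```python
from pathlib import Path

# Suffix list as probed in scan_tasks
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

def already_downloaded(img_path: Path) -> bool:
    """True if img_path exists under any known image suffix (resume support)."""
    return any(img_path.with_suffix(ext).exists() for ext in IMAGE_EXTS)
```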
+ 10 - 6
templates/index.html

@@ -39,7 +39,7 @@
         </form>
         
         <div class="output-section">
-            <h3>以下是一个输出框, 但貌似没啥卵用...</h3>
+            <h3>操作日志</h3>
             <pre id="output" class="output-area"></pre>
         </div>
 
@@ -51,17 +51,21 @@
                 <p><strong>工具使用步骤:</strong></p>
                 <ol>
                     <li>从下拉框选择代理设置(代理配置保存在项目根目录的proxy.txt中)</li>
-                    <li>将URL复制到项目目录下的data/targets.txt中, 一个画廊一个URL</li>
-                    <li>在<a href="https://e-hentai.org/" target="_blank">点解这里</a>获取需要下载的画廊URL</li>
-                    <li><del>不要问什么不直接填到页面, 我懒得写</del></li>
+                    <li>将URL复制到项目目录下的data/targets.txt中,一个画廊一个URL</li>
+                    <li>在<a href="https://e-hentai.org/" target="_blank">这里</a>获取需要下载的画廊URL</li>
                     <li>点击"读取目标URL"加载 targets.txt 中的URL列表</li>
                     <li>点击"下载URL"开始抓取画廊链接</li>
                     <li>点击"下载图片"开始下载图片文件</li>
                     <li>使用"检查未完成"查看下载进度</li>
                     <li>使用"清理日志和JSON文件"清理临时文件</li>
                 </ol>
-                <p><strong>注意:</strong>请确保代理设置正确,且 targets.txt 文件中已添加目标URL。</p>
-                <p><strong>代理配置:</strong>在项目根目录的proxy.txt文件中,每行一个代理地址,格式为 IP:端口</p>
+                <p><strong>注意事项:</strong></p>
+                <ul>
+                    <li>请确保代理设置正确,且 targets.txt 文件中已添加目标URL</li>
+                    <li>代理配置:在项目根目录的proxy.txt文件中,每行一个代理地址,格式为 IP:端口</li>
+                    <li>下载的图片会保存在 data/downloads 目录下,按画廊名称分文件夹存储</li>
+                    <li>如果下载中断,可以重新运行"下载图片"继续未完成的下载</li>
+                </ul>
             </div>
         </div>
     </div>

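The usage notes above describe proxy.txt as one `IP:端口` entry per line. A stdlib sketch of parsing such a file into httpx-style proxy URLs (the `http://` scheme and the `#`-comment convention are assumptions, not confirmed by the source):

```python
def parse_proxy_lines(text: str) -> list[str]:
    """Turn 'IP:port' lines into proxy URLs, skipping blanks and # comments."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        proxies.append(f"http://{line}")
    return proxies
```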
+ 19 - 11
utils.py

@@ -1,27 +1,35 @@
-# utils.py
-from pathlib import Path
-from typing import List
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+工具函数模块
+"""
+from typing import Optional
 
-import logging
-
-# 把 1step.py 的主逻辑封装成函数
+from logger import get_logger
 from step1 import main as step1_main
 from step2 import main as step2_main
 
-log = logging.getLogger("utils")
+# 设置日志
+logger = get_logger("utils")
 
-async def run_step1(proxy: str | None = None) -> str:
+async def run_step1(proxy: Optional[str] = None) -> str:
+    """执行第一步:抓取画廊链接"""
     try:
+        logger.info("开始执行画廊链接抓取")
         await step1_main(proxy)
+        logger.info("画廊链接抓取完成")
         return "画廊链接抓取完成!"
     except Exception as e:
-        log.exception("step1 执行失败")
+        logger.exception("step1 执行失败")
         return f"抓取失败:{e}"
 
-async def run_step2(proxy: str | None = None) -> str:
+async def run_step2(proxy: Optional[str] = None) -> str:
+    """执行第二步:下载图片"""
     try:
+        logger.info("开始执行图片下载")
         await step2_main(proxy)
+        logger.info("图片下载完成")
         return "图片下载完成!"
     except Exception as e:
-        log.exception("step2 执行失败")
+        logger.exception("step2 执行失败")
         return f"下载失败:{e}"
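Both wrappers in the utils.py hunk follow the same shape: log, await the step, and fold any exception into a user-facing status string. That shared pattern can be sketched as one generic helper (hypothetical; utils.py deliberately keeps `run_step1` and `run_step2` as separate functions):

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def run_step(step: Callable[[Optional[str]], Awaitable[None]],
                   ok_msg: str, fail_prefix: str,
                   proxy: Optional[str] = None) -> str:
    """Run one pipeline step, converting any exception into a status string."""
    try:
        await step(proxy)
        return ok_msg
    except Exception as e:
        return f"{fail_prefix}:{e}"
```

Returning a string instead of raising keeps the FastAPI endpoints simple: the route handler can pass the message straight to the frontend's `showOutput` regardless of outcome.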