正确做法是用LEFT JOIN派生表保留最大id:DELETE t1 FROM app_wine t1 LEFT JOIN (SELECT item_id, MAX(id) AS max_id FROM app_wine GROUP BY item_id) t2 ON t1.item_id = t2.item_id AND t1.id = t2.max_id WHERE t2.max_id IS NULL。

查重复数据:GROUP BY + HAVING COUNT(*) > 1 是对的,但只是第一步
这条语句本身没错,SELECT item_id, COUNT(*) FROM app_wine GROUP BY item_id HAVING COUNT(*) > 1 确实能快速定位哪些 item_id 存在重复。但它只告诉你“有重复”,不告诉你“哪条是最新”,更不能直接删旧留新。很多人卡在这一步,误以为查出来就能删——其实这只是诊断,不是治疗。
删旧留新必须绕开 MySQL 的 “You can't specify target table for update in FROM clause” 错误
想直接写 DELETE FROM app_wine WHERE id NOT IN (SELECT MAX(id) FROM app_wine GROUP BY item_id)?MySQL 会报错:You can't specify target table 'app_wine' for update in FROM clause。这是 MySQL 的硬性限制:不能在子查询里直接读写同一张表。
- 正确做法是把子查询结果“垫一层”——用派生表(即加个临时别名)绕过校验
- 推荐写法:
DELETE t1 FROM app_wine t1 LEFT JOIN (SELECT item_id, MAX(id) AS max_id FROM app_wine GROUP BY item_id) t2 ON t1.item_id = t2.item_id AND t1.id = t2.max_id WHERE t2.max_id IS NULL - 等价但更易读的写法(用 NOT IN + 派生表):
DELETE FROM app_wine WHERE id NOT IN (SELECT max_id FROM (SELECT MAX(id) AS max_id FROM app_wine GROUP BY item_id) AS tmp)
按时间字段保留最新,别只认 id —— 除非你确定它严格递增且代表插入顺序
很多业务表的 id 是自增主键,看起来“越大越新”,但要注意:如果存在批量导入、历史补录、或手动指定 id 的情况,MAX(id) 就不可靠。此时应优先依赖明确的时间字段,比如 created_at 或 update_time。
- 用时间保留最新:
SELECT item_id, MAX(created_at) FROM app_wine GROUP BY item_id HAVING COUNT(*) > 1 - 删除时关联时间最大值:
DELETE t1 FROM app_wine t1 INNER JOIN app_wine t2 ON t1.item_id = t2.item_id AND t1.created_at - 注意:这种写法会删掉所有“非最大时间”的记录,包括同时间多条——若存在毫秒级重复且需保一条,得额外加
id辅助去重
执行前务必备份,且先用 SELECT 验证将删哪些行
删操作不可逆。千万别跳过验证步骤。哪怕表不大,也该先跑一遍“模拟删除”的 SELECT:
SELECT * FROM app_wine WHERE id NOT IN (SELECT max_id FROM (SELECT MAX(id) AS max_id FROM app_wine GROUP BY item_id) AS tmp)- 检查返回结果是否全是预期要删的旧数据;确认数量合理(比如重复 50 条,应删 49 条)
- 线上环境建议加
LIMIT分批删(如DELETE ... LIMIT 1000),避免长事务锁表
真正容易被忽略的点是:GROUP BY 去重逻辑依赖字段组合是否完整。比如你想按 item_id + sku 去重,却只写了 GROUP BY item_id,那就会误删本不该合并的记录——动手前,先想清楚“什么是你的业务意义上的‘重复’”。










