系统频繁重启问题排查
问题描述
某日下午,测试反馈测试环境系统不能使用,经查看服务监控发现服务不断重启数十次。
排查过程
使用 Arthas 对系统进行分析。首先,查看所有运行中的线程。
thread --state RUNNABLE
发现有几个定时任务执行CPU使用率过高,合计占用70%+。
打印显示CPU使用率过高的线程的运行栈。结合代码分析,异常原因。初步判断,可能是执行批量更新过程出现问题。
但是结合业务数据量,正常情况下数据量应该不多。因为系统反复重启无法直接通过系统查看数据量,直接连接数据库查询,发现数据量异常。正常情况下批量执行数据小于100,发现当前实际执行10w数据更新。到这里可以确定问题点就在这里。
分析最近开发的功能需求,最终定位到问题:拉取数据时,过滤逻辑和原来不一样,没有过滤原有数据,每次定时任务执行时,都会复制现有的记录,导致数据不断膨胀。
thread 130
线程运行栈内容具体如下:
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.PrintStream.write(PrintStream.java:480)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.write(PrintStream.java:527)
at java.io.PrintStream.print(PrintStream.java:669)
at java.io.PrintStream.println(PrintStream.java:806)
at com.baomidou.mybatisplus.extension.p6spy.StdoutLogger.logText(StdoutLogger.java:31)
at com.p6spy.engine.spy.appender.FormattedLogger.logSQL(FormattedLogger.java:35)
at com.p6spy.engine.common.P6LogQuery.doLog(P6LogQuery.java:121)
at com.p6spy.engine.common.P6LogQuery.doLogElapsed(P6LogQuery.java:91)
at com.p6spy.engine.common.P6LogQuery.logElapsed(P6LogQuery.java:203)
at com.p6spy.engine.logging.LoggingEventListener.logElapsed(LoggingEventListener.java:107)
at com.p6spy.engine.logging.LoggingEventListener.onAfterAnyAddBatch(LoggingEventListener.java:64)
at com.p6spy.engine.event.SimpleJdbcEventListener.onAfterAddBatch(SimpleJdbcEventListener.java:151)
at com.p6spy.engine.event.CompoundJdbcEventListener.onAfterAddBatch(CompoundJdbcEventListener.java:103)
at com.p6spy.engine.wrapper.PreparedStatementWrapper.addBatch(PreparedStatementWrapper.java:382)
at sun.reflect.GeneratedMethodAccessor704.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:59)
at com.sun.proxy.$Proxy504.addBatch(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.batch(PreparedStatementHandler.java:58)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.batch(RoutingStatementHandler.java:69)
at sun.reflect.GeneratedMethodAccessor703.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy490.batch(Unknown Source)
at org.apache.ibatis.executor.BatchExecutor.doUpdate(BatchExecutor.java:77)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
at sun.reflect.GeneratedMethodAccessor684.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
at com.baomidou.mybatisplus.extension.plugins.MybatisPlusInterceptor.intercept(MybatisPlusInterceptor.java:106)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy489.update(Unknown Source)
at sun.reflect.GeneratedMethodAccessor684.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy489.update(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
at com.baomidou.mybatisplus.extension.service.impl.ServiceImpl.lambda$updateBatchById$3(ServiceImpl.java:194)
at com.baomidou.mybatisplus.extension.service.impl.ServiceImpl$$Lambda$2191/703036573.accept(Unknown Source)
at com.baomidou.mybatisplus.extension.toolkit.SqlHelper.lambda$executeBatch$0(SqlHelper.java:211)
at com.baomidou.mybatisplus.extension.toolkit.SqlHelper$$Lambda$2188/793575531.accept(Unknown Source)
at com.baomidou.mybatisplus.extension.toolkit.SqlHelper.executeBatch(SqlHelper.java:175)
at com.baomidou.mybatisplus.extension.toolkit.SqlHelper.executeBatch(SqlHelper.java:207)
at com.baomidou.mybatisplus.extension.service.impl.ServiceImpl.executeBatch(ServiceImpl.java:239)
at com.baomidou.mybatisplus.extension.service.impl.ServiceImpl.updateBatchById(ServiceImpl.java:191)
at com.baomidou.mybatisplus.extension.service.IService.updateBatchById(IService.java:177)
at com.baomidou.mybatisplus.extension.service.IService$$FastClassBySpringCGLIB$$f8525d18.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.transaction.interceptor.TransactionInterceptor$$Lambda$2113/1586949210.proceedWithInvocation(Unknown Source)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:367)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at com.baomidou.dynamic.datasource.aop.DynamicDataSourceAnnotationInterceptor.invoke(DynamicDataSourceAnnotationInterceptor.java:51)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
// 省略其他日志
调整代码后,验证结论是否正常。如图,左侧为系统反复重启CPU与内存变化;右侧为问题修复后,系统CPU与内存变化。
复盘过程
排查思路:
- 打印正在运行的线程情况,并打印CPU使用率高的线程栈。
- 线程运行堆栈结合代码,分析异常可能存在问题的代码。
- 异常代码结合业务需求,分析可能出现问题原因。
- 修复问题,并验证效果。
优化流程:
可以直接,如下(命令),一键展示****当前最忙的前 N 个线程并打印堆栈:
thread -n 3
评论