무슨 일인가?

맡고있는 팀이 다루는 데이터가 나날히 증가하면서, 
sync를 담당하는 메시지 큐 프로세서가 Full GC를 계속해서 수행하는 이슈가 있었다. 
꽤나 오랫동안 스케일 업이나 스케일 아웃 없이 유지한 서버라서 처음에는 서버 추가로 대응할 까 했지만, 
튜닝할 수 있는 부분은 튜닝하는게 맞다고 생각하고 조사를 시작했다. 

Thread dump를 많이 떠봤지만? 글쎄,,, 

메시지 큐 프로세서 서버의 특성상 수행시간이 오래 걸리는 task들을 수행하는 경우가 많다.
Full GC 문제상황에서 계속해서 thread dump를 떠봤지만, 교착상태나 프로세스를 느리게 만드는 원인을 찾지 못했다. 
가끔씩 hang 걸린 스레드들이 있었으나, 문제 발생시마다 발견되었던 것은 아니기 때문에 원인으로 보지 않았다. 
분석 툴은 tda 를 이용했다. 

조사중에 문제는 더 심해졌으며 약 2주동안 서버를 아침 peak 시간마다 서버를 반복 배포하는 상황이 발생했다.. 

Heap dump를 떠보자! 

GC를 수행해도 메모리를 점유하고 있는 (leak) 이 있는것인가? 약 3일동안 
10초 주기로 힙덤프를 3~5씩 수행하고 비교를 시작했다. 
분석툴은 mat를 이용했다. 
mat 툴 내에서 leak 취약점은 발견되지 않았다. 또한 각 힙덤프 마다 (GC가 여러번 수행됐을 때)
계속해서 메모리를 점유하고 있는 특정 객체가 없었다.

1차 힙덤프
2차 힙덤프
3차 힙덤프

( 이 외에도 덤프 파일은 더 많았지만, 특별한 상황으로 판단할 근거가 없었다. )

 

GC가 어떻게 수행되고 있는건가?

좀 더 자세한 정보가 필요했다. Full GC 전 후 상황을 알 수 있는 로그가 없을 까? 
찾아보니 gc log를 생성하는 jvm 옵션을 걸어놨었고 해당 파일을 참조했다.
또한 실시간으로 문제가 발생할 때는 jstat으로 관찰을 시작했다. 

gc.*.log.0

2022-06-23T15:54:00.948+0900: 275799.929: [Full GC (Ergonomics) 2022-06-23T15:54:01.121+0900: 275800.103: [SoftReference, 13246 refs, 0.0021082 secs]2022-06-23T15:54:01.123+0900: 275800.105: [WeakReference, 15971 refs, 0.0021386 secs]2022-06-23T15:54:01.125+0900: 275800.107: [FinalReference, 2502 refs, 0.0002797 secs]2022-06-23T15:54:01.125+0900: 275800.107: [PhantomReference, 0 refs, 1347 refs, 0.0001439 secs]2022-06-23T15:54:01.125+0900: 275800.107: [JNI Weak Reference, 0.0013016 secs][PSYoungGen: 3156821K->74490K(3932160K)] [ParOldGen: 3145725K->3145524K(3145728K)] 6302547K->3220014K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.7926097 secs] [Times: user=12.14 sys=0.01, real=0.79 secs] 
2022-06-23T15:54:01.741+0900: 275800.723: Total time for which application threads were stopped: 2.7901548 seconds, Stopping threads took: 0.0007279 seconds
2022-06-23T15:54:02.500+0900: 275801.482: [Full GC (Ergonomics) 2022-06-23T15:54:02.661+0900: 275801.643: [SoftReference, 15194 refs, 0.0024719 secs]2022-06-23T15:54:02.664+0900: 275801.646: [WeakReference, 17368 refs, 0.0022696 secs]2022-06-23T15:54:02.666+0900: 275801.648: [FinalReference, 14516 refs, 0.0122559 secs]2022-06-23T15:54:02.678+0900: 275801.660: [PhantomReference, 0 refs, 1346 refs, 0.0001451 secs]2022-06-23T15:54:02.678+0900: 275801.660: [JNI Weak Reference, 0.0013243 secs][PSYoungGen: 2695930K->75865K(3932160K)] [ParOldGen: 3145524K->3145721K(3145728K)] 5841454K->3221586K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 1.0474940 secs] [Times: user=17.54 sys=0.02, real=1.05 secs] 
2022-06-23T15:54:03.549+0900: 275802.530: Total time for which application threads were stopped: 1.0592147 seconds, Stopping threads took: 0.0007225 seconds
2022-06-23T15:54:04.392+0900: 275803.374: [Full GC (Ergonomics) 2022-06-23T15:54:04.578+0900: 275803.559: [SoftReference, 15562 refs, 0.0025315 secs]2022-06-23T15:54:04.580+0900: 275803.562: [WeakReference, 16948 refs, 0.0023067 secs]2022-06-23T15:54:04.582+0900: 275803.564: [FinalReference, 13462 refs, 0.0109742 secs]2022-06-23T15:54:04.593+0900: 275803.575: [PhantomReference, 0 refs, 1348 refs, 0.0001486 secs]2022-06-23T15:54:04.594+0900: 275803.575: [JNI Weak Reference, 0.0013160 secs][PSYoungGen: 2621440K->75734K(3932160K)] [ParOldGen: 3145721K->3145640K(3145728K)] 5767161K->3221375K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5677562 secs] [Times: user=6.77 sys=0.01, real=0.56 secs] 
2022-06-23T15:54:04.961+0900: 275803.943: Total time for which application threads were stopped: 0.5793442 seconds, Stopping threads took: 0.0007261 seconds
2022-06-23T15:54:05.826+0900: 275804.808: [Full GC (Ergonomics) 2022-06-23T15:54:05.998+0900: 275804.980: [SoftReference, 15214 refs, 0.0023534 secs]2022-06-23T15:54:06.001+0900: 275804.983: [WeakReference, 17447 refs, 0.0022982 secs]2022-06-23T15:54:06.003+0900: 275804.985: [FinalReference, 12894 refs, 0.0107383 secs]2022-06-23T15:54:06.014+0900: 275804.996: [PhantomReference, 0 refs, 1347 refs, 0.0001522 secs]2022-06-23T15:54:06.014+0900: 275804.996: [JNI Weak Reference, 0.0013047 secs][PSYoungGen: 2621440K->77698K(3932160K)] [ParOldGen: 3145640K->3145499K(3145728K)] 5767080K->3223197K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5904243 secs] [Times: user=7.38 sys=0.00, real=0.59 secs] 
2022-06-23T15:54:06.418+0900: 275805.399: Total time for which application threads were stopped: 0.6022619 seconds, Stopping threads took: 0.0007616 seconds
2022-06-23T15:54:07.527+0900: 275806.509: [Full GC (Ergonomics) 2022-06-23T15:54:07.705+0900: 275806.687: [SoftReference, 13973 refs, 0.0021885 secs]2022-06-23T15:54:07.707+0900: 275806.689: [WeakReference, 17550 refs, 0.0022955 secs]2022-06-23T15:54:07.709+0900: 275806.691: [FinalReference, 12356 refs, 0.0101714 secs]2022-06-23T15:54:07.719+0900: 275806.701: [PhantomReference, 0 refs, 1346 refs, 0.0001466 secs]2022-06-23T15:54:07.720+0900: 275806.701: [JNI Weak Reference, 0.0012965 secs][PSYoungGen: 2621440K->75924K(3932160K)] [ParOldGen: 3145499K->3145277K(3145728K)] 5766939K->3221202K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5293224 secs] [Times: user=6.00 sys=0.00, real=0.53 secs] 
2022-06-23T15:54:08.057+0900: 275807.039: Total time for which application threads were stopped: 0.5409180 seconds, Stopping threads took: 0.0007210 seconds
2022-06-23T15:54:09.062+0900: 275808.043: [GC (Allocation Failure) 2022-06-23T15:54:09.116+0900: 275808.098: [SoftReference, 0 refs, 0.0000566 secs]2022-06-23T15:54:09.116+0900: 275808.098: [WeakReference, 29 refs, 0.0000224 secs]2022-06-23T15:54:09.116+0900: 275808.098: [FinalReference, 9186 refs, 0.0046865 secs]2022-06-23T15:54:09.121+0900: 275808.102: [PhantomReference, 0 refs, 0 refs, 0.0000102 secs]2022-06-23T15:54:09.121+0900: 275808.102: [JNI Weak Reference, 0.0007166 secs][PSYoungGen: 2621440K->75871K(3932160K)] 5766717K->3221516K(7077888K), 0.0664343 secs] [Times: user=1.23 sys=0.01, real=0.07 secs] 
2022-06-23T15:54:09.128+0900: 275808.110: [Full GC (Ergonomics) 2022-06-23T15:54:09.304+0900: 275808.286: [SoftReference, 14000 refs, 0.0022588 secs]2022-06-23T15:54:09.306+0900: 275808.288: [WeakReference, 17088 refs, 0.0022053 secs]2022-06-23T15:54:09.309+0900: 275808.290: [FinalReference, 2253 refs, 0.0001940 secs]2022-06-23T15:54:09.309+0900: 275808.291: [PhantomReference, 0 refs, 1348 refs, 0.0001491 secs]2022-06-23T15:54:09.309+0900: 275808.291: [JNI Weak Reference, 0.0012363 secs][PSYoungGen: 75871K->74838K(3932160K)] [ParOldGen: 3145645K->3145399K(3145728K)] 3221516K->3220238K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5451732 secs] [Times: user=6.80 sys=0.00, real=0.55 secs] 
2022-06-23T15:54:09.675+0900: 275808.656: Total time for which application threads were stopped: 0.6236176 seconds, Stopping threads took: 0.0007080 seconds
2022-06-23T15:54:11.208+0900: 275810.190: [Full GC (Ergonomics) 2022-06-23T15:54:11.378+0900: 275810.360: [SoftReference, 12815 refs, 0.0020839 secs]2022-06-23T15:54:11.380+0900: 275810.362: [WeakReference, 16815 refs, 0.0022654 secs]2022-06-23T15:54:11.383+0900: 275810.365: [FinalReference, 13525 refs, 0.0113922 secs]2022-06-23T15:54:11.394+0900: 275810.376: [PhantomReference, 0 refs, 1349 refs, 0.0001458 secs]2022-06-23T15:54:11.394+0900: 275810.376: [JNI Weak Reference, 0.0012956 secs][PSYoungGen: 2696278K->74025K(3932160K)] [ParOldGen: 3145399K->3145709K(3145728K)] 5841678K->3219735K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5401919 secs] [Times: user=6.26 sys=0.00, real=0.54 secs] 
2022-06-23T15:54:11.749+0900: 275810.731: Total time for which application threads were stopped: 0.5516511 seconds, Stopping threads took: 0.0007110 seconds
2022-06-23T15:54:12.466+0900: 275811.448: [Full GC (Ergonomics) 2022-06-23T15:54:12.655+0900: 275811.637: [SoftReference, 11622 refs, 0.0018615 secs]2022-06-23T15:54:12.657+0900: 275811.639: [WeakReference, 17258 refs, 0.0022578 secs]2022-06-23T15:54:12.660+0900: 275811.641: [FinalReference, 9433 refs, 0.0072564 secs]2022-06-23T15:54:12.667+0900: 275811.649: [PhantomReference, 0 refs, 1349 refs, 0.0001477 secs]2022-06-23T15:54:12.667+0900: 275811.649: [JNI Weak Reference, 0.0013050 secs][PSYoungGen: 2621440K->74705K(3932160K)] [ParOldGen: 3145709K->3145505K(3145728K)] 5767149K->3220210K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.6667728 secs] [Times: user=9.21 sys=0.00, real=0.67 secs] 
2022-06-23T15:54:13.134+0900: 275812.116: Total time for which application threads were stopped: 0.6782087 seconds, Stopping threads took: 0.0007444 seconds
2022-06-23T15:54:14.515+0900: 275813.497: [Full GC (Ergonomics) 2022-06-23T15:54:14.687+0900: 275813.669: [SoftReference, 13840 refs, 0.0022529 secs]2022-06-23T15:54:14.689+0900: 275813.671: [WeakReference, 16513 refs, 0.0021787 secs]2022-06-23T15:54:14.691+0900: 275813.673: [FinalReference, 13791 refs, 0.0115372 secs]2022-06-23T15:54:14.703+0900: 275813.685: [PhantomReference, 0 refs, 1347 refs, 0.0001521 secs]2022-06-23T15:54:14.703+0900: 275813.685: [JNI Weak Reference, 0.0013446 secs][PSYoungGen: 2621440K->74322K(3932160K)] [ParOldGen: 3145505K->3145586K(3145728K)] 5766945K->3219909K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5750195 secs] [Times: user=7.07 sys=0.00, real=0.57 secs] 
2022-06-23T15:54:15.091+0900: 275814.072: Total time for which application threads were stopped: 0.5866793 seconds, Stopping threads took: 0.0007986 seconds
2022-06-23T15:54:16.324+0900: 275815.306: [Full GC (Ergonomics) 2022-06-23T15:54:16.491+0900: 275815.473: [SoftReference, 12165 refs, 0.0019353 secs]2022-06-23T15:54:16.493+0900: 275815.475: [WeakReference, 17343 refs, 0.0022993 secs]2022-06-23T15:54:16.495+0900: 275815.477: [FinalReference, 13299 refs, 0.0111796 secs]2022-06-23T15:54:16.506+0900: 275815.488: [PhantomReference, 0 refs, 1347 refs, 0.0001478 secs]2022-06-23T15:54:16.506+0900: 275815.488: [JNI Weak Reference, 0.0013418 secs][PSYoungGen: 2621440K->76142K(3932160K)] [ParOldGen: 3145586K->3145327K(3145728K)] 5767026K->3221470K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.6305719 secs] [Times: user=8.26 sys=0.00, real=0.63 secs] 
2022-06-23T15:54:16.956+0900: 275815.938: Total time for which application threads were stopped: 0.6422066 seconds, Stopping threads took: 0.0007193 seconds
2022-06-23T15:54:18.088+0900: 275817.070: [GC (Allocation Failure) 2022-06-23T15:54:18.145+0900: 275817.126: [SoftReference, 0 refs, 0.0000539 secs]2022-06-23T15:54:18.145+0900: 275817.126: [WeakReference, 31 refs, 0.0000178 secs]2022-06-23T15:54:18.145+0900: 275817.126: [FinalReference, 11389 refs, 0.0057581 secs]2022-06-23T15:54:18.150+0900: 275817.132: [PhantomReference, 0 refs, 0 refs, 0.0000107 secs]2022-06-23T15:54:18.150+0900: 275817.132: [JNI Weak Reference, 0.0007090 secs][PSYoungGen: 2621440K->76871K(3932160K)] 5766767K->3222359K(7077888K), 0.0697500 secs] [Times: user=1.26 sys=0.00, real=0.07 secs] 
2022-06-23T15:54:18.158+0900: 275817.140: [Full GC (Ergonomics) 2022-06-23T15:54:18.321+0900: 275817.303: [SoftReference, 14348 refs, 0.0022840 secs]2022-06-23T15:54:18.324+0900: 275817.306: [WeakReference, 17052 refs, 0.0022095 secs]2022-06-23T15:54:18.326+0900: 275817.308: [FinalReference, 2340 refs, 0.0002111 secs]2022-06-23T15:54:18.326+0900: 275817.308: [PhantomReference, 0 refs, 1349 refs, 0.0001466 secs]2022-06-23T15:54:18.326+0900: 275817.308: [JNI Weak Reference, 0.0012911 secs][PSYoungGen: 76871K->75392K(3932160K)] [ParOldGen: 3145487K->3145670K(3145728K)] 3222359K->3221063K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.6015073 secs] [Times: user=8.01 sys=0.00, real=0.60 secs] 
2022-06-23T15:54:18.761+0900: 275817.743: Total time for which application threads were stopped: 0.6834082 seconds, Stopping threads took: 0.0007441 seconds
2022-06-23T15:54:19.991+0900: 275818.972: [Full GC (Ergonomics) 2022-06-23T15:54:20.162+0900: 275819.144: [SoftReference, 14130 refs, 0.0022199 secs]2022-06-23T15:54:20.164+0900: 275819.146: [WeakReference, 17526 refs, 0.0023440 secs]2022-06-23T15:54:20.167+0900: 275819.149: [FinalReference, 13510 refs, 0.0107897 secs]2022-06-23T15:54:20.177+0900: 275819.159: [PhantomReference, 0 refs, 1349 refs, 0.0001465 secs]2022-06-23T15:54:20.178+0900: 275819.160: [JNI Weak Reference, 0.0013428 secs][PSYoungGen: 2696832K->0K(3932160K)] [ParOldGen: 3145670K->3145054K(3145728K)] 5842503K->3145054K(7077888K), [Metaspace: 316669K->316669K(1359872K)], 0.5521212 secs] [Times: user=6.06 sys=0.01, real=0.56 secs] 
2022-06-23T15:54:20.544+0900: 275819.525: Total time for which application threads were stopped: 0.5637847 seconds, Stopping threads took: 0.0007391 seconds
2022-06-23T15:54:21.954+0900: 275820.936: [GC (Allocation Failure) 2022-06-23T15:54:21.969+0900: 275820.951: [SoftReference, 0 refs, 0.0000471 secs]2022-06-23T15:54:21.969+0900: 275820.951: [WeakReference, 21 refs, 0.0000145 secs]2022-06-23T15:54:21.969+0900: 275820.951: [FinalReference, 11852 refs, 0.0053773 secs]2022-06-23T15:54:21.975+0900: 275820.957: [PhantomReference, 0 refs, 0 refs, 0.0000102 secs]2022-06-23T15:54:21.975+0900: 275820.957: [JNI Weak Reference, 0.0007138 secs][PSYoungGen: 2621440K->9033K(3932160K)] 5766494K->3154087K(7077888K), 0.0287361 secs] [Times: user=0.30 sys=0.01, real=0.03 secs] 
2022-06-23T15:54:21.983+0900: 275820.965: Total time for which application threads were stopped: 0.0404609 seconds, Stopping threads took: 0.0007301 seconds
2022-06-23T15:54:23.890+0900: 275822.872: [GC (Allocation Failure) 2022-06-23T15:54:23.907+0900: 275822.889: [SoftReference, 0 refs, 0.0000472 secs]2022-06-23T15:54:23.907+0900: 275822.889: [WeakReference, 31 refs, 0.0000184 secs]2022-06-23T15:54:23.907+0900: 275822.889: [FinalReference, 12855 refs, 0.0059021 secs]2022-06-23T15:54:23.913+0900: 275822.895: [PhantomReference, 0 refs, 0 refs, 0.0000104 secs]2022-06-23T15:54:23.913+0900: 275822.895: [JNI Weak Reference, 0.0007180 secs][PSYoungGen: 2630473K->10823K(3932160K)] 5775527K->3155877K(7077888K), 0.0306921 secs] [Times: user=0.34 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:23.922+0900: 275822.903: Total time for which application threads were stopped: 0.0426840 seconds, Stopping threads took: 0.0007616 seconds
2022-06-23T15:54:25.789+0900: 275824.771: [GC (Allocation Failure) 2022-06-23T15:54:25.805+0900: 275824.787: [SoftReference, 0 refs, 0.0000610 secs]2022-06-23T15:54:25.805+0900: 275824.787: [WeakReference, 42 refs, 0.0000202 secs]2022-06-23T15:54:25.805+0900: 275824.787: [FinalReference, 11981 refs, 0.0053150 secs]2022-06-23T15:54:25.810+0900: 275824.792: [PhantomReference, 0 refs, 0 refs, 0.0000101 secs]2022-06-23T15:54:25.810+0900: 275824.792: [JNI Weak Reference, 0.0007284 secs][PSYoungGen: 2632263K->10025K(3932160K)] 5777317K->3155080K(7077888K), 0.0285799 secs] [Times: user=0.31 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:25.819+0900: 275824.801: Total time for which application threads were stopped: 0.0403066 seconds, Stopping threads took: 0.0007221 seconds
2022-06-23T15:54:27.571+0900: 275826.552: [GC (Allocation Failure) 2022-06-23T15:54:27.586+0900: 275826.568: [SoftReference, 0 refs, 0.0000469 secs]2022-06-23T15:54:27.586+0900: 275826.568: [WeakReference, 47 refs, 0.0000220 secs]2022-06-23T15:54:27.586+0900: 275826.568: [FinalReference, 11512 refs, 0.0050805 secs]2022-06-23T15:54:27.592+0900: 275826.573: [PhantomReference, 0 refs, 0 refs, 0.0000098 secs]2022-06-23T15:54:27.592+0900: 275826.573: [JNI Weak Reference, 0.0007156 secs][PSYoungGen: 2631465K->12134K(3932160K)] 5776520K->3157189K(7077888K), 0.0287580 secs] [Times: user=0.31 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:27.600+0900: 275826.582: Total time for which application threads were stopped: 0.0407009 seconds, Stopping threads took: 0.0007481 seconds
2022-06-23T15:54:29.093+0900: 275828.075: Total time for which application threads were stopped: 0.0113373 seconds, Stopping threads took: 0.0007676 seconds
2022-06-23T15:54:29.356+0900: 275828.337: [GC (Allocation Failure) 2022-06-23T15:54:29.372+0900: 275828.354: [SoftReference, 0 refs, 0.0000490 secs]2022-06-23T15:54:29.372+0900: 275828.354: [WeakReference, 25 refs, 0.0000160 secs]2022-06-23T15:54:29.372+0900: 275828.354: [FinalReference, 11428 refs, 0.0046712 secs]2022-06-23T15:54:29.377+0900: 275828.359: [PhantomReference, 0 refs, 0 refs, 0.0000102 secs]2022-06-23T15:54:29.377+0900: 275828.359: [JNI Weak Reference, 0.0007074 secs][PSYoungGen: 2633574K->11364K(3932160K)] 5778629K->3156419K(7077888K), 0.0289628 secs] [Times: user=0.33 sys=0.01, real=0.02 secs] 
2022-06-23T15:54:29.385+0900: 275828.367: Total time for which application threads were stopped: 0.0407151 seconds, Stopping threads took: 0.0007484 seconds
2022-06-23T15:54:31.560+0900: 275830.542: [GC (Allocation Failure) 2022-06-23T15:54:31.578+0900: 275830.560: [SoftReference, 0 refs, 0.0000473 secs]2022-06-23T15:54:31.578+0900: 275830.560: [WeakReference, 36 refs, 0.0000181 secs]2022-06-23T15:54:31.578+0900: 275830.560: [FinalReference, 13035 refs, 0.0055757 secs]2022-06-23T15:54:31.584+0900: 275830.566: [PhantomReference, 0 refs, 0 refs, 0.0000104 secs]2022-06-23T15:54:31.584+0900: 275830.566: [JNI Weak Reference, 0.0007132 secs][PSYoungGen: 2632804K->12432K(3932160K)] 5777859K->3157486K(7077888K), 0.0315727 secs] [Times: user=0.35 sys=0.01, real=0.03 secs] 
2022-06-23T15:54:31.593+0900: 275830.575: Total time for which application threads were stopped: 0.0439990 seconds, Stopping threads took: 0.0007568 seconds
2022-06-23T15:54:33.228+0900: 275832.210: [GC (Allocation Failure) 2022-06-23T15:54:33.246+0900: 275832.228: [SoftReference, 0 refs, 0.0000466 secs]2022-06-23T15:54:33.246+0900: 275832.228: [WeakReference, 33 refs, 0.0000191 secs]2022-06-23T15:54:33.246+0900: 275832.228: [FinalReference, 12247 refs, 0.0054947 secs]2022-06-23T15:54:33.251+0900: 275832.233: [PhantomReference, 0 refs, 0 refs, 0.0000106 secs]2022-06-23T15:54:33.251+0900: 275832.233: [JNI Weak Reference, 0.0007213 secs][PSYoungGen: 2633872K->11745K(3932160K)] 5778926K->3156800K(7077888K), 0.0306720 secs] [Times: user=0.34 sys=0.01, real=0.03 secs] 
2022-06-23T15:54:33.260+0900: 275832.242: Total time for which application threads were stopped: 0.0427162 seconds, Stopping threads took: 0.0007635 seconds
2022-06-23T15:54:34.984+0900: 275833.966: [GC (Allocation Failure) 2022-06-23T15:54:35.000+0900: 275833.982: [SoftReference, 0 refs, 0.0000471 secs]2022-06-23T15:54:35.000+0900: 275833.982: [WeakReference, 29 refs, 0.0000306 secs]2022-06-23T15:54:35.000+0900: 275833.982: [FinalReference, 11798 refs, 0.0051205 secs]2022-06-23T15:54:35.006+0900: 275833.987: [PhantomReference, 0 refs, 0 refs, 0.0000096 secs]2022-06-23T15:54:35.006+0900: 275833.987: [JNI Weak Reference, 0.0007095 secs][PSYoungGen: 2633185K->12405K(3932160K)] 5778240K->3157459K(7077888K), 0.0293101 secs] [Times: user=0.32 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:35.014+0900: 275833.996: Total time for which application threads were stopped: 0.0412732 seconds, Stopping threads took: 0.0007630 seconds
2022-06-23T15:54:35.794+0900: 275834.776: Total time for which application threads were stopped: 0.0110328 seconds, Stopping threads took: 0.0006979 seconds
2022-06-23T15:54:36.727+0900: 275835.709: [GC (Allocation Failure) 2022-06-23T15:54:36.744+0900: 275835.725: [SoftReference, 0 refs, 0.0000471 secs]2022-06-23T15:54:36.744+0900: 275835.726: [WeakReference, 47 refs, 0.0000208 secs]2022-06-23T15:54:36.744+0900: 275835.726: [FinalReference, 11605 refs, 0.0047551 secs]2022-06-23T15:54:36.748+0900: 275835.730: [PhantomReference, 0 refs, 0 refs, 0.0000099 secs]2022-06-23T15:54:36.748+0900: 275835.730: [JNI Weak Reference, 0.0007143 secs][PSYoungGen: 2633845K->13402K(3932160K)] 5778899K->3158456K(7077888K), 0.0294072 secs] [Times: user=0.33 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:36.757+0900: 275835.739: Total time for which application threads were stopped: 0.0414257 seconds, Stopping threads took: 0.0007846 seconds
2022-06-23T15:54:38.575+0900: 275837.557: [GC (Allocation Failure) 2022-06-23T15:54:38.593+0900: 275837.574: [SoftReference, 0 refs, 0.0000471 secs]2022-06-23T15:54:38.593+0900: 275837.574: [WeakReference, 22 refs, 0.0000154 secs]2022-06-23T15:54:38.593+0900: 275837.574: [FinalReference, 12183 refs, 0.0056325 secs]2022-06-23T15:54:38.598+0900: 275837.580: [PhantomReference, 0 refs, 0 refs, 0.0000113 secs]2022-06-23T15:54:38.598+0900: 275837.580: [JNI Weak Reference, 0.0007396 secs][PSYoungGen: 2634842K->12122K(3932160K)] 5779896K->3157177K(7077888K), 0.0309613 secs] [Times: user=0.34 sys=0.01, real=0.04 secs] 
2022-06-23T15:54:38.607+0900: 275837.589: Total time for which application threads were stopped: 0.0429210 seconds, Stopping threads took: 0.0007807 seconds
2022-06-23T15:54:39.619+0900: 275838.601: Total time for which application threads were stopped: 0.0116050 seconds, Stopping threads took: 0.0007219 seconds
2022-06-23T15:54:40.440+0900: 275839.422: [GC (Allocation Failure) 2022-06-23T15:54:40.456+0900: 275839.438: [SoftReference, 0 refs, 0.0000493 secs]2022-06-23T15:54:40.456+0900: 275839.438: [WeakReference, 26 refs, 0.0000172 secs]2022-06-23T15:54:40.456+0900: 275839.438: [FinalReference, 12527 refs, 0.0054881 secs]2022-06-23T15:54:40.462+0900: 275839.443: [PhantomReference, 0 refs, 0 refs, 0.0000102 secs]2022-06-23T15:54:40.462+0900: 275839.443: [JNI Weak Reference, 0.0007196 secs][PSYoungGen: 2633562K->11706K(3932160K)] 5778617K->3156761K(7077888K), 0.0291793 secs] [Times: user=0.32 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:40.470+0900: 275839.452: Total time for which application threads were stopped: 0.0409766 seconds, Stopping threads took: 0.0007189 seconds
2022-06-23T15:54:42.248+0900: 275841.230: [GC (Allocation Failure) 2022-06-23T15:54:42.265+0900: 275841.247: [SoftReference, 0 refs, 0.0000512 secs]2022-06-23T15:54:42.265+0900: 275841.247: [WeakReference, 28 refs, 0.0000158 secs]2022-06-23T15:54:42.265+0900: 275841.247: [FinalReference, 12115 refs, 0.0045422 secs]2022-06-23T15:54:42.269+0900: 275841.251: [PhantomReference, 0 refs, 0 refs, 0.0000105 secs]2022-06-23T15:54:42.269+0900: 275841.251: [JNI Weak Reference, 0.0007131 secs][PSYoungGen: 2633146K->12509K(3932160K)] 5778201K->3157564K(7077888K), 0.0286606 secs] [Times: user=0.33 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:42.278+0900: 275841.260: Total time for which application threads were stopped: 0.0405765 seconds, Stopping threads took: 0.0007569 seconds
2022-06-23T15:54:42.465+0900: 275841.447: Total time for which application threads were stopped: 0.0114296 seconds, Stopping threads took: 0.0007450 seconds
2022-06-23T15:54:44.359+0900: 275843.341: [GC (Allocation Failure) 2022-06-23T15:54:44.376+0900: 275843.358: [SoftReference, 0 refs, 0.0000445 secs]2022-06-23T15:54:44.376+0900: 275843.358: [WeakReference, 23 refs, 0.0000148 secs]2022-06-23T15:54:44.376+0900: 275843.358: [FinalReference, 12855 refs, 0.0049757 secs]2022-06-23T15:54:44.381+0900: 275843.363: [PhantomReference, 0 refs, 0 refs, 0.0000109 secs]2022-06-23T15:54:44.381+0900: 275843.363: [JNI Weak Reference, 0.0007150 secs][PSYoungGen: 2633949K->11983K(3932160K)] 5779004K->3157038K(7077888K), 0.0289794 secs] [Times: user=0.32 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:44.389+0900: 275843.371: Total time for which application threads were stopped: 0.0408213 seconds, Stopping threads took: 0.0007614 seconds
2022-06-23T15:54:46.053+0900: 275845.035: [GC (Allocation Failure) 2022-06-23T15:54:46.069+0900: 275845.051: [SoftReference, 0 refs, 0.0000475 secs]2022-06-23T15:54:46.069+0900: 275845.051: [WeakReference, 20 refs, 0.0000142 secs]2022-06-23T15:54:46.069+0900: 275845.051: [FinalReference, 12427 refs, 0.0048786 secs]2022-06-23T15:54:46.074+0900: 275845.056: [PhantomReference, 0 refs, 0 refs, 0.0000102 secs]2022-06-23T15:54:46.074+0900: 275845.056: [JNI Weak Reference, 0.0007055 secs][PSYoungGen: 2633423K->11586K(3932160K)] 5778478K->3156641K(7077888K), 0.0289904 secs] [Times: user=0.32 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:46.083+0900: 275845.065: Total time for which application threads were stopped: 0.0406926 seconds, Stopping threads took: 0.0007252 seconds
2022-06-23T15:54:47.007+0900: 275845.989: Total time for which application threads were stopped: 0.0111079 seconds, Stopping threads took: 0.0007446 seconds
2022-06-23T15:54:47.891+0900: 275846.873: [GC (Allocation Failure) 2022-06-23T15:54:47.909+0900: 275846.891: [SoftReference, 0 refs, 0.0000479 secs]2022-06-23T15:54:47.909+0900: 275846.891: [WeakReference, 53 refs, 0.0000231 secs]2022-06-23T15:54:47.909+0900: 275846.891: [FinalReference, 12407 refs, 0.0054633 secs]2022-06-23T15:54:47.914+0900: 275846.896: [PhantomReference, 0 refs, 0 refs, 0.0000100 secs]2022-06-23T15:54:47.914+0900: 275846.896: [JNI Weak Reference, 0.0007263 secs][PSYoungGen: 2633026K->13391K(3932160K)] 5778081K->3158446K(7077888K), 0.0310329 secs] [Times: user=0.35 sys=0.00, real=0.03 secs] 
2022-06-23T15:54:47.923+0900: 275846.905: Total time for which application threads were stopped: 0.0431955 seconds, Stopping threads took: 0.0007912 seconds
2022-06-23T15:54:49.825+0900: 275848.807: [GC (Allocation Failure) 2022-06-23T15:54:49.843+0900: 275848.825: [SoftReference, 0 refs, 0.0000542 secs]2022-06-23T15:54:49.843+0900: 275848.825: [WeakReference, 41 refs, 0.0000172 secs]2022-06-23T15:54:49.843+0900: 275848.825: [FinalReference, 12455 refs, 0.0056528 secs]2022-06-23T15:54:49.849+0900: 275848.831: [PhantomReference, 0 refs, 0 refs, 0.0000100 secs]2022-06-23T15:54:49.849+0900: 275848.831: [JNI Weak Reference, 0.0007026 secs]--[PSYoungGen: 2634831K->2634831K(3932160K)] 5779886K->5780555K(7077888K), 0.7459946 secs] [Times: user=0.92 sys=0.20, real=0.75 secs] 
2022-06-23T15:54:50.572+0900: 275849.554: [Full GC (Ergonomics) 2022-06-23T15:54:50.735+0900: 275849.717: [SoftReference, 13093 refs, 0.0021188 secs]2022-06-23T15:54:50.737+0900: 275849.719: [WeakReference, 16413 refs, 0.0021708 secs]2022-06-23T15:54:50.739+0900: 275849.721: [FinalReference, 2416 refs, 0.0007444 secs]2022-06-23T15:54:50.740+0900: 275849.722: [PhantomReference, 0 refs, 1349 refs, 0.0001461 secs]2022-06-23T15:54:50.740+0900: 275849.722: [JNI Weak Reference, 0.0013205 secs][PSYoungGen: 2634831K->0K(3932160K)] [ParOldGen: 3145723K->3144544K(3145728K)] 5780555K->3144544K(7077888K), [Metaspace: 316673K->316673K(1359872K)], 0.5460908 secs] [Times: user=6.62 sys=0.01, real=0.55 secs] 
2022-06-23T15:54:51.119+0900: 275850.101: Total time for which application threads were stopped: 1.3043852 seconds, Stopping threads took: 0.0007068 seconds
2022-06-23T15:54:51.954+0900: 275850.936: [GC (Allocation Failure) 2022-06-23T15:54:51.972+0900: 275850.954: [SoftReference, 0 refs, 0.0000527 secs]2022-06-23T15:54:51.972+0900: 275850.954: [WeakReference, 38 refs, 0.0000168 secs]2022-06-23T15:54:51.972+0900: 275850.954: [FinalReference, 11670 refs, 0.0048020 secs]2022-06-23T15:54:51.977+0900: 275850.958: [PhantomReference, 0 refs, 0 refs, 0.0000101 secs]2022-06-23T15:54:51.977+0900: 275850.958: [JNI Weak Reference, 0.0007102 secs][PSYoungGen: 2621440K->12922K(3932160K)] 5765984K->3157466K(7077888K), 0.0299901 secs] [Times: user=0.35 sys=0.00, real=0.03 secs]

자세한 로그를 확인 요약한 상황은 이랬다.

서버 스펙 : 
-Xms8g
-Xmx8g
-XX:NewSize=5g
-XX:MaxNewSize=5g
-XX:SurvivorRatio=2
-XX:+UseParallelGC
-XX:+UseParallelOldGC

1. full gc가 0.5초마다 발생함
2. full gc가 발생함에도 oldGeneration 영역의 메모리가 확보되지 않음.
3. minor gc또한 매우 빈번하게 발생하여, eden 영역의 객체들이 매우 빠르게 survivor 영역으로 넘어감. 



추론

위의 상황 및 여러 글들을 참조해서 추론을 시작했다. 

요청량 자체가 최근 급증하여 gc를 수행하는 순간에도 계속해서 힙영역이 full로 차게 되는 것 같다. 
young 영역에서 eden:survivor 비율 ,  new : old 영역의 비율을 조정해보거나, GC 알고리즘을 변경해보자. 
건드려 보려는 옵션은 다음과 같았다. 
참조 : https://d2.naver.com/helloworld/37111

1. Parallel GC

-XX:NewSize
-XX:MaxNewSize
-XX:SurvivorRatio
-XX:newRatio

ParallelGC 옵션

2. G1 GC

-XX:+UseG1GC
-XX:MaxGCPauseMillis=100

large memory 어플리케이션은 g1 gc 성능이 높다는 내용이 많았다. 


https://www.site24x7.com/learn/java/choosing-garbage-collector.html

결론

사실 플랫폼 내의 서버군들은 서버 메모리를 증량하면서 G1GC로 이미 넘어간 상황이었다.
리더 님과 팀내 의논 끝에 1번 보다는 2번으로 진행해보기로 했다. 

 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   

 0.0   20480.0  0.0   20480.0 4055040.0 2396160.0 4313088.0  2978559.6  333440.0 301559.8 42624.0 35543.8    588   27.014   0      0.000   27.014
 0.0   24576.0  0.0   24576.0 4046848.0 196608.0 4317184.0  2977576.3  333440.0 301560.8 42624.0 35543.8    589   27.053   0      0.000   27.053
 0.0   24576.0  0.0   24576.0 4046848.0 1810432.0 4317184.0  2977576.3  333440.0 301560.8 42624.0 35543.8    589   27.053   0      0.000   27.053
 0.0   24576.0  0.0   24576.0 4046848.0 3661824.0 4317184.0  2977576.3  333440.0 301560.8 42624.0 35543.8    589   27.053   0      0.000   27.053
 0.0   24576.0  0.0   24576.0 4046848.0 1560576.0 4317184.0  2978842.9  333440.0 301568.2 42624.0 35543.8    590   27.092   0      0.000   27.092
 0.0   24576.0  0.0   24576.0 4046848.0 3334144.0 4317184.0  2978842.9  333440.0 301568.2 42624.0 35543.8    590   27.092   0      0.000   27.092
 0.0   24576.0  0.0   24576.0 4046848.0 1294336.0 4317184.0  2979806.6  333440.0 301619.2 42624.0 35544.9    591   27.132   0      0.000   27.132
 0.0   24576.0  0.0   24576.0 4046848.0 3194880.0 4317184.0  2979806.6  333440.0 301619.2 42624.0 35544.9    591   27.132   0      0.000   27.132
 0.0   28672.0  0.0   28672.0 4038656.0 1146880.0 4321280.0  2979939.8  333440.0 301643.3 42624.0 35544.9    592   27.170   0      0.000   27.170
 0.0   28672.0  0.0   28672.0 4038656.0 2764800.0 4321280.0  2979939.8  333440.0 301643.3 42624.0 35544.9    592   27.170   0      0.000   27.170
 0.0   28672.0  0.0   28672.0 4038656.0 671744.0 4321280.0  2977753.6  333440.0 301647.8 42624.0 35544.9    593   27.209   0      0.000   27.209
 0.0   28672.0  0.0   28672.0 4038656.0 2260992.0 4321280.0  2977753.6  333440.0 301647.8 42624.0 35544.9    593   27.209   0      0.000   27.209
 0.0   28672.0  0.0   28672.0 4038656.0 3842048.0 4321280.0  2977753.6  333440.0 301647.8 42624.0 35544.9    594   27.209   0      0.000   27.209
 0.0   24576.0  0.0   24576.0 4046848.0 2064384.0 4317184.0  2980564.3  333440.0 301662.9 42624.0 35547.2    594   27.245   0      0.000   27.245
 0.0   28672.0  0.0   28672.0 4038656.0 208896.0 4321280.0  2977347.4  333440.0 301667.0 42624.0 35547.2    595   27.283   0      0.000   27.283
 0.0   28672.0  0.0   28672.0 4038656.0 2273280.0 4321280.0  2977347.4  333440.0 301667.0 42624.0 35547.2    595   27.283   0      0.000   27.283
 0.0   28672.0  0.0   28672.0 4038656.0 372736.0 4321280.0  2979605.4  333440.0 301668.0 42624.0 35547.2    596   27.319   0      0.000   27.319
 0.0   28672.0  0.0   28672.0 4038656.0 2260992.0 4321280.0  2979605.4  333440.0 301668.0 42624.0 35547.2    596   27.319   0      0.000   27.319
 0.0   24576.0  0.0   24576.0 4042752.0 204800.0 4321280.0  2982796.6  333440.0 301668.7 42624.0 35547.2    597   27.360   0      0.000   27.360
 0.0   24576.0  0.0   24576.0 4042752.0 1998848.0 4321280.0  2982796.6  333440.0 301668.7 42624.0 35547.2    597   27.360   0      0.000   27.360
 0.0   24576.0  0.0   24576.0 4042752.0 3846144.0 4321280.0  2982796.6  333440.0 301668.7 42624.0 35547.2    598   27.360   0      0.000   27.360
 0.0   24576.0  0.0   24576.0 4042752.0 1527808.0 4321280.0  2980284.7  333440.0 301670.6 42624.0 35547.2    598   27.396   0      0.000   27.396
 0.0   24576.0  0.0   24576.0 4042752.0 3158016.0 4321280.0  2980284.7  333440.0 301670.6 42624.0 35547.2    598   27.396   0      0.000   27.396
 0.0   24576.0  0.0   24576.0 4042752.0 1163264.0 4321280.0  2980986.3  333440.0 301693.9 42624.0 35547.2    599   27.434   0      0.000   27.434

peak 시간에도 Full GC는 발생하지 않았으며, 더불어 MQ 메시지가 밀리는 상황도 어느정도 개선되었다. (Full GC 로 인한 STW가 줄었기 때문인 듯)
이정도 단계에서 문제가 해결되어서 더 조사는 해보지 않았다. 
결론은 GC 튜닝은 각각의 특수한 상황에 따라 케이스가 매우 많기 때문에, 여러 자료를 참고 정도밖에 할 수 없었고, 맨 땅에 헤딩해야 한다는 것이다. 

GC 특성상 평균 메모리 점유량이 2.X -> 4.X로 증가했다.

 

'백엔드 > java' 카테고리의 다른 글

Open Feign  (0) 2024.12.26

+ Recent posts