Fixing Init:CrashLoopBackOff on the istio-init container

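While testing Istio recently, I repeatedly found that Pods with an injected sidecar ended up in the Init:CrashLoopBackOff state after running for a while. For example: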

[root@sh-saas-k8s1-master-dev-01 ~]# kubectl get pod --all-namespaces -o wide | grep  'Init'
public-ops-tomcat-dev           public-ops-dubbo-demo-web-tomcat-dev-79f758dcf-64qwr              0/2     Init:CrashLoopBackOff   7          21h     10.253.3.166   10.12.97.23   <none>           <none>

My Kubernetes version is 1.14.10 and my Istio version is 1.5.1.
The log of the istio-init container shows the following error:

[root@sh-saas-k8s1-master-dev-01 ~]# kubectl logs -n public-ops-tomcat-dev           public-ops-dubbo-demo-web-tomcat-dev-79f758dcf-64qwr istio-init 
Environment:
------------
ENVOY_PORT=
INBOUND_CAPTURE_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_MARK=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=

Variables:
----------
PROXY_PORT=15001
PROXY_INBOUND_CAPTURE_PORT=15006
PROXY_UID=1337
PROXY_GID=1337
INBOUND_INTERCEPTION_MODE=REDIRECT
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=*
INBOUND_PORTS_EXCLUDE=15090,15020
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
OUTBOUND_PORTS_EXCLUDE=
KUBEVIRT_INTERFACES=
ENABLE_INBOUND_IPV6=false

Writing following contents to rules file:  /tmp/iptables-rules-1588923880490327697.txt562915423
* nat
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_INBOUND
-N ISTIO_OUTPUT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-port 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-port 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT

iptables-restore --noflush /tmp/iptables-rules-1588923880490327697.txt562915423
iptables-restore: line 2 failed
iptables-save 
# Generated by iptables-save v1.6.1 on Fri May  8 07:44:40 2020
*mangle
:PREROUTING ACCEPT [643414:2344563772]
:INPUT ACCEPT [643414:2344563772]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [616124:4267707048]
:POSTROUTING ACCEPT [616124:4267707048]
COMMIT
# Completed on Fri May  8 07:44:40 2020
# Generated by iptables-save v1.6.1 on Fri May  8 07:44:40 2020
*raw
:PREROUTING ACCEPT [643414:2344563772]
:OUTPUT ACCEPT [616124:4267707048]
COMMIT
# Completed on Fri May  8 07:44:40 2020
# Generated by iptables-save v1.6.1 on Fri May  8 07:44:40 2020
*nat
:PREROUTING ACCEPT [38474:2000648]
:INPUT ACCEPT [40999:2131948]
:OUTPUT ACCEPT [7987:560379]
:POSTROUTING ACCEPT [8763:600731]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A ISTIO_OUTPUT -s 127.0.0.6/32 -o lo -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
COMMIT
# Completed on Fri May  8 07:44:40 2020
# Generated by iptables-save v1.6.1 on Fri May  8 07:44:40 2020
*filter
:INPUT ACCEPT [643414:2344563772]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [616124:4267707048]
COMMIT
# Completed on Fri May  8 07:44:40 2020
panic: exit status 1

goroutine 1 [running]:
istio.io/istio/tools/istio-iptables/pkg/dependencies.(*RealDependencies).RunOrFail(0xd819c0, 0x9739b8, 0x10, 0xc00000cbc0, 0x2, 0x2)
        istio.io/istio@/tools/istio-iptables/pkg/dependencies/implementation.go:44 +0x96
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).executeIptablesRestoreCommand(0xc000109d30, 0x7faeecd9a001, 0x0, 0x0)
        istio.io/istio@/tools/istio-iptables/pkg/cmd/run.go:474 +0x3aa
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).executeCommands(0xc000109d30)
        istio.io/istio@/tools/istio-iptables/pkg/cmd/run.go:481 +0x45
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).run(0xc000109d30)
        istio.io/istio@/tools/istio-iptables/pkg/cmd/run.go:428 +0x24e2
istio.io/istio/tools/istio-iptables/pkg/cmd.glob..func1(0xd5c740, 0xc0000ee700, 0x0, 0x10)
        istio.io/istio@/tools/istio-iptables/pkg/cmd/root.go:56 +0x14e
github.com/spf13/cobra.(*Command).execute(0xd5c740, 0xc00001e130, 0x10, 0x11, 0xd5c740, 0xc00001e130)
        github.com/spf13/cobra@v0.0.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0xd5c740, 0x40574f, 0xc00009e058, 0x0)
        github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v0.0.5/command.go:864
istio.io/istio/tools/istio-iptables/pkg/cmd.Execute()
        istio.io/istio@/tools/istio-iptables/pkg/cmd/root.go:284 +0x2d
main.main()
        istio.io/istio@/tools/istio-iptables/main.go:22 +0x20

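The log shows that the istio-init container was restarted and that the iptables step failed on the second run. The failure is consistent with a re-run: line 2 of the rules file is "-N ISTIO_REDIRECT", and the iptables-save dump shows that the ISTIO_* chains already exist in the nat table. The restart itself is the odd part, because an init container normally runs to completion once and is not executed again.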
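To confirm that leftover chains are what makes the restore fail, the dump can be checked for ISTIO_* chain declarations. The following is only a sketch: list_istio_chains is a hypothetical helper name, and it assumes the input is iptables-save output like the dump in the log above.

```shell
# Reads an iptables-save dump on stdin and prints the name of every ISTIO_*
# chain that is already declared (lines such as ":ISTIO_INBOUND - [0:0]").
# The exit status is 0 if at least one chain was found and 1 otherwise.
list_istio_chains() {
    sed -n 's/^:\(ISTIO_[A-Z_]*\) .*/\1/p' | grep .
}
```

It would be fed from inside the Pod's network namespace, for example "iptables-save -t nat | list_istio_chains". If it prints any chain, a second "iptables-restore --noflush" of the same rules file is expected to fail on the first -N line.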

Deleting the Pod so that a new one is created works: the new Pod starts normally.

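A Pod can restart, causing its init containers to run again, for the following main reasons:

  • A user updates the PodSpec so that the init container image changes. A change to an app container image only restarts the app container.
  • The Pod infrastructure container is restarted. This is uncommon, but someone with root access to the node could do it.
  • All containers in the Pod are terminated while restartPolicy is set to Always, forcing a restart, and the completion record of the init container has been lost due to garbage collection.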
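Before working out which of these reasons applies, it helps to see how often each init container has actually been re-run. A minimal sketch, assuming kubectl is configured for the cluster; init_restart_counts is a hypothetical helper name.

```shell
# Prints "<init container name><TAB><restart count>" for each init container
# of a Pod, read from .status.initContainerStatuses.
# Usage: init_restart_counts <namespace> <pod>
init_restart_counts() {
    kubectl get pod "$2" -n "$1" \
        -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}
```

A non-zero count for istio-init on a Pod whose spec has not changed points away from the first reason and toward something happening on the node.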

Given these reasons, I suspected my scheduled Docker disk cleanup. The cleanup job is configured as follows:

[root@sh-saas-k8s1-node-dev-03 ~]# crontab -l
00 05 * * 4 docker system prune -a -f

So I ran docker system prune -a -f by hand once more, and sure enough the istio-init container was restarted and failed again.

A quick search showed that issues have already been filed for this:
https://github.com/istio/istio/issues/19717
https://github.com/kubernetes/kubernetes/issues/67261

The root cause is that an init container exits as soon as it has finished, so it is left on the node as a stopped container:

[root@sh-saas-k8s1-node-dev-05 ~]# docker ps -a | grep init
b23d4bfc0f52        82f719eb65c1                                                                                               "istio-iptables -p 1…"   22 hours ago        Exited (0) 22 hours ago                         k8s_istio-init_public-fe-zhan-node-dev-66dc985977-7n5rx_public-fe-node-dev_949d261c-904c-11ea-8278-5254001a47d3_0
33036e9212de        82f719eb65c1                                                                                               "istio-iptables -p 1…"   22 hours ago        Exited (0) 22 hours ago                         k8s_istio-init_public-fe-zhan-client-v2-node-dev-cbc9586cc-4wn5v_public-fe-node-dev_76e18eb4-904c-11ea-8278-5254001a47d3_0

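docker system prune -a -f removes stopped containers. When kubelet notices that the init container has been removed, it runs the init container again. As far as I can tell, this has not been fixed yet.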
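To see which containers a blanket prune would take away on a node, the stopped istio-init containers can be listed by label instead of with grep. This is a sketch that assumes the containers carry the io.kubernetes.container.name label set by kubelet; list_exited_istio_init is a hypothetical helper name.

```shell
# Lists stopped istio-init containers on this node, one "<id> <name>" per line.
# These are the containers that "docker system prune" would delete.
list_exited_istio_init() {
    docker ps -a \
        --filter "status=exited" \
        --filter "label=io.kubernetes.container.name=istio-init" \
        --format '{{.ID}} {{.Names}}'
}
```

Running it before and after a prune shows whether the init containers survived the cleanup.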

It can be worked around in either of the following ways:

docker system prune -af --volumes --filter "label!=io.kubernetes.container.name=istio-init"
# or:
docker image prune -af
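The scheduled job can be changed in the same way. The sketch below keeps the flags of the original cron entry (-a -f, without --volumes) and only adds the label filter; prune_keep_istio_init is a hypothetical helper name, and the script that cron calls would need to define and invoke it.

```shell
# Same cleanup as the original cron job, except that containers labelled as
# istio-init are excluded, so kubelet has no reason to re-run them.
prune_keep_istio_init() {
    docker system prune -a -f \
        --filter "label!=io.kubernetes.container.name=istio-init"
}
```

The filter only protects istio-init. Stopped init containers with other names would still be removed and could hit the same restart behaviour.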