CoreOS - Kubernetes pods crash after a few hours; restarting the kubelet fixes it


I'm running an insecure test Kubernetes v1.7.5 cluster in a bare-metal setup on CoreOS 1409.7.0. On the master node I've installed the API server, controller, scheduler, proxy and kubelet. On 3 other worker nodes I've installed the kubelet and proxy. Flanneld runs everywhere. All of these use the systemd service files provided in contrib/init in the k8s project.

Everything runs fine when the cluster starts up. I can deploy the dashboard and the deployments I've customized (Consul clients/servers, nginx, etc.), and they work great. However, if I leave the cluster running for a few hours and then come back, every pod is in CrashLoopBackOff and has been restarted many times. The only thing that solves the problem is restarting the kubelet on each machine. After that, the problem goes away and everything goes back to normal.
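For context on why the pods sit in CrashLoopBackOff for so long: as I understand the Kubernetes pod lifecycle docs, the kubelet waits before each restart of a failed container. The wait starts at 10s and doubles each time (10s, 20s, 40s, ...), capped at five minutes. This matches the `back-off 5m0s` in the logs below. The timer is reset once a container has run for 10 minutes without problems. The Python sketch below only models that schedule. It is an assumption based on the docs, not code taken from the kubelet, and the function name is hypothetical.

```python
def crashloop_backoff_delays(restarts, initial=10, cap=300):
    """Return the assumed wait in seconds before each of the first `restarts` restarts.

    Models exponential back-off: start at `initial` seconds, double after every
    failed restart, never exceed `cap` seconds (300s == the 5m0s in the logs).
    The kubelet's reset after 10 minutes of healthy running is not modelled.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but stay at the cap once reached
    return delays


if __name__ == "__main__":
    for i, d in enumerate(crashloop_backoff_delays(8), start=1):
        print(f"restart {i}: wait {d}s")
```

Under this model, a container reaches the 5-minute cap after its fifth failed restart. Every pod showing `back-off 5m0s` therefore means the containers have been failing repeatedly for a while, not just once.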

Logs from the kubelet after it has gone into the bad state:

sep 10 19:09:06 k8-app-2.example.com kubelet[1025]: , failed "startcontainer" "nginx-server" crashloopbackoff: "back-off 5m0s restarting failed container=nginx-server pod=nginx-deployment-617048525-mgf0v_default(f6dff9f2-95db-11e7-b533-02c75fb65df0)"
sep 10 19:09:06 k8-app-2.example.com kubelet[1025]: ]
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.286367    1025 kuberuntime_manager.go:457] container {name:nginx-server image:nginx command:[] args:[] workingdir: ports:[{name:http hostport:0 containerport:80 protocol:tcp hostip:}] envfrom:[] env:[{name:node_ip value: valuefrom:&envvarsource{fieldref:&objectfieldselector{apiversion:v1,fieldpath:status.hostip,},resourcefieldref:nil,configmapkeyref:nil,secretkeyref:nil,}} {name:pod_ip value: valuefrom:&envvarsource{fieldref:&objectfieldselector{apiversion:v1,fieldpath:status.podip,},resourcefieldref:nil,configmapkeyref:nil,secretkeyref:nil,}}] resources:{limits:map[] requests:map[]} volumemounts:[] livenessprobe:&probe{handler:handler{exec:nil,httpget:&httpgetaction{path:/,port:80,host:,scheme:http,httpheaders:[],},tcpsocket:nil,},initialdelayseconds:10,timeoutseconds:1,periodseconds:10,successthreshold:1,failurethreshold:3,} readinessprobe:nil lifecycle:nil terminationmessagepath:/dev/termination-log terminationmessagepolicy:file imagepullpolicy:always securitycontext:nil stdin:false stdinonce:false tty:false} dead, restartpolicy says should restart it.
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.286795    1025 kuberuntime_manager.go:457] container {name:regup image:registry.hub.docker.com/spunon/regup:latest command:[] args:[] workingdir: ports:[] envfrom:[] env:[{name:service_name value:nginx valuefrom:nil} {name:service_port value:80 valuefrom:nil} {name:node_ip value: valuefrom:&envvarsource{fieldref:&objectfieldselector{apiversion:v1,fieldpath:status.hostip,},resourcefieldref:nil,configmapkeyref:nil,secretkeyref:nil,}} {name:pod_ip value: valuefrom:&envvarsource{fieldref:&objectfieldselector{apiversion:v1,fieldpath:status.podip,},resourcefieldref:nil,configmapkeyref:nil,secretkeyref:nil,}}] resources:{limits:map[] requests:map[]} volumemounts:[] livenessprobe:nil readinessprobe:nil lifecycle:nil terminationmessagepath:/dev/termination-log terminationmessagepolicy:file imagepullpolicy:always securitycontext:nil stdin:false stdinonce:false tty:false} dead, restartpolicy says should restart it.
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.287071    1025 kuberuntime_manager.go:741] checking backoff container "nginx-server" in pod "nginx-deployment-617048525-mgf0v_default(f6dff9f2-95db-11e7-b533-02c75fb65df0)"
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.287376    1025 kuberuntime_manager.go:751] back-off 5m0s restarting failed container=nginx-server pod=nginx-deployment-617048525-mgf0v_default(f6dff9f2-95db-11e7-b533-02c75fb65df0)
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.287601    1025 kuberuntime_manager.go:741] checking backoff container "regup" in pod "nginx-deployment-617048525-mgf0v_default(f6dff9f2-95db-11e7-b533-02c75fb65df0)"
sep 10 19:09:07 k8-app-2.example.com kubelet[1025]: i0910 19:09:07.287863    1025 kuberuntime_manager.go:751] back-off 5m0s restarting failed container=regup pod=nginx-deployment-617048525-mgf0v_default(f6dff9f2-95db-11e7-b533-02c75fb65df0)
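To see at a glance which containers on a node are stuck in back-off, these journal lines can be summarised with a small script. This is a hypothetical helper, not part of any Kubernetes tooling. The regex is written against the `back-off ... restarting failed container=... pod=...` messages quoted above. It matches case-insensitively because the real kubelet output may be mixed-case. It would need adjusting for other kubelet versions.

```python
import re
import sys
from collections import Counter

# Assumed message shape, taken from the log lines above:
#   back-off 5m0s restarting failed container=regup pod=nginx-deployment-..._default(<uid>)
BACKOFF_RE = re.compile(
    r"back-off \S+ restarting failed container=(?P<container>[^ ]+) pod=(?P<pod>[^ ()]+)",
    re.IGNORECASE,
)


def backoff_summary(log_text):
    """Count back-off messages per (pod, container) pair in raw kubelet log text."""
    counts = Counter()
    for m in BACKOFF_RE.finditer(log_text):
        counts[(m.group("pod"), m.group("container"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: journalctl -u kubelet --no-pager > kubelet.log
    #          python3 backoff_summary.py kubelet.log
    with open(sys.argv[1]) as fh:
        for (pod, container), n in backoff_summary(fh.read()).most_common():
            print(f"{n:4d}  {pod}  {container}")
```

Running this on each node before and after restarting the kubelet would show whether the back-off counts stop growing once the kubelet comes back.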

Edit: here are the kubelet logs from when the issue seems to start:

