跳转至

02 openshift 部分节点无法exec 进入后台和不能进入日志界面

openshift 不能exec 进入后台, 和不能logs 查看日志

问题描述

openshift 不能进入后台, 也不能查看日志, 具体报错有如下几类

1、 get 日志时有报错websocket 错误

WebSocket connection to 'wss://console-openshift-console.apps.example.com/api/kubernetes/api/v1/namespaces/openshift-console/pods/console-79b6c7bb87-gt2ck/log?container=console&follow=true&tailLines=1000&x-csrf-token=ESx4l2bhkAyUQ8nx9f0%2FmA3qThlJEI6IOptYX2N%2FSPBDwcQuQ1K91DDjT0I3J99QYF4rogNwgleVtq6FV%2BkL7Q%3D%3D' failed: Error during WebSocket handshake: Unexpected response code: 500

2、 在命令行中, exec, logs 和rsh 工具都在报错

$ oc logs console-79b6c7bb87-gt2ck

Error from server: Get https://master0.example.com:10250/containerLogs/openshift-console/console-79b6c7bb87-gt2ck/console: remote error: tls: internal error

3、 在集群安装好之后有大量的csr在pending

$ oc exec marketplace-operator-768b99959-9pftm -n openshift-marketplace -- echo foo
Error from server: error dialing backend: remote error: tls: internal error

$ oc logs marketplace-operator-768b99959-9pftm -n openshift-marketplace
Error from server: Get https://master:10250/containerLogs/openshift-marketplace/marketplace-operator-768b99959-9pftm/marketplace-operator: remote error: tls: internal error

4、kube-apiserver 也会有一些日志报错信息

$ sudo crictl ps | grep kube-api
239ec13eeaf4e       beaf65fce4dc16947c5bd5d1ca7e16313234c393e8ca1c4251ac9b85094972bb   About an hour ago   Running             kube-apiserver-operator                   3                   bd197ceb6f882
6f2bdcab072ca       beaf65fce4dc16947c5bd5d1ca7e16313234c393e8ca1c4251ac9b85094972bb   About an hour ago   Running             kube-apiserver-cert-syncer-8              1                   6938a6ebc2c3d
e6b9db2994d07       0d8dcfc307048a0f0400e644fcd1c9929018103b15d0f9b23b4841f1e71937bc   About an hour ago   Running             kube-apiserver-8                          1                   6938a6ebc2c3d

$ sudo crictl logs e6b9db2994d07
...
E0725 17:38:54.707552       1 status.go:64] apiserver received an error that is not an metav1.Status: &url.Error{Op:"Get", URL:"https://master:10250/containerLogs/openshift-kube-apiserver/kube-apiserver-master/kube-apiserver-8", Err:(*net.OpError)(0xc01ec89270)}
...

解决思路

1、 批准所有pending的csr

$ oc get csr -o name | xargs oc adm certificate approve

根因

1、This has been noted in related Bugzilla: BZ-1733331

诊断步骤

检查所有的pending 的csr 和没有批准的csr ;

$ oc get csr
NAME        AGE     REQUESTOR                          CONDITION
csr-22bmr   4h45m   system:node:node0.example.com      Pending
csr-22dln   21h     system:node:master1.example.com    Pending