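Earlier, in "A First Taste of Kubernetes", we used Minikube to try Kubernetes out quickly. Then, in "A Brief Summary of Kubernetes Architecture and Resource Relationships", we outlined the Kubernetes architecture and some of its key terms and concepts, also known as resources or objects. This article covers a bare-bones way of deploying Kubernetes. Kubernetes has become progressively easier to deploy since its early days. There are three common approaches: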
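- The simplest is Minikube. Download a single binary and you have a single-node Kubernetes, and it supports all major platforms.
- Install from source. This also needs only a little configuration, after which running kube-up.sh deploys a Kubernetes cluster. See the official document "Manually Deploying Kubernetes on Ubuntu Nodes". PS: at the time of writing, following that document to deploy Kubernetes 1.5.3 runs into some problems; see issue #39224.
- Deploy with kubeadm. See the official document "Installing Kubernetes on Linux with kubeadm".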
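Beyond these three approaches, some Linux distributions already ship Kubernetes packages. On CentOS 7, for example, running yum install -y etcd kubernetes installs Kubernetes, and a bit of configuration completes the deployment. I expect that a company like Google, which pursues automation and intelligence, will make deploying Kubernetes even simpler. None of that is the focus here, though. This article shows how to deploy Kubernetes one module at a time, like stacking building blocks. Why do it this way?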
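To understand and learn Kubernetes better. We have already gone over the Kubernetes architecture and know that it consists of a handful of modules that cooperate to form a cluster. Today's simplified deployment methods hide many details, so we get much less of a feel for the individual modules. It is also easy to assume that the internals of a Kubernetes deployment are tedious or complicated, but that is not the case: a Kubernetes cluster is made up of just a few binaries, and deploying a basic cluster is not hard. Because they are written in Go, these binaries have no external dependencies and work as soon as they are copied over from somewhere else. This article shows how to build a Kubernetes cluster from these binaries in order to deepen our understanding of Kubernetes. The other deployment methods are, after all, just wrappers around this one.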
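Systemd is gradually replacing Upstart, and some deployment methods only support Linux distributions that use Systemd; with Upstart you would have to adapt them. What Systemd and Upstart are is outside the scope of this article; I will write that up later. The distribution I use here is Ubuntu 16.04. Ubuntu 15.04 and later all use Systemd, so they should work too, as should other Systemd-based systems, possibly with small changes. Also, as described earlier, a Kubernetes cluster is divided into Master and Node, so the deployment is likewise split into deploying the Master and deploying the Node.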
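My environment is two Ubuntu 16.04 virtual machines running in VirtualBox. The VMs are networked with both NAT and host-only adapters: NAT for reaching the internet, host-only for traffic between the two VMs. Their IPs are 192.168.56.101 and 192.168.56.102. The 101 machine is both Master and Node; 102 is a Node only. This article only sets up 101; 102 will be installed later, when testing things such as networking that need more than one machine. Because a Node in Kubernetes registers itself with the Master (through the kubelet on the Node), adding Nodes is easy.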
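Getting the binaries
Which binaries do we need? Recall from "A Brief Summary of Kubernetes Architecture and Resource Relationships" that a Kubernetes cluster mainly consists of these modules. On the Master: APIServer, scheduler, controller manager, etcd. On the Node: kubelet, kube-proxy, runtime (Docker here).
Each of these modules is implemented by a single binary, so we need the binary for each of them. There are many ways to get them. The most obvious is to download the release package from GitHub, which contains the binaries. That package is over 1 GB, though, which is painful for users in China; fortunately there are other ways.
Note: the source tree also contains many files with the same names as the binaries. These are not binaries but shell scripts without a file extension. They are only kilobytes in size, whereas the real binaries are megabytes, so be careful not to confuse them.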
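Besides comparing file sizes, the two kinds of file can also be told apart by their first bytes. The following is a minimal sketch, not part of the original setup; is_elf_binary is a hypothetical helper name, and it relies only on the fact that Linux executables in ELF format start with the bytes 0x7f 'E' 'L' 'F'.

```shell
#!/usr/bin/env bash
# Sketch: tell a real (ELF) binary from a same-named wrapper script.
# is_elf_binary is a hypothetical helper name, not part of Kubernetes.
is_elf_binary() {
    local path="$1"
    # ELF executables start with the 4 bytes 0x7f 'E' 'L' 'F';
    # a wrapper script starts with "#!" instead.
    [ "$(head -c 4 "$path" 2>/dev/null)" = "$(printf '\177ELF')" ]
}
```

For example, is_elf_binary /opt/bin/kubelet && echo "real binary" prints the message only when the file is an actual executable.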
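I recommend downloading the kubernetes-server-linux-amd64.tar.gz package with the following command (set KUBE_VERSION to the release you want first, without the leading v):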
curl -L https://storage.googleapis.com/kubernetes-release/release/v${KUBE_VERSION}/kubernetes-server-linux-amd64.tar.gz -o kubernetes-server-linux-amd64.tar.gz
After extracting this package, the kubernetes/server/bin directory contains the binaries we need (only 6 of them are used here):
ubuntu➜ bin ll
total 1.3G
-rwxr-x--- 1 root root 145M Dec 14 09:06 hyperkube
-rwxr-x--- 1 root root 118M Dec 14 09:06 kube-apiserver
-rw-r----- 1 root root 33 Dec 14 09:06 kube-apiserver.docker_tag
-rw-r----- 1 root root 119M Dec 14 09:06 kube-apiserver.tar
-rwxr-x--- 1 root root 97M Dec 14 09:06 kube-controller-manager
-rw-r----- 1 root root 33 Dec 14 09:06 kube-controller-manager.docker_tag
-rw-r----- 1 root root 98M Dec 14 09:06 kube-controller-manager.tar
-rwxr-x--- 1 root root 6.6M Dec 14 09:06 kube-discovery
-rwxr-x--- 1 root root 44M Dec 14 09:05 kube-dns
-rwxr-x--- 1 root root 44M Dec 14 09:05 kube-proxy
-rw-r----- 1 root root 33 Dec 14 09:06 kube-proxy.docker_tag
-rw-r----- 1 root root 174M Dec 14 09:06 kube-proxy.tar
-rwxr-x--- 1 root root 51M Dec 14 09:06 kube-scheduler
-rw-r----- 1 root root 33 Dec 14 09:06 kube-scheduler.docker_tag
-rw-r----- 1 root root 52M Dec 14 09:06 kube-scheduler.tar
-rwxr-x--- 1 root root 91M Dec 14 09:06 kubeadm
-rwxr-x--- 1 root root 49M Dec 14 09:06 kubectl
-rwxr-x--- 1 root root 46M Dec 14 09:06 kubefed
-rwxr-x--- 1 root root 103M Dec 14 09:06 kubelet
I put these binaries in /opt/bin and added that directory to PATH. You can also put them directly in a directory that is already on the system PATH, such as /usr/bin.
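The copy step can be sketched as a small function. This is only an illustration: install_kube_binaries is a hypothetical helper, and the list of six binaries is my assumption about which ones the rest of the article uses (the five services plus kubectl).

```shell
#!/usr/bin/env bash
# Sketch: copy the binaries used in this article into a target directory.
# install_kube_binaries and the list below are illustrative assumptions.
KUBE_BINARIES="kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy kubectl"

install_kube_binaries() {
    local src="$1" dest="$2" name
    mkdir -p "$dest" || return 1
    for name in $KUBE_BINARIES; do
        # Fail early if the extracted layout is not what we expect.
        [ -f "$src/$name" ] || { echo "missing: $src/$name" >&2; return 1; }
        cp "$src/$name" "$dest/$name" && chmod 755 "$dest/$name" || return 1
    done
}
```

A typical call would be install_kube_binaries kubernetes/server/bin /opt/bin, followed by export PATH=/opt/bin:$PATH in your shell profile.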
OK, with these binaries in hand, we can start deploying.
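Deploying the Master
As described earlier, the Master runs four main modules: APIServer, scheduler, controller manager and etcd. Let's deploy them one by one.
Deploying etcd
I suggest installing it with apt install etcd, which also installs etcdctl. After installation, etcd stores its data in /var/lib/etcd/default by default and its default configuration file is /etc/default/etcd; both can be changed through the /lib/systemd/system/etcd.service file.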
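Update, 2017-09-04
Newer Kubernetes versions (starting with 1.6, if I remember correctly) no longer support etcd 2.x, but apt install on Ubuntu 16.04 installs version 2.2. As a result the API server cannot talk to etcd, which causes various problems. I therefore recommend downloading the latest etcd 3.x from GitHub (https://github.com/coreos/etcd/releases) and installing it manually. Create the file /lib/systemd/system/etcd.service: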
[Unit]
Description=Etcd Server
Documentation=https://github.com/coreos/etcd
After=network.target
[Service]
User=root
Type=simple
EnvironmentFile=-/etc/default/etcd
# Change the path below to your own etcd location
ExecStart=/opt/k8s/v1_6_9/etcd-v3.2.7-linux-amd64/etcd
Restart=on-failure
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
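Once it is installed, run the following commands:
# Reload the systemd configuration. After adding a `*.service` file, always run this first; otherwise starting the service will fail
systemctl daemon-reload
systemctl enable etcd.service # start etcd at boot
systemctl start etcd.service # start etcd now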
After installation, etcd listens on http://127.0.0.1:2379 by default for client connections. We can use etcdctl to check that etcd started correctly:
ubuntu➜ bin etcdctl cluster-health
member ce2a822cea30bfca is healthy: got healthy result from http://localhost:2379
cluster is healthy
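It is running normally. For a multi-machine deployment, however, components on other machines may need to reach etcd (within Kubernetes itself, it is the API server that talks to etcd), so etcd must listen on an IP they can reach. Add the following two lines to /etc/default/etcd. The advertised URL should be an address clients can actually connect to, so it uses this machine's own IP rather than 0.0.0.0:
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.101:2379"
Restart etcd and it will listen on all IPs.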
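The change to the etcd environment file can be sketched as a function. configure_etcd_client_urls is a hypothetical helper, and advertising the host's own address instead of 0.0.0.0 is my assumption about what clients need; adjust the IP to your environment.

```shell
#!/usr/bin/env bash
# Sketch: append client-URL settings to an etcd environment file.
# configure_etcd_client_urls is a hypothetical helper name.
configure_etcd_client_urls() {
    local env_file="$1" advertise_ip="$2"
    # Listen on every interface, but advertise one reachable address.
    {
        echo "ETCD_LISTEN_CLIENT_URLS=\"http://0.0.0.0:2379\""
        echo "ETCD_ADVERTISE_CLIENT_URLS=\"http://${advertise_ip}:2379\""
    } >> "$env_file"
}
```

On the Master this would be called as configure_etcd_client_urls /etc/default/etcd 192.168.56.101, followed by systemctl restart etcd.service. From another machine, curl http://192.168.56.101:2379/version should then return the etcd version.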
Deploying the APIServer
The binary for the APIServer is kube-apiserver. Let's first set up its systemd service file, /lib/systemd/system/kube-apiserver.service:
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes
After=etcd.service
Wants=etcd.service
[Service]
EnvironmentFile=/etc/kubernetes/apiserver
ExecStart=/opt/bin/kube-apiserver $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
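A brief explanation of the key items:
- The kube-apiserver service depends on etcd, so After is set.
- EnvironmentFile is the configuration file for the service.
- ExecStart specifies how to start the service.
The startup arguments of kube-apiserver are $KUBE_API_ARGS. Create the configuration file /etc/kubernetes/apiserver and define this environment variable in it: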
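KUBE_API_ARGS="--etcd-servers=http://127.0.0.1:2379 --insecure-bind-address=0.0.0.0 --insecure-port=8080 --service-cluster-ip-range=169.169.0.0/16 --service-node-port-range=1-65535 --admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ResourceQuota --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
Option descriptions:
- --etcd-servers: the address of etcd.
- --insecure-bind-address: the IP address on which the apiserver serves its insecure port; 0.0.0.0 binds all addresses.
- --insecure-port: the insecure port the apiserver binds on the host; the default is 8080.
- --service-cluster-ip-range: the range of virtual IP addresses for Services in the cluster, in CIDR notation. This range must not overlap with the real IP ranges of the physical machines.
- --service-node-port-range: the range of host ports that Services in the cluster can be mapped to; the default is 30000~32767.
- --admission-control: the admission control settings of the cluster; the control modules are plugins that take effect in the order listed.
- --logtostderr: false means logs are written to files rather than to stderr.
- --log-dir: the log directory.
- --v: the log level.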
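OK, the APIServer deployment is now configured. It comes down to two parts:
- Create the systemd service file. With this file in place, systemd can control the service: start and stop it, enable it at boot, and so on. Systemd commands and syntax will be covered in a later article.
- Create the module's configuration file, which controls how the module starts and what features it enables.
The other modules below are configured in much the same way.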
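Since every module follows this two-part pattern, it can be sketched as one function that writes both files. write_kube_service and its parameters are hypothetical names for illustration; the generated unit is a trimmed version of the ones shown in this article, not a replacement for them.

```shell
#!/usr/bin/env bash
# Sketch of the two-part pattern: a unit file plus an environment file.
# write_kube_service and its arguments are hypothetical, for illustration.
# Usage: write_kube_service UNIT_FILE ENV_FILE BINARY AFTER VAR ARGS
write_kube_service() {
    local unit_file="$1" env_file="$2" binary="$3" after="$4" var="$5" args="$6"
    mkdir -p "$(dirname "$unit_file")" "$(dirname "$env_file")" || return 1
    # Part 1: the systemd unit, which only references the variable.
    cat > "$unit_file" <<EOF
[Unit]
Description=Kubernetes $binary
After=$after

[Service]
EnvironmentFile=$env_file
ExecStart=/opt/bin/$binary \$$var
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
    # Part 2: the environment file that defines the variable.
    printf '%s="%s"\n' "$var" "$args" > "$env_file"
}
```

For instance, write_kube_service /lib/systemd/system/kube-scheduler.service /etc/kubernetes/scheduler kube-scheduler kube-apiserver.service KUBE_SCHEDULER_ARGS "--master=http://192.168.56.101:8080 --v=2" would produce a scheduler unit and its configuration file. Keeping the arguments in a separate file means they can be changed without touching the unit.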
Deploying the controller manager
The binary for the controller manager is kube-controller-manager, and the service depends on kube-apiserver.
As before, first configure the systemd service file, /lib/systemd/system/kube-controller-manager.service:
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes
After=kube-apiserver.service
Requires=kube-apiserver.service
[Service]
EnvironmentFile=/etc/kubernetes/controller-manager
ExecStart=/opt/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Set $KUBE_CONTROLLER_MANAGER_ARGS in /etc/kubernetes/controller-manager:
KUBE_CONTROLLER_MANAGER_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
--master is the address of the APIServer.
Deploying the scheduler
The binary for the scheduler is kube-scheduler, and the scheduler depends on the APIServer.
Configure the systemd service file, /lib/systemd/system/kube-scheduler.service:
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes
After=kube-apiserver.service
Requires=kube-apiserver.service
[Service]
EnvironmentFile=/etc/kubernetes/scheduler
ExecStart=/opt/bin/kube-scheduler $KUBE_SCHEDULER_ARGS
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Set $KUBE_SCHEDULER_ARGS in the configuration file /etc/kubernetes/scheduler:
KUBE_SCHEDULER_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
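At this point all four Master modules are deployed. We start them in order and enable them at boot:
# Reload the systemd configuration. After adding a `*.service` file, always run this first; otherwise starting the service will fail
systemctl daemon-reload
# enable makes the service start at boot; start starts it now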
systemctl enable kube-apiserver.service
systemctl start kube-apiserver.service
systemctl enable kube-controller-manager.service
systemctl start kube-controller-manager.service
systemctl enable kube-scheduler.service
systemctl start kube-scheduler.service
Then run systemctl status <service_name> for each service to verify its state; "running" means it started successfully. If it did not, the output also shows the error log.
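When a service fails before the binary even runs, systemctl status shows a numeric status such as status=203/EXEC. The sketch below maps a few of these codes, which are documented in systemd.exec(5), to their usual cause, and loops over the Master services. Both function names are hypothetical helpers, and the hints are my own summaries.

```shell
#!/usr/bin/env bash
# Sketch: translate common systemd exit statuses into a likely cause.
# explain_systemd_status is a hypothetical helper; the codes themselves
# are documented in systemd.exec(5).
explain_systemd_status() {
    case "$1" in
        200) echo "CHDIR: could not change to WorkingDirectory= (does it exist?)" ;;
        203) echo "EXEC: the ExecStart= binary is missing or not executable" ;;
        217) echo "USER: the configured User= could not be resolved" ;;
        *)   echo "other: check journalctl -u <service_name>" ;;
    esac
}

# Report the state of each Master service (only where systemctl exists).
check_master_services() {
    local unit
    command -v systemctl >/dev/null 2>&1 || return 0
    for unit in etcd kube-apiserver kube-controller-manager kube-scheduler; do
        echo "$unit: $(systemctl is-active "$unit.service")"
    done
}
```

Running check_master_services on the Master should print one line per service; any line that is not "active" is the place to start reading logs.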
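Deploying the Node
The Node runs three modules: kubelet, kube-proxy and a runtime. The runtime currently means docker or rkt; we use docker here. Installing docker is not covered in this article; it is best to install the latest version.
Deploying kubelet
The binary for kubelet is kubelet, and it depends on the Docker service.
Configure the systemd service file, /lib/systemd/system/kubelet.service: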
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=/etc/kubernetes/kubelet
ExecStart=/opt/bin/kubelet $KUBELET_ARGS
Restart=on-failure
[Install]
WantedBy=multi-user.target
Set $KUBELET_ARGS in the configuration file /etc/kubernetes/kubelet:
KUBELET_ARGS="--api-servers=http://192.168.56.101:8080 --hostname-override=192.168.56.101 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
Here --hostname-override sets the name of this Node.
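The unit file sets WorkingDirectory=/var/lib/kubelet, and systemd refuses to start the service (status=200/CHDIR) when that directory does not exist. A minimal sketch for preparing the directories follows; prepare_kubelet_dirs is a hypothetical helper, and creating the --log-dir target as well is my assumption that it is not created automatically.

```shell
#!/usr/bin/env bash
# Sketch: create the directories the kubelet unit above relies on.
# prepare_kubelet_dirs is a hypothetical helper; the optional ROOT
# argument exists only so the function can be tried in a scratch directory.
prepare_kubelet_dirs() {
    local root="${1:-/}"
    # WorkingDirectory= from kubelet.service; systemd fails with
    # status=200/CHDIR when it is missing.
    mkdir -p "${root%/}/var/lib/kubelet" || return 1
    # Target of --log-dir (assumption: it is not created automatically).
    mkdir -p "${root%/}/var/log/kubernetes" || return 1
}
```

Run it as root with no argument before starting kubelet.service.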
Deploying kube-proxy
The binary for kube-proxy is kube-proxy, and the service depends on the network service.
Configure the systemd service file, /lib/systemd/system/kube-proxy.service:
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/kubernetes
After=network.target
Requires=network.target
[Service]
EnvironmentFile=/etc/kubernetes/proxy
ExecStart=/opt/bin/kube-proxy $KUBE_PROXY_ARGS
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Set $KUBE_PROXY_ARGS in the configuration file /etc/kubernetes/proxy:
KUBE_PROXY_ARGS="--master=http://192.168.56.101:8080 --logtostderr=false --log-dir=/var/log/kubernetes --v=2"
Then start the services on the Node in order (Docker is enabled at boot and already running after installation, so it is not started here):
systemctl daemon-reload
systemctl enable kubelet.service
systemctl start kubelet.service
systemctl enable kube-proxy.service
systemctl start kube-proxy.service
Once the services are up, kubelet registers its Node with the Master on its own. If all services started successfully, the Node shows up as available:
ubuntu➜ system kubectl get node
NAME STATUS AGE
192.168.56.101 Ready 1h
Deploy the same services on the other Node and you will see two nodes.
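For the second Node, only the kubelet configuration differs: --api-servers still points at the Master, while --hostname-override becomes the new Node's own address. A small sketch, where kubelet_args_line is a hypothetical helper and the flags simply mirror the ones used for 192.168.56.101 (readers report that --api-servers was removed in later Kubernetes versions):

```shell
#!/usr/bin/env bash
# Sketch: build the KUBELET_ARGS line for an additional Node.
# kubelet_args_line is a hypothetical helper; the flags mirror the ones
# used for 192.168.56.101 in this article.
kubelet_args_line() {
    local master_ip="$1" node_ip="$2"
    printf 'KUBELET_ARGS="--api-servers=http://%s:8080 --hostname-override=%s --logtostderr=false --log-dir=/var/log/kubernetes --v=2"\n' \
        "$master_ip" "$node_ip"
}
```

On 192.168.56.102 this could be used as kubelet_args_line 192.168.56.101 192.168.56.102 > /etc/kubernetes/kubelet. The kube-proxy configuration can be copied unchanged, since its --master already points at 192.168.56.101.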
That concludes this article. For production use, however, there are still security and network settings to configure, which will be covered later.
[root@k8s-master1 etcd-v3.3.10]# systemctl status kube-apiserver.service
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since 四 2018-12-13 01:02:42 CST; 1min 5s ago
12月 13 01:02:41 k8s-master1 systemd[1]: kube-apiserver.service failed to run 'start' task: Is a directory
12月 13 01:02:41 k8s-master1 systemd[1]: Failed to start Kubernetes API Server.
12月 13 01:02:41 k8s-master1 systemd[1]: Unit kube-apiserver.service entered failed state.
12月 13 01:02:41 k8s-master1 systemd[1]: kube-apiserver.service failed.
12月 13 01:02:41 k8s-master1 systemd[1]: Starting Kubernetes API Server...
12月 13 01:02:42 k8s-master1 systemd[1]: kube-apiserver.service holdoff time over, scheduling restart.
12月 13 01:02:42 k8s-master1 systemd[1]: start request repeated too quickly for kube-apiserver.service
12月 13 01:02:42 k8s-master1 systemd[1]: Failed to start Kubernetes API Server.
12月 13 01:02:42 k8s-master1 systemd[1]: Unit kube-apiserver.service entered failed state.
12月 13 01:02:42 k8s-master1 systemd[1]: kube-apiserver.service failed.
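What is the problem, and how can it be fixed?
There are no logs; this output alone is not enough to tell what went wrong...
@all Because I changed jobs, I have not worked with Kubernetes for a long time. K8s moves very fast, so some of the approaches in this article may no longer apply to newer versions. Since I have not tried the newer versions, I cannot answer many of the questions. Sorry.
After starting etcd it ends up like this and just stays there without moving!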
2018-03-27 17:10:59.476632 I | etcdmain: etcd Version: 3.0.4
2018-03-27 17:10:59.477839 I | etcdmain: Git SHA: d53923c
2018-03-27 17:10:59.478332 I | etcdmain: Go Version: go1.6.3
2018-03-27 17:10:59.478870 I | etcdmain: Go OS/Arch: linux/amd64
2018-03-27 17:10:59.479432 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2018-03-27 17:10:59.480142 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2018-03-27 17:10:59.481491 I | etcdmain: listening for peers on http://localhost:2380
2018-03-27 17:10:59.482292 I | etcdmain: listening for client requests on localhost:2379
2018-03-27 17:10:59.487576 I | etcdserver: name = default
2018-03-27 17:10:59.488298 I | etcdserver: data dir = default.etcd
2018-03-27 17:10:59.488821 I | etcdserver: member dir = default.etcd/member
2018-03-27 17:10:59.489250 I | etcdserver: heartbeat = 100ms
2018-03-27 17:10:59.489669 I | etcdserver: election = 1000ms
2018-03-27 17:10:59.490900 I | etcdserver: snapshot count = 10000
2018-03-27 17:10:59.491330 I | etcdserver: advertise client URLs = http://localhost:2379
2018-03-27 17:10:59.491860 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-03-27 17:10:59.492152 I | etcdserver: initial cluster = default=http://localhost:2380
2018-03-27 17:10:59.496832 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-03-27 17:10:59.497220 I | raft: 8e9e05c52164694d became follower at term 0
2018-03-27 17:10:59.497466 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-03-27 17:10:59.497709 I | raft: 8e9e05c52164694d became follower at term 1
2018-03-27 17:10:59.505315 I | etcdserver: starting server... [version: 3.0.4, cluster version: to_be_decided]
2018-03-27 17:10:59.506764 I | membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-03-27 17:10:59.699176 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-03-27 17:10:59.699880 I | raft: 8e9e05c52164694d became candidate at term 2
2018-03-27 17:10:59.700375 I | raft: 8e9e05c52164694d received vote from 8e9e05c52164694d at term 2
2018-03-27 17:10:59.700841 I | raft: 8e9e05c52164694d became leader at term 2
2018-03-27 17:10:59.701247 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-03-27 17:10:59.702513 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
2018-03-27 17:10:59.703305 I | etcdmain: ready to serve client requests
2018-03-27 17:10:59.704032 I | etcdserver: setting up the initial cluster version to 3.0
2018-03-27 17:10:59.704782 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-03-27 17:10:59.708561 N | membership: set the initial cluster version to 3.0
2018-03-27 17:10:59.711522 I | api: enabled capabilities for version 3.0
2018-03-27 17:10:59.712014 N | etcdmain: serving insecure client requests on localhost:2379, this is strongly discouraged!
When installing etcd manually, is it enough to just run ./etcd after downloading it? It hangs here:
2018-03-27 17:05:45.947682 I | etcdmain: etcd Version: 3.0.4
2018-03-27 17:05:45.949403 I | etcdmain: Git SHA: d53923c
2018-03-27 17:05:45.949915 I | etcdmain: Go Version: go1.6.3
2018-03-27 17:05:45.950393 I | etcdmain: Go OS/Arch: linux/amd64
2018-03-27 17:05:45.950793 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2018-03-27 17:05:45.951203 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2018-03-27 17:05:45.951659 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-03-27 17:05:45.952481 I | etcdmain: listening for peers on http://localhost:2380
2018-03-27 17:05:45.953019 I | etcdmain: listening for client requests on localhost:2379
2018-03-27 17:05:45.954994 I | etcdserver: name = default
2018-03-27 17:05:45.955506 I | etcdserver: data dir = default.etcd
2018-03-27 17:05:45.955919 I | etcdserver: member dir = default.etcd/member
2018-03-27 17:05:45.956340 I | etcdserver: heartbeat = 100ms
2018-03-27 17:05:45.956743 I | etcdserver: election = 1000ms
2018-03-27 17:05:45.957158 I | etcdserver: snapshot count = 10000
2018-03-27 17:05:45.957566 I | etcdserver: advertise client URLs = http://localhost:2379
2018-03-27 17:05:45.959619 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 419
2018-03-27 17:05:45.960496 I | raft: 8e9e05c52164694d became follower at term 4
2018-03-27 17:05:45.960948 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 4, commit: 419, applied: 0, lastindex: 419, lastterm: 4]
2018-03-27 17:05:45.963913 I | etcdserver: starting server... [version: 3.0.4, cluster version: to_be_decided]
2018-03-27 17:05:45.965996 I | membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-03-27 17:05:45.966871 N | membership: set the initial cluster version to 3.0
2018-03-27 17:05:45.967384 I | api: enabled capabilities for version 3.0
2018-03-27 17:05:46.762603 I | raft: 8e9e05c52164694d is starting a new election at term 4
2018-03-27 17:05:46.762756 I | raft: 8e9e05c52164694d became candidate at term 5
2018-03-27 17:05:46.762788 I | raft: 8e9e05c52164694d received vote from 8e9e05c52164694d at term 5
2018-03-27 17:05:46.762838 I | raft: 8e9e05c52164694d became leader at term 5
2018-03-27 17:05:46.762870 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 5
2018-03-27 17:05:46.764991 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster cdf818194e3a8c32
2018-03-27 17:05:46.768356 I | etcdmain: ready to serve client requests
2018-03-27 17:05:46.771749 N | etcdmain: serving insecure client requests on localhost:2379, this is strongly discouraged!
2018-03-27 17:05:46.776294 E | etcdmain: forgot to set Type=notify in systemd service file?
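I deployed by following the article above and have a few questions. 1. After you edit apiserver.service, where does the corresponding configuration file under /etc/kubernetes come from? I copied one over from elsewhere. 2. After finishing the configuration and starting the system services, the apiserver, controller-manager and scheduler services all failed. /var/log/messages shows the following errors: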
Failed at step USER spawning /usr/bin/kube-apiserver: No such process
Mar 15 15:18:08 localhost systemd: kube-apiserver.service: main process exited, code=exited, status=217/USER
Mar 15 15:18:08 localhost systemd: Failed to start Kubernetes API Server.
Mar 15 15:18:08 localhost systemd: Unit kube-apiserver.service entered failed state.
Mar 15 15:18:08 localhost systemd: kube-apiserver.service failed.
Mar 15 15:18:09 localhost systemd: kube-apiserver.service holdoff time over, scheduling restart.
What is going on here?
When my k8s cluster starts, kubectl get nodes works normally, but the Kubernetes controller manager shows "kube-controller-manager: Start request repeated too quickly. Failed to start Kubernetes Controller Manager". As a result, pods cannot be scheduled onto the nodes. Is there a fix?
kubelet fails to start:
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since 五 2017-12-22 15:12:35 CST; 4s ago
Process: 2363 ExecStart=/opt/bin/kubelet $KUBELET_ARGS (code=exited, status=203/EXEC)
Main PID: 2363 (code=exited, status=203/EXEC)
12月 22 15:12:34 node1 systemd[1]: kubelet.service: Unit entered failed state.
12月 22 15:12:34 node1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
12月 22 15:12:35 node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
12月 22 15:12:35 node1 systemd[1]: Stopped Kubernetes Kubelet Server.
12月 22 15:12:35 node1 systemd[1]: kubelet.service: Start request repeated too quickly.
12月 22 15:12:35 node1 systemd[1]: Failed to start Kubernetes Kubelet Server.
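Did you solve this problem? I ran into it too!
Hi. From my own experiments and the documentation, it seems the kubelet --api-servers flag was removed in later versions. I do not know from which version, but 1.8.2, which I tried, no longer accepts it. It seems you now point --kubeconfig at a config file, which is generated by kubectl config. That is my understanding, but I have not managed to verify it... I suspect the commenters above are also failing because this service did not start.
I checked; mine is version 1.8.11.
Yes, I am stuck at this step too, but my error message is:
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled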
Drop-In: /etc/systemd/system/kubelet.service.d
Active: inactive (dead) (Result: exit-code) since 四 2018-04-12 22:37:43 CST; 59s ago
I do not know which version it is; I just downloaded the latest Kubernetes. I hope someone can help. Many thanks.
root@master:~# systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since Tue 2017-12-05 22:51:37 CST; 3s ago
Process: 3148 ExecStart=/opt/bin/kubelet $KUBELET_ARGS (code=exited, status=2)
Main PID: 3148 (code=exited, status=2)
Dec 05 22:51:37 master kubelet[3148]: --tls-private-key-file string File containing x509 private key matching --tls-cert-file.
Dec 05 22:51:37 master kubelet[3148]: -v, --v Level log level for V logs
Dec 05 22:51:37 master kubelet[3148]: --version version[=true] Print version information and quit
Dec 05 22:51:37 master kubelet[3148]: --vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
Dec 05 22:51:37 master kubelet[3148]: --volume-plugin-dir string <Warning: Alpha feature> The full path of the directory in which to sea
Dec 05 22:51:37 master kubelet[3148]: --volume-stats-agg-period duration Specifies interval for kubelet to calculate and cache the volume disk u
Dec 05 22:51:37 master systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Dec 05 22:51:37 master systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 05 22:51:37 master systemd[1]: kubelet.service: Start request repeated too quickly.
Dec 05 22:51:37 master systemd[1]: Failed to start Kubernetes Kubelet Server.
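Check whether swap is still enabled. Run swapoff -a and then try starting it again.
Check KUBELET_ARGS. It looks like a problem with the arguments; some of them are not recognized.
Following the author's article to install Kubernetes, I keep getting the following error from kubelet. Any pointers would be appreciated, thanks!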
systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since Thu 2017-11-30 10:15:32 CST; 37s ago
Main PID: 1615 (code=exited, status=200/CHDIR)
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:32 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:32 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:32 u3 systemd[1]: kubelet.service: Start request repeated too quickly.
Nov 30 10:15:32 u3 systemd[1]: Failed to start Kubernetes Kubelet Server.
cat /var/log/syslog
Nov 30 10:15:30 u3 systemd[1602]: kubelet.service: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory
Nov 30 10:15:30 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:30 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:30 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:31 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1]: Started Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1606]: kubelet.service: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:31 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1]: Started Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1609]: kubelet.service: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:31 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1]: Started Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1612]: kubelet.service: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:31 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1]: Started Kubernetes Kubelet Server.
Nov 30 10:15:31 u3 systemd[1615]: kubelet.service: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Main process exited, code=exited, status=200/CHDIR
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Unit entered failed state.
Nov 30 10:15:31 u3 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 30 10:15:32 u3 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 30 10:15:32 u3 systemd[1]: Stopped Kubernetes Kubelet Server.
Nov 30 10:15:32 u3 systemd[1]: kubelet.service: Start request repeated too quickly.
Nov 30 10:15:32 u3 systemd[1]: Failed to start Kubernetes Kubelet Server.
Nov 30 10:15:38 u3 systemd[1]: Reloading.
Nov 30 10:15:38 u3 systemd[1]: apt-daily.timer: Adding 6h 28min 458.731ms random time.
Nov 30 10:15:38 u3 systemd[1]: Started ACPI event daemon.
Nov 30 10:15:44 u3 systemd[1]: Started Kubernetes Kube-Proxt Server.
Nov 30 10:15:44 u3 kernel: [ 1410.389218] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
Nov 30 10:15:44 u3 kernel: [ 1410.389235] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Nov 30 10:15:44 u3 kernel: [ 1410.389264] IPVS: Creating netns size=2192 id=0
Nov 30 10:15:44 u3 kernel: [ 1410.389364] IPVS: ipvs loaded.
Nov 30 10:15:44 u3 systemd[1]: Started Kubernetes systemd probe.
Nov 30 10:17:01 u3 CRON[1746]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
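The error message already says it: Failed at step CHDIR spawning /usr/bin/kubelet: No such file or directory. The command cannot be found.
The WorkingDirectory has not been created.
Create it with the command below, then start kubelet.service again and it works: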
mkdir -p /var/lib/kubelet
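Thanks, that solved my problem.
OP, is the systemctl command only available on Ubuntu 16? Is it unsupported on Ubuntu 14.04?
Right. Ubuntu versions before 15.04 use upstart and manage services with the service command. From 15.04 onward they switched to systemd, whose command is systemctl. For backward compatibility, the service command still works.
I am installing Kubernetes on 14.04, and I am glad to have this guide. Thank you very much!
You're welcome~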
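I ran into a small problem and would appreciate the OP's guidance!
I have been setting up a k8s cluster recently. When configuring the kubelet service on a node, it fails to start. What could cause kubelet.service to fail to start?
Check the logs. Most likely something is wrong with one of the configuration files.
Is there a good integrated monitoring solution for k8s that covers both containers and hosts, ideally with alerting?
You could search Google for that.
rpc error: code = 13 desc = transport is closing. Kubernetes reports this error when starting the apiserver. What is the cause? The ports are all up.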
root@ubuntu:~# kubectl get node
Error from server (ServerTimeout): the server cannot complete the requested operation at this time, try again later (get nodes)
Check the apiserver log. Could it be failing to connect to etcd or something similar?
root@ubuntu:~# systemctl status kube-apiserver.service
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2017-08-07 16:35:34 CST; 1min 44s ago
Main PID: 4049 (kube-apiserver)
Memory: 590.2M
CGroup: /system.slice/kube-apiserver.service
Aug 07 16:36:35 ubuntu kube-apiserver[4049]: E0807 16:36:35.173009 4049 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: the server cannot complete the requested operation at th
Aug 07 16:36:35 ubuntu kube-apiserver[4049]: E0807 16:36:35.173279 4049 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: the server cannot complete the requested operation at this
Aug 07 16:36:35 ubuntu kube-apiserver[4049]: E0807 16:36:35.173315 4049 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: the server cannot complete the requested operation at t
Aug 07 16:36:35 ubuntu kube-apiserver[4049]: E0807 16:36:35.173412 4049 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: the server cannot complete the requested operation a
Aug 07 16:36:35 ubuntu kube-apiserver[4049]: E0807 16:36:35.245799 4049 storage_rbac.go:140] unable to initialize clusterroles: the server cannot complete the requested operation at this time, try again later (get clusterroles.rbac.authorization.k8s.io)
Aug 07 16:36:36 ubuntu kube-apiserver[4049]: E0807 16:36:36.255947 4049 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
Aug 07 16:36:45 ubuntu kube-apiserver[4049]: E0807 16:36:45.835551 4049 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
Aug 07 16:36:53 ubuntu kube-apiserver[4049]: E0807 16:36:53.340572 4049 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
Aug 07 16:37:00 ubuntu kube-apiserver[4049]: E0807 16:37:00.464829 4049 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
Aug 07 16:37:08 ubuntu kube-apiserver[4049]: E0807 16:37:08.018637 4049 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
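Found the problem: Kubernetes 1.6 and the etcd version did not match. etcd must be version 3.0 or later.
After running systemctl status <service_name> the result shows running, but there are also E0802 messages. Later, kubectl get node reports that the server has not fully started and times out. What should I do?
I suggest first checking whether the Kubernetes processes are running normally. From the symptoms you describe, one of them is probably not working properly, most likely etcd or kubelet. Check whether they are running; if they all are, follow the logs. For example, I previously hit an interface incompatibility between a newer Kubernetes and an older etcd, which causes exactly what you describe and can be identified from the etcd log. Because of work, I have not touched Kubernetes for a long time and probably will not in the future... I hope this helps...
Thank you very much for your answer. The cause was in kubelet: the WorkingDirectory path in the configuration file was wrong. After correcting it the result is right. Your tutorial has helped me a lot, and I will keep following and learning.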