2 - K8s Cluster Deployment

(figure: k8s.jpg)

Ways to Set Up a k8s Cluster

kubeadm is a k8s deployment tool that provides the kubeadm init and kubeadm join commands.

Overview of the kubeadm Deployment Approach

kubeadm is a tool released by the official community for quickly deploying a k8s cluster. It can stand up a cluster with two commands:
First, create a master node: kubeadm init
Second, join the worker nodes to the cluster: kubeadm join
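The two-step flow can be sketched as follows; the address, token, and hash are placeholders that come from your own environment and from the `kubeadm init` output:

```shell
# On the first master: initialize the control plane
kubeadm init

# On every worker: join the cluster using the command printed by `kubeadm init`
# (<master-ip>, <token>, and <hash> are placeholders)
kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```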

Installation Requirements

  1. Hardware: 2 GB RAM or more, 2 CPUs or more, 30 GB disk or more
  2. Full network connectivity between all machines in the cluster
  3. Internet access, needed to pull images
  4. Swap disabled

Installation Steps

  1. Install a container runtime and kubeadm on all nodes
  2. Deploy the k8s master
  3. Deploy a container network plugin
  4. Deploy the k8s worker nodes and join them to the cluster
  5. Deploy the dashboard web UI to view k8s resources visually

Environment Preparation

Role IP
master1 192.168.27.8
master2 192.168.27.9
master3 192.168.27.10
node1 192.168.27.11
node2 192.168.27.12
node3 192.168.27.13

Host Mappings and Passwordless SSH

# Add host mappings
cat << EOF >> /etc/hosts
192.168.27.8 master1
192.168.27.9 master2
192.168.27.10 master3
192.168.27.11 node1
192.168.27.12 node2
192.168.27.13 node3
EOF
# Set up passwordless SSH (run on master1)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
ssh-copy-id master2
ssh-copy-id master3
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3
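A quick sanity check that the host mappings and key-based login work; the hostnames are taken from the table above:

```shell
# Verify /etc/hosts resolution and passwordless SSH to every node
for h in master2 master3 node1 node2 node3; do
    ssh -o BatchMode=yes "$h" hostname || echo "SSH to $h failed"
done
```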

System Initialization

# Check that the required ports are free
# control plane: 6443 (apiserver), 10250 (kubelet), 10259 (scheduler), 10257 (controller-manager), 2379/2380 (etcd)
ss -alnupt | grep -E '6443|10250|10259|10257|2379|2380'
# workers: 10250 (kubelet) and the NodePort range 30000-32767 (the regex is an approximation)
ss -alnupt | grep -E '10250|:3[0-2][0-9]{3}'

# Set the hostname (use the node's own name from the table above)
hostnamectl set-hostname <hostname>

# Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config # permanent, takes effect after reboot
setenforce 0;getenforce # temporary, for the current boot

# Disable swap
swapoff -a # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab # permanent


# Forward IPv4 and let iptables see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

# Load the br_netfilter and overlay modules
sudo modprobe overlay
sudo modprobe br_netfilter
# Confirm the modules are loaded
lsmod | grep br_netfilter
lsmod | grep overlay

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
# Confirm the variables are now set to 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

Install containerd

  1. Install containerd
    containerd project: https://github.com/containerd/containerd
    containerd install guide: https://github.com/containerd/containerd/blob/main/docs/getting-started.md
    Official K8s docs: https://v1-29.docs.kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd-systemd
    # Step 1:
    # Download the release tarball
    wget https://github.com/containerd/containerd/releases/download/v1.7.22/containerd-1.7.22-linux-amd64.tar.gz
    # Extract it into /usr/local
    tar Cxzvf /usr/local containerd-1.7.22-linux-amd64.tar.gz
    # Manage containerd with systemd
    wget -O /usr/lib/systemd/system/containerd.service https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
    systemctl daemon-reload
    systemctl enable --now containerd

    # Step 2:
    # Install runc
    wget https://github.com/opencontainers/runc/releases/download/v1.2.0-rc.3/runc.amd64
    install -m 755 runc.amd64 /usr/local/sbin/runc

    # Step 3: install the CNI plugins
    wget https://github.com/containernetworking/plugins/releases/download/v1.5.1/cni-plugins-linux-amd64-v1.5.1.tgz
    mkdir -p /opt/cni/bin
    tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.5.1.tgz

    # Generate the default configuration file
    mkdir /etc/containerd
    containerd config default > /etc/containerd/config.toml

    # Configure the systemd cgroup driver. Since v1.22, kubeadm defaults to
    # systemd when cgroupDriver is not set under KubeletConfiguration.
    sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml
    # Replace the sandbox (pause) image with the Aliyun mirror
    sed -i 's|\( *sandbox_image =\) "registry.k8s.io/pause:3.8"|\1 "registry.aliyuncs.com/google_containers/pause:3.9"|g' /etc/containerd/config.toml

    # Restart containerd
    systemctl restart containerd

    # Install nerdctl (a CLI whose usage closely mirrors docker)
    curl -L https://github.com/containerd/nerdctl/releases/download/v1.7.7/nerdctl-full-1.7.7-linux-amd64.tar.gz -o nerdctl-full.tar.gz
    sudo tar Cxzvf /usr/local nerdctl-full.tar.gz
    nerdctl --version

    # Configure the Aliyun registry mirror: point config_path at
    # /etc/containerd/certs.d, either with the sed below or by editing
    # /etc/containerd/config.toml so it contains the section shown after it.
    # sudo sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.registry\]/,/config_path =/s/config_path = ""/config_path = "\/etc\/containerd\/certs.d"/' /etc/containerd/config.toml
    #   [plugins."io.containerd.grpc.v1.cri".registry]
    #     config_path = "/etc/containerd/certs.d"

    # Make sure no mirror entries remain in config.toml; delete any block like:
    #   [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    #     [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    #       endpoint = ["https://registry-1.docker.io"]
    # Restart containerd
    systemctl restart containerd
    # Configure the Aliyun docker hub mirror
    mkdir -p /etc/containerd/certs.d/docker.io
    cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
    server = "https://registry-1.docker.io"

    [host."https://dexxxx.mirror.aliyuncs.com"]
    capabilities = ["pull", "resolve", "push"]
    EOF
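Once containerd has been restarted, the runtime and the mirror can be sanity-checked; `ctr` ships with containerd and `nerdctl` was installed above (the busybox pull is only a connectivity test):

```shell
# containerd should be running and client/server versions should match
systemctl is-active containerd
ctr version

# Pull through the configured docker.io mirror; k8s.io is the namespace
# the kubelet uses for its images
nerdctl --namespace k8s.io pull docker.io/library/busybox:latest
```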

Set Up HA Load Balancing with nginx and keepalived

  • Configure nginx on the master nodes

    stream {
        upstream kube_apiserver {
            server 192.168.27.8:6443;
            server 192.168.27.9:6443;
            server 192.168.27.10:6443;
        }

        server {
            # Note: if nginx runs on the masters themselves, listening on 6443
            # conflicts with the local kube-apiserver. Bind to the VIP only, or
            # use a dedicated port such as 16443 in that case.
            listen 6443;
            proxy_pass kube_apiserver;
        }
    }
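Note that the `stream {}` block must sit at the top level of nginx.conf (outside `http {}`) and requires nginx built with the stream module. A quick validation, assuming the default config path:

```shell
# Check the configuration syntax before (re)loading
nginx -t

# Apply the change without dropping existing connections
nginx -s reload

# Confirm something is listening on 6443 on this host
ss -lntp | grep 6443
```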
  • Configure keepalived on the master nodes

    # Example keepalived configuration

    global_defs {
        # Router ID: identifies this keepalived node; must be globally unique
        router_id MASTER_1 # on backup nodes, change to BACKUP_1 or another unique ID
    }

    # Optional health-check script for the backed service
    #vrrp_script check_component {
    #    # path to the health-check script
    #    script "/etc/keepalived/check_component.sh"
    #    # run the script every two seconds
    #    interval 2
    #    # script timeout
    #    timeout 5
    #    # number of consecutive failures before the service counts as down
    #    fall 2
    #    # priority delta: if the script fails, the vrrp_instance priority drops by 10
    #    weight -10
    #}

    vrrp_instance VI_1 {
        # whether this node is the MASTER or a BACKUP
        state MASTER # on backup nodes, change to BACKUP
        # network interface the instance binds to, e.g. eth0
        interface ens160
        # must be identical on master and backups
        virtual_router_id 51
        # priority: the master is normally higher than the backups; with several
        # nodes this is an election, and the highest priority wins
        priority 100 # on backup nodes use a lower value, e.g. 70
        # interval between master/backup sync checks, in seconds
        advert_int 1
        # authentication, to keep rogue nodes out
        authentication {
            auth_type PASS
            auth_pass password # replace with a real password, max 8 characters
        }

        # uncomment the following line for non-preemptive mode
        # nopreempt

        # health-check script hook
        # track_script {
        #     check_component
        # }

        # virtual IP(s); more than one is allowed
        virtual_ipaddress {
            # note: master and backups bind the same VIP; an alias is set on the interface
            192.168.27.100/24 dev ens160 label ens160:0
        }
    }

    # add more vrrp_instance blocks if several VRRP instances are needed
    # vrrp_instance VI_2 {
    #     ...
    # }
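After starting keepalived on all three masters, exactly one node should hold the VIP; a quick check using the interface name and VIP from the config above:

```shell
# On the current MASTER the VIP appears on the interface; on backups it does not
ip addr show ens160 | grep 192.168.27.100

# Failover test: stop keepalived on the master, then re-run the check on a backup
systemctl stop keepalived    # run on master1; the VIP should move within seconds
```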

Install K8s

Configure the Aliyun repo and install kubelet, kubeadm, and kubectl on all nodes

The Aliyun mirror is used; it currently carries v1.24 - v1.29. Version 1.29 is installed here.

cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/repodata/repomd.xml.key
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
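To keep all nodes on the same release, it may be safer to pin an explicit version rather than installing whatever is newest in the repo; the 1.29.0 below is an assumption chosen to match the `kubernetesVersion` used later:

```shell
# Install a pinned patch release on every node
yum install -y kubelet-1.29.0 kubeadm-1.29.0 kubectl-1.29.0

# Optionally lock the packages so `yum update` cannot upgrade them
# (requires the yum-plugin-versionlock package)
yum versionlock add kubelet kubeadm kubectl
```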

Deploy the k8s Master

  1. Run on master1
    # Generate an initialization configuration file
    kubeadm config print init-defaults > kubeadm-config.yaml
    Annotated initialization configuration file
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: InitConfiguration
    localAPIEndpoint:
      advertiseAddress: 192.168.27.8 # IP address of this master node
      bindPort: 6443
    nodeRegistration:
      criSocket: unix:///var/run/containerd/containerd.sock # use containerd
      name: master1 # hostname of this master node
      taints: # taint the master so ordinary Pods are not scheduled onto it
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    controlPlaneEndpoint: "192.168.27.100:6443" # the HA VIP address
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes # cluster name
    apiServer:
      certSANs: # hostnames and IPs of all master nodes, plus the VIP
      - master1
      - master2
      - master3
      - 192.168.27.8
      - 192.168.27.9
      - 192.168.27.10
      - 192.168.27.100
      extraArgs:
        authorization-mode: "Node,RBAC"
      timeoutForControlPlane: 4m0s
    controllerManager: {}
    dns: {}
    etcd:
      local:
        dataDir: /var/lib/etcd
        serverCertSANs: # certSANs for the etcd server certificates
        - master1
        - master2
        - master3
        - 192.168.27.8
        - 192.168.27.9
        - 192.168.27.10
        peerCertSANs: # certSANs for the etcd peer certificates
        - master1
        - master2
        - master3
        - 192.168.27.8
        - 192.168.27.9
        - 192.168.27.10
    imageRepository: registry.aliyuncs.com/google_containers # domestic mirror registry
    kubernetesVersion: v1.29.0 # Kubernetes version
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16 # Flannel's default Pod CIDR
      serviceSubnet: 10.96.0.0/12 # Service CIDR
    scheduler: {}
    ---
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cgroupDriver: systemd # use systemd as the cgroup driver
    ---
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: ipvs # use ipvs as the kube-proxy mode
    # Pre-pull the images
    kubeadm config images pull --config kubeadm-config.yaml
    [config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0
    [config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0
    [config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0
    [config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0
    [config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.11.1
    [config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
    [config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.16-0
    # Initialize using the config file
    kubeadm init --config kubeadm-config.yaml

Problems Encountered

A warning raised during kubeadm initialization:

[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0912 14:05:04.449332 21523 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.aliyuncs.com/google_containers/pause:3.9" as the CRI sandbox image.

Cause: the sandbox (pause) image configured in containerd differs from the one kubeadm pulls during initialization.

# Fix: in /etc/containerd/config.toml, set sandbox_image to registry.aliyuncs.com/google_containers/pause:3.9
sed -i 's|\( *sandbox_image =\) "registry.k8s.io/pause:3.8"|\1 "registry.aliyuncs.com/google_containers/pause:3.9"|g' /etc/containerd/config.toml
# Re-run the initialization
kubeadm reset -f
rm -rf /etc/kubernetes/pki
rm -rf /var/lib/etcd
kubeadm init --config kubeadm-config.yaml

Initialization Complete

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

kubeadm join 192.168.27.100:6443 --token slt8h5.nyma46fh6pemq63n \
--discovery-token-ca-cert-hash sha256:3d149184ef709ebbaf8a60723c268c1270c7af6eda87684cd0d42f6a38af8519 \
--control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.27.100:6443 --token slt8h5.nyma46fh6pemq63n \
--discovery-token-ca-cert-hash sha256:3d149184ef709ebbaf8a60723c268c1270c7af6eda87684cd0d42f6a38af8519

When initialization completes, a kubeadm join… command is printed at the end. This is the command for adding nodes to the cluster; the token it contains is valid for 24 hours, after which a new one must be created:

kubeadm token create --print-join-command
kubeadm token list

After a successful initialization, follow the printed instructions:

  1. Run the variant for your user type so the kubectl command can be used
    # kubectl as a regular user
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

    # kubectl as root
    export KUBECONFIG=/etc/kubernetes/admin.conf
  2. Join the other master nodes
    Joining a master node at this point fails with an error
    [preflight] Running pre-flight checks
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    error execution phase preflight:
    One or more conditions for hosting a new control plane instance is not satisfied.

    [failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory, failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory, failure loading certificate for front-proxy CA: couldn't load the certificate file /etc/kubernetes/pki/front-proxy-ca.crt: open /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory, failure loading certificate for etcd CA: couldn't load the certificate file /etc/kubernetes/pki/etcd/ca.crt: open /etc/kubernetes/pki/etcd/ca.crt: no such file or directory]

    Please ensure that:
    * The cluster has a stable controlPlaneEndpoint address.
    * The certificates that must be shared among control plane instances are provided.


    To see the stack trace of this error execute with --v=5 or higher

    This error means kubeadm could not find the required certificate files (ca.crt, sa.key, and so on) when joining a new master node: the certificates from the first master must be copied to the new node first.
    Fix
    # run on master1
    tar -czf /tmp/pki.tar.gz -C /etc/kubernetes pki
    scp /tmp/pki.tar.gz root@192.168.27.9:/tmp/
    scp /tmp/pki.tar.gz root@192.168.27.10:/tmp/
    # run on master2 and master3
    sudo mkdir -p /etc/kubernetes
    sudo tar -xzf /tmp/pki.tar.gz -C /etc/kubernetes
    sudo chown -R root:root /etc/kubernetes/pki
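Instead of copying pki/ by hand, kubeadm can also distribute the control-plane certificates itself: upload them once from master1 and pass the printed key when joining. A sketch; the token and hash placeholders come from the init output:

```shell
# On master1: upload the control-plane certs as an encrypted Secret and
# print the decryption key (the uploaded certs expire after two hours)
kubeadm init phase upload-certs --upload-certs

# On master2/master3: join with that key; no manual scp is needed
kubeadm join 192.168.27.100:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key-from-upload-certs>
```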

Then join the other two master nodes to the control plane

  kubeadm join 192.168.27.100:6443 --token slt8h5.nyma46fh6pemq63n \
--discovery-token-ca-cert-hash sha256:3d149184ef709ebbaf8a60723c268c1270c7af6eda87684cd0d42f6a38af8519 \
--control-plane


This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
  1. Join the worker nodes
    This is the command for adding worker nodes to the cluster; the token is valid for 24 hours, after which a new one must be created
    # run on each worker node
    kubeadm join 192.168.27.100:6443 --token slt8h5.nyma46fh6pemq63n \
    --discovery-token-ca-cert-hash sha256:3d149184ef709ebbaf8a60723c268c1270c7af6eda87684cd0d42f6a38af8519

Install the Flannel Network Plugin

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

The URL above may be unreachable. If so, download kube-flannel.yml and replace the flannel-cni-plugin and flannel images it references with copies hosted in an Aliyun registry.
Here the images were pushed to a personal Aliyun registry. Either mark the repository as public, or create a Kubernetes Secret containing the registry credentials:

kubectl create secret docker-registry <secret-name> \
--docker-server=registry.cn-hangzhou.aliyuncs.com \
--docker-username=<your-username> \
--docker-password=<your-password> \
--docker-email=<your-email>
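If the mirrored repository stays private, the secret must also be referenced from the flannel DaemonSet's pod spec. A sketch of the relevant fragment of kube-flannel.yml; the secret name, repository path, and tag are assumptions:

```yaml
# Fragment of the DaemonSet in kube-flannel.yml
spec:
  template:
    spec:
      imagePullSecrets:
      - name: <secret-name>    # must exist in the same namespace as the DaemonSet
      containers:
      - name: kube-flannel
        image: registry.cn-hangzhou.aliyuncs.com/<your-repo>/flannel:<tag>
```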

Then apply the manifest manually

kubectl apply -f kube-flannel.yml

kubectl Command Completion

yum install bash-completion -y
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
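A common convenience on top of this, taken from the kubectl completion docs: a short alias that keeps completion working:

```shell
# Alias kubectl to k and attach the same completion function
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc
```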

Test the k8s Cluster

kubectl get nodes
kubectl get pods -A
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc

Access URL: http://NodeIP:Port
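The NodePort is assigned randomly from the 30000-32767 range; it can be looked up and probed from the command line (the node IP below is one of the workers from the table above):

```shell
# Read the NodePort assigned to the nginx service
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')

# Any node IP works; expect an HTTP 200 from nginx
curl -I http://192.168.27.11:${NODE_PORT}
```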