
兜兜    2021-09-25 11:59:33    2022-01-25 09:21:19   

kubernetes ceph rook
#### Environment

```sh
kubernetes 1.18.20
rook-ceph 1.6.10
ceph version 15.2.13
```

`Note: the Kubernetes nodes must have unused raw disks available for the Ceph cluster, and the Kubernetes version must match the corresponding rook-ceph release.`

#### Install Rook

##### Download Rook

```sh
$ git clone --single-branch --branch v1.6.10 https://github.com/rook/rook.git   # Kubernetes 1.18.20 does not work with rook 1.7.x
```

##### Install the Rook operator

```sh
$ cd cluster/examples/kubernetes/ceph
$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# verify the rook-ceph-operator is in the `Running` state before proceeding
$ kubectl -n rook-ceph get pod
```

#### Install the Ceph cluster

```sh
$ kubectl create -f cluster.yaml
```

Check the cluster status:

```sh
$ kubectl -n rook-ceph get pod
NAME                                                    READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-6tfqr                                  3/3     Running     0          6m45s
csi-cephfsplugin-nldks                                  3/3     Running     0          6m45s
csi-cephfsplugin-provisioner-6c59b5b7d9-lnddw           6/6     Running     0          6m44s
csi-cephfsplugin-provisioner-6c59b5b7d9-lvdtc           6/6     Running     0          6m44s
csi-cephfsplugin-zkt2x                                  3/3     Running     0          6m45s
csi-rbdplugin-2kwhg                                     3/3     Running     0          6m46s
csi-rbdplugin-7j9hw                                     3/3     Running     0          6m46s
csi-rbdplugin-provisioner-6d455d5677-6qcn5              6/6     Running     0          6m46s
csi-rbdplugin-provisioner-6d455d5677-8xp6r              6/6     Running     0          6m46s
csi-rbdplugin-qxv4m                                     3/3     Running     0          6m46s
rook-ceph-crashcollector-k8s-master1-7bf874dc98-v9bj9   1/1     Running     0          2m8s
rook-ceph-crashcollector-k8s-master2-6698989df9-7hsfr   1/1     Running     0          2m16s
rook-ceph-crashcollector-k8s-node1-8578585676-9tf5w     1/1     Running     0          2m16s
rook-ceph-mgr-a-5c4759947f-47kbk                        1/1     Running     0          2m19s
rook-ceph-mon-a-647877db88-629bt                        1/1     Running     0          6m56s
rook-ceph-mon-b-c44f7978f-p2fq6                         1/1     Running     0          5m11s
rook-ceph-mon-c-588b48b74c-bbkn8                        1/1     Running     0          3m42s
rook-ceph-operator-64845bd768-55dkc                     1/1     Running     0          13m
rook-ceph-osd-0-64f9fc6c65-m7877                        1/1     Running     0          2m8s
rook-ceph-osd-1-584bf986c7-xj75p                        1/1     Running     0          2m8s
rook-ceph-osd-2-d59cdbd7f-xcwgf                         1/1     Running     0          2m8s
rook-ceph-osd-prepare-k8s-master1-vstkq                 0/1     Completed   0          2m16s
rook-ceph-osd-prepare-k8s-master2-zdjk2                 0/1     Completed   0          2m16s
rook-ceph-osd-prepare-k8s-node1-wmzj9                   0/1     Completed   0          2m16s
```

#### Tear down the Ceph cluster

https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md

```sh
$ kubectl delete -f cluster.yaml
```

Clean up the Rook data directory:

```sh
$ rm /var/lib/rook/* -rf
```

#### Install the Rook toolbox

```sh
$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
```

#### Access the dashboard

```sh
$ kubectl -n rook-ceph get service
```

Get the login password:

```sh
$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```

#### Configure external access

```sh
$ kubectl create -f dashboard-external-https.yaml
```

```sh
$ kubectl -n rook-ceph get service
NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr                            ClusterIP   10.101.238.49    <none>        9283/TCP         47m
rook-ceph-mgr-dashboard                  ClusterIP   10.107.124.177   <none>        8443/TCP         47m
rook-ceph-mgr-dashboard-external-https   NodePort    10.102.110.16    <none>        8443:30870/TCP   14s
```

Open: https://172.16.100.100:30870/

#### RBD service with Rook

Create the pool and StorageClass:

```sh
$ kubectl create -f cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml
```

Deploy the sample MySQL and WordPress workloads, which claim RBD-backed volumes:

```sh
$ kubectl apply -f rook/cluster/examples/kubernetes/mysql.yaml
$ kubectl apply -f rook/cluster/examples/kubernetes/wordpress.yaml
```

```sh
$ kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
cephfs-test-pvc-1   Bound    pvc-571ae252-b080-4a67-8f3d-40bda1304fe3   500Mi      RWX            cephfs                2d23h
mysql-pv-claim      Bound    pvc-6f6d70ba-6961-42b5-aa9d-db13c1aeec7c   20Gi       RWO            rook-ceph-block       15m
test-claim          Bound    pvc-c51eef14-1454-4ddf-99e1-db483dec42c6   1Mi        RWX            managed-nfs-storage   3d5h
wp-pv-claim         Bound    pvc-d5eb4142-532f-4704-9d15-94ffc7f35dd1   20Gi       RWO            rook-ceph-block       13m
```
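To confirm that these claims are really backed by RBD images, you can list the images from the toolbox pod. A minimal check, assuming the default pool name `replicapool` created by the example storageclass.yaml (adjust the pool and image names to your setup):

```sh
# List the RBD images backing the PVCs (replicapool is the pool name from the example CephBlockPool)
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd ls -p replicapool

# Inspect one image; its size should match the PVC request (e.g. 20Gi)
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd info replicapool/<image-name>
```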
Check the CSI provisioner logs:

```sh
$ kubectl logs csi-rbdplugin-provisioner-85c58fcfb4-nfnvd -n rook-ceph -c csi-provisioner
I0925 07:13:27.761137       1 csi-provisioner.go:138] Version: v2.2.2
I0925 07:13:27.761189       1 csi-provisioner.go:161] Building kube configs for running in cluster...
W0925 07:13:37.769467       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:13:47.769515       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:13:57.769149       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:07.769026       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:17.768966       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:27.768706       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:37.769526       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:47.769001       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0925 07:14:57.769711       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
I0925 07:15:01.978682       1 common.go:111] Probing CSI driver for readiness
I0925 07:15:01.980980       1 csi-provisioner.go:284] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0925 07:15:01.982474       1 leaderelection.go:243] attempting to acquire leader lease rook-ceph/rook-ceph-rbd-csi-ceph-com...
I0925 07:15:02.001351       1 leaderelection.go:253] successfully acquired lease rook-ceph/rook-ceph-rbd-csi-ceph-com
I0925 07:15:02.102184       1 controller.go:835] Starting provisioner controller rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-85c58fcfb4-nfnvd_c5346baf-02b2-4983-9405-45bbc31c216a!
I0925 07:15:02.102243       1 volume_store.go:97] Starting save volume queue
I0925 07:15:02.102244       1 clone_controller.go:66] Starting CloningProtection controller
I0925 07:15:02.102299       1 clone_controller.go:84] Started CloningProtection controller
I0925 07:15:02.202672       1 controller.go:884] Started provisioner controller rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-85c58fcfb4-nfnvd_c5346baf-02b2-4983-9405-45bbc31c216a!
I0925 07:46:47.261147       1 controller.go:1332] provision "default/mysql-pv-claim" class "rook-ceph-block": started
I0925 07:46:47.261389       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"mysql-pv-claim", UID:"6f6d70ba-6961-42b5-aa9d-db13c1aeec7c", APIVersion:"v1", ResourceVersion:"2798560", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/mysql-pv-claim"
I0925 07:46:49.310382       1 controller.go:1439] provision "default/mysql-pv-claim" class "rook-ceph-block": volume "pvc-6f6d70ba-6961-42b5-aa9d-db13c1aeec7c" provisioned
I0925 07:46:49.310419       1 controller.go:1456] provision "default/mysql-pv-claim" class "rook-ceph-block": succeeded
I0925 07:46:49.319422       1 controller.go:1332] provision "default/mysql-pv-claim" class "rook-ceph-block": started
I0925 07:46:49.319446       1 controller.go:1341] provision "default/mysql-pv-claim" class "rook-ceph-block": persistentvolume "pvc-6f6d70ba-6961-42b5-aa9d-db13c1aeec7c" already exists, skipping
I0925 07:46:49.319615       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"mysql-pv-claim", UID:"6f6d70ba-6961-42b5-aa9d-db13c1aeec7c", APIVersion:"v1", ResourceVersion:"2798560", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-6f6d70ba-6961-42b5-aa9d-db13c1aeec7c
I0925 07:48:52.230183       1 controller.go:1332] provision "default/wp-pv-claim" class "rook-ceph-block": started
I0925 07:48:52.230656       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"wp-pv-claim", UID:"d5eb4142-532f-4704-9d15-94ffc7f35dd1", APIVersion:"v1", ResourceVersion:"2799324", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/wp-pv-claim"
I0925 07:48:52.289747       1 controller.go:1439] provision "default/wp-pv-claim" class "rook-ceph-block": volume "pvc-d5eb4142-532f-4704-9d15-94ffc7f35dd1" provisioned
I0925 07:48:52.289790       1 controller.go:1456] provision "default/wp-pv-claim" class "rook-ceph-block": succeeded
I0925 07:48:52.295778       1 controller.go:1332] provision "default/wp-pv-claim" class "rook-ceph-block": started
I0925 07:48:52.295798       1 controller.go:1341] provision "default/wp-pv-claim" class "rook-ceph-block": persistentvolume "pvc-d5eb4142-532f-4704-9d15-94ffc7f35dd1" already exists, skipping
I0925 07:48:52.295844       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"wp-pv-claim", UID:"d5eb4142-532f-4704-9d15-94ffc7f35dd1", APIVersion:"v1", ResourceVersion:"2799324", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-d5eb4142-532f-4704-9d15-94ffc7f35dd1
```
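If a claim sits in `Pending` instead of binding like the ones above, the events on the PVC itself are usually faster to read than the raw provisioner log. A small check (the claim name here is just an example):

```sh
# Show the provisioning events recorded on the claim (e.g. ProvisioningFailed reasons)
$ kubectl describe pvc mysql-pv-claim | tail -n 20

# Recent events in the rook-ceph namespace, newest last
$ kubectl -n rook-ceph get events --sort-by=.metadata.creationTimestamp | tail -n 20
```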
#### CephFS service with Rook

```sh
$ kubectl create -f filesystem.yaml
```

```sh
$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME                                    READY   STATUS    RESTARTS   AGE
rook-ceph-mds-myfs-a-6dd59747f5-2qjtk   1/1     Running   0          38s
rook-ceph-mds-myfs-b-56488764f9-n6fzf   1/1     Running   0          37s
```

```sh
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
[root@rook-ceph-tools-5b85cb4766-8wj5c /]# ceph status
  cluster:
    id:     1c545199-7a68-46df-9b21-2a801c1ad1af
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 50m)
    mgr: a(active, since 70m)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay   # MDS metadata service
    osd: 3 osds: 3 up (since 71m), 3 in (since 71m)

  data:
    pools:   4 pools, 97 pgs
    objects: 132 objects, 340 MiB
    usage:   4.0 GiB used, 296 GiB / 300 GiB avail
    pgs:     97 active+clean

  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
```
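The `myfs` filesystem above, and the `myfs-data0` data pool used by the StorageClass below, come from the example `filesystem.yaml` in the rook repo. A minimal sketch of that manifest, based on the rook v1.6 examples (treat the exact replica sizes as assumptions):

```sh
$ cat filesystem.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:            # the first data pool becomes myfs-data0
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true   # gives the active/standby-replay MDS pair seen above
```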
Create the StorageClass:

```sh
$ cat >storageclass-cephfs.yaml <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
EOF
```

```sh
$ kubectl create -f storageclass-cephfs.yaml
```

Test CephFS:

```sh
$ cat >test-cephfs-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
EOF
```

```sh
$ kubectl apply -f test-cephfs-pvc.yaml
```

```sh
$ kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cephfs-pvc   Bound    pvc-4e200767-9ed9-4b47-8505-3081b8de6893   1Gi        RWX            rook-cephfs    8s
```

#### Troubleshooting

`1. [WRN] MON_CLOCK_SKEW: clock skew detected on mon.b`

Raise the allowed clock drift in the rook-config-override ConfigMap, restart the mon pods, then confirm the cluster is healthy again:

```sh
$ kubectl -n rook-ceph edit ConfigMap rook-config-override -o yaml
  config: |
    [global]
    mon clock drift allowed = 0.5

$ kubectl -n rook-ceph delete pod $(kubectl -n rook-ceph get pods -o custom-columns=NAME:.metadata.name --no-headers | grep mon)

$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
[root@rook-ceph-tools-75cf595688-hrtmv /]# ceph -s
  cluster:
    id:     023baa6d-8ec1-4ada-bd17-219136f656b4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, since 24m)
    osd: 3 osds: 3 up (since 25m), 3 in (since 25m)

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0 B
    usage:   19 MiB used, 300 GiB / 300 GiB avail
    pgs:     128 active+clean
```

`2. type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-ceph-block": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-34167041-eb16-4f03-b01e-6b915089488d already exists`

Delete the CSI provisioner and plugin pods listed below; their Deployments and DaemonSets recreate them (see the sketch at the end of this section):

```sh
$ kubectl get po -n rook-ceph | grep csi-cephfsplugin-provisioner-
$ kubectl get po -n rook-ceph | grep csi-rbdplugin-provisioner-
$ kubectl get po -n rook-ceph | grep csi-rbdplugin-
$ kubectl get po -n rook-ceph | grep csi-cephfsplugin-
```

`3. A PVC stays Pending forever and the csi-rbdplugin-provisioner log shows: provision volume with StorageClass "rook-ceph-block": rpc error: code = DeadlineExceeded desc = context deadline exceeded`

This is a version mismatch between Kubernetes and rook-ceph: installing rook-ceph 1.7.x on Kubernetes 1.18.20 produces this error; switching to v1.6.10 resolves it.
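A minimal sketch of the pod restart for issue 2, assuming the `app` labels that rook puts on its CSI workloads (`csi-rbdplugin-provisioner`, `csi-cephfsplugin-provisioner`, etc.); the operator-managed controllers recreate the pods automatically:

```sh
# Restart the CSI provisioner/plugin pods to clear the stuck
# "operation with the given Volume ID already exists" state.
$ kubectl -n rook-ceph delete pod -l app=csi-rbdplugin-provisioner
$ kubectl -n rook-ceph delete pod -l app=csi-cephfsplugin-provisioner
$ kubectl -n rook-ceph delete pod -l app=csi-rbdplugin
$ kubectl -n rook-ceph delete pod -l app=csi-cephfsplugin
```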

兜兜    2021-09-22 10:20:24    2022-01-25 09:20:25   

kubernetes k8s nginx ingress
#### Environment

```sh
k8s version: 1.18.20
ingress-nginx: 3.4.0
```

#### Install ingress-nginx

##### Download the Helm chart

```sh
$ wget https://github.com/kubernetes/ingress-nginx/releases/download/ingress-nginx-3.4.0/ingress-nginx-3.4.0.tgz
$ tar xvf ingress-nginx-3.4.0.tgz
$ cd ingress-nginx
```

#### Configure the chart values

```sh
$ cat > values.yaml <<EOF
controller:
  image:
    repository: registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller  # use the Aliyun mirror image
    tag: "v0.40.1"
    digest: sha256:abffcf2d25e3e7c7b67a315a7c664ec79a1588c9c945d3c7a75637c2f55caec6
    pullPolicy: IfNotPresent
    runAsUser: 101
    allowPrivilegeEscalation: true
  containerPort:
    http: 80
    https: 443
  config: {}
  configAnnotations: {}
  proxySetHeaders: {}
  addHeaders: {}
  dnsConfig: {}
  dnsPolicy: ClusterFirst
  reportNodeInternalIp: false
  hostNetwork: true        # use the host network
  hostPort:
    enabled: true          # expose host ports
    ports:
      http: 80
      https: 443
  electionID: ingress-controller-leader
  ingressClass: nginx
  podLabels: {}
  podSecurityContext: {}
  sysctls: {}
  publishService:
    enabled: true
    pathOverride: ""
  scope:
    enabled: false
  tcp:
    annotations: {}
  udp:
    annotations: {}
  extraArgs: {}
  extraEnvs: []
  kind: DaemonSet          # run as a DaemonSet
  annotations: {}
  labels: {}
  updateStrategy: {}
  minReadySeconds: 0
  tolerations: []
  affinity: {}
  topologySpreadConstraints: []
  terminationGracePeriodSeconds: 300
  nodeSelector:
    ingress: nginx         # only schedule onto nodes labeled ingress=nginx
  livenessProbe:
    failureThreshold: 5
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
    port: 10254
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
    port: 10254
  healthCheckPath: "/healthz"
  podAnnotations: {}
  #replicaCount: 1         # disabled (ignored when running as a DaemonSet)
  minAvailable: 1
  resources:
    requests:
      cpu: 100m
      memory: 90Mi
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 11
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
  autoscalingTemplate: []
  enableMimalloc: true
  customTemplate:
    configMapName: ""
    configMapKey: ""
  service:
    enabled: true
    annotations: {}
    labels: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    enableHttp: true
    enableHttps: true
    ports:
      http: 80
      https: 443
    targetPorts:
      http: http
      https: https
    type: LoadBalancer
    nodePorts:
      http: ""
      https: ""
      tcp: {}
      udp: {}
    internal:
      enabled: false
      annotations: {}
  extraContainers: []
  extraVolumeMounts: []
  extraVolumes: []
  extraInitContainers: []
  admissionWebhooks:
    enabled: false         # disabled
    failurePolicy: Fail
    port: 8443
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 443
      type: ClusterIP
    patch:
      enabled: true
      image:
        repository: docker.io/jettech/kube-webhook-certgen
        tag: v1.3.0
        pullPolicy: IfNotPresent
      priorityClassName: ""
      podAnnotations: {}
      nodeSelector: {}
      tolerations: []
      runAsUser: 2000
  metrics:
    port: 10254
    enabled: false
    service:
      annotations: {}
      externalIPs: []
      loadBalancerSourceRanges: []
      servicePort: 9913
      type: ClusterIP
    serviceMonitor:
      enabled: false
      additionalLabels: {}
      namespace: ""
      namespaceSelector: {}
      scrapeInterval: 30s
      targetLabels: []
      metricRelabelings: []
    prometheusRule:
      enabled: false
      additionalLabels: {}
      rules: []
  lifecycle:
    preStop:
      exec:
        command:
          - /wait-shutdown
  priorityClassName: ""
  revisionHistoryLimit: 10
  maxmindLicenseKey: ""
defaultBackend:
  enabled: false
  image:
    repository: k8s.gcr.io/defaultbackend-amd64
    tag: "1.5"
    pullPolicy: IfNotPresent
    runAsUser: 65534
  extraArgs: {}
  serviceAccount:
    create: true
    name:
  extraEnvs: []
  port: 8080
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 30
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  readinessProbe:
    failureThreshold: 6
    initialDelaySeconds: 0
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 5
  tolerations: []
  affinity: {}
  podSecurityContext: {}
  podLabels: {}
  nodeSelector: {}
  podAnnotations: {}
  replicaCount: 1
  minAvailable: 1
  resources: {}
  service:
    annotations: {}
    externalIPs: []
    loadBalancerSourceRanges: []
    servicePort: 80
    type: ClusterIP
  priorityClassName: ""
rbac:
  create: true
  scope: false
podSecurityPolicy:
  enabled: false
serviceAccount:
  create: true
  name:
imagePullSecrets: []
tcp: {}
udp: {}
EOF
```
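Before installing, it can be worth rendering the chart locally to confirm the overrides (DaemonSet, host network, node selector) actually take effect. A quick sanity check, assuming you are still inside the unpacked `ingress-nginx` chart directory:

```sh
# Render the manifests with this values file and spot-check the key settings
$ helm template ingress-nginx . -n ingress-nginx -f values.yaml \
    | grep -E 'kind: DaemonSet|hostNetwork|ingress: nginx'
```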
#### Create the namespace

```sh
$ kubectl create namespace ingress-nginx
```

#### Label the nodes

```sh
$ kubectl label nodes k8s-master1 ingress=nginx
$ kubectl label nodes k8s-master2 ingress=nginx
$ kubectl label nodes k8s-node1 ingress=nginx
```

#### Install ingress-nginx

```sh
$ helm -n ingress-nginx upgrade -i ingress-nginx .
```

#### Uninstall ingress-nginx

```sh
$ helm -n ingress-nginx uninstall ingress-nginx
```

#### Test ingress-nginx

#### Deploy a test nginx service

```sh
$ cat > nginx-deployment.yml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
EOF
```

Configure the Ingress object.

Create the TLS secret:

```sh
$ kubectl create secret tls shudoon-com-tls --cert=5024509__example.com.pem --key=5024509__example.com.key
```

#### Create the Ingress rule

```sh
$ cat >tnginx-ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: tnginx-ingress
spec:
  rules:
    - host: tnginx.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: nginx-service
              servicePort: 80
  # This section is only required if TLS is to be enabled for the Ingress
  tls:
    - hosts:
        - tnginx.example.com
      secretName: shudoon-com-tls
EOF
```

Test: https://tnginx.example.com/

#### Issues

`Problem: creating a custom Ingress fails with: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io"`

Check the validating webhook configurations:

```sh
$ kubectl get validatingwebhookconfigurations
```

Delete the admission webhook configuration:

```sh
$ kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
```
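To verify the ingress from outside the cluster without touching DNS, you can pin the hostname to one of the labeled nodes with curl; `<node-ip>` below is a placeholder for any node carrying the `ingress=nginx` label:

```sh
# Resolve tnginx.example.com to an ingress node locally and check HTTP and HTTPS
$ curl -I  --resolve tnginx.example.com:80:<node-ip>  http://tnginx.example.com/
$ curl -kI --resolve tnginx.example.com:443:<node-ip> https://tnginx.example.com/
```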

兜兜    2021-09-22 10:19:33    2022-01-25 09:30:00   

ceph
Cephadm deploys and manages a Ceph cluster by connecting to hosts over SSH from the manager daemon, adding, removing, or updating Ceph daemon containers. It does not depend on external configuration or orchestration tools such as Ansible, Rook, or Salt.

Cephadm manages the entire lifecycle of a Ceph cluster: it first bootstraps a tiny cluster (one monitor and one manager) on a single node, then automatically grows it across additional hosts and deploys all Ceph daemons and services, driven from either the Ceph command-line interface (CLI) or the dashboard (GUI). Cephadm is new in Octopus v15.2.0 and does not support older Ceph releases.

Node plan:

```sh
+--------+---------------+------+---------------------------------+
| Host   | ip            | Disk | Role                            |
+--------+---------------+------+---------------------------------+
| test01 | 172.16.100.1  | sdc  | cephadm,mon,mgr,mds,osd         |
| test02 | 172.16.100.2  | sdc  | mon,mgr,osd                     |
| test11 | 172.16.100.11 | sdb  | mon,mds,osd                     |
+--------+---------------+------+---------------------------------+
```

##### Installation notes

```sh
ceph version: 15.2.14 octopus (stable)
python: 3+
```

#### Install Docker

[See: Install Docker-CE on CentOS 7](https://ynotes.cn/blog/article_detail/118)

#### Install cephadm (`test01`)

```sh
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
./cephadm add-repo --release octopus
./cephadm install
```

Bootstrap the Ceph cluster with cephadm:

```sh
cephadm bootstrap --mon-ip 172.16.100.1
```

```sh
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 797b1aa4-16cf-11ec-a787-00505693fd22
Verifying IP 172.16.100.1 port 3300 ...
Verifying IP 172.16.100.1 port 6789 ...
Mon IP 172.16.100.1 is in CIDR network 172.16.100.0/24
Pulling container image docker.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr not available, waiting (5/10)...
mgr not available, waiting (6/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host test01...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Enabling mgr prometheus module...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 13...
Mgr epoch 13 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:

             URL: https://ceph1:8443/
            User: admin
        Password: hj49zqnfm2

You can access the Ceph CLI with:

        sudo /usr/sbin/cephadm shell --fsid 797b1aa4-16cf-11ec-a787-00505693fd22 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

        ceph telemetry on

For more information see:

        https://docs.ceph.com/docs/master/mgr/telemetry/

Bootstrap complete.
```
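At this point only test01 is part of the cluster. A quick way to confirm the bootstrap worked is to open a cephadm shell and look at the daemons it deployed (a minimal check; the exact daemon list will vary):

```sh
# Enter a temporary container with the ceph CLI and the admin keyring mounted
cephadm shell

# Inside the shell: cluster health, daemons managed by cephadm, and known hosts
ceph -s
ceph orch ps
ceph orch host ls
```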
What the bootstrap did:

```sh
Created a monitor and a manager daemon for the new cluster on the local host.
Generated a new SSH key for the Ceph cluster and added it to /root/.ssh/authorized_keys for the root user.
Saved a minimal configuration file needed to communicate with the new cluster to /etc/ceph/ceph.conf.
Wrote a copy of the client.admin administrative (privileged!) secret key to /etc/ceph/ceph.client.admin.keyring.
Wrote a copy of the public key to /etc/ceph/ceph.pub.
```

Check the containers running on the node:

```sh
$ docker ps
CONTAINER ID   IMAGE                        COMMAND                  CREATED        STATUS      PORTS                                            NAMES
3a707133d00e   ceph/ceph:v15                "/usr/bin/ceph-mds -…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-mds.cephfs.test01.kqizcm
36ada96953ea   ceph/ceph:v15                "/usr/bin/ceph-osd -…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-osd.0
08aa352c61f1   ceph/ceph:v15                "/usr/bin/ceph-mgr -…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-mgr.test01.soadiy
47be185fc946   prom/node-exporter:v0.18.1   "/bin/node_exporter …"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-node-exporter.test01
dad697dd6669   ceph/ceph-grafana:6.7.4      "/bin/sh -c 'grafana…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-grafana.test01
754ba4d9e132   prom/alertmanager:v0.20.0    "/bin/alertmanager -…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-alertmanager.test01
2b7e3664edbf   ceph/ceph:v15                "/usr/bin/ceph-crash…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-crash.test01
c7cf9f5502de   ceph/ceph:v15                "/usr/bin/ceph-mon -…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-mon.test01
ba3958a9dec8   prom/prometheus:v2.18.1      "/bin/prometheus --c…"   5 days ago     Up 5 days                                                    ceph-797b1aa4-16cf-11ec-a787-00505693fd22-prometheus.test01
c737fb65080c   elasticsearch:7.8.0          "/tini -- /usr/local…"   7 months ago   Up 5 days   0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   es7.6
```

Add the other hosts (the cluster's public SSH key in /etc/ceph/ceph.pub must already be installed for root on each host being added):

```sh
ceph orch host add test02 172.16.100.2
ceph orch host add test11 172.16.100.11
```

Add the OSD disks:

```sh
ceph orch daemon add osd test01:/dev/sdc
ceph orch daemon add osd test02:/dev/sdc
ceph orch daemon add osd test11:/dev/sdb
```

Create a pool for Kubernetes:

```sh
ceph osd pool create k8s 64
```
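The new `k8s` pool is not associated with an application yet. A minimal follow-up, assuming it will hold RBD images (the image name below is just illustrative):

```sh
# Tag the pool for RBD use and initialize it
ceph osd pool application enable k8s rbd
rbd pool init k8s

# Quick smoke test: create, list, and remove a small image
rbd create k8s/test-img --size 1G
rbd ls k8s
rbd rm k8s/test-img
```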

