Ahui's Blog

Notes on systems, networking, clusters, databases, distributed cloud computing, and related topics

Testing a Ceph deployment on Kubernetes with Rook

Download the manifests:

mkdir rook
cd rook/

wget https://raw.githubusercontent.com/rook/rook/release-1.0/cluster/examples/kubernetes/ceph/common.yaml
wget https://raw.githubusercontent.com/rook/rook/release-1.0/cluster/examples/kubernetes/ceph/cluster.yaml
wget https://raw.githubusercontent.com/rook/rook/release-1.0/cluster/examples/kubernetes/ceph/operator.yaml

Edit the configuration so that cluster.yaml reads as follows:

vim cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13.2.6-20190604 # Ceph version
    allowUnsupported: false
  dataDirHostPath: /data/ceph/rook  # storage directory on the host
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  annotations:
  resources:
  storage:
    useAllNodes: true
    useAllDevices: true

Deploy:

kubectl create -f common.yaml 
kubectl create -f operator.yaml 
kubectl create -f cluster.yaml

Check the pod status:

[root@sh-ops-k8stest-master-dev-01 rook]# kubectl get all -n rook-ceph 
NAME                                      READY   STATUS    RESTARTS   AGE
pod/rook-ceph-agent-2qrlm                 1/1     Running   0          57s
pod/rook-ceph-agent-l6v7f                 1/1     Running   0          57s
pod/rook-ceph-agent-td7zz                 1/1     Running   0          57s
pod/rook-ceph-mon-a-68947cffc8-j6tpt      1/1     Running   0          27s
pod/rook-ceph-mon-b-65ccd7bcc6-nvm5h      1/1     Running   0          19s
pod/rook-ceph-operator-7f5ff79d9f-dknph   1/1     Running   0          88s
pod/rook-discover-css92                   1/1     Running   0          57s
pod/rook-discover-fvkcf                   1/1     Running   0          57s
pod/rook-discover-wgl56                   1/1     Running   0          57s

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/rook-ceph-mon-a   ClusterIP   10.248.255.203   <none>        6789/TCP   28s
service/rook-ceph-mon-b   ClusterIP   10.248.254.82    <none>        6789/TCP   21s

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/rook-ceph-agent   3         3         3       3            3           <none>          57s
daemonset.apps/rook-discover     3         3         3       3            3           <none>          57s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rook-ceph-mon-a      1/1     1            1           27s
deployment.apps/rook-ceph-mon-b      1/1     1            1           19s
deployment.apps/rook-ceph-operator   1/1     1            1           88s

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/rook-ceph-mon-a-68947cffc8      1         1         1       27s
replicaset.apps/rook-ceph-mon-b-65ccd7bcc6      1         1         1       19s
replicaset.apps/rook-ceph-operator-7f5ff79d9f   1         1         1       88s
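
The listing above was captured while the cluster was still starting: only mon-a and mon-b appear, although mon.count is 3. A minimal sketch for checking how many pods with a given name prefix are Running, assuming the default column layout of kubectl get pods (the function name count_running is made up for this example):

```shell
# Count pods whose NAME starts with the given prefix and whose STATUS is Running.
# Reads the output of "kubectl get pods" on stdin (columns: NAME READY STATUS ...).
count_running() {
  # $1: pod name prefix, e.g. rook-ceph-mon
  awk -v p="$1" 'index($1, p) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage (needs a cluster, so it is left commented out here):
# kubectl get pods -n rook-ceph | count_running rook-ceph-mon
```

Once this prints 3 for the rook-ceph-mon prefix, all monitors requested in cluster.yaml are up.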

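Configuring ActiveMQ with one KahaDB per queue

When ActiveMQ uses KahaDB for persistence, the default configuration stores all queues in a single KahaDB. Over time this causes a problem: the DB files keep growing, to tens or even hundreds of GB. The main reason is that as long as even one message in a queue has not been consumed, ActiveMQ will not clean up the KahaDB files that hold it.

To mitigate this, we can add a filter and configure one KahaDB per queue, as follows: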

        <persistenceAdapter>
          <mKahaDB directory="${activemq.data}/kahadb">
            <filteredPersistenceAdapters>
              <!-- one KahaDB per destination -->
              <filteredKahaDB perDestination="true">
                <persistenceAdapter>
                  <kahaDB ignoreMissingJournalfiles="true" checkForCorruptJournalFiles="true" checksumJournalFiles="true" />
                </persistenceAdapter>
              </filteredKahaDB>
            </filteredPersistenceAdapters>
          </mKahaDB>
        </persistenceAdapter>

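Reading notes on "Ceph Distributed Storage in Practice" (Ceph分布式存储实战), part 1

1. Getting to know Ceph

1.1 Core Ceph components

  • Ceph OSD: short for Object Storage Device. It stores data and handles replication, recovery, backfill, and rebalancing of the data distribution. It also reports related information, such as OSD heartbeats, to the Ceph Monitor. Any disk or partition can become an OSD.
  • Ceph Monitor: the Ceph monitor. It maintains the health state of the whole cluster and provides consistent decisions. It holds the Monitor map, OSD map, PG (Placement Group) map, and CRUSH map.
  • Ceph MDS: short for Ceph Metadata Server. It stores the metadata of the Ceph file system. Note: Ceph block storage and Ceph object storage do not need an MDS; only Ceph FS does.

A Ceph cluster needs at least one Ceph Monitor and at least two Ceph OSDs.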

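How Maven builds relate to the JDK version

Building with Maven in containers that ship different Java versions has given me a lot of trouble lately, and I have finally made some sense of it. I am writing it down here for future reference.

1. The relationship between Maven and the JDK version

First, one point to be clear about: Maven running on a newer JDK can produce Java binaries that target an older version. For example, Maven running on JDK 1.8 can compile code for 1.8, 1.7, 1.6, and so on, and emit binaries for the corresponding version.

Conversely, Maven running on an older JDK cannot emit binaries for a newer Java version.

Also: which JDK Maven runs on is determined by the version that the JAVA_HOME environment variable points to.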
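
One way to confirm which Java version a build actually targeted is to read the major version stored in a compiled .class file (50 is Java 6, 51 is Java 7, 52 is Java 8). The sketch below assumes od and awk are available; the function name class_major_version is made up for this example:

```shell
# Print the class-file major version of a compiled .class file.
# Layout: 4 bytes magic (CAFEBABE), 2 bytes minor, 2 bytes major (big-endian).
class_major_version() {
  # $1: path to a .class file
  od -An -tu1 -j6 -N2 "$1" | awk '{ print $1 * 256 + $2 }'
}

# Usage (the path is only an illustration):
# class_major_version target/classes/com/example/App.class
```

If the printed number is higher than the JDK in the target container supports, the build was compiled for too new a version.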

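Finding a container name from a docker overlay2 directory name

Every so often an individual container uses an unusually large amount of disk space. When that happens, you need to map a docker overlay2 directory name back to a container name:

First change into the overlay2 directory. Its location can be found through the docker configuration file (/etc/docker/daemon.json). Then check which entries take up the most space.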

[root@sh-saas-k8s1-node-qa-04 overlay2]# du -sc * | sort -rn  | more
33109420        total
1138888 20049e2e445181fc742b9e74a8819edf0e7ee8f0c0041fb2d1c9d321f73d8f5b
1066548 010d0a26a1fe5b00e330d0d87649fc73af45f9333fd824bf0f9d91a37276af18
943208  030c0f111675f6ed534eaa6e4183ec91d4c065dd4bdb5a289f4b572357667378
825116  0ad9e737795dd367bb72f7735fb69a65db3d8907305b305ec21232505241d044
824756  bf3c698966bc19318f3263631bc285bde07c6a1a4eaea25c4ecd3b7b8f29b3fd
661000  15763b72802e1e71cc943e09cba8b747779bf80fa35d56318cf1b89f7b1f1e71
575564  02eaa52e2f999dc387a9dee543028bada0762022cef1400596b5cc18a6223635
486780  4353c30611d7f51932d9af24bb1330db0fdb86faa9d9cae02ed618fa975c697a
486420  562a8874cc345b8ea830c1486c42211b288c886c5dca08e14d7057cacab984c1
486420  4f897e8cd355320b0e7ee1ecc9db5c43d5151f9afa29f1925fe264c88429be4c
448652  a8d0596d123fcc59983ce63df3f3acd40d5c930ed72874ce0a9efbc3234466de
448296  851cc4912edb9989e120acf241f26c82e9715e7fcb1d2bd19a002fcfb894f1f4
417780  20608baacae6bafcd4230a18a272857bc75703a6eefef5c9b40ba4ea19496b11
387388  43a8a76de3b5531e4c12f956f7bfcfcdb8fd38548faf20812cafa9a39813abc5

Then look up the container name by directory name:

[root@sh-saas-k8s1-node-qa-04 overlay2]#  docker ps -q | xargs docker inspect --format '{{.State.Pid}}, {{.Name}}, {{.GraphDriver.Data.WorkDir}}' | grep "20049e2e445181fc742b9e74a8819edf0e7ee8f0c0041fb2d1c9d321f73d8f5b"
4884, /k8s_taskmanager_flink-taskmanager-java-qa-7879d55f45-xbd74_public-tjpm-flink-java-qa_ad8bf915-a23f-11e9-be66-52540088db9a_0, /data/kubernetes/docker/overlay2/20049e2e445181fc742b9e74a8819edf0e7ee8f0c0041fb2d1c9d321f73d8f5b/work
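
The grep above matches the ID anywhere in the line. As a slightly stricter sketch, the filter below only looks at the WorkDir field and prints just the container name. It assumes the same "pid, name, workdir" format string as above; the function name match_overlay_id is made up for this example:

```shell
# Read "pid, name, workdir" lines on stdin and print the names of containers
# whose WorkDir lies under overlay2/<id>.
match_overlay_id() {
  # $1: overlay2 directory name (a leading prefix of it also works)
  awk -F', ' -v id="$1" 'index($3, "/overlay2/" id) > 0 { print $2 }'
}

# Usage (needs a docker host, so it is left commented out here):
# docker ps -q | xargs docker inspect \
#   --format '{{.State.Pid}}, {{.Name}}, {{.GraphDriver.Data.WorkDir}}' \
#   | match_overlay_id 20049e2e445181fc
```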

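If a directory cannot be matched to any container, the container has usually been deleted without its directory being cleaned up. (The lookup above only covers running containers, because it uses docker ps -q.) In that case you can simply clean up. Be aware that the command below removes all stopped containers, unused networks, and unused images without prompting: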

docker system prune -a -f

Excluding nodes in ES

To exclude nodes from shard allocation in ES:

curl -XPUT 127.0.0.1:9800/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
      "cluster.routing.allocation.exclude._ip" : "10.16.16.30,10.16.16.63"
   }
}'

You can then watch the shards relocate with:
curl -XGET 'http://localhost:9800/_cat/shards?v'| grep RELOCATING
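
When the list of IPs changes often, the request body can be generated instead of edited by hand. A minimal sketch, where the function name exclude_body is made up for this example and the endpoint is the one used above:

```shell
# Build the JSON body for cluster.routing.allocation.exclude._ip
# from a list of IP addresses given as arguments.
exclude_body() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}\n' "$*"
}

# Usage (needs a reachable cluster, so it is left commented out here):
# curl -XPUT 127.0.0.1:9800/_cluster/settings \
#   -H 'Content-Type: application/json' \
#   -d "$(exclude_body 10.16.16.30 10.16.16.63)"
```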

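Creating a cluster-wide read-only service account in k8s

Sometimes you need to create a read-only service account on a k8s cluster, for developers for example. Here is how to do it:

First create oms-viewonly.yaml: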

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: oms-viewonly
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - persistentvolumeclaims
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  - serviceaccounts
  - services
  - nodes
  - persistentvolumes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - bindings
  - events
  - limitranges
  - namespaces/status
  - pods/log
  - pods/status
  - replicationcontrollers/status
  - resourcequotas
  - resourcequotas/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - replicasets
  - replicasets/scale
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - ingresses
  - networkpolicies
  - replicasets
  - replicasets/scale
  - replicationcontrollers/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - roles
  - rolebindings
  verbs:
  - get
  - list
  - watch

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: oms-read 
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oms-read
  labels: 
    k8s-app: oms-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: oms-viewonly
subjects:
- kind: ServiceAccount
  name: oms-read
  namespace: kube-system

Then create the resources:
kubectl apply -f oms-viewonly.yaml

Finally, look up the token of the newly created SA with:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep oms-read | awk '{print $1}')

Common batch operations with k8s commands

Delete every Evicted pod across all namespaces with a single command:
kubectl get pods --all-namespaces -o wide | grep Evicted | awk '{print $1,$2}' | xargs -L1 kubectl delete pod -n

For example:

kubectl get pods --all-namespaces -o wide | grep Evicted | awk '{print $1,$2}' | xargs -L1 kubectl delete pod -n 
pod "public-fe-tweeter-node-online-65fc7889-tjgn9" deleted
pod "flink-taskmanager-java-online-76b8c459f-5xqtv" deleted
pod "flink-taskmanager-java-online-76b8c459f-7hfdw" deleted
pod "flink-taskmanager-java-online-76b8c459f-jkb8l" deleted
pod "flink-taskmanager-java-online-76b8c459f-nwls4" deleted
pod "flink-taskmanager-java-online-76b8c459f-t7xxk" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-67bdb8585c-skzz5" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-69996b44-kcnqp" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-6cb9c6cb5-qjfj4" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-6cb9c6cb5-rr9nf" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-77948c97c5-2m4jf" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-77948c97c5-qjgh5" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-qr5wl" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-sshlr" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-wqnpq" deleted

You can also replace Evicted with other statuses such as OutOfcpu.
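
The grep in the one-liner matches the word anywhere in the line. A slightly stricter sketch compares the STATUS column exactly; it assumes the default column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...), and the function name pods_with_status is made up for this example:

```shell
# Read "kubectl get pods --all-namespaces" output on stdin and print
# "namespace name" for every pod whose STATUS column equals the given value.
pods_with_status() {
  # $1: exact STATUS value, e.g. Evicted or OutOfcpu
  awk -v want="$1" 'NR > 1 && $4 == want { print $1, $2 }'
}

# Usage (needs a cluster, so it is left commented out here):
# kubectl get pods --all-namespaces | pods_with_status Evicted \
#   | xargs -L1 kubectl delete pod -n
```

Running the pipeline without the final xargs stage first shows which pods would be deleted.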

Label nodes in bulk:

for node in `kubectl get node | grep node | awk '{print $1}'`; do kubectl label node $node edug=traefik ; done 

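Network debugging inside a container via redirection

Inside a container, commands such as ping and telnet are often missing, which makes network debugging very limited. Redirection still lets you talk to services over TCP/UDP.

Bash treats one family of file names specially in redirections:

/dev/[tcp|udp]/host/port: as soon as you read from or write to this path, Bash tries to connect to the given port on host. If the host and port are reachable, a socket connection is established, and a corresponding entry appears under /proc/self/fd.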

[chengmo@centos5 shell]$ cat</dev/tcp/127.0.0.1/22
SSH-2.0-OpenSSH_5.1
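#On my machine the SSH port is 22
#In fact no /dev/tcp directory exists; Bash handles this path itself
[chengmo@centos5 shell]$ cat</dev/tcp/127.0.0.1/223
-bash: connect: Connection refused
-bash: /dev/tcp/127.0.0.1/223: Connection refused
#Nothing listens on port 223, so the open fails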

[chengmo@centos5 shell]$ exec 8<>/dev/tcp/127.0.0.1/22
[chengmo@centos5 shell]$ ls -l /proc/self/fd/
total 0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 0 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 1 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 2 -> /dev/pts/0
lr-x------ 1 chengmo chengmo 64 10-21 23:05 3 -> /proc/22185/fd
lrwx------ 1 chengmo chengmo 64 10-21 23:05 8 -> socket:[15067661]

#File descriptor 8 now holds an open socket; it is readable and writable because it was opened with "<>"
[chengmo@centos5 shell]$ exec 8>&-
#Close the channel
[chengmo@centos5 shell]$ ls -l /proc/self/fd/
total 0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 0 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 1 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 2 -> /dev/pts/0
lr-x------ 1 chengmo chengmo 64 10-21 23:08 3 -> /proc/22234/fd

Downloading a URL via /dev/tcp:

exec 5<>/dev/tcp/www.net.cn/80
printf 'GET / HTTP/1.0\r\nHost: www.net.cn\r\n\r\n' >&5
cat <&5
exec 5>&-
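
The same mechanism can be wrapped into a small reachability check for containers that lack telnet or nc. This is a sketch that assumes bash (with /dev/tcp support compiled in) and the timeout command are present in the image; the function name tcp_check is made up for this example:

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds,
# non-zero otherwise. The connection is closed when the inner bash exits.
tcp_check() {
  # $1: host, $2: port
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
# tcp_check 127.0.0.1 22 && echo open || echo closed
```

The timeout matters when a firewall silently drops packets, because the connect would otherwise hang for a long time.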