Ahui's Blog

Systems, networking, clusters, databases, distributed cloud computing, and related research

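Creating a cluster-wide read-only service account in k8s

Sometimes you need to create a read-only service account on a k8s cluster, for developers for example. This post records how to do it:

First create oms-viewonly.yaml: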

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: oms-viewonly
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - persistentvolumeclaims
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  - serviceaccounts
  - services
  - nodes
  - persistentvolumes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - bindings
  - events
  - limitranges
  - namespaces/status
  - pods/log
  - pods/status
  - replicationcontrollers/status
  - resourcequotas
  - resourcequotas/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - replicasets
  - replicasets/scale
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - ingresses
  - networkpolicies
  - replicasets
  - replicasets/scale
  - replicationcontrollers/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - roles
  - rolebindings
  verbs:
  - get
  - list
  - watch

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: oms-read 
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oms-read
  labels: 
    k8s-app: oms-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: oms-viewonly
subjects:
- kind: ServiceAccount
  name: oms-read
  namespace: kube-system

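Then apply it:
kubectl apply -f oms-viewonly.yaml

Finally, use the following command to look up the token of the SA you just created:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep oms-read | awk '{print $1}')
On Kubernetes 1.24 and later, token secrets are no longer created automatically for service accounts; there a token can be requested with:
kubectl -n kube-system create token oms-read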
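Before handing the token out, it can be worth confirming that the role really grants nothing beyond get, list and watch. The following is a minimal sketch, not part of the original post: the helper name non_read_only_rules is hypothetical, and the rules list is a hand-copied excerpt of the ClusterRole above. In practice the list could come from the "rules" field of `kubectl get clusterrole oms-viewonly -o json`.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def non_read_only_rules(rules):
    """Return the rules that grant any verb outside get/list/watch."""
    return [r for r in rules if not set(r.get("verbs", [])) <= READ_ONLY_VERBS]


# Hand-copied excerpt of the ClusterRole rules, expressed as Python dicts.
rules = [
    {"apiGroups": [""], "resources": ["pods", "services"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "watch"]},
]

print(non_read_only_rules(rules))  # prints []
```

An empty list means every rule is read-only; a rule containing a verb such as delete or the wildcard * would be returned.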

Common batch operations with k8s commands

Delete all Evicted pods with a single command:
kubectl get pods --all-namespaces -o wide | grep Evicted | awk '{print $1,$2}' | xargs -L1 kubectl delete pod -n

For example:

kubectl get pods --all-namespaces -o wide | grep Evicted | awk '{print $1,$2}' | xargs -L1 kubectl delete pod -n 
pod "public-fe-tweeter-node-online-65fc7889-tjgn9" deleted
pod "flink-taskmanager-java-online-76b8c459f-5xqtv" deleted
pod "flink-taskmanager-java-online-76b8c459f-7hfdw" deleted
pod "flink-taskmanager-java-online-76b8c459f-jkb8l" deleted
pod "flink-taskmanager-java-online-76b8c459f-nwls4" deleted
pod "flink-taskmanager-java-online-76b8c459f-t7xxk" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-67bdb8585c-skzz5" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-69996b44-kcnqp" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-6cb9c6cb5-qjfj4" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-6cb9c6cb5-rr9nf" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-77948c97c5-2m4jf" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-77948c97c5-qjgh5" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-qr5wl" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-sshlr" deleted
pod "saas-jcpt-saas-uc-base-tomcat-online-b4c456646-wqnpq" deleted

Evicted can also be replaced with another status, such as OutOfcpu.
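The same selection can be sketched in Python for cases where a dry run is wanted before anything is deleted. This is only an illustration: the function name pods_with_status and the sample pod names are hypothetical, and the code assumes the column order NAMESPACE, NAME, READY, STATUS printed by `kubectl get pods --all-namespaces`. It compares the STATUS column exactly instead of matching the word anywhere on the line.

```python
def pods_with_status(kubectl_output, status="Evicted"):
    """Return (namespace, name) pairs whose STATUS column equals `status`."""
    matches = []
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[3] == status:
            matches.append((fields[0], fields[1]))
    return matches


# Hypothetical sample of `kubectl get pods --all-namespaces` output.
sample = """\
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
online        flink-taskmanager-5xqtv   0/1     Evicted   0          3d
online        flink-taskmanager-abcde   1/1     Running   0          3d
kube-system   coredns-xyz12             1/1     Running   0          9d
"""

# Dry run: print the delete commands instead of executing them.
for namespace, name in pods_with_status(sample):
    print("kubectl delete pod -n", namespace, name)
```

Passing status="OutOfcpu" selects pods in that state instead.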

Label nodes in bulk:

for node in `kubectl get node | grep node | awk '{print $1}'`; do kubectl label node $node edug=traefik ; done 

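Network debugging inside a container via redirection

Inside a container, commands such as ping and telnet are often missing, which makes network debugging very limited. Redirection can be used instead to talk to software over TCP/UDP.

bash provides a rather special pseudo-device file:

/dev/[tcp|udp]/host/port: reading from or writing to this file makes bash try to connect to host on the given port. If the host and port are reachable, a socket connection is established, and a corresponding entry appears under /proc/self/fd.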

[chengmo@centos5 shell]$ cat</dev/tcp/127.0.0.1/22
SSH-2.0-OpenSSH_5.1
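# The SSH port on my machine is 22
# In fact there is no /dev/tcp directory; bash itself treats this path specially
[chengmo@centos5 shell]$ cat</dev/tcp/127.0.0.1/223
-bash: connect: Connection refused
-bash: /dev/tcp/127.0.0.1/223: Connection refused
# Nothing listens on port 223, so the open fails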

[chengmo@centos5 shell]$ exec 8<>/dev/tcp/127.0.0.1/22
[chengmo@centos5 shell]$ ls -l /proc/self/fd/
total 0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 0 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 1 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:05 2 -> /dev/pts/0
lr-x------ 1 chengmo chengmo 64 10-21 23:05 3 -> /proc/22185/fd
lrwx------ 1 chengmo chengmo 64 10-21 23:05 8 -> socket:[15067661]

# File descriptor 8 now holds an open socket; it is readable and writable because it was opened with "<>"
[chengmo@centos5 shell]$ exec 8>&-
# Close the channel
[chengmo@centos5 shell]$ ls -l /proc/self/fd/
total 0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 0 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 1 -> /dev/pts/0
lrwx------ 1 chengmo chengmo 64 10-21 23:08 2 -> /dev/pts/0
lr-x------ 1 chengmo chengmo 64 10-21 23:08 3 -> /proc/22234/fd

Downloading a URL via /dev/tcp:

exec 5<>/dev/tcp/www.net.cn/80
printf 'GET / HTTP/1.0\r\nHost: www.net.cn\r\n\r\n' >&5
cat <&5
exec 5>&-
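Where the container image happens to include Python but not telnet or nc, the same reachability check can be sketched with the standard socket module. This is an alternative to the bash redirection, not the method described above; the function name port_open and the 3 second timeout are assumptions made for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return False


# 127.0.0.1:22 mirrors the SSH example; adjust to the service being debugged.
print(port_open("127.0.0.1", 22))
```

Unlike `cat </dev/tcp/...`, this only tests whether the connection can be established and does not read any data from the peer.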

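Finding the container for a given process ID

First use ps auxw to find the process ID, then run:
docker ps -q | xargs docker inspect --format '{{.State.Pid}}, {{.Name}}' | grep "^%PID%,"
where %PID% is the container PID shown by ps. The trailing comma in the pattern keeps PID 123 from also matching 1234.

If the PID from ps auxw is not a container PID, this is usually because the process is not PID 1 of its container. Use
pstree -sg <PID>
to find the parent PID first, then run:
docker ps -q | xargs docker inspect --format '{{.State.Pid}}, {{.Name}}' | grep "^%PID%,"
and the container is found.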
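The "walk up to the parent, then look it up" step can be sketched as a small function. This is an illustration under assumptions, not the author's tooling: find_container is a hypothetical name, parent_of stands for the PID to parent PID relation that pstree or `ps -o ppid` shows, and container_of stands for the `{{.State.Pid}}, {{.Name}}` pairs printed by docker inspect. All PIDs except 16042 are made up.

```python
def find_container(pid, parent_of, container_of):
    """Walk from `pid` up through its parents until a container's main PID is found.

    parent_of:    dict mapping a PID to its parent PID
    container_of: dict mapping each container's .State.Pid to its name
    Returns the container name, or None if no ancestor is a container's main process.
    """
    seen = set()
    while pid in parent_of or pid in container_of:
        if pid in container_of:
            return container_of[pid]
        if pid in seen:  # guard against a malformed, cyclic parent map
            return None
        seen.add(pid)
        pid = parent_of[pid]
    return None


# Hypothetical process tree: 16100 is a worker whose parent 16042 is the container's PID 1.
parent_of = {16100: 16042, 16042: 15990, 15990: 1}
container_of = {16042: "/nginx"}

print(find_container(16100, parent_of, container_of))  # prints /nginx
```

A process that does not descend from any container's main process yields None.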

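Using nsenter to enter a docker container's namespaces

CentOS 7 already ships with the nsenter command, so it can be used directly. It makes it easy to enter the namespaces of a docker container.

First get the container's PID, for example: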

[root@sh-saas-k8s1-master-dev-01 ~]# docker ps
CONTAINER ID        IMAGE                                                                 COMMAND                  CREATED             STATUS              PORTS               NAMES
f8b1e0b8caa7        nginx                                                                 "nginx -g 'daemon of…"   33 seconds ago      Up 33 seconds       80/tcp              nginx
[root@sh-saas-k8s1-master-dev-01 ~]# pid=$(docker inspect --format "{{ .State.Pid }}" f8b1e0b8caa7)
[root@sh-saas-k8s1-master-dev-01 ~]# echo $pid
16042

Then enter the namespaces with nsenter:

[root@sh-saas-k8s1-master-dev-01 ~]# nsenter --target $pid --mount --uts --ipc --net --pid
mesg: ttyname failed: No such file or directory
root@f8b1e0b8caa7:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@f8b1e0b8caa7:/# ip a
-bash: ip: command not found
root@f8b1e0b8caa7:/# exit
logout

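Splitting and converting cue music on a Mac

The conversion needs the following software, installed via brew:
brew install flac shntool cuetools ffmpeg
For example, to split WAV music that comes with a cue sheet: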

luohui@MacBookPro:/Volumes/DATA HD/music/APE/李荣浩《李荣浩创作精选》$ shntool split -t "%p-%t" -f lrh.cue
enter input filename(s):
lrh.wav
shntool [split]: warning: discarding initial zero-valued split point
Splitting [lrh.wav] (69:04.36) --> [李荣浩-李白 .wav] (4:34.33) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-作曲家.wav] (3:47.42) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-模特 .wav] (5:07.18) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-太坦白.wav] (4:57.60) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-丑八怪.wav] (4:08.32) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-寂寞不痛.wav] (4:58.62) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-演员和歌手 .wav] (4:16.11) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-老伴 .wav] (3:28.40) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-蓝绿 .wav] (4:20.67) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-拜拜.wav] (5:36.60) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-哎呀.wav] (4:52.41) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-什么都没留.wav] (4:14.47) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-两个人 .wav] (4:50.46) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-有一个姑娘 .wav] (4:45.69) : 100% OK
Splitting [lrh.wav] (69:04.36) --> [李荣浩-都一样 .wav] (5:04.08) : 100% OK

This successfully splits a music file of more than 600 MB into over a dozen smaller files.

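Updating to a third-party kernel on CentOS Linux 7

CentOS Linux 7 ships with kernel 3.10, which is fairly old and lacks many newer features, so a newer third-party kernel is worth considering. The ELRepo repository, for example, provides the long-term support kernel 4.4.x as well as the latest 5.0.x.

Enable the ELRepo repository:
Import the key:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
Install the yum repository package:
yum install -y https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

Install the kernel:
yum --enablerepo=elrepo-kernel install kernel-lt -y

Set the default boot kernel. List the menu entries and pick the newest one, which is normally entry 0: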
egrep ^menuentry /etc/grub2.cfg | cut -f 2 -d \'
grub2-set-default 0
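To see which index belongs to which kernel before calling grub2-set-default, the egrep/cut pipeline can be mirrored in a few lines of Python. This is a sketch under assumptions: menu_entries is a hypothetical helper, the sample text and kernel versions are made up, and it only considers top-level menuentry lines (no submenus), which matches what the egrep pattern above selects.

```python
import re


def menu_entries(grub_cfg_text):
    """Return the titles of lines starting with menuentry, in file order.

    The position of a title in the returned list is the index passed
    to grub2-set-default.
    """
    pattern = re.compile(r"^menuentry\s+'([^']*)'", re.MULTILINE)
    return pattern.findall(grub_cfg_text)


# Hypothetical excerpt of /etc/grub2.cfg after installing kernel-lt.
sample = """\
menuentry 'CentOS Linux (4.4.177-1.el7.elrepo.x86_64) 7 (Core)' --class centos {
    linux16 /vmlinuz-4.4.177-1.el7.elrepo.x86_64
}
menuentry 'CentOS Linux (3.10.0-957.el7.x86_64) 7 (Core)' --class centos {
    linux16 /vmlinuz-3.10.0-957.el7.x86_64
}
"""

for index, title in enumerate(menu_entries(sample)):
    print(index, title)
```

On a real machine the text would be read from /etc/grub2.cfg instead of the sample string.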

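Kernel 4.4 handles the kdump parameter differently, so the following change is needed:
vim /etc/default/grub

crashkernel=auto
change to:
crashkernel=128M

The change can be made in one step with:
sed -i 's/crashkernel=auto/crashkernel=128M/' /etc/default/grub
Otherwise kdump reports errors after rebooting into the new kernel.

Generate the new configuration file:
grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot the machine:
reboot

After the reboot, check whether the kdump service is healthy:
systemctl status kdump

The following commands trigger a test kernel dump (note that this crashes the system immediately):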

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

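docker: finding the owning container ID from a process PID on the host

When using docker, a single host often runs many containers, and a process in one of them may be driving up the load of the whole host. A single command is enough to find the culprit.

# Find the container ID

docker inspect -f "{{.Id}} {{.State.Pid}}" $(docker ps -q) |grep <PID>

# Find the k8s pod name

docker inspect -f "{{.Id}} {{.State.Pid}} {{.Config.Hostname}}" $(docker ps -q) |grep <PID>

# If the PID belongs to a child process running inside the container, docker inspect will not show it

for i in  `docker ps |grep Up|awk '{print $1}'`;do echo \ &&docker top $i &&echo ID=$i; done |grep -A 10 <PID>

Source: https://www.cnblogs.com/37yan/p/9559308.html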

How to get your own public IP

How do you find your own public IP?

1. Query DNS with dig

dig TXT +short o-o.myaddr.l.google.com @ns1.google.com
# or
dig +short myip.opendns.com @resolver1.opendns.com

2. Query IP lookup sites with curl

curl icanhazip.com
curl ifconfig.me
curl ipecho.net/plain
curl ifconfig.co
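When these commands are used in a script, the reply is worth validating before it is trusted, since an error page or a private address can come back instead. A minimal sketch with the standard ipaddress module follows; parse_public_ip is a hypothetical helper name. It also strips the double quotes that dig prints around TXT records.

```python
import ipaddress


def parse_public_ip(text):
    """Return the address in `text` as a string if it is globally routable, else None."""
    candidate = text.strip().strip('"')  # dig TXT +short wraps the value in quotes
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        return None
    return str(address) if address.is_global else None


print(parse_public_ip('"8.8.8.8"\n'))       # prints 8.8.8.8
print(parse_public_ip("192.168.1.10"))      # prints None
print(parse_public_ip("<html>error</html>"))  # prints None
```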

3. References

http://www.chenshake.com/page/2/
https://www.cyberciti.biz/faq/how-to-find-my-public-ip-address-from-command-line-on-a-linux/

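Monitoring a kubernetes cluster with zabbix

I recently wrote a zabbix monitoring script for kubernetes clusters. It is mainly used for alerting; performance monitoring is still handled by other means.

github URL

https://github.com/farmerluo/k8s_zabbix

About k8s_zabbix

k8s_zabbix uses zabbix to monitor kubernetes ingress, hpa, pod status, and more.

Template Check K8S Cluster Status.xml: the zabbix template; import this file into zabbix

check_k8s_status.py: the kubernetes monitoring script

userparameter_k8s.conf: the zabbix agent configuration file; pay attention to the script path

About check_k8s_status.py

  • Configuration of the monitored k8s cluster:
conf.host = "https://10.10.88.20:8443"
conf.api_key['authorization'] = "xxxxxx.xxxxxxx.x-x-xxx-xxx-x"
  • The script monitors the access status of the traefik ingress and alerts on abnormal status codes in the 400~599 range. The traefik access logs must first be shipped to an elasticsearch cluster via fluentd, filebeat, or similar; the script then periodically queries those logs to monitor ingress access status. Configuration inside the monitoring script: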
# elasticsearch server config
es_server = [{"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200}
             ]
# index name
es_index = "logstash-traefik-ingress-lb-*"
# ES query interval, in ms
es_query_duration = 60000

# Status code alerting and threshold configuration
# xxx.com is an example of a custom domain
status_code_config = {
    'default': {'403': '90', '404': '90', '500': '2', '502': '2', '499': '70', '406': '70', '503': '5',
                '504': '5', '599': '2', 'other': '30', '429': '5', '430': '1'
                },
    'xxx.com': {'403': '100', '404': '100', '500': '70', '502': '70', '499': '100', '406': '80', '503': '70',
                '504': '70', '599': '0', 'other': '60', '429': '5', '430': '1'
                }
}
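How the script applies these thresholds is not shown here, so the following is only a guess at the lookup logic, written as a self-contained sketch. It assumes that a domain without its own entry uses 'default', that a code without its own threshold uses 'other', that the values are hit counts per query window, and that an alert fires when the count exceeds the threshold. The function name codes_over_threshold, the shortened config and the domain example.org are all hypothetical.

```python
def codes_over_threshold(domain, counts, config):
    """Return the status codes whose count exceeds the threshold for `domain`.

    counts: dict mapping a status code string to the number of hits in one query window
    config: dict shaped like status_code_config
    """
    thresholds = config.get(domain, config["default"])  # assumed fallback to 'default'
    alerts = []
    for code, count in sorted(counts.items()):
        limit = int(thresholds.get(code, thresholds["other"]))  # assumed fallback to 'other'
        if count > limit:
            alerts.append(code)
    return alerts


# Shortened, hypothetical version of status_code_config.
config = {
    "default": {"404": "90", "500": "2", "other": "30"},
    "xxx.com": {"404": "100", "500": "70", "other": "60"},
}
window = {"404": 95, "500": 1, "418": 31}

print(codes_over_threshold("example.org", window, config))  # prints ['404', '418']
print(codes_over_threshold("xxx.com", window, config))      # prints []
```

The same traffic alerts for a domain on the default thresholds but stays quiet for xxx.com, which is the point of the per-domain overrides.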