Ahui's Blog

Notes on systems, networking, clusters, databases, distributed systems, cloud computing, and more

Docker: finding the owning container ID from a process PID on the host

With Docker it is common for a single host to run many containers, and a process inside one of them can drive the load of the whole host up. A single command is usually enough to track down the culprit.

# Find the container ID

docker inspect -f "{{.Id}} {{.State.Pid}} {{.Name}} " $(docker ps -q) |grep <PID>

# Find the k8s pod name

docker inspect -f "{{.Id}} {{.State.Pid}} {{.Config.Hostname}}" $(docker ps -q) |grep <PID>

# If the PID belongs to a child process running inside a container, docker inspect alone will not find it; use docker top instead:

for i in  `docker ps |grep Up|awk '{print $1}'`;do echo \ &&docker top $i &&echo ID=$i; done |grep -A 10 <PID>
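
If you want a single helper that covers both cases, a small wrapper is easy to write. A minimal sketch (the function name is mine, not from the original post):

# sketch only: check both a container's main process and its children
find_container_by_pid() {
    local pid=$1
    # case 1: the PID is a container's main process
    docker inspect -f '{{.Id}} {{.State.Pid}} {{.Name}}' $(docker ps -q) | grep -w "$pid"
    # case 2: the PID is a child process inside a container
    for i in $(docker ps -q); do
        docker top "$i" | grep -w "$pid" > /dev/null && echo "PID $pid is inside container $i"
    done
}
# usage: find_container_by_pid 12345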

Source: https://www.cnblogs.com/37yan/p/9559308.html

How to find your public IP

How do you find your own public IP address?

1. Query Google or OpenDNS with dig

dig TXT +short o-o.myaddr.l.google.com @ns1.google.com
# or
dig +short myip.opendns.com @resolver1.opendns.com

2. curl an IP echo website


curl icanhazip.com
curl ifconfig.me
curl ipecho.net/plain
curl ifconfig.co
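
If one of these services happens to be down, a small loop that falls back to the next one is handy. A minimal sketch using only the services listed above:

#!/bin/bash
# print the first public IP returned by any of the echo services
for url in icanhazip.com ifconfig.me ipecho.net/plain ifconfig.co; do
    ip=$(curl -s --max-time 5 "$url") && [ -n "$ip" ] && { echo "$ip"; exit 0; }
done
echo "failed to determine public IP" >&2
exit 1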

3. References

http://www.chenshake.com/page/2/
https://www.cyberciti.biz/faq/how-to-find-my-public-ip-address-from-command-line-on-a-linux/

Monitoring a Kubernetes cluster with Zabbix

I recently wrote a Zabbix monitoring script for Kubernetes clusters, mainly for alerting; performance monitoring is still handled by other means.

GitHub URL

https://github.com/farmerluo/k8s_zabbix

About k8s_zabbix

k8s_zabbix implements monitoring of Kubernetes ingress, HPA, and pod status (among other things) through Zabbix.

Template Check K8S Cluster Status.xml: the Zabbix template; import this file into Zabbix.

check_k8s_status.py: the Kubernetes monitoring script.

userparameter_k8s.conf: the configuration file for the Zabbix agent; pay attention to the script path in it.
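
For reference, a Zabbix UserParameter entry has the form UserParameter=<key>,<command>; a minimal sketch of what such a line could look like (the key name and paths below are illustrative assumptions, not necessarily what the repo ships):

# userparameter_k8s.conf -- sketch only; key name and script path are assumptions
UserParameter=k8s.status[*],/usr/bin/python /etc/zabbix/scripts/check_k8s_status.py $1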

About check_k8s_status.py

  • Configuration for the k8s cluster being monitored:
conf.host = "https://10.10.88.20:8443"
conf.api_key['authorization'] = "xxxxxx.xxxxxxx.x-x-xxx-xxx-x"
  • The script monitors the access status of the traefik ingress and alerts on abnormal 400-599 status codes. The traefik access logs must first be shipped into an Elasticsearch cluster via fluentd, filebeat, or similar; the script then periodically queries those logs to check the ingress access status. The relevant configuration inside the monitoring script:
# elasticsearch server config
es_server = [{"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200}
             ]
# index name
es_index = "logstash-traefik-ingress-lb-*"
# ES query interval, in ms
es_query_duration = 60000

# status-code alert and threshold configuration
# xxx.com is an example of a per-domain (custom domain) override
status_code_config = {
    'default': {'403': '90', '404': '90', '500': '2', '502': '2', '499': '70', '406': '70', '503': '5',
                '504': '5', '599': '2', 'other': '30', '429': '5', '430': '1'
                },
    'xxx.com': {'403': '100', '404': '100', '500': '70', '502': '70', '499': '100', '406': '80', '503': '70',
                '504': '70', '599': '0', 'other': '60', '429': '5', '430': '1'
                }
}

A script to periodically clean up indices in an Elasticsearch cluster

An Elasticsearch cluster's capacity is always finite, so indices older than a certain age must be deleted and cleaned up.
First, a note on our index naming scheme: xxx-xxx-xxx-yyyy.mm.dd,
where yyyy.mm.dd is the date.

The cleanup script is as follows:

#!/bin/bash
###################################
# delete ES cluster indices older than $days days
###################################
# crontab -e
# clean es index
# 0 0 * * * sh /data/shell/clean_es_indes.sh

# number of days to keep indices
days=30

#ES cluster url
es_cluster_url="http://127.0.0.1:9200"

function delete_indices() {
    comp_date=`date -d "$days day ago" +"%Y-%m-%d"`
    date1="$1 00:00:00"
    date2="$comp_date 00:00:00"

    t1=`date -d "$date1" +%s` 
    t2=`date -d "$date2" +%s` 

    if [ $t1 -le $t2 ]; then
        echo "$1时间早于$comp_date,进行索引删除"
        # convert the date format, e.g. 2017-10-01 -> 2017.10.01
        format_date=`echo $1| sed 's/-/\./g'`
        echo "curl -XDELETE $es_cluster_url/*$format_date"
        curl -s -XDELETE "$es_cluster_url/*$format_date"
    fi
}

curl -s -XGET "$es_cluster_url/_cat/indices" | awk -F" " '{print $3}' | awk -F"-" '{print $NF}' | egrep "[0-9]*\.[0-9]*\.[0-9]*" | sort | uniq  | sed 's/\./-/g' | while read LINE
do
    # call the index deletion function
    delete_indices $LINE
done

Here is another script, for indices whose date suffix looks like xxxx-2019-10-08:

#!/bin/bash
searchIndex=console-log
elastic_url=10.16.16.36
elastic_port=9200
save_days=7

date2stamp () {
    date --utc --date "$1" +%s
}

dateDiff (){
    case $1 in
        -s)   sec=1;      shift;;
        -m)   sec=60;     shift;;
        -h)   sec=3600;   shift;;
        -d)   sec=86400;  shift;;
        *)    sec=86400;;
    esac
    dte1=$(date2stamp $1)
    dte2=$(date2stamp $2)
    diffSec=$((dte2-dte1))
    if ((diffSec < 0)); then abs=-1; else abs=1; fi
    echo $((diffSec/sec*abs))
}

for index in $(curl -s "${elastic_url}:${elastic_port}/_cat/indices?v" | grep -E " ${searchIndex}-20[0-9][0-9]-[0-1][0-9]-[0-3][0-9]" | awk '{ print $3 }'); do
  date=$(echo ${index: -10} | sed 's/\./-/g')
  cond=$(date +%Y-%m-%d)
  diff=$(dateDiff -d $date $cond)
  #echo -n "${index} (${diff})"
  if [ $diff -gt $save_days ]; then
    echo "curl -XDELETE \"${elastic_url}:${elastic_port}/${index}?pretty\""
    curl -XDELETE "${elastic_url}:${elastic_port}/${index}?pretty"
  else
    echo "skip delete index: ${index}"
  fi
done

Reference:
https://blog.csdn.net/felix_yujing/article/details/78207667

Installing and configuring Weave Scope, and implementing basic auth on a traefik ingress


Weave Scope automatically generates a map of your application, letting you visually understand, monitor, and control your containerized microservices.

1. Introduction to Weave Scope

1.1. Real-time visibility into Docker container state

Get an overview of your container infrastructure, or focus on a specific microservice, so you can easily spot and fix problems and keep your containerized applications stable and performant.

(more…)

A problem with abnormal CoreDNS resolution results

I noticed that CoreDNS lookups inside the k8s cluster are flaky: they frequently fail to return an answer.

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Non-authoritative answer:

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

(more…)

Version problems when shipping data from fluentd to Kafka

While shipping data from fluentd to Kafka, nothing got through and I ran into a string of errors.
fluentd version: 1.2.5
fluent-plugin-kafka version: 0.7.8
Kafka version: 0.9
The first error I hit was this:

2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:06 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:06 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=2 next_retry_seconds=2018-09-05 01:42:06 +0000 chunk="57515e0ef787da843836cc864f9d1581" error_class=Kafka::UnknownTopicOrPartition error="unknown topic "
  2018-09-05 01:42:06 +0000 [warn]: plugin/output.rb:1157:rescue in try_flush: suppressed same stacktrace
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: 61 messages send.
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:09 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=3 next_retry_seconds=2018-09-05 01:42:09 +0000 

This is because default_topic was not configured; specifying the topic with the configuration below fixes it.
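
The full configuration is behind the read-more link; as a rough sketch of the relevant out_kafka2 <match> block (broker addresses, match pattern, and topic name are placeholders, not the original values):

<match app.**>
  @type kafka2
  # placeholder broker list
  brokers 10.0.0.1:9092,10.0.0.2:9092
  # the setting that was missing in this case
  default_topic fluentd-log
  <format>
    @type json
  </format>
</match>
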
(more…)

Configuring an nginx maintenance page

Whenever we roll out a new release, we often need to put up a maintenance page for users to see, while still being able to access the site ourselves.

In other words, during maintenance, specified IPs must still be allowed through.

Below is an example of configuring a maintenance page in nginx:

Where:

/weihu/ is the URL of the maintenance page; create a weihu directory under /data/www and put the maintenance page index.html in that directory.

103.214.84.224|101.231.194.4|180.168.251.235 are the IP addresses still allowed to access the site.

End result: when an ordinary user visits the real URL, they are redirected to /weihu/.
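
The actual config is behind the read-more link; as a sketch of one way to wire this up based on the description above (server name and upstream are placeholders, not the post's original values):

server {
    listen 80;
    # placeholder server name
    server_name example.com;
    root /data/www;

    # everyone is in maintenance mode by default
    set $maintenance 1;
    # requests from the whitelisted IPs bypass the maintenance page
    if ($remote_addr ~ "103.214.84.224|101.231.194.4|180.168.251.235") {
        set $maintenance 0;
    }

    # the maintenance page itself stays reachable, so the redirect cannot loop
    location /weihu/ {
        index index.html;
    }

    location / {
        if ($maintenance) {
            return 302 /weihu/;
        }
        # placeholder upstream; must be defined elsewhere
        proxy_pass http://backend;
    }
}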

(more…)

Testing a Docker Swarm cluster and multi-host overlay networking


Docker's Swarm mode now supports multi-host overlay networks, and in my testing so far installation and configuration have been very convenient, noticeably easier than Kubernetes.

1. Test environment

Two virtual machines are used for the test, running Ubuntu 14.04.4 with the stock 4.2 kernel; note that overlay networking requires kernel 3.16 or later.

Hostname   IP              Role
ubuntu1    192.168.11.21   manager
ubuntu2    192.168.11.22   worker

2. Install Docker

Install Docker on all hosts, using the official APT repository.

# remove the distro-provided docker packages
apt-get remove docker docker-engine docker.io

# install the extra kernel modules
apt-get install \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

# download and install the GPG key for the Docker APT repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88

# add the APT repository, using the Aliyun mirror
add-apt-repository \
   "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu/ \
   $(lsb_release -cs) \
   stable"

# install docker
apt-get update
apt-get install docker-ce
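
The rest of the walkthrough is behind the read-more link; as a sketch of the typical next steps (the overlay network name is my placeholder, and the join token is whatever `docker swarm init` prints):

# on ubuntu1 (manager): initialize the swarm
docker swarm init --advertise-addr 192.168.11.21

# on ubuntu2 (worker): join using the command printed by the manager
docker swarm join --token <token-from-init> 192.168.11.21:2377

# on the manager: create a multi-host overlay network and verify
docker network create -d overlay --attachable test-overlay
docker node ls
docker network ls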

(more…)

A puzzle with nginx configuration testing

While configuring nginx recently, I noticed something odd about nginx's configuration test.

Take the following nginx configuration, with no matching upstream block defined anywhere:

    location /frontend-gateway/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        set $globalTicket $pid--$remote_addr-$request_length-$connection;
        proxy_set_header globalTicket $globalTicket;

        proxy_pass http://o2o-frontend-gateway/;
    }

Running nginx -t against it, the test passes:

[root@sh-o2o-nginx-router-online-04 vhost.d]# /usr/local/nginx/sbin/nginx -t
the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
configuration file /usr/local/nginx/conf/nginx.conf test is successful

By rights this should fail. My initial suspicion is that nginx is treating o2o-frontend-gateway as a domain name?

(more…)