Ahui's Blog

Notes and research on systems, networking, clusters, databases, distributed and cloud computing, and more

How to Find Your Public IP

How do you find your own public IP address? Here are two quick ways from the command line.

1. Query Google DNS or OpenDNS with dig

dig TXT +short o-o.myaddr.l.google.com @ns1.google.com
# or
dig +short myip.opendns.com @resolver1.opendns.com

2. Query an IP-echo website with curl


curl icanhazip.com
curl ifconfig.me
curl ipecho.net/plain
curl ifconfig.co

3. References

http://www.chenshake.com/page/2/
https://www.cyberciti.biz/faq/how-to-find-my-public-ip-address-from-command-line-on-a-linux/

Monitoring a Kubernetes Cluster with Zabbix

I recently wrote a Zabbix monitoring script for Kubernetes clusters, mainly for alerting; performance monitoring is handled by other means.

GitHub URL

https://github.com/farmerluo/k8s_zabbix

About k8s_zabbix

k8s_zabbix uses Zabbix to monitor a Kubernetes cluster's ingress, HPA, and pod status.

Template Check K8S Cluster Status.xml: the Zabbix template; import this file into Zabbix

check_k8s_status.py: the Kubernetes monitoring script

userparameter_k8s.conf: the Zabbix agent-side configuration file; mind the script path in it (see the sketch just below)
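
As a rough illustration (the Zabbix item key and script path here are assumptions, not taken from the repo), an entry in userparameter_k8s.conf would follow the usual Zabbix agent pattern:

# Hypothetical example: point the path at wherever check_k8s_status.py is deployed
UserParameter=k8s.status[*],/usr/bin/python /etc/zabbix/scripts/check_k8s_status.py $1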

About check_k8s_status.py

  • Configuration for the k8s cluster to be monitored:
conf.host = "https://10.10.88.20:8443"
conf.api_key['authorization'] = "xxxxxx.xxxxxxx.x-x-xxx-xxx-x"
  • The script monitors the access status of the traefik ingress and alerts on abnormal status codes in the 400-599 range. The traefik access logs must first be shipped into an Elasticsearch cluster via fluentd, filebeat, or the like; the script then periodically queries those logs to check ingress health (see the query sketch after the configuration below). The relevant settings inside the script:
# elasticsearch server config
es_server = [{"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200},
             {"host": "10.16.252.50", "port": 9200}
             ]
# index name
es_index = "logstash-traefik-ingress-lb-*"
# ES query interval, in ms
es_query_duration = 60000

# status-code alerting and threshold config
# xxx.com is an example of a custom domain
status_code_config = {
    'default': {'403': '90', '404': '90', '500': '2', '502': '2', '499': '70', '406': '70', '503': '5',
                '504': '5', '599': '2', 'other': '30', '429': '5', '430': '1'
                },
    'xxx.com': {'403': '100', '404': '100', '500': '70', '502': '70', '499': '100', '406': '80', '503': '70',
                '504': '70', '599': '0', 'other': '60', '429': '5', '430': '1'
                }
}
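
For context, the per-domain thresholds above are applied to results pulled from Elasticsearch. As a sketch of the kind of query the script presumably issues, here is a curl call against the _count API (the status-code field name is an assumption; the actual traefik log fields may differ):

# Hypothetical: count 4xx/5xx responses over the last es_query_duration window (60s)
curl -s "http://10.16.252.50:9200/logstash-traefik-ingress-lb-*/_count" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"bool":{"must":[{"range":{"@timestamp":{"gte":"now-60s"}}},{"range":{"status":{"gte":400,"lte":599}}}]}}}'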

A Cron Script to Clean Up Old Elasticsearch Indices

An Elasticsearch cluster's capacity is always finite, so indices older than a certain age must be deleted.
First, a note on our index naming convention: xxx-xxx-xxx-yyyy.mm.dd,
where yyyy.mm.dd is the date.
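
For example, the cleanup script below pulls out the date suffix and normalizes it roughly like this (sample index name assumed):

# take the last "-"-separated field, then turn the dots into dashes for date(1)
echo "logstash-traefik-lb-2018.09.05" | awk -F"-" '{print $NF}' | sed 's/\./-/g'
# prints: 2018-09-05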

The cleanup script:

#!/bin/bash
###################################
# Delete ES cluster indices older than $days days
###################################
# crontab -e
#clean es index
#0 0 * * * sh /data/shell/clean_es_indes.sh

# days to keep indices
days=30

#ES cluster url
es_cluster_url="http://127.0.0.1:9200"

function delete_indices() {
    comp_date=`date -d "$days day ago" +"%Y-%m-%d"`
    date1="$1 00:00:00"
    date2="$comp_date 00:00:00"

    t1=`date -d "$date1" +%s` 
    t2=`date -d "$date2" +%s` 

    if [ $t1 -le $t2 ]; then
        echo "$1时间早于$comp_date,进行索引删除"
        #转换一下格式,将类似2017-10-01格式转化为2017.10.01
        format_date=`echo $1| sed 's/-/\./g'`
        echo "curl -XDELETE $es_cluster_url/*$format_date"
        curl -s -XDELETE "$es_cluster_url/*$format_date"
    fi
}

curl -s -XGET "$es_cluster_url/_cat/indices" | awk -F" " '{print $3}' | awk -F"-" '{print $NF}' | egrep "[0-9]*\.[0-9]*\.[0-9]*" | sort | uniq  | sed 's/\./-/g' | while read LINE
do
    # call the index deletion function
    delete_indices $LINE
done

Reference:
https://blog.csdn.net/felix_yujing/article/details/78207667

Installing and Configuring Weave Scope, and Implementing Basic Auth for the Traefik Ingress


Weave Scope automatically generates a map of your application, so you can intuitively understand, monitor, and control your containerized microservice applications.

1. Introduction to Weave Scope

1.1. Real-time visibility into Docker container status

Get an overview of your container infrastructure, or focus on one particular microservice, making it easy to spot and fix problems and keep your containerized applications stable and performant.

(more…)

An Issue with Inconsistent CoreDNS Resolution

I've noticed something wrong with CoreDNS resolution inside our k8s cluster: names frequently fail to resolve.

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Non-authoritative answer:

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

(more…)

Version Problems When Shipping Data from Fluentd to Kafka

While using fluentd to ship data to Kafka, nothing would go through, and I ran into a series of errors.
fluentd version: 1.2.5
fluent-plugin-kafka version: 0.7.8
Kafka version: 0.9
The first error I hit was this:

2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:06 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:06 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=2 next_retry_seconds=2018-09-05 01:42:06 +0000 chunk="57515e0ef787da843836cc864f9d1581" error_class=Kafka::UnknownTopicOrPartition error="unknown topic "
  2018-09-05 01:42:06 +0000 [warn]: plugin/output.rb:1157:rescue in try_flush: suppressed same stacktrace
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: 61 messages send.
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:09 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=3 next_retry_seconds=2018-09-05 01:42:09 +0000 

This turned out to be because default_topic was not configured; specifying a topic with a configuration like the one below fixes it.
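
A minimal sketch of such an out_kafka2 match block (the broker address and topic name are placeholders, not the actual config from this setup):

<match **>
  @type kafka2
  brokers 10.0.0.1:9092
  # the missing setting behind the unknown-topic error:
  default_topic fluentd
  <format>
    @type json
  </format>
</match>
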
(more…)

Configuring a Maintenance Page in nginx

Quite often, when rolling out a new release, we need to put up a maintenance page for users to see, while still being able to access the site ourselves.

In other words, during maintenance, a set of designated IPs must still get through.

Below is an example of an nginx maintenance-page configuration:

Where:

/weihu/ is the URL of the maintenance page; create a weihu directory under /data/www and place the maintenance page's index.html in it.

103.214.84.224|101.231.194.4|180.168.251.235 are the IP addresses still allowed through.

End result: when a user visits the real URL, they are redirected to /weihu/.
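
The full configuration is behind the cut, but from the description above, a minimal sketch of the pattern might look like this (the server block layout is an assumption):

server {
    listen 80;
    root /data/www;

    # everyone is in maintenance mode by default...
    set $maintenance 1;
    # ...except the allowed IPs
    if ($remote_addr ~ "103.214.84.224|101.231.194.4|180.168.251.235") {
        set $maintenance 0;
    }
    # never redirect the maintenance page itself
    if ($request_uri ~ ^/weihu/) {
        set $maintenance 0;
    }
    if ($maintenance) {
        rewrite ^ /weihu/ redirect;
    }

    location /weihu/ {
        # serves /data/www/weihu/index.html
    }
}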

(more…)

Testing a Docker Swarm Cluster and Multi-host Overlay Networking


Docker's swarm mode now supports multi-host overlay networks, and in this test I found installation and configuration to be very convenient, considerably easier than k8s.

1. Test environment

Two virtual machines are used for the test, running Ubuntu 14.04.4 with the stock 4.2 kernel. Note that overlay networking requires kernel 3.16 or later.

Hostname   IP             Notes
ubuntu1    192.168.11.21  manager
ubuntu2    192.168.11.22  worker

2. Install Docker

Install Docker on all hosts, using the official APT repository.

# Remove the distro's bundled docker packages
apt-get remove docker docker-engine docker.io

# Install the extra kernel modules
apt-get install \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

# Download and install the GPG key for Docker's APT repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88

# Add the APT repository, using the Aliyun mirror
add-apt-repository \
   "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu/ \
   $(lsb_release -cs) \
   stable"

# Install docker
apt-get update
apt-get install docker-ce
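
The swarm setup itself comes after the cut, but for this two-node layout the standard commands would be roughly as follows (a sketch; the join token is printed by the init command):

# on ubuntu1 (the manager)
docker swarm init --advertise-addr 192.168.11.21

# on ubuntu2 (the worker), using the token from the init output
# docker swarm join --token <token> 192.168.11.21:2377

# back on the manager: create a multi-host overlay network
docker network create -d overlay --attachable test-overlay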

(more…)

A Quirk of nginx Configuration Testing

While working on an nginx configuration recently, I ran into a question about nginx's configuration test.

Take the following nginx config, where the upstream is not defined anywhere:

    location /frontend-gateway/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        set $globalTicket $pid--$remote_addr-$request_length-$connection;
        proxy_set_header globalTicket $globalTicket;

        proxy_pass http://o2o-frontend-gateway/;
    }

Testing with nginx -t, we find that the configuration passes:

[root@sh-o2o-nginx-router-online-04 vhost.d]# /usr/local/nginx/sbin/nginx -t
the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
configuration file /usr/local/nginx/conf/nginx.conf test is successful

By rights this should produce an error. My initial suspicion is that nginx is treating o2o-frontend-gateway as a domain name?
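
One quick way to test that suspicion (a sketch, assuming these tools are on the box): nginx resolves a static proxy_pass hostname when the configuration is loaded, so if the name resolves via /etc/hosts or DNS, nginx -t will pass:

# if either of these returns an address (e.g. wildcard DNS or an /etc/hosts entry),
# nginx can resolve o2o-frontend-gateway as a plain hostname and the test succeeds
getent hosts o2o-frontend-gateway
nslookup o2o-frontend-gateway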

(more…)

K8S Installation and Configuration Testing on Tencent Cloud

1. k8s cluster planning

1.1. Dependencies of Kubernetes 1.10

k8s v1.10 is not supported against, or tested with, every version of related software such as etcd and docker. The recommended versions are:

  • docker: 1.11.2 to 1.13.1 and 17.03.x
  • etcd: 3.1.12
  • Full details below:

Reference: External Dependencies
* The supported etcd server version is 3.1.12, as compared to 3.0.17 in v1.9 (#60988)
* The validated docker versions are the same as for v1.9: 1.11.2 to 1.13.1 and 17.03.x (ref)
* The Go version is go1.9.3, as compared to go1.9.2 in v1.9. (#59012)
* The minimum supported go is the same as for v1.9: go1.9.1. (#55301)
* CNI is the same as v1.9: v0.6.0 (#51250)
* CSI is updated to 0.2.0 as compared to 0.1.0 in v1.9. (#60736)
* The dashboard add-on has been updated to v1.8.3, as compared to 1.8.0 in v1.9. (#57326)
* Heapster is the same as v1.9: v1.5.0. It will be upgraded in v1.11. (ref)
* Cluster Autoscaler has been updated to v1.2.0. (#60842, @mwielgus)
* Updates kube-dns to v1.14.8 (#57918, @rramkumar1)
* Influxdb is unchanged from v1.9: v1.3.3 (#53319)
* Grafana is unchanged from v1.9: v4.4.3 (#53319)
* CAdvisor is v0.29.1 (#60867)
* fluentd-gcp-scaler is v0.3.0 (#61269)
* Updated fluentd in fluentd-es-image to fluentd v1.1.0 (#58525, @monotek)
* fluentd-elasticsearch is v2.0.4 (#58525)
* Updated fluentd-gcp to v3.0.0. (#60722)
* Ingress glbc is v1.0.0 (#61302)
* OIDC authentication is coreos/go-oidc v2 (#58544)
* Updated fluentd-gcp to v2.0.11. (#56927, @x13n)
* Calico has been updated to v2.6.7 (#59130, @caseydavenport)

1.2. Test server preparation and environment planning

Server name                 IP               Function                Installed services
sh-saas-cvmk8s-master-01    10.12.96.3       master                  master,etcd
sh-saas-cvmk8s-master-02    10.12.96.5       master                  master,etcd
sh-saas-cvmk8s-master-03    10.12.96.13      master                  master,etcd
sh-saas-cvmk8s-node-01      10.12.96.2       node                    node
sh-saas-cvmk8s-node-02      10.12.96.4       node                    node
sh-saas-cvmk8s-node-03      10.12.96.6       node                    node
bs-ops-test-docker-dev-04   172.21.248.242   private image registry  harbor
VIP                         10.12.96.100     master vip              netmask: 255.255.255.0

All netmasks are 255.255.255.0.

All test servers run the latest CentOS Linux 7.4.

The VIP 10.12.96.100 is only used for keepalived testing; this article actually uses the Tencent Cloud LB + haproxy model, with a Tencent Cloud LB VIP of 10.12.16.101.

Pod network CIDR: 10.254.0.0/16
The pod network CIDR must not conflict with any of the following:

  • the cluster network CIDRs of other clusters in the same VPC
  • the CIDR of the VPC itself
  • the CIDRs of the VPC's subnet routes
  • every route table CIDR visible via route-ctl list

In short, do not create the pod network inside the VPC and keep it out of the VPC's route tables; use a range that does not exist anywhere in the VPC.

k8s service cluster network: 10.254.255.0/24
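
For reference, these two ranges would typically land in component flags like the following (a sketch, assuming a binary/flag-based install rather than kubeadm):

# kube-controller-manager (and kube-proxy): the pod network
--cluster-cidr=10.254.0.0/16
# kube-apiserver: the service network
--service-cluster-ip-range=10.254.255.0/24
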

(更多…)