I noticed a problem with CoreDNS resolution inside my k8s cluster: lookups frequently come back with no answer.

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Non-authoritative answer:

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40

*** Can't find kubernetes-dashboard.kube-system.svc.cluster.local: No answer

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local
Server:         10.253.255.10
Address:        10.253.255.10:53

Name:   kubernetes-dashboard.kube-system.svc.cluster.local
Address: 10.253.255.40
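
To put a number on how often the lookup fails, the query can be repeated and the outcomes tallied. The following is only a minimal sketch: `probe` and `flaky_resolver_factory` are names I made up for illustration, and the stand-in resolver simulates the intermittent failure so the snippet runs anywhere.

```python
import socket
from collections import Counter


def probe(name, attempts=20, resolver=socket.gethostbyname):
    """Resolve `name` repeatedly and tally how often each outcome occurs."""
    outcomes = Counter()
    for _ in range(attempts):
        try:
            outcomes[resolver(name)] += 1
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            outcomes["error: " + type(exc).__name__] += 1
    return outcomes


def flaky_resolver_factory():
    """Hypothetical stand-in resolver that fails on every other call."""
    calls = {"n": 0}

    def resolve(name):
        calls["n"] += 1
        if calls["n"] % 2 == 1:
            raise socket.gaierror("simulated: no answer")
        return "10.253.255.40"

    return resolve


if __name__ == "__main__":
    name = "kubernetes-dashboard.kube-system.svc.cluster.local"
    # Inside a pod, drop the `resolver` argument to use the real resolver.
    print(probe(name, attempts=10, resolver=flaky_resolver_factory()))
```

Run from inside a pod with the default resolver, a healthy cluster should show a single address and no error entries.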

Continue reading

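When I used fluentd to ship data to Kafka, delivery kept failing and I ran into a lot of errors.
fluentd version: 1.2.5
fluent-plugin-kafka version: 0.7.8
Kafka version: 0.9
The first error I hit was this one: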

2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:06 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:06 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:06 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=2 next_retry_seconds=2018-09-05 01:42:06 +0000 chunk="57515e0ef787da843836cc864f9d1581" error_class=Kafka::UnknownTopicOrPartition error="unknown topic "
  2018-09-05 01:42:06 +0000 [warn]: plugin/output.rb:1157:rescue in try_flush: suppressed same stacktrace
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: 61 messages send.
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Send exception occurred: unknown topic 
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: Exception Backtrace : /var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/protocol/metadata_response.rb:141:in `partitions_for'
/var/lib/gems/2.3.0/gems/ruby-kafka-0.6.8/lib/kafka/cluster.rb:155:in `partitions_for'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:190:in `assign_partitions!'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:153:in `block in deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `loop'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:148:in `deliver_messages_with_retries'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/kafka_producer_ext.rb:102:in `deliver_messages'
/var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.6/lib/fluent/plugin/out_kafka2.rb:220:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1110:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:1389:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin/output.rb:444:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.2.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-09-05 01:42:09 +0000 [info]: fluent/log.rb:322:info: initialized kafka producer: fluentd
2018-09-05 01:42:09 +0000 [debug]: fluent/log.rb:302:debug: taking back chunk for errors. chunk="57515e0ef787da843836cc864f9d1581"
2018-09-05 01:42:09 +0000 [warn]: fluent/log.rb:342:warn: failed to flush the buffer. retry_time=3 next_retry_seconds=2018-09-05 01:42:09 +0000 

This happens because default_topic is not configured; specifying the topic with the configuration below fixes it.
Continue reading


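Docker swarm clusters now support multi-host overlay networks, and in my testing so far installation and configuration are very convenient, much easier than with k8s.

1. Test environment

Two virtual machines are used for testing, running Ubuntu 14.04.4 with its stock 4.2 kernel. Note that overlay networking requires kernel 3.16 or later.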
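
The kernel requirement is easy to check before installing anything. Below is a small sketch under my own assumptions: `kernel_at_least` is a hypothetical helper, and the two release strings are sample values in the `uname -r` format rather than output captured from the test machines.

```python
import re


def kernel_at_least(release, minimum=(3, 16)):
    """Return True if a `uname -r` style string is >= the minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # Sample release strings; on a real host use platform.release() instead.
    for release in ("4.2.0-27-generic", "3.13.0-24-generic"):
        print(release, kernel_at_least(release))
```

Comparing the version as a tuple of integers avoids the string-comparison trap where "3.2" would sort after "3.16".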

Hostname  IP             Notes
ubuntu1   192.168.11.21  manager
ubuntu2   192.168.11.22  worker

2. Install Docker

Install Docker on all hosts from the official APT repository.

# Remove the Docker packages that ship with the system
apt-get remove docker docker-engine docker.io

# Install the extra kernel module packages
apt-get install \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

# Download and add the GPG key for the Docker APT repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88

# Add the APT repository, using the Aliyun mirror
add-apt-repository \
   "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu/ \
   $(lsb_release -cs) \
   stable"

# Install Docker
apt-get update
apt-get install docker-ce

Continue reading

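1. k8s cluster system planning

1.1. Dependencies of Kubernetes 1.10

k8s v1.10 does not support, or has not been tested against, every version of related packages such as etcd and docker. The recommended versions are:

  • docker: 1.11.2 to 1.13.1 and 17.03.x
  • etcd: 3.1.12
  • The full list is as follows:

Reference: External Dependencies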
* The supported etcd server version is 3.1.12, as compared to 3.0.17 in v1.9 (#60988)
* The validated docker versions are the same as for v1.9: 1.11.2 to 1.13.1 and 17.03.x (ref)
* The Go version is go1.9.3, as compared to go1.9.2 in v1.9. (#59012)
* The minimum supported go is the same as for v1.9: go1.9.1. (#55301)
* CNI is the same as v1.9: v0.6.0 (#51250)
* CSI is updated to 0.2.0 as compared to 0.1.0 in v1.9. (#60736)
* The dashboard add-on has been updated to v1.8.3, as compared to 1.8.0 in v1.9. (#57326)
* Heapster has is the same as v1.9: v1.5.0. It will be upgraded in v1.11. (ref)
* Cluster Autoscaler has been updated to v1.2.0. (#60842, @mwielgus)
* Updates kube-dns to v1.14.8 (#57918, @rramkumar1)
* Influxdb is unchanged from v1.9: v1.3.3 (#53319)
* Grafana is unchanged from v1.9: v4.4.3 (#53319)
* CAdvisor is v0.29.1 (#60867)
* fluentd-gcp-scaler is v0.3.0 (#61269)
* Updated fluentd in fluentd-es-image to fluentd v1.1.0 (#58525, @monotek)
* fluentd-elasticsearch is v2.0.4 (#58525)
* Updated fluentd-gcp to v3.0.0. (#60722)
* Ingress glbc is v1.0.0 (#61302)
* OIDC authentication is coreos/go-oidc v2 (#58544)
* Updated fluentd-gcp updated to v2.0.11. (#56927, @x13n)
* Calico has been updated to v2.6.7 (#59130, @caseydavenport)

1.2 Test server preparation and environment planning

Server name                IP              Role                    Installed services
sh-saas-cvmk8s-master-01   10.12.96.3      master                  master,etcd
sh-saas-cvmk8s-master-02   10.12.96.5      master                  master,etcd
sh-saas-cvmk8s-master-03   10.12.96.13     master                  master,etcd
sh-saas-cvmk8s-node-01     10.12.96.2      node                    node
sh-saas-cvmk8s-node-02     10.12.96.4      node                    node
sh-saas-cvmk8s-node-03     10.12.96.6      node                    node
bs-ops-test-docker-dev-04  172.21.248.242  private image registry  harbor
VIP                        10.12.96.100    master vip              netmask:255.255.255.0

The netmask is 255.255.255.0 for all of them.

All test servers run the latest release of CentOS Linux 7.4.

The VIP 10.12.96.100 is only used for testing keepalived. This article actually uses Tencent Cloud LB + haproxy, and the Tencent Cloud LB VIP is 10.12.16.101.

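Container network: 10.254.0.0/16
The container network must not conflict with any of the following:

  • the cluster network CIDRs of other clusters in the same VPC
  • the CIDR of the VPC itself
  • the CIDRs of the subnet routes in the VPC
  • the CIDRs of every route table shown by route-ctl list
    Do not create the container network inside the VPC, and keep it out of the VPC route tables; use a network that does not exist in the VPC.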
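
The conflict check described above can be automated with the standard `ipaddress` module. This is a sketch only: `find_conflicts` is a name I chose, and the `in_use` list holds made-up CIDRs that stand in for whatever the VPC, its subnets, and its route tables really contain.

```python
import ipaddress


def find_conflicts(candidate, existing):
    """Return the entries of `existing` that overlap the candidate CIDR."""
    net = ipaddress.ip_network(candidate)
    return [cidr for cidr in existing if net.overlaps(ipaddress.ip_network(cidr))]


if __name__ == "__main__":
    # Hypothetical CIDRs already in use in the VPC; substitute the real ones.
    in_use = ["10.12.0.0/16", "172.21.248.0/24"]
    print(find_conflicts("10.254.0.0/16", in_use))  # overlaps neither entry
    print(find_conflicts("10.12.96.0/20", in_use))  # falls inside 10.12.0.0/16
```

An empty list means the candidate is safe against the CIDRs supplied; the check is only as complete as that list.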

k8s service cluster network: 10.254.255.0/24

Continue reading


The previous post installed a Kubernetes 1.10 cluster; now let's run some everyday usage tests against it.

1. Create a deployment and service

Edit a YAML file:

vim nginx.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-router
  namespace: test
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-router
    spec:
      containers:
      - name: nginx-router
        image: 172.21.248.242/base/nginx
        ports:
        - containerPort: 80

---
kind: Service
apiVersion: v1
metadata:
  name: nginx-router
  namespace: test
spec:
  selector:
    app: nginx-router
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx-router-ingress
  namespace: test
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: "nginx.k8s.dev.huilog.com"
    http:
      paths:
      - backend:
          serviceName: nginx-router
          servicePort: 80

# Create the deployment and service:
[root@bs-ops-test-docker-dev-01 dev]# kubectl create -f nginx.yaml 
deployment.extensions "nginx-router" created
service "nginx-router" created
ingress.extensions "nginx-router-ingress" created
[root@bs-ops-test-docker-dev-01 dev]#
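
The three objects in nginx.yaml are tied together only by names and labels, so a typo in any of them leaves the route silently broken. As a rough sketch, the dictionaries below mirror just the fields that matter from the manifest, and `check_wiring` is a hypothetical helper, not part of kubectl.

```python
def check_wiring(deployment, service, ingress):
    """Return a list of problems in how the three objects reference each other."""
    problems = []
    pod_labels = deployment["spec"]["template"]["metadata"]["labels"]
    selector = service["spec"]["selector"]
    # A Service selects pods whose labels include every selector key/value.
    if any(pod_labels.get(k) != v for k, v in selector.items()):
        problems.append("service selector does not match pod labels")
    service_ports = {p["port"] for p in service["spec"]["ports"]}
    for rule in ingress["spec"]["rules"]:
        for path in rule["http"]["paths"]:
            backend = path["backend"]
            if backend["serviceName"] != service["metadata"]["name"]:
                problems.append("ingress points at unknown service")
            elif backend["servicePort"] not in service_ports:
                problems.append("ingress uses a port the service does not expose")
    return problems


# Trimmed-down copies of the objects defined in nginx.yaml.
DEPLOYMENT = {"spec": {"template": {"metadata": {"labels": {"app": "nginx-router"}}}}}
SERVICE = {
    "metadata": {"name": "nginx-router"},
    "spec": {"selector": {"app": "nginx-router"}, "ports": [{"port": 80}]},
}
INGRESS = {
    "spec": {
        "rules": [
            {
                "host": "nginx.k8s.dev.huilog.com",
                "http": {
                    "paths": [
                        {"backend": {"serviceName": "nginx-router", "servicePort": 80}}
                    ]
                },
            }
        ]
    }
}

if __name__ == "__main__":
    print(check_wiring(DEPLOYMENT, SERVICE, INGRESS))
```

An empty list means the Ingress backend, the Service, and the pod labels all line up.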

Continue reading

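Traefik is an open-source reverse proxy and load balancer. Its biggest advantage is that it integrates directly with common microservice systems and supports automated, dynamic configuration. It currently supports backends including Docker, Swarm, Mesos/Marathon, Mesos, Kubernetes, Consul, Etcd, Zookeeper, BoltDB, Rest API, and more.
The architecture diagram is as follows:

It should be pointed out that ingress controllers are in fact part of Kubernetes. An Ingress is the entry point for reaching the cluster from outside, forwarding user URL requests to different services. It plays the role of a load-balancing reverse proxy such as nginx or apache, and it also carries the rule definitions, that is, the URL routing information. Refreshing that routing information is the job of the Ingress controller.

An Ingress Controller can essentially be thought of as a watcher. By continuously talking to the Kubernetes API, it learns in real time about changes to backend services and pods, such as pods or services being added or removed. When it receives such a change, it combines it with the Ingress described below to generate configuration, then updates the reverse-proxy load balancer and reloads its configuration, which is how service discovery is achieved.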
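
The watch, regenerate, and reload cycle can be illustrated with a toy model. This is not how Traefik is implemented; it is only a sketch in which a hard-coded event list stands in for the Kubernetes API watch, and every name in it (`Controller`, `build_routes`, the hosts and addresses) is made up for the example.

```python
def build_routes(ingresses, endpoints):
    """Map each Ingress host to the pod addresses behind its backend service."""
    routes = {}
    for ing in ingresses.values():
        routes[ing["host"]] = sorted(endpoints.get(ing["service"], []))
    return routes


class Controller:
    """Toy controller: applies watch events, then regenerates the proxy config."""

    def __init__(self):
        self.ingresses = {}  # ingress name -> {"host": ..., "service": ...}
        self.endpoints = {}  # service name -> set of pod addresses
        self.routes = {}     # the "proxy configuration" currently loaded
        self.reloads = 0

    def handle(self, event):
        kind, action, obj = event
        store = self.ingresses if kind == "ingress" else self.endpoints
        if action == "deleted":
            store.pop(obj["name"], None)
        elif kind == "ingress":
            store[obj["name"]] = {"host": obj["host"], "service": obj["service"]}
        else:
            store[obj["name"]] = set(obj["addresses"])
        new_routes = build_routes(self.ingresses, self.endpoints)
        if new_routes != self.routes:  # reload only when something changed
            self.routes = new_routes
            self.reloads += 1


if __name__ == "__main__":
    # Hypothetical event stream standing in for a watch on the Kubernetes API.
    events = [
        ("ingress", "added", {"name": "web", "host": "nginx.example.test", "service": "nginx"}),
        ("endpoints", "added", {"name": "nginx", "addresses": ["10.254.1.5:80"]}),
        ("endpoints", "modified", {"name": "nginx", "addresses": ["10.254.1.5:80", "10.254.2.7:80"]}),
    ]
    controller = Controller()
    for event in events:
        controller.handle(event)
    print(controller.routes, controller.reloads)
```

Comparing the regenerated routes with the loaded ones before reloading keeps the proxy from restarting on events that change nothing it cares about.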
Continue reading

HTTP authentication can be configured on a K8S ingress as follows:

1. Create an auth file with htpasswd:

htpasswd -c ./auth myusername
cat auth
myusername:$apr1$78Jyn/1K$ERHKVRPPlzAX8eBtLuvRZ0

2. Create a K8S secret:

kubectl create secret generic mysecret --from-file auth --namespace=monitoring 
kubectl --namespace=monitoring get secret mysecret 
NAME      TYPE    DATA    AGE 
mysecret Opaque   1      106d
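
For reference, `--from-file auth` stores the file under a key named after the file, with the content base64-encoded, which is why the listing shows DATA 1. The sketch below builds a roughly equivalent Secret object by hand; `secret_manifest` is a hypothetical helper of mine, not a kubectl or client-library function.

```python
import base64


def secret_manifest(name, namespace, filename, content):
    """Build a Secret similar to what `kubectl create secret generic --from-file` stores.

    The file name becomes the data key and the file content is base64-encoded.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {filename: base64.b64encode(content).decode("ascii")},
    }


if __name__ == "__main__":
    # The sample htpasswd line from step 1, as bytes read from the auth file.
    auth = b"myusername:$apr1$78Jyn/1K$ERHKVRPPlzAX8eBtLuvRZ0\n"
    manifest = secret_manifest("mysecret", "monitoring", "auth", auth)
    print(manifest["data"]["auth"])
```

Decoding the stored value gives back the original htpasswd line, which is a handy way to confirm the secret holds what the ingress controller expects.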

3. Associate the secret with the ingress through the following annotations:

  • ingress.kubernetes.io/auth-type: "basic"
  • ingress.kubernetes.io/auth-secret: "mysecret"

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
 name: prometheus-dashboard
 namespace: monitoring
 annotations:
   kubernetes.io/ingress.class: traefik
   ingress.kubernetes.io/auth-type: "basic"
   ingress.kubernetes.io/auth-secret: "mysecret"
spec:
 rules:
 - host: dashboard.prometheus.example.com
   http:
     paths:
     - backend:
         serviceName: prometheus
         servicePort: 9090

Finally, create the ingress:

kubectl create -f prometheus-ingress.yaml -n monitoring

References:

https://docs.traefik.io/user-guide/kubernetes/

https://docs.traefik.io/configuration/backends/kubernetes/

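Overall, it performs about like an ordinary hard disk. Each item was tested twice, and some items show large fluctuations.

At this level of performance the disk may be a problem for database use, in which case the only option is to spread the load across multiple DB servers. I don't know whether striping 4 data disks into RAID0 would improve IO. In theory it should, but I am testing on a trial machine I applied for and have no way to test RAID performance across multiple disks.

The test charts are as follows:

Continue reading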

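Configuring an FTP server on AWS EC2 has a few peculiarities:

1. In principle, an EC2 machine sits on an internal network and has only a private IP; the public IP is mapped onto the private IP.

2. There is a security group, which effectively acts as a firewall.

A typical vsftpd configuration file looks like this: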

anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_file=/var/log/vsftpd.log
xferlog_std_format=YES

use_localtime=YES
chroot_local_user=YES
chroot_list_enable=YES
chroot_list_file=/etc/vsftpd/chroot_list
listen=YES

pam_service_name=vsftpd
userlist_enable=YES
tcp_wrappers=YES
pasv_min_port=2000
pasv_max_port=3000
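
Because the security group acts as a firewall, it helps to read the relevant ports straight out of the configuration. The snippet below is a sketch under my own assumptions: the helper names are hypothetical, and treating the control port plus the passive range as the inbound ports to allow reflects how passive-mode FTP works in general rather than anything specific to this setup.

```python
def parse_vsftpd_conf(text):
    """Parse `key=value` lines, skipping blank lines and # comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key] = value
    return options


def inbound_ports(options):
    """Return (control_port, (pasv_min, pasv_max)) from parsed options."""
    control = int(options.get("listen_port", 21))  # vsftpd listens on 21 by default
    passive = (int(options["pasv_min_port"]), int(options["pasv_max_port"]))
    return control, passive


if __name__ == "__main__":
    sample = "listen=YES\n\npasv_min_port=2000\npasv_max_port=3000\n"
    print(inbound_ports(parse_vsftpd_conf(sample)))
```

With the configuration above this yields port 21 and the range 2000 to 3000.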
Continue reading