Adding custom monitoring items to Zabbix

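In a Zabbix monitoring system, the Zabbix Server and the Zabbix Agent usually work together to perform monitoring. The Zabbix Agent ships with many built-in basic items (see https://www.zabbix.com/documentation/2.0/manual/config/items/itemtypes/zabbix_agent). These built-in items cover basics such as CPU, file systems, network, and disks.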

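For monitoring services you have developed yourself, Zabbix provides a good framework for implementing monitoring and alerting. The rest of this post uses MySQL as an example to show how to add custom monitoring.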

Test environment

localhost

Roles: Zabbix Agent, Zabbix Server, MySQL, dashboard

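Step 1: Plan the monitoring

Before creating items, think through as clearly as possible what to monitor, how to monitor it, how the monitoring data will be stored and presented, and how alerts will be handled. Planning a full monitoring system requires a good understanding of Zabbix; here I only state the monitoring requirements.

  • Requirement 1: monitor the status of MySQL and raise an alert when the status becomes abnormal;
  • Requirement 2: monitor MySQL operations and present them in graphs;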

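Step 2: Extend the Agent with a custom monitoring script

Data collection between the Zabbix Server and the Agent works mainly by the Server actively asking the Agent for the value of a key; the Agent calls the function associated with that key to obtain the value and returns it to the Server. The Zabbix 2.0.3 Agent itself has no built-in MySQL monitoring (although the Server side ships a corresponding template), so we use Zabbix's User Parameters feature to add a monitoring script for MySQL.

For requirement 1, we use the mysqladmin tool. The command is: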

$ mysqladmin -uroot -ppasswd ping
mysqld is alive

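If MySQL is healthy, the command prints mysqld is alive; otherwise it reports that it cannot connect. A sentence like mysqld is alive is awkward for the server to interpret; it is better for the server to receive only 1 or 0, where 1 means the service is available and 0 means it is not. So we refine the command as follows: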

$ mysqladmin -uroot -ppasswd ping | grep -c alive
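The 1/0 mapping performed by grep -c alive can also be sketched in Python, for instance if the check later grows beyond a one-line shell pipeline. This is only a minimal sketch and not part of Zabbix or MySQL: the function names count_alive_lines and mysql_alive are hypothetical, and mysql_alive assumes that mysqladmin is on the PATH and accepts the same credentials as the command above.

```python
import subprocess


def count_alive_lines(output):
    """Mimic `grep -c alive`: count the lines that contain 'alive'."""
    return sum(1 for line in output.splitlines() if "alive" in line)


def mysql_alive(args=("mysqladmin", "-uroot", "-ppasswd", "ping")):
    """Hypothetical wrapper: run mysqladmin ping and return 1 (up) or 0 (down)."""
    try:
        result = subprocess.run(args, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                universal_newlines=True)
    except OSError:
        # mysqladmin itself could not be started; report "not available".
        return 0
    return 1 if count_alive_lines(result.stdout) > 0 else 0
```

Calling print(mysql_alive()) from a script would then emit the single digit that the server side expects.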

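Keeping the user name and password in the command makes later maintenance harder, so we create a configuration file named ".my.cnf" under /var/lib/zabbix that holds the MySQL user name and password: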

[client]
user=root
host=localhost
password=password


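Introduction to Zabbix

Zabbix is an open-source, enterprise-grade distributed monitoring project released under the GPL. Zabbix has no separate enterprise and community editions; the company behind Zabbix makes money mainly through technical support.

Zabbix implements the architecture shown in the figure below: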


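Introduction to OpenStack Ceilometer

When the Ceilometer project was created, its initial purpose was to provide a framework for collecting data for billing systems. During the development of the G (Grizzly) release, the community updated the goal: Ceilometer is now meant to become the single infrastructure for data collection (monitoring data and metering data) in OpenStack, with the collected data made available to monitoring, billing, dashboard, and other projects.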

Project Goal

For Grizzly, the new objective is: the project aims to become the infrastructure to collect measurements within OpenStack so that no two agents would need to be written to collect the same data. Its primary targets are monitoring and metering, but the framework should be easily expandable to collect for other needs. To that effect, Ceilometer should be able to share collected data with a variety of consumers.

In the 0.1 (folsom) release its goal was just to deliver a unique point of contact for billing systems to acquire all meters they need to establish customer billing, across all current and future OpenStack core components.

Wiki: https://wiki.openstack.org/wiki/Ceilometer
Code: https://github.com/openstack/ceilometer
Documentation: http://docs.openstack.org/developer/ceilometer/

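Community status

The Ceilometer project currently has 11,000+ lines of code and 16 contributors, 7 of whom have been active recently. The community roadmap is as follows: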

  • v1 delivered with Folsom with all functions required to collect base metering info and provide standard API access
  • v2 delivered with G as an incubated project with (subject to variation)
    End-User API access to own metering information
    Integration of information summary as a Horizon plugin
    New agents for other OpenStack components (Quantum Engines? Heat? etc…)
    Multi publisher to handle other usage for data collection
    Individual frequency per meter
  • Move to core for H

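Ceilometer architecture

Ceilometer consists mainly of the Agent, the Collector, the DataStore, the API, and a message queue.

Agent

The Agent's main job is to periodically poll the plugins it manages, triggering their queries; the plugins contain the actual data-collection logic. Ceilometer has two kinds of Agent: the Central Agent and the Compute Agent.
The Central Agent manages all plugins other than the Compute (Nova) ones, for example the Swift and Cinder plugins. These plugins call the APIs of the relevant services via RPC, fetch the data, and publish it to the message queue. Because the Central Agent acts as a central scheduler for data collection, only one instance needs to be deployed.
The Compute Agent collects data on the compute nodes, and one Compute Agent must be deployed on every compute node. It periodically collects Compute-related data and publishes it to the MQ.
Currently planned measurements: http://docs.openstack.org/developer/ceilometer/measurements.html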

Plugin

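Ceilometer's plugin framework is built on setuptools' Dynamic Discovery of Services and Plugins, which is the basis for Ceilometer's extensibility. Ceilometer has four types of plugin: Poller, Publisher, Notification, and Transformer.

  • A Poller is called by the Agent to query data and returns Counter-type results to the Agent framework;
  • A Notification plugin listens on the MQ for messages on the relevant topics (virtual machine creation and so on) and converts them into Counter-type results for the Agent framework;
  • A Transformer converts Counters (I have not yet found a concrete use of it in the code);
  • A Publisher converts the Counter-type results in the Agent framework into messages (including a signature) and sends them to the MQ;

The Agent's pipeline defines the data flow between these plugins. The Agent's plugin framework is like an assembly line, and each plugin is like a worker on that line.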
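The assembly-line idea can be sketched in a few lines of Python. This is an illustration of the data flow only, not Ceilometer's actual classes or API: Counter, CpuPoller, PercentTransformer, ListPublisher, and run_pipeline are all hypothetical names, the Counter fields are assumptions, and the publisher appends to a list instead of sending to a real message queue.

```python
from collections import namedtuple

# Illustrative stand-in for Ceilometer's Counter; the fields are assumptions.
Counter = namedtuple("Counter", ["name", "volume", "resource_id"])


class CpuPoller:
    """Hypothetical Poller: returns Counter samples when the agent asks."""

    def get_counters(self):
        return [Counter("cpu_util", 0.25, "instance-1"),
                Counter("cpu_util", 0.75, "instance-2")]


class PercentTransformer:
    """Hypothetical Transformer: converts a 0..1 ratio into a percentage."""

    def handle(self, counter):
        return counter._replace(volume=counter.volume * 100)


class ListPublisher:
    """Hypothetical Publisher: appends messages to a list instead of an MQ."""

    def __init__(self):
        self.sent = []

    def publish(self, counter):
        self.sent.append(dict(counter._asdict()))


def run_pipeline(pollers, transformers, publishers):
    """One polling cycle: poll, transform in order, then publish."""
    for poller in pollers:
        for counter in poller.get_counters():
            for transformer in transformers:
                counter = transformer.handle(counter)
            for publisher in publishers:
                publisher.publish(counter)
```

Each stage only needs to know the Counter shape, which is what lets new pollers or publishers be added without touching the others.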

Collector

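The Collector listens on the message queue and stores the messages (Meter Messages) published by the Publisher in the DataStore.

DataStore

Implemented with MongoDB.

API

Provides data to other projects, such as billing and the dashboard.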

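Installing Hadoop as a single-node simulated cluster

I have been tinkering with Hadoop in my spare time recently, and this post records the process. It follows this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Operating system: Ubuntu 12.04 x64

Software dependencies:

  1. jdk6;
    Hadoop officially recommends sun-jdk rather than open-jdk. Because of licensing issues, Ubuntu cannot install sun-jdk directly with apt-get, so this step is a bit of a hassle;
  2. ssh
  3. rsync

Install the dependencies: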

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Install sun-jdk-6:

  1. Download the matching version from the official site, http://www.oracle.com/technetwork/java/javase/downloads/jdk6u37-downloads-1859587.html, choosing x86 for 32-bit or x64 for 64-bit; I downloaded jdk-6u37-linux-x64.bin;
  2. sudo mkdir /usr/java
  3. cd /usr/java
  4. Copy the downloaded JDK into the /usr/java directory;
  5. Install:
    sudo chmod +x jdk-6u37-linux-x64.bin
    sudo ./jdk-6u37-linux-x64.bin
    When the installation finishes, a new jdk1.6.0_37 directory appears under /usr/java;
  6. Configure the environment variables by editing /etc/bash.bashrc and adding the following:
    JAVA_HOME=/usr/java/jdk1.6.0_37
    PATH=$PATH:$JAVA_HOME/bin
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
    export JAVA_HOME PATH CLASSPATH
  7. Apply the environment variables:
    source /etc/bash.bashrc
  8. Ubuntu's default JDK is open-jdk, so switch the alternatives:
    sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.6.0_37/bin/java 999
    sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.6.0_37/bin/javac 999
    sudo update-alternatives --install /usr/bin/javadoc javadoc /usr/java/jdk1.6.0_37/bin/javadoc 999
    sudo update-alternatives --config java (you will be asked to choose; pick the version just installed)
    sudo update-alternatives --config javac (same as above)
    ls -lh /etc/alternatives/java* to check the result

That completes the installation of Hadoop's dependencies. Next, install Hadoop.

Add a system user for Hadoop:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
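Follow the prompts to enter a password and the other details.
$ su - hduser
Switch to the hduser user for the steps below.

Install Hadoop:

  • Download the Hadoop release tarball from the official site into the /usr/local/ directory (the commands below assume hadoop-1.0.3.tar.gz; adjust the file name if you downloaded a different version, such as 1.0.4);
  • Install: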
    $ cd /usr/local
    $ sudo tar xzf hadoop-1.0.3.tar.gz
    $ sudo mv hadoop-1.0.3 hadoop
    $ sudo chown -R hduser:hadoop hadoop

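Configure SSH:

  • First switch to the hduser user and make sure hduser can ssh to localhost:
    su hduser
    ssh localhost
  • If it cannot, check whether the ~/.ssh directory and the ~/.ssh/id_rsa.pub file exist. If they do not, run:
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • If they do:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Once ssh localhost works, you are done;

Configure the environment variables:

Edit /home/hduser/.bashrc and append the following: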

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

After adding these lines, run the following command to apply the configuration:
$ source /home/hduser/.bashrc

Set the following in conf/hadoop-env.sh under the hadoop directory:

export JAVA_HOME=/usr/java/jdk1.6.0_37

Create a temporary directory for Hadoop:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 755 /app/hadoop/tmp

Add the following to conf/core-site.xml under the hadoop directory, placing the properties between the <configuration></configuration> tags:

<configuration>
    <!-- In: conf/core-site.xml -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
        <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation.  The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class.  The uri's authority is used to
            determine the host, port, etc. for a filesystem.
        </description>
    </property>
</configuration>

As in the previous step, edit conf/mapred-site.xml:

<configuration>
    <!-- In: conf/mapred-site.xml -->
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs
            at.  If "local", then jobs are run in-process as a single map
            and reduce task.
        </description>
    </property>
</configuration>

Edit conf/hdfs-site.xml:

<configuration>
    <!-- In: conf/hdfs-site.xml -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication.
            The actual number of replications can be specified when the file is created.
            The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>
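Since three XML files have now been edited by hand, it can help to read them back and confirm the values that Hadoop will see. The following is a minimal sketch using only Python's standard library; read_properties is a hypothetical helper, not a Hadoop tool, and the sample string simply repeats the dfs.replication property from above.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


HDFS_SITE = """
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
"""

print(read_properties(HDFS_SITE))
```

To check a real file, the same function can be given the text read from conf/core-site.xml, conf/mapred-site.xml, or conf/hdfs-site.xml.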

Running it:

Format HDFS:

$ hadoop namenode -format

Start Hadoop:

$ start-all.sh               # the command to stop Hadoop is stop-all.sh

Use the jps command to list the running Hadoop processes:

hduser@alex-sina:/usr/local/hadoop$ jps
15751 NameNode
16775 Jps
16046 DataNode
16669 TaskTracker
16399 JobTracker
16313 SecondaryNameNode
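
The jps listing above shows the five daemons of this single-node setup. A small Python sketch can check such a listing automatically; missing_daemons and EXPECTED_DAEMONS are hypothetical names introduced here, and the expected set is taken from the output above rather than from any Hadoop API.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "JobTracker", "TaskTracker"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return set(expected) - running
```

An empty result means every expected daemon was listed; a non-empty result names the ones to look for in the Hadoop logs.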

Hadoop's web management interface:

OK, that is the end of the installation notes. I will record more of my Hadoop learning later.

[Repost] What is object storage?

Object storage, also called object-based storage, is a generic term that describes an approach to addressing and manipulating discrete units of storage called objects.

Like files, objects contain data — but unlike files, objects are not organized in a hierarchy. Every object exists at the same level in a flat address space called a storage pool and one object cannot be placed inside another object.

Both files and objects have metadata associated with the data they contain, but objects are characterized by their extended metadata. Each object is assigned a unique identifier which allows a server or end user to retrieve the object without needing to know the physical location of the data. This approach is useful for automating and streamlining data storage in cloud computing environments.

Object storage is often compared to valet parking at an upscale restaurant. When a customer uses valet parking, he exchanges his car keys for a receipt. The customer does not know where his car will be parked or how many times an attendant might move the car while the customer is dining. In this analogy, a storage object’s unique identifier represents the customer’s receipt.
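
The flat address space and the receipt-like identifier described above can be illustrated with a toy in-memory pool in Python. This is only a sketch of the idea and does not model any real object store: ObjectPool and its methods are hypothetical names, and a dictionary stands in for the storage backend.

```python
import uuid


class ObjectPool:
    """Toy in-memory storage pool: a flat mapping from object ID to object."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        """Store data with its metadata and return the generated unique ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id):
        """Retrieve data by ID alone; the caller never sees a location."""
        return self._objects[object_id]["data"]

    def metadata(self, object_id):
        """Return a copy of the metadata stored with the object."""
        return dict(self._objects[object_id]["metadata"])
```

The ID returned by put plays the role of the valet receipt: it is all the caller needs to get the object back.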

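The images are gone

During a test, a colleague used my access key and secret key and wiped the storage behind my blog. I had modified swift so that, in theory, the data could be recovered, but python-swiftclient unexpectedly deleted the container and the account as well, so there is no hope of recovery. I need to change the deletion logic.

I have not blogged for a long time. I should keep at it and record what I learn.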

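Pitfalls when using Python dictionaries

For a dictionary d, there are two ways to get a value directly by key:

  1. value = d[key]
  2. value = d.get(key)

Note that in Python the first form raises a KeyError if the key is not in the dictionary, whereas the second form returns None if the key is missing.

The second form can be extended to value = d.get(key, default_value), which returns default_value when the key is not in the dictionary (the dictionary itself is not modified). For example:
d = {
    '1': 1,
    '2': None,
}
print(d['1'])         # prints 1
print(d.get('1'))     # prints 1
print(d['3'])         # raises KeyError
print(d.get('3'))     # prints None
print(d.get('3', 1))  # prints 1

I wrongly assumed that d.get('2', 1) would print 1; it actually prints None, because the key exists.
I had naively thought that d.get(key, 1) checks whether the value stored under key is empty and substitutes 1 if it is.

Finally, to check both whether the key exists and whether d[key] is empty, substituting a given value in either case, write:
value = d.get('3') or 1
Note that or also replaces any other falsy value, such as 0 or an empty string.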
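
One caveat of the or idiom above is that it treats every falsy value (0, an empty string, an empty list) as missing. If only a missing key or a None value should trigger the fallback, a small helper makes that explicit. This is a sketch; get_or_default and the settings dictionary are names introduced here for illustration.

```python
def get_or_default(mapping, key, default):
    """Return mapping[key] unless the key is missing or its value is None."""
    value = mapping.get(key)
    return default if value is None else value


settings = {"1": 1, "2": None, "zero": 0}
print(get_or_default(settings, "2", 1))     # key exists but value is None
print(get_or_default(settings, "3", 1))     # key is missing
print(get_or_default(settings, "zero", 1))  # 0 is kept, unlike `get(...) or 1`
```

Which behaviour is right depends on whether 0 or an empty string is a meaningful value in the dictionary being read.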


Improving the eventual-consistency process in Swift

As a storage service similar to Amazon S3, Swift plays an important role in OpenStack. Based on the Dynamo architecture, Swift scales out well and is easy to operate, so we want to use it as a web-scale storage service. After testing, however, we found a problem: Swift's eventual-consistency process is too simple and inefficient.

A video of OpenStack's development history (reposted from YouTube)


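Consistency guarantees in Swift

Swift's consistency model is configurable

Before looking at consistency in Swift, it helps to review the basic concepts of consistency in this blog post by Amazon CTO Werner Vogels:
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Data consistency in a distributed system can be viewed from two perspectives: the client side and the server side. This post focuses on server-side data consistency in Swift.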

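Quorum protocol:

  • N is the number of replicas of an object
  • W is the number of nodes that must succeed for a write
  • R is the number of nodes that must succeed for a read
  • If R+W>N, every read overlaps the latest write on at least one replica, which gives strong consistency

In distributed storage systems N is usually 3, which provides data redundancy and high availability while keeping the consistency overhead and storage cost down. In Swift, N is usually set to 3 and W to 2, while R is selectable.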
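
The quorum rule is simple arithmetic, so it is easy to check which read settings give strong consistency for N=3 and W=2. The Python sketch below only evaluates R+W>N; is_strongly_consistent is a hypothetical helper and not a Swift API or configuration option.

```python
def is_strongly_consistent(n, w, r):
    """Quorum rule: reads and writes overlap when R + W > N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return r + w > n


for r in (1, 2, 3):
    print("N=3 W=2 R=%d ->" % r, is_strongly_consistent(3, 2, r))
```

With N=3 and W=2, reading from a single replica (R=1) does not satisfy the rule, while R=2 or R=3 does.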
