1.1. Judge

Judge is the component that evaluates alarms. The agent pushes data to Transfer, which forwards it both to the Graph component for plotting and to Judge, which decides whether an alarm should be triggered.

1.2. Design intention

A monitoring system handles far more data than a single machine can process, so the data must be sharded. Transfer shards the data with consistent hashing, so each Judge instance only has to handle a fraction of it. For this reason, alarm judgment is not done in Transfer, the component that receives the data directly, but in a separate module behind Transfer.
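To make the sharding idea concrete, here is a minimal consistent-hashing sketch. It is not Transfer's actual implementation: the instance names, the key format, the number of virtual nodes, and the SHA-1 hash are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the owner of `key`: the first ring point clockwise from its hash."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


judges = ["judge-00", "judge-01", "judge-02"]  # hypothetical instance names
ring = HashRing(judges)

keys = [f"host{i:02d}/cpu.idle" for i in range(1000)]  # hypothetical keys
owner = {k: ring.node_for(k) for k in keys}

# Adding a fourth instance only moves the keys that now belong to the new node.
bigger = HashRing(judges + ["judge-03"])
moved = [k for k in keys if bigger.node_for(k) != owner[k]]
```

The point of the design is visible in `moved`: when an instance is added, every key that changes owner moves to the new instance, and the rest stay where they were.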

1.3. Source code installation

cd $GOPATH/src/github.com/open-falcon/judge
go get ./...
./control build
./control pack

The last step produces a tar.gz package, which we can use for deployment.

1.4. Deployment instructions

Judge listens on an HTTP port and provides an HTTP interface, /count, which reports how much data the current Judge instance is handling. We recommend that one Judge instance handle 500,000 to 1,000,000 data items with 5 GB to 10 GB of memory. If the physical machine has a lot of memory, e.g. 128 GB, several Judge instances can be deployed on the same machine.
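The sizing guideline above can be turned into a rough estimate. The sketch below pairs the low end of the item range with the high end of the memory range, which is a deliberately conservative assumption of ours, and it ignores memory needed by the operating system and other processes. The function name and the sample figures are hypothetical.

```python
import math

ITEMS_PER_INSTANCE = 500_000  # assumed: low end of the recommended range
MEM_PER_INSTANCE_GB = 10      # assumed: high end of the recommended range


def plan_judges(total_items, host_mem_gb):
    """Return (instances needed, instances per host, hosts needed)."""
    per_host = host_mem_gb // MEM_PER_INSTANCE_GB
    if per_host < 1:
        raise ValueError("host has less memory than one Judge instance needs")
    instances = max(1, math.ceil(total_items / ITEMS_PER_INSTANCE))
    hosts = math.ceil(instances / per_host)
    return instances, per_host, hosts


print(plan_judges(3_200_000, 128))  # (7, 12, 1)
print(plan_judges(10_000_000, 32))  # (20, 3, 7)
```

After deployment, the /count interface of each instance shows whether the real load matches the estimate.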

1.5. Configuration instructions

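The configuration file must be named cfg.json. We can create it by copying cfg.example.json and modifying it. The # comments in the listing below are annotations for this document only. Standard JSON does not allow comments, so leave them out of the real cfg.json.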

{
    "debug": true,
    "debugHost": "nil",
    "remain": 11,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6081"
    },
    "rpc": {
        "enabled": true,
        "listen": "0.0.0.0:6080"
    },
    "hbs": {
        "servers": ["127.0.0.1:6030"], # HBS should ideally sit behind an LVS VIP, so we recommend configuring vip:port here
        "timeout": 300,
        "interval": 60
    },
    "alarm": {
        "enabled": true,
        "minInterval": 300, # Minimum number of seconds between two consecutive alarms. Keep the default.
        "queuePattern": "event:p%v",
        "redis": {
            "dsn": "127.0.0.1:6379", # Use the same Redis instance as the "alarm" and "sender" components
            "maxIdle": 5,
            "connTimeout": 5000,
            "readTimeout": 5000,
            "writeTimeout": 5000
        }
    }
}
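As a reading aid for the "alarm" section, the sketch below shows one plausible interpretation of "queuePattern" and "minInterval". It assumes that %v is replaced by the event priority to form the Redis queue name, and that an alarm is suppressed if the previous one for the same event was sent less than "minInterval" seconds ago. Both behaviors are assumptions inferred from the field names and the comment above, and the helper names are hypothetical.

```python
QUEUE_PATTERN = "event:p%v"  # value from the sample configuration
MIN_INTERVAL = 300           # seconds, value from the sample configuration


def queue_name(pattern, priority):
    """Expand the pattern the way Go's %v verb would print an integer."""
    return pattern.replace("%v", str(priority))


def should_send(now, last_sent, min_interval=MIN_INTERVAL):
    """Allow an alarm only if none was sent within the last min_interval seconds."""
    return last_sent is None or now - last_sent >= min_interval


print(queue_name(QUEUE_PATTERN, 0))  # event:p0
print(should_send(1000, 800))        # False
print(should_send(1100, 800))        # True
```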

Let's explain the "remain" setting in detail. "remain" specifies how many recent data points Judge keeps in memory for each counter, for example, the maximum number of cpu.idle values of the host01 machine that are held in memory. When configuring an alarm function such as all(#3), the number after # must not exceed "remain" - 1. The default setting is usually sufficient.
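The relationship between "remain" and the number after # can be illustrated with a bounded history buffer. This is a simplified model, not Judge's actual data structure: the class name, the method name, and the "less than" comparison are assumptions for illustration.

```python
from collections import deque

REMAIN = 11  # value from the sample configuration


class History:
    """Keeps at most `remain` recent points for one counter (illustrative)."""

    def __init__(self, remain=REMAIN):
        self.remain = remain
        self.points = deque(maxlen=remain)  # the oldest point is dropped automatically

    def push(self, value):
        self.points.append(value)

    def all_below(self, n, threshold):
        """Model a rule like all(#n) < threshold over the n newest points."""
        if n > self.remain - 1:
            raise ValueError(f"#{n} exceeds remain - 1 = {self.remain - 1}")
        if len(self.points) < n:
            return False  # not enough data yet to decide
        newest = list(self.points)[-n:]
        return all(v < threshold for v in newest)


h = History()
for v in range(20):
    h.push(v)  # only the 11 newest values (9..19) are kept

print(len(h.points))       # 11
print(h.all_below(3, 20))  # True: 17, 18, 19 are all below 20
print(h.all_below(3, 19))  # False: 19 is not below 19
```

With "remain" set to 11, a rule such as all(#10) is the largest window this model accepts, and all(#11) is rejected.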

1.6. Process management

We provide a control script for common operations.

./control start      Start the process
./control stop       Stop the process
./control restart    Restart the process
./control status     Show the process status
./control tail       Follow var/app.log with "tail -f"

1.7. Verification

Visit the /health interface to verify that Judge is working properly.

curl 127.0.0.1:6081/health

Alternatively, check the Judge log in the var directory.
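The /health check can also be scripted, for example from a deployment job. The sketch below only tests for an HTTP 200 status, because the response body is not described here. The function name and the timeout are our own choices, and the default address is taken from the sample configuration.

```python
import urllib.request


def judge_is_healthy(base_url="http://127.0.0.1:6081", timeout=3.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and urllib's URLError/HTTPError,
        # which are all subclasses of OSError.
        return False
```

A caller could run `judge_is_healthy()` after `./control start` and fall back to inspecting the log if it returns False.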

1.8. Video course

We recorded a video for the Judge module that walks through it at the source-code level: http://www.jikexueyuan.com/course/1850.html

Copyright 2015 - 2018 Xiaomi Inc. All rights reserved. Powered by Gitbook. File last revised: 2022-05-30 16:56:29