elasticsearch

Introduction

An Elasticsearch client.

Component interface

Client.gs

Function prototype	Description
map cluster_info(map opts = nil)	Get cluster info.
mixed index(string index, mixed body = nil, map opts = nil)	Index a document.
map bulk(string index = nil, mixed body = nil, map opts = nil)	Performs multiple indexing or delete operations in a single API call.
map search(string index = nil, map body = nil, map opts = nil)	Returns search hits that match the query defined in the request.
mixed get(string index, string id, map opts = nil)	Retrieves the document with the specified ID from an index.
mixed indices_create(string index, mixed body = nil, map opts = nil)	Creates a new index.
mixed indices_delete(string index, map opts = nil)	Deletes an index.
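The `bulk` function sends several operations in one API call. For reference, the Elasticsearch Bulk API (`POST /_bulk`) expects a newline-delimited JSON (NDJSON) body: each action line (such as `{"index": {...}}`) is followed by its document source, and the body must end with a newline. The Python sketch below shows that serialization for an `operations` array shaped like the one built in the example below (alternating action and document entries). It is an illustration that assumes, without confirmation from this page, that Client.gs serializes `operations` this way; `to_ndjson` and `index_operations` are hypothetical helper names.

```python
import json

def to_ndjson(operations):
    """Serialize a flat list of bulk operations into an NDJSON request body.

    Each element (action metadata or document source) becomes one JSON line,
    and the Bulk API requires the body to end with a trailing newline.
    """
    lines = [json.dumps(op, separators=(",", ":")) for op in operations]
    return "\n".join(lines) + "\n"

def index_operations(index, documents):
    """Build alternating action/source entries, like the loop in the example."""
    ops = []
    for doc in documents:
        ops.append({"index": {"_index": index}})
        ops.append(doc)
    return ops

docs = [{"title": "Work From Home Policy", "created_on": "2023-11-02"}]
body = to_ndjson(index_operations("my_documents", docs))
print(body, end="")
```

Sending many documents in one such body is what lets bulk avoid the per-request overhead paid by single-document `index` calls.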

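Insert performance test conclusions:

1. With the size of each document held constant, bulk insertion took 0.26 ms to 0.32 ms on average, while single-document insertion took 65 ms to 72 ms on average, so bulk insertion is clearly faster than inserting documents one at a time.
2. With the size of each document held constant, neither insertion method's time was noticeably affected by the order of magnitude of the document count.
3. For bulk operations at a fixed order of magnitude, the insert time grows almost linearly with the size of each document. For example, at the 1K-document scale: an average document size of 2.86 KB took 0.322 ms, 5.5 KB took 0.556 ms, 7.9 KB took 0.748 ms, and 10.6 KB took 0.979 ms.
4. This suggests that the time of a bulk operation is positively correlated with the total volume of data it processes.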
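Point 3 can be checked with an ordinary least-squares fit over the four 1K-scale measurements listed above. The Python sketch below uses only the numbers reported here; it estimates the extra time per KB of document size and the coefficient of determination R², where a value close to 1 supports the near-linear relationship. The function name `least_squares` is illustrative.

```python
# Bulk insert measurements at the 1K-document scale, taken from point 3 above.
sizes_kb = [2.86, 5.5, 7.9, 10.6]        # average document size (KB)
times_ms = [0.322, 0.556, 0.748, 0.979]  # reported insert time (ms)

def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) of the best-fit line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

slope, intercept, r_squared = least_squares(sizes_kb, times_ms)
print(f"slope = {slope:.4f} ms/KB, intercept = {intercept:.4f} ms, R^2 = {r_squared:.4f}")
```

On these four points the fit gives roughly 0.084 ms per additional KB with R² above 0.999, which is consistent with the near-linear trend described in point 3.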

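Search performance test conclusions:

At the data volumes used in this test, search time was almost unaffected by the number of documents currently stored in the index.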

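Delete performance test conclusions:

When data was deleted in batches with bulk, the order of magnitude of the document count had almost no effect on the delete time.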
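In the example, `find_match_id` returns the array that `document_bulk_delete` passes to the bulk call. Its exact format is not documented on this page; the Python sketch below assumes it follows the Elasticsearch Bulk API, where each delete is a single action line `{"delete": {"_index": ..., "_id": ...}}` with no document line after it. The `hits` input mimics the `hits.hits` array of a search response, and `delete_actions` is a hypothetical helper name.

```python
def delete_actions(hits):
    """Turn search hits into Bulk API delete actions.

    Unlike "index", a "delete" action has no document line after it,
    so each hit contributes exactly one entry.
    """
    return [{"delete": {"_index": hit["_index"], "_id": hit["_id"]}} for hit in hits]

# Shape of response["hits"]["hits"] from a search (the IDs are placeholders).
hits = [
    {"_index": "my_documents", "_id": "id-1", "_source": {"title": "a"}},
    {"_index": "my_documents", "_id": "id-2", "_source": {"title": "b"}},
]
actions = delete_actions(hits)
print(actions)
```

Note that a search returns only 10 hits unless `size` is raised or results are paged, so collecting every matching ID may take several requests. Elasticsearch also offers `POST /<index>/_delete_by_query`, which deletes matching documents on the server without fetching their IDs first.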

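Multi-coroutine environment test conclusions:

Elasticsearch performs well under a multi-coroutine workload and uses CPU, memory, and disk resources effectively. It makes good use of the CPU, but the number of coroutines should be sized to the hardware, in particular the number of CPU cores. Memory has not yet become a major bottleneck. Disk I/O improves significantly with multiple coroutines, although under high concurrency the disk may become the bottleneck.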
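One way to size the number of coroutines to the CPU core count is to bound how many requests are in flight at once. The Python asyncio sketch below uses a semaphore sized by `os.cpu_count()`; `fake_request` is a hypothetical stand-in for a real Elasticsearch call, and one concurrent request per core is an assumed starting point to tune, not a measured optimum.

```python
import asyncio
import os

async def run_bounded(factories, limit):
    """Run coroutine factories with at most `limit` of them active at once."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(factory):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)  # track the highest observed concurrency
            try:
                return await factory()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(f) for f in factories))
    return results, peak

async def fake_request(i):
    """Hypothetical stand-in for one search or bulk request."""
    await asyncio.sleep(0.01)
    return i

limit = os.cpu_count() or 1
factories = [lambda i=i: fake_request(i) for i in range(4 * limit)]
results, peak = asyncio.run(run_bounded(factories, limit))
print(f"{len(results)} requests, at most {peak} concurrent (limit {limit})")
```

Raising `limit` beyond the core count can help when requests mostly wait on I/O, but, as noted above, the disk may then become the bottleneck.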

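CPU:

CPU utilization stayed within a reasonable range, with no obvious performance bottleneck.
If data volume or query load grows, CPU usage should be monitored further.

Memory:

Memory usage stayed within a reasonable range, with no obvious memory leak or shortage.
If data volume grows, the JVM heap size can be increased accordingly (it is recommended to keep it at no more than 50% of physical memory).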
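The 50% guideline can be turned into concrete JVM options. Elasticsearch reads its heap size from `-Xms`/`-Xmx` options (custom options can go in a file under `config/jvm.options.d/`), and Elastic recommends setting both to the same value. The Python sketch below derives that value from the physical memory size; `heap_options` is a hypothetical helper, and the result is rounded down to whole megabytes.

```python
def heap_options(physical_mem_gb, fraction=0.5):
    """Return JVM heap options using at most `fraction` of physical memory.

    -Xms and -Xmx are set to the same value so the heap is not resized at runtime.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction should be in (0, 0.5] to follow the 50% guideline")
    heap_mb = int(physical_mem_gb * 1024 * fraction)  # round down to whole MB
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]

# For example, on a host with 16 GB of RAM:
for line in heap_options(16):
    print(line)
```

The printed lines could then be placed in an options file under `config/jvm.options.d/` before restarting the node.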

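Disk:

Disk read speed, write speed, and utilization stayed within a reasonable range, with no obvious disk bottleneck.
If data volume grows, disk I/O performance should be monitored further, especially write speed and disk queue length.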

Example

public void sample()
{
    /**
     * 1. Run an Elasticsearch instance locally:
     *
     * ```
     * curl -fsSL https://elastic.co/start-local | sh
     * ```
     *
     * 2. Set the ES_LOCAL_PASSWORD environment variable (after Elasticsearch is deployed,
     *    the password of the elastic admin account is shown in the console):
     *
     * ```
     * export ES_LOCAL_PASSWORD=<elastic_password>
     * ```
     */

    // Connection password: replace the placeholder with the value of ES_LOCAL_PASSWORD
    // (the start-local script also saves it in elastic-start-local/.env)
    string elastic_password = "<elastic_password>";
    assert(
        elastic_password != nil,
        "Deploy Elasticsearch and execute command: `source elastic-start-local/.env`"
    );

    object client = elasticsearch.Elasticsearch(
        "https://127.0.0.1:9200",
        sprintf("elastic:%s", elastic_password)
    );

    // Get cluster information
    map opt = {"error_trace": true, "human": true, "pretty": true};
    Options opts = Options.new(opt);
    map response = client.cluster_info(opts);
    printf("client.cluster_info(): %O\n", response);

    // Recreate the index: if it already exists, delete it first and then create it
    // opt = {"ignore_unavailable": true };
    // OptionsIndexDelete optsID = OptionsIndexDelete.new( opt );
    // client.indices_delete(optsID,"my_documents");

    // opt = nil;
    // OptionsIndicesCreate optsIC = OptionsIndicesCreate.new( opt );
    // response = client.indices_create(optsIC, "my_documents");

    // Get the number of documents stored in the target index
    printf("Document's number is %d\n", client.get_index_count("my_documents"));

    // Measure the time taken by a single-document insert
    OptionsIndex optsIdx = OptionsIndex.new(opt);
    int t = time.time_ms();
    response = client.index(
        optsIdx,
        "my_documents",
        {
            "title": "Work From Home Policy",
            "contents": "The purpose of this full-time work-from-home policy is...",
            "created_on": "2023-11-02"
        }
    );
    t = time.time_ms() - t;
    printf("index a document used %dms\n", t);

    // Delete the document by its ID
    OptionsDeleteById optsDid = OptionsDeleteById.new(opt);
    printf(
        "delete result is %d\n",
        client.document_delete(optsDid, "my_documents", response["_id"])
    );

    // Measure bulk insert time; documents average about 7.9 KB each
    array resultTime = [];
    int totalTime = 0;
    for (int i = 0; i < 5; i++)
    {
        // Get randomly generated documents (get_data is a helper not shown on this page)
        array documents = get_data(i + 2300000);
        array operations = [];
        for (map document : documents)
        {
            operations.push_back(
                {
                    "index": {"_index": "my_documents"}
                }
            );
            operations.push_back(document);
        }
        // Insert all documents in a single call
        opt = nil;
        OptionsBulk optsBl = OptionsBulk.new(opt);
        int t1 = time.time_ms();
        response = client.bulk(optsBl, nil, operations);
        t1 = time.time_ms() - t1;
        resultTime.push_back(t1);
        totalTime += t1;
    }
    for (int tl : resultTime)
    {
        printf("bulk insert used %dms\n", tl);
    }
    printf("bulk insert avg time is %dms\n", totalTime / 5);

    // Bulk insert the documents read from a sample data file
    array documents = json.parse((string)file.read_all("/sample/data.json", "b"));
    array operations = [];
    for (map document : documents)
    {
        operations.push_back(
            {
                "index": {"_index": "my_documents"}
            }
        );
        operations.push_back(document);
    }
    // Insert all documents in a single call
    opt = nil;
    OptionsBulk optsBl = OptionsBulk.new(opt);
    response = client.bulk(optsBl, nil, operations);

    // Wait for Elasticsearch to refresh the index so the new documents become searchable
    coroutine.sleep(2.0);

    /************************************************************************************************************** */
    // Run a bool query test
    opt = nil;
    OptionsSearch optsSr1 = OptionsSearch.new(opt);
    BoolQuery BQ1 = BoolQuery.new();
    BQ1.add_must(
        {
            "match": {"content": "Effective"}
        }
    );
    // BQ1.add_range_must("created_on", "2025-01-06", "2027-02-05");
    // BQ1.add_range_must("updated_at", "2025-01-01", "2027-01-01");
    response = client.search(optsSr1, "my_documents", BQ1.body);
    // response.hits.total.value is capped at 10000 by default. If more documents match, total.value shows 10000 and total.relation is "gte" (the actual count is at least 10000).
    printf("search result number is %O\n", response.hits.total.value);

    array results = response.hits.hits ?? []; // Get the search hits
    if (results.length() > 0)
    {
        // Get a document by ID
        opt = nil;
        OptionsGet optGet = OptionsGet.new(opt);
        response = client.get(optGet, "my_documents", results[0]._id);
        // printf("response is %O\n", response);
    }

    // Measure search time
    opt = nil;
    OptionsSearch optsSr2 = OptionsSearch.new(opt);
    resultTime = [];
    totalTime = 0;
    int count = 0;
    for (int z = 1; z < 10; z++)
    {
        for (int j = 1; j < 10; j++)
        {
            for (int i = 1; i < 10; i++)
            {
                string date = sprintf("202%d-0%d-1%d", z, i, j);
                string date2 = sprintf("202%d-0%d-1%d", z, i, j);
                map searchCondition = {"created_on": date, "updated_at": date2};
                BoolQuery BQ2 = BoolQuery.new();
                BQ2.add_match_must(searchCondition);
                int st = time.time_ms();
                response = client.search(optsSr2, "my_documents", BQ2.body);
                st = time.time_ms() - st;
                totalTime += st;
                resultTime.push_back(st);
                count++;
            }
        }
    }
    printf("avg time is %fms\n", (float)totalTime / count);

    // Measure the time taken by the delete operation
    opt = nil;
    OptionsBulk optsBlD = OptionsBulk.new(opt);
    map searchOpts = nil;
    OptionsSearch optsSr3 = OptionsSearch.new(searchOpts);
    // Query condition: created_on between 2025-01-06 and 2027-02-05, and updated_at between 2025-01-01 and 2027-01-01
    BoolQuery BQ3 = BoolQuery.new();
    BQ3.add_range_must("created_on", "2025-01-06", "2027-02-05");
    BQ3.add_range_must("updated_at", "2025-01-01", "2027-01-01");
    int t2 = time.time_ms();
    // Returns the array of operations needed by the bulk call
    array temp = client.find_match_id(optsSr3, "my_documents", BQ3.body);
    // Perform the bulk delete
    client.document_bulk_delete(optsBlD, nil, temp);
    t2 = time.time_ms() - t2;
    printf("bulk_delete used %dms\n", t2);
}
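`BoolQuery` is a helper from this component, and the body it generates is not shown here. For comparison, the Python sketch below builds the standard Elasticsearch Query DSL body for the deletion query in the example: two `range` clauses inside `bool.must`. It can also set `track_total_hits` to `true`, which makes `hits.total.value` exact instead of stopping at 10000. `range_must` and `bool_query` are hypothetical helper names, and the assumption that `add_range_must` uses inclusive `gte`/`lte` bounds is not confirmed by this page.

```python
import json

def range_must(field, start, end):
    """A range clause with inclusive bounds (assumed to match add_range_must)."""
    return {"range": {field: {"gte": start, "lte": end}}}

def bool_query(must_clauses, exact_total=False):
    """Wrap clauses in bool.must; optionally request an exact hit count."""
    body = {"query": {"bool": {"must": list(must_clauses)}}}
    if exact_total:
        # Without this, hits.total stops counting at 10000 (relation "gte").
        body["track_total_hits"] = True
    return body

body = bool_query(
    [
        range_must("created_on", "2025-01-06", "2027-02-05"),
        range_must("updated_at", "2025-01-01", "2027-01-01"),
    ],
    exact_total=True,
)
print(json.dumps(body, indent=2))
```

Because both clauses sit under `must`, a document matches only if both its `created_on` and `updated_at` fall inside their ranges.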
