Elasticsearch Cluster Configuration Problem: Nodes Fail to Form a Cluster

System environment

  1. Elasticsearch version: 7.6.2, installed via rpm
  2. OS version: CentOS Linux release 8.1.1911 (Core)

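Last time we built a cluster in a development environment, which was very simple (see "Elasticsearch Basics: Quickly Setting Up a Development Cluster (7.6.2)"). For production, the officially recommended installation method is the rpm package, so here are a few notes on the pitfalls of that approach.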

Problem: the nodes fail to form a cluster

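Goal: configure a cluster across two separate servers. In a real production environment at least three nodes are recommended, so that a master can still be elected when one node is down.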

The IPs are 192.168.50.21 and 192.168.50.108. The configuration file on both machines looks roughly like this:

cluster.name: elasticsearch-cluster-seraph
node.name: es-node-1 # unique per machine: es-node-2 on the second server, and so on
node.master: true
node.data: true
path.data: /home/elasticsearch/data
path.logs: /home/elasticsearch/logs
path.plugins: /home/elasticsearch/plugins
# Directory for snapshot backup repositories
path.repo: /home/elasticsearch/repo
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200

discovery.seed_hosts: ["192.168.50.21", "192.168.50.108"]
cluster.initial_master_nodes: ["es-node-1","es-node-2"]

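After both nodes start successfully, each returns the following. "cluster_uuid" : "_na_" means each node is running on its own and no cluster has been formed.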

{
  "name" : "es-node-2",
  "cluster_name" : "elasticsearch-cluster-seraph",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

The log shows the following:

[2020-05-05T10:27:28,367][INFO ][o.e.c.c.JoinHelper       ] [es-node-2] failed to join {es-node-1}{wl-Ksj5BQ8OzLLWU5miEPg}{QpHvnUNjSbuazmu36pK0Iw}{192.168.50.21}{192.168.50.21:9300}{dilm}{ml.machine_memory=6582480896, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={es-node-2}{oDNuTOVkRyaBN-NRXUFIJA}{zI0THVLuQLSe0NBSlwMrfA}{172.18.0.1}{172.18.0.1:9300}{dilm}{ml.machine_memory=5703053312, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [es-node-1][192.168.50.21:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [es-node-2][172.18.0.1:9300] connect_exception
	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:989) ~[elasticsearch-7.6.2.jar:7.6.2]
	at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:162) ~[elasticsearch-7.6.2.jar:7.6.2]
	at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.6.2.jar:7.6.2]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
	at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.6.2.jar:7.6.2]
	at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[transport-netty4-client-7.6.2.jar:7.6.2]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:150) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: java.io.IOException: connection timed out: 172.18.0.1/172.18.0.1:9300
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:150) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510) ~[netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:830) ~[?:?]

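The log points to the cause. We configured network.host: 0.0.0.0, and Docker is installed on the hosts by default, so each host has several IP addresses. es-node-2 announced itself as 172.18.0.1:9300, an address on a Docker network interface that the other machine cannot reach, so es-node-1 timed out when it tried to connect back. With a wildcard address, Elasticsearch chooses one of the local IPs for node-to-node communication, and it does not necessarily choose the one we want, which leaves the cluster's internal network broken.

The fix is to set network.host on each machine to that server's own IP address: 192.168.50.21 on one and 192.168.50.108 on the other.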
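
As a minimal sketch of that change, only the network.host line differs from the configuration shown earlier; every other setting stays as it was. The values below are for the first server, with the second server's line shown as a comment.

```yaml
# elasticsearch.yml on 192.168.50.21 (es-node-1)
# Bind to and publish the LAN address only, instead of 0.0.0.0
network.host: 192.168.50.21

# On 192.168.50.108 (es-node-2) the same line would read:
# network.host: 192.168.50.108
```

After restarting, the "failed to join" log entry should list the 192.168.50.x address for the source node rather than 172.18.0.1.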

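Note: after changing the configuration, the data directory has to be emptied before the change takes effect. Then restart the service with systemctl restart elasticsearch, and the cluster works normally.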

Of course, a server that has only one IP address will not run into this problem.
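
For servers that do have several addresses, a possible alternative is to keep listening on every interface but advertise only the LAN address to other nodes. The sketch below is untested in this setup and assumes the network.bind_host and network.publish_host settings are used in place of the single network.host line (network.host sets both at once).

```yaml
# elasticsearch.yml on 192.168.50.108 (es-node-2)
# Accept connections on all local interfaces...
network.bind_host: 0.0.0.0
# ...but tell other nodes to reach this one at the LAN address
network.publish_host: 192.168.50.108
```

On es-node-1 the publish address would be 192.168.50.21 instead.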
