0

0

Oracle Grid Infrastructure: Understanding Split

php中文网

php中文网

发布时间:2016-06-07 15:37:31

|

1324人浏览过

|

来源于php中文网

原创

APPLIES TO: Oracle Database - Enterprise Edition - Version 11.2.0.1 and later Information in this document applies to any platform. PURPOSE The purpose of this note is to explain split-brain node evictions in Oracle Clusterware release 11.

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

the purpose of this note is to explain split-brain node evictions in oracle clusterware release 11.2

SCOPE

The intended audience of this note is Oracle Clusterware 11.2 administrators at any level of expertise. As written, this note applies only to 11.2.

DETAILS

Missed network heartbeat (NHB) evictions happen when ocssd of the surviving node loses contact with the evicted node over the interconnect. The nodes must be able to communicate over the interconnect to avoid a "split brain" situation. In the case of a "split brain" node eviction, one node aborted itself to avoid "split brain" when communication over the interconnect was compromised.

What does "split brain" mean?

"Split brain" means that there are 2 or more distinct sets of nodes, or "cohorts", with no communication between the two cohorts.

For example:
Suppose there are 4 nodes named A, B, C, D, in the following situation
* Nodes A,B can talk to each other; nodes C,D can talk to each other
* But A and B cannot talk to C or D, and vice versa
Then there are two cohorts: {A, B} and {C, D}.

Why is this a problem?

In a split-brain situation, there are in a sense two (or more) separate clusters working on the same shared storage. This has the potential for data corruption. So the split-brain must be resolved.

How does the clusterware resolve a "split brain" situation?

Oracle Clusterware handles the split-brain by terminating all the nodes in the SMALLER cohort.
If both of the cohorts are the same size, the cohort with the lowest numbered node in it survives.

The clusterware identifies the LARGEST cohort, and aborts all the nodes which do NOT belong to that cohort.

Identifying a split-brain eviction

In a split-brain node eviction, the following message is present in the ocssd log ($GRID_HOME/log//ocssd/ocssd.log) of the evicted node:

clssnmCheckDskInfo: Aborting local node to avoid splitbrain.

And earlier in the same log, within 10 minutes prior to "clssnmCheckDskInfo: Aborting local node" message:

clssnmPollingThread: node %s (%n) at % heartbeat fatal, removal in...

 

Img.Upscaler
Img.Upscaler

免费的AI图片放大工具

下载

Finding the cohort

The split-brain message in the ocssd.log will show "cohort" information. For example:

2012-12-28 20:26:25.803: [    CSSD][1111296320]clssnmCheckDskInfo: My cohort: 1
2012-12-28 20:26:25.803: [    CSSD][1111296320]clssnmCheckDskInfo: Surviving cohort: 2,3,4
2012-12-28 20:26:25.803: [    CSSD][1111296320](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 1, sprora01, is smaller than cohort of 3 nodes led by node 2, sprora02, based on map type 2

Understanding the cohort message

In a split-brain situation, ocssd on each node records on the voting disk the set of nodes it can communicate with. Each set is known as a "cohort". When there are two (or more) mutually non-intersecting sets, we have a "split-brain" situation. It means that there are two (or more) separate sets of nodes which cannot talk to each other over the interconnect. 

For example, in the above quote

My cohort: 1
Surviving cohort: 2,3,4


The meaning of these messages is

* "My cohort: 1" => The list of nodes I can communicate with: 1
* "Surviving cohort: 2,3,4" => From the voting disk, I know that nodes 2,3,4 can all communicate with each other.
* "Cohort of 1 nodes with leader 1, sprora01, is smaller than cohort of 3 nodes led by node 2, sprora02"
=> Oracle Clusterware has identified that the cohort {1} is smaller than the cohort {2,3,4}.

Oracle Clusterware handles the split-brain by terminating all the nodes in the SMALLER cohort. In this case, the smaller cohort is {1}. Therefore, ocssd on node {1} aborts the node.

Using the cohort message to identify interconnect network issues

The cohort message describes which nodes can communicate with each other.

Each cohort is a set of nodes that can talk to each other, and cannot talk to the nodes NOT in the cohort.

In the above example, the cohort message tells us that nodes {2,3,4} are all in communication; node 1 is not in communication with any of them.

 

Follow-up Action


The private network between node 1 and the other 3 nodes should be checked.

Please refer to the following note to check private interconnect network:  Document 1534949.1 - Oracle Grid Infrastructure: How to Troubleshoot Missed Network Heartbeat Evictions


===================================================

在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:
Oracle Grid Infrastructure: Understanding Split

Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。

而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。

NM服务通过OCR中的记录(OCR中记录了Interconnect的信息)来了解其所需要监听和交互的端点,将心跳信息通过网络发送到其他集群成员。同时它也监控来自所有其他集群成员的网络心跳Network heartbeat,每一秒钟都会发生这样的网络心跳,若某个节点的网络心跳在misscount(by the way:10.2.0.1中Linux上默认misscount为60s,其他平台为30s,若使用了第三方vendor clusterware则为600s,但10.2.0.1中未引入disktimeout;10.2.0.4以后misscount为60s,disktimeout为200s;11.2以后misscount为30s:CRS-4678: Successful get misscount 30 for Cluster Synchronization Services,CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services)指定的秒数中都没有被收到的话,该节点被认为已经”死亡”了。NM还负责当其他节点加入或离开集群时初始化集群的重置(Initiates cluster reconfiguration)。

在解决脑裂的场景中,NM还会监控voting disk以了解其他的竞争子集群(subclusters)。关于子集群我们有必要介绍一下,试想我们的环境中存在大量的节点,以Oracle官方构建过的128个节点的环境为我们的想象空间,当网络故障发生时存在多种的可能性,一种可能性是全局的网络失败,即128个节点中每个节点都不能互相发生网络心跳,此时会产生多达128个的信息”孤岛”子集群。另一种可能性是局部的网络失败,128个节点中被分成多个部分,每个部分中包含多于一个的节点,这些部分就可以被称作子集群(subclusters)。当出现网络故障时子集群内部的多个节点仍能互相通信传输投票信息(vote mesg),但子集群或者孤岛节点之间已经无法通过常规的Interconnect网络交流了,这个时候NM Reconfiguration就需要用到voting disk投票磁盘。

因为NM要使用voting disk来解决因为网络故障造成的通信障碍,所以需要保证voting disk在任意时刻都可以被正常访问。在正常状态下,每个节点都会进行磁盘心跳活动,具体来说就是会到投票磁盘的某个块上写入disk心跳信息,这种活动每一秒钟都会发生,同时CSS还会每秒读取一种称作”kill block”的”赐死块”,当”kill block”的内容表示本节点被驱逐出集群时,CSS会主动重启节点。

为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作CSS要求保证至少(N/2+1)个投票磁盘要被节点正常访问,这样就保证了每2个节点间总是至少有一个投票磁盘是它们都可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线voting disk多于无法访问的voting disk,该节点都能幸福地活下去,当无法访问的voting disk多于正常的voting disk时,Cluster Communication Service进程将失败并引起节点重启。所以有一种说法认为voting disk只要有2个足以保证冗余度就可以了,没有必要有3个或以上voting disk,这种说法是错误的。Oracle推荐集群中至少要有3个voting disks。


热门AI工具

更多
DeepSeek
DeepSeek

幻方量化公司旗下的开源大模型平台

豆包大模型
豆包大模型

字节跳动自主研发的一系列大型语言模型

通义千问
通义千问

阿里巴巴推出的全能AI助手

腾讯元宝
腾讯元宝

腾讯混元平台推出的AI助手

文心一言
文心一言

文心一言是百度开发的AI聊天机器人,通过对话可以生成各种形式的内容。

讯飞写作
讯飞写作

基于讯飞星火大模型的AI写作工具,可以快速生成新闻稿件、品宣文案、工作总结、心得体会等各种文文稿

即梦AI
即梦AI

一站式AI创作平台,免费AI图片和视频生成。

ChatGPT
ChatGPT

最最强大的AI聊天机器人程序,ChatGPT不单是聊天机器人,还能进行撰写邮件、视频脚本、文案、翻译、代码等任务。

相关专题

更多
go语言 注释编码
go语言 注释编码

本专题整合了go语言注释、注释规范等等内容,阅读专题下面的文章了解更多详细内容。

32

2026.01.31

go语言 math包
go语言 math包

本专题整合了go语言math包相关内容,阅读专题下面的文章了解更多详细内容。

23

2026.01.31

go语言输入函数
go语言输入函数

本专题整合了go语言输入相关教程内容,阅读专题下面的文章了解更多详细内容。

16

2026.01.31

golang 循环遍历
golang 循环遍历

本专题整合了golang循环遍历相关教程,阅读专题下面的文章了解更多详细内容。

5

2026.01.31

Golang人工智能合集
Golang人工智能合集

本专题整合了Golang人工智能相关内容,阅读专题下面的文章了解更多详细内容。

6

2026.01.31

2026赚钱平台入口大全
2026赚钱平台入口大全

2026年最新赚钱平台入口汇总,涵盖任务众包、内容创作、电商运营、技能变现等多类正规渠道,助你轻松开启副业增收之路。阅读专题下面的文章了解更多详细内容。

268

2026.01.31

高干文在线阅读网站大全
高干文在线阅读网站大全

汇集热门1v1高干文免费阅读资源,涵盖都市言情、京味大院、军旅高干等经典题材,情节紧凑、人物鲜明。阅读专题下面的文章了解更多详细内容。

195

2026.01.31

无需付费的漫画app大全
无需付费的漫画app大全

想找真正免费又无套路的漫画App?本合集精选多款永久免费、资源丰富、无广告干扰的优质漫画应用,涵盖国漫、日漫、韩漫及经典老番,满足各类阅读需求。阅读专题下面的文章了解更多详细内容。

170

2026.01.31

漫画免费在线观看地址大全
漫画免费在线观看地址大全

想找免费又资源丰富的漫画网站?本合集精选2025-2026年热门平台,涵盖国漫、日漫、韩漫等多类型作品,支持高清流畅阅读与离线缓存。阅读专题下面的文章了解更多详细内容。

85

2026.01.31

热门下载

更多
网站特效
/
网站源码
/
网站素材
/
前端模板

精品课程

更多
相关推荐
/
热门推荐
/
最新课程
SQL 教程
SQL 教程

共61课时 | 3.7万人学习

Java 教程
Java 教程

共578课时 | 54.6万人学习

oracle知识库
oracle知识库

共0课时 | 0人学习

关于我们 免责申明 举报中心 意见反馈 讲师合作 广告合作 最新更新
php中文网:公益在线php培训,帮助PHP学习者快速成长!
关注服务号 技术交流群
PHP中文网订阅号
每天精选资源文章推送

Copyright 2014-2026 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号