简介
本文档介绍如何在5G用户微服务基础设施(SMI)协议虚拟机(协议VM)中实施网络时间协议(NTP)时间同步问题的解决方法。
先决条件
要求
思科建议您了解以下主题:
- 思科SMI
- 5G云本地部署平台(CNDP)架构
- 云 — Red Hat OpenStack平台13(“Queens”版本)
- 多克斯和库伯内特斯
使用的组件
本文档中的信息基于以下软件和硬件版本:
- SMI 2020.01.1-21
- 库贝内特斯v1.16.2
- Red Hat OpenStack平台13导向器
本文档中的信息都是基于特定实验室环境中的设备编写的。本文档中使用的所有设备最初均采用原始(默认)配置。如果您的网络处于活动状态,请确保您了解所有命令的潜在影响。
背景信息
什么是SMI?
思科SMI是云技术和标准的分层堆栈,支持来自思科移动、电缆和宽带网络网关(BNG)业务单元的基于微服务的应用。这些应用具有类似的用户管理功能和类似的datastore要求。
Attributes:
- 层云堆栈(技术和标准)提供自上而下的部署并适应当前云基础设施。
- 所有应用共享通用执行环境(CEE),用于非应用功能(数据存储、部署、配置、遥测、报警),为所有客户触点和集成点提供一致的交互和体验。
- 应用和CEE部署在微服务容器中,并与智能服务网状网连接。
- 用于部署、配置和管理的外露API可实现自动化。
什么是CEE?
- CEE是一种软件解决方案,旨在监控部署在SMI上的移动和电缆应用。CEE以集中方式从应用中获取信息(关键指标),供工程师调试和故障排除。
- CEE是为所有应用安装的通用工具集。它配备了专用的运营中心,该中心提供命令行界面(CLI)和API来管理监控工具。每个集群只有一个CEE可用。
什么是协议VM?
协议VM托管负责协议转换的会话管理功能(SMF)应用微服务。每个SMF实例部署两个协议VM。协议VM具有GTPc、合法拦截(LI)、Radius、Rest等Pod。
问题
从显示问题的节点(协议VM)中,您可以看到网络尝试使用两个不同的接口将连接路由到NTP服务器。但是,只有一个网络可以到达IP。
在一段时间内运行不带接口的ping时,您会看到它会导致数据包丢失。
来自CEE的警报示例:
[pod-name-cnat/global] cee# show alerts active summary
NAME UID SEVERITY STARTS AT SOURCE SUMMARY
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
VoNR_Ded_Bearer_Creat be183f61dd65 major 01-18T16:32:17 System Success percentage of VoNR - PCF Initiated Dedicated Bearer Creation procedure in NR turns less than t...
clock-is-not-in-synch 38a35e9a8bd8 major 01-18T13:11:15 pod-name-cnat-cnat-co Clock not in synch detected on hostname pod-name-cnat-cnat-core-protocol-data1 . Ensure NTP is configu...
clock-is-not-in-synch 4a7e138b8bae major 01-18T13:07:35 pod-name-cnat-cnat-co Clock not in synch detected on hostname pod-name-cnat-cnat-core-protocol-ims2 . Ensure NTP is configur...
clock-is-not-in-synch 5b3e128f0101 major 01-18T12:05:55 pod-name-cnat-cnat-co Clock not in synch detected on hostname pod-name-cnat-cnat-core-protocol-data2 . Ensure NTP is configu...
container-memory-usag f588aa627792 critical 12-28T15:47:30 path-provisioner-v6zx Pod cee-global/path-provisioner-v6zxj/k8s_path-provisioner_path-provisioner-v6zxj_cee-global_f1a1e8fa-...
container-memory-usag 55b2ea84466f critical 12-27T22:27:10 path-provisioner-v6zx Pod cee-global/path-provisioner-v6zxj/ uses high memory 82.69%.
N11_SM_Timeout_SR f6fb82cac197 major 10-31T15:51:16 System This alert is fired when the increase in timeout for N11 messages toward AMF crosses threshold
Radius_Server_RTT 94ac2cff43a3 warning 10-28T05:08:56 System RTT for Radius Server: 216.x.x.x, Port: 1813 in namespace: smf-mvno is more than 5 ms.
Radius_Server_RTT d02b60b3a3a4 warning 10-28T05:05:56 System RTT for Radius Server: 52.x.x.x, Port: 1813 in namespace: smf-mvno is more than 5 ms.
Radius_Server_RTT 9afcee101013 warning 10-28T05:05:36 System RTT for Radius Server: 52.x.x.x, Port: 1813 in namespace: smf-mvno is more than 5 ms.
N11_SM_Timeout_SR 206c4bbccf21 major 10-20T13:26:16 System This alert is fired when the increase in timeout for N11 messages toward AMF crosses threshold
N4_Total_Outbound_Int 4e0f3c1d6300 major 10-20T09:01:23 System This alert is fired when the percentage of N4 outbound responses sent is lesser than threshold %.
N4_Total_Outbound_Int 3e2fed624704 major 10-20T08:21:23 System This alert is fired when the percentage of N4 outbound responses sent is lesser than threshold %.
watchdog 97a7976a4103 minor 05-05T09:10:58 System This is an alert meant to ensure that the entire alerting pipeline is functional. This alert is always...
故障排除
-
登录受影响的VM并运行以下命令: ip route | grep -i default
在本例中,您可以看到两条默认路由,不正确。(应该只有一条默认路由。)
ubuntu@pod-name-cnat-cnat-core-protocol-ims2:~$ ip route | grep -i default
default via 172.16.x.x dev ens4 proto dhcp src 172.16.x.x metric 100
default via 172.16.x.x dev ens3 proto dhcp src 172.16.x.x metric 100
-
检查计时(NTP)服务的状态。即使服务已启动,您仍会看到“无法同步”错误。
示例输出:
root@pod-name-cnat-cnat-core-protocol-data1:/home/ubuntu# systemctl status chronyd.service
chrony.service - chrony, an NTP client/serverLoaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-08 07:19:19 UTC; 11 months 24 days ago
Docs: man:chronyd(8)
man:chronyc(1)
man:chrony.conf(5)
Main PID: 4300 (chronyd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/chrony.service
└─4300 /usr/sbin/chronyd
Dec 30 17:06:28 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Selected source 5.196.x.x
Dec 31 06:37:11 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Can't synchronise: no selectable sources
Dec 31 17:15:31 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Selected source 5.196.x.x
Jan 01 06:29:43 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Can't synchronise: no selectable sources
Jan 01 16:50:02 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Selected source 5.196.x.x
Jan 01 19:25:05 pod-name-cnat-cnat-core-protocol-data1 chronyd[4300]: Can't synchronise: no selectable sources
-
检查network-config服务的状态。(它处于非活动状态。)
ubuntu@pod-name-cnat-cnat-core-protocol-data1:~$ systemctl status network-config
network-config.service - Job that configures for routes
Loaded: loaded (/etc/systemd/system/network-config.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2021-01-08 09:29:05 UTC; 1 years 0 months ago
Process: 1185 ExecStart=/etc/cisco/network-config.sh start (code=killed, signal=TERM)
Main PID: 1185 (code=killed, signal=TERM)
解决方法
注意:此过程不会导致应用程序出现任何停机。
-
编辑network-config服务脚本(/etc/systemd/system/network-config.service
)添加以下行:
root@pod-name-cnat-cnat-core-protocol-data1:/home/ubuntu# vim /etc/systemd/system/network-config.service
Restart=always
RestartSec=10s
StartLimitIntervalSec=300s
StartLimitBurst=30
-
启动network-config服务:
root@pod-name-cnat-cnat-core-protocol-data1:/home/ubuntu# systemctl start network-config
-
使用systemctl重新加载配置:
root@pod-name-cnat-cnat-core-protocol-data1:/home/ubuntu# systemctl daemon-reload
-
重新启动计时服务:
root@pod-name-cnat-cnat-core-protocol-data1:/home/ubuntu# systemctl restart chronyd.service
确认
完成解决方法步骤后,请运行以下命令以验证问题是否已解决:
for server in $(awk '/protocol/{print $1}' /etc/hosts); do ssh $server "hostname && timedatectl status && chronyc sources -vn" ; done
注意:“系统时钟同步”值为“是”表示问题已解决。
示例输出:
ubuntu@pod-name-cnat-cnat-core-master1:~$ for server in $(awk '/protocol/{print $1}' /etc/hosts); do ssh $server "hostname && timedatectl status && chronyc sources -vn" ; done
pod-name-cnat-cnat-core-protocol-data1
Local time: Tue 2021-09-07 10:22:16 UTC
Universal time: Tue 2021-09-07 10:22:16 UTC
RTC time: Tue 2021-09-07 10:22:17
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
systemd-timesyncd.service active: no
RTC in local TZ: no
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* svtsnq01.mgmt 1 6 170 282 -1100us[-1858us] +/- 47ms
pod-name-cnat-cnat-core-protocol-data2
Local time: Tue 2021-09-07 10:22:16 UTC
Universal time: Tue 2021-09-07 10:22:16 UTC
RTC time: Tue 2021-09-07 10:22:17
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
systemd-timesyncd.service active: no
RTC in local TZ: no
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* svtsnq01.mgmt 1 6 74 206 -218us[-1550us] +/- 46ms
pod-name-cnat-cnat-core-protocol-ims1
Local time: Tue 2021-09-07 10:22:17 UTC
Universal time: Tue 2021-09-07 10:22:17 UTC
RTC time: Tue 2021-09-07 10:22:18
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
systemd-timesyncd.service active: no
RTC in local TZ: no
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* svtsnq01.mgmt 1 6 77 24 +796us[+1044us] +/- 47ms
pod-name-cnat-cnat-core-protocol-ims2
Local time: Tue 2021-09-07 10:22:17 UTC
Universal time: Tue 2021-09-07 10:22:17 UTC
RTC time: Tue 2021-09-07 10:22:18
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
systemd-timesyncd.service active: no
RTC in local TZ: no
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* svtsnq01.mgmt 1 6 17 98 +2176us[+2275us] +/- 47ms