Troubleshoot Servers in the CNDP Solution

Updated: May 26, 2022

Document ID: 217899

Introduction

This document describes how to identify the Unified Computing System (UCS) server behind an alert in the Cloud Native Deployment Platform (CNDP) and how to check its fault entries.

Background Information

Hardware-related alerts are reported in the Ultra Cloud Core Subscriber Microservices Infrastructure (SMI) Cluster Manager (CM) Common Execution Environment (CEE). Information related to Kubernetes (K8s), docker, and similar components is reported on the CM virtual IP (VIP).

Caution: Refer to the Network Design and Customer Information Questionnaire (CIQ) for the IP addresses.

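Problem

An "Equipment Alarm" error is reported in show alerts.

Log in to the CM-CEE and run the show alerts active detail command. To display all active and historical alerts, run show alerts history summary.
Verify the server IP reported in the alert.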

[lab-deployer/labceec01] cee# show alerts active detail 
alerts active detail server-alert 9c367ce5ee48
 severity    major
 type        "Equipment Alarm"
 startsAt    2021-10-27T17:10:37.025Z
 source      10.10.10.10
 summary     "DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM"
 labels      [ "alertname: server-alert" "cluster: cr-chr-deployer" "description: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "fault_id: sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185" "id: 134219020" "monitor: prometheus" "replica: cr-chr-deployer" "server: 10.10.10.10" "severity: major" ]
 annotations [ "dn: cr-chr-deployer/10.10.10.10/sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185/134219020" "summary: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "type: Equipment Alarm" ]

[lab-deployer/labceec01] cee# show alerts history summary
NAME      UID           SEVERITY  STARTS AT       DURATION  SOURCE       SUMMARY            
---------------------------------------------------------------------------------------------
vm-alive  f6a65030b593  minor     09-02T10:28:28  1m40s     10-192-0-13  labd0123 is alive. 
vm-error  3a6d840e3eda  major     09-02T10:27:18  1m        10-192-0-13  labd0123 is down.  
vm-alive  49b2c1941dc6  minor     09-02T10:25:38  1m40s     10-192-0-14  labd0123 is alive.
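
The server IP and the fault identifier can also be read out of the labels line programmatically. The following is a minimal Python sketch, not part of the CEE tooling: the function name parse_alert_labels is hypothetical, and the sample text is a trimmed copy of the show alerts active detail output above, assumed to have been saved as plain text.

```python
import re

# Trimmed copy of the `show alerts active detail` output shown above.
ALERT_TEXT = '''alerts active detail server-alert 9c367ce5ee48
 severity    major
 type        "Equipment Alarm"
 source      10.10.10.10
 labels      [ "alertname: server-alert" "description: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "fault_id: sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185" "server: 10.10.10.10" "severity: major" ]
'''


def parse_alert_labels(text):
    """Return the key/value pairs found on the 'labels' line as a dict."""
    labels = {}
    for line in text.splitlines():
        if line.strip().startswith("labels"):
            # Each label is a quoted "key: value" string inside the brackets.
            for item in re.findall(r'"([^"]*)"', line):
                # Split on the first ": " only, so values may contain colons.
                key, _, value = item.partition(": ")
                labels[key] = value
    return labels


labels = parse_alert_labels(ALERT_TEXT)
print(labels["server"])    # CIMC IP of the affected UCS server
print(labels["fault_id"])  # fault DN reported for that server
```

The value of the server label is the address to search for in the cluster configuration in the next section.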

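Solution

From the SMI CM, identify the services (containers) and/or Virtual Machines (VMs) or Kernel-based Virtual Machines (KVMs) hosted on the server: run the show running-config command and find the configuration that contains the server IP.

Log in to the CM VIP (username: cloud-user).
Get the IP of the Ops Center from the smi-cm namespace.
Log in to the Ops Center and check the cluster configuration.
Identify the nodes and VMs that run on the server.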

cloud-user@lab-deployer-cm-primary:~$ kubectl get svc -n smi-cm
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)                                                 AGE
cluster-files-offline-smi-cluster-deployer    ClusterIP   10.102.200.178   <none>           8080/TCP                                                98d
iso-host-cluster-files-smi-cluster-deployer   ClusterIP   10.102.100.208     192.168.1.102    80/TCP                                                  98d
iso-host-ops-center-smi-cluster-deployer      ClusterIP   10.102.200.73    192.168.1.102    3001/TCP                                                98d
netconf-ops-center-smi-cluster-deployer       ClusterIP   10.102.100.207   192.168.184.193   3022/TCP,22/TCP                                         98d
ops-center-smi-cluster-deployer               ClusterIP   10.10.20.20     <none>           8008/TCP,2024/TCP,2022/TCP,7681/TCP,3000/TCP,3001/TCP   98d
squid-proxy-node-port                         NodePort    10.102.60.114    <none>           3128:32261/TCP                                          98d
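
The Ops Center address is the CLUSTER-IP of the ops-center-smi-cluster-deployer service. As an illustration only, the sketch below picks that column out of a saved copy of the kubectl output. The function name find_cluster_ip is hypothetical, the sample is trimmed to three services, and the parsing assumes the default whitespace-separated table layout shown above.

```python
# Trimmed copy of the `kubectl get svc -n smi-cm` output shown above.
SVC_TEXT = """\
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE
netconf-ops-center-smi-cluster-deployer       ClusterIP   10.102.100.207   192.168.184.193  3022/TCP,22/TCP  98d
ops-center-smi-cluster-deployer               ClusterIP   10.10.20.20      <none>           8008/TCP,2024/TCP,2022/TCP,7681/TCP,3000/TCP,3001/TCP   98d
squid-proxy-node-port                         NodePort    10.102.60.114    <none>           3128:32261/TCP   98d
"""


def find_cluster_ip(text, service_name):
    """Return the CLUSTER-IP column for an exact service name, or None."""
    for line in text.strip().splitlines()[1:]:   # skip the header row
        fields = line.split()
        # Columns: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
        if len(fields) >= 3 and fields[0] == service_name:
            return fields[2]
    return None


ops_center_ip = find_cluster_ip(SVC_TEXT, "ops-center-smi-cluster-deployer")
# Port 2024 is the SSH port used in the login session that follows.
print(f"ssh -p 2024 admin@{ops_center_ip}")
```

The exact-name match matters here because netconf-ops-center-smi-cluster-deployer also contains the string ops-center.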

cloud-user@lab-deployer-cm-primary:~$ ssh -p 2024 admin@10.10.20.20
admin@10.10.20.20's password:
      Welcome to the Cisco SMI Cluster Deployer on lab-deployer-cm-primary
      Copyright © 2016-2020, Cisco Systems, Inc.
      All rights reserved.
admin connected from 192.168.1.100 using ssh on ops-center-smi-cluster-deployer-7848c69844-xzdw6
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters

Example Output for a Container

In this example, the server is used by node primary-1.

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-smf nodes primary-1
clusters lab01-smf
nodes primary-1
  maintenance false
  k8s node-type       primary
  k8s ssh-ip          10.192.10.22
  k8s sshd-bind-to-ssh-ip true
  k8s node-ip         10.192.10.22
  k8s node-labels smi.cisco.com/node-type oam
  exit
  k8s node-labels smi.cisco.com/node-type-1 proto
  exit
  ucs-server cimc user admin
  ucs-server cimc ip-address 10.10.10.10

Example Output for a VM

The server can also be used for KVM-based VMs.

In this example, the server hosts two User Plane Functions (UPFs): upf1 and upf2.

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-upf nodes labupf
clusters lab01-upf
nodes labupf
  maintenance false
  ssh-ip      10.192.30.7
  type        kvm
  vms upf1
   upf software lab...
...
   type upf
  exit
  vms upf2
   upf software lab...
...
   type upf
  exit
  ucs-server cimc user admin
...
  ucs-server cimc ip-address 10.10.10.10
...
  exit
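
When the running configuration is long, the lookup from a CIMC address to the nodes and VMs behind it can be scripted. The sketch below is one possible approach and not a Cisco tool: nodes_for_cimc_ip is a hypothetical helper, the sample is condensed from the two outputs above, and the node primary-2 with address 10.10.10.11 is invented only so that the filter has something to exclude. It assumes the configuration was saved as text in the same indented form.

```python
# Condensed from the two outputs above. The node primary-2 and the address
# 10.10.10.11 are hypothetical and exist only to demonstrate the filter.
CONFIG_TEXT = """\
clusters lab01-smf
 nodes primary-1
  k8s node-type       primary
  ucs-server cimc ip-address 10.10.10.10
  exit
 nodes primary-2
  k8s node-type       primary
  ucs-server cimc ip-address 10.10.10.11
  exit
clusters lab01-upf
 nodes labupf
  type        kvm
  vms upf1
   type upf
  exit
  vms upf2
   type upf
  exit
  ucs-server cimc ip-address 10.10.10.10
  exit
"""


def nodes_for_cimc_ip(config_text, cimc_ip):
    """Map (cluster, node) to its VM names for nodes using the given CIMC IP."""
    nodes = {}       # (cluster, node) -> {"cimc_ip": ..., "vms": [...]}
    cluster = None
    current = None
    for raw in config_text.splitlines():
        words = raw.split()
        if len(words) == 2 and words[0] == "clusters":
            cluster = words[1]
        elif len(words) == 2 and words[0] == "nodes":
            current = nodes.setdefault(
                (cluster, words[1]), {"cimc_ip": None, "vms": []})
        elif current is not None and len(words) == 2 and words[0] == "vms":
            current["vms"].append(words[1])
        elif (current is not None and len(words) == 4
              and words[:3] == ["ucs-server", "cimc", "ip-address"]):
            current["cimc_ip"] = words[3]
    return {key: info["vms"] for key, info in nodes.items()
            if info["cimc_ip"] == cimc_ip}


matches = nodes_for_cimc_ip(CONFIG_TEXT, "10.10.10.10")
for (cluster_name, node_name), vms in matches.items():
    print(cluster_name, node_name, vms or "no VMs (container node)")
```

Collecting every node first and filtering at the end keeps the result correct even if the ucs-server lines appear before the vms entries of the same node.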

SSH to the UCS Host

Connect to the UCS host and check the fault entries with scope fault, show fault-entries, and show fault-history.

labucs111-cmp1-11 /fault # show fault-entries 
Time                      Severity      Description
------------------------- ------------- ---------------------------------------
2021-03-26T10:10:10       major         "DDR4_P1_C1_ECC: DIMM 19 is inoperable : Check or replace DIMM"

LABCP0222-Server22-02 /fault # show fault-history
Time                Severity      Source          Cause                     Description                             
------------------- ------------- --------------- ------------------------- ----------------------------------------
2021 Dec 10 02:02:02 UTC info          %CIMC           EQUIPMENT_INOPERABLE      "[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared "
2021 Dec 1 01:01:01 UTC critical      %CIMC           EQUIPMENT_INOPERABLE      "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status. "
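
Each description in the fault history carries the fault code, state, cause, and affected object in square brackets. As a rough illustration, the sketch below extracts those fields and reports the most recent state per fault. The names FAULT_RE, parse_fault_history, and latest_state are hypothetical, and the code assumes that the history lists the newest entry first, as in the sample above.

```python
import re

# The two history lines shown above, with column padding reduced.
HISTORY_TEXT = '''2021 Dec 10 02:02:02 UTC info     %CIMC EQUIPMENT_INOPERABLE "[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared "
2021 Dec 1 01:01:01 UTC critical %CIMC EQUIPMENT_INOPERABLE "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status. "
'''

# [code][state][cause][dn] followed by the free-text message.
FAULT_RE = re.compile(
    r'\[(F\d+)\]\[([^\]]+)\]\[([^\]]+)\]\[([^\]]+)\]\s*([^"]*)')


def parse_fault_history(text):
    """Return one dict per history line that contains a bracketed fault."""
    faults = []
    for line in text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            code, state, cause, dn, message = match.groups()
            faults.append({"code": code, "state": state, "cause": cause,
                           "dn": dn, "message": message.strip()})
    return faults


def latest_state(faults):
    """Keep the first (assumed newest) state seen for each (code, dn) pair."""
    latest = {}
    for fault in faults:
        latest.setdefault((fault["code"], fault["dn"]), fault["state"])
    return latest


faults = parse_fault_history(HISTORY_TEXT)
latest = latest_state(faults)
for (code, dn), state in latest.items():
    print(code, dn, state)
```

A fault whose latest state is cleared, such as F0174 in this sample, is no longer active. Any other state indicates that the fault still needs attention.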

Revision History

Revision	Publish Date	Comments
1.0	26-May-2022	Initial Release

Contributed by Cisco Engineers

Cinthia Janneth Martinez
Cisco TAC Engineer
Nebojsa Kosanovic
Cisco TAC Engineer
