
Sangfor MCS Management Guide (1)

Author: Jojo
  1. Background

In the day-to-day management of an MCS site, several HQ teams often work together on a single task, such as adding new nodes to MCS, expanding a public Internet IP segment, upgrading the MCS site platform, or troubleshooting MCS service issues.

To make this collaboration more efficient, this document provides management guidelines that help the overseas teams work more closely and effectively with the HQ teams in these scenarios.

Some chapters, especially the templates, provide both English and Chinese so that the HQ team and the local team are on the same page.

This document is not a process guide, but a management guide.

  1. Cloud support system

ybgitsm is the integrated system for incident reports, MCS service requests, and similar items. Tickets for service requests, incidents, and issues are raised and tracked in this system.
  1. HCI Node expansion

HCI node expansion for the HCI cluster of an MCS site is the most common site management scenario. The new nodes can be added to an existing HCI cluster, or they can form a new HCI cluster.

Types of cluster:

  1. management cluster;

  2. shared cluster for tenant;

  3. dedicated cluster for tenant;

    1. Role and Responsibility

The roles and responsibilities for the node expansion scenario are listed below.

Roles Key Responsibility
FAE

1) Raise a ticket in the system for the HCI node expansion.

2) Transport the servers and components to the DC.

3) Mount the servers on the rack and cable the servers and switches.

4) Prepare the packages and perform the initial server configuration (usually the IPMI configuration).

5) Help the server team and the network team check and adjust the port connections.

6) Test and verify after the server expansion.

Coordinator

(托管云NOC技术支持, Managed Cloud NOC Technical Support)

1) Coordinate team members from the Infrastructure Delivery Department, the Network Platform Department, and the SRE team to support and perform the node expansion; create a chat group, invite all team members, and coordinate the work among them.

Note: can be found in WeCom by searching for “托管云NOC技术支持”.

Infrastructure Delivery Engineer

1) Design the rack layout and the cable connections for the servers and switches, and design the network segment and IP addresses for the servers.

2) Check the cable connections, configure the servers, and perform the expansion operation.

Note: the team leader is 林育佳 (53683), but normally the coordinator asks the team leader to assign the task to one specific team member.

Network Engineer

1) Configure the network devices for the HCI nodes and jointly check the cable connections.

SRE Engineer

1) Verify and check the expansion configuration, and make sure it does not violate internal standards.
  1. Ticket template

Raise a ticket for HCI node expansion 3 workdays in advance.
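
As a rough planning aid, the 3-workday lead time can be turned into a concrete latest date for raising the ticket. The sketch below is only an illustration: the function name `latest_raise_date` is hypothetical, workdays are assumed to be Monday to Friday, and public holidays (local or HQ) are not taken into account.

```python
from datetime import date, timedelta


def latest_raise_date(operation_day: date, lead_workdays: int = 3) -> date:
    """Return the latest day on which the ticket should be raised.

    Counts back `lead_workdays` working days from the planned operation day.
    Assumption: a workday is Monday to Friday; holidays are ignored.
    """
    day = operation_day
    remaining = lead_workdays
    while remaining > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return day


if __name__ == "__main__":
    # Operation planned for Wednesday 2025-03-12.
    print(latest_raise_date(date(2025, 3, 12)))
```

For an operation planned on Wednesday 2025-03-12, this gives Friday 2025-03-07 as the latest day to raise the ticket, because the weekend in between does not count.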

  1. First, log in to the ybgitsm system and raise an incident ticket as shown in the figure.

Figure-1 raise incident ticket

  2. Second, fill in the ticket items as described in the table.

Figure-2 incident ticket description

Items Detailed description
1

Title of this ticket.

X1站点托管云扩容HCI节点 (HCI node expansion for the managed cloud at site X1)

X1 can be 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

Choose the correct site name and provide the full title for this task.

2

X2, details about the HCI expansion.

We want to expand X HCI nodes to the mgmt cluster | shared cluster | dedicated cluster; the name of the cluster is XXX (providing the name of the existing cluster is preferred). Or: we want to create a new shared | dedicated cluster for them. We hope to finish the expansion before yyyy.mm.dd.

Note: If you cannot type in Chinese, provide the original request in English first, then add a Chinese translation below it.

3 Data center: select from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID, 香港一区 (Hong Kong Zone 1).
4 Select 云平台相关/HCI/HCI (cloud platform related/HCI/HCI) from the drop-down list.
5 P3 (default).
6 Select 海外托管云 (overseas managed cloud) from the drop-down list.
7 Select 业务部门提出 (raised by business department) from the drop-down list.
8 Select 服务请求 (service request) from the drop-down list.
9

X3站点托管云扩容HCI节点

X3 can be 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

10 Time of ticket creation
11 Time of ticket creation
12 Select 二线交付专家组 (tier-2 delivery expert group) from the drop-down list.
13 Select anyone from the drop-down list.
14 Submit the ticket.
  1. Send the ticket number to 托管云NOC技术支持 in WeCom and ask them to create a chat group and invite all related team members. Align with them on the date and time of the DC operation, the accessories that need to be taken to the DC, the packages that need to be downloaded in advance, and so on.

    Note: for the MCS-TH site in particular, the drive to the primary data center takes about 2 hours, so everything must be prepared in advance.

  2. After the expansion has completed successfully, perform basic verification testing before allocating resources on the new cluster to tenants, especially for a dedicated HCI cluster.
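
The title and the English request text of the ticket follow fixed patterns, so they can be filled in programmatically. The following is a minimal sketch, not part of the ybgitsm system: the helper names are hypothetical, and it assumes that X1 in the title is replaced by the Chinese site name only and that the request targets an existing cluster.

```python
from datetime import date

# Site codes and Chinese site names as listed in the ticket template.
SITE_NAMES = {"PH": "菲律宾", "TH": "泰国", "MY": "马来西亚", "ID": "印度尼西亚"}

# Cluster types named in this chapter.
CLUSTER_TYPES = ("mgmt", "shared", "dedicated")


def hci_expansion_title(site_code: str) -> str:
    """Fill X1 in the title template (assumption: X1 is the Chinese site name)."""
    return f"{SITE_NAMES[site_code.upper()]}站点托管云扩容HCI节点"


def hci_expansion_details(nodes: int, cluster_type: str,
                          cluster_name: str, deadline: date) -> str:
    """Fill the English request text of item 2 for an existing cluster."""
    if cluster_type not in CLUSTER_TYPES:
        raise ValueError(f"unknown cluster type: {cluster_type}")
    return (
        f"We want to expand {nodes} HCI nodes to the {cluster_type} cluster, "
        f"the name of the cluster is {cluster_name}. "
        f"We hope to finish the expansion before {deadline:%Y.%m.%d}."
    )


if __name__ == "__main__":
    print(hci_expansion_title("TH"))
    # "Cluster_A" is a made-up cluster name used only for illustration.
    print(hci_expansion_details(2, "shared", "Cluster_A", date(2025, 4, 1)))
```

The Chinese translation of the request text still has to be added by hand below the English text, as the note in the template requires.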

  1. Network expansion

Public Internet IP address expansion and Bandwidth expansion are included in this scenario.

  1. Role and Responsibility

The roles and responsibilities for the network expansion scenario are listed below.

Roles Key Responsibility
FAE

1) Raise a ticket in the system for the network expansion.

2) Gather information about the network devices, ports, VLANs, subnets, gateways, and so on.

3) Test and verify after the network expansion.

Coordinator

(托管云NOC技术支持)

1) Coordinate team members from the Network Platform Department to perform the network expansion; create a chat group, invite all team members, and coordinate the work among them.

Note: can be found in WeCom by searching for “托管云NOC技术支持”.

Network Engineer

1) Configure the network devices for the network expansion.

2) Verify and check the expansion configuration, and make sure it does not violate internal standards.

  1. Ticket template

It is best to raise a service request for network expansion 5 workdays in advance.

  1. First, log in to the ybgitsm system and raise a service request ticket as shown in the figure.

    Figure 3-service request ticket

  2. Search for “网络资源扩容” (network resource expansion) in the search box.

    Figure 4-service request type

  3. Fill in the ticket items as described in the table.

    Figure 5-service request description

Items Detailed description
1

Title of this ticket.

X1托管云站点网络扩容 (network expansion for managed cloud site X1)

X1 can be 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

Choose the correct site name and provide the full title for this task.

2

Priority for this request.

1) normal ---一般

2) urgent ---紧急

3) extremely urgent ---非常紧急

3 Site: choose from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.
4 Expected reply date
5

Application Office

Choose from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

6 Data center: select from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID, 香港一区 (Hong Kong Zone 1).
7

Service request type.

1) IP扩容, public Internet IP address expansion

2) 带宽扩容, bandwidth expansion

8 Line name: select from the drop-down list.
9

X2详细描述 (detailed description): details about the expansion request.

We want to expand the public Internet IP addresses | bandwidth for our MCS site.

Example ---

NEW vlan 1508, 103.115.16.176/28, total 16 IPs
176 network address (gone by network)
177 isp gw (virtual ip) (gone by infra)***
178 router1isp (vrrp) (gone by infra)***
179 router2isp (vrrp) (gone by infra)***
180-190 real usable ip (total 11)
191 broadcast (gone by network)

Note: If you cannot type in Chinese, provide the original request in English first, then add a Chinese translation below it.

10 Submit the request
  1. Send the service request number to 托管云NOC技术支持 in WeCom and ask them to create a chat group and invite all related team members. Align with them on the expected delivery date, the date and time of the DC operation, and, if required, the accessories that need to be taken to the DC.

  2. After the expansion has completed successfully, perform basic verification testing before allocating the bandwidth or IP addresses to the tenant.
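
The address plan in the item 9 example (a /28 with 16 addresses, of which 11 are usable) can be double-checked with Python's standard `ipaddress` module before the request is submitted. This is a minimal sketch: the function name `address_plan` is hypothetical, and it assumes, as in the example, that the first three host addresses are reserved for the ISP gateway and the two VRRP router addresses.

```python
import ipaddress


def address_plan(cidr: str, reserved_count: int = 3) -> dict:
    """Split a public subnet into network, reserved, usable and broadcast parts.

    Assumption: the first `reserved_count` host addresses are taken by the
    ISP gateway and the two VRRP routers, as in the example of item 9.
    """
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())  # excludes the network and broadcast addresses
    return {
        "total": net.num_addresses,
        "network": net.network_address,
        "reserved": hosts[:reserved_count],
        "usable": hosts[reserved_count:],
        "broadcast": net.broadcast_address,
    }


if __name__ == "__main__":
    plan = address_plan("103.115.16.176/28")
    print("total addresses:", plan["total"])
    print("reserved:", ", ".join(str(ip) for ip in plan["reserved"]))
    print("usable:", plan["usable"][0], "to", plan["usable"][-1],
          f"({len(plan['usable'])} addresses)")
```

For 103.115.16.176/28 this reports 16 addresses in total, .177 to .179 as reserved, and .180 to .190 as the 11 usable addresses, which matches the example.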

  1. aDR service

Hybrid Cloud Disaster Recovery (aDR) is a major selling point of MCS. During configuration, after the local SCP has been added to the local data center from SCC, the SRE team must execute a script on the SCC backend platform.

That means that when we follow the aDR deployment guide to configure the MCS site and the on-premises site, the SRE team must execute step 6.2.3 after step 6.2.2; only then can we execute step 6.2.4.

  1. Role and Responsibility

Roles Key Responsibility
FAE

1) Raise a ticket in the system for the script execution.

2) Provide the login address and the admin account and password for the local SCP, plus remote access to both the local SCP and the SCP backend of MCS (normally via the jump server in MCS).

Coordinator

(托管云NOC技术支持)

1) Coordinate a member of the SRE Department to execute the scripts.

Note: can be found in WeCom by searching for “托管云NOC技术支持”.

SRE Engineer

1) Execute the scripts for the aDR configuration between the MCS site and the on-premises site.

Note: the team leader, 常宏建 (23728), can be found in WeCom.

  1. Ticket template

It is best to raise a service request for the aDR backend script execution 3 days in advance.

  1. First, log in to the ybgitsm system and raise a service request ticket as shown in the figure.

    Figure 6-service request ticket

  2. Double-click “托管云SRE服务申请” (managed cloud SRE service request).

    Figure 7-service request type

  3. Fill in the ticket items as described in the table.

    Figure 8-service request description

Items Detailed description
1

Title of this ticket.

X1托管云对接aDR客户SCP (connect the X1 managed cloud to the aDR customer's SCP)

X1 can be 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

Choose the correct site name and provide the full title for this task.

2 Data center: select from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID, 香港一区 (Hong Kong Zone 1).
3 HCI cluster name: the MCS HCI cluster that is used for aDR with the customer’s on-premises HCI cluster.
4 HCI cluster IP: the IP of the MCS HCI cluster that is used for aDR with the customer’s on-premises HCI cluster.
5

Details about the request.

Example

本区域客户部署一套超融合集群,要配置aDR hybrid cloud容灾,对接本区域托管云。

(A customer in this region has deployed an HCI cluster and needs to configure aDR hybrid cloud disaster recovery with the managed cloud of this region.)

本地SCP 691 EN 访问地址 https://192.168.0.20:4430 本地HCI版本 691

(on-premises HCI version, SCP version, and IP address of the SCP)

请协调SRE同事执行容灾对接后台的脚本,支持容灾集群对接,请找我获取该SCP admin的密码

(Please coordinate an SRE colleague to execute the backend script for the disaster recovery connection; contact me for the SCP admin password. Provide the SCP admin account and password in private.)

6 Group chat: “NOC创建” (created by NOC); do not change it.
7 Expected execution date
8 Submit the request
  1. Send the service request number to 托管云NOC技术支持 in WeCom and ask them to create a chat group and invite all related team members; add 常宏建 (23728) to the chat group and @ him.

  1. Customer Incident and issue

When an incident or issue occurs during daily MCS operation and management, you may be eager to seek help and support from the HQ team. Before doing so, the very first step is to raise a ticket about the incident or issue in ybgitsm.

  1. Role and Responsibility

Roles Key Responsibility
FAE

1) Raise a ticket in the system about the incident or issue.

2) Provide basic information about the tenant, such as the account name, the resource pool IP and name, and a description of the incident or issue and its symptoms; a screenshot is preferred. If the aDR service is involved, the admin account and password for the local SCP and remote access to both the local SCP and the SCP backend of MCS (normally via the jump server in MCS) are required too.

Coordinator

(托管云NOC技术支持)

1) Coordinate SRE members and R&D members from all areas to support fixing the incident or issue.

Note: can be found in WeCom by searching for “托管云NOC技术支持”.

Support Engineer

1) Troubleshoot the incident or issue, find its root cause, and provide a solution to fix it as soon as possible.
  1. Ticket template

  1. First, log in to the ybgitsm system and raise an incident ticket as shown in the figure.

Figure-9 raise incident ticket

  2. Second, fill in the ticket items as described in the table.

Figure-10 incident ticket description

Items Detailed description
1 Title of this incident; preferably include the site name and the customer name.
2

Details about the incident or issue.

Provide them in the format of the template for item 2.

Note: If you cannot type in Chinese, provide the original request in English first, then add a Chinese translation below it.

3 Data center: select from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID, 香港一区 (Hong Kong Zone 1).
4 Select 云平台相关/HCI/HCI (cloud platform related/HCI/HCI) from the drop-down list.
5 P3 (default).
6 Select 海外托管云 (overseas managed cloud) from the drop-down list.
7 Select 客户反馈 (customer feedback) from the drop-down list.
8 Select 故障 (fault) from the drop-down list.
9 The name of the tenant in MCS.
10 When the incident or problem occurred.
11 First response time.
12 Select 二线交付专家组 (tier-2 delivery expert group) from the drop-down list.
13 Select anyone from the drop-down list.
14 Choose the attachment.

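Template for item 2

【问题区域】 // region and country

   XX海外托管云

XX can be selected from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

【数据中心】 // data center

This can be selected from 菲律宾/PH, 泰国/TH supernap, 马来西亚/MY, 香港一区, 印度尼西亚/ID.

【集群名称与IP】 // HCI version, cluster name and IP of the MCS site

  Cluster_aCloud 172.22.65.1 HCI 670R3EN.

【报障客户信息】 // tenant account of the customer

【问题现象】 // description of the symptoms

  1. 将 aarch 虚拟机从Azure迁移到本地HCI,系统服务不能正常启动; // after the aarch VM is migrated from Azure to the local HCI, its system services fail to start

  2. 截图如下 // screenshot below

A screenshot is preferred.

【业务影响情况】 // business impact

影响客户当前业务迁移进展 // affects the progress of the customer's current service migration

【客户诉求】 // customer requirements

帮忙客户实现直接将虚拟机从Azure 迁移到MCS上; // help the customer migrate the VM directly from Azure to MCS

【当前排查情况】 // current investigation status

添加了串口和取消磁盘virtio仍无法启动. // the VM still fails to boot after a serial port was added and virtio was disabled for the disk

【说明】 // special note or statement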

  1. Send the ticket number to 托管云NOC技术支持 in WeCom and ask them to create a chat group and invite all related team members; add 常宏建 (23728) to the chat group and @ him.

    Remember that the details of the incident or issue are critical for getting a fast response from the right team member; otherwise a lot of time is wasted on clarifying them.
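
Because missing details slow down the response, the text for item 2 can be assembled and checked before it is pasted into the ticket. The sketch below is an illustration only: the dictionary keys and function names are hypothetical, the section headings are the ones from the template above, and treating 【说明】 as the only optional section is an assumption.

```python
# Template section headings, in the order used in "Template for item 2".
# The English keys are hypothetical names chosen for this sketch.
HEADINGS = {
    "region": "问题区域",
    "data_center": "数据中心",
    "cluster": "集群名称与IP",
    "tenant": "报障客户信息",
    "symptoms": "问题现象",
    "impact": "业务影响情况",
    "customer_request": "客户诉求",
    "investigation": "当前排查情况",
    "note": "说明",  # assumed optional
}


def missing_sections(details: dict) -> list:
    """Return the keys of required sections that are empty or absent."""
    return [
        key for key in HEADINGS
        if key != "note" and not details.get(key, "").strip()
    ]


def render_incident_details(details: dict) -> str:
    """Render the item 2 text; refuse to render if a required section is empty."""
    missing = missing_sections(details)
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    blocks = [
        f"【{heading}】\n{details.get(key, '').strip()}"
        for key, heading in HEADINGS.items()
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    example = {
        "region": "泰国海外托管云",
        "data_center": "泰国/TH",
        "cluster": "Cluster_aCloud 172.22.65.1 HCI 670R3EN",
        "tenant": "tenant-a",  # made-up tenant account
        "symptoms": "System services fail to start after migration.",
        "impact": "Blocks the customer's migration.",
        "customer_request": "Migrate the VM directly to MCS.",
        "investigation": "Added a serial port; still fails to boot.",
    }
    print(render_incident_details(example))
```

Screenshots still have to be attached to the ticket separately through the attachment item.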

  1. MCS upgrade

New functions and features of MCS are released periodically, and MCS components such as SCC, SCP, and HCI need to be upgraded to support them. This is also done by raising a ticket to the SRE team.

  1. Role and Responsibility

Roles Key Responsibility
FAE

1) Raise a ticket in the system for the MCS upgrade.

2) Notify MCS customers about the potential impact of the MCS upgrade operation.

3) Coordinate with the MCS customers on the candidate date for the upgrade and the change time window.

4) Provide on-site standby and support as requested by the SRE Engineer.

5) Check the status of customer services after the upgrade.

Coordinator

(托管云NOC技术支持)

1) Coordinate SRE members to design the upgrade plan, perform the upgrade operation, and manage any issue that occurs during the upgrade.

Note: can be found in WeCom by searching for “托管云NOC技术支持”.

SRE Engineer

1) Evaluate the possible impact of the MCS upgrade.

2) Design the upgrade plan, perform the upgrade operation, and manage any issue that occurs during the upgrade.

Note: the team leader, 常宏建 (23728), can be found in WeCom.

  1. Ticket template

It is best to raise a service request for an upgrade at least 2 weeks in advance.

  1. First, log in to the ybgitsm system and raise a service request ticket as shown in the figure.

    Figure 11-service request ticket

  2. Double-click “托管云SRE服务申请” (managed cloud SRE service request).

    Figure 12-service request type

  3. Fill in the ticket items as described in the table.

    Figure 13-service request description

Items Detailed description
1

Title of this ticket.

X1托管云升级 (X1 managed cloud upgrade)

X1 can be 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID.

Choose the correct site name and provide the full title for this task.

2 Data center: select from 菲律宾/PH, 泰国/TH, 马来西亚/MY, 印度尼西亚/ID, 香港一区 (Hong Kong Zone 1).
3 Names of the HCI clusters that need to be upgraded
4 IPs of the HCI clusters that need to be upgraded
5

Details about the upgrade request.

Example

请协调人员和资源对本区域的托管云进行升级。 (Please coordinate people and resources to upgrade the managed cloud of this region.)
当前托管云版本   SCC 2.2.3 SCP 6932  HCI 670R3 (current managed cloud version)
升级后版本          SCC 2.6.0  SCP 10.5.0  HCI 6.10.0 (version after the upgrade)

6 Group chat: “NOC创建” (created by NOC); do not change it.
7 Expected execution date
  1. Send the service request number to 托管云NOC技术支持 in WeCom and ask them to create a chat group and invite all related team members; add 常宏建 (23728) and 赵宁 (92760) to the chat group and @ them.

  2. Align with the MCS customer on the impact of the upgrade and set a candidate date for the upgrade operation. In the chat group, align with the SRE team members in charge of executing the upgrade on the date and time of the upgrade and on what needs to be prepared for on-site standby and remote support during the upgrade change window.

Last modified: 2025-03-05