<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pod Issues on Huawei</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/</link><description>Recent content in Pod Issues on Huawei</description><generator>Hugo</generator><language>en</language><copyright>Copyright © 2025 Huawei Technologies Co., Ltd. All rights reserved.</copyright><atom:link href="https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/index.xml" rel="self" type="application/rss+xml"/><item><title>After a Worker Node in the Cluster Breaks Down and Recovers, Pod Failover Is Complete but the Source Host Where the Pod Resides Has Residual Drive Letters</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-a-worker-node-in-the-cluster-breaks-down-and-recovers-pod-failover-is-complete-but-the-source/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-a-worker-node-in-the-cluster-breaks-down-and-recovers-pod-failover-is-complete-but-the-source/</guid><description>&lt;h2 id="en-us_topic_0000001133091104_section1566717121452">Symptom&lt;/h2>
&lt;p>A Pod is running on worker node A, and an external block device is mounted to the Pod through CSI. After worker node A is unexpectedly powered off, the Kubernetes platform detects that the node is faulty and fails the Pod over to worker node B. After worker node A recovers, the drive letters that remain on worker node A change from normal to faulty.&lt;/p>
&lt;h2 id="en-us_topic_0000001133091104_section87566339513">Environment Configuration&lt;/h2>
&lt;p>Kubernetes version: 1.18 or later&lt;/p></description></item><item><title>When a Pod Is Created, the Pod Is in the ContainerCreating State</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/when-a-pod-is-created-the-pod-is-in-the-containercreating-state/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/when-a-pod-is-created-the-pod-is-in-the-containercreating-state/</guid><description>&lt;h2 id="en-us_topic_0000001163875516_section1566717121452">Symptom&lt;/h2>
&lt;p>A Pod is created, but after a period of time it is still in the &lt;strong>ContainerCreating&lt;/strong> state. Check the logs (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>). The error message &amp;ldquo;Fibre Channel volume device not found&amp;rdquo; is displayed.&lt;/p>
&lt;h2 id="en-us_topic_0000001163875516_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>Residual disks exist on the host node. As a result, the required disks cannot be found the next time a Pod is created.&lt;/p>
&lt;h2 id="en-us_topic_0000001163875516_section164471213145410">Solution or Workaround&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Use a remote access tool, such as PuTTY, to log in to any master node in the Kubernetes cluster through the management IP address.&lt;/p></description></item><item><title>A Pod Is in the ContainerCreating State for a Long Time When It Is Being Created</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-is-in-the-containercreating-state-for-a-long-time-when-it-is-being-created/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-is-in-the-containercreating-state-for-a-long-time-when-it-is-being-created/</guid><description>&lt;h2 id="en-us_topic_0000001279996521_section1566717121452">Symptom&lt;/h2>
&lt;p>A Pod stays in the &lt;strong>ContainerCreating&lt;/strong> state for a long time while it is being created. Check the huawei-csi-node log (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>); it contains no Pod creation information. In the output of the &lt;strong>kubectl get volumeattachment&lt;/strong> command, the &lt;strong>PV&lt;/strong> column does not show the name of the PV used by the Pod. After a long period (more than ten minutes), the Pod is created normally and its status changes to &lt;strong>Running&lt;/strong>.&lt;/p></description></item><item><title>A Pod Fails to Be Created and the Log Shows That the Execution of the mount Command Times Out</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-the-log-shows-that-the-execution-of-the-mount-command-times-out/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-the-log-shows-that-the-execution-of-the-mount-command-times-out/</guid><description>&lt;h2 id="en-us_topic_0000001279996521_section1566717121452">Symptom&lt;/h2>
&lt;p>A Pod remains in the &lt;strong>ContainerCreating&lt;/strong> state while it is being created. In this case, check the huawei-csi-node log (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>). The log shows that the mount command times out.&lt;/p>
&lt;h2 id="en-us_topic_0000001279996521_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>Cause 1: The configured service IP address is unreachable. As a result, the &lt;strong>mount&lt;/strong> command times out and fails.&lt;/p>
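&lt;p>To check Cause 1 from the affected worker node, a plain TCP probe of the configured service IP address can help narrow down the fault. The following is a minimal Python sketch, not a Huawei CSI tool: it assumes that the storage NFS service listens on the standard TCP port 2049, and the function name nfs_port_reachable and the address 192.0.2.10 are hypothetical placeholders. A False result suggests that the service IP address is unreachable from the node; a True result only shows that the port accepts TCP connections, not that the mount itself will succeed.&lt;/p>

```python
import socket


def nfs_port_reachable(service_ip, port=2049, timeout=3.0):
    """Hypothetical helper: return True if a TCP connection to service_ip:port
    succeeds within `timeout` seconds, otherwise False.

    Port 2049 (standard NFS port) is an assumption, not taken from Huawei CSI.
    """
    try:
        # create_connection raises OSError on refusal or unreachable network,
        # and socket.timeout (an OSError subclass) if the timeout expires
        with socket.create_connection((service_ip, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (192.0.2.10 is a placeholder for the configured service IP address):
#     nfs_port_reachable("192.0.2.10")
```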
&lt;p>Cause 2: On some operating systems, such as Kylin V10 SP1 and SP2, running the &lt;strong>mount&lt;/strong> command inside a container over NFSv3 takes a long time. As a result, the &lt;strong>mount&lt;/strong> command may time out and the error message &amp;ldquo;error: exit status 255&amp;rdquo; is displayed. A possible cause is that the &lt;strong>LimitNOFILE&lt;/strong> value of the container runtime containerd is too large (over 1 billion).&lt;/p></description></item><item><title>A Pod Fails to Be Created and the Log Shows That the mount Command Fails to Be Executed</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-the-log-shows-that-the-mount-command-fails-to-be-executed/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-the-log-shows-that-the-mount-command-fails-to-be-executed/</guid><description>&lt;h2 id="section16564369537">Symptom&lt;/h2>
&lt;p>In NAS scenarios, a Pod remains in the &lt;strong>ContainerCreating&lt;/strong> state while it is being created. In this case, check the huawei-csi-node log (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>). The log shows that the mount command fails to be executed.&lt;/p>
&lt;h2 id="section135642617536">Root Cause Analysis&lt;/h2>
&lt;p>A possible cause is that NFS 4.0/4.1/4.2 is not enabled on the storage side. After the mount attempt over NFSv4 fails, the host does not fall back to NFSv3 for mounting.&lt;/p></description></item><item><title>A Pod Fails to Be Created and Message "publishInfo doesn't exist" Is Displayed in the Events Log</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-message-publishinfo-doesn-t-exist-is-displayed-in-the-events-log/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-message-publishinfo-doesn-t-exist-is-displayed-in-the-events-log/</guid><description>&lt;h2 id="section16564369537">Symptom&lt;/h2>
&lt;p>A Pod remains in the &lt;strong>ContainerCreating&lt;/strong> state while it is being created, and the following alarm event is reported for the Pod: &lt;strong>rpc error: code = Internal desc = publishInfo doesn&amp;rsquo;t exist&lt;/strong>&lt;/p>
&lt;h2 id="section135642617536">Root Cause Analysis&lt;/h2>
&lt;p>As defined in the &lt;a href="https://github.com/container-storage-interface/spec/blob/master/spec.md" target="_blank">CSI protocol&lt;/a>, when a workload needs to use a PV, the container orchestration (CO) system, which communicates with the CSI plug-in through RPC requests, first invokes the ControllerPublishVolume interface of the plug-in (provided by huawei-csi-controller) to map the PV, and then invokes the NodeStageVolume interface (provided by huawei-csi-node) to mount the PV.
In this fault, the huawei-csi-node service receives the NodeStageVolume request, but the huawei-csi-controller service never received the preceding ControllerPublishVolume request.
As a result, huawei-csi-controller does not map the PV and does not send the mapping information to huawei-csi-node, so the error message &lt;strong>publishInfo doesn&amp;rsquo;t exist&lt;/strong> is reported.&lt;/p></description></item><item><title>After a Pod Fails to Be Created or kubelet Is Restarted, Logs Show That the Mount Point Already Exists</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-a-pod-fails-to-be-created-or-kubelet-is-restarted-logs-show-that-the-mount-point-already-exist/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-a-pod-fails-to-be-created-or-kubelet-is-restarted-logs-show-that-the-mount-point-already-exist/</guid><description>&lt;h2 id="section16564369537">Symptom&lt;/h2>
&lt;p>A Pod remains in the &lt;strong>ContainerCreating&lt;/strong> state while it is being created, or, after kubelet is restarted, logs show that the mount point already exists. Check the huawei-csi-node log (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>). The following error is reported: &lt;strong>The mount /var/lib/kubelet/pods/xxx/mount is already exist, but the source path is not /var/lib/kubelet/plugins/kubernetes.io/xxx/globalmount&lt;/strong>&lt;/p>
&lt;h2 id="section135642617536">Root Cause Analysis&lt;/h2>
&lt;p>The root cause of this problem is that Kubernetes performs repeated mounting operations.&lt;/p></description></item><item><title>"I/O error" Is Displayed When a Volume Directory Is Mounted to a Pod</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/i-o-error-is-displayed-when-a-volume-directory-is-mounted-to-a-pod/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/i-o-error-is-displayed-when-a-volume-directory-is-mounted-to-a-pod/</guid><description>&lt;h2 id="section16564369537">Symptom&lt;/h2>
&lt;p>When a Pod reads or writes a mounted volume, the message &amp;ldquo;I/O error&amp;rdquo; is displayed.&lt;/p>
&lt;h2 id="section135642617536">Root Cause Analysis&lt;/h2>
&lt;p>When a protocol such as SCSI is used, if the storage device restarts while the Pod is continuously writing data to the mount directory, the link between the device on the host and the storage device is interrupted, triggering an I/O error. After the storage device recovers, the mount directory remains read-only.&lt;/p></description></item><item><title>Failed to Create a Pod Because the iscsi_tcp Service Is Not Started Properly When the Kubernetes Platform Is Set Up for the First Time</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/failed-to-create-a-pod-because-the-iscsi_tcp-service-is-not-started-properly-when-the-kubernetes-pla/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/failed-to-create-a-pod-because-the-iscsi_tcp-service-is-not-started-properly-when-the-kubernetes-pla/</guid><description>&lt;h2 id="en-us_topic_0234872004_section1566717121452">Symptom&lt;/h2>
&lt;p>When you create a Pod, the error &lt;strong>Cannot connect ISCSI portal *.*.*.*: libkmod: kmod_module_insert_module: could not find module by name=&amp;lsquo;iscsi_tcp&amp;rsquo;&lt;/strong> is reported in the &lt;strong>/var/log/huawei-csi-node&lt;/strong> log.&lt;/p>
&lt;h2 id="en-us_topic_0234872004_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>The iscsi_tcp service may be stopped after the Kubernetes platform is set up and the iSCSI service is installed. Run the following command to check whether the iscsi_tcp module is loaded.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#4c4f69;background-color:#eff1f5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-fallback" data-lang="fallback">&lt;span style="display:flex;">&lt;span>lsmod | grep iscsi | grep iscsi_tcp
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following is an example of the command output.&lt;/p></description></item><item><title>A Pod Fails to Be Created and Logs Show That an Initiator Has Been Associated with Another Host</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-logs-show-that-an-initiator-has-been-associated-with-another-host/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-logs-show-that-an-initiator-has-been-associated-with-another-host/</guid><description>&lt;h2 id="en-us_topic_0234872004_section1566717121452">Symptom&lt;/h2>
&lt;p>When a Pod is created using SAN storage, it remains in the &lt;strong>ContainerCreating&lt;/strong> state, and the alarm event &amp;ldquo;rpc error: code = Internal desc = initiator xxx is already associated to another host&amp;rdquo; is reported in the Pod logs.&lt;/p>
&lt;h2 id="en-us_topic_0234872004_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>Cause 1: CSI automatically creates hosts, host groups, and initiators according to its own rules. If the same resources already exist on the storage side before CSI is used, conflicts occur. A possible cause is that the same initiator had already been added on the storage side before CSI was used.&lt;/p></description></item><item><title>A Pod Fails to Be Created and Logs Show "Get DMDevice by alias: dm-x failed"</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-logs-show-get-dmdevice-by-alias-dm-x-failed/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/a-pod-fails-to-be-created-and-logs-show-get-dmdevice-by-alias-dm-x-failed/</guid><description>&lt;h2 id="en-us_topic_0234872004_section1566717121452">Symptom&lt;/h2>
&lt;p>After a Pod is created, it remains in the &lt;strong>ContainerCreating&lt;/strong> state for a long time, and the following error message is reported in the huawei-csi-node log (for details, see &lt;a href="https://huawei.github.io/css-docs/css-docs/en/docs/common-o-m-operations/collecting-information/viewing-huawei-csi-logs/">Viewing Huawei CSI Logs&lt;/a>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#4c4f69;background-color:#eff1f5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-fallback" data-lang="fallback">&lt;span style="display:flex;">&lt;span>check device: dm-1 is a partition device failed. error: Get DMDevice by alias:dm-1 failed. error: Can not get DMDevice by alias: dm-1
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="en-us_topic_0234872004_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>In the DM-Multipath configuration file, the &lt;strong>user_friendly_names&lt;/strong> parameter is not set to &lt;strong>yes&lt;/strong>.&lt;/p></description></item><item><title>After Pods on the Same Node Are Deleted in a Batch Using the NVMe Protocol, Residual NVMe Links Exist on the Node</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-pods-on-the-same-node-are-deleted-in-a-batch-using-the-nvme-protocol-residual-nvme-links-exist/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/after-pods-on-the-same-node-are-deleted-in-a-batch-using-the-nvme-protocol-residual-nvme-links-exist/</guid><description>&lt;h2 id="en-us_topic_0234872004_section1566717121452">Symptom&lt;/h2>
&lt;p>When the NVMe protocol is used and Pods on the same node are deleted in a batch, the Pods are deleted successfully, but the NVMe links on the node are not cleared.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#4c4f69;background-color:#eff1f5;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-fallback" data-lang="fallback">&lt;span style="display:flex;">&lt;span># nvme list-subsys
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>nvme-subsys0 - NQN=nqn.xxx.nvme:nvm-subsystem-sn-xxxxxxx
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>\
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> +- nvme0 tcp traddr=xxx.xxx.xxx.xxx,trsvcid=4420,src_addr=xxx.xxx.xxx.xxx live 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> +- nvme1 tcp traddr=xxx.xxx.xxx.xxx,trsvcid=4420,src_addr=xxx.xxx.xxx.xxx live 
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="en-us_topic_0234872004_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>With the NVMe protocol, the device paths on the host are cleared only after the storage resources are unmapped from the host. When multiple Pods use the same volume and are deleted in a batch, CSI cannot detect, in the NodeUnstageVolume phase (unmount phase), the device path cleanup that happens in the subsequent ControllerUnpublishVolume phase (unmap phase). As a result, the NVMe links are not cleared in a timely manner.&lt;/p></description></item><item><title>In the SAN HyperMetro Scenario, the Subpath of the Aggregated Disks Corresponding to the Mounted Volume Is Lost</title><link>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/in-the-san-hypermetro-scenario-the-subpath-of-the-aggregated-disks-corresponding-to-the-mounted-volu/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://huawei.github.io/css-docs/en/v4.11.0/troubleshooting/pod-issues/in-the-san-hypermetro-scenario-the-subpath-of-the-aggregated-disks-corresponding-to-the-mounted-volu/</guid><description>&lt;h2 id="en-us_topic_0234872004_section1566717121452">Symptom&lt;/h2>
&lt;p>A subpath of the aggregated disks corresponding to the mounted volume is lost.&lt;/p>
&lt;h2 id="en-us_topic_0234872004_section1425013451056">Root Cause Analysis&lt;/h2>
&lt;p>&lt;strong>Figure 1&lt;/strong> SAN HyperMetro subpath loss&lt;a name="fig18307183116559">&lt;/a>&lt;br>
&lt;img src="https://huawei.github.io/css-docs/css-docs/figures/san-hypermetro-subpath-loss.png" title="san-hypermetro-subpath-loss">&lt;/p>
&lt;p>As shown in &lt;a href="#fig18307183116559">Figure 1&lt;/a>, if the link between the host and the storage device is disconnected due to factors such as HBA/NIC exceptions, switch or network jitter, or storage array service port faults, and the host then restarts, the disk scan triggered by the restart cannot discover the link to the faulty storage device. After the fault is rectified, the host has already lost the information about that link, so the lost link is not automatically restored.&lt;/p></description></item></channel></rss>