Troubleshooting a kdump Failure

  1. Preface
  2. How kdump works
  3. Configuring kdump
  4. Troubleshooting a kdump failure
    1. Background
    2. Investigation
  5. Follow-up

Preface

These days, even a stable kernel is not bug-free. And where there are bugs, there will be problems in production; the severe ones can even take the machine down.

On Linux, kdump is an excellent helper for investigating crashes. Its job is to dump the system's memory at crash time, so that you can analyze the dump file afterwards to reconstruct the state of the machine and find the cause of the crash.

How kdump works

In short: when the system panics, a small capture kernel is booted; once it is up, it dumps the memory of the crashed kernel into a vmcore file, which you can then analyze to understand what was going on at the time of the crash.

As for the code-level details of how the capture kernel gets booted after a panic, that's for a later post (OK, I haven't read that code yet).

Configuring kdump

Configuring kdump boils down to two things: reserving memory, plus the kdump service.

On CentOS, you add a crashkernel=XXX parameter to the cmdline so that a chunk of memory gets reserved at boot for the capture kernel. Once the system is up, you enable the kdump service; one of its jobs is to build kdump.img under /boot, which is the initramfs the capture kernel boots with.

Ubuntu works the same way, except the service is called kdump-tools.
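
To sanity-check both halves, a rough sketch (CentOS flavor; substitute kdump-tools for the service name on Ubuntu):

# was the reservation honored at boot?
grep -o 'crashkernel=[^ ]*' /proc/cmdline
dmesg | grep -i crashkernel

# is the capture kernel actually loaded? (1 = yes)
cat /sys/kernel/kexec_crash_loaded

# enable and check the service
systemctl enable --now kdump
systemctl status kdump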

Troubleshooting a kdump failure

In practice you may find that even with everything configured as above, no vmcore shows up after a crash, so you have to dig into why the dump failed.

When kdump fails, try these two things first (in my experience most failures come down to one of them): increase the reserved memory, and rebuild kdump.img. Only once these two are ruled out should you go hunting for other causes.
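
For reference, a sketch of both first-aid steps on CentOS (the image path/name are distro-dependent):

# 1. enlarge the reservation (takes effect after a reboot)
grubby --update-kernel=ALL --args="crashkernel=512M"

# 2. force a rebuild of kdump.img
rm -f /boot/initramfs-$(uname -r)kdump.img
systemctl restart kdump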

Let's walk through a real investigation:

Background

The background: during a crash test, no vmcore was generated, and after the crash the machine could not recover into the normal system. The serial console log showed it spinning forever in the loop pictured below.

[image: serial console log of the death loop]

Investigation

Did it reach the capture kernel?

The first thing to confirm: is this loop happening in the big (normal) kernel or in the capture kernel?

Some logs and a couple of other checks confirm it is the capture kernel looping. Issuing a reboot from the BMC gets the machine back into the normal system, so booting the big kernel works fine. The cmdline in the serial log is another giveaway: to keep the capture kernel's memory footprint down, the kdump setup appends things like module blacklists to its cmdline, so the big and small kernels boot with visibly different cmdlines.
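
A quick way to see the difference (a sketch; serial.log is just a stand-in name for wherever your console capture lands):

# cmdline the big kernel booted with
cat /proc/cmdline

# extra args appended for the capture kernel (CentOS layout)
grep KDUMP_COMMANDLINE /etc/sysconfig/kdump

# each kernel prints its own cmdline early in boot
grep 'Command line:' serial.log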

Trying more reserved memory and a fresh kdump.img

With the capture kernel confirmed to boot, it was time to go straight for the two go-to moves mentioned above.

Tried both. No luck.

Analyzing the serial log

[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Started udev Kernel Device Manager.
         Starting udev Coldplug all Devices...
         Mounting Configuration File System...
[  OK  ] Mounted Configuration File System.
[  OK  ] Started udev Coldplug all Devices.
         Starting dracut initqueue hook...
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[*     ] A start job is running for dev-disk...233403b7c.device (5s / 1min 30s)
[**    ] A start job is running for dev-disk...233403b7c.device (6s / 1min 30s)
[***   ] A start job is running for dev-disk...233403b7c.device (6s / 1min 30s)
        ... (the same start-job message repeats as the timer counts up) ...
[**    ] A start job is running for dev-disk...33403b7c.device (26s / 1min 30s)

A first pass over the serial log suggests the death loop is stuck on something disk-related: dev-disk…33403b7c.device??

Next, boot into the normal system and check whether that id exists:

[image: looking up the id on the normal system]

It does, and as it happens, that disk is the system disk:

[image: the disk is the system disk]

So at this point we can be sure: the capture kernel fails to mount the system disk, so the capture kernel fails to come up, and that is why kdump fails.

Unpacking kdump.img

I unpacked the kdump.img from the broken machine and compared it with the kdump.img from a machine where kdump works. No differences, which conclusively ruled out kdump.img itself.
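
Since both images are dracut-built, lsinitrd makes the comparison easy (a sketch; the image path/name are distro-dependent):

# on the broken machine
lsinitrd /boot/initramfs-$(uname -r)kdump.img > bad.lst
# on a machine where kdump works
lsinitrd /boot/initramfs-$(uname -r)kdump.img > good.lst
diff bad.lst good.lst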

Which raised a suspicion: could the disk's uuid somehow change while the capture kernel boots, so that the id in fstab no longer matches anything?

Breakpointing the capture kernel

Add rd.break=initqueue to the cmdline in /etc/sysconfig/kdump, then trigger a crash again. This time the capture kernel should stop just before the death loop and drop into an emergency shell, where some basic shell commands are available for investigation.
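
On CentOS that means something like the following (a sketch; keep whatever args are already in the variable):

# /etc/sysconfig/kdump: append the breakpoint to the existing args
KDUMP_COMMANDLINE_APPEND="... rd.break=initqueue"

# then reload the capture kernel so the new cmdline takes effect
systemctl restart kdump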

Running ls -l /dev/disk/by-uuid in the emergency shell shows that there is simply no disk with that uuid.

So the system disk's uuid didn't change; the disk was never recognized in the first place.

Why? Maybe because this machine's system disk is a virtio-blk cloud disk?

Yet checking the system disk's PCI BDF (obtainable from the normal system) shows the virtio-pci driver already bound to it, and /sys/bus/pci/drivers shows all the virtio-related drivers loaded.
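
That check is simple enough (a sketch; 0000:1d:00.0 is the system disk's BDF taken from the healthy system):

# which driver is bound to the system disk
readlink /sys/bus/pci/devices/0000:1d:00.0/driver

# which virtio drivers are present at all
ls /sys/bus/pci/drivers/ | grep -i virtio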

So is this a udev problem after all? Could udev be failing to handle the uevent emitted when the system disk is probed?

Driver rebind experiment

First, a baseline with an nvme disk: unbind the driver and the disk disappears from userspace; rebind it and the device shows up again. Exactly as expected:

initqueue:/# ls /dev/disk/by-path -l
total 0
lrwxrwxrwx 1 root 0 13 Jul  3 03:51 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root 0  9 Jul  3 03:51 pci-0000:93:00.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root 0 10 Jul  3 03:51 pci-0000:93:00.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root 0 13 Jul  3 03:51 pci-0000:a7:00.0-nvme-1 -> ../../nvme1n1


initqueue:/# cd /sys/bus/pci/drivers/nvme
initqueue:/sys/bus/pci/drivers/nvme# ls
0000:01:00.0  0000:a7:00.0  bind  module  new_id  remove_id  uevent  unbind
initqueue:/sys/bus/pci/drivers/nvme# echo 0000:01:00.0 > unbind
initqueue:/sys/bus/pci/drivers/nvme# ls /dev/disk/by-path/ -l
total 0
lrwxrwxrwx 1 root 0  9 Jul  3 03:52 pci-0000:93:00.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root 0 10 Jul  3 03:52 pci-0000:93:00.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root 0 13 Jul  3 03:52 pci-0000:a7:00.0-nvme-1 -> ../../nvme1n1

initqueue:/sys/bus/pci/drivers/nvme# echo 0000:01:00.0 > bind
initqueue:/sys/bus/pci/drivers/nvme# ls /dev/disk/by-path/ -l
total 0
lrwxrwxrwx 1 root 0 13 Jul  3 03:54 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root 0  9 Jul  3 03:54 pci-0000:93:00.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root 0 10 Jul  3 03:54 pci-0000:93:00.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root 0 13 Jul  3 03:54 pci-0000:a7:00.0-nvme-1 -> ../../nvme1n1

Now the same procedure on the system cloud disk (its BDF obtained from the normal system):

initqueue:/# cd /sys/bus/pci/drivers/virtio-pci
initqueue:/sys/bus/pci/drivers/virtio-pci# ls
0000:1d:00.0  0000:21:00.0  bind  module  new_id  remove_id  uevent  unbind
initqueue:/sys/bus/pci/drivers/virtio-pci# echo 0000:1d:00.0 > unbind
initqueue:/sys/bus/pci/drivers/virtio-pci# echo 0000:1d:00.0 > bind

initqueue:/sys/bus/pci/drivers/virtio-pci# ls /dev/disk/by-path -l
total 0
lrwxrwxrwx 1 root 0 13 Jul  3 03:55 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root 0  9 Jul  3 03:55 pci-0000:93:00.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root 0 10 Jul  3 03:55 pci-0000:93:00.0-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root 0 13 Jul  3 03:55 pci-0000:a7:00.0-nvme-1 -> ../../nvme1n1

Manually rebinding the driver still doesn't create the device in userspace. Why?

Testing with udevadm

Unbind the virtio-pci driver, start udevadm monitor to capture events, then bind the driver again. The captured log comes back empty:

initqueue:/# cd /sys/bus/pci/drivers/virtio-pci
initqueue:/sys/bus/pci/drivers/virtio-pci# echo 0000:1d:00.0 > unbind
initqueue:/sys/bus/pci/drivers/virtio-pci# udevadm monitor --property > log 2>&1 &
initqueue:/sys/bus/pci/drivers/virtio-pci# echo 0000:1d:00.0 > bind

Running exactly the same steps against the nvme disk does capture udev events.

So udev never received and handled a uevent; or perhaps the driver never sent one?

A second look at the kernel log

initqueue:/# dmesg |grep -i blk -A 5 -B 5
[   57.333334] xhci_hcd 0000:ec:00.4: request interrupt 251 failed
[   57.333335] xhci_hcd 0000:ec:00.4: startup error -28
[   57.333336] xhci_hcd 0000:ec:00.4: USB bus 3 deregistered
[   57.334354] xhci_hcd 0000:ec:00.4: init 0000:ec:00.4 fail, -28
[   57.334356] xhci_hcd: probe of 0000:ec:00.4 failed with error -28
[   58.031020] virtio_blk virtio0: 1/0/0 default/read/poll queues    ############### this line is printed from init_vq
[   58.031863] virtio_blk: probe of virtio0 failed with error -28
[   61.036705] systemd-udevd (414) used greatest stack depth: 12288 bytes left
[  145.292866] hugepage is reduced from total 503797 to 0
[  235.510818] hugepage is reduced from total 503797 to 0
[  325.783468] hugepage is reduced from total 503797 to 0

And there it is: some messages got overlooked earlier. The virtio-blk driver's probe failed!

But why would the probe fail? Start from the error code: -28 is ENOSPC, "No space left on device".

A rough read through the code, though, didn't make it obvious where the ENOSPC could be coming from.

Rebuilding virtio-blk with debug prints

At this point the only option was to add some debug prints to virtblk_probe, install the modified module into the host kernel, rebuild kdump.img, and see whether that narrows things down. The process:

First, add a debug print after every err assignment in virtblk_probe and rebuild the virtio-blk module.

On the broken machine, locate the module with find /usr/lib/modules/$(uname -r) -name virtio_blk.ko, back it up, then replace it with the new module.

Delete the old kdump.img and restart the kdump service so a new kdump.img gets built (sketched below).
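
In shell terms, the replace-and-rebuild part looks roughly like this (a sketch; the image path is distro-dependent, and the module may be compressed on some distros):

# locate, back up, replace the module
MOD=$(find /usr/lib/modules/$(uname -r) -name virtio_blk.ko)
cp "$MOD" "$MOD.bak"
cp ./virtio_blk.ko "$MOD"

# drop the old image and let the service rebuild it
rm -f /boot/initramfs-$(uname -r)kdump.img
systemctl restart kdump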

Then trigger the crash again, get into the capture kernel, and check dmesg:

initqueue:/# dmesg |grep blk
[   56.909688] virtio_blk: loading out-of-tree module taints kernel.
[   56.910275] virtio_blk virtio0: 1/0/0 default/read/poll queues
[   56.911186] virtio_blk virtio0: init_vq err is -28
[   56.911188] virtio_blk: probe of virtio0 failed with error -28

That settles it: the failure is in virtblk_probe -> init_vq, and that is what makes the driver probe fail.

Chasing the code

Reading init_vq, there are plainly only two places that assign a return value to err: one at virtio_cread_feature, the other at virtio_find_vqs.

And the virtio_cread_feature path can be ruled out: its only early return is -EINVAL (not our -28), and if init_vq had bailed out there, the "default/read/poll queues" line printed later in the function could never have shown up in dmesg. It did.

    err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
                               struct virtio_blk_config, num_queues,
                               &num_vqs);
    if (err)
        num_vqs = 1;

    if (!err && !num_vqs) {
        dev_err(&vdev->dev, "MQ advertised but zero queues reported\n");
        return -EINVAL;
    }
....
    dev_info(&vdev->dev, "%d/%d/%d default/read/poll queues\n",
             vblk->io_queues[HCTX_TYPE_DEFAULT],
             vblk->io_queues[HCTX_TYPE_READ],
             vblk->io_queues[HCTX_TYPE_POLL]);

So the truth can only be one thing: virtio_find_vqs failed.

static inline
int virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
            struct virtqueue *vqs[], vq_callback_t *callbacks[],
            const char * const names[],
            struct irq_affinity *desc)
{
    return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, desc);
}

As you can see, this is really a call through config->find_vqs, which, after some digging, resolves to vp_find_vqs or vp_modern_find_vqs.

A quick look at both: legacy or modern, the failure has to happen inside the call into vp_find_vqs.

int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
        struct virtqueue *vqs[], vq_callback_t *callbacks[],
        const char * const names[], const bool *ctx,
        struct irq_affinity *desc)
{
    int err;

    /* Try MSI-X with one vector per queue. */
    err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, true, ctx, desc);
    if (!err)
        return 0;
    /* Fallback: MSI-X with one vector for config, one shared for queues. */
    err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, false, ctx, desc);
    if (!err)
        return 0;
    /* Is there an interrupt? If not give up. */
    if (!(to_vp_device(vdev)->pci_dev->irq))
        return err;
    /* Finally fall back to regular interrupts. */
    return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
}

This looks like interrupt-vector allocation. Following the body: vp_find_vqs_msix fails, the fallback vp_find_vqs_msix fails too, so it falls all the way through to return vp_find_vqs_intx; surprisingly, that one fails as well, returning ENOSPC.

The intent is clear: any one of these calls managing to get an interrupt would have been enough; but in the capture kernel, not a single one succeeded.

Let's look at that function:

static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
        struct virtqueue *vqs[], vq_callback_t *callbacks[],
        const char * const names[], const bool *ctx)
{
    struct virtio_pci_device *vp_dev = to_vp_device(vdev);
    int i, err, queue_idx = 0;

    vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
    if (!vp_dev->vqs)
        return -ENOMEM;

    err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
            dev_name(&vdev->dev), vp_dev);
    if (err)
        goto out_del_vqs;

    vp_dev->intx_enabled = 1;
    vp_dev->per_vq_vectors = false;
    for (i = 0; i < nvqs; ++i) {
        if (!names[i]) {
            vqs[i] = NULL;
            continue;
        }
        vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
                     ctx ? ctx[i] : false,
                     VIRTIO_MSI_NO_VECTOR);
        if (IS_ERR(vqs[i])) {
            err = PTR_ERR(vqs[i]);
            goto out_del_vqs;
        }
    }

    return 0;
out_del_vqs:
    vp_del_vqs(vdev);
    return err;
}

Clearly the only candidate is request_irq failing, and request_irq is just a wrapper around request_threaded_irq.

Below that, though, it gets hard to reason about from code alone.

Luckily, debugfs can still be mounted: mount -t debugfs none /sys/kernel/debug

Then grab request_threaded_irq with the function_graph tracer; the setup is sketched below, and the captured trace follows:
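
The capture setup was roughly this (standard ftrace knobs; re-trigger the probe by unbinding/binding the device as before):

mount -t debugfs none /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo request_threaded_irq > set_graph_function
echo 1 > tracing_on
# unbind/bind 0000:1d:00.0 to re-run the probe, then stop and save:
echo 0 > tracing_on
cat trace > /log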

initqueue:/# cat log
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  request_threaded_irq() {
 0)   0.381 us    |    irq_to_desc();
 0)               |    kmalloc_trace() {
 0)               |      __kmem_cache_alloc_node() {
 0)   0.080 us    |        should_failslab();
 0)   0.240 us    |      }
 0)   0.491 us    |    }
 0)   0.080 us    |    irq_chip_pm_get();
 0)               |    __setup_irq() {
 0)   0.070 us    |      try_module_get();
 0)   0.091 us    |      mutex_lock();
 0)               |      _raw_spin_lock_irqsave() {
 0)   0.100 us    |        preempt_count_add();
 0)   0.250 us    |      }
 0)               |      irq_activate() {
 0)               |        irq_domain_activate_irq() {
 0)               |          __irq_domain_activate_irq() {
 0)               |            __irq_domain_activate_irq() {
 0)               |              __irq_domain_activate_irq() {
 0)               |                x86_vector_activate() {
 0)   0.060 us    |                  apic_chip_data.part.23();
 0)               |                  _raw_spin_lock_irqsave() {
 0)   0.070 us    |                    preempt_count_add();
 0)   0.200 us    |                  }
 0)   0.061 us    |                  apic_chip_data.part.23();
 0)               |                  assign_irq_vector_any_locked() {
 0)               |                    assign_vector_locked() {
 0)   0.070 us    |                      apic_chip_data.part.23();
 0)   0.120 us    |                      irq_matrix_alloc();
 0)   0.441 us    |                    }
 0)               |                    assign_vector_locked() {
 0)   0.071 us    |                      apic_chip_data.part.23();
 0)   0.090 us    |                      irq_matrix_alloc();
 0)   0.331 us    |                    }
 0)   1.022 us    |                  }
 0)               |                  _raw_spin_unlock_irqrestore() {
 0)   0.061 us    |                    preempt_count_sub();
 0)   0.191 us    |                  }
 0)   1.915 us    |                }
 0)   2.065 us    |              }
 0)   2.175 us    |            }
 0)   2.385 us    |          }
 0)   2.506 us    |        }
 0)   2.646 us    |      }
 0)               |      _raw_spin_unlock_irqrestore() {
 0)   0.090 us    |        preempt_count_sub();
 0)   0.250 us    |      }
 0)   0.080 us    |      mutex_unlock();
 0)   0.080 us    |      module_put();
 0)   4.189 us    |    }
 0)   0.091 us    |    irq_chip_pm_put();
 0)   0.090 us    |    kfree();
 0)               |    kfree() {
 0)   0.191 us    |      __kmem_cache_free();
 0)   0.341 us    |    }
 0)   7.898 us    |  }
 
 

Compare that with the error path at the end of request_threaded_irq:

    retval = __setup_irq(irq, desc, action);
    if (retval) {
        irq_chip_pm_put(&desc->irq_data);
        kfree(action->secondary);
        kfree(action);
    }

In the trace, __setup_irq is followed by a call to irq_chip_pm_put, which means retval was nonzero at that point, and it must be our ENOSPC.

Reading the trace again, it really looks like vector allocation is coming up empty. Does irq_matrix_alloc have an ENOSPC path?

int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
             bool reserved, unsigned int *mapped_cpu)
{
    unsigned int cpu, bit;
    struct cpumap *cm;

    /*
     * Not required in theory, but matrix_find_best_cpu() uses
     * for_each_cpu() which ignores the cpumask on UP .
     */
    if (cpumask_empty(msk))
        return -EINVAL;

    cpu = matrix_find_best_cpu(m, msk);
    if (cpu == UINT_MAX)
        return -ENOSPC;

    cm = per_cpu_ptr(m->maps, cpu);
    bit = matrix_alloc_area(m, cm, 1, false);
    if (bit >= m->alloc_end)
        return -ENOSPC;
    cm->allocated++;
    cm->available--;
    m->total_allocated++;
    m->global_available--;
    if (reserved)
        m->global_reserved--;
    *mapped_cpu = cpu;
    trace_irq_matrix_alloc(bit, cpu, m, cm);
    return bit;
}

Now look at the current interrupt usage:

initqueue:/# cat /proc/interrupts
           CPU0
  0:         67  IR-IO-APIC   2-edge      timer
  4:       7480  IR-IO-APIC   4-edge      ttyS0
  8:          1  IR-IO-APIC   8-edge      rtc0
  9:          0  IR-IO-APIC   9-fasteoi   acpi
 29:          0  IOMMU-MSI   0-edge      AMD-Vi
 30:          0  IOMMU-MSI   0-edge      AMD-Vi
 31:          0  IOMMU-MSI   0-edge      AMD-Vi
 32:          0  IOMMU-MSI   0-edge      AMD-Vi
 33:          0  IOMMU-MSI   0-edge      AMD-Vi
 34:          0  IOMMU-MSI   0-edge      AMD-Vi
 35:          0  IOMMU-MSI   0-edge      AMD-Vi
 36:          0  IOMMU-MSI   0-edge      AMD-Vi
 38:          0  IR-PCI-MSI 6144-edge      PCIe PME
 39:          0  IR-PCI-MSI 18432-edge      PCIe PME, pciehp
 40:          0  IR-PCI-MSI 20480-edge      PCIe PME, pciehp
 41:          0  IR-PCI-MSI 22528-edge      PCIe PME, pciehp
 42:          0  IR-PCI-MSI 24576-edge      PCIe PME, pciehp
 43:          0  IR-PCI-MSI 116736-edge      PCIe PME
 44:          0  IR-PCI-MSI 118784-edge      PCIe PME
 46:          0  IR-PCI-MSI 13637632-edge      PCIe PME
 47:          0  IR-PCI-MSI 13682688-edge      PCIe PME
 49:          0  IR-PCI-MSI 13748224-edge      PCIe PME
 51:          1  IR-PCI-MSI 14680064-edge      pciehp
 52:          1  IR-PCI-MSI 14696448-edge      pciehp
 54:          1  IR-PCI-MSI 14712832-edge      pciehp
 55:          6  IR-PCI-MSI 14729216-edge      pciehp
 56:          6  IR-PCI-MSI 14745600-edge      pciehp
 57:          6  IR-PCI-MSI 14761984-edge      pciehp
 58:          6  IR-PCI-MSI 14778368-edge      pciehp
 59:          6  IR-PCI-MSI 14794752-edge      pciehp
 60:          6  IR-PCI-MSI 14811136-edge      pciehp
 61:          6  IR-PCI-MSI 14827520-edge      pciehp
 62:          6  IR-PCI-MSI 14843904-edge      pciehp
 63:          6  IR-PCI-MSI 14860288-edge      pciehp
 64:          6  IR-PCI-MSI 14876672-edge      pciehp
 65:          6  IR-PCI-MSI 14893056-edge      pciehp
 66:          6  IR-PCI-MSI 14909440-edge      pciehp
 67:          1  IR-PCI-MSI 14925824-edge      pciehp
 68:          1  IR-PCI-MSI 14942208-edge      pciehp
 69:          6  IR-PCI-MSI 14958592-edge      pciehp
 70:          6  IR-PCI-MSI 14974976-edge      pciehp
 71:          6  IR-PCI-MSI 14991360-edge      pciehp
 72:          6  IR-PCI-MSI 15007744-edge      pciehp
 73:          6  IR-PCI-MSI 15024128-edge      pciehp
 74:          6  IR-PCI-MSI 15040512-edge      pciehp
 75:          6  IR-PCI-MSI 15056896-edge      pciehp
 76:          6  IR-PCI-MSI 15073280-edge      pciehp
 77:          6  IR-PCI-MSI 15089664-edge      pciehp
 78:          6  IR-PCI-MSI 15106048-edge      pciehp
 79:          6  IR-PCI-MSI 15122432-edge      pciehp
 80:          6  IR-PCI-MSI 15138816-edge      pciehp
 81:          6  IR-PCI-MSI 15155200-edge      pciehp
 82:          6  IR-PCI-MSI 15171584-edge      pciehp
 83:          6  IR-PCI-MSI 15187968-edge      pciehp
 84:          6  IR-PCI-MSI 26738688-edge      pciehp
 85:          6  IR-PCI-MSI 26755072-edge      pciehp
 86:          6  IR-PCI-MSI 26771456-edge      pciehp
 87:          6  IR-PCI-MSI 26787840-edge      pciehp
 88:          6  IR-PCI-MSI 26804224-edge      pciehp
 89:          6  IR-PCI-MSI 26820608-edge      pciehp
 90:          6  IR-PCI-MSI 26836992-edge      pciehp
 91:          6  IR-PCI-MSI 26853376-edge      pciehp
 92:          6  IR-PCI-MSI 26869760-edge      pciehp
 93:          6  IR-PCI-MSI 26886144-edge      pciehp
 94:          6  IR-PCI-MSI 26902528-edge      pciehp
 95:          6  IR-PCI-MSI 26918912-edge      pciehp
 96:          6  IR-PCI-MSI 26935296-edge      pciehp
 97:          6  IR-PCI-MSI 26951680-edge      pciehp
 98:          6  IR-PCI-MSI 26968064-edge      pciehp
 99:          6  IR-PCI-MSI 26984448-edge      pciehp
100:          6  IR-PCI-MSI 27000832-edge      pciehp
101:          6  IR-PCI-MSI 27017216-edge      pciehp
102:          6  IR-PCI-MSI 27033600-edge      pciehp
103:          6  IR-PCI-MSI 27049984-edge      pciehp
104:          6  IR-PCI-MSI 27066368-edge      pciehp
105:          6  IR-PCI-MSI 27082752-edge      pciehp
106:          6  IR-PCI-MSI 27099136-edge      pciehp
107:          6  IR-PCI-MSI 27115520-edge      pciehp
108:          6  IR-PCI-MSI 27131904-edge      pciehp
109:          6  IR-PCI-MSI 27148288-edge      pciehp
110:          6  IR-PCI-MSI 27164672-edge      pciehp
111:          6  IR-PCI-MSI 27181056-edge      pciehp
112:          6  IR-PCI-MSI 27197440-edge      pciehp
113:          6  IR-PCI-MSI 27213824-edge      pciehp
114:          6  IR-PCI-MSI 27230208-edge      pciehp
116:          0  IR-PCI-MSI 62396416-edge      PCIe PME
117:          0  IR-PCI-MSI 62441472-edge      PCIe PME, pciehp
118:          0  IR-PCI-MSI 62443520-edge      PCIe PME, pciehp
119:          0  IR-PCI-MSI 62445568-edge      PCIe PME, pciehp
120:          0  IR-PCI-MSI 62447616-edge      PCIe PME, pciehp
122:          0  IR-PCI-MSI 62507008-edge      PCIe PME
124:          0  IR-PCI-MSI 74979328-edge      PCIe PME
125:          0  IR-PCI-MSI 75059200-edge      PCIe PME
126:          0  IR-PCI-MSI 75089920-edge      PCIe PME
127:          0  IR-PCI-MSI 75091968-edge      PCIe PME
129:          0  IR-PCI-MSI 87037952-edge      PCIe PME
130:          0  IR-PCI-MSI 87050240-edge      PCIe PME, pciehp
131:          0  IR-PCI-MSI 87052288-edge      PCIe PME, pciehp
132:          0  IR-PCI-MSI 87054336-edge      PCIe PME, pciehp
133:          0  IR-PCI-MSI 87056384-edge      PCIe PME, pciehp
134:          0  IR-PCI-MSI 87148544-edge      PCIe PME
135:          0  IR-PCI-MSI 87150592-edge      PCIe PME
137:          0  IR-PCI-MSI 100145152-edge      PCIe PME
138:          0  IR-PCI-MSI 100194304-edge      PCIe PME, pciehp
139:          0  IR-PCI-MSI 100196352-edge      PCIe PME, pciehp
141:          0  IR-PCI-MSI 100255744-edge      PCIe PME
143:          0  IR-PCI-MSI 111679488-edge      PCIe PME
145:          0  IR-PCI-MSI 111790080-edge      PCIe PME
147:          0  IR-PCI-MSI 122165248-edge      PCIe PME
148:          0  IR-PCI-MSI 122179584-edge      PCIe PME, pciehp
149:          0  IR-PCI-MSI 122181632-edge      PCIe PME, pciehp
150:          0  IR-PCI-MSI 122275840-edge      PCIe PME
151:          0  IR-PCI-MSI 122277888-edge      PCIe PME
153:          8  IR-PCI-MSI 524288-edge      nvme0q0
155:          0  IR-PCI-MSI 3145728-edge      ahci0
156:          0  IR-PCI-MSI 3145729-edge      ahci1
157:          0  IR-PCI-MSI 3145730-edge      ahci2
158:          0  IR-PCI-MSI 3145731-edge      ahci3
159:          0  IR-PCI-MSI 3145732-edge      ahci4
160:          0  IR-PCI-MSI 3145733-edge      ahci5
161:          0  IR-PCI-MSI 3145734-edge      ahci6
162:          0  IR-PCI-MSI 3145735-edge      ahci7
172:          8  IR-PCI-MSI 87556096-edge      nvme1q0
173:       3165  IR-PCI-MSI 524289-edge      nvme0q1
174:       3165  IR-PCI-MSI 87556097-edge      nvme1q1
176:          0  IR-PCI-MSI 3147776-edge      ahci0
177:          0  IR-PCI-MSI 3147777-edge      ahci1
178:          0  IR-PCI-MSI 3147778-edge      ahci2
179:          0  IR-PCI-MSI 3147779-edge      ahci3
180:          0  IR-PCI-MSI 3147780-edge      ahci4
181:          0  IR-PCI-MSI 3147781-edge      ahci5
182:          0  IR-PCI-MSI 3147782-edge      ahci6
183:          0  IR-PCI-MSI 3147783-edge      ahci7
193:       1549  IR-PCI-MSI 77070336-edge      ahci0
195:          0  IR-PCI-MSI 77070338-edge      ahci2
196:          0  IR-PCI-MSI 77070339-edge      ahci3
197:          0  IR-PCI-MSI 77070340-edge      ahci4
198:          0  IR-PCI-MSI 77070341-edge      ahci5
199:          0  IR-PCI-MSI 77070342-edge      ahci6
200:          0  IR-PCI-MSI 77070343-edge      ahci7
210:          0  IR-PCI-MSI 77072384-edge      ahci0
211:          0  IR-PCI-MSI 77072385-edge      ahci1
212:          0  IR-PCI-MSI 77072386-edge      ahci2
213:          0  IR-PCI-MSI 77072387-edge      ahci3
214:          0  IR-PCI-MSI 77072388-edge      ahci4
215:          0  IR-PCI-MSI 77072389-edge      ahci5
216:          0  IR-PCI-MSI 77072390-edge      ahci6
217:          0  IR-PCI-MSI 77072391-edge      ahci7
227:          0  IR-PCI-MSI 90177536-edge      ahci0
228:          0  IR-PCI-MSI 90177537-edge      ahci1
229:          0  IR-PCI-MSI 90177538-edge      ahci2
230:          0  IR-PCI-MSI 90177539-edge      ahci3
231:          0  IR-PCI-MSI 90177540-edge      ahci4
232:          0  IR-PCI-MSI 90177541-edge      ahci5
233:          0  IR-PCI-MSI 90177542-edge      ahci6
234:          0  IR-PCI-MSI 90177543-edge      ahci7
244:          0  IR-PCI-MSI 90179584-edge      ahci[0000:ac:00.1]
246:          0  IR-PCI-MSI 124256256-edge      ahci[0000:ed:00.0]
248:          0  IR-PCI-MSI 124258304-edge      ahci[0000:ed:00.1]
250:          0  IR-PCI-MSI 2629632-edge      xhci_hcd
NMI:          2   Non-maskable interrupts
LOC:      87976   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          2   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
DFR:          0   Deferred Error APIC interrupts
MCE:          0   Machine check exceptions
MCP:          0   Machine check polls
ERR:          0
MIS:          0
PIN:          0   Posted-interrupt notification event
NPI:          0   Nested posted-interrupt event
PIW:          0   Posted-interrupt wakeup event
Back in the x86 vector-allocation code, the matrix is set up like this:

vector_matrix = irq_alloc_matrix(NR_VECTORS, FIRST_EXTERNAL_VECTOR,
                                 FIRST_SYSTEM_VECTOR);

irq_alloc_matrix sets m->alloc_end to FIRST_SYSTEM_VECTOR, and as seen in irq_matrix_alloc above, bit is compared against m->alloc_end.

#ifdef CONFIG_X86_LOCAL_APIC
#define FIRST_SYSTEM_VECTOR        LOCAL_TIMER_VECTOR
#else
#define FIRST_SYSTEM_VECTOR        NR_VECTORS
#endif

Our kernel has CONFIG_X86_LOCAL_APIC set, so FIRST_SYSTEM_VECTOR is

#define LOCAL_TIMER_VECTOR        0xec

which means m->alloc_end is 0xec, i.e. 236. So a single CPU has fewer than 236 vectors to hand out to devices (external vectors only start at 0x20), and the /proc/interrupts listing above shows CPU0 already has roughly that many allocated.

So now we can draw the conclusion: the virtio-blk device asked for an interrupt, the vector allocation failed with ENOSPC, the system disk never appeared, and the capture kernel failed to boot. And the root cause of the vector shortage: the kdump capture kernel's cmdline carries nr_cpus=1, so it boots on a single CPU, every device's interrupts have to land on that one CPU, and that CPU's interrupt vectors simply run out.

The fix is just as simple: boot the capture kernel with a few more CPUs.
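
For instance, on CentOS (a sketch; pick a count that still fits in the reserved memory, since extra CPUs cost a little more of it):

# replace nr_cpus=1 in the capture kernel's appended args
sed -i 's/nr_cpus=1/nr_cpus=4/' /etc/sysconfig/kdump

# reload the capture kernel with the new cmdline
systemctl restart kdump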

Follow-up

A useful site: https://fedoraproject.org/wiki/How_to_debug_Dracut_problems

It describes several ways to debug dracut; the ones that look most useful to me:

## Add this to the cmdline: if the root can't be found, you get a shell
rd.shell

## This one is seriously powerful; it's what let me pin down the problem above.
## Add it to the cmdline to break at the given stage and drop into an emergency shell for debugging.
## The supported breakpoints can be listed with: egrep 'rd.?break' /usr/lib/dracut/modules.d/99base/init.sh
rd.break=[cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup]

Also, a note on kdump debugging: the breakpoint shell only offers basic commands, so what if you want some more advanced tools? Take a look at extra_bins in /etc/kdump.conf; it apparently lets you pack extra binaries into kdump.img, which looks quite handy, although my attempt to pass in lspci failed.
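
For the record, the usage looks like this (a sketch of what I tried):

# /etc/kdump.conf
extra_bins /usr/sbin/lspci

# rebuild so the binary gets packed into kdump.img
systemctl restart kdump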


Please credit the source when reposting. Feel free to check the references cited here, and to point out anything wrong or unclear, either in the comments below or by mail to 857879363@qq.com.