侵入式链表(Intrusive Linked List)

设计思想

常用的链表是非侵入式链表，它的每个节点包含数据和指向下一个节点的指针（和一个指向前一个节点的指针）

struct ListNode {
    int data;
    struct ListNode *next;
    struct ListNode *prev;
}ListNode;

在这种链表结构中，data是固定的，即一个链表中，每个节点存储的数据类型必须相同，这样的链表泛化能力比较差。

注：C++语言可以使用模板实现通用：
1
2
3
4
5
template <typename T>
struct ListNode {
        struct ListNode *next; // link区域
        T data; // data区域
};
但这只不过是将重写代码的工作交给编译器完成，本质上数据和链表仍然是耦合的。

侵入式链表是在其内部直接包含链表节点：

struct ListLink {
    struct ListLink *next, *prev;
}ListLink;

struct ListNode {
    data_type data;
    struct ListLink list;
}ListNode;

侵入式指的是对象的内部状态由外部对象管理

在非侵入式链表中，每个节点都是一个独立的对象，包含指向下一个节点的指针，链表本身维护各节点的连接关系（节点自己不知道它在链表中的位置）。因此，无需了解节点的内部结构，只需调用链表提供的方法来插入、删除或遍历节点。

而在侵入式链表中，每个节点都包含指向下一个节点的指针，链接关系由节点维护。因此在访问链表时，必须直接访问节点结构体来获取指向下一个节点的指针。虽然这种做法可能增加了代码的复杂性，但它相较于非侵入式链表提供了更高的性能和泛化能力。

数据的存储与访问

Linux内核代码/include/linux/types.h中定义了一种没有数据域的链表

1
2
3

struct list_head {
	struct list_head *next, *prev;
};

可以很容易的将这种链表结构体包含在任意数据的结构体中，统一组织一个链表中的所有节点，各节点中可以存储不同的数据，例如

struct ListNodeInt {
    int data;
    list_head link;
}ListNodeInt;

struct ListNodeDouble {
    double data;
    list_head link;
}ListNodeDouble;

初始化

完整代码见/include/linux/list.h

/*
 * Circular doubly linked list implementation.
 *
 * Some of the internal functions ("__xxx") are useful when
 * manipulating whole lists rather than single entries, as
 * sometimes we already know the next/prev entries and we can
 * generate better code by using them directly rather than
 * using the generic single-entry routines.
 */

#define LIST_HEAD_INIT(name) { &(name), &(name) }

// 定义并初始化一个链表头
#define LIST_HEAD(name) \
	struct list_head name = LIST_HEAD_INIT(name)

/**
 * INIT_LIST_HEAD - Initialize a list_head structure
 * @list: list_head structure to be initialized.
 *
 * Initializes the list_head to point to itself.  If it is a list header,
 * the result is an empty list.
 */

// 运行时链表初始化，链表头的指针指向自己
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	WRITE_ONCE(list->prev, list);
}

inline将函数内容直接嵌入至函数的调用点，代替跳转。对于频繁调用的函数，通过嵌入可以减少函数调用的开销, 提高程序效率，但只适合代码量较小的函数。

删除节点

在/include/linux/poison.h文件中定义了两个用户不可访问的地址LIST_POISON1和LIST_POISON2

/*
 * Architectures might want to move the poison pointer offset
 * into some well-recognized area such as 0xdead000000000000,
 * that is also not mappable by user-space exploits:
 */
#ifdef CONFIG_ILLEGAL_POINTER_VALUE
# define POISON_POINTER_DELTA _AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#else
# define POISON_POINTER_DELTA 0
#endif

/*
 * These are non-NULL pointers that will result in page faults
 * under normal circumstances, used to verify that nobody uses
 * non-initialized list entries.
 */
#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x122 + POISON_POINTER_DELTA)

在删除节点后，将其指针指向这两个位置，用来在调试时检测非法操作。

/*
 * Delete a list entry by making the prev/next entries
 * point to each other.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	WRITE_ONCE(prev->next, next);
}

/*
 * Delete a list entry and clear the 'prev' pointer.
 *
 * This is a special-purpose list clearing method used in the networking code
 * for lists allocated as per-cpu, where we don't want to incur the extra
 * WRITE_ONCE() overhead of a regular list_del_init(). The code that uses this
 * needs to check the node 'prev' pointer instead of calling list_empty().
 */
static inline void __list_del_clearprev(struct list_head *entry)
{
	__list_del(entry->prev, entry->next);
	entry->prev = NULL;
}

static inline void __list_del_entry(struct list_head *entry)
{
	if (!__list_del_entry_valid(entry))
		return;

	__list_del(entry->prev, entry->next);
}

/**
 * list_del - deletes entry from list.
 * @entry: the element to delete from the list.
 * Note: list_empty() on entry does not return true after this, the entry is
 * in an undefined state.
 */
static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

查找节点

因为这种链表的链域仅保存了前后节点的地址，对这种链表进行操作时，无法直接得知内部成员变量的地址偏移量。

Linux内核代码中给出两个函数，用于地址偏移量计算：

offsetof：计算结构体成员变量相对于结构体的偏移
1
#define offsetof(type, member) (size_t) &((type*)0)->member
container_of：通过成员变量，结构体成员的地址以及结构体的类型来获取结构体的首地址
1
2
3
#define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})

/**
 * list_entry - get the struct for this entry
 * @ptr:	the &struct list_head pointer.
 * @type:	the type of the struct this is embedded in.
 * @member:	the name of the list_head within the struct.
 */
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)

/**
 * list_for_each_entry - iterate over list of given type
 * @pos: the type * to use as a loop cursor.
 * @head: the head for your list.
 * @member: the name of the list_struct within the struct.
 *
 * 正向遍历整个链表, 得到list
 */
#define list_for_each_entry(pos, head, member)    \
 for (pos = list_entry((head)->next, typeof(*pos), member); \
      &pos->member != (head);  \
      pos = list_entry(pos->member.next, typeof(*pos), member))

/**
* list_for_each_entry_reverse - iterate backwards over list of given type.
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the list_struct within the struct.
 *
 * 反向遍历整个链表, 得到list
*/
#define list_for_each_entry_reverse(pos, head, member)   \
   for (pos = list_entry((head)->prev, typeof(*pos), member); \
        &pos->member != (head);  \
        pos = list_entry(pos->member.prev, typeof(*pos), member))

应用

在通信场景中，每个消息携带的数据量可能不一样。可以考虑使用链域和不定长的数据域拼接，用于管理不定长消息。

template <unsigned int N>
struct Message {
    int size;
    char data[N];

    static void init(void *Message) {
        reinterpret_cast<Message *>(Message)->size = N;
    }
};

int main() {
    auto node1 = (list_head *) malloc(sizeof(list_head) + sizeof(Message<1024 * 1>));
    auto node2 = (list_head *) malloc(sizeof(list_head) + sizeof(Message<1024 * 2>));
    auto node3 = (list_head *) malloc(sizeof(list_head) + sizeof(Message<1024 * 3>));

    Message<1024 * 1>::init(node1 + 1);
    Message<1024 * 3>::init(node2 + 1);
    Message<1024 * 2>::init(node3 + 1);

    list_head Message_list;

	// ...
    
    free(node1);
    free(node2);
    free(node3);
    return 0;
}

在使用malloc分配内存时, 申请的大小是sizeof(link) + sizeof(data)，这样在物理上数据和只有link的侵入式链表的节点在一块连续的内存上。在链表视角相当于在每个节点下面挂载了一个隐藏的消息数据

        +-------+    +-------+           +-------+
List:   | next  | -> | next  | -> ... -> | next  |
        +-------+    +-------+           +-------+
        +-------+    +-------+           +-------+
Data:   |payload|    |payload|           |payload|
        +-------+    |       |           |       |
                     |       |           +-------+
                     +-------+