Table of Contents 1 Introduction to the Linux Kernel Along Came Liuns:Introduction to Linux Overview of Operation Systems and Kernels Linux Versrs Classic Unix Kernels Linux Kerner Verisions The Linux Kernel Development Community Before We Begin 2 Gettion Started with the Kernel Obtaining the Kernel Source Installing the Kernel Source Using Patches The Kernel Source Tree Building the Kernel Mini8mixing Build Noise Spawning Multiple Build Jobs Installing the Kernel A Beast of a Different Nature No Libc GNUC No Memory Protection No(Easy)Use of Floating Point Small,Fixed-Size Stack Synchronization and Concurrency Portability Is Important So Here We Are 3 Process Management Process Descriptor and the Task Structure Alloacting the Process Descriptor Storing the Process Descriptor Process State Manipulation the Current Process State Process Context The Process Family Tree Process Creation Copy-on-Write fork() vford() The Linux Implementation of Threads Kernel Threads Process Termination Removal of the Process Descriptor The Dilemma of the Parentless Task Process Wrap Up 4 Process Scheduling Policy I/O-Bound Versus Processor-Bound Processes Process Priority Timeslice Process Preemption The Scheduling Policy in Action The Linux Scheduling Algorithm Rnnqueues schedule() Calculating Priority and Timeslice Sleeping and Waking Up The Load Balancer Preemption and Context Switching User Preemption Kernel Preemption Real-Time Scheduler-Related System Calls Scheduling Policy and Priority-Related System Calls Yielding Processor Time 62 Scheduler Finale 62 System Calls 63 APIs, POSIX, and the C Library 64 Syscalls 65 System Call Numbers 65 System Call Performance 66 System Call Handler 66 Denoting the Correct System Call 67 Parameter Passing 67 System Call Implementation 68 Verifying the Parameters 68 System Call Context 70 Final Steps in Binding a System Call 71 Accessing the System Call from User-Space 72 Why Not to Implement a System Call 73 System Calls in Conclusion 74 Interrupts and Interrupt Handlers 75 Interrupts 75 Interrupt Handlers 76 Top Halves Versus Bottom Halves 77 Registering an Interrupt Handler 77 Freeing an Interrupt Handler 79 Writing an Interrupt Handler 80 Shared Handlers 81 A Real-Life Interrupt Handler 82 Interrupt Context 84 Implementation of Interrupt Handling 85 /proc/interrupts 87 Interrupt Control 88 Disabling and Enabling Interrupts 89 Disabling a Specific Interrupt Line 90 Status of the Interrupt System 91 Don't Interrupt Me;We're Almost Done! 92 Bottom Halves and Deferring Work 93 Bottom Halves 94 Why Bottom Halves? 94 A World of Bottom Halves 95 Softirqs 97 Implementation of Softirqs 97 Using Softirqs 100 Tasklets 101 Implementation of Tasklets 101 UsingTasklets 104 ksoftirqd 105 The Old BH Mechanism 107 Work Queues 108 Implementation of Work Queues 108 UsingWork Queues 111 The Old Task Queue Mechanism 114 Which Bottom Half Should I Use? 115 Locking Between the Bottom Halves 116 Disabling Bottom Halves 116 The Bottom of Bottom-Half Processing 118 Kernel Synchronization Introduction 119 Critical Regions and Race Conditions 120 Why Do We Need Protection? 120 Locking 122 What Causes Concurrency, Anyway? 124 So, How Do I Know What Needs Protecting?125 Deadlocks 126 Contention and Scalability 128 Locking andYour Code 130 9 Kernel Synchronization Methods 131 Atomic Operations 131 Atomic Integer Operations 132 Atomic Bitwise Operations 134 Spin Locks 137 Other Spin Lock Methods 140 Spin Locks and Bottom Halves 140 Reader-Writer Spin Locks 141 Semaphores 143 Creating and Initializing Semaphores 145 Using Semaphores 145 Reader-Writer Semaphores 146 Spin Locks Versus Semaphores 148 Completion Variables 148 BKL:The Big Kernel Lock 149 Seq Locks 150 Preemption Disabling 151 Ordering and Barriers 153 Synchronization Summarization 156 10 Timers and Time Management 157 Kernel Notion of Time 158 The Tick Rate: HZ 158 The Ideal HZValue 160 Jiffies 162 Internal Representation of Jiffies 163 Jiffies Wraparound 164 User-Space and HZ 165 Hardware Clocks and Timers 166 Real-Time Clock 166 System Timer 167 The Timer Interrupt Handler 167 TheTime of Day 169 Timers 171 Using Timers 172 Timer Race Conditions 173 The Timer Implementation 174 Delaying Execution 174 Busy Looping 175 Small Delays 176 schedule_timeout0 177 Out of Time 179 11 Memory Management 181 Pages 381 Zones 183 Getting Pages 185 Getting Zeroed Pages 186 Freeing pages 186 kmalloc() 187 gfp_mask Flags 188 kfree0 192 vmalloc() 193 Slab Layer 194 Design of the Slab Layer 195 Slab Allocator Interface 198 Statically Allocating on the Stack 201 Playing Fair on the Stack 202 High Memory Mappings 202 Permanent Mappings 203 Temporary Mappings 203 Per-CPU Allocations 204 The New percpu Interface 205 Per-CPU Data at Compile-Time 205 Per-CPU Data at Runtime 206 Reasons for Using Per-CPU Data 207 Which Allocation Method Should I Use? 208 12 The Virtual Filesystem 209 Common Filesystem Interface 209 Filesystem Abstraction Layer 210 Unix Filesystems 211 VFS Objects andTheir Data Structures 212 OtherVFS Objects 213 The Superblock Object 213 Superblock Operations 214 The lnode Object 217 Inode Operations 219 The Dentry Object 222 Dentry State 223 The Dentry Cache 223 Dentry Operations 224 The File Object 226 File Operations 227 Data Structures Associated with Filesystems 231 Data Structures Associated with a Process 232 Filesystems in Linux 234 13 The Block I/O Layer 235 Anatomy of a Block Device 236 Buffers and Buffer Heads 237 The bio structure 239 The OldVersus the New 242 Request Queues 242 Requests 243 I/O Schedulers 243 The Job of an I/O Scheduler 244 The Linus Elevator 244 The Deadline I/O Scheduler 245 The Anticipatory I/O Scheduler 247 The Complete Fair Queuing I/O Scheduler 248 The Noop I/O Scheduler 249 I/O Scheduler Selection 249 Summary 250 14 The Process Address Space 251 The Memory Descriptor 252 Allocating a Memory Descriptor 254 Destroying a Memory Descriptor 255 The mm_struct and KernelThreads 255 Memory Areas 255 VMA Flags 257 VMA Operations 258 Lists and Trees of Memory Areas 259 Memory Areas in ILeal Life 260 Manipulating Memory Areas 261 find_vma0 262 find_vma_prev0 263 find_vma_intersection0 263 mmap0 and do_mmap0: Creating an Address Interval 264 The mmap0 System Call 265 munmap0 and do_munmap0: P,.emoving an Address Interval 266 The munmap0 System Call 266 Page Tables 266 Conclusion 268 15 The Page Cache and Page Writeback 269 Page Cache 270 The address_space Object 270 P~adix Tree 273 The Old Page Hash Table 273 The Buffer Cache 273 The pdflush Daemon 274 Laptop Mode 275 bdflush and kupdated 276 Congestion Avoidance: Why We Have Multiple Threads 276 To Make a Long Story Short 277 16 Modules 279 Hello, World! 279 Building Modules 281 At Home in the Source Tree 281 Living Externally 282 Installing Modules 283 Generating Module Dependencies 283 Loading Modules 283 Managing Configuration Options 284 Module Parameters 286 Exported Symbols 288 Wrapping Up Modules 289 17 kobjects and sysfs 291 kobjects 292 ktypes 293 ksets 293 Subsystems 294 Structure Confusio 294 Managing and Manipulating kobjects 295 Reference Counts 296 krefs 297 sysfs 298 Adding and Removing kobjects from sysfs 300 Adding Files to sysfs 300 The Kernel Events Layer 303 kobjects and sysfs in a Nutshell 305 18 Debugging 307 WhatYou Need to Start 307 Bugs in the Kernel 308 printk0 308 The Robustness of printk0 309 Loglevels 309 The Log Buffer 310 syslogd and ldogd 311 A Note About printk0 and Kernel Hacking311 Oops 311 ksymoops 313 kallsyms 313 Kernel Debugging Options 313 Atomicity Debugging 314 Asserting Bugs and Dumping Information 314 Magic SysRq Key 315 The Saga of a Kernel Debugger 316 gdb 316 kgdb 317 kdb 317 Poking and Probing the System 318 Using UID as a Conditional 318 Using Condition Variables 318 Using Statistics 318 Rate LimitingYour Debugging 319 Binary Searching to Find the Culprit Change 320 When M1 Else Fails:The Community 320 19 Portability 321 History of Portability in Linux 322 Word Size and Data Types 323 Opaque Types 325 Special Types 326 Explicitly Sized Types 326 Signedness of Chars 327 Data Alignment 328 Avoiding Alignment Issues 328 Alignment of Nonstandard Types 329 Structure Padding 329 Byte Order 330 History of Big- and Little-Endian 332 Byte Ordering in the Kernel 332 Time 332 Page Size 333 Processor Ordering 334 SME Kernel Preemption, and High Memory 334 Portability Is Fun 334 20 Patches, Hacking, and the Community 335 The Community 335 Linux Coding Style 33.6 Indention 336 Braces 336 Line Size 337 Naming 338 Functions 338 Comments 338 Typedefs 339 Using What Is Already Provided 339 No ifdefs in the Source 340 Structure Initializers 340 Fixing Code Up Ex Post Facto 340 Chain of Command 341 Submitting Bug Reports 341 Generating Patches 342 Submitting Patches 343 Conclusion 343 A Linked Lists 345 Circular Linked Lists 346 Moving Through a Linked List 347 The Linux Kernel's Implementation 347 The Linked-List Structure 348 Manipulating Linked Lists 349 Traversing Linked Lists 351 B Kernel Random Number Generator 353 Design and Implementation 354 The Dilemma of System Startup 356 Interfaces to Input Entropy 356 Interfaces to Output Entropy 357 C Algorithmic Complexity 359 Algorithms 359 Big-O Notation 359 Big Theta Notation 360 Putting It All Together 360 Perils of Time Complexity 361 Bibliography and Reading List 363 Books on Operating System Design 363 Books on Unix Kernels 364 Books on Linux Kernels 364 Books on Other Kernels 364 Books on the UnixAPI 365 Books on the C Programnming Language 365 Other Works 365 Websites 366 Index 367