Operating Systems 101: Linux Internals and Process Management


User Space vs. Kernel Space

Linux operates in two main execution contexts:

  • User Space: Where user applications run. Processes here are restricted—they can’t directly interact with hardware or other processes’ memory.
  • Kernel Space: The privileged space where the Linux kernel runs. It controls hardware, manages resources, enforces security, and handles communication between processes and devices.

To use hardware or any kernel-managed resource (for example, opening files or creating sockets), user-space programs must go through the kernel.


System Calls: The Interface Between Worlds

System calls (syscalls) are the gateway between user space and kernel space.

Example:

  • When you run ls, it eventually uses syscalls like openat(), getdents64(), and write() to list and print directory contents.
  • These syscalls are typically wrapped in C library functions (glibc), like opendir() or read().

Analogy: User space is the restaurant guest; kernel space is the kitchen. Syscalls are how you place an order—you can’t go cook it yourself.
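
As a rough illustration of the ls example, the sketch below lists a directory through Python's os module, whose functions are thin wrappers over the same kind of syscalls. It assumes Python 3.9+ on Linux; list_directory is a hypothetical helper name, and the exact syscalls issued can vary with the C library and kernel version.

```python
import os


def list_directory(path: str) -> list[str]:
    """Return the sorted entry names of a directory, via its file descriptor."""
    # os.open wraps open(2)/openat(2); O_DIRECTORY makes it fail on non-directories.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Listing by descriptor reads the entries through the kernel
        # (getdents64 on typical Linux systems).
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)


if __name__ == "__main__":
    names = list_directory("/")
    # os.write wraps write(2); descriptor 1 is standard output.
    os.write(1, ("\n".join(names) + "\n").encode())
```

Running this script under strace should show the open, directory-read, and write calls in sequence.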


Privilege Rings (CPU Security Levels)

On x86, the CPU provides four hardware privilege levels (rings). Linux uses only two of them:

Ring 0: Kernel code, including device drivers (Most privileged)
Ring 1–2: Unused by Linux (Intermediate)
Ring 3: User applications (Least privileged)

Only Ring 0 can execute privileged CPU instructions. A syscall switches the CPU from Ring 3 to Ring 0 for the duration of the call, then returns to Ring 3.


Process Management: The Lifecycle

What is a Process?

A process is an instance of a program running in memory. It has:

  • A unique Process ID (PID)
  • Two stacks: user and kernel
  • Four memory segments: text (code), data, heap, and stack

Process Creation: fork() and execve()

When you run ls, here’s what happens:

  1. The shell calls fork() to create a child process.
  2. The child calls execve("/bin/ls", ["ls"], envp) to replace its memory with the ls binary.
  3. The parent waits with wait() until the child exits.

Copy-on-Write (CoW): fork() doesn’t copy memory up front. Parent and child share the same physical pages, and the kernel copies a page only when one of them writes to it.
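
The fork/execve/wait sequence can be sketched in a few lines of Python, whose os functions map closely onto the underlying syscalls. This is a minimal sketch assuming Python 3.9+ on a Unix-like system; run_program is a hypothetical name, and a real shell also handles job control, redirections, and signals.

```python
import os


def run_program(path: str, argv: list[str]) -> int:
    """Fork, exec `path` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        try:
            os.execve(path, argv, os.environ)
        finally:
            # Reached only if execve failed; 127 mirrors the shell convention.
            os._exit(127)
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = run_program("/bin/sh", ["sh", "-c", "exit 3"])
    print("child exit code:", code)
```

The child calls os._exit() rather than returning, so a failed execve can never fall through into the parent's code path.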

Process Termination

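  • exit() releases the process’s resources and leaves it as a zombie until the parent collects its exit status via wait().
  • If the parent never calls wait(), the zombie stays in the process table for as long as the parent is alive. Once the parent exits, the zombie is reparented to init (PID 1, usually systemd), which reaps it.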
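
The zombie window can be observed directly. The sketch below forks a child that exits at once, polls /proc until the child shows state Z, and only then reaps it. It assumes Linux with /proc mounted and Python 3.9+; read_state and observe_zombie are hypothetical helper names.

```python
import os
import time


def read_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie(timeout: float = 2.0) -> tuple[str, int]:
    """Fork a child that exits, wait until it is a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with status 7
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    # Reaping removes the zombie entry from the process table.
    _, status = os.waitpid(pid, 0)
    return state, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(observe_zombie())
```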

Signals and kill()

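Sending a signal (e.g., with the kill command, which calls kill()) asks the kernel to deliver it to the target process. The kernel does so only if the sender is permitted: it must be privileged or share a user ID with the target.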
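
A small sketch of signal delivery between a parent and its own child, where permission is never an issue. It assumes Python 3 on a Unix-like system and that the chosen signal still has its default disposition; terminate_child is a hypothetical name.

```python
import os
import signal
import time


def terminate_child(sig: int = signal.SIGTERM) -> str:
    """Fork a sleeping child, signal it, and report how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(30)  # child: idle until the signal arrives
        finally:
            os._exit(0)
    os.kill(pid, sig)  # ask the kernel to deliver the signal
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # The wait status records which signal terminated the child.
        return signal.Signals(os.WTERMSIG(status)).name
    return f"exited with {os.WEXITSTATUS(status)}"


if __name__ == "__main__":
    print(terminate_child())
```

Even if the signal is sent before the child starts sleeping, it stays pending and is delivered as soon as the child runs.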


Process States

States you might see in ps or top:

R - Running or runnable: TASK_RUNNING
S - Sleeping (interruptible): TASK_INTERRUPTIBLE
D - Uninterruptible sleep (I/O wait): TASK_UNINTERRUPTIBLE
Z - Zombie (terminated, not reaped): EXIT_ZOMBIE
T - Stopped or traced (e.g., SIGSTOP): TASK_STOPPED / TASK_TRACED
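
The same state letters are exposed per process under /proc. The following sketch reads the State line of /proc/<pid>/status; it assumes Linux with /proc mounted, and process_state is a hypothetical helper name.

```python
def process_state(pid: str = "self") -> str:
    """Return the one-letter state code of a process from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # The line looks like: "State:\tS (sleeping)"
            if line.startswith("State:"):
                return line.split()[1]
    raise RuntimeError(f"no State line found for pid {pid}")


if __name__ == "__main__":
    # A process inspecting itself is on the CPU at that moment.
    print("own state:", process_state())
```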


Process Memory Layout

Linux process memory is organized as:

  • Text segment: Read-only, contains executable code
  • Data segment: Global/static variables
  • Heap: Dynamically allocated memory (malloc)
  • Stack: Function call frames, local variables

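The stack grows downward and the heap grows upward. They do not normally collide: a stack overflow occurs when the stack outgrows its size limit (ulimit -s, commonly 8 MiB), and the kernel answers the faulting access with SIGSEGV.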
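
To see this layout for a live process, /proc/<pid>/maps lists every mapped region. Below is a minimal parser sketch; Mapping and parse_maps are hypothetical names, and addresses differ on every run because of address space randomization.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    name: str


def parse_maps(text: str) -> list[Mapping]:
    """Parse the contents of /proc/<pid>/maps into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional name.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], name))
    return mappings


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for m in parse_maps(f.read()):
            if m.name in ("[heap]", "[stack]"):
                print(f"{m.name:8} {m.start:#x}-{m.end:#x} {m.perms}")
```

On typical systems the [heap] region sits at a much lower address than [stack], with shared libraries and other mappings in between.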


Threads vs Processes

| Feature     | Threads                       | Processes                  |
|-------------|-------------------------------|----------------------------|
| Memory      | Shared                        | Isolated                   |
| Overhead    | Low (lightweight)             | Higher                     |
| Security    | Lower (shared memory)         | Higher                     |
| Switching   | Fast                          | Slower (context switching) |

Threads are great for performance; processes are better for isolation.
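
The memory row of the table can be demonstrated with one variable: a forked child changes only its own copy, while a thread changes the original. A minimal sketch, assuming Python 3 on a Unix-like system; compare_sharing is a hypothetical name.

```python
import os
import threading


def compare_sharing() -> tuple[int, int]:
    """Return the parent's value after a child process and a thread each write."""
    data = {"value": 0}

    # Process: the child gets its own copy-on-write view of memory.
    pid = os.fork()
    if pid == 0:
        data["value"] = 100  # changes only the child's private copy
        os._exit(0)
    os.waitpid(pid, 0)
    after_process = data["value"]

    # Thread: runs inside the same address space as the caller.
    worker = threading.Thread(target=lambda: data.update(value=200))
    worker.start()
    worker.join()
    after_thread = data["value"]

    return after_process, after_thread


if __name__ == "__main__":
    print(compare_sharing())
```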



Memory Management

Types of Memory

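  • Anonymous: Not backed by a file; allocated by programs (e.g., malloc) for heap and stack
  • File-backed: Pages that mirror file contents, such as the page cache and memory-mapped files
  • Active/Inactive: Whether pages were used recently; inactive pages are the first candidates for reclaim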

Check with:

cat /proc/meminfo
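
For scripting, the same file is easy to parse. The sketch below turns /proc/meminfo text into a dictionary; parse_meminfo is a hypothetical helper name. Most values are reported in kB, while a few fields (such as HugePages_Total) are plain counts.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # parts[0] is the number; parts[1], if present, is the unit "kB".
            info[key.strip()] = int(parts[0])
    return info


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for field in ("MemTotal", "MemAvailable", "Active(anon)", "Inactive(file)"):
        if field in info:
            print(f"{field}: {info[field]} kB")
```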

Virtual Memory and Paging

Linux provides each process its own virtual memory space, mapped to physical memory via page tables.

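  • Page Faults: Triggered when a program accesses a page that has no valid mapping in the page tables, or accesses it in a way its permissions forbid.
    • Valid but not in RAM (swapped out or not loaded yet) → kernel loads or allocates the page and resumes the program.
    • Invalid address → segmentation fault (SIGSEGV).
    • Access not allowed (e.g., writing to a read-only page) → protection fault, also delivered as SIGSEGV unless the kernel resolves it itself, as with Copy-on-Write.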
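
The harmless kind of page fault is easy to provoke: first-touching freshly mapped memory makes the kernel allocate pages on demand. The sketch below counts minor faults with getrusage around such a touch loop. It assumes Python 3 on Linux; count_minor_faults is a hypothetical name, and the exact count depends on the kernel, huge-page settings, and any sandbox in use.

```python
import mmap
import resource


def count_minor_faults(pages: int = 64) -> int:
    """Touch `pages` fresh anonymous pages and return the minor-fault delta."""
    size = pages * mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    # Private anonymous mapping: no physical memory is assigned yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for offset in range(0, size, mmap.PAGESIZE):
            region[offset] = 1  # first write to each page triggers a fault
    finally:
        region.close()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return after - before


if __name__ == "__main__":
    print("minor faults while touching 64 pages:", count_minor_faults(64))
```

On an ordinary Linux host the delta is usually close to the number of pages touched.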

Swapping

When memory is low:

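  • kswapd reclaims pages in the background
  • Kernel flusher threads write dirty pages back to disk (pdflush did this before kernel 2.6.32)
  • Anonymous pages are moved to swap space and read back in when accessed again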

File and Device Access

Devices as Files

Linux treats devices as files under /dev.

$ ls -l /dev
crw-rw---- 1 root tty 4, 1 /dev/tty1 # Character device
brw-rw---- 1 root disk 8, 0 /dev/sda # Block device

  • c: Character device (e.g., keyboard)
  • b: Block device (e.g., disks)
  • Major/Minor numbers identify device driver and instance
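
The device type and its major/minor pair can also be read programmatically from stat(). A minimal sketch, assuming Python 3 on Linux; describe_device is a hypothetical helper name.

```python
import os
import stat


def describe_device(path: str) -> tuple[str, int, int]:
    """Return (type letter, major, minor) for a device node."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"  # character device
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"  # block device
    else:
        raise ValueError(f"{path} is not a device node")
    # st_rdev encodes the major number (driver) and minor number (instance).
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    print("/dev/null:", describe_device("/dev/null"))
```

On Linux, /dev/null is the character device with major 1 and minor 3.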

Permissions and SUID/SGID

When a syscall like open() is made, the kernel checks:

  • EUID/EGID of the process
  • UID/GID and permission bits of the file
  • Access Control Lists (ACLs) if present

SUID/SGID Execution:

  • If the file has SUID/SGID bits:
    • On execve(), the kernel sets the process’s EUID/EGID to the file’s owner/group
    • The program then runs with the file owner’s privileges regardless of who launched it (e.g., passwd)
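
Whether a file carries these bits is visible in its mode. The sketch below decodes them with the stat module; special_bits and describe are hypothetical names, and /usr/bin/passwd is only an assumed example path that is SUID root on many distributions.

```python
import os
import stat


def special_bits(mode: int) -> list[str]:
    """Return which of the SUID/SGID bits are set in a file mode."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("SUID")
    if mode & stat.S_ISGID:
        bits.append("SGID")
    return bits


def describe(path: str) -> tuple[str, list[str]]:
    """Return the ls-style mode string and the special bits of a file."""
    st = os.stat(path)
    return stat.filemode(st.st_mode), special_bits(st.st_mode)


if __name__ == "__main__":
    # Real and effective UID differ only inside a SUID program.
    print("uid:", os.getuid(), "euid:", os.geteuid())
    candidate = "/usr/bin/passwd"  # assumed path; may not exist everywhere
    if os.path.exists(candidate):
        print(candidate, describe(candidate))
```

In the ls-style string, an s in place of the owner's x marks SUID, and an s in place of the group's x marks SGID.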

Inter-Process Communication (IPC)

Methods:

  1. Files: Simple but inefficient
  2. Pipes: Stream data between processes (ps aux | grep ssh)
  3. Shared Memory: Fastest, requires synchronization
  4. Message Queues: Structured message passing
  5. Sockets: Network communication or local IPC
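
As an example of the second method, the sketch below passes a message from a child to its parent through an anonymous pipe, which is the same mechanism the shell sets up for "|". It assumes Python 3 on a Unix-like system; pipe_roundtrip is a hypothetical name.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send `message` from a forked child to the parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: writes only, so it closes the read end first.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent: close the write end so read() sees EOF once the child is done.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello from the child"))
```

If the parent kept its copy of the write end open, the read loop would never see end-of-file.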

Synchronization: Mutexes and Semaphores

  • Mutex: Ensures only one thread/process accesses a resource at a time
  • Semaphore: Allows controlled access by multiple entities using a counter
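
Both primitives can be sketched with Python's threading module. The first function protects a shared counter with a mutex (threading.Lock); the second uses a semaphore to cap how many threads are inside a section at once. Function names and parameters are hypothetical, chosen for illustration.

```python
import threading
import time


def locked_counter(threads: int = 8, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads under a mutex."""
    counter = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread may update at a time
                counter += 1

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


def peak_concurrency(workers: int = 8, limit: int = 3) -> int:
    """Return the highest number of threads seen inside the limited section."""
    slots = threading.Semaphore(limit)  # counter starts at `limit`
    guard = threading.Lock()            # protects the bookkeeping below
    active = 0
    peak = 0

    def work() -> None:
        nonlocal active, peak
        with slots:  # blocks once `limit` threads are already inside
            with guard:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # simulate using the shared resource
            with guard:
                active -= 1

    pool = [threading.Thread(target=work) for _ in range(workers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return peak


if __name__ == "__main__":
    print("counter:", locked_counter())
    print("peak concurrency:", peak_concurrency())
```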

I/O Types and Monitoring

  • Sequential I/O: Efficient for large reads (e.g., logs, backups)
  • Random I/O: Frequent small reads/writes (e.g., databases)

Monitor with:

iostat -x

Tools: strace for Syscall Tracing

strace shows you what syscalls a process makes:

strace ls
strace -e trace=openat,read,write cat /etc/hosts
strace -p <PID>

Useful for debugging file access, permission issues, and syscall failures (EACCES, ENOENT, etc.).


Hands-On Challenges

  1. Use strace -o trace.txt ls -l /etc and identify:
    • execve, openat, getdents64, write
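  2. Try to create a permission-denied error (run as a non-root user):
    strace touch /root/test

    Look for EACCES.

  3. Trace a pipeline:
    strace -f bash -c "ps aux | grep systemd"

    Identify use of pipe(), fork(), execve(), dup2(). On current systems, fork() usually appears as clone() and pipe() as pipe2().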

Further Reading

  • Linux /proc Documentation: https://www.kernel.org/doc/html/latest/filesystems/proc.html
  • Peeking into Linux Kernel with /proc: https://tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/
  • man 2 fork, man 2 execve, man strace, man proc