Intro to C
Compiled Languages
- C compilers (gcc, llvm, etc.) map C code to architecture-specific machine code before the code can be run
- Compare:
  - Java converts into architecture-independent bytecode, which is compiled as the code runs (JIT)
  - Python converts into bytecode at runtime
- C compilation procedure
  - Compiling .c files to .o files (machine code object files)
  - Linking .o files into executables (machine code)
  - Assembling is also a step in the procedure
Advantages
- Fast (optimized for the target architecture)
- Compilation can be optimized for speed
  - make (make -j to parallelize)
    - Caches unchanged files
    - Parallelizes compilation
- Many Python libraries are written in C
C Pre-Processor
C source files first pass through a macro processor (CPP) before the compiler sees the code.
- CPP replaces comments with a single space
- CPP commands
#include "file.h" /* Inserts file.h into output */
#include <stdio.h> /* Looks for file in standard location (otherwise no difference) */
#define PI (3.14159) /* Define constant */
#if/#endif /* Conditionally include text */
- This can be used to include libraries for different system environments
To see the preprocessed result, keep gcc's intermediate files:
$ gcc --save-temps file.c
Macros
CPP macros can be defined to create small "functions".
- Fact: All #define does is string replacement
#define min(X,Y) ((X) < (Y) ? (X) : (Y))
Macro-defined functions can cause problems if an argument calls another function that has a side effect.
- In the above example, if X is a function call, it can be called twice
C Language Feature
Hello, World!
#include <stdio.h>
int main(void) {
    printf("Hello, World!");
    return 0;
}
- main(void): the main function takes no arguments
- return 0: the main function's return value
  - 0 means success
  - any other value means failure
| Language | C | Java |
|---|---|---|
| Type of Language | Function Oriented | Object Oriented |
| Programming Unit | Function | Class (Abstract Data Type) |
| Compilation | gcc hello.c: creates machine language code | javac Hello.java: creates Java virtual machine language bytecode |
| Execution | ./a.out: loads and executes program | java Hello: interprets bytecodes |
| Storage | Manual (malloc, free) | Automatic |
| Libraries | #include <stdio.h> | import java.io.File |
| Comments | /* block comment */, // line comment (C99) | Same as C |
| Variables | At beginning of a block | Before use |
| Operators | ... | ... |
| Constants | #define, const | final |
| Naming Conventions | snake_case | camelCase |
C99 Update
- gcc -std=c99
- printf("%ld\n", __STDC_VERSION__); prints 199901
- Declarations in for loops
- Java-like // comments (to end of line)
- Variable-length non-global arrays!
- <inttypes.h>: explicit integer types
- <stdbool.h>: for boolean logic definitions
C11 (C18) Update
- Multi-thread support
- Unicode strings and constants
- Removal of gets()
- Type-generic macros (dispatch based on type)
- Support for complex values
- Static assertions, exclusive create-and-open, ...
C Syntax
main
To get the main function to accept arguments:
int main(int argc, char *argv[])
- argc contains the number of strings on the command line
  - The executable itself counts as one
- argv is a pointer to an array containing the arguments as strings
- argv[0] gives the program's own name
  - To system call itself, ...
  - To print the executable's name (usage string)
#include <stdio.h>
int main(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
$ gcc arguments.c
$ ./a.out 1 hi!
./a.out
1
hi!
- argc = 3
- argv = {"./a.out", "1", "hi!"}
Control Flow
Same as Java, C++. (Java, C++ learned from C)
- goto: Don't use it.
- for loops
  - ANSI C does not allow for (int i = 0; ...
  - C99 corrects this and allows it
Variables and Types
C is a statically-typed programming language.
- Variables cannot have their types changed after declaration.
- Types help the compiler and your computer determine how to read values
  - Also gives more information about the data
    - Memory size
    - Operation
- All variable declarations must appear before they are used
  - In ANSI C, all must be at the beginning of a block (C99 lifts this restriction)
- A variable may be initialized in its definition
  - If not, it holds garbage
Undefined Behaviours
UBs are often characterized as "Heisenbugs"
- Bugs that seem random/hard to reproduce, and seem to disappear or change when debugging
- Cf. "Bohrbugs" are repeatable
Example:
- Variables don't have default values
Boolean
- False values
  - 0 (integer)
  - NULL (pointer)
  - false (Boolean type from stdbool.h)
- True values
  - Everything else
Integer
The number of bytes in an int depends on the computer.
- int should be the integer type that the target processor works with most efficiently
- Standard
  - sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
  - short is at least 16 bits, long is at least 32 bits
Use intN_t and uintN_t instead (stdint.h).
Consts and Enums
A constant is assigned a typed value once, in its declaration. The value can't change during the entire execution of the program.
const float golden_ratio = 1.618;
const int days_in_week = 7;
An enum is a group of related integer constants.
enum color {RED, GREEN, BLUE};
- enum TYPE {VALUES}
- To use, declare a variable as enum TYPE, and assign it a value in VALUES
Typed Functions
- Return type
  - Can be any C variable type (integers, pointers, ...)
  - Can be void (no return value)
- Parameter type
Variables and functions must be declared before they are used.
Struct
Reference: https://stackoverflow.com/questions/1675351/typedef-struct-vs-struct-definitions
Typedef allows you to define new types.
typedef uint8_t BYTE;
BYTE b1, b2;
Structs are structured groups of variables.
typedef struct {
    int length_in_seconds;
    int year_recorded;
} SONG;
SONG song1;
song1.length_in_seconds = 114;
song1.year_recorded = 2007;
Bitwise Operations
Bitwise operations perform a logical operation on every bit in a number.
- & bitwise AND
- | bitwise OR
- ^ bitwise XOR
- ~ bitwise NOT
Example
Bitwise shifting moves all bits of the LHS left/right by RHS bits.
- a << n left shift
- a >> n right shift
Example
0b 0110 >> 2
= 0b __01
= 0b 0001
Pointers
A pointer stores a memory address and references a variable of its type.
- Usage of *
  - int *p declares that p is a pointer
  - *p dereferences the pointer
    - *p gets the value that p points to
    - *p = changes the value that p points to
- Usage of &
  - &x gets the address of variable x
Declaring a pointer does not allocate space for something to be pointed to.
int *x;
int y;
*x = 1; // Undefined behavior, may crash: x does not point anywhere yet
x = &y; // Totally legit
*x = 1; // Fine now: sets y to 1
Passing values
C does not have reference types, so C does not have pass by reference.
C always passes parameters by value.
To achieve the effect of passing by reference, use pointers.
void addOne(int *x) {
    *x += 1;
}
Data Types of Pointers
- Pointers can point to any data type
- A pointer can only point to a variable of its declared type
- Generic pointer: void *
Function Pointers
int (*fn) (void *, void *) = &foo;
- fn is a pointer to a function that accepts two void * pointers and returns an int
- Initialized to point to foo
(*fn)(x, y); // call the function
NULL Pointer
char *p = NULL; // p: 0x00000000
Tests for NULL pointer:
if (!p) { /* p is a null pointer */ }
if (q) { /* q is not a null pointer */ }
- Reading from or writing to a null pointer is undefined behavior and typically crashes the program
Pointer to Structs
typedef struct {
    int x;
    int y;
} Coord;
Coord coord, *ptr = &coord;
int k;
k = ptr->x;
k = (*ptr).x; // equivalent
Arrays
An array is a block of memory.
int arr[2]; // Declares (elements hold garbage values)
int arr_filled[] = {1, 2}; // Declares and initializes
int x = arr_filled[0]; // Array access
x = *(arr_filled + 1); // Pointer arithmetic
The array's name acts as a pointer constant whose value and address are the same (arr and &arr refer to the same address).
int *p = arr;
- Array bounds are not checked during element access
  - Segmentation fault: the address being accessed is invalid
  - Bus error: the access is misaligned for the size of the data
- An array is passed to a function as a pointer
void sort(int32_t arr[], uint32_t size);
- Declared arrays are only allocated while the scope is valid
char *foo() {
    char str[32]; ...
    return str; // str is gone
}
Specifying the ARRAY_SIZE
Single source of truth
const int ARRAY_SIZE = 10;
int i, a[ARRAY_SIZE];
for (i = 0; i < ARRAY_SIZE; i++) { ... }
Pointer Arithmetic
- ptr + n
  - Adds n * sizeof(*ptr) to the memory address
- ptr - n
  - Subtracts n * sizeof(*ptr) from the memory address
- a[i] is equivalent to *(a + i)
Function to Change Pointers
In order to mutate a pointer in functions, we need a pointer to a pointer.
- Declared as data_type **h
void increment_ptr(int32_t **h) {
    *h = *h + 1;
}
C Strings
A C string is an array of characters, followed by a null terminator.
- Null terminator: the byte 0 (number), the \0 character
- char str[] = "abc" is equivalent to char str[] = {'a', 'b', 'c', '\0'}
The standard C library string.h assumes null-terminated strings.
Memory, Address and Word Alignment
- Modern machines are "byte-addressable"
- Word size: number of bits in an address
  - A 64b architecture has 8-byte words
  - sizeof(int *) and sizeof(char *) are both 8
- A C pointer is an abstracted memory address
  - The pointer type declaration tells the compiler how many bytes to fetch on each dereference
- Word alignment: only allowing addressing on 8-byte boundaries
  - This is also how the memory in a struct works (padding)
Alignment is a requirement that data be aligned in memory on natural boundaries. With this restriction, memory accesses that are not aligned are not allowed.
- x86, RISC-V: No restriction
- MIPS: Has the restriction
+-------------------------------------------------------+
| int32_t *                                             | 0xFFFFFFFFFFFFFFF8
+-------------------------------------------------------+
| short *                                               | 0xFFFFFFFFFFFFFFF0
+-------------------------------------------------------+
| char *                                                | 0xFFFFFFFFFFFFFFE8
+-------------------------------------------------------+
| 64bit integer stored in 8 bytes                       | 0xFFFFFFFFFFFFFFE0
+-------------------------------------------------------+
| 16bit short |      |      |      |      |      |      | 0xFFFFFFFFFFFFFFD8
+-------------------------------------------------------+
| char |      |      |      |      |      |      |      | 0xFFFFFFFFFFFFFFD0
+------+------+------+------+------+------+------+------+
|      |      |      |      |      |      |      |      | ...
+------+------+------+------+------+------+------+------+
Endianness
Reference: https://en.wikipedia.org/wiki/Endianness
- Little endian
  - The least significant byte of a value is stored first (at the lowest address)
- Big endian
  - The most significant byte of a value is stored first (at the lowest address)
int32_t 0x12345678 (little endian):
+------+------+------+------+------+------+------+------+
| 0x78 | 0x56 | 0x34 | 0x12 |      |      |      |      |
+------+------+------+------+------+------+------+------+