Project Overview: AsmBB (a forum engine) and FreshLib (its underlying standard library) are written entirely in flat assembler (FASM) assembly language. Unlike compiled languages (C++, Go, Rust), where a compiler decides how to translate logic into machine code, the authors (JohnFound and team) choose every instruction, register assignment, and memory access by hand.
This approach allows for a specific set of optimizations categorized below, ranging from single-instruction micro-optimizations to architectural memory strategies.
1. Instruction-Level Micro-Optimizations
These techniques focus on reducing code size (cache locality) and execution latency by selecting specific CPU instructions.
A. Register Clearing (XOR vs MOV)
The codebase almost universally avoids using MOV to zero out a register.
Technique: xor eax, eax instead of mov eax, 0.
Why it works:
Size: XOR is 2 bytes; MOV is 5 bytes. Smaller code fits better in the CPU instruction cache.
Pipeline: Modern CPUs recognize xor reg, reg as a dependency breaker, allowing out-of-order execution to proceed without waiting for previous values of that register.
B. The LEA Math Trick
The LEA (Load Effective Address) instruction is designed to calculate memory addresses, but FreshLib leverages it for general-purpose arithmetic.
Technique: Performing an addition, a multiplication by 1, 2, 4, or 8, and a constant offset in a single instruction, without modifying EFLAGS.
Code Example:
; Traditional Math
mov eax, ecx ; Move value
imul eax, 4 ; Multiply
add eax, ebx ; Add other register
add eax, 8 ; Add constant
; FreshLib Optimization
lea eax, [ebx + ecx*4 + 8] ; All in ONE instruction
C. String Instruction Compression
The code uses x86-specific string instructions (LODSB, STOSD, SCASB) to combine memory access and pointer arithmetic.
Code Example (strlib.asm):
; Instead of:
mov al, byte [esi]
inc esi
; The code uses:
lodsb ; Loads byte to AL and increments ESI automatically
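This increases code density (LODSB encodes in one byte, versus three bytes for the MOV/INC pair), allowing more logic to fit into the L1 instruction cache. The automatic increment assumes the direction flag is clear (DF = 0).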
2. The FreshLib String Engine (Handle-Based System)
Standard libraries (libc, STL) typically represent strings as character arrays. FreshLib treats strings as immutable objects identified by unique handles. This is the most significant architectural optimization in the library.
O(1) String Comparison
Concept: When a string is created, it is hashed and stored in a central table. If the string "admin" exists, any new request for "admin" returns the existing handle (pointer).
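Optimization: Because equal contents always map to the same handle, a string equality check becomes a single integer comparison rather than a character-by-character loop. This holds only when both strings were obtained through the central table.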
Source Code Logic:
; --- Standard C-style Approach (Slow O(n)) ---
; loops through "admin", checking 'a', 'd', 'm'...
invoke strcmp, [user_input], "admin"
; --- FreshLib Approach (Fast O(1)) ---
; The handle for "admin" is known at compile/load time.
mov eax, [user_input_handle]
cmp eax, [hStringAdmin] ; Simple 32-bit integer compare
je .is_admin ; Branch instantly
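The handle idea can be sketched outside assembly as well. The following is a minimal C illustration of string interning, not FreshLib's actual code: the table size, the FNV-1a hash, linear probing, and the names intern and same_string are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_SLOTS 256   /* hypothetical fixed table size (power of two) */

static char *intern_table[INTERN_SLOTS];   /* the slot index doubles as the handle */

/* FNV-1a hash over a NUL-terminated string. */
static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the handle for s, creating an entry on first use.
   Equal contents always yield the same handle.
   Returns -1 if the table is full or memory runs out. */
int intern(const char *s) {
    assert(s != NULL);
    uint32_t start = hash_str(s) & (INTERN_SLOTS - 1);
    for (uint32_t i = 0; i < INTERN_SLOTS; i++) {
        uint32_t slot = (start + i) & (INTERN_SLOTS - 1);   /* linear probing */
        if (intern_table[slot] == NULL) {
            size_t n = strlen(s) + 1;
            char *copy = malloc(n);
            if (copy == NULL) return -1;
            memcpy(copy, s, n);
            intern_table[slot] = copy;
            return (int)slot;
        }
        if (strcmp(intern_table[slot], s) == 0)
            return (int)slot;
    }
    return -1;
}

/* Equality of two interned strings is a single integer compare. */
int same_string(int a, int b) {
    return a == b;
}
```

The character-by-character work does not disappear; it moves into intern, which runs once per string, so the saving comes from comparing the same strings many times afterwards.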
3. Algorithmic Efficiency & Data Structures
AsmBB avoids linear scanning of data structures, implementing custom algorithms tailored for assembly.
A. Binary Search in Counter Arrays
In counter_array.asm, the system maintains sorted arrays to track statistics.
Technique: Uses __SearchCountArray to perform a binary search.
Impact: Reduces lookup in a sorted user list of 1,000,000 entries from up to 1,000,000 comparisons (worst case, linear scan) to roughly 20, since log2(1,000,000) is about 19.9.
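To make the "roughly 20 checks" figure concrete, here is a small C sketch of binary search over a sorted array that also counts its probes. It is a generic illustration, not a transcription of __SearchCountArray; the function name search_sorted and the steps parameter are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search over a sorted array of keys.
   Returns the index of key, or -1 if it is absent.
   If steps is non-NULL, it receives the number of elements probed. */
long search_sorted(const uint32_t *keys, size_t count, uint32_t key, int *steps) {
    assert(keys != NULL || count == 0);
    size_t lo = 0, hi = count;              /* half-open range [lo, hi) */
    if (steps) *steps = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;    /* avoids overflow of lo + hi */
        if (steps) (*steps)++;
        if (keys[mid] == key) return (long)mid;
        if (keys[mid] < key) lo = mid + 1;  /* discard the lower half */
        else hi = mid;                      /* discard the upper half */
    }
    return -1;
}
```

Each probe at least halves the remaining range, so an array of 1,000,000 sorted keys needs at most 20 probes whether or not the key is present.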
B. Hash Trees (hashtree.asm)
Instead of standard hash maps (which may handle collisions via slow linked lists), FreshLib implements Hash Trees.
Technique: Uses bits of the hash to navigate a tree structure (Trie).
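Impact: Lookup cost is bounded by the number of hash bits consumed rather than by the number of stored keys, so access time stays near-constant even when many keys share leading hash bits.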
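One way to picture a hash tree is a fixed-depth trie indexed by successive groups of hash bits. The C sketch below assumes 4 bits per level and a 32-bit hash, and it treats two keys with the same full hash as the same key. None of these choices, nor the names TrieNode, trie_put, and trie_get, are taken from hashtree.asm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_LEVEL 4
#define FANOUT (1u << BITS_PER_LEVEL)      /* 16 children per node */
#define LEVELS (32 / BITS_PER_LEVEL)       /* 8 levels consume a 32-bit hash */

typedef struct TrieNode {
    struct TrieNode *child[FANOUT];
    int has_value;
    int value;
} TrieNode;

/* Store value under a 32-bit hash.
   Returns 0 on success, -1 on allocation failure. */
int trie_put(TrieNode *root, uint32_t hash, int value) {
    assert(root != NULL);
    TrieNode *node = root;
    for (int level = 0; level < LEVELS; level++) {
        /* The next 4 bits of the hash select the child to follow. */
        uint32_t idx = (hash >> (level * BITS_PER_LEVEL)) & (FANOUT - 1);
        if (node->child[idx] == NULL) {
            node->child[idx] = calloc(1, sizeof(TrieNode));
            if (node->child[idx] == NULL) return -1;
        }
        node = node->child[idx];
    }
    node->has_value = 1;
    node->value = value;
    return 0;
}

/* Look up a hash in at most LEVELS steps.
   Returns 1 and sets *out if found, 0 otherwise. */
int trie_get(const TrieNode *root, uint32_t hash, int *out) {
    assert(root != NULL && out != NULL);
    const TrieNode *node = root;
    for (int level = 0; level < LEVELS; level++) {
        uint32_t idx = (hash >> (level * BITS_PER_LEVEL)) & (FANOUT - 1);
        node = node->child[idx];
        if (node == NULL) return 0;
    }
    if (!node->has_value) return 0;
    *out = node->value;
    return 1;
}
```

A lookup never takes more than 8 pointer hops here, regardless of how many entries are stored. The price is memory: each interior node carries 16 child pointers.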
C. Gap Buffers for Text Editing
Found in buffergap.asm, this structure is used for text fields and editors.
Concept: A dynamic array that maintains a "hole" (gap) at the cursor position.
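Visual: [Text_Start] [____GAP____] [Text_End]
Optimization: Inserting a character simply writes into the gap and shrinks it. No text needs to be shifted or copied, so typing at the cursor is O(1). Data is copied only when the cursor moves (the gap is relocated) or when the gap fills up and the buffer must grow.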
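A gap buffer is compact enough to sketch in full. The C version below is a simplified illustration under stated assumptions: the capacity is fixed (growing a full buffer is omitted), and the names GapBuffer, gb_init, gb_move_cursor, gb_insert, and gb_to_string are invented for this example rather than taken from buffergap.asm.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;       /* [0, gap_start) text | [gap_start, gap_end) gap | [gap_end, cap) text */
    size_t gap_start;  /* also the cursor position */
    size_t gap_end;
    size_t cap;
} GapBuffer;

/* Allocate an empty buffer whose gap spans the whole capacity. */
int gb_init(GapBuffer *gb, size_t cap) {
    assert(gb != NULL && cap > 0);
    gb->data = malloc(cap);
    if (gb->data == NULL) return -1;
    gb->gap_start = 0;
    gb->gap_end = cap;
    gb->cap = cap;
    return 0;
}

/* Move the gap so that it starts at pos. Only the text between the
   old and new cursor positions is copied. */
void gb_move_cursor(GapBuffer *gb, size_t pos) {
    size_t text_len = gb->cap - (gb->gap_end - gb->gap_start);
    assert(pos <= text_len);
    if (pos < gb->gap_start) {
        size_t n = gb->gap_start - pos;
        memmove(gb->data + gb->gap_end - n, gb->data + pos, n);
        gb->gap_start -= n;
        gb->gap_end -= n;
    } else if (pos > gb->gap_start) {
        size_t n = pos - gb->gap_start;
        memmove(gb->data + gb->gap_start, gb->data + gb->gap_end, n);
        gb->gap_start += n;
        gb->gap_end += n;
    }
}

/* Insert at the cursor: one store and one increment while the gap has room.
   Returns -1 when the gap is exhausted. */
int gb_insert(GapBuffer *gb, char c) {
    if (gb->gap_start == gb->gap_end) return -1;
    gb->data[gb->gap_start++] = c;
    return 0;
}

/* Copy the text without the gap into out (text length + 1 bytes needed). */
void gb_to_string(const GapBuffer *gb, char *out) {
    size_t tail = gb->cap - gb->gap_end;
    memcpy(out, gb->data, gb->gap_start);
    memcpy(out + gb->gap_start, gb->data + gb->gap_end, tail);
    out[gb->gap_start + tail] = '\0';
}
```

Typing "a", "c", moving the cursor back to position 1, and typing "b" leaves the text "abc". The only copy made is the single character that crossed the gap when the cursor moved.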
4. Memory Management & "Zero-Copy"
Dynamic memory allocation (malloc/free) is comparatively expensive. AsmBB minimizes its use through buffering and reuse.
A. Zero-Copy Parsing
In render2.asm and minimag.asm (the markup parser), the code parses text without duplicating it.
Technique: The parser reads the source buffer and writes directly to the output buffer (HTML). It uses pointers into the original text for analyzing tags rather than extracting substrings into new variables.
Impact: Avoids per-substring allocations, which reduces memory fragmentation and allocator overhead during high-traffic request processing.
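The core of the zero-copy approach is a "slice": a pointer plus a length that refers to bytes inside the source buffer. The C sketch below tokenizes on spaces purely for illustration. The Slice type and the functions next_token and slice_equals are hypothetical and do not reflect the actual MiniMag grammar.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view into someone else's buffer: no ownership, no NUL terminator. */
typedef struct {
    const char *ptr;
    size_t      len;
} Slice;

/* Return the next space-delimited token at *cursor and advance *cursor
   past it. The slice points into the source; nothing is copied.
   A length of 0 means the input is exhausted. */
Slice next_token(const char **cursor, const char *end) {
    assert(cursor != NULL && *cursor != NULL && *cursor <= end);
    const char *p = *cursor;
    while (p < end && *p == ' ') p++;     /* skip separators */
    const char *start = p;
    while (p < end && *p != ' ') p++;     /* scan the token */
    *cursor = p;
    Slice s = { start, (size_t)(p - start) };
    return s;
}

/* Compare a slice with a literal without extracting a substring. */
int slice_equals(Slice s, const char *literal) {
    size_t n = strlen(literal);
    return s.len == n && memcmp(s.ptr, literal, n) == 0;
}
```

Because every token is only a pointer and a length, the parser performs no allocation per token. The trade-off is that slices stay valid only as long as the source buffer does.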
B. Object Recycling (Graphics)
In graphics/recycler.asm, the library implements an object pool for images.
Technique: When an image is "freed," it isn't returned to the OS. It is pushed onto a RecycledImages list.
Optimization:
proc GetRecycledImage, .width, .height
; Check if a pre-allocated image of this size exists in the pool
; If yes, return it (Instant).
; If no, only then call the slow system allocator.
endp
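The pseudocode above can be fleshed out as a simple free list. This C sketch assumes 4 bytes per pixel and an exact-size match. The Image layout and the names get_image and release_image are illustrative and not taken from graphics/recycler.asm.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Image {
    int width, height;
    unsigned char *pixels;   /* width * height * 4 bytes (assumed format) */
    struct Image *next;      /* link used only while on the recycled list */
} Image;

static Image *recycled_images = NULL;   /* hypothetical pool of released images */

/* Reuse a pooled image of exactly this size if one exists;
   only otherwise fall back to the system allocator. */
Image *get_image(int width, int height) {
    assert(width > 0 && height > 0);
    for (Image **link = &recycled_images; *link != NULL; link = &(*link)->next) {
        Image *img = *link;
        if (img->width == width && img->height == height) {
            *link = img->next;          /* unlink from the pool */
            img->next = NULL;
            return img;
        }
    }
    Image *img = malloc(sizeof *img);
    if (img == NULL) return NULL;
    img->pixels = malloc((size_t)width * (size_t)height * 4);
    if (img->pixels == NULL) { free(img); return NULL; }
    img->width = width;
    img->height = height;
    img->next = NULL;
    return img;
}

/* "Free" an image by pushing it onto the pool instead of releasing it. */
void release_image(Image *img) {
    assert(img != NULL);
    img->next = recycled_images;
    recycled_images = img;
}
```

A pool like this pays off when the same few sizes recur, as with repeatedly redrawn widgets. A production version would also cap the pool so that unused images are eventually returned to the system.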
5. SIMD and Graphics Acceleration
FreshLib includes a custom graphics library that processes pixels in parallel using MMX (Multimedia Extensions) instructions.
Alpha Blending
In graphics/images.asm, alpha blending (transparency) is math-heavy: Result = (Alpha * Source + (255-Alpha) * Dest) / 255.
Optimization: The code uses SIMD instructions to process four 16-bit values per instruction (for example, the four color channels of one 32-bit pixel).
Source Logic:
; Example of MMX usage in BlendImage (assumes mm7 = 0)
movq mm0, [esi] ; Load 8 bytes (multiple pixels/channels)
punpcklbw mm0, mm7 ; Unpack the low 4 bytes to 4 zero-extended words
pmullw mm0, mm2 ; Parallel multiply 4 words at once
paddusw mm0, mm1 ; Parallel add with unsigned saturation
psrlw mm0, 8 ; Shift right by 8 (divide by 256, approximating /255)
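The formula divides by 255 while the listing shifts right by 8, which divides by 256. The scalar C sketch below puts the two side by side for a single channel and then applies the shift-based version to the four channels of one pixel, as a stand-in for the four SIMD lanes. The function names and the rounding in blend_exact are assumptions of this example, not details from graphics/images.asm.

```c
#include <assert.h>
#include <stdint.h>

/* Exact blend of one 8-bit channel:
   (alpha*src + (255-alpha)*dst) / 255, rounded to nearest. */
uint8_t blend_exact(uint8_t src, uint8_t dst, uint8_t alpha) {
    unsigned t = alpha * src + (255u - alpha) * dst;
    return (uint8_t)((t + 127u) / 255u);
}

/* Shift-based approximation: divides by 256 instead of 255,
   which is what a right shift by 8 computes. */
uint8_t blend_shift(uint8_t src, uint8_t dst, uint8_t alpha) {
    unsigned t = alpha * src + (255u - alpha) * dst;
    return (uint8_t)(t >> 8);
}

/* Blend the four channels of one 32-bit pixel, one channel at a time
   (a scalar stand-in for the 4-lane SIMD version). */
uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t alpha) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_shift(s, d, alpha) << shift;
    }
    return out;
}
```

The shift avoids a division at the cost of a small bias: a fully opaque white source blends to 254 rather than 255 per channel. Whether that matters depends on the use case.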
6. System & Calling Convention Optimizations
A. Custom Register Passing
While stdcall is used for external Windows/Linux APIs, internal functions use a "FastCall" style convention.
Technique: Arguments are passed in EAX, ECX, EDX rather than pushed onto the stack.
Impact: Removes the overhead of stack frame setup (push ebp / mov ebp, esp) and memory writes (push arg).
Code Example:
; Typical FreshLib internal call
mov eax, [ptrString]
mov ecx, [newLength]
call StrSetCapacity ; Reads EAX/ECX directly
B. FastCGI Implementation
AsmBB does not run as a CGI script (which spawns a new process for every web request). It implements the FastCGI protocol directly in assembly (fcgi.asm).
Optimization: The application stays resident in memory. It accepts a connection, processes the request, and waits for the next one without restarting. Combined with the optimizations above, this allows AsmBB to serve thousands of requests per second on minimal hardware.