Tutorial 04: TText - Gap Buffer Basics
Overview
This tutorial introduces TText, FreshLib's gap buffer implementation for efficient text editing. Gap buffers are a classic text editor data structure (Emacs is the best-known user) because insertion and deletion at the cursor position take O(1) time as long as the gap has free space.
Topics Covered
Gap Buffer Concept - Understanding the gap buffer data structure
TText Structure - Memory layout and fields
Coordinate Systems - Offset, Position, and Index
Creating and Freeing - TextCreate, TextFree, TextDup
Inserting Text - TextAddString, TextAddBytes, TextAddChar
Deleting Text - TextDelChar, TextCompact
Gap Management - TextMoveGap, TextSetGapSize
Prerequisites
Completion of Tutorial 01: StrLib Basics
Understanding of memory management
Familiarity with pointers and data structures
What is a Gap Buffer?
A gap buffer is a dynamic array with a "gap" (empty space) that moves to where edits occur. This makes insertion and deletion at the cursor position very fast.
Traditional Array (Slow)
Insert 'X' at position 5 in "Hello World":
Before: [H][e][l][l][o][ ][W][o][r][l][d]
After: [H][e][l][l][o][X][ ][W][o][r][l][d]
└── Must shift 6 characters ──┘
Gap Buffer (Fast)
Gap is at position 5:
Before: [H][e][l][l][o][_][_][_][_][W][o][r][l][d]
└── gap ──┘
After: [H][e][l][l][o][X][_][_][_][W][o][r][l][d]
Just write 'X' at gap position - O(1)!
When the cursor moves, the gap moves with it: the bytes between the old and new gap positions are copied across the gap, so the cost of a move is proportional to the distance moved.
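The idea can be sketched in a few lines. The following is a toy model in Python, not FreshLib code: the class name `GapBuffer` and its methods are hypothetical, it works on single bytes only, and it raises an error instead of growing the buffer when the gap fills up.

```python
class GapBuffer:
    """Toy gap buffer: text bytes live on both sides of an unused gap."""

    def __init__(self, text=b"", gap_size=4):
        # Layout: [text before gap][gap][text after gap]; the gap starts at the end.
        self.buf = bytearray(text) + bytearray(gap_size)
        self.gap_begin = len(text)
        self.gap_end = len(self.buf)

    def move_gap(self, pos):
        """Move the gap so it starts at logical position pos; cost is O(distance)."""
        while self.gap_begin > pos:      # copy bytes from before the gap to after it
            self.gap_begin -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_begin]
        while self.gap_begin < pos:      # copy bytes from after the gap to before it
            self.buf[self.gap_begin] = self.buf[self.gap_end]
            self.gap_begin += 1
            self.gap_end += 1

    def insert(self, byte):
        """Write one byte at the gap start: O(1) while the gap has room."""
        if self.gap_begin == self.gap_end:
            raise MemoryError("gap is full; a real implementation would grow the buffer")
        self.buf[self.gap_begin] = byte
        self.gap_begin += 1

    def text(self):
        """Return the logical text, skipping the gap."""
        return bytes(self.buf[:self.gap_begin] + self.buf[self.gap_end:])


gb = GapBuffer(b"Hello World", gap_size=4)
gb.move_gap(5)          # one-time cost: copies " World" (6 bytes) across the gap
gb.insert(ord("X"))     # constant-time write into the gap
print(gb.text())        # b'HelloX World'
```

Further insertions at the same position only touch the gap, which is why typing at a fixed cursor stays cheap.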
TText Structure
TText uses a backward structure - the structure fields come BEFORE the data:
Memory Layout:
┌────────────┬───────────┬─────────┬─────────────┬──────────────────┐
│ .Length    │ .GapBegin │ .GapEnd │ .struc_size │ Text Data...     │
│ (dd)       │ (dd)      │ (dd)    │ (dd)        │                  │
└────────────┴───────────┴─────────┴─────────────┴──────────────────┘
↑                                                ↑
Structure base (negative offsets)                Data starts here
Fields
| Field | Type | Description |
|-------|------|-------------|
| .Length | dd | Total buffer length in bytes (including gap) |
| .GapBegin | dd | Offset where gap starts |
| .GapEnd | dd | Offset where gap ends |
| .struc_size | dd | Size of structure header (sizeof.TText) |
Important: The pointer returned by TextCreate points to the DATA, not the structure start. Access fields using negative offsets: [eax+TText.Length].
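To make the "header before data" layout concrete, here is a small Python model of it. This is only a sketch: the field order follows the layout diagram, but the numeric offsets, the initial field values, and the helper names `text_create` and `read_field` are assumptions for illustration, not FreshLib's actual definitions.

```python
import struct

# Four little-endian dwords in diagram order: Length, GapBegin, GapEnd, struc_size.
HEADER = struct.Struct("<4I")

# Assumed field offsets relative to the data pointer; all are negative.
OFF_LENGTH, OFF_GAP_BEGIN, OFF_GAP_END, OFF_STRUC_SIZE = -16, -12, -8, -4


def text_create(capacity):
    """Allocate header + data as one block and return (memory, data_pointer)."""
    mem = bytearray(HEADER.size + capacity)
    # Assumed initial state: the whole buffer is gap (GapBegin = 0, GapEnd = capacity).
    HEADER.pack_into(mem, 0, capacity, 0, capacity, HEADER.size)
    return mem, HEADER.size      # the "pointer" skips past the header


def read_field(mem, ptr, offset):
    """Read one dword at ptr + offset, the analogue of [eax+TText.Length]."""
    return struct.unpack_from("<I", mem, ptr + offset)[0]


mem, ptr = text_create(256)
print(read_field(mem, ptr, OFF_LENGTH))       # 256
print(read_field(mem, ptr, OFF_STRUC_SIZE))   # 16
```

Because the pointer addresses the data directly, code that only reads text can treat it as a plain byte buffer, while code that needs the bookkeeping reaches backward for it.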
Three Coordinate Systems
TText uses three different ways to reference positions in text:
1. Offset (Bytes)
Raw byte position in the buffer
Includes the gap
Used internally by TText functions
Range: 0 to Length-1
2. Position (Bytes)
Logical byte position in the text (gap excluded)
What you see when reading the text
Used by most TText functions
Range: 0 to (Length - GapSize)
3. Index (Characters)
UTF-8 character position
Counts Unicode characters, not bytes
Used for cursor positioning
Range: 0 to character count
Example
Buffer: "Héllo" with a 3-byte gap at position 3 (just after 'é')
Bytes:    [H][é][é][_][_][_][l][l][o]
Offset:    0  1  2  3  4  5  6  7  8
Position:  0  1  2           3  4  5   (gap excluded)
Index:     0  1              2  3  4   (UTF-8 chars)
Note: 'é' occupies 2 bytes in UTF-8 (shown as [é][é]), so offset, position, and index all diverge after it
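The relationship between the three coordinate systems can be expressed as small conversion functions. The sketch below is a Python model with hypothetical function names (they are not TText functions); it uses the "Héllo" buffer with the three-byte gap sitting right after 'é', as drawn in the diagram.

```python
def offset_to_position(offset, gap_begin, gap_end):
    """Raw buffer offset -> logical byte position (offset must not lie inside the gap)."""
    assert not (gap_begin <= offset < gap_end), "offset is inside the gap"
    return offset if offset < gap_begin else offset - (gap_end - gap_begin)


def position_to_offset(position, gap_begin, gap_end):
    """Logical byte position -> raw buffer offset (skips over the gap)."""
    return position if position < gap_begin else position + (gap_end - gap_begin)


def position_to_index(text, position):
    """Byte position -> character index: count bytes that are not UTF-8 continuation bytes."""
    return sum(1 for b in text[:position] if (b & 0xC0) != 0x80)


text = "Héllo".encode("utf-8")    # 6 bytes, 5 characters
gap_begin, gap_end = 3, 6         # 3-byte gap right after 'é'

print(position_to_offset(3, gap_begin, gap_end))   # 6: the first 'l' sits after the gap
print(offset_to_position(8, gap_begin, gap_end))   # 5: 'o' is the sixth text byte
print(position_to_index(text, 3))                  # 2: 'H' and 'é' precede position 3
```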
Key Functions
Creation and Cleanup
; TextCreate
TextCreate .struc_size
Returns: EAX = pointer to TText data (or 0 if error)
Note: Pass sizeof.TText for .struc_size
; TextFree
TextFree .pText
Returns: Nothing
Note: Frees memory allocated by TextCreate
; TextDup
TextDup .pText
Returns: EAX = pointer to duplicate TText
Note: Creates a complete copy, must free separately
Insertion
; TextAddString - Insert StrLib string at position
TextAddString .pText, .position, .hString
Position: -1 for end of text
Returns: EAX = new gap end position; EDX = updated TText pointer (the buffer may be reallocated)
; TextAddBytes - Insert raw bytes
TextAddBytes .pText, .position, .pData, .DataLen
Note: Doesn't need null-terminated data
; TextAddChar - Insert UTF-8 character
TextAddChar .pText, .char
Note: Inserts at current gap position
Deletion
; TextDelChar - Delete character at gap position
TextDelChar .pText
Returns: EAX = deleted character code
Note: Expands gap by deletion
; TextCompact - Close gap and null-terminate
TextCompact .pText
Result: The text becomes one continuous null-terminated string at pText (no value is returned)
Note: Useful before displaying entire text
Gap Management
; TextMoveGap - Move gap to position
TextMoveGap .pText, .position
Note: Required before TextAddChar and TextDelChar, which operate at the current gap position
; TextSetGapSize - Ensure minimum gap size
TextSetGapSize .pText, .desired_size
Note: Reallocates if needed; the updated pointer is returned in EDX
Demo Programs
Demo 13: Creating and Inspecting TText
File: demo13_ttext_create.asm
Demonstrates:
Creating TText with TextCreate
Inspecting structure with TextDebugInfo
Understanding gap position
Duplicating with TextDup
Cleaning up with TextFree
Key Concepts:
Initial gap is at position 0
Default gap size (gapDefaultSize = 256 bytes)
Structure fields are negative offsets
Demo 14: Inserting Text
File: demo14_ttext_insert.asm
Demonstrates:
Inserting strings with TextAddString
Inserting raw bytes with TextAddBytes
Inserting characters with TextAddChar
Moving gap with TextMoveGap
Using position -1 for append
Key Concepts:
Gap moves to insertion point
Subsequent insertions at same position are O(1)
Gap grows as needed
Demo 15: Deleting and Compacting
File: demo15_ttext_delete.asm
Demonstrates:
Deleting characters with TextDelChar
Compacting text with TextCompact
Setting gap size with TextSetGapSize
Reading resulting text
Key Concepts:
Deletion expands the gap
Compact creates continuous string
After compact, gap is at end
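What "compacting" means can be modelled in a few lines. This Python sketch is an illustration under stated assumptions, not the FreshLib implementation: the function name `compact` is hypothetical, and growing the buffer by one byte when there is no room for the terminator is a simplification.

```python
def compact(buf, gap_begin, gap_end):
    """Slide the text after the gap down so the text is contiguous, then
    null-terminate it. Returns the new (gap_begin, gap_end)."""
    tail = buf[gap_end:]
    text_len = gap_begin + len(tail)
    if text_len == len(buf):
        buf.append(0)                  # no gap left: make room for the terminator
    buf[gap_begin:text_len] = tail     # close the gap (same-length slice, no resize)
    buf[text_len] = 0                  # the terminator is the first byte of the gap
    return text_len, len(buf)          # the gap now sits at the end of the buffer


buf = bytearray(b"Hello") + bytearray(3) + bytearray(b"World")   # gap at 5..8
gap_begin, gap_end = compact(buf, 5, 8)
print(bytes(buf[:gap_begin]))   # b'HelloWorld'
print(gap_begin, gap_end)       # 10 13
```

Bytes left over inside the gap after the terminator are stale and must not be read as text.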
Common Patterns
Creating and Using TText
; Create
stdcall TextCreate, sizeof.TText
test eax, eax
jz .error
mov [pText], eax
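; Add content
stdcall StrDupMem, <"Hello World">
mov ebx, eax ; Keep the string handle; EAX is overwritten by TextAddString
stdcall TextAddString, [pText], -1, ebx
mov [pText], edx ; Buffer may have been reallocated
stdcall StrDel, ebx ; Clean up string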
; Use text...
; Cleanup
stdcall TextFree, [pText]
Inserting at Specific Position
; Move gap to position
stdcall TextMoveGap, [pText], 5
; Insert at gap
stdcall StrDupMem, <"INSERT">
mov ebx, eax ; Keep the string handle
stdcall TextAddString, [pText], 5, ebx
mov [pText], edx ; Buffer may have been reallocated
stdcall StrDel, ebx
Reading Complete Text
; Compact to get continuous string
stdcall TextCompact, [pText]
; Now pText points to null-terminated string
stdcall FileWriteString, [STDOUT], [pText]
Important Notes
Gap Position
The gap is always between characters, never inside a UTF-8 sequence
Position 0 = before first character
Position -1 = after last character
Memory Management
TextCreate allocates memory
TextDup creates a separate copy
TextFree must be called for each allocation
After TextCompact, don't call TextFree separately on the result (it is the same pointer as pText)
Position vs Offset
Never use offsets in application code
Always use positions (which exclude the gap)
Let TText functions handle offset conversion
UTF-8 Awareness
TextAddChar handles multi-byte UTF-8 correctly
TextDelChar deletes complete UTF-8 characters
Gap is positioned on character boundaries
Building and Running
cd 04-ttext-basics
./build.sh
This will compile and test all 3 demos.
Important Discoveries During Implementation
1. TextAddString Returns Updated Pointer in EDX
Problem: After calling TextAddString, the TText buffer may be reallocated to accommodate new text, invalidating the old pointer.
Discovery: TextAddString uses pushad/popad and returns the potentially reallocated pointer via mov [esp+4*regEDX], edx before popad. This means EDX contains the updated pointer after the call.
Solution: Always update your pText variable with EDX after each TextAddString call:
; WRONG - loses updated pointer after reallocation:
stdcall TextAddString, [pText], -1, cHello
; [pText] may now be invalid!
; CORRECT - capture updated pointer:
stdcall TextAddString, [pText], -1, cHello
mov [pText], edx
Why This Matters: If the internal buffer needs to grow, TextSetGapSize (called by TextAddString) will use ResizeMem, which can move the entire buffer to a new memory location.
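The stale-pointer hazard is easy to reproduce in a model. The following Python sketch is purely illustrative: `ToyHeap` and `add_bytes` are hypothetical names, and the toy allocator moves the block on every resize, whereas a real allocator such as ResizeMem only moves it sometimes, which is exactly what makes the bug intermittent.

```python
class ToyHeap:
    """Hypothetical allocator model: every resize moves the block to a new address."""

    def __init__(self):
        self.blocks = {}
        self.next_addr = 0x1000

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += 0x1000
        self.blocks[addr] = bytearray(size)
        return addr

    def resize(self, addr, size):
        data = self.blocks.pop(addr)          # the old address is no longer valid
        new_addr = self.alloc(size)
        n = min(len(data), size)
        self.blocks[new_addr][:n] = data[:n]  # copy the surviving contents
        return new_addr


def add_bytes(heap, ptext, used, payload):
    """Append payload, growing the block if needed. Returns (new_ptext, new_used),
    mirroring how the updated pointer comes back from TextAddString in EDX."""
    if used + len(payload) > len(heap.blocks[ptext]):
        ptext = heap.resize(ptext, 2 * (used + len(payload)))
    heap.blocks[ptext][used:used + len(payload)] = payload
    return ptext, used + len(payload)


heap = ToyHeap()
ptext = heap.alloc(8)
old = ptext
ptext, used = add_bytes(heap, ptext, 0, b"Hello World")   # 11 bytes > 8: block moves
print(old in heap.blocks)                  # False: the stale pointer is dead
print(bytes(heap.blocks[ptext][:used]))    # b'Hello World'
```

Rebinding `ptext` from the return value plays the same role as `mov [pText], edx` in the assembly version.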
2. TextSetGapSize Also Returns Updated Pointer in EDX
Problem: TextSetGapSize can reallocate the buffer when enlarging the gap.
Discovery: Like TextAddString, it returns the updated pointer in EDX via the same mechanism.
Solution: Always update pText after calling TextSetGapSize:
stdcall TextSetGapSize, [pText], 512
jc .error
mov [pText], edx ; CRITICAL - buffer may have moved
Why This Matters: Demo 15 crashed when trying to read TText fields after TextSetGapSize without updating the pointer. The old pointer was invalid, causing segmentation faults.
3. Inline Angle Bracket Strings Cause Crashes with TextAddString
Problem: Using FASM's inline string syntax <"text"> with TextAddString causes segmentation faults.
Discovery: Through debugging, found that StrPtr (called internally by TextAddString) was returning nonsensical values (like 0x14 = 20 decimal) when passed inline strings. The exact root cause in FASM's code generation is unclear, but inline strings generated by <> don't work correctly with TextAddString.
Solution: Always define text constants in the iglobal section using the text macro:
; WRONG - causes crash:
stdcall TextAddString, [pText], -1, <" ">
stdcall TextAddString, [pText], -1, <"Hello World">
; CORRECT - define constants:
iglobal
cSpace text " "
cHelloWorld text "Hello World"
endg
stdcall TextAddString, [pText], -1, cSpace
stdcall TextAddString, [pText], -1, cHelloWorld
Debug Evidence:
hString = F5E45000 ; Valid StrLib handle
StrLen(hString) = 20 ; Wrong! "Hello" is 5 bytes
StrPtr(hString) = 14 ; Invalid pointer (0x14 = 20 decimal!)
This suggests the inline string mechanism interferes with StrLib's handle-to-pointer conversion.
4. TextDelChar Deletes AFTER Gap, Not BEFORE
Problem: Initial assumption was that TextDelChar deletes the character before the gap (like backspace). The demo showed "Hello World" unchanged after attempting to delete "World".
Discovery: Reading buffergap.asm:594, TextDelChar reads the byte at [edx+TText.GapEnd] and increments GapEnd, meaning it deletes the character AFTER the gap, expanding the gap forward.
Solution: Position the gap BEFORE the text you want to delete:
; To delete "World" from "Hello World":
; "Hello World" has 11 characters, position 6 = after "Hello "
; WRONG - gap after "World":
stdcall TextMoveGap, [pText], 11
stdcall TextDelChar, [pText] ; Deletes nothing (at end)
; CORRECT - gap before "World":
stdcall TextMoveGap, [pText], 6
stdcall TextDelChar, [pText] ; Deletes 'W'
stdcall TextDelChar, [pText] ; Deletes 'o'
stdcall TextDelChar, [pText] ; Deletes 'r'
stdcall TextDelChar, [pText] ; Deletes 'l'
stdcall TextDelChar, [pText] ; Deletes 'd'
Mental Model: Think of the gap as your cursor. TextDelChar acts like the DELETE key (deletes forward), not BACKSPACE (deletes backward).
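The DELETE versus BACKSPACE distinction comes down to which edge of the gap moves. The Python sketch below models this with hypothetical helper names and single-byte characters only; it does not handle multi-byte UTF-8 sequences.

```python
def delete_forward(gap_begin, gap_end, buf_len):
    """DELETE key: swallow the byte just after the gap by advancing gap_end."""
    if gap_end < buf_len:
        gap_end += 1
    return gap_begin, gap_end


def delete_backward(gap_begin, gap_end):
    """BACKSPACE: swallow the byte just before the gap by retreating gap_begin."""
    if gap_begin > 0:
        gap_begin -= 1
    return gap_begin, gap_end


def visible(buf, gap_begin, gap_end):
    """Return the logical text, skipping the gap."""
    return bytes(buf[:gap_begin] + buf[gap_end:])


buf = bytearray(b"Hello ") + bytearray(4) + bytearray(b"World")   # gap at position 6
gap_begin, gap_end = 6, 10

for _ in range(5):                        # five forward deletes remove "World"
    gap_begin, gap_end = delete_forward(gap_begin, gap_end, len(buf))
print(visible(buf, gap_begin, gap_end))   # b'Hello '

# With the gap at the end of the text there is nothing left to delete forward.
print(delete_forward(gap_begin, gap_end, len(buf)))        # (6, 15)

# A backspace from the same state would remove the trailing space instead.
print(visible(buf, *delete_backward(gap_begin, gap_end)))  # b'Hello'
```

Note that neither operation copies any text: the deleted bytes simply become part of the gap.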
5. TextCompact Should Only Be Called at End of Operations
Problem: Calling TextCompact in the middle of operations and then trying to continue using the TText caused crashes.
Discovery: TextCompact calls TextSetGapSize, which can reallocate the buffer, but TextCompact itself doesn't return the updated pointer; it simply returns without setting any return value.
Solution: Only call TextCompact when you're done with all modifications and just want to display or read the final result:
; WRONG - compact in middle of operations:
stdcall TextAddString, [pText], -1, cHello
mov [pText], edx
stdcall TextCompact, [pText] ; May reallocate
; [pText] might now be invalid!
stdcall TextMoveGap, [pText], 5 ; CRASH
; CORRECT - compact only at end:
stdcall TextAddString, [pText], -1, cHello
mov [pText], edx
stdcall TextMoveGap, [pText], 5
stdcall TextDelChar, [pText]
; ... all modifications done ...
stdcall TextCompact, [pText] ; Now safe to display
stdcall FileWriteString, [STDOUT], [pText]
Alternative: If you must inspect the text in the middle of operations, use TextDup to create a copy, compact the copy, and discard it.
6. Position -1 Works Correctly for Append Operations
Discovery: When testing TextAddString with position -1, it correctly appends to the end of the text.
How It Works: Looking at buffergap.asm:490 (TextMoveGap), position -1 is 0xFFFFFFFF when read as an unsigned value. After cmp ecx, eax, the unsigned conditional move cmova ecx, eax sees it as larger than any valid position and clamps it to the maximum valid position.
; These are equivalent for appending:
stdcall TextAddString, [pText], -1, cWorld
; Same as:
stdcall TextAddString, [pText], [current_text_length], cWorld
This is a convenient shorthand that doesn't require calculating the current text length.
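The clamping behaviour can be checked with a short model. This Python sketch only mimics the effect of the unsigned compare and conditional move described above; the function name `clamp_position` is hypothetical, and which register holds which value in the real code is not assumed here.

```python
def clamp_position(position, text_len):
    """Reinterpret a signed 32-bit position as unsigned, then clamp it to text_len.
    Models the effect of `cmp` followed by `cmova` (move if above, unsigned)."""
    unsigned = position & 0xFFFFFFFF     # -1 becomes 0xFFFFFFFF
    return text_len if unsigned > text_len else unsigned


print(clamp_position(-1, 11))   # 11: append at the end
print(clamp_position(5, 11))    # 5: in range, unchanged
print(clamp_position(99, 11))   # 11: any out-of-range position also clamps to the end
```

A side effect worth knowing is that every negative position, not just -1, lands at the end of the text under this scheme.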
Summary of Best Practices
Always update pText after functions that might reallocate:
TextAddString → mov [pText], edx
TextSetGapSize → mov [pText], edx
Define all text constants in iglobal:
Never use <"inline strings"> with TextAddString
Use the text macro for all string constants
Understand TextDelChar direction:
Deletes AFTER gap (forward direction)
Position gap BEFORE text to delete
Call TextCompact only at the end:
Don't modify TText after compacting
Use for final display/output only
Use position -1 for appending:
Convenient shorthand for "end of text"
No need to track current length
Next Steps
After completing this tutorial, continue to:
Tutorial 05: TText Advanced (searching, coordinates, Unicode)
Reference
FreshLib Source:
~/Documents/fossil/FreshIDE/freshlib/data/buffergap.asm
Gap Buffer Algorithm: https://en.wikipedia.org/wiki/Gap_buffer
AsmBB Usage:
~/Documents/fossil/asmbb/source/*.asm