Guest

27.12.25 16:19

#113

Tutorial 05: TText - Advanced Operations

Overview

This tutorial covers advanced TText operations that make text manipulation powerful and efficient. You'll learn about:

KMP Search Algorithm - Fast substring searching with O(n+m) complexity
UTF-8 Character Handling - Working with multi-byte Unicode characters
Coordinate Conversion - Converting between different text position systems

These operations are essential for building text editors, search functionality, and handling international text.

Topics Covered

1. KMP Search Algorithm

Understanding the Knuth-Morris-Pratt algorithm
Preparing search patterns efficiently
Finding all occurrences of a substring
Case-sensitive and case-insensitive search

2. UTF-8 Character Handling

UTF-8 encoding basics
Converting between byte positions and character indices
Working with multi-byte characters
Handling international text correctly

3. Coordinate Conversion

Three coordinate systems in TText
Converting between offset, position, and index
Understanding when to use each coordinate system
Practical examples of conversion

Prerequisites

Before starting this tutorial, you should have:

Completed Tutorial 04: TText - Gap Buffer Basics
An understanding of the TText structure and basic operations
Some knowledge of UTF-8 encoding (helpful but not required)

Demos

Demo	Filename	Functions Covered
16	`demo16_ttext_search.asm`	TextPrepareSearch, TextSearch
17	`demo17_ttext_unicode.asm`	TextIndexToPos, TextPosToIndex
18	`demo18_ttext_coords.asm`	TextPosToOffset, TextOffsetToPos

Function Reference

Search Functions

TextPrepareSearch - Preprocess pattern for KMP search
TextSearch - Find substring using KMP algorithm

UTF-8 Functions

TextIndexToPos - Convert character index to byte position
TextPosToIndex - Convert byte position to character index

Coordinate Conversion Functions

TextPosToOffset - Convert position to offset (including gap)
TextOffsetToPos - Convert offset to position (excluding gap)

Building and Running

cd 05-ttext-advanced
./build.sh

This will compile and test all 3 demos.

Important Discoveries During Implementation

1. Bash Variable `SECONDS` Conflicts with Build Script Parsing

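Problem: The build script failed with arithmetic errors such as: invalid arithmetic operator (error token is ".1")

Discovery: The script stored FASM's timing value (the "0.1" taken from the "passes" line of FASM's output) in a variable named SECONDS. Bash treats SECONDS as a special integer variable that counts elapsed seconds. A value assigned to it is evaluated as an integer arithmetic expression, so a fractional value such as 0.1 is rejected.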

Solution: Rename the variable to avoid conflict:

# WRONG - conflicts with bash built-in:
SECONDS=$(echo "$OUTPUT" | grep "passes" | awk '{print $3}')

# CORRECT - use different name:
TIME_VAL=$(echo "$OUTPUT" | grep "passes" | awk '{print $3}')

2. FASM Angle Bracket Strings with Quotes Cause Parse Errors

Problem: Inline strings like <" ['> or <"']"> cause "missing end quote" errors in FASM.

Discovery: FASM ends a quoted string only at the same quote character that opened it. In <" ['> the string opens with a double quote that is never closed, so the parser runs past the ' and the > while looking for the end of the string.

Solutions (in order of preference):

Option 1: Use single quotes as outer delimiter (BEST - simplest):

; WRONG - causes parse error:
stdcall FileWriteString, [STDOUT], <" ['>

; CORRECT - swap to single quotes:
stdcall FileWriteString, [STDOUT], <' ['>
stdcall FileWriteString, [STDOUT], <'] '>

Option 2: Define constants in iglobal (good for reusable strings):

iglobal
  cQuoteOpen  text " ['"
  cQuoteClose text "']"
endg

stdcall FileWriteString, [STDOUT], cQuoteOpen
stdcall FileWriteString, [STDOUT], cQuoteClose

Option 3: Nest one quote type inside the other (no concatenation needed):

; The single-quoted string contains double quotes:
stdcall FileWriteString, [STDOUT], <'"Hello"'>

Pattern:

Use single quotes (<'text'>) when the string contains double quotes
Use double quotes (<"text">) when the string contains single quotes
Define constants in iglobal for complex or reusable strings
Never put the delimiter's own quote character inside the string, because FASM treats it as the closing quote

3. NumToStr Requires 32-bit Register, Not 8-bit

Problem: Code like stdcall NumToStr, al, ntsHex fails to assemble. The generated pushd al reports "invalid size of operand".

Discovery: stdcall is a macro that pushes every argument onto the stack before the call, and an 8-bit register (AL, BL, CL, DL) cannot be pushed.

Solution: Always zero-extend 8-bit values to 32-bit before calling NumToStr:

; WRONG - tries to push 8-bit register:
stdcall NumToStr, al, ntsHex

; CORRECT - extend to 32-bit first:
movzx   eax, al      ; Convert 8-bit to 32-bit
stdcall NumToStr, eax, ntsHex

Why This Matters: The stdcall macro expands to push arguments, and x86 can only push 16-bit or 32-bit values, not 8-bit.

4. TextPrepareSearch Returns Memory That Must Be Freed

Problem: Forgetting to free the index table returned by TextPrepareSearch causes memory leaks.

Discovery: TextPrepareSearch allocates memory for the KMP prefix table and returns a pointer in EAX. This must be freed with FreeMem when done.

Solution: Always track and free the index table:

stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
mov     [pIndexTable], eax
test    eax, eax
jz      .error

; ... use the table for searches ...

; CRITICAL - free when done:
stdcall FreeMem, [pIndexTable]

5. TextSearch Requires Matching Flags Between Prepare and Search

Problem: Searching with different flags than used in TextPrepareSearch gives incorrect results.

Discovery: The KMP prefix table is built based on case-sensitivity settings. Using mismatched flags causes the search to fail.

Solution: Use the same flags consistently:

; Prepare with case-insensitive:
stdcall TextPrepareSearch, [hSearch], tsfCaseIgnore
mov     [pIndexTable], eax

; Search MUST also use case-insensitive:
stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseIgnore

6. TextIndexToPos and TextPosToIndex Handle UTF-8 Correctly

Discovery: These functions automatically account for UTF-8 multi-byte characters. You don't need to manually count UTF-8 bytes.

Example:

; For text "H€llö" (8 bytes, 5 characters):
;   H = 1 byte, € = 3 bytes, l = 1 byte, l = 1 byte, ö = 2 bytes
; Index 3 ('l') = Position 5 (after "H€l")
; Index 4 ('ö') = Position 6 (after "H€ll")

stdcall TextIndexToPos, [pText], 4   ; Returns position 6
stdcall TextPosToIndex, [pText], 6   ; Returns index 4

7. Coordinate System Conversion Depends on Gap Position

Discovery: The results of TextPosToOffset and TextOffsetToPos depend on where the gap currently is. An offset indexes the raw buffer with the gap included. A position counts only text bytes, with the gap excluded.

Example:

Text: "Hello World" (11 bytes)
Gap at position 6 (after "Hello "), 12 bytes long (offsets 6-17)

Position 5 = Offset 5    (before the gap: unchanged)
Position 7 = Offset 19   (after the gap: 7 + 12)

Why This Matters: When working with raw memory addresses (offsets), you must account for the gap. Use positions for logical text operations.

8. TextGetChar Returns Character in AL Register

Discovery: TextGetChar returns the first byte of the UTF-8 sequence in AL, not a handle or pointer. For a multi-byte character, you must read the remaining bytes yourself.

Pattern:

stdcall TextGetChar, [pText]   ; Returns first byte in AL
cmp     al, $80                ; Check if multi-byte
jb      .ascii_char            ; Single byte if < $80
; ... handle multi-byte UTF-8 ...

Summary of Best Practices

Swap quote types when needed in FASM inline strings
- Use <'text with "quotes"'> when you need double quotes inside
- Use <"text with 'quotes'"> when you need single quotes inside
- Only define constants in iglobal for complex/reusable strings

Zero-extend 8-bit values before stdcall
- Use movzx eax, al before calling functions with byte values
- Remember stdcall pushes arguments on stack

Free memory from TextPrepareSearch
- Track the returned index table pointer
- Call FreeMem when done searching

Match search flags consistently
- Use same flags in Prepare and Search calls
- Case sensitivity must match

Use position for logical operations
- Position excludes gap - what you usually want
- Offset includes gap - for raw memory access only
- Index for UTF-8 character counting

Handle UTF-8 multi-byte characters correctly
- Check if byte >= $80 for multi-byte
- Use TextIndexToPos/TextPosToIndex for conversion

Next Steps

After completing this tutorial, continue to:

Tutorial 06: TText - Real-World Patterns

demo16_ttext_search.asm

; Demo 16: TText Search with KMP Algorithm
; Demonstrates fast substring searching using Knuth-Morris-Pratt algorithm

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  pIndexTable    dd ?
  hSearch        dd ?
  hText          dd ?
  hResults       dd ?
  nCount         dd ?
  nLastPos       dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 16: TText Search with KMP ===", 13, 10, 13, 10

  cLabel1    text "1. Basic substring search:", 13, 10
  cLabel2    text "2. Case-insensitive search:", 13, 10
  cLabel3    text "3. Multiple occurrences:", 13, 10
  cLabel4    text "4. Pattern with wildcards:", 13, 10

  cText1     text "The quick brown fox jumps over the lazy dog"
  cText2     text "Hello World! HELLO world! hello WORLD!"
  cText3     text "ababababababababababab"
  cText4     text "test*pattern?matching*test"

  cPattern1  text "fox"
  cPattern2  text "HELLO"
  cPattern3  text "abab"
  cPattern4  text "test*"

  cFound     text "  Found at position: "
  cNotFound  text "  Pattern not found", 13, 10
  cCount     text "  Occurrences: "
  cLastPos   text "  Last position: "

  cDone      text 13, 10, "Demo 16 complete!", 13, 10
  cError     text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: Basic substring search
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Create TText
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText1
        mov     [pText], edx

; Prepare search pattern
        stdcall StrDupMem, cPattern1
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

; Prepare KMP search table
        stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

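; Search for "fox" using the prepared KMP table
        stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseSensitive
        jc      .not_found1             ; CF=1: not found (EAX=0 is also a valid match position)
        mov     [nLastPos], eax         ; save the match position: FileWriteString overwrites EAX

        stdcall FileWriteString, [STDOUT], cFound
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned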
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        jmp     .demo1_done

.not_found1:
        stdcall FileWriteString, [STDOUT], cNotFound

.demo1_done:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 2: Case-insensitive search
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText2
        mov     [pText], edx

; Free old search
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

; Prepare case-insensitive search
        stdcall StrDupMem, cPattern2
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

        stdcall TextPrepareSearch, [hSearch], tsfCaseIgnore
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

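; Search for "HELLO" (case-insensitive) using the prepared KMP table
        stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseIgnore
        jc      .not_found2             ; CF=1: not found (the first match here is at position 0)
        mov     [nLastPos], eax         ; save the match position: FileWriteString overwrites EAX

; Show first occurrence
        stdcall FileWriteString, [STDOUT], <"  Found 'HELLO' at position: ">
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned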
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Count all occurrences
        mov     [nCount], 0
        mov     [nLastPos], 0

.find_all:
        stdcall TextSearch, [pText], [hSearch], [nLastPos], [pIndexTable], tsfCaseIgnore
        jc      .done_count             ; CF=1: no more matches

        inc     [nCount]
        mov     [nLastPos], eax
        inc     [nLastPos]  ; Start search after this position
        jmp     .find_all

.done_count:
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        jmp     .demo2_done

.not_found2:
        stdcall FileWriteString, [STDOUT], cNotFound

.demo2_done:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 3: Multiple occurrences
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText3
        mov     [pText], edx

; Free old search
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

; Prepare new search
        stdcall StrDupMem, cPattern3
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

        stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

; Find all occurrences of "abab"
        mov     [nCount], 0
        mov     [nLastPos], 0

.find_all3:
        stdcall TextSearch, [pText], [hSearch], [nLastPos], [pIndexTable], tsfCaseSensitive
        jc      .done_count3            ; CF=1: no more matches (a match at 0 returns EAX=0)
        mov     [nLastPos], eax         ; save the match position before EAX is reused

; Show this occurrence
        stdcall FileWriteString, [STDOUT], <"  Found at position: ">
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nCount]
        inc     [nLastPos]              ; resume one byte after the match start (finds overlapping matches)
        jmp     .find_all3

.done_count3:
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 4: Pattern with wildcards
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4
        stdcall FileWriteString, [STDOUT], <"  (Wildcards not implemented in basic TText search)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Use StrLib pattern matching instead", 13, 10>

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextPrepareSearch - Prepare pattern for KMP search
;   Arguments: .hSubstr - String handle containing search pattern
;              .flags - Search flags (tsfCaseSensitive or tsfCaseIgnore)
;   Returns:   EAX - Pointer to KMP prefix table (must be freed with FreeMem)
;              CF=1 on error
;   Notes:     Must be called before TextSearch
;
; TextSearch - Find substring using KMP algorithm
;   Arguments: .pText - TText pointer to search in
;              .hSubstr - String handle containing search pattern
;              .From - Starting position (0 for beginning)
;              .pIndex - Pointer to KMP table from TextPrepareSearch
;              .flags - Search flags (must match TextPrepareSearch)
;   Returns:   EAX - Position of match, or 0 if not found
;              CF=0 on success, CF=1 on not found
;   Notes:     A match at position 0 also returns EAX=0, so test CF
;              (jc), not EAX. For multiple searches, set .From to the
;              last match position + 1 after each match.
;

demo17_ttext_unicode.asm

; Demo 17: TText UTF-8 Character Handling
; Demonstrates multi-byte UTF-8 character operations

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  hChars         dd ?
  nIndex         dd ?
  nPos           dd ?
  nCount         dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 17: TText UTF-8 Character Handling ===", 13, 10, 13, 10

  cLabel1    text "1. ASCII text (single-byte UTF-8):", 13, 10
  cLabel2    text "2. European text (2- and 3-byte UTF-8):", 13, 10
  cLabel3    text "3. Asian text (3-byte UTF-8):", 13, 10
  cLabel4    text "4. Mixed text with emojis:", 13, 10

; UTF-8 encoded strings
  cASCII      text "Hello World"
  cEuropean   text "H€llö Wörld"
  cAsian      text "你好世界"
  cMixed      text "Hello 👋 World 🌍 こんにちは"

  cBytePos    text "  Byte position: "
  cCount      text "  Character count: "
  cInfo       text "  [byte] -> [character index] -> [UTF-8 character]"
  cArrow      text " -> "
  cTextColon  text "  Text: "
  cChars      text "  Characters:", 13, 10
  cCharPrefix text "  Character "
  cColonSpace text ": "
  cUTF8Multi  text "[UTF-8 multi-byte] "
  cSpace      text " "
  cAtPosition text " at position "
  cAtByte     text " at byte "
  cNote1      text "  Note: This contains", 13, 10
  cNote2      text "  - ASCII characters (1 byte)", 13, 10
  cNote3      text "  - Emoji characters (4 bytes each in UTF-8)", 13, 10
  cNote4      text "  - CJK characters (3 bytes each)", 13, 10

  cDone       text 13, 10, "Demo 17 complete!", 13, 10
  cError      text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: ASCII text (single-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Create TText with ASCII text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cASCII
        mov     [pText], edx

; Show byte positions and character indices
        stdcall FileWriteString, [STDOUT], cInfo
        stdcall FileWriteString, [STDOUT], cCRLF

; Iterate through text
        mov     [nPos], 0
.iter_ascii:
; Stop at the end of the text
        stdcall TextCompact, [pText]      ; close the gap so the text is contiguous
        stdcall StrLen, [pText]           ; EAX = text length in bytes
        cmp     [nPos], eax
        jae     .done_ascii

; Show byte position
        stdcall FileWriteString, [STDOUT], cBytePos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

; Convert to character index (print the arrow first: FileWriteString overwrites EAX)
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall TextPosToIndex, [pText], [nPos]
        stdcall NumToStr, eax, ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

; Get character at this position
        stdcall TextGetChar, [pText]
        cmp     al, 0
        je      .done_ascii

; Show the byte value in hex (ASCII only)
        push    eax                     ; save the character
        movzx   eax, al                 ; Convert 8-bit to 32-bit for NumToStr
        stdcall NumToStr, eax, ntsHex   ; EAX = hex string
        push    eax                     ; save the string handle
        stdcall FileWriteString, [STDOUT], <' ['>
        mov     eax, [esp]              ; reload the string handle
        stdcall FileWriteString, [STDOUT], eax
        stdcall FileWriteString, [STDOUT], <'] '>
        pop     eax
        stdcall StrDel, eax
        pop     eax                     ; restore the character

        stdcall FileWriteString, [STDOUT], cCRLF

; Next position
        inc     [nPos]
        jmp     .iter_ascii

.done_ascii:
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 2: European text (2-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Create TText with European text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cEuropean
        mov     [pText], edx

; Show character mapping
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Show individual characters
        stdcall FileWriteString, [STDOUT], cChars

; Convert each character index to position
        mov     [nIndex], 0
.iter_europe:
; Convert index to position
        stdcall TextIndexToPos, [pText], [nIndex]
        jc      .done_europe            ; CF=1: index past the end (EAX=0 is valid for index 0)
        mov     [nPos], eax

; Get character at position
        stdcall TextGetChar, [pText]
        cmp     al, 0
        je      .done_europe

; Show character info
        stdcall FileWriteString, [STDOUT], cCharPrefix
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cColonSpace

; Show the first byte of the UTF-8 sequence
        stdcall TextGetChar, [pText]
        movzx   eax, al                 ; zero-extend the first byte
        push    eax                     ; save it: FileWriteString overwrites EAX

        cmp     al, $80                 ; bytes >= $80 belong to a multi-byte sequence
        jb      .single_byte_e

; Multi-byte character
        stdcall FileWriteString, [STDOUT], cUTF8Multi
        jmp     .show_code_e

.single_byte_e:
; Single byte character
        stdcall FileWriteString, [STDOUT], <'['>
; Note: We're simplifying here - real UTF-8 handling is complex
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], <'] '>

.show_code_e:
; Show the first byte in hex
        pop     eax                     ; restore the saved first byte
        stdcall NumToStr, eax, ntsHex
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cSpace

; Show position
        stdcall FileWriteString, [STDOUT], cAtPosition
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Next character
        inc     [nIndex]
        jmp     .iter_europe

.done_europe:
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 3: Asian text (3-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

; Create TText with Asian text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cAsian
        mov     [pText], edx

; Show text
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Count characters: walk the bytes and skip UTF-8 continuation bytes
        mov     [nCount], 0
        mov     [nPos], 0

.count_chars:
        stdcall StrLen, [pText]         ; EAX = text length in bytes (text is compacted)
        cmp     [nPos], eax
        jae     .count_done

        mov     ecx, [pText]
        add     ecx, [nPos]
        mov     al, [ecx]               ; byte at this position
        and     al, $C0
        cmp     al, $80                 ; 10xxxxxx = continuation byte, not a character start
        je      .next_byte

; Show this character: index and byte position
        stdcall TextPosToIndex, [pText], [nPos]
        mov     [nIndex], eax           ; save the index before EAX is reused
        stdcall FileWriteString, [STDOUT], cCharPrefix
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cAtByte
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nCount]                ; one more character start
.next_byte:
        inc     [nPos]
        jmp     .count_chars

.count_done:
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 4: Mixed text with emojis
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4

; Create TText with mixed text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cMixed
        mov     [pText], edx

; Show text
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Note about mixed content
        stdcall FileWriteString, [STDOUT], cNote1
        stdcall FileWriteString, [STDOUT], cNote2
        stdcall FileWriteString, [STDOUT], cNote3
        stdcall FileWriteString, [STDOUT], cNote4

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextIndexToPos - Convert character index to byte position
;   Arguments: .pText - TText pointer
;              .index - Character index (0-based)
;   Returns:   EAX - Byte position, or high value if index > text length
;              CF=0 on success, CF=1 if index beyond text
;   Notes:     Takes UTF-8 encoding into account
;
; TextPosToIndex - Convert byte position to character index
;   Arguments: .pText - TText pointer
;              .Pos - Byte position
;   Returns:   EAX - Character index (0-based)
;   Notes:     Converts byte position to UTF-8 character count
;

demo18_ttext_coords.asm

; Demo 18: TText Coordinate Conversion
; Demonstrates conversion between offset, position, and index coordinate systems

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  nOffset        dd ?
  nPos           dd ?
  nIndex         dd ?
  nLength        dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 18: TText Coordinate Conversion ===", 13, 10, 13, 10

  cLabel1    text "1. Understanding the three coordinate systems:", 13, 10
  cLabel2    text "2. Position to Offset conversion:", 13, 10
  cLabel3    text "3. Offset to Position conversion:", 13, 10
  cLabel4    text "4. Complete conversion chain:", 13, 10

  cText1     text "Hello World"
  cText2     text "The quick brown fox"

  cOffset    text "  Offset: "
  cPos       text "  Position: "
  cIndex     text "  Index: "

  cArrow     text " -> "

  cDone      text 13, 10, "Demo 18 complete!", 13, 10
  cError     text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: Understanding the three coordinate systems
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Explain the coordinate systems
        stdcall FileWriteString, [STDOUT], <"  OFFSET   - Raw byte position in buffer (includes gap)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  POSITION - Logical byte position (excludes gap)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  INDEX    - UTF-8 character position", 13, 10>
        stdcall FileWriteString, [STDOUT], cCRLF

; Create TText
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText1
        mov     [pText], edx

; Show initial state
        stdcall FileWriteString, [STDOUT], <"  Text: 'Hello World'", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Length: 11 bytes, 11 characters (ASCII)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Gap: at position 11 (end of text)", 13, 10>

; Show structure fields
        mov     esi, [pText]
        stdcall FileWriteString, [STDOUT], <"  Structure:", 13, 10>
        stdcall FileWriteString, [STDOUT], <"    .Length = ">
        stdcall NumToStr, [esi+TText.Length], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapBegin = ">
        stdcall NumToStr, [esi+TText.GapBegin], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapEnd = ">
        stdcall NumToStr, [esi+TText.GapEnd], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 2: Position to Offset conversion
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Before moving gap, position equals offset
        stdcall FileWriteString, [STDOUT], <"  With gap at end:", 13, 10>

        mov     [nPos], 0
.convert_loop1:
        cmp     [nPos], 12
        jae     .done_convert1

        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nPos]
        jmp     .convert_loop1

.done_convert1:
        stdcall FileWriteString, [STDOUT], cCRLF
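
; Worked check for the loop above: GapBegin = 11, so positions 0..10
; all lie before the gap and should print as offsets 0..10.
; Position 11 (the end of the text) equals GapBegin and is a boundary
; case. If TextPosToOffset tests "pos < GapBegin" it maps to GapEnd;
; if it tests "pos <= GapBegin" it maps to 11. The printed output shows
; which convention is used, so do not assume either one.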

; Now move gap to middle
        stdcall FileWriteString, [STDOUT], <"  Move gap to position 6:", 13, 10>
        stdcall TextMoveGap, [pText], 6
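
; What TextMoveGap should do here, assuming it only moves bytes and does
; not resize the buffer: the 5 text bytes at positions 6..10 are copied
; from just before the gap to just before the old GapEnd. Afterwards:
;
;   GapBegin = 6
;   GapEnd   = old GapEnd - 5      ; gap size G is unchanged
;
;   offset:  0 .. 5   6 ...... GapEnd-1   GapEnd .. GapEnd+4
;           [text  ] [   gap (G bytes)  ] [   text 6..10    ]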

; Show new structure
        mov     esi, [pText]
        stdcall FileWriteString, [STDOUT], <"    .GapBegin = ">
        stdcall NumToStr, [esi+TText.GapBegin], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapEnd = ">
        stdcall NumToStr, [esi+TText.GapEnd], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Now positions from 6 onward (after the gap) differ from their offsets
        stdcall FileWriteString, [STDOUT], <"  Position to Offset (gap at 6):", 13, 10>

        mov     [nPos], 0
.convert_loop2:
        cmp     [nPos], 12
        jae     .done_convert2

        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nPos]
        jmp     .convert_loop2

.done_convert2:
        stdcall FileWriteString, [STDOUT], cCRLF
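
; Worked example for the loop above, using the usual gap-buffer rule
; (an assumption about TextPosToOffset; compare with the output), with
; G = GapEnd - GapBegin taken from the values printed above:
;
;   offset = pos         if pos < GapBegin (6)    ; positions 0..5
;   offset = pos + G     if pos >= GapBegin       ; positions 6..11
;
; For instance, if the demo printed GapEnd = 13 (a hypothetical value,
; giving G = 7), position 6 would map to offset 13, position 10 to
; offset 17, and position 11 (end of text) to 18.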

; ========================================
; Demo 3: Offset to Position conversion
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

        stdcall FileWriteString, [STDOUT], <"  Offset to Position (gap at 6):", 13, 10>

        mov     [nOffset], 0
.convert_loop3:
        cmp     [nOffset], 18
        jae     .done_convert3

        stdcall TextOffsetToPos, [pText], [nOffset]
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nOffset]
        jmp     .convert_loop3

.done_convert3:
        stdcall FileWriteString, [STDOUT], cCRLF
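
; Reading the offset -> position table (G = GapEnd - GapBegin as above):
;
;   pos = offs         if offs < GapBegin            ; offsets 0..5
;   pos = offs - G     if offs >= GapEnd             ; text after the gap
;   (no text byte)     if GapBegin <= offs < GapEnd  ; inside the gap
;
; Offsets inside the gap hold no text byte, so there is no "correct"
; position for them. Check the printed output to see what
; TextOffsetToPos returns there. Clamping to GapBegin is a common choice,
; but that is an assumption, not documented behavior.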

; ========================================
; Demo 4: Complete conversion chain
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cText2
        mov     [pText], edx
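
; TextAddString can presumably reallocate the buffer, which is why the
; pointer returned in EDX is stored back into [pText]. cText2 is plain
; ASCII, so every character is one byte. In the chain below, the index
; and the position should both be 5. The offset should also be 5 as long
; as the gap lies after position 5 (for example, at the end of the
; appended text), because then no gap bytes are skipped.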

        stdcall FileWriteString, [STDOUT], <"  Text: 'The quick brown fox'", 13, 10>
        stdcall FileWriteString, [STDOUT], cCRLF

; Show conversions for character at index 5
        stdcall FileWriteString, [STDOUT], <"  Character index 5 ('u' in 'quick'):", 13, 10>

; Index to Position
        stdcall TextIndexToPos, [pText], 5
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cIndex
        stdcall NumToStr, 5, ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
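
; How an index -> position conversion has to count for non-ASCII text.
; This is an illustration only: cText2 is pure ASCII, so here
; index == position. In UTF-8, every byte of the form 10xxxxxx
; (80h..BFh) is a continuation byte, and every other byte starts a new
; character. For the hypothetical string "Añb" ('ñ' = U+00F1 = C3h B1h):
;
;   byte pos:   0      1      2      3
;   byte:       41h    C3h    B1h    62h
;   cont. byte: no     no     yes    no
;   char:       'A'    'ñ'    'ñ'    'b'
;   char index: 0      1      1      2
;
; So index 2 ('b') -> position 3, and position 3 -> index 2.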

; Position to Offset
        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Reverse: Offset to Position
        stdcall TextOffsetToPos, [pText], [nOffset]
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Position to Index
        stdcall TextPosToIndex, [pText], [nPos]
        mov     [nIndex], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cIndex
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
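
; Round-trip property that the chain above demonstrates, for a position
; that lies on a character boundary and an offset outside the gap:
;
;   TextOffsetToPos(TextPosToOffset(pos))  = pos
;   TextPosToIndex(TextIndexToPos(index))  = index
;
; A position that falls inside a multi-byte UTF-8 sequence has no exact
; character index. What TextPosToIndex returns for it is not shown here.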

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextPosToOffset - Convert position to offset (including gap)
;   Arguments: .pText - TText pointer
;              .pos - Byte position (excluding gap)
;   Returns:   EAX - Offset in buffer (including gap)
;   Notes:     Use when you need the raw byte offset into buffer memory
;
; TextOffsetToPos - Convert offset to position (excluding gap)
;   Arguments: .pText - TText pointer
;              .offs - Offset in buffer (including gap)
;   Returns:   EAX - Byte position (excluding gap)
;   Notes:     Inverse of TextPosToOffset; offsets inside the gap hold no text byte
;
; TextIndexToPos - Convert character index to byte position
;   Arguments: .pText - TText pointer
;              .index - Character index (UTF-8 aware)
;   Returns:   EAX - Byte position
;   Notes:     A multi-byte UTF-8 character counts as one index
;
; TextPosToIndex - Convert byte position to character index
;   Arguments: .pText - TText pointer
;              .Pos - Byte position
;   Returns:   EAX - Character index
;   Notes:     Counts UTF-8 characters, not bytes
;