Tutorial 05: TText - Advanced Operations

Overview

This tutorial covers advanced TText operations that make text manipulation powerful and efficient. You'll learn about:

  • KMP Search Algorithm - Fast substring searching with O(n+m) complexity

  • UTF-8 Character Handling - Working with multi-byte Unicode characters

  • Coordinate Conversion - Converting between different text position systems

These operations are essential for building text editors, search functionality, and handling international text.

Topics Covered

1. KMP Search Algorithm

  • Understanding the Knuth-Morris-Pratt algorithm

  • Preparing search patterns efficiently

  • Finding all occurrences of a substring

  • Case-sensitive and case-insensitive search

2. UTF-8 Character Handling

  • UTF-8 encoding basics

  • Converting between byte positions and character indices

  • Working with multi-byte characters

  • Handling international text correctly

3. Coordinate Conversion

  • Three coordinate systems in TText

  • Converting between offset, position, and index

  • Understanding when to use each coordinate system

  • Practical examples of conversion

Prerequisites

Before starting this tutorial, you should have completed:

  • Tutorial 04: TText - Gap Buffer Basics

  • Understanding of TText structure and basic operations

  • Knowledge of UTF-8 encoding (helpful but not required)

Demos

Demo  Filename                   Functions Covered
16    demo16_ttext_search.asm    TextPrepareSearch, TextSearch
17    demo17_ttext_unicode.asm   TextIndexToPos, TextPosToIndex
18    demo18_ttext_coords.asm    TextPosToOffset, TextOffsetToPos

Function Reference

Search Functions

  • TextPrepareSearch - Preprocess pattern for KMP search

  • TextSearch - Find substring using KMP algorithm

UTF-8 Functions

  • TextIndexToPos - Convert character index to byte position

  • TextPosToIndex - Convert byte position to character index

Coordinate Conversion Functions

  • TextPosToOffset - Convert position to offset (including gap)

  • TextOffsetToPos - Convert offset to position (excluding gap)

Building and Running

cd 05-ttext-advanced
./build.sh

This will compile and test all 3 demos.

Important Discoveries During Implementation

1. Bash Variable SECONDS Conflicts with Build Script Parsing

Problem: The build script showed arithmetic syntax errors like invalid arithmetic operator: error token is ".1"

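Discovery: SECONDS is a bash built-in that counts the seconds since the shell started. Bash treats it as an integer variable, so every assignment to it is evaluated arithmetically. The script stored FASM's timing value (a fraction such as 0.1) in SECONDS, and the decimal point triggered the arithmetic error. A whole number would not have worked either, because SECONDS keeps counting up after it is assigned.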

Solution: Rename the variable to avoid conflict:

# WRONG - conflicts with bash built-in:
SECONDS=$(echo "$OUTPUT" | grep "passes" | awk '{print $3}')

# CORRECT - use different name:
TIME_VAL=$(echo "$OUTPUT" | grep "passes" | awk '{print $3}')

2. FASM Angle Bracket Strings with Quotes Cause Parse Errors

Problem: Inline strings like <" ['> or <"']"> cause "missing end quote" errors in FASM.

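Discovery: FASM accepts either single or double quotes as string delimiters, and a string may freely contain the other quote type. The closing quote must match the opening one: in <" ['> the string opens with a double quote and never closes it, so FASM keeps looking for the end quote. When a short inline string mixes quotes and brackets, such mismatches are easy to miss.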

Solutions (in order of preference):

Option 1: Use single quotes as outer delimiter (BEST - simplest):

; WRONG - causes parse error:
stdcall FileWriteString, [STDOUT], <" ['>

; CORRECT - matching single quotes (prints " [" and "] ", without the apostrophe):
stdcall FileWriteString, [STDOUT], <' ['>
stdcall FileWriteString, [STDOUT], <'] '>

Option 2: Define constants in iglobal (good for reusable strings):

iglobal
  cQuoteOpen  text " ['"
  cQuoteClose text "']"
endg

stdcall FileWriteString, [STDOUT], cQuoteOpen
stdcall FileWriteString, [STDOUT], cQuoteClose

Option 3: Let the outer quotes decide what may appear inside (for text that itself contains quotes):

; A single-quoted string may contain double quotes:
stdcall FileWriteString, [STDOUT], <'"Hello"'>

Pattern:

  • Use single quotes (<'text'>) when string contains double quotes

  • Use double quotes (<"text">) when string contains single quotes

  • Define constants in iglobal for complex or reusable strings

  • Never put the outer delimiter's own quote character inside an inline <...> string

3. NumToStr Requires 32-bit Register, Not 8-bit

Problem: Code like stdcall NumToStr, al, ntsHex fails to assemble: the generated pushd al is rejected with "invalid size of operand".

Discovery: The stdcall macro passes every argument with a push instruction, and x86 cannot push an 8-bit register (AL, BL, CL, DL).

Solution: Always zero-extend 8-bit values to 32-bit before calling NumToStr:

; WRONG - tries to push 8-bit register:
stdcall NumToStr, al, ntsHex

; CORRECT - extend to 32-bit first:
movzx   eax, al      ; Convert 8-bit to 32-bit
stdcall NumToStr, eax, ntsHex

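Why This Matters: stdcall expands each argument into pushd, which pushes a 32-bit value. In 32-bit code push accepts only 16-bit or 32-bit operands, never 8-bit ones, so byte values have to be widened to a 32-bit register first.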

4. TextPrepareSearch Returns Memory That Must Be Freed

Problem: Forgetting to free the index table returned by TextPrepareSearch causes memory leaks.

Discovery: TextPrepareSearch allocates memory for the KMP prefix table and returns a pointer in EAX. This must be freed with FreeMem when done.

Solution: Always track and free the index table:

stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
mov     [pIndexTable], eax
test    eax, eax
jz      .error

; ... use the table for searches ...

; CRITICAL - free when done:
stdcall FreeMem, [pIndexTable]

5. TextSearch Requires Matching Flags Between Prepare and Search

Problem: Searching with different flags than used in TextPrepareSearch gives incorrect results.

Discovery: The KMP prefix table is built based on case-sensitivity settings. Using mismatched flags causes the search to fail.

Solution: Use the same flags consistently:

; Prepare with case-insensitive:
stdcall TextPrepareSearch, [hSearch], tsfCaseIgnore
mov     [pIndexTable], eax

; Search MUST also use case-insensitive:
stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseIgnore
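To make the prefix table and the flag dependency concrete, here is a minimal C sketch of KMP search. It is an illustration only, not the implementation of TextPrepareSearch/TextSearch: kmp_prepare and kmp_search are hypothetical names, case folding is ASCII-only via tolower, and -1 stands for "not found". Because the table is built from the folded pattern, a table prepared case-sensitively can make a case-insensitive search skip a valid match.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only case folding, applied to pattern and text alike. */
static unsigned char fold(unsigned char c, int ignore_case) {
    return ignore_case ? (unsigned char)tolower(c) : c;
}

/* Build the KMP prefix table: table[i] is the length of the longest proper
   prefix of pat[0..i] that is also a suffix of it. Free with free(). */
size_t *kmp_prepare(const char *pat, int ignore_case) {
    size_t m = strlen(pat), k = 0;
    size_t *table = malloc((m ? m : 1) * sizeof *table);
    if (table == NULL)
        return NULL;
    table[0] = 0;
    for (size_t i = 1; i < m; i++) {
        while (k > 0 && fold(pat[i], ignore_case) != fold(pat[k], ignore_case))
            k = table[k - 1];               /* fall back to a shorter border */
        if (fold(pat[i], ignore_case) == fold(pat[k], ignore_case))
            k++;
        table[i] = k;
    }
    return table;
}

/* Byte position of the first match at or after `from`, or -1 if none.
   `table` must come from kmp_prepare with the same ignore_case value. */
long kmp_search(const char *text, const char *pat, size_t from,
                const size_t *table, int ignore_case) {
    size_t n = strlen(text), m = strlen(pat), k = 0;
    if (m == 0)
        return from <= n ? (long)from : -1;
    for (size_t i = from; i < n; i++) {
        while (k > 0 && fold(text[i], ignore_case) != fold(pat[k], ignore_case))
            k = table[k - 1];               /* reuse the matched prefix */
        if (fold(text[i], ignore_case) == fold(pat[k], ignore_case))
            k++;
        if (k == m)
            return (long)(i + 1 - m);       /* start of the match */
    }
    return -1;
}
```

For the pattern "aAb" and the text "aaab", a case-insensitive search finds the match at position 1 when the table was also prepared case-insensitively, but returns -1 with a case-sensitive table. Restarting at match + 1, as demo 16 does, also reports overlapping matches.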

6. TextIndexToPos and TextPosToIndex Handle UTF-8 Correctly

Discovery: These functions automatically account for UTF-8 multi-byte characters. You don't need to manually count UTF-8 bytes.

Example:

; For text "H€llö" (8 bytes, 5 characters; € takes 3 bytes, ö takes 2):
; Index 2 ('l') = Position 4 (after "H€")
; Index 4 ('ö') = Position 6 (after "H€ll")

stdcall TextIndexToPos, [pText], 4   ; Returns position 6
stdcall TextPosToIndex, [pText], 6   ; Returns index 4
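The conversion itself can be sketched in C for a plain, gap-free UTF-8 string. This is an illustration under assumptions: utf8_index_to_pos and utf8_pos_to_index are hypothetical helpers, and TText additionally has to skip its gap. A character starts at every byte that is not a continuation byte (10xxxxxx), so counting those bytes converts between the two systems.

```c
#include <stddef.h>
#include <string.h>

/* A character starts at every byte that is not a continuation byte (10xxxxxx). */
static int starts_char(unsigned char c) {
    return (c & 0xC0) != 0x80;
}

/* Character index -> byte position. Returns strlen(s) for the index just past
   the last character and (size_t)-1 for anything beyond that. */
size_t utf8_index_to_pos(const char *s, size_t index) {
    size_t len = strlen(s), count = 0;
    for (size_t pos = 0; pos < len; pos++) {
        if (starts_char((unsigned char)s[pos])) {
            if (count == index)
                return pos;
            count++;
        }
    }
    return count == index ? len : (size_t)-1;
}

/* Byte position -> character index: the number of characters that start
   before pos. */
size_t utf8_pos_to_index(const char *s, size_t pos) {
    size_t len = strlen(s), count = 0;
    if (pos > len)
        pos = len;
    for (size_t i = 0; i < pos; i++)
        if (starts_char((unsigned char)s[i]))
            count++;
    return count;
}
```

With "H€llö", index 4 maps to position 6 and back. In this sketch a position that falls inside a multi-byte character maps to the index of the next character; the TText functions may round differently.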

7. Coordinate System Conversion Depends on Gap Position

Discovery: TextPosToOffset and TextOffsetToPos results vary based on where the gap is located. Offset includes the gap, position excludes it.

Example:

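Text: "Hello World" (11 bytes)
Gap at position 6 (after "Hello "), gap size 12 bytes:
GapBegin = 6, GapEnd = 18 (offsets 6-17 are the gap)

Position 5 = Offset 5    (' ', stored before the gap)
Position 6 = Offset 18   ('W', first byte after the gap: 6 + 12)
Position 7 = Offset 19   ('o': 7 + 12)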

Why This Matters: When working with raw memory addresses (offsets), you must account for the gap. Use positions for logical text operations.
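The gap adjustment is one comparison and one addition. A minimal C sketch, assuming a struct that mirrors TText.GapBegin and TText.GapEnd (gap_text, pos_to_offset and offset_to_pos are hypothetical names), reproduces the numbers above. How TextOffsetToPos treats an offset inside the gap is not documented here, so mapping it to GapBegin is this sketch's own choice.

```c
#include <stddef.h>

/* Hypothetical stand-in for the gap fields of TText: buffer offsets
   [gap_begin, gap_end) hold no text. */
struct gap_text {
    size_t gap_begin;
    size_t gap_end;
};

/* Position (gap excluded) -> offset (gap included). Text at or after
   the gap start is stored gap_end - gap_begin bytes further along. */
size_t pos_to_offset(const struct gap_text *t, size_t pos) {
    if (pos < t->gap_begin)
        return pos;
    return pos + (t->gap_end - t->gap_begin);
}

/* Offset -> position. Offsets inside the gap map to gap_begin,
   the insertion point (a choice made for this sketch). */
size_t offset_to_pos(const struct gap_text *t, size_t offset) {
    if (offset < t->gap_begin)
        return offset;
    if (offset < t->gap_end)
        return t->gap_begin;
    return offset - (t->gap_end - t->gap_begin);
}
```

With the gap at the end of the text (GapBegin equal to the text length), every position maps to the same offset, which is why demo 18 first shows identical columns and only sees a difference after TextMoveGap.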

8. TextGetChar Returns Character in AL Register

Discovery: TextGetChar returns the first UTF-8 byte of the character in AL, not a string handle or pointer. A first byte of $80 or above means the character continues in the following bytes, which you need to read separately.

Pattern:

stdcall TextGetChar, [pText]   ; Returns first byte in AL
cmp     al, $80                ; Check if multi-byte
jb      .ascii_char            ; Single byte if < $80
; ... handle multi-byte UTF-8 ...
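Reading the additional bytes follows from the first byte's bit pattern. The C sketch below (utf8_seq_len and utf8_decode are hypothetical helpers, not FreshLib functions) derives the sequence length from the lead byte and assembles the code point from the 6-bit payloads of the continuation bytes. It does not reject overlong encodings or surrogates.

```c
#include <stdint.h>

/* Bytes in the UTF-8 sequence that starts with `lead`, or 0 if `lead`
   cannot start a sequence. Invalid leads $C0, $C1 and $F5-$F7 are not
   rejected in this sketch. */
int utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead < 0xC0) return 0;  /* 10xxxxxx: continuation byte */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF8) return 4;  /* 11110xxx */
    return 0;                   /* $F8-$FF never occur in UTF-8 */
}

/* Decode the character at p into *cp and return the bytes consumed,
   or 0 if the sequence is malformed or cut short by a NUL. */
int utf8_decode(const unsigned char *p, uint32_t *cp) {
    int len = utf8_seq_len(p[0]);
    if (len == 0)
        return 0;
    /* keep only the payload bits of the lead byte */
    uint32_t value = (len == 1) ? p[0] : (p[0] & (0x7Fu >> len));
    for (int i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;           /* expected a 10xxxxxx byte */
        value = (value << 6) | (p[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

For example, the bytes $E2 $82 $AC decode to U+20AC (€), and the emoji 👋 ($F0 $9F $91 $8B) decodes to U+1F44B.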

Summary of Best Practices

  1. Swap quote types when needed in FASM inline strings

    • Use <'text with "quotes"'> when you need double quotes inside

    • Use <"text with 'quotes'"> when you need single quotes inside

    • Only define constants in iglobal for complex/reusable strings

  2. Zero-extend 8-bit values before stdcall

    • Use movzx eax, al before calling functions with byte values

    • Remember stdcall pushes every argument as a 32-bit value

  3. Free memory from TextPrepareSearch

    • Track the returned index table pointer

    • Call FreeMem when done searching

  4. Match search flags consistently

    • Use same flags in Prepare and Search calls

    • Case sensitivity must match

  5. Use position for logical operations

    • Position excludes gap - what you usually want

    • Offset includes gap - for raw memory access only

    • Index for UTF-8 character counting

  6. Handle UTF-8 multi-byte characters correctly

    • A byte >= $80 belongs to a multi-byte sequence (lead bytes are $C2-$F4, continuation bytes $80-$BF)

    • Use TextIndexToPos/TextPosToIndex for conversion


Next Steps

After completing this tutorial, continue to:

  • Tutorial 06: TText - Real-World Patterns

demo16_ttext_search.asm
; Demo 16: TText Search with KMP Algorithm
; Demonstrates fast substring searching using Knuth-Morris-Pratt algorithm

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  pIndexTable    dd ?
  hSearch        dd ?
  hText          dd ?
  hResults       dd ?
  nCount         dd ?
  nLastPos       dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 16: TText Search with KMP ===", 13, 10, 13, 10

  cLabel1    text "1. Basic substring search:", 13, 10
  cLabel2    text "2. Case-insensitive search:", 13, 10
  cLabel3    text "3. Multiple occurrences:", 13, 10
  cLabel4    text "4. Pattern with wildcards:", 13, 10

  cText1     text "The quick brown fox jumps over the lazy dog"
  cText2     text "Hello World! HELLO world! hello WORLD!"
  cText3     text "ababababababababababab"
  cText4     text "test*pattern?matching*test"

  cPattern1  text "fox"
  cPattern2  text "HELLO"
  cPattern3  text "abab"
  cPattern4  text "test*"

  cFound     text "  Found at position: "
  cNotFound  text "  Pattern not found", 13, 10
  cCount     text "  Occurrences: "
  cLastPos   text "  Last position: "

  cDone      text 13, 10, "Demo 16 complete!", 13, 10
  cError     text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: Basic substring search
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Create TText
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText1
        mov     [pText], edx

; Prepare search pattern
        stdcall StrDupMem, cPattern1
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

; Prepare KMP search table
        stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

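; Search for "fox" using the prepared KMP table
        stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseSensitive
        jc      .not_found1             ; CF=1 -> not found (EAX=0 is a valid match position)
        mov     [nLastPos], eax         ; save it: FileWriteString clobbers EAX

        stdcall FileWriteString, [STDOUT], cFound
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned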
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        jmp     .demo1_done

.not_found1:
        stdcall FileWriteString, [STDOUT], cNotFound

.demo1_done:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 2: Case-insensitive search
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText2
        mov     [pText], edx

; Free old search
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

; Prepare case-insensitive search
        stdcall StrDupMem, cPattern2
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

        stdcall TextPrepareSearch, [hSearch], tsfCaseIgnore
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

; Search for "HELLO" (case-insensitive)
        stdcall TextSearch, [pText], [hSearch], 0, [pIndexTable], tsfCaseIgnore
        jc      .not_found2             ; CF=1 -> not found ("Hello" matches at position 0)
        mov     [nLastPos], eax         ; save it: FileWriteString clobbers EAX

; Show first occurrence
        stdcall FileWriteString, [STDOUT], <"  Found 'HELLO' at position: ">
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Count all occurrences
        mov     [nCount], 0
        mov     [nLastPos], 0

.find_all:
        stdcall TextSearch, [pText], [hSearch], [nLastPos], [pIndexTable], tsfCaseIgnore
        jc      .done_count             ; CF=1 -> no more matches

        inc     [nCount]
        mov     [nLastPos], eax
        inc     [nLastPos]  ; Start search after this position
        jmp     .find_all

.done_count:
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        jmp     .demo2_done

.not_found2:
        stdcall FileWriteString, [STDOUT], cNotFound

.demo2_done:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 3: Multiple occurrences
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText3
        mov     [pText], edx

; Free old search
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

; Prepare new search
        stdcall StrDupMem, cPattern3
        test    eax, eax
        jz      .error
        mov     [hSearch], eax

        stdcall TextPrepareSearch, [hSearch], tsfCaseSensitive
        test    eax, eax
        jz      .error
        mov     [pIndexTable], eax

; Find all occurrences of "abab"
        mov     [nCount], 0
        mov     [nLastPos], 0

.find_all3:
        stdcall TextSearch, [pText], [hSearch], [nLastPos], [pIndexTable], tsfCaseSensitive
        jc      .done_count3            ; CF=1 -> no more matches
        mov     [nLastPos], eax         ; save the match before EAX is reused

; Show this occurrence
        stdcall FileWriteString, [STDOUT], cFound
        stdcall NumToStr, [nLastPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nCount]
        inc     [nLastPos]              ; resume one byte later, so overlapping matches are found
        jmp     .find_all3

.done_count3:
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 4: Pattern with wildcards
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4
        stdcall FileWriteString, [STDOUT], <"  (Wildcards not implemented in basic TText search)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Use StrLib pattern matching instead", 13, 10>

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]
        stdcall StrDel, [hSearch]
        stdcall FreeMem, [pIndexTable]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextPrepareSearch - Prepare pattern for KMP search
;   Arguments: .hSubstr - String handle containing search pattern
;              .flags - Search flags (tsfCaseSensitive or tsfCaseIgnore)
;   Returns:   EAX - Pointer to KMP prefix table (must be freed with FreeMem)
;              CF=1 on error
;   Notes:     Must be called before TextSearch
;
; TextSearch - Find substring using KMP algorithm
;   Arguments: .pText - TText pointer to search in
;              .hSubstr - String handle containing search pattern
;              .From - Starting position (0 for beginning)
;              .pIndex - Pointer to KMP table from TextPrepareSearch
;              .flags - Search flags (must match TextPrepareSearch)
;   Returns:   EAX - Position of match, or 0 if not found
;              CF=0 on success, CF=1 on not found
;   Notes:     A match at position 0 also returns EAX=0, so test CF, not EAX
;              For multiple searches, set .From to the last match + 1
;
demo17_ttext_unicode.asm
; Demo 17: TText UTF-8 Character Handling
; Demonstrates multi-byte UTF-8 character operations

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  hChars         dd ?
  nIndex         dd ?
  nPos           dd ?
  nCount         dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 17: TText UTF-8 Character Handling ===", 13, 10, 13, 10

  cLabel1    text "1. ASCII text (single-byte UTF-8):", 13, 10
  cLabel2    text "2. European text (2- and 3-byte UTF-8):", 13, 10
  cLabel3    text "3. Asian text (3-byte UTF-8):", 13, 10
  cLabel4    text "4. Mixed text with emojis:", 13, 10

; UTF-8 encoded strings
  cASCII      text "Hello World"
  cEuropean   text "H€llö Wörld"
  cAsian      text "你好世界"
  cMixed      text "Hello 👋 World 🌍 こんにちは"

  cBytePos    text "  Byte position: "
  cCount      text "  Character count: "
  cInfo       text "  [byte] -> [character index] -> [UTF-8 character]"
  cArrow      text " -> "
  cTextColon  text "  Text: "
  cChars      text "  Characters:", 13, 10
  cCharPrefix text "  Character "
  cColonSpace text ": "
  cUTF8Multi  text "[UTF-8 multi-byte] "
  cSpace      text " "
  cAtPosition text " at position "
  cAtByte     text " at byte "
  cNote1      text "  Note: This contains", 13, 10
  cNote2      text "  - ASCII characters (1 byte)", 13, 10
  cNote3      text "  - Emoji characters (4 bytes each in UTF-8)", 13, 10
  cNote4      text "  - CJK characters (3 bytes each)", 13, 10

  cDone       text 13, 10, "Demo 17 complete!", 13, 10
  cError      text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: ASCII text (single-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Create TText with ASCII text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cASCII
        mov     [pText], edx

; Show byte positions and character indices
        stdcall FileWriteString, [STDOUT], cInfo
        stdcall FileWriteString, [STDOUT], cCRLF

; Get the length once, before the loop (TextCompact is called first,
; as elsewhere in this demo, before measuring the text with StrLen)
        stdcall TextCompact, [pText]
        stdcall StrLen, [pText]
        mov     ebx, eax                ; EBX = text length in bytes (kept across stdcall)

; Iterate through text
        mov     [nPos], 0
.iter_ascii:
; Stop at the end of the text
        cmp     [nPos], ebx
        jae     .done_ascii

; Show byte position
        stdcall FileWriteString, [STDOUT], cBytePos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

; Convert to character index (print the arrow first: FileWriteString clobbers EAX)
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall TextPosToIndex, [pText], [nPos]
        stdcall NumToStr, eax, ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

; Get a character (TextGetChar takes only the TText pointer, no position)
        stdcall TextGetChar, [pText]
        cmp     al, 0
        je      .done_ascii

; Show the character code in hex
        movzx   eax, al                 ; Convert 8-bit to 32-bit for NumToStr
        stdcall NumToStr, eax, ntsHex
        push    eax                     ; save the string handle
        stdcall FileWriteString, [STDOUT], <' ['>
        mov     eax, [esp]              ; reload the handle (EAX was clobbered)
        stdcall FileWriteString, [STDOUT], eax
        stdcall FileWriteString, [STDOUT], <'] '>
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cCRLF

; Next position
        inc     [nPos]
        jmp     .iter_ascii

.done_ascii:
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 2: European text (2-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Create TText with European text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cEuropean
        mov     [pText], edx

; Show character mapping
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Show individual characters
        stdcall FileWriteString, [STDOUT], cChars

; Convert each character index to position
        mov     [nIndex], 0
.iter_europe:
; Convert index to position; CF=1 means the index is past the end
; (index 0 maps to position 0, so EAX=0 is not an end marker)
        stdcall TextIndexToPos, [pText], [nIndex]
        jc      .done_europe
        mov     [nPos], eax

; Get a character (TextGetChar takes only the TText pointer, no position)
        stdcall TextGetChar, [pText]
        movzx   eax, al                 ; keep only the first UTF-8 byte
        test    eax, eax
        jz      .done_europe
        push    eax                     ; save it: the calls below clobber EAX

; Show character info
        stdcall FileWriteString, [STDOUT], cCharPrefix
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cColonSpace

; A first byte of $80 or above starts a multi-byte UTF-8 sequence
        cmp     dword [esp], $80
        jb      .show_code_e
        stdcall FileWriteString, [STDOUT], cUTF8Multi

.show_code_e:
; Show the first byte in hex
        pop     eax
        stdcall NumToStr, eax, ntsHex
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cSpace

; Show position
        stdcall FileWriteString, [STDOUT], cAtPosition
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Next character
        inc     [nIndex]
        jmp     .iter_europe

.done_europe:
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 3: Asian text (3-byte UTF-8)
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

; Create TText with Asian text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cAsian
        mov     [pText], edx

; Show text
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Map byte positions to character indices (simplified: steps one byte
; at a time, so every byte of a multi-byte character is listed)
        stdcall StrLen, [pText]
        mov     ebx, eax                ; EBX = text length in bytes (kept across stdcall)
        mov     [nPos], 0

.count_chars:
        cmp     [nPos], ebx
        jae     .count_done             ; stop at the end of the text, not at index 0

; Convert position to character index
        stdcall TextPosToIndex, [pText], [nPos]
        mov     [nIndex], eax           ; save it: FileWriteString clobbers EAX

; Show this mapping
        stdcall FileWriteString, [STDOUT], cCharPrefix
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cAtByte
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Next byte position
        inc     [nPos]
        jmp     .count_chars

.count_done:
; The index of the end-of-text position is the number of characters
        stdcall TextPosToIndex, [pText], ebx
        mov     [nCount], eax
        stdcall FileWriteString, [STDOUT], cCount
        stdcall NumToStr, [nCount], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall FileWriteString, [STDOUT], cCRLF
        stdcall TextFree, [pText]

; ========================================
; Demo 4: Mixed text with emojis
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4

; Create TText with mixed text
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cMixed
        mov     [pText], edx

; Show text
        stdcall FileWriteString, [STDOUT], cTextColon
        stdcall TextCompact, [pText]
        stdcall FileWriteString, [STDOUT], [pText]
        stdcall FileWriteString, [STDOUT], cCRLF

; Note about mixed content
        stdcall FileWriteString, [STDOUT], cNote1
        stdcall FileWriteString, [STDOUT], cNote2
        stdcall FileWriteString, [STDOUT], cNote3
        stdcall FileWriteString, [STDOUT], cNote4

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextIndexToPos - Convert character index to byte position
;   Arguments: .pText - TText pointer
;              .index - Character index (0-based)
;   Returns:   EAX - Byte position, or high value if index > text length
;              CF=0 on success, CF=1 if index beyond text
;   Notes:     Takes UTF-8 encoding into account
;
; TextPosToIndex - Convert byte position to character index
;   Arguments: .pText - TText pointer
;              .Pos - Byte position
;   Returns:   EAX - Character index (0-based)
;   Notes:     Converts byte position to UTF-8 character count
;
demo18_ttext_coords.asm
; Demo 18: TText Coordinate Conversion
; Demonstrates conversion between offset, position, and index coordinate systems

include "%lib%/freshlib.inc"

LINUX_INTERPRETER equ './ld-musl-i386.so'

@BinaryType console, compact
LIB_MODE equ NOGUI

options.DebugMode = 0

include "%lib%/freshlib.asm"

; ========================================
; Data Section
; ========================================
uglobal
  pText          dd ?
  nOffset        dd ?
  nPos           dd ?
  nIndex         dd ?
  nLength        dd ?
endg

iglobal
  cCRLF      text 13, 10

  cTitle     text "=== Demo 18: TText Coordinate Conversion ===", 13, 10, 13, 10

  cLabel1    text "1. Understanding the three coordinate systems:", 13, 10
  cLabel2    text "2. Position to Offset conversion:", 13, 10
  cLabel3    text "3. Offset to Position conversion:", 13, 10
  cLabel4    text "4. Complete conversion chain:", 13, 10

  cText1     text "Hello World"
  cText2     text "The quick brown fox"

  cOffset    text "  Offset: "
  cPos       text "  Position: "
  cIndex     text "  Index: "

  cArrow     text " -> "

  cDone      text 13, 10, "Demo 18 complete!", 13, 10
  cError     text 13, 10, "Error!", 13, 10
endg

; ========================================
; Entry Point
; ========================================
start:
        InitializeAll

        stdcall FileWriteString, [STDOUT], cTitle

; ========================================
; Demo 1: Understanding the three coordinate systems
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel1

; Explain the coordinate systems
        stdcall FileWriteString, [STDOUT], <"  OFFSET   - Raw byte position in buffer (includes gap)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  POSITION - Logical byte position (excludes gap)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  INDEX    - UTF-8 character position", 13, 10>
        stdcall FileWriteString, [STDOUT], cCRLF

; Create TText
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

; Add text
        stdcall TextAddString, [pText], -1, cText1
        mov     [pText], edx

; Show initial state
        stdcall FileWriteString, [STDOUT], <"  Text: 'Hello World'", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Length: 11 bytes, 11 characters (ASCII)", 13, 10>
        stdcall FileWriteString, [STDOUT], <"  Gap: at position 11 (end of text)", 13, 10>

; Show structure fields
        mov     esi, [pText]
        stdcall FileWriteString, [STDOUT], <"  Structure:", 13, 10>
        stdcall FileWriteString, [STDOUT], <"    .Length = ">
        stdcall NumToStr, [esi+TText.Length], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapBegin = ">
        stdcall NumToStr, [esi+TText.GapBegin], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapEnd = ">
        stdcall NumToStr, [esi+TText.GapEnd], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 2: Position to Offset conversion
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel2

; Before moving gap, position equals offset
        stdcall FileWriteString, [STDOUT], <"  With gap at end:", 13, 10>

        mov     [nPos], 0
.convert_loop1:
        cmp     [nPos], 12
        jae     .done_convert1

        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nPos]
        jmp     .convert_loop1

.done_convert1:
        stdcall FileWriteString, [STDOUT], cCRLF

; Now move gap to middle
        stdcall FileWriteString, [STDOUT], <"  Move gap to position 6:", 13, 10>
        stdcall TextMoveGap, [pText], 6

; Show new structure
        mov     esi, [pText]
        stdcall FileWriteString, [STDOUT], <"    .GapBegin = ">
        stdcall NumToStr, [esi+TText.GapBegin], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        stdcall FileWriteString, [STDOUT], <"    .GapEnd = ">
        stdcall NumToStr, [esi+TText.GapEnd], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Positions before the gap still equal their offsets; later ones are shifted
        stdcall FileWriteString, [STDOUT], <"  Position to Offset (gap at 6):", 13, 10>

        mov     [nPos], 0
.convert_loop2:
        cmp     [nPos], 12
        jae     .done_convert2

        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nPos]
        jmp     .convert_loop2

.done_convert2:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 3: Offset to Position conversion
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel3

        stdcall FileWriteString, [STDOUT], <"  Offset to Position (gap at 6):", 13, 10>

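; Walk buffer offsets 0..17; with GapBegin = 6, every offset from 6 up
; lies inside the gap or in the text stored after it
        mov     [nOffset], 0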
.convert_loop3:
        cmp     [nOffset], 18
        jae     .done_convert3

        stdcall TextOffsetToPos, [pText], [nOffset]
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax

        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

        inc     [nOffset]
        jmp     .convert_loop3

.done_convert3:
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Demo 4: Complete conversion chain
; ========================================
        stdcall FileWriteString, [STDOUT], cLabel4

; Create new text
        stdcall TextFree, [pText]
        stdcall TextCreate, sizeof.TText
        test    eax, eax
        jz      .error
        mov     [pText], eax

        stdcall TextAddString, [pText], -1, cText2
        mov     [pText], edx

        stdcall FileWriteString, [STDOUT], <"  Text: 'The quick brown fox'", 13, 10>
        stdcall FileWriteString, [STDOUT], cCRLF

; Show conversions for character index 4 ('q' is the 5th character: T,h,e,space,q)
        stdcall FileWriteString, [STDOUT], <"  Character index 4 ('q' in 'quick'):", 13, 10>

; Index to Position
        stdcall TextIndexToPos, [pText], 4
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cIndex
        stdcall NumToStr, 4, ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Position to Offset
        stdcall TextPosToOffset, [pText], [nPos]
        mov     [nOffset], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Reverse: Offset to Position
        stdcall TextOffsetToPos, [pText], [nOffset]
        mov     [nPos], eax

        stdcall FileWriteString, [STDOUT], cOffset
        stdcall NumToStr, [nOffset], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; Position to Index
        stdcall TextPosToIndex, [pText], [nPos]
        mov     [nIndex], eax

        stdcall FileWriteString, [STDOUT], cPos
        stdcall NumToStr, [nPos], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cArrow
        stdcall FileWriteString, [STDOUT], cIndex
        stdcall NumToStr, [nIndex], ntsDec or ntsUnsigned
        push    eax
        stdcall FileWriteString, [STDOUT], eax
        pop     eax
        stdcall StrDel, eax
        stdcall FileWriteString, [STDOUT], cCRLF

; ========================================
; Cleanup
; ========================================
        stdcall TextFree, [pText]

        stdcall FileWriteString, [STDOUT], cDone

.finish:
        FinalizeAll
        stdcall TerminateAll, 0

.error:
        stdcall FileWriteString, [STDOUT], cError
        FinalizeAll
        stdcall TerminateAll, 1

; ========================================
; FUNCTION REFERENCE
; ========================================
;
; TextPosToOffset - Convert position to offset (including gap)
;   Arguments: .pText - TText pointer
;              .pos - Byte position (excluding gap)
;   Returns:   EAX - Offset in buffer (including gap)
;   Notes:     Use it when you need the raw memory location. Positions
;              past the gap are shifted by the gap size (GapEnd - GapBegin)
;
; TextOffsetToPos - Convert offset to position (excluding gap)
;   Arguments: .pText - TText pointer
;              .offs - Offset in buffer (including gap)
;   Returns:   EAX - Byte position (excluding gap)
;   Notes:     Reverse of TextPosToOffset
;
; TextIndexToPos - Convert character index to byte position
;   Arguments: .pText - TText pointer
;              .index - Character index (UTF-8 aware)
;   Returns:   EAX - Byte position
;   Notes:     A multi-byte UTF-8 character counts as one index but
;              advances the byte position by 2 to 4 bytes
;
; TextPosToIndex - Convert byte position to character index
;   Arguments: .pText - TText pointer
;              .Pos - Byte position
;   Returns:   EAX - Character index
;   Notes:     Counts the UTF-8 characters that begin before .Pos
;
