Tutorial 03: StrLib - Advanced Features
Overview
This tutorial covers advanced string processing operations in FreshLib's StrLib. You'll learn about number conversion, text encoding, pattern matching, hash functions, and sort comparison.
Topics Covered
Number Conversion - Converting between numbers and strings
Text Encoding - URL and HTML encoding/decoding
Pattern Matching - Wildcard string matching
Hash Functions - Computing string hashes
Sort Comparison - Comparing strings for sorting
Prerequisites
Completion of Tutorial 01: StrLib Basics
Completion of Tutorial 02: StrLib Manipulation
Understanding of hexadecimal notation
Basic knowledge of URL/HTML encoding
Functions in This Tutorial
Number Conversion
| Function | Purpose |
| NumToStr | Convert number to string |
| NumToStr64 | Convert 64-bit number to string |
| StrToNum | Parse string to number (decimal) |
| StrToNumEx | Parse string with format prefixes ($hex, 101b) |
Text Encoding
| Function | Purpose |
| StrURLEncode | Encode string for URL use |
| StrURLDecode | Decode URL-encoded string |
| StrEncodeHTML | Encode HTML special characters |
| StrDecodeHTML | Decode HTML entities |
Pattern Matching
| Function | Purpose | Flags |
| StrMatchPattern | Match wildcard pattern | Case-sensitive |
| StrMatchPatternNoCase | Match wildcard pattern | Case-insensitive |
Hash and Sort
| Function | Purpose |
| StrHash | Compute FNV-1b hash of string |
| StrCompSort2 | Compare strings for sorting (-1, 0, 1) |
| SetString | Create/set string variable |
Demo Programs
Demo 09: Number Conversion
File: demo09_string_numbers.asm
Demonstrates:
Converting numbers to strings with NumToStr
Converting 64-bit numbers with NumToStr64
Parsing decimal strings with StrToNum
Parsing format strings with StrToNumEx
Key Concepts:
Multiple numeric bases (decimal, hex, binary, octal)
Unsigned vs signed number handling
Format prefixes: $ for hex, b for binary
Error handling for invalid numbers
Demo 10: String Encoding
File: demo10_string_encoding.asm
Demonstrates:
URL encoding with StrURLEncode
URL decoding with StrURLDecode
HTML encoding with StrEncodeHTML
HTML decoding with StrDecodeHTML
Key Concepts:
URL encoding replaces special chars with %XX
HTML encoding replaces <, >, &, ", '
Encoding is used for web safety
Decoding reverses the process
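To make the two encodings concrete, here is a minimal Python sketch using the standard library's urllib.parse and html modules. It only illustrates the transformations described above. It is not StrLib's implementation, and the exact set of characters StrLib escapes may differ.

```python
from urllib.parse import quote, unquote
import html

text = 'a b&c=<"x">'

# URL encoding: every character outside the unreserved set becomes %XX.
url_encoded = quote(text, safe="")

# HTML encoding: <, >, & and the quote characters become entities.
html_encoded = html.escape(text, quote=True)

# Decoding reverses each transformation.
url_roundtrip = unquote(url_encoded)
html_roundtrip = html.unescape(html_encoded)

print(url_encoded)   # a%20b%26c%3D%3C%22x%22%3E
print(html_encoded)  # a b&amp;c=&lt;&quot;x&quot;&gt;
```

Note that the two encodings are independent: URL-encoded text is not HTML-safe by construction, and vice versa.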
Demo 11: Pattern Matching
File: demo11_string_patterns.asm
Demonstrates:
Wildcard matching with StrMatchPattern
Case-insensitive matching with StrMatchPatternNoCase
Using * for any characters
Using ? for single character
Key Concepts:
* matches zero or more characters
? matches exactly one character
Carry flag indicates match result (CF=1 means match)
Useful for file globbing and simple filters
Demo 12: Hash and Sort
File: demo12_string_hash.asm
Demonstrates:
Computing FNV-1b hash with StrHash
Sort comparison with StrCompSort2
Setting string variables with SetString
Key Concepts:
Hash values are used for fast lookup
FNV-1b is a fast, non-cryptographic hash
Sort comparison returns -1, 0, or 1
Negative = less, Zero = equal, Positive = greater
Common Patterns
Number to String Conversion
; Convert unsigned decimal
stdcall NumToStr, eax, ntsDec or ntsUnsigned
mov ebx, eax ; Keep the string handle; EAX may be overwritten by later calls
stdcall FileWriteString, [STDOUT], cLabel
stdcall FileWriteString, [STDOUT], ebx
stdcall StrDel, ebx ; Clean up (assumes EBX is preserved across stdcall calls)
String to Number Conversion
stdcall StrToNum, hString
jc .error_handler ; CF=1 means invalid
; EAX contains the parsed number
Parsing Format Strings
; StrToNumEx handles: $1A (hex), 101b (binary), 077 (octal)
stdcall StrToNumEx, hString
jc .invalid_number
; EAX contains parsed value
Wildcard Matching
; Check if filename matches "*.txt"
stdcall StrMatchPattern, hFilename, "*.txt"
jc .matches ; CF=1 means match
; No match
Computing Hash
stdcall StrHash, hString
; EAX contains hash value
; Same string always produces same hash
NumToStr Conversion Flags
| Flag | Description |
| ntsDec | Decimal base (10) |
| ntsHex | Hexadecimal base (16) |
| ntsBin | Binary base (2) |
| ntsOct | Octal base (8) |
| ntsUnsigned | Treat number as unsigned |
| ntsSigned | Treat number as signed (default) |
Combine flags with OR: ntsHex or ntsUnsigned
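The effect of the base and signedness flags can be sketched in a few lines of Python. This is an illustration of the idea only: the function name num_to_str, the uppercase hex digits, and the leading minus sign are assumptions, and NumToStr's actual output format may differ.

```python
def num_to_str(value, base=10, signed=True):
    """Format a 32-bit value in the given base (2, 8, 10 or 16)."""
    digits = "0123456789ABCDEF"
    value &= 0xFFFFFFFF                  # keep only the low 32 bits
    negative = signed and value >= 0x80000000
    if negative:
        value = 0x100000000 - value      # magnitude of the two's-complement value
    out = ""
    while True:
        value, rem = divmod(value, base)
        out = digits[rem] + out
        if value == 0:
            break
    return "-" + out if negative else out


print(num_to_str(0xFFFFFFFF, 10, signed=True))   # -1
print(num_to_str(0xFFFFFFFF, 10, signed=False))  # 4294967295
print(num_to_str(255, 16, signed=False))         # FF
```

The same 32-bit pattern prints as -1 or 4294967295 depending only on the signedness choice, which is why the unsigned flag matters for values with the top bit set.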
StrToNumEx Format Prefixes
| Prefix | Base | Example | Value |
| $ prefix | Hexadecimal | $1A | 26 |
| b suffix | Binary | 101b | 5 |
| 0 prefix | Octal | 077 | 63 |
| None | Decimal | 42 | 42 |
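The table can be read as a small decision procedure. The Python sketch below follows the table's rules literally. The function name parse_number and the order in which the rules are checked are assumptions, and StrToNumEx may accept more formats or resolve ambiguous input differently. Returning None stands in for the CF=1 error result.

```python
def parse_number(text):
    """Parse '$1A' (hex), '101b' (binary), '077' (octal) or '42' (decimal).

    Returns the value, or None for invalid input.
    """
    if text.startswith("$"):
        digits, base = text[1:], 16
    elif text.endswith("b"):
        digits, base = text[:-1], 2
    elif len(text) > 1 and text.startswith("0"):
        digits, base = text[1:], 8
    else:
        digits, base = text, 10

    # Reject empty input and any character that is not a digit of this base.
    valid = "0123456789abcdef"[:base]
    if not digits or any(c not in valid for c in digits.lower()):
        return None
    return int(digits, base)


for sample in ["$1A", "101b", "077", "42", "12x"]:
    print(sample, "->", parse_number(sample))
```

Checking the $ prefix before the b suffix matters in this sketch: a string such as $1b is treated as hexadecimal 27 rather than as a malformed binary number.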
Pattern Matching Wildcards
| Wildcard | Matches | Example |
| * | Any characters (0 or more) | *.txt matches all .txt files |
| ? | Exactly one character | file?.txt matches file1.txt, fileA.txt |
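For readers curious how such matching can work, here is a minimal Python sketch of a common backtracking approach for * and ?. It is not taken from the StrLib source. The name match_pattern and the ignore_case parameter are hypothetical, and the boolean result plays the role of the carry flag.

```python
def match_pattern(text, pattern, ignore_case=False):
    """Return True if text matches pattern ('*' = any run, '?' = one char)."""
    if ignore_case:
        text, pattern = text.lower(), pattern.lower()
    t = p = 0
    star = -1   # index of the most recent '*' in pattern, -1 if none seen
    mark = 0    # text index where that '*' currently stops matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star, mark = p, t       # remember the star, try matching nothing
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == text[t]):
            t += 1
            p += 1
        elif star != -1:
            mark += 1               # backtrack: the star absorbs one more char
            t, p = mark, star + 1
        else:
            return False
    # Any '*' left at the end of the pattern can match the empty string.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


print(match_pattern("report.txt", "*.txt"))                   # True
print(match_pattern("file12.txt", "file?.txt"))               # False
print(match_pattern("demo.ASM", "*.asm"))                     # False
print(match_pattern("demo.ASM", "*.asm", ignore_case=True))   # True
```

The last two calls mirror the difference between a case-sensitive and a case-insensitive match on a file extension.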
Hash Values
StrHash computes a 32-bit FNV-1b hash:
Deterministic: Same input always produces same hash
Fast: Efficient for hash tables
Non-cryptographic: Not for security, just lookup
Good distribution: Minimizes collisions
Typical uses:
Hash table keys
Quick comparison
Cache indexing
Deduplication
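The properties above are easy to see in code. Note that the published FNV variants are FNV-1 and FNV-1a, while this tutorial uses the name FNV-1b for StrHash. The Python sketch below implements standard 32-bit FNV-1a purely as an illustration of the family. It is an assumption that StrHash behaves similarly, and the values it returns may not match.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                          # mix in the next byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF   # multiply, keep 32 bits
    return h


print(hex(fnv1a_32(b"")))   # 0x811c9dc5
print(hex(fnv1a_32(b"a")))  # 0xe40c292c
```

Each input byte costs one XOR and one multiplication, which is what makes this family attractive for hash tables and unsuitable for security purposes.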
Sort Comparison
StrCompSort2 returns a three-state comparison result:
-1: First string < Second string
0: Strings are equal
1: First string > Second string
This three-state result is exactly what sorting algorithms and binary search need.
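As a rough illustration of how a three-state comparator drives both operations, here is a short Python sketch. The compare function is a stand-in that uses Python's own string ordering. How StrCompSort2 orders strings (for example with respect to case) is not specified in this tutorial, so treat the ordering here as an assumption.

```python
from functools import cmp_to_key


def compare(a, b):
    """Three-state comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        order = compare(items[mid], target)
        if order == 0:
            return mid
        if order < 0:
            lo = mid + 1    # middle item is smaller: search the upper half
        else:
            hi = mid - 1    # middle item is larger: search the lower half
    return -1


names = ["pear", "apple", "fig"]
sorted_names = sorted(names, key=cmp_to_key(compare))
print(sorted_names)                         # ['apple', 'fig', 'pear']
print(binary_search(sorted_names, "fig"))   # 1
```

Both the sort and the search only ever look at the sign of the comparator's result, which is why a single function serves both.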
Important Discoveries During Implementation
1. StrToNumEx Crashes with Inline String Constants
Problem: Using inline string constants with StrToNumEx caused segmentation faults.
Discovery: StrToNumEx expects a string handle or pointer to a null-terminated string, but inline constants created with < > syntax don't provide stable memory addresses.
What Fails:
stdcall StrDupMem, <"$FF"> ; Inline constant
jc .error
mov [hResult], eax
stdcall StrToNumEx, [hResult] ; Crashes!
Solution: Always define string constants in the iglobal section using the text macro:
iglobal
cHexFF text "$FF"
endg
; Later in code:
stdcall StrDupMem, cHexFF ; Use constant label
jc .error
mov [hResult], eax
stdcall StrToNumEx, [hResult] ; Works!
Impact: This pattern applies to all functions that need to parse string content (not just display it).
2. Encoding/Decoding Functions Have Inconsistent Return Behavior
Problem: Initial implementation assumed all encode/decode functions returned new strings, causing double-free errors and segfaults.
Discovery: The encoding and decoding functions have DIFFERENT behaviors:
Functions that RETURN new strings:
StrURLEncode - Returns NEW string, source unchanged
StrEncodeHTML - Returns NEW string, source unchanged
Functions that MODIFY in place:
StrURLDecode - Modifies source string directly, returns nothing meaningful
StrDecodeHTML - Modifies source string directly, returns nothing meaningful
Correct Pattern for URL Encoding/Decoding:
; Encode - creates NEW string
stdcall StrURLEncode, [hSource]
mov [hEncoded], eax ; New string returned
; Decode - modifies IN PLACE
stdcall StrURLDecode, [hEncoded]
; hEncoded now contains decoded version
; No new string created!
; Cleanup
stdcall StrDel, [hSource] ; Delete original
stdcall StrDel, [hEncoded] ; Delete encoded (now decoded)
Impact: Treating decode functions as returning new strings causes:
Attempts to free garbage pointers (segfault)
Memory leaks (original string never freed)
3. Quote Characters Need Escape or Alternative Representation
Problem: String constants containing both single and double quotes caused FASM syntax errors.
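Discovery: A FASM string literal can be delimited by either single or double quotes, but the delimiter character itself cannot appear as-is inside the literal, so a string containing both kinds of quote needs special handling.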
What Fails:
cLabel text "HTML: '<script>alert("XSS")</script>'" ; Syntax error!
Solutions:
Option 1: Use character code 34 for double quote:
cLabel text "HTML: '<script>alert(", 34, "XSS", 34, ")</script>'"
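Option 2: Use a single-quoted outer string (note that the inner single quotes are replaced by double quotes, so the displayed text changes slightly):
cLabel text 'HTML: "<script>alert("XSS")</script>"'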
Impact: Understanding FASM's quote handling prevents build errors with complex strings.
4. NumToStr64 Low/High Parameter Order
Problem: Initial implementation of 64-bit number conversion produced wrong values.
Discovery: NumToStr64 takes .low, .high, .flags - the LOW 32 bits come FIRST.
Example for 1234567890123:
; 1234567890123 = 0x0000011F71FB04CB
; High 32 bits: 0x0000011F
; Low 32 bits: 0x71FB04CB
; Pass the LOW dword first, then the HIGH dword:
stdcall NumToStr64, 0x71FB04CB, 0x11F, ntsDec or ntsUnsigned
; The first attempt used a wrong split:
; stdcall NumToStr64, 0x11A7DB, 0x1, ntsDec or ntsUnsigned
; It assembled and ran, but it encodes 4296124379, not 1234567890123.
Lesson: When working with 64-bit values, use a calculator to get the exact high/low split and test the output carefully.
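Any scripting language can serve as that calculator. The short Python sketch below computes the low and high dwords for 1234567890123 and shows which value a given low/high pair actually encodes. The helper names split_u64 and join_u64 are made up for this example.

```python
def split_u64(value):
    """Return the (low, high) 32-bit halves of an unsigned 64-bit value."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    return low, high


def join_u64(low, high):
    """Rebuild the 64-bit value from its low and high dwords."""
    return (high << 32) | low


low, high = split_u64(1234567890123)
print(hex(low), hex(high))        # 0x71fb04cb 0x11f
print(join_u64(0x11A7DB, 0x1))    # 4296124379
```

Running join_u64 on the arguments you intend to pass is a quick sanity check before assembling.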
5. Pattern Matching is Case-Sensitive by Default
Discovery: StrMatchPattern is case-sensitive, which caught us by surprise when testing file extension matching.
Example:
stdcall StrMatchPattern, "demo.ASM", "*.asm"
; Returns CF=0 (NO MATCH) - case matters!
stdcall StrMatchPatternNoCase, "demo.ASM", "*.asm"
; Returns CF=1 (MATCH) - case ignored
Best Practice: For file operations and user input, always use StrMatchPatternNoCase unless you specifically need case sensitivity.
6. String Constants Defined with text and db Behave Differently
Problem: Constants defined with text vs db behaved inconsistently.
Discovery: The text macro provides several guarantees:
Automatic null termination
Proper alignment (dword-aligned)
Correct length calculation
Always Use text for Strings:
iglobal
cHello text "Hello" ; Correct: null-terminated, aligned
cWorld db "World", 0 ; Wrong: manual null, no alignment guarantee
endg
Impact: Using db for strings can cause:
Missing null terminators
Alignment issues (slower access)
Incorrect length calculations
Building and Running
cd 03-strlib-advanced
./build.sh
This will compile and test all 4 demos.
Next Steps
After completing this tutorial, continue to:
Tutorial 04: TText Basics (gap buffer data structure)
Reference
FreshLib Source: ~/Documents/fossil/FreshIDE/freshlib/data/strlib.asm
AsmBB Usage: ~/Documents/fossil/asmbb/source/*.asm
FNV Hash: Fowler–Noll–Vo_hash_function