How does the TranslateMiniMag function work?

23.11.2025

The TranslateMiniMag function, written in assembly language, is designed to convert text formatted with a custom markup language called "MiniMag" into HTML code. It processes the input text in two main passes: first to identify and process link definitions, and second to parse and translate the remaining structure and inline formatting into HTML.

Here is a breakdown of how the function works, section by section:

1. Initialization and Setup

The function sets up the environment before processing:

  • State Variables: Initializes the local variables, including the formatting-control flags, .whspc (whitespace count) and .hdr (header level).

  • Data Structures:

    • Allocates a memory block for link definitions (.pLinks), which acts as a hash table.

    • Creates a dynamic stack array (.pStack) to manage the nesting of MiniMag and HTML tags, storing TMDItem structures (containing tag type and possibly indentation).

  • Gap Positioning: Uses TextMoveGap to move the "gap" of the text buffer to the start position given by .from. TText is organized as a gap buffer: it keeps an unused region (the gap) at the current editing position, so generated HTML can be written into the gap and consumed source text can be dropped by advancing the gap's end, without shifting the rest of the text in memory on every change.
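The gap technique itself can be shown with a minimal gap buffer. This Python sketch is generic and does not mirror the TText layout; GapBuffer, move_gap, insert and delete_forward are hypothetical names standing in for TextMoveGap, writing into the gap, and advancing GapEnd.

```python
class GapBuffer:
    """Text stored as buf[:gap_start] + buf[gap_end:]; the slots in between are the gap."""

    def __init__(self, text: str, gap_size: int = 16):
        self.buf = list(text) + [""] * gap_size
        self.gap_start = len(text)
        self.gap_end = len(self.buf)

    def move_gap(self, pos: int) -> None:
        """Move the gap so it begins at logical position pos (cf. TextMoveGap)."""
        if pos < self.gap_start:
            n = self.gap_start - pos
            # Shift the n characters before the gap to just below gap_end.
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start -= n
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            # Shift the n characters after the gap down to gap_start.
            self.buf[self.gap_start:self.gap_start + n] = self.buf[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def insert(self, s: str) -> None:
        """Write s into the gap at the cursor, growing the gap if it is too small."""
        if len(s) > self.gap_end - self.gap_start:
            extra = len(s) + 16
            self.buf[self.gap_end:self.gap_end] = [""] * extra
            self.gap_end += extra
        self.buf[self.gap_start:self.gap_start + len(s)] = list(s)
        self.gap_start += len(s)

    def delete_forward(self, n: int) -> None:
        """Drop n characters after the cursor by widening the gap (cf. TText.GapEnd)."""
        self.gap_end = min(self.gap_end + n, len(self.buf))

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("a < b")
gb.move_gap(2)          # cursor before '<'
gb.delete_forward(1)    # consume '<'
gb.insert("&lt;")       # emit its entity
print(gb.text())        # a &lt; b
```

Only the characters between the old and new gap position are copied on each move, which is why a translator that works front to back can emit output and consume input cheaply.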

2. First Pass: Link Definition Processing

The first pass (.fp_new_line to .second_pass) focuses on identifying and extracting link definitions, which are expected to appear at the start of a line and use the format [label] url.

  • Line-by-Line Scan: It iterates through the text character by character, primarily looking for the starting bracket [ at the beginning of a line.

  • Link Label Extraction and Hashing:

    • Upon finding [, it moves the text gap to that position (TextMoveGap).

    • It then calculates a hash value for the label (the text between [ and ]). This hash is used to quickly look up the link's definition later.

    • It extracts the label content into a temporary string (.tmp_link).

  • URL Extraction: After the closing ], it skips any whitespace and then extracts the URL which is expected to follow on the same line.

  • Storing the Link: The full link definition (label and URL) is concatenated into .tmp_link, and a pointer to this string is stored in the .pLinks hash table at the slot selected by the hash. Collisions are handled crudely: if the slot is already occupied, the existing entry is discarded or overwritten instead of being chained or probed, so one of two colliding definitions is lost.

  • Deleting the Definition: The entire line containing the processed link definition is "deleted" from the output text by extending the text gap ([edx+TText.GapEnd]).

  • HTML Encoding Check: If a character is not part of a link definition and needs HTML encoding (like < or &), it performs the encoding by replacing the original character with its HTML entity within the text gap.
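The first pass can be approximated in a few lines of Python. This is a sketch, not the function's algorithm: it works on a Python string rather than the gap buffer, uses FNV-1a only as a stand-in hash, assumes a table of 256 slots, and checks the stored label on lookup, which may differ from the assembly code. The HTML encoding done during this pass is omitted.

```python
import re

TABLE_SIZE = 256  # hypothetical; the actual size of .pLinks is not given

# "[label] url" at the start of a line, nothing else after the URL.
LINK_DEF = re.compile(r"^\[([^\]\n]+)\][ \t]*(\S+)[ \t]*$")


def fnv1a(label: str) -> int:
    """32-bit FNV-1a, a stand-in for whatever hash TranslateMiniMag uses."""
    h = 0x811C9DC5
    for byte in label.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def first_pass(text: str):
    """Collect link definitions into a fixed-size table and drop their lines."""
    table = [None] * TABLE_SIZE
    kept = []
    for line in text.split("\n"):
        m = LINK_DEF.match(line)
        if m:
            label, url = m.group(1), m.group(2)
            # Crude collision handling: the slot's previous entry is simply overwritten.
            table[fnv1a(label) % TABLE_SIZE] = (label, url)
            continue  # the definition line does not reach the output
        kept.append(line)
    return "\n".join(kept), table


def lookup(table, label: str):
    """Return the URL for label, or None if its slot is empty or holds another label."""
    entry = table[fnv1a(label) % TABLE_SIZE]
    if entry is not None and entry[0] == label:
        return entry[1]
    return None


body, links = first_pass("See [docs] for details.\n[docs] https://example.com/docs\nEnd.")
print(body)                   # the definition line is gone
print(lookup(links, "docs"))  # https://example.com/docs
```

Because definitions are collected before any inline text is translated, a link may be used above the line that defines it.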

3. Second Pass: Structure and Tag Formatting

The second pass (.second_pass onwards) processes the remaining text, now stripped of link definitions, to convert MiniMag block and inline elements into HTML.

A. Block-Level Element Identification (Start of Line)

The processing is done line by line, starting at .start_of_line. It determines the type of block element at the start of the line:

  • MiniMag Commands (;...):

    • Checks for block commands like ;spoiler, ;quote, ;table, ;begin (for code/non-formatted block), ;ulist, ;olist, and ;end.

    • For opening commands (e.g., ;quote), it calls .close_all_non_minimag to close any open non-MiniMag block tags (like <p>) before opening the new tag (e.g., <blockquote>). It then pushes the new tag type onto .pStack and inserts the corresponding HTML prefix and suffix (e.g., <blockquote...>).

    • For the ;end command, it calls .pop_one_tag to close the last opened block tag.

    • The command and any following arguments/whitespace up to the EOL are consumed (deleted via [edx+TText.GapEnd]).

  • Horizontal Rule (;---): If found, it checks the stack for table-related tags (<td>, <th>, <tr>, <table>). Inside a table, it closes the open cell and row tags as needed and opens a new row (<tr>); otherwise, it inserts an <hr> tag.

  • Header (#): Counts consecutive # characters to determine the header level (1-6). If the level is valid, it stores it in .hdr.

  • List Item (*): If a * marker is followed by a space, the line starts a list item. (The related Markdown translation code also accepts - and +, but this function checks only for *.)

  • Whitespace and Indent: Counts leading whitespace (.whspc).

  • Paragraph/Block Tag Opening (.add_leaf): If the line contains content and no block tag has been opened yet, it determines the appropriate block tag:

    • If inside a <table>, it opens a <tr> and then a <th> if a header marker was detected, or a <td> otherwise.

    • If a header was detected (.hdr is set), it opens the corresponding header tag (<h1> to <h6>).

    • Otherwise, it opens a Paragraph tag (<p>).

    • The tag is pushed onto the .pStack using .open_tag (which calls .prefix_tag, .add_to_stack, and .suffix_tag).
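The start-of-line decisions above can be summarized as a small classifier. This is a Python sketch under stated assumptions: it works on whole lines instead of a gap buffer, classify_line and leaf_tag are hypothetical names, header markers may be followed by spaces, and a list item is returned as an <li> leaf, which simplifies the real list logic.

```python
# Block commands listed in the description (";---" is handled separately).
COMMANDS = {";spoiler", ";quote", ";table", ";begin", ";ulist", ";olist", ";end"}


def leaf_tag(hdr: int, in_table: bool) -> str:
    """Pick the leaf block tag as .add_leaf is described to do."""
    if in_table:
        # The real code opens a <tr> before the cell; only the cell tag is returned here.
        return "th" if hdr else "td"
    return f"h{hdr}" if hdr else "p"


def classify_line(line: str, in_table: bool = False) -> dict:
    """Classify the start of one MiniMag line (simplified sketch)."""
    if not line.strip():
        return {"kind": "empty"}
    if line.startswith(";---"):
        # Inside a table the rule starts a new row; elsewhere it becomes <hr>.
        return {"kind": "new_row" if in_table else "hr"}
    if line.startswith(";"):
        word = line.split(None, 1)[0]
        if word in COMMANDS:
            return {"kind": "command", "name": word[1:]}
    # Header: count consecutive '#'; only levels 1..6 count as a header.
    hdr = len(line) - len(line.lstrip("#"))
    if not 1 <= hdr <= 6:
        hdr = 0
    rest = line[hdr:].lstrip(" ") if hdr else line
    # Leading whitespace, kept like .whspc.
    content = rest.lstrip(" \t")
    whspc = len(rest) - len(content)
    # List item: '*' followed by a space (the only marker this function checks).
    is_item = content.startswith("* ")
    if is_item:
        content = content[2:]
    tag = "li" if is_item else leaf_tag(hdr, in_table)
    return {"kind": "leaf", "tag": tag, "hdr": hdr, "whspc": whspc, "text": content}


print(classify_line(";quote"))    # {'kind': 'command', 'name': 'quote'}
print(classify_line("## Title"))  # {'kind': 'leaf', 'tag': 'h2', 'hdr': 2, 'whspc': 0, 'text': 'Title'}
```

Keeping the in_table flag as a parameter mirrors how the real code consults the stack: the same ;--- or # line produces different HTML depending on what is currently open.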

B. Inline Element Processing and HTML Encoding

The inner loop (.skip_to_eol to .eol_found) processes inline content:

  • HTML Encoding: Encodes characters into HTML entities (e.g., < to &lt;) unless mmfDontFormat or mmfNoHTML flags are set.

  • Inline Formatting Tags: Checks for the markers * (bold, <strong>), / (italic, <em>), _ (underline, <ins>) and - (strikethrough, <del>).

    • If a marker matches the type of the last tag on the stack, it closes that tag.

    • Otherwise, it opens the corresponding tag and pushes it onto the stack, provided the preceding character is whitespace (or the marker is at the start of the line or text) and the following character is not whitespace. This keeps * from being treated as a bold marker inside words like mini*mag*.

  • Inline Code Marker (`): Toggles the mmfNoInline flag and opens or closes the tagInlineCode tag.

  • Links and Images ([):

    • This is where the previously defined links are used, or inline links are resolved.

    • It checks for image/video/anchor/URL markers (!, ?, $, #, or nothing).

    • It recalculates the label's hash and searches the .pLinks table for a definition.

    • If found (a defined link), or if it's an inline link, it determines the URL/label part.

    • It then inserts the opening tag, the sanitized URL (via the call to .procSanitizeURL), and the closing tag, effectively replacing the MiniMag link syntax with the HTML tag.

  • Emoji: Checks for UTF-8 character sequences that match known emojis using IsEmoji and replaces them with an HTML span structure (using templates in emoJ).
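The inline emphasis rules can be sketched as follows. The marker-to-tag table comes from the description above; the function name render_inline is hypothetical, the mmfDontFormat, mmfNoHTML and mmfNoInline flags are not modeled, and Python's html.escape stands in for the function's own entity encoding.

```python
import html

# Inline markers and the HTML tags they map to, per the description.
MARKERS = {"*": "strong", "/": "em", "_": "ins", "-": "del"}


def render_inline(text: str) -> str:
    """Translate inline emphasis markers in one line of text (simplified sketch)."""
    out = []
    stack = []  # open inline tags, innermost last
    for i, ch in enumerate(text):
        tag = MARKERS.get(ch)
        if tag is not None:
            if stack and stack[-1] == tag:
                # Same marker as the innermost open tag: close it.
                out.append(f"</{tag}>")
                stack.pop()
                continue
            # Open only after whitespace (or at the start) and before non-whitespace.
            prev_ok = i == 0 or text[i - 1].isspace()
            next_ok = i + 1 < len(text) and not text[i + 1].isspace()
            if prev_ok and next_ok:
                out.append(f"<{tag}>")
                stack.append(tag)
                continue
        # Anything else is copied with HTML-special characters encoded.
        out.append(html.escape(ch, quote=False))
    while stack:  # this sketch closes anything still open at the end of the input
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(render_inline("*bold* and /italic/"))  # <strong>bold</strong> and <em>italic</em>
print(render_inline("mini*mag* a < b"))      # mini*mag* a &lt; b
```

Closing only when the marker matches the innermost open tag means overlapping spans such as *a /b* c/ are not balanced by this sketch; the second * is then treated as an opener candidate instead.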

C. End of Line and End of Text

  • End of Line: When EOL is found (.eol_found), processing jumps back to .start_of_line to handle the next line.

  • Empty Line (.empty_line_found): When an empty line is encountered, it checks the stack and closes "text leaf blocks" (like <p>, <th>, <td>, <li>) but typically keeps major blocks (like <blockquote> or <ul>/<ol>) open, enabling paragraph separation.

  • End of Text (.end_of_text): All remaining open tags on the stack are closed, and the allocated memory for the link table and stack is freed.
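The tag-stack bookkeeping from sections A and C can be sketched like this. Item is a hypothetical stand-in for TMDItem that records only the tag name and whether a ; command opened it; real tags carry attributes that are omitted here. That ;end first closes auto-opened tags such as <p> is an assumption of this sketch, not something the description states.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for TMDItem: tag name and whether a ';' command opened it."""
    tag: str
    minimag: bool


# "Text leaf blocks" named in the description; closed by an empty line.
LEAVES = {"p", "th", "td", "li"}


class TagStack:
    def __init__(self):
        self.items = []  # open tags, innermost last (like .pStack)
        self.out = []    # generated HTML fragments

    def text(self, s: str) -> None:
        self.out.append(s)

    def open_tag(self, tag: str, minimag: bool = False) -> None:
        self.out.append(f"<{tag}>")
        self.items.append(Item(tag, minimag))

    def pop_one_tag(self) -> None:
        item = self.items.pop()
        self.out.append(f"</{item.tag}>")

    def close_all_non_minimag(self) -> None:
        # Close auto-opened tags (e.g. <p>) down to the nearest command-opened block.
        while self.items and not self.items[-1].minimag:
            self.pop_one_tag()

    def end_command(self) -> None:
        # ";end": assumed here to close leaves first, then the command's own block.
        self.close_all_non_minimag()
        if self.items:
            self.pop_one_tag()

    def empty_line(self) -> None:
        # Close leaf blocks only; enclosing blocks such as <blockquote> stay open.
        while self.items and self.items[-1].tag in LEAVES:
            self.pop_one_tag()

    def end_of_text(self) -> None:
        while self.items:
            self.pop_one_tag()

    def result(self) -> str:
        return "".join(self.out)


s = TagStack()
s.close_all_non_minimag()
s.open_tag("blockquote", minimag=True)  # ";quote"
s.open_tag("p")
s.text("quoted text")
s.empty_line()                           # closes <p>, keeps <blockquote>
s.open_tag("p")
s.text("more")
s.end_of_text()
print(s.result())  # <blockquote><p>quoted text</p><p>more</p></blockquote>
```

Tracking which entries were opened by a ; command is what lets an empty line end a paragraph while the surrounding quote or list survives.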

The function returns the pointer to the modified TText structure (edx), which now contains the HTML output.
