Lesson 25 of the SQL Optimization Course: Full-Text Indexes

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

DBA
Database developers
Students

We will use MySQL as the demonstration database.

Introduction to Full-Text Indexing

Traditional SQL queries like SELECT COUNT(*) FROM fx WHERE s1 LIKE '%cluster%' suffer from poor performance due to full table scans. Full-Text Indexes solve this by leveraging inverted indexes, which store term frequencies and positions instead of raw data. This approach enables fast text searches, even on large datasets.

Key advantages:

Efficient keyword searching
Supports complex queries (e.g., proximity, relevance ranking)
Reduces reliance on costly full table scans

How Full-Text Indexes Work

Inverted Index Structure

Unlike B-tree indexes, full-text indexes use an inverted index to map terms to documents:

KEY: Keyword (e.g., “cluster”)
VALUE: List of document IDs and term positions

Example Workflow

Index Creation:

1
ALTER TABLE ft_sample ADD FULLTEXT ft_s1(s1);

Query Execution:

1
2
SELECT * FROM ft_sample 
WHERE MATCH(s1) AGAINST('mysql oracle');

Underlying Storage Mechanisms

Auxiliary Tables

MySQL creates hidden tables to manage inverted index data:

fts_00000000000004c2_00000000000001ba_index_1 to _index_6: Store term-document mappings.
fts_being_deleted / fts_deleted: Track pending deletions. These tables follow the naming convention:

1
2

{database_name}/fts_{hex_table_id}_{hex_field_id}_index_{partition}

Buffer Pool

The innodb_ft_index_cache buffer pool caches pending writes to reduce disk I/O. Key variables:

innodb_ft_cache_size: Per-table buffer size (default 8MB).
innodb_ft_total_cache_size: Global buffer limit (default 640MB).

Advanced Features

Document ID Management

FTS_DOC_ID: A hidden column (type BIGINT UNSIGNED) mapping rows to inverted index entries.
Explicit declaration:

1
2
ALTER TABLE ft_sample 
ADD FTS_DOC_ID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

Transaction Behavior

Full-text index updates are deferred until transaction commit:

1
2
3
4
5
BEGIN;
INSERT INTO ft_sample (s1) VALUES ('new document content');
-- Changes not visible until COMMIT
COMMIT;
SELECT * FROM ft_sample WHERE MATCH(s1) AGAINST('new');

Practical Considerations

Limitations:
- No support for partial word matches (LIKE ‘clust%’).
- Requires manual optimization (OPTIMIZE TABLE) to reclaim deleted space.
Performance Tuning:
- Adjust buffer sizes (innodb_ft_cache_size).
- Use EXPLAIN to analyze query plans.

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.