feat: optimize collecting entity when match empty column in entityCollecting context #467

Merged
Cythia828 merged 1 commit into feat/emptyColumn from feat/emptyColumn_optimize on May 20, 2026
@Cythia828 Cythia828 commented Mar 24, 2026

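Problem addressed

This continues the work from #457. Column entity collection is now supported for derived tables, subqueries, and query results across the major dialects, and the parse tree stays intact when the SQL is syntactically incomplete.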

Preview URL

https://cythia828.github.io/monaco-sql-languages/

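Changes

This commit applies a uniform enhancement to entity collection in seven dialects (MySQL, PGSQL, Flink, Spark, Hive, Impala, Trino). The core goal: when the syntax is incomplete, match an empty column through the semantic predicate shouldMatchEmpty() so that the parse tree stays intact and entity collection keeps working.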

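Change details

  • Grammar layer: each dialect's select column rule gains an empty-column alternative. MySQL is additionally simplified: dottedIdAllowEmpty is removed (it is equivalent to dottedId), and columnName now uses dottedId instead of dottedIdAllowEmpty.
  • Semantic predicate layer: shouldMatchEmpty() is refactored to apply three checks:
  1. Mode check: returns true only when entityCollecting=true or caretTokenIndex >= 0 (completion mode); in validate mode it returns false so that ANTLR reports the error naturally.
  2. Position check: when caretTokenIndex is specified, checks whether the caret lies between the tokens immediately before and after the current position, so that an empty column is matched only at the caret.
  3. Fallback: when there is no caret position, all empty columns are matched.
  • EntityCollector layer:
  1. MySQL: adds exitSelectElement_dot_empty to handle the uid DOT pattern; exitSelectLiteralColumnName gains an empty-column range check; endContextList is updated to point at the labeled alternatives (SelectElement_labelContext / SelectElement_exprContext).
  2. PGSQL: adds exitTarget_empty (a no-op) and exitTarget_dot_empty (collects the column entity).
  3. Common layer: entityCollector.ts changes its filtering logic to keep empty QUERY_RESULT entities (text === ''), so that select from t1 still yields a query result entity.
  • Test layer: adds or updates the suggestionWithEntity.test.ts cases for each dialect, covering empty-column completion, empty-column completion with commas, isContainCaret semantics, the uid DOT pattern, and similar scenarios.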
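The three checks in shouldMatchEmpty() can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the EmptyMatchState and TokenLike shapes, and the idea of passing the previous and next tokens as arguments, are assumptions made so the example is self-contained. Only the names entityCollecting, caretTokenIndex, and shouldMatchEmpty come from the description above.

```typescript
// Hypothetical token shape for this sketch; a real parser would use the ANTLR token type.
interface TokenLike {
    tokenIndex: number;
}

// Hypothetical parser state. Field names follow the PR description.
interface EmptyMatchState {
    entityCollecting: boolean;
    caretTokenIndex: number; // -1 when no caret position was supplied
}

function shouldMatchEmpty(
    state: EmptyMatchState,
    prev: TokenLike | undefined, // token just before the empty slot (assumed input)
    next: TokenLike | undefined // token just after the empty slot (assumed input)
): boolean {
    const hasCaret = state.caretTokenIndex >= 0;

    // 1. Mode check: in plain validation, never match an empty column,
    //    so the parser reports the missing column as an error.
    if (!state.entityCollecting && !hasCaret) return false;

    // 3. Fallback: without a caret position, every empty column may match.
    if (!hasCaret) return true;

    // 2. Position check: match only when the caret lies between the
    //    previous token and the next token.
    const lower = prev ? prev.tokenIndex : -1;
    const upper = next ? next.tokenIndex : Number.POSITIVE_INFINITY;
    return state.caretTokenIndex >= lower && state.caretTokenIndex <= upper;
}
```

In a grammar, a predicate like this would typically guard the empty alternative of the select column rule, so that the alternative is only viable when the predicate returns true.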

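Why these changes

Problem / Cause / Solution

  1. Problem: the parse tree breaks when the syntax is incomplete. Cause: ANTLR reports an error as soon as it meets a missing column and cannot continue building the AST. Solution: add an empty-column alternative to the select column rule, with a semantic predicate controlling when it matches.
  2. Problem: validate must neither raise false errors nor miss real ones. Cause: empty-column matching would affect prediction outside entity collection. Solution: shouldMatchEmpty() returns true only when entityCollecting is set or caretTokenIndex is valid, and never matches in validate mode.
  3. Problem: completion fires at the wrong position. Cause: matching empty columns globally triggers completion where it should not. Solution: add the caretTokenIndex position check so that matching happens only near the caret.
  4. Problem: no QUERY_RESULT is collected for an empty column. Cause: after an empty-column match, the entity has text === '' and is filtered out. Solution: entityCollector keeps QUERY_RESULT entities with empty text.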
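The filtering change in item 4 can be illustrated with a small sketch. The CollectedEntity shape and the kind field are hypothetical and chosen only for this example; the real entityCollector.ts may structure its entities and its filter differently. What the sketch preserves is the rule stated above: entities with empty text are dropped unless they are QUERY_RESULT entities.

```typescript
// Hypothetical entity kinds and shape, used only for this illustration.
type EntityKind = 'QUERY_RESULT' | 'TABLE' | 'COLUMN';

interface CollectedEntity {
    kind: EntityKind;
    text: string;
}

// Keep every entity that has text; keep an empty-text entity only
// when it is a QUERY_RESULT, e.g. the result of `select from t1`.
function keepEntity(entity: CollectedEntity): boolean {
    if (entity.text !== '') return true;
    return entity.kind === 'QUERY_RESULT';
}

function filterEntities(entities: CollectedEntity[]): CollectedEntity[] {
    return entities.filter(keepEntity);
}
```

With this rule, an empty select list still produces a query result entity, while empty column or table entities continue to be discarded.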

Remaining issues

PGSQL does not report an error for some syntactically incomplete statements, for example: select from tb1

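Why PGSQL does not report an error

Root cause: targetList? in sqlExpression is deliberately optional (?), in order to accept the following valid PG syntax:

SELECT;                                    -- valid, returns an empty result set
SELECT INTO new_table;                     -- valid, creates a table with no data
SELECT DISTINCT ON (...) INTO TEMP TABLE t FROM ... -- valid, SQL-standard SELECT INTO

If targetList? were changed to a mandatory targetList, all three valid statements above would be reported as errors.
Full chain: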
simpleSelect: KW_SELECT (KW_ALL? intoClause? | distinctClause?) sqlExpression
sqlExpression: targetList? intoClause? fromClause? ...
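For select from t1:

  1. KW_SELECT → select
  2. KW_ALL? intoClause? → neither matches (the next token is not the INTO keyword)
  3. sqlExpression → targetList? matches empty + fromClause? matches FROM t1 → succeeds, no error

A complete fix would require restructuring simpleSelect so that SELECT INTO and plain SELECT become separate alternatives, but that involves extensive grammar changes and carries high risk. The current behavior is left as is, and all tests pass.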
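The trade-off described above can be reproduced with a toy matcher. This is a deliberately simplified sketch, not the ANTLR-generated PostgreSQL parser: parseSimpleSelect and its targetListOptional flag are hypothetical, the target list is reduced to a single token, and only the SELECT, INTO, and FROM parts of the chain are modelled.

```typescript
// Toy recognizer for: SELECT targetList? intoClause? fromClause?
// Hypothetical helper for illustration only.
function parseSimpleSelect(sql: string, targetListOptional: boolean): boolean {
    const tokens = sql
        .trim()
        .replace(/;$/, '')
        .split(/\s+/)
        .filter(Boolean)
        .map((t) => t.toUpperCase());
    let pos = 0;
    const peek = (): string | undefined => tokens[pos];

    // KW_SELECT
    if (peek() !== 'SELECT') return false;
    pos++;

    // targetList: simplified to one token that is not a clause keyword.
    const atClauseOrEnd = peek() === undefined || peek() === 'FROM' || peek() === 'INTO';
    if (!atClauseOrEnd) {
        pos++;
    } else if (!targetListOptional) {
        return false; // a mandatory targetList is missing
    }

    // intoClause?: INTO followed by a table name
    if (peek() === 'INTO') {
        pos++;
        if (peek() === undefined) return false;
        pos++;
    }

    // fromClause?: FROM followed by a table name
    if (peek() === 'FROM') {
        pos++;
        if (peek() === undefined) return false;
        pos++;
    }

    return pos === tokens.length;
}
```

With targetListOptional set to true, select from t1, SELECT; and SELECT INTO new_table; are all accepted. With it set to false, all three are rejected, which is why simply making targetList mandatory is not an option here.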

@Cythia828 Cythia828 marked this pull request as draft March 24, 2026 07:59
@Cythia828 Cythia828 marked this pull request as ready for review May 9, 2026 08:25
@Cythia828 Cythia828 requested review from liuxy0551 and mumiao May 11, 2026 01:41
@Cythia828 Cythia828 changed the title feat: optimize collecting entity when match empty column in entityCollecting context feat: optimize collecting entity when match empty column in entityCollecting context[05.15] May 11, 2026
@Cythia828 Cythia828 changed the title feat: optimize collecting entity when match empty column in entityCollecting context[05.15] feat: optimize collecting entity when match empty column in entityCollecting context May 19, 2026
@Cythia828 Cythia828 added the 5.29 label May 19, 2026
Comment thread src/parser/postgresql/postgreEntityCollector.ts Outdated
@Cythia828 Cythia828 force-pushed the feat/emptyColumn_optimize branch from 8aa5369 to 55616b2 Compare May 19, 2026 05:57
@Cythia828 Cythia828 force-pushed the feat/emptyColumn_optimize branch from 55616b2 to c8529f1 Compare May 19, 2026 06:04
@Cythia828 Cythia828 merged commit d33b9a6 into feat/emptyColumn May 20, 2026
Cythia828 added a commit that referenced this pull request May 20, 2026
* chore(release): 4.3.0

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset

* chore(release): 4.3.1

* fix(postgresql): #432 remove error rule

* test: #432 validate unComplete sql

* fix: #432 remove error rule

* feat: mark as entityCollecting in getAllEntities context to allow empty column

* chore: update jest.config.js to hide console.log

* fix(flink): #442 fix flink's insert values() can't support function problem

* feat: remove noReserved keywords in completions

* test: add filter keywords test case

* test: #438 sync suggestion no duplicate syntaxContextType

* fix: #438 syntaxContextType not duplicate

* chore(release): 4.4.0-beta.0

* chore(release): 4.4.0

* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* feat: match empty column when in entityCollecting context

* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: zhaoge <>

---------

Co-authored-by: mumiao <1270865802zl@gmail.com>
Co-authored-by: 琉易 <liuxy0551@qq.com>
Co-authored-by: zhaoge <>
Co-authored-by: XCynthia <942884029@qq.com>
Cythia828 added a commit that referenced this pull request May 20, 2026

Co-authored-by: Cythia828 <942884029@qq.com>

---------



Co-authored-by: Cythia828 <942884029@qq.com>

Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>
Co-authored-by: mumiao <1270865802zl@gmail.com>
Co-authored-by: 琉易 <liuxy0551@qq.com>
Co-authored-by: zhaoge <942884029@qq.com>
Cythia828 added a commit that referenced this pull request May 20, 2026
* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* chore: remove duplicate changelog in v4.4.1

* chore(release): 4.5.0-beta.0

* Next merge main (#468)

* fix(flink): #455 fix json functions' params problem in flink

* fix(flink): some grammar rules (#465)

* fix: #464 order by + expression

* fix: #464 EXTRACT function

* test: #464 flink JSON_VALUE RETURNING

* chore(release): 4.4.2

---------

Co-authored-by: zhaoge <>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input (#470)

* test(parser): #283 add multi-statement error validation tests for all dialects

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input

* feat: add generic SQL language support (#469)

* fix(generic): fix INTERSECT/EXCEPT support, trim keywords to ~90

- Add INTERSECT and EXCEPT to queryNoWith rule for set operations
- Remove 173 unused KW_* lexer rules for removed features (views, indexes,
  grants, transactions, stored procedures, window functions, triggers, etc.)
- Trim nonReserved list to only keywords actually used in parser rules
- Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
- Keyword count reduced from 263 to 90 (close to ~100 target)
- All 197 test suites pass (5627 tests)

* fix(generic): reserve core structural keywords and add DIGIT_IDENTIFIER

- Remove core structural keywords from nonReserved so they cannot be
  used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT,
  UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON,
  UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE,
  IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT,
  PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN,
  RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
- Add DIGIT_IDENTIFIER lexer token for identifiers starting with a
  digit (e.g. 123abc, 1st_column)
- Include DIGIT_IDENTIFIER in identifier rule alternatives

* fix(generic): add missing Listener/Visitor exports and diagnostics option

- Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
- Add GenericSQLOptions interface with configurable diagnostics flag
- Override validate() to return empty array when diagnostics disabled
- Export GenericSQLOptions type from src/index.ts

* fix(generic): add QUERY_RESULT and SELECT column entity collection

- Add exitQuerySpecification for QUERY_RESULT entity tracking
- Add exitSelectItem for column entity collection in SELECT clauses
- Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
- Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
- Stage previously untracked files (errorListener, splitListener, semanticContextCollector)

* test: add GenericSQL tests

* test: ensure all dialect tests pass with GenericSQL

* test(generic): add more sql test

- Add comprehensive syntax tests for all supported statement types
- Add context collect tests for entity and semantic collectors
- Add suggestion tests for token, syntax, and multi-statement scenarios
- Add error strategy, listener, visitor, and validation tests
- Fix entity collector to distinguish simple columns from expressions

* feat: match empty column when in entityCollecting context (#457) (#472)


Co-authored-by: Cythia828 <942884029@qq.com>