
feat: match empty column when in entityCollecting context #457

Merged
Cythia828 merged 17 commits into feat/emptyColumnPrepare from feat/emptyColumn on May 20, 2026


JackWang032 (Collaborator) commented Dec 17, 2025

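When the input SQL is incomplete, in particular when no column has been typed yet or only a table name followed by a dot has been typed (table1.<empty>), the parser's error recovery fails. The resulting parse tree then cannot serve as a reference template for entity collection, so entity collection fails.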

For example, typing SELECT t1. FROM t1 in MySQL collects no entities at all, so no corresponding completions can be offered.

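We have already added a {this.shouldMatchEmpty()}? semantic-predicate branch to columnName in the grammar, so that an empty column can be matched during entity collection. However, this covers only a small subset of cases: it has no effect in WHERE, ORDER BY, JOIN ... ON and similar positions (which match expression), and it also does not match when a dot has been typed.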

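Semantic predicate: a semantic predicate is the core ANTLR4 mechanism for combining grammar rules with custom code logic; it is used to dynamically control which rule alternatives may match during parsing.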
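To make the mechanism concrete, here is a minimal hand-written TypeScript sketch (not ANTLR-generated code) of a columnName rule whose empty alternative is gated by a predicate, roughly analogous to {this.shouldMatchEmpty()}?. Only the name `shouldMatchEmpty` comes from this PR; `MiniParser`, the `entityCollecting` flag, the token and node shapes are hypothetical simplifications. It covers only the case with nothing typed at all, consistent with the limitation described above.

```typescript
// Hypothetical, simplified token and node shapes (not the dt-sql-parser API).
type Token = { type: "KEYWORD" | "IDENT"; text: string };
type ColumnNode = { name: string; empty: boolean };

class MiniParser {
    private pos = 0;
    private readonly tokens: Token[];
    private readonly entityCollecting: boolean;

    constructor(tokens: Token[], entityCollecting: boolean) {
        this.tokens = tokens;
        this.entityCollecting = entityCollecting;
    }

    // Plays the role of {this.shouldMatchEmpty()}?: the empty alternative
    // is only viable while entities are being collected.
    shouldMatchEmpty(): boolean {
        return this.entityCollecting;
    }

    private expect(type: Token["type"], text?: string): Token {
        const tok = this.tokens[this.pos];
        if (!tok || tok.type !== type || (text !== undefined && tok.text !== text)) {
            throw new Error(`syntax error at token ${this.pos}`);
        }
        this.pos++;
        return tok;
    }

    // columnName: IDENT | {shouldMatchEmpty()}? /* empty */
    private columnName(): ColumnNode {
        const tok = this.tokens[this.pos];
        if (tok && tok.type === "IDENT") {
            this.pos++;
            return { name: tok.text, empty: false };
        }
        if (this.shouldMatchEmpty()) {
            // Consume no tokens, but still produce a column node in the tree.
            return { name: "", empty: true };
        }
        throw new Error(`syntax error at token ${this.pos}: expected column`);
    }

    // selectStatement: SELECT columnName FROM IDENT
    selectStatement(): { column: ColumnNode; table: string } {
        this.expect("KEYWORD", "SELECT");
        const column = this.columnName();
        this.expect("KEYWORD", "FROM");
        const table = this.expect("IDENT").text;
        return { column, table };
    }
}

// "SELECT FROM t1": the user has not typed a column yet.
const tokens: Token[] = [
    { type: "KEYWORD", text: "SELECT" },
    { type: "KEYWORD", text: "FROM" },
    { type: "IDENT", text: "t1" },
];

const result = new MiniParser(tokens, true).selectStatement();
console.log(result.column.empty, result.table); // true t1
```

With `entityCollecting` set to false the same input throws, so ordinary validation keeps reporting the missing column.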

In general, the grammar splits columns into two rules, columnName and columnNamePath: columnName is matched in select items, while columnNamePath is matched inside expressions.

There are currently two ways to use semantic predicates to solve this problem:

Option 1

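Also add an empty-column semantic-predicate branch to columnNamePath. However, this makes a fairly large number of grammar-validation unit tests fail (a semantic-predicate branch influences prediction even when the branch itself does not match), and the scope of the impact is hard to determine. PostgreSQL currently uses this approach on a trial basis.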
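A minimal sketch of the Option 1 idea, in the same hypothetical hand-written style: the empty alternative lives inside the shared columnNamePath rule, so any expression position that reaches it (WHERE, ORDER BY, JOIN ... ON) and the t1.<empty> case are handled at once. `parseColumnNamePath` and the token shapes are assumptions for illustration. Note that this recursive-descent sketch does not model ANTLR's adaptive prediction, so it cannot show the side effect on prediction that causes the unit-test failures mentioned above.

```typescript
// Hypothetical token shape for the sketch (not the dt-sql-parser or antlr4ng API).
type Token = { type: "IDENT" | "DOT" | "KEYWORD"; text: string };
type ColumnPath = { parts: string[]; endsEmpty: boolean; next: number };

// columnNamePath: IDENT ('.' (IDENT | {shouldMatchEmpty()}? /* empty */))*
// The empty branch is inside the shared rule, so every expression position
// that reaches columnNamePath gets it automatically.
function parseColumnNamePath(
    tokens: Token[],
    start: number,
    shouldMatchEmpty: () => boolean,
): ColumnPath {
    const first = tokens[start];
    if (!first || first.type !== "IDENT") {
        throw new Error(`expected identifier at token ${start}`);
    }
    const parts = [first.text];
    let pos = start + 1;
    while (tokens[pos]?.type === "DOT") {
        pos++; // consume "."
        const tok = tokens[pos];
        if (tok && tok.type === "IDENT") {
            parts.push(tok.text);
            pos++;
        } else if (shouldMatchEmpty()) {
            parts.push(""); // "t1." is kept as the path ["t1", ""]
            return { parts, endsEmpty: true, next: pos };
        } else {
            throw new Error(`expected identifier after "." at token ${pos}`);
        }
    }
    return { parts, endsEmpty: false, next: pos };
}

// Tokens following WHERE in "SELECT * FROM t1 WHERE t1. ORDER BY id"
const afterWhere: Token[] = [
    { type: "IDENT", text: "t1" },
    { type: "DOT", text: "." },
    { type: "KEYWORD", text: "ORDER" },
];

const path = parseColumnNamePath(afterWhere, 0, () => true);
console.log(path.parts.join("."), path.endsEmpty, path.next); // t1. true 2
```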

Option 2

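Follow the principle of minimal change (the approach currently being tried): add semantic predicates after specific rules (WHERE, ORDER BY, JOIN ON, etc.). This causes essentially none of the existing unit tests to fail.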
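One way to picture Option 2, again as a hypothetical hand-written sketch rather than the actual grammar change: the predicate is attached to a single clause rule, and the empty alternative produces a dedicated node (called `emptyColumn` here purely for illustration), leaving the shared column rules untouched. `parseWhereClause` and the node shapes are assumptions.

```typescript
// Hypothetical token and node shapes for the sketch.
type Token = { type: "IDENT" | "KEYWORD"; text: string };
type WhereNode =
    | { kind: "expression"; text: string }
    | { kind: "emptyColumn" };

// whereClause: WHERE (expression | {shouldMatchEmpty()}? emptyColumn)
// The predicate is attached to this one clause rule, so the shared
// columnName/columnNamePath rules stay exactly as they were.
function parseWhereClause(tokens: Token[], shouldMatchEmpty: () => boolean): WhereNode {
    const head = tokens[0];
    if (!head || head.type !== "KEYWORD" || head.text !== "WHERE") {
        throw new Error("expected WHERE");
    }
    const next = tokens[1];
    if (next && next.type === "IDENT") {
        // Stand-in for a full expression parse.
        return { kind: "expression", text: next.text };
    }
    if (shouldMatchEmpty()) {
        return { kind: "emptyColumn" };
    }
    throw new Error("expected expression after WHERE");
}

// "SELECT * FROM t1 WHERE ORDER BY id": nothing typed after WHERE yet.
const clause: Token[] = [
    { type: "KEYWORD", text: "WHERE" },
    { type: "KEYWORD", text: "ORDER" },
];
console.log(parseWhereClause(clause, () => true).kind); // emptyColumn
```

The trade-off is that every clause (ORDER BY, JOIN ... ON, and so on) needs its own branch of this kind, and a clause-level predicate presumably does not reach positions nested inside an expression.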

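Option 1 works very well in practice, but it requires an in-depth analysis of how semantic predicates affect ANTLR4's prediction, error recovery and other phases, to make sure it does not change behavior outside the entity-collecting context.
Option 2 requires handling many different expression scenarios; for example, in join ... on t1.id = t2.id, the comparison operator = may be nested recursively.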
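The recursion problem can be sketched with a tiny hypothetical expression parser. Comparisons are modeled right-recursively for simplicity (real SQL grammars typically use left recursion), and each column reference such as t1.id is a single pre-joined token; `parseComparison` and the `Expr` shape are assumptions. The point is that the caret can land on an operand at any nesting depth, so the empty alternative has to be reachable from inside the recursive rule, not only at the clause level.

```typescript
// Hypothetical expression shapes for the sketch.
type Expr =
    | { kind: "column"; name: string }
    | { kind: "empty" }
    | { kind: "compare"; left: Expr; right: Expr };

// comparison: operand ('=' comparison)?    (right-recursive here for simplicity)
// operand:    COLUMN | {shouldMatchEmpty()}? /* empty */
function parseComparison(
    tokens: string[],
    pos: number,
    shouldMatchEmpty: () => boolean,
): [Expr, number] {
    let left: Expr;
    const tok = tokens[pos];
    if (tok !== undefined && tok !== "=") {
        left = { kind: "column", name: tok };
        pos++;
    } else if (shouldMatchEmpty()) {
        left = { kind: "empty" }; // consume nothing
    } else {
        throw new Error(`expected operand at token ${pos}`);
    }
    if (tokens[pos] === "=") {
        // Recursing here is what lets the empty operand appear at any depth.
        const [right, next] = parseComparison(tokens, pos + 1, shouldMatchEmpty);
        return [{ kind: "compare", left, right }, next];
    }
    return [left, pos];
}

// "... JOIN t2 ON t1.id = " with the caret right after "="
const [onExpr] = parseComparison(["t1.id", "="], 0, () => true);
console.log(JSON.stringify(onExpr));
// {"kind":"compare","left":{"kind":"column","name":"t1.id"},"right":{"kind":"empty"}}
```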

With semantic predicates in place, the parse tree can, for the most part, be guaranteed to be complete.

Preview: https://jackwang032.github.io/monaco-sql-languages/

mumiao and others added 15 commits November 26, 2025 15:28
…Offset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset
* test: #432 validate unComplete sql

* fix: #432 remove error rule
* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix: import from error sql module

* test: update entity collection tests

* fix: remove unused type
JackWang032 marked this pull request as draft December 17, 2025 11:32
JackWang032 (Collaborator, Author)

Additional caveat: non-reserved keywords may be recognized as aliases, so some special handling is needed.

Cythia828 marked this pull request as ready for review March 24, 2026 03:12
Cythia828 marked this pull request as draft March 24, 2026 03:13
Cythia828 changed the base branch from next to feat/emptyColumnPrepare March 24, 2026 03:13
Cythia828 marked this pull request as ready for review March 24, 2026 03:56
Cythia828 marked this pull request as draft March 24, 2026 03:58
Cythia828 marked this pull request as ready for review March 24, 2026 03:58
Cythia828 pushed a commit to Cythia828/dt-sql-parser that referenced this pull request Mar 24, 2026
- Add shouldMatchEmpty() method to SQLParserBase
- Add emptyColumn rule to PostgreSQL grammar
- Add exitTarget_empty method to entity collector
- Update grammar files to remove semantic predicates
- Update tests to expect empty column entities
- Regenerate all parser files

Fixes: DTStack#457
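A sketch of how these pieces could fit together, assuming `shouldMatchEmpty()` simply reads a flag that is switched on only while entities are being collected (compare the commit "mark as entityCollecting in getAllEntities context to allow empty column" below), and that the flag is reset afterwards so validation keeps the strict grammar. `SQLParserBase` and `shouldMatchEmpty` are names from the commit message; their bodies, the `entityCollecting` field, `DemoParser` and the `withEntityCollecting` helper are hypothetical and not the actual dt-sql-parser source.

```typescript
// Hypothetical stand-in for the shared base class of the generated parsers.
abstract class SQLParserBase {
    entityCollecting = false;

    // Referenced from grammar predicates such as {this.shouldMatchEmpty()}?
    shouldMatchEmpty(): boolean {
        return this.entityCollecting;
    }
}

class DemoParser extends SQLParserBase {
    // Stand-in for a generated rule: returns "" for an empty column when allowed.
    columnName(input: string): string {
        const name = input.trim();
        if (name.length > 0) return name;
        if (this.shouldMatchEmpty()) return "";
        throw new Error("syntax error: missing column name");
    }
}

// Hypothetical helper: enable empty matching only for the duration of collection.
function withEntityCollecting<T>(parser: SQLParserBase, collect: () => T): T {
    parser.entityCollecting = true;
    try {
        return collect();
    } finally {
        parser.entityCollecting = false; // never leak the relaxed mode into validation
    }
}

const parser = new DemoParser();
const column = withEntityCollecting(parser, () => parser.columnName(" "));
console.log(JSON.stringify(column), parser.entityCollecting); // "" false
```

Resetting the flag in `finally` keeps the relaxed mode scoped to entity collection even if collection throws.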
Cythia828 pushed a commit to Cythia828/dt-sql-parser that referenced this pull request Mar 24, 2026


- Update entityCollector.ts to keep empty column entities
- Add exitTarget_empty method to postgreEntityCollector.ts
- Update Hive and MySQL tests to expect empty column entities

Restores modifications lost during rebase.
Cythia828 added the 5.29 label May 19, 2026
Cythia828 merged commit 7f392ac into feat/emptyColumnPrepare May 20, 2026
Cythia828 added a commit that referenced this pull request May 20, 2026
* chore(release): 4.3.0

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset

* chore(release): 4.3.1

* fix(postgresql): #432 remove error rule

* test: #432 validate unComplete sql

* fix: #432 remove error rule

* feat: mark as entityCollecting in getAllEntities context to allow empty column

* chore: update jest.config.js to hide console.log

* fix(flink): #442 fix flink's insert values() can't support function problem

* feat: remove noReserved keywords in completions

* test: add filter keywords test case

* test: #438 sync suggestion no duplicate syntaxContextType

* fix: #438 syntaxContextType not duplicate

* chore(release): 4.4.0-beta.0

* chore(release): 4.4.0

* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix: import from error sql module

* test: update entity collection tests

* fix: remove unused type

* feat: match empty column when in entityCollecting context

* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: Cythia828 <942884029@qq.com>

---------



Co-authored-by: Cythia828 <942884029@qq.com>

Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>
Co-authored-by: mumiao <1270865802zl@gmail.com>
Co-authored-by: 琉易 <liuxy0551@qq.com>
Co-authored-by: zhaoge <942884029@qq.com>
Cythia828 added a commit that referenced this pull request May 20, 2026
* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix: import from error sql module

* test: update entity collection tests

* fix: remove unused type

* chore: remove duplicate changelog in v4.4.1

* chore(release): 4.5.0-beta.0

* Next merge main (#468)

* fix(flink): #455 fix json functions' params problem in flink

* fix(flink): some grammar rules (#465)

* fix: #464 order by + expression

* fix: #464 EXTRACT function

* test: #464 flink JSON_VALUE RETURNING

* chore(release): 4.4.2

---------

Co-authored-by: zhaoge <>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input (#470)

* test(parser): #283 add multi-statement error validation tests for all dialects

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input

* feat: add generic SQL language support (#469)

* fix(generic): fix INTERSECT/EXCEPT support, trim keywords to ~90

- Add INTERSECT and EXCEPT to queryNoWith rule for set operations
- Remove 173 unused KW_* lexer rules for removed features (views, indexes,
  grants, transactions, stored procedures, window functions, triggers, etc.)
- Trim nonReserved list to only keywords actually used in parser rules
- Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
- Keyword count reduced from 263 to 90 (close to ~100 target)
- All 197 test suites pass (5627 tests)

* fix(generic): reserve core structural keywords and add DIGIT_IDENTIFIER

- Remove core structural keywords from nonReserved so they cannot be
  used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT,
  UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON,
  UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE,
  IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT,
  PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN,
  RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
- Add DIGIT_IDENTIFIER lexer token for identifiers starting with a
  digit (e.g. 123abc, 1st_column)
- Include DIGIT_IDENTIFIER in identifier rule alternatives

* fix(generic): add missing Listener/Visitor exports and diagnostics option

- Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
- Add GenericSQLOptions interface with configurable diagnostics flag
- Override validate() to return empty array when diagnostics disabled
- Export GenericSQLOptions type from src/index.ts

* fix(generic): add QUERY_RESULT and SELECT column entity collection

- Add exitQuerySpecification for QUERY_RESULT entity tracking
- Add exitSelectItem for column entity collection in SELECT clauses
- Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
- Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
- Stage previously untracked files (errorListener, splitListener, semanticContextCollector)

* test: add GenericSQL tests

* test: ensure all dialect tests pass with GenericSQL

* test(generic): add more sql test

- Add comprehensive syntax tests for all supported statement types
- Add context collect tests for entity and semantic collectors
- Add suggestion tests for token, syntax, and multi-statement scenarios
- Add error strategy, listener, visitor, and validation tests
- Fix entity collector to distinguish simple columns from expressions

* feat: match empty column when in entityCollecting context (#457) (#472)

* chore(release): 4.3.0

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset

* chore(release): 4.3.1

* fix(postgresql): #432 remove error rule

* test: #432 validate unComplete sql

* fix: #432 remove error rule

* feat: mark as entityCollecting in getAllEntities context to allow empty column

* chore: update jest.config.js to hide console.log

* fix(flink): #442 fix flink's insert values() can't support function problem

* feat: remove noReserved keywords in completions

* test: add filter keywords test case

* test: #438 sync suggestion no duplicate syntaxContextType

* fix: #438 syntaxContextType not duplicate

* chore(release): 4.4.0-beta.0

* chore(release): 4.4.0

* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix: import from error sql module

* test: update entity collection tests

* fix: remove unused type

* feat: match empty column when in entityCollecting context

* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: Cythia828 <942884029@qq.com>

3 participants