Skip to content

feat: add generic SQL language support#469

Merged
Cythia828 merged 7 commits into
DTStack:nextfrom
liuxy0551:feat_generic_sql
May 20, 2026
Merged

feat: add generic SQL language support#469
Cythia828 merged 7 commits into
DTStack:nextfrom
liuxy0551:feat_generic_sql

Conversation

@liuxy0551
Copy link
Copy Markdown
Collaborator

概述

基于 Trino 语法裁剪,新增 GenericSQL 通用 SQL 方言,只保留核心 DML/DDL 语法。

支持的语法

  • DML: SELECT, INSERT, UPDATE, DELETE
  • DDL: CREATE TABLE, ALTER TABLE, DROP TABLE
  • 查询特性: JOIN, 子查询, CTE, UNION/INTERSECT/EXCEPT

移除的语法

视图、索引、存储过程、窗口函数、权限管理、事务控制等厂商扩展语法。

主要变更

  • 新增 src/grammar/generic/GenericSql.g4 语法文件
  • 新增 src/parser/generic/ 解析器实现
  • 新增 test/parser/generic/ 测试用例
  • 更新 README 支持列表
  • 重构各方言 entityCollector,统一接口

相关地址

预览地址:https://liuxy0551.github.io/monaco-sql-languages
monaco-sql-languages PR:DTStack/monaco-sql-languages#217

测试

  • 所有现有方言测试通过
  • GenericSQL 独立测试通过

liuxy0551 added 6 commits May 6, 2026 18:07
- Add INTERSECT and EXCEPT to queryNoWith rule for set operations
- Remove 173 unused KW_* lexer rules for removed features (views, indexes,
  grants, transactions, stored procedures, window functions, triggers, etc.)
- Trim nonReserved list to only keywords actually used in parser rules
- Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
- Keyword count reduced from 263 to 90 (close to ~100 target)
- All 197 test suites pass (5627 tests)
- Remove core structural keywords from nonReserved so they cannot be
  used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT,
  UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON,
  UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE,
  IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT,
  PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN,
  RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
- Add DIGIT_IDENTIFIER lexer token for identifiers starting with a
  digit (e.g. 123abc, 1st_column)
- Include DIGIT_IDENTIFIER in identifier rule alternatives
…tion

- Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
- Add GenericSQLOptions interface with configurable diagnostics flag
- Override validate() to return empty array when diagnostics disabled
- Export GenericSQLOptions type from src/index.ts
- Add exitQuerySpecification for QUERY_RESULT entity tracking
- Add exitSelectItem for column entity collection in SELECT clauses
- Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
- Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
- Stage previously untracked files (errorListener, splitListener, semanticContextCollector)
@Cythia828
Copy link
Copy Markdown
Collaborator

看一下是否可以选择doris,确认下差别大小

@liuxy0551
Copy link
Copy Markdown
Collaborator Author

liuxy0551 commented May 19, 2026

看一下是否可以选择doris,确认下差别大小

基于 Trino 会更靠近 ANSI SQL 标准,Doris 更贴近 MySQL 生态。

后续如果客户要求或者客户使用频率高,为了加强客户使用体验,可以考虑单独添加 Doris grammar,有官方的可以参考:https://github.com/apache/doris/tree/master/fe/fe-core/src/main/antlr4/org/apache/doris/nereids

@Cythia828
Copy link
Copy Markdown
Collaborator

目前是支持提示的,相关的单测补一下,验证一下提示实体的能力是否具有

@liuxy0551 liuxy0551 force-pushed the feat_generic_sql branch 2 times, most recently from 96c952f to 0c13905 Compare May 19, 2026 14:12
- Add comprehensive syntax tests for all supported statement types
- Add context collect tests for entity and semantic collectors
- Add suggestion tests for token, syntax, and multi-statement scenarios
- Add error strategy, listener, visitor, and validation tests
- Fix entity collector to distinguish simple columns from expressions
@Cythia828 Cythia828 merged commit 107ba2d into DTStack:next May 20, 2026
6 checks passed
Cythia828 added a commit that referenced this pull request May 20, 2026
* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* chore: remove duplicate changelog in v4.4.1

* chore(release): 4.5.0-beta.0

* Next merge main (#468)

* fix(flink): #455 fix json functions' params problem in flink

* fix(flink): some grammar rules (#465)

* fix: #464 order by + expression

* fix: #464 EXTRACT function

* test: #464 flink JSON_VALUE RETURNING

* chore(release): 4.4.2

---------

Co-authored-by: zhaoge <>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input (#470)

* test(parser): #283 add multi-statement error validation tests for all dialects

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input

* feat: add generic SQL language support (#469)

* fix(generic): fix INTERSECT/EXCEPT support, trim keywords to ~90

- Add INTERSECT and EXCEPT to queryNoWith rule for set operations
- Remove 173 unused KW_* lexer rules for removed features (views, indexes,
  grants, transactions, stored procedures, window functions, triggers, etc.)
- Trim nonReserved list to only keywords actually used in parser rules
- Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
- Keyword count reduced from 263 to 90 (close to ~100 target)
- All 197 test suites pass (5627 tests)

* fix(generic): reserve core structural keywords and add DIGIT_IDENTIFIER

- Remove core structural keywords from nonReserved so they cannot be
  used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT,
  UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON,
  UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE,
  IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT,
  PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN,
  RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
- Add DIGIT_IDENTIFIER lexer token for identifiers starting with a
  digit (e.g. 123abc, 1st_column)
- Include DIGIT_IDENTIFIER in identifier rule alternatives

* fix(generic): add missing Listener/Visitor exports and diagnostics option

- Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
- Add GenericSQLOptions interface with configurable diagnostics flag
- Override validate() to return empty array when diagnostics disabled
- Export GenericSQLOptions type from src/index.ts

* fix(generic): add QUERY_RESULT and SELECT column entity collection

- Add exitQuerySpecification for QUERY_RESULT entity tracking
- Add exitSelectItem for column entity collection in SELECT clauses
- Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
- Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
- Stage previously untracked files (errorListener, splitListener, semanticContextCollector)

* test: add GenericSQL tests

* test: ensure all dialect tests pass with GenericSQL

* test(generic): add more sql test

- Add comprehensive syntax tests for all supported statement types
- Add context collect tests for entity and semantic collectors
- Add suggestion tests for token, syntax, and multi-statement scenarios
- Add error strategy, listener, visitor, and validation tests
- Fix entity collector to distinguish simple columns from expressions

* feat: match empty column when in entityCollecting context (#457) (#472)

* chore(release): 4.3.0

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)

* test: #424 syntax after comments

* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset

* chore(release): 4.3.1

* fix(postgresql): #432 remove error rule

* test: #432 validate unComplete sql

* fix: #432 remove error rule

* feat: mark as entityCollecting in getAllEntities context to allow empty column

* chore: update jest.config.js to hide console.log

* fix(flink): #442 fix flink's insert values() can't support function problem

* feat: remove noReserved keywords in completions

* test: add filter keywords test case

* test: #438 sync suggestion no duplicate syntaxContextType

* fix: #438 syntaxContextType not duplicate

* chore(release): 4.4.0-beta.0

* chore(release): 4.4.0

* feat: support query result and derived table entity collecting (#434)

* feat: support queryResult and derived table entities collecting

* feat: support query result and derived table entity collecting

* test: enhance hive and spark entity collect test case

* fix: remove _ctx and add tokenIndex into position

* fix: rename declareType COMMON to LITERAL

* fix: optimize entity collector and update  grammar

* test: add derived table and query result entities test case

* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null

* fix: update _caretStmt docs

* test: add isAccessible test case

* fix: skip _caretStmt ts check

* docs: update README to include additional entity information

* test: fix create view test case

* fix:  import from error sql module

* test: update entity collection tests

* fix: remove unused type

* feat: match empty column when in entityCollecting context

* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: Cythia828 <942884029@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants