doc: create ai-guidelines and include to CONTRIBUTING #62105
RafaelGSS wants to merge 6 commits into nodejs:main
Conversation
Co-Authored-By: Beth Griggs <bethanyngriggs@gmail.com>
Review requested:
There may be some ideas we can borrow from https://llvm.org/docs/AIToolPolicy.html. For example, the rule that "good first issue" tasks should not be picked up by AI is a good one.
Co-authored-by: Aditi <62544124+Aditi-1400@users.noreply.github.com>
Co-authored-by: Joyee Cheung <joyeec9h3@gmail.com>
I took inspiration from https://github.com/zulip/zulip/blob/main/CONTRIBUTING.md#ai-use-policy-and-guidelines
doc/contributing/ai-guidelines.md (outdated)

> * **Verify accuracy** of any LLM-generated content before including it in a
>   PR description or comment.
> * **Complete pull request templates fully** rather than replacing them with
>   LLM-generated summaries.
Do we have a template? I thought those are for issues, not PRs.
Not strictly a template: https://github.com/nodejs/node/blob/main/.github/PULL_REQUEST_TEMPLATE.md?plain=1
It's not possible to fulfil the instructions "Complete pull request templates fully" based on the contents of https://github.com/nodejs/node/blob/main/.github/PULL_REQUEST_TEMPLATE.md?plain=1 so it looks like this sentence needs to be removed.
Co-authored-by: Tobias Nießen <tniessen@tnie.de>
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Co-authored-by: Mike McCready <66998419+MikeMcC399@users.noreply.github.com>
Co-authored-by: Efe <dogukankrskl@gmail.com>
I'd be against this contribution policy update. While many different opinions exist on the licensing terms of code produced by LLMs, my opinion is that the generated code isn't explicitly licensed and attributed to the original authors, so it cannot be considered open source regardless of the prompt used.
doc/contributing/ai-guidelines.md (outdated)

> * **Verify accuracy** of any LLM-generated content before including it in a
>   PR description or comment.
> * **Complete pull request templates fully** rather than replacing them with
>   LLM-generated summaries.
Suggested change:

```diff
 * **Verify accuracy** of any LLM-generated content before including it in a
   PR description or comment.
-* **Complete pull request templates fully** rather than replacing them with
-  LLM-generated summaries.
```
In #61478 (comment) , regarding the usage of Claude Code, @mcollina suggested:
I added it to the TSC agenda tomorrow for awareness/context collection before moving to a proper vote. @indutny sorry about the short notice since this is just one day ahead of the meeting, but if you'd like to join the meeting to present your points please let us know.
I've updated the agenda tomorrow to mark it FYI (as in, it doesn't need to be discussed this week, but it's worth taking a look and doing some homework before discussions happen). In any case, decisions won't be made in meetings, and this likely needs a proper vote.
doc/contributing/ai-guidelines.md (outdated)

> * **Own every line you submit.** You are responsible for all code in your
>   pull request, regardless of how it was generated. Be prepared to explain
>   any change in detail during review.
Suggested change:

```diff
 * **Own every line you submit.** You are responsible for all code in your
-  pull request, regardless of how it was generated. Be prepared to explain
-  any change in detail during review.
+  pull request, regardless of how it was generated. This includes ensuring
+  that AI-generated or AI-assisted contributions satisfy the project's
+  [Developer's Certificate of Origin][] and licensing requirements. Be
+  prepared to explain any change in detail during review.
```
(would need a DCO link added at the bottom of the page also)
[From the vote on AI, and in the context of the PR quoted there] If someone uses an advanced tool, like an LLM, should everyone hold a higher baseline of expectations? Take a look at the initial review in #61478: it looks like reviewers had to plead for Node's own style to be followed, in core Node. An LLM is supposed to produce code in style; instead there was a re-implementation of `path.normalize()`. A user of an advanced tool has a responsibility to deliver a more advanced result. Of course, in coding, LoC count is not a metric of quality. In my 25 years, the use of more advanced tooling has brought better results. Should an LLM's code be considered a starting point, not a ready PR?

May I note, as a user of Node.js, an unhealthy dynamic around this, and a need to frame a constructive approach, also as an example for the whole industry. nodejs/TSC#1831 (comment) opens with, quote: "In order to surpass the blocks of ..." as if we are here to score social points, using heavy words. Please, snap out of it. We need to re-frame the topic in the nuances that matter to the code. Otherwise, we either get more hackable artifacts, or throw the baby out with the bathwater. And you all here seem to have enough emotional shrapnel to swing either way. Please don't, for the sake of us, your users.
> Node.js values concise, precise communication that respects collaborator time.
>
> * **Do not post messages generated entirely by AI** in pull requests, issues, or the
>   project's communication channels.
I'm less convinced this one is necessary. It's also difficult to enforce. These should follow the same rules as contributions... whatever is posted, you're responsible for, so use appropriate discretion.
It's only difficult to enforce if it's low quality prose, though, in which case this item seems like it'd be needed.
I think this also disenfranchises people that use an AI to help them write English, as it might not be their first language
They shouldn't be doing that, though - they should just write in their native language and the reader can use AI and/or translation tools. I'd always much rather read broken English than a mistranslation (which are frequent).
We can all have opinions about what others should or should not do and about what we personally prefer to see, but we should let people decide for themselves what tools they want to use and why.
CONTRIBUTING.md (outdated)

> contributor has not personally understood, tested, and verified might be closed
> without review.
Suggested change:

```diff
-contributor has not personally understood, tested, and verified might be closed
+contributor has not personally understood, tested, and verified will likely be closed
 without review.
```
doc/contributing/ai-guidelines.md (outdated)

> ## Using AI for code contributions
>
> AI tools may assist contributors, but must not replace contributor judgment.
Suggested change:

```diff
-AI tools may assist contributors, but must not replace contributor judgment.
+AI tools may assist contributors, but must not replace human judgment.
```
doc/contributing/ai-guidelines.md (outdated)

> * **Understand the codebase first.** Do not skip familiarizing yourself with
>   the relevant subsystem. LLMs frequently produce inaccurate descriptions of
>   Node.js internals — always verify against the actual source. When using an AI
>   tool, ask it to cite the exact source files/PRs/docs it’s relying on, and then
Suggested change:

```diff
-tool, ask it to cite the exact source files/PRs/docs it’s relying on, and then
+tool, ask it to cite the exact source it’s relying on, and then
```
doc/contributing/ai-guidelines.md (outdated)

> ## Using AI for communication
>
> Node.js values concise, precise communication that respects collaborator time.
Suggested change:

```diff
-Node.js values concise, precise communication that respects collaborator time.
+Node.js values concise, precise communication that respects collaborator and contributor time.
```
doc/contributing/ai-guidelines.md (outdated)

> * **Link to primary sources** — code, documentation, specifications — rather
>   than quoting LLM answers.
Suggested change:

```diff
 * **Link to primary sources** — code, documentation, specifications — rather
-  than quoting LLM answers.
+  than quoting LLM answers or linking to LLM chats.
```
> feedback and iterate until the work lands or is explicitly closed. If you
> can no longer pursue it, close the PR. Stalled PRs block progress.
>
> * **Edit generated comments critically.** LLM-produced comments are often
I would recommend adding an extra rule against pushing unrelated "linter" changes, which LLMs often make. This is pretty frequent at Meteor, and it dirties the commit and PR history.
Given this is receiving a lot of attention, I had AI prepare a summary of all discussions within the Linux Kernel: https://gist.github.com/mcollina/8a4f2ee2e64d38edb90760016e89f919. I don't see anything more critical than the Linux Kernel, and it's also the flagship project of our parent Foundation. Given that this proposal follows that concept and is currently blocked on a similar basis, I think documenting our position on what has already been debated elsewhere is relevant.

On a more practical note, Chrome/Chromium (which V8 is part of) allows AI assistance. As a result, we are likely already including code developed with the assistance of AI in our tree.
@indutny can you please send a PR to add yourself back to active contributors and use the “request changes” button in both PRs? Thanks.
Hi all, I wanted to weigh in as a community member and former Node.js core contributor in favor of these proposed guidelines. And, in particular, to say that it's important that Node.js not try to instate a blanket ban on AI-assisted contributions, as I've seen some calls for.

A lot of large open-source projects, including ones I've worked on like Chromium, V8, jsdom, and Undici, are seeing huge benefit from AI. This is clearest when the AI is wielded by core contributors working in their element, and any policy which prevented core contributors from taking advantage of AI uplift would hurt the project a lot.

But it's also important for a project that wants to stay relevant and welcome outside contributors. I know of at least one large open source project, Servo, which has lost long-time community contributors due to their anti-AI stance. When a modern, AI-using developer wants to contribute to a nascent browser engine, and they have a choice between Servo (AI-unfriendly) and Ladybird (AI-welcoming), they choose Ladybird. If Node.js were to ban AI assistance, I think it would find itself in a similar situation with regard to other competing runtimes.

I fully acknowledge that AI-generated contributions have created new problems and there needs to be some reckoning with it. I like the Rust community survey's common ground section for summarizing these issues. The guidelines proposed here seem like a good framework to start with, but surely evolution will occur over time. Regardless, I'm heartened to see this PR as a starting point, for establishing clearly that while AI assistance needs to be treated with care, it's also very much a part of modern software development. Thanks!
Which long-time community contributors were those? As far as I'm aware, even the most vocally AI-positive contributors in Servo have continued doing great work on the project.
CONTRIBUTING.md (outdated)

> ## [AI Use Policy and Guidelines](./doc/contributing/ai-guidelines.md)
>
> Node.js expects contributors to understand and take full responsibility for
This first sentence seems intentionally identical to the one in the full policy document, which now has a suggested change. Marking this here to resolve whether the language should stay in sync if it changes.
> be removed or modified without human verification. Do not rely on the LLM
> to assess correctness.
>
> * **Do not disappear.** If you open a PR, follow it through. Respond to
What is our official policy on stale or stalled PRs, and can we link to it? Regardless of whether AI is involved, do we have a policy to close out PRs after a certain amount of time? If not, should we?
Right now our automation closes PRs if they are both older than a year and have had no activity in the past six months, but IMO we should make that just six months (dropping the one-year clause).
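For context, a six-month policy like the one proposed above could be expressed with the stock `actions/stale` GitHub Action. This is only a hypothetical sketch; Node.js's actual automation is custom, so the workflow name, schedule, and thresholds here are illustrative assumptions, not the project's real configuration:

```yaml
# Hypothetical sketch using actions/stale; Node.js's real automation is
# custom, so names and thresholds here are illustrative only.
name: Close stale pull requests
on:
  schedule:
    - cron: '0 0 * * *'  # run once a day
permissions:
  pull-requests: write
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          days-before-issue-stale: -1  # leave issues alone
          days-before-pr-stale: 150    # mark PRs stale after ~5 months idle
          days-before-pr-close: 30     # close 30 days later (~6 months total)
          stale-pr-message: >-
            This PR has had no activity for five months and will be closed
            in 30 days unless there is new activity.
```

The two PR-specific thresholds add up to roughly six months of inactivity, matching the suggestion, while `days-before-issue-stale: -1` keeps issues out of scope.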
@RafaelGSS Can you please add a link to https://openjsf.cdn.prismic.io/openjsf/aca4d5GXnQHGZDiZ_OpenJS_AI_Coding_Assistants_Policy.pdf (official AI policy)? This document already aligns for the most part, which should mention the
While I don't personally want to see a blanket ban on AI-assisted PRs, isn't the real risk here that parts of Node.js end up no longer under its current MIT license? Both the US and the EU have stated that purely AI-derived work is not copyrightable. The January US ruling says "prompts alone do not provide sufficient human control". The EU paper says that there are varying interpretations of human input requirements in different member states. This means AI contributions could be considered public domain.

Chad Whitacre wrote a great article covering the copyright/licensing issues from the perspective of source-available licenses, but this applies equally to open-source-licensed work if you want to retain that license!
@RafaelGSS should this be put on hold until the session at the collab summit (openjs-foundation/summit#484)?
Not necessarily; we'll cover this in the next TSC session and see. It would likely drag out to that point naturally, but unless otherwise specified, holding it is not necessary, given the Foundation's overall AI policy.
I think given the conversation around all of this so far, I would structure the guideline differently. The core principle should simply be:

Node.js expects contributions to come from people. People are free to use whatever tools they want when generating contributions. However the contribution is created, contributors are expected to understand and take full responsibility for every change they propose. If any contribution was generated with the assistance of any automation or tool (AI-based or otherwise), that should be acknowledged honestly as part of the contribution so that those reviewing the change have appropriate context going into the review. If it becomes apparent that individual contributors are relying too much on these tools, aren't understanding or taking responsibility for the changes they are proposing, or are being dishonest about the use of automated assistance when creating PRs, then that's going to influence whether or not the project considers accepting further contributions from that person. PRs should never be opened by automated tooling not specifically approved in advance by the project.

While I do not believe a blanket ban on AI contributions is at all necessary or beneficial to the project, I would absolutely accept a policy that allows a contributor to be blocked from further contributions if their only contributions are AI, they show no actual understanding of the project or processes, or they are caught being habitually dishonest about their use of automation. I would even accept language saying that while we may discourage automation/AI-assisted contributions, we will not reject them solely on that basis; every PR must still be evaluated on the technical merits of the change following our existing established review processes.
Applied all suggestions I could find. Please re-review.
> If AI tools assisted in generating a contribution, that should be
> acknowledged honestly (e.g., via an `Assisted-by:` tag in the commit
> metadata) so that reviewers have appropriate context.
This should be expanded a bit, see https://docs.kernel.org/process/coding-assistants.html#attribution. I would just drop the tools, so:

```text
Assisted-by: AGENT_NAME:MODEL_VERSION
```
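To make the proposed trailer concrete, a full commit message might look like the sketch below. This is purely illustrative: the subject line, author, and agent/model names are hypothetical, and the exact `Assisted-by:` format is still under discussion in this thread:

```text
lib: fix off-by-one in buffer length check

The generated patch was reviewed, tested, and verified manually.

Assisted-by: ExampleAgent:example-model-1.0
Signed-off-by: Jane Doe <jane@example.com>
```

Because git trailers are plain key-value lines at the end of the commit body, tooling such as `git interpret-trailers` or `git log --format='%(trailers:key=Assisted-by)'` could later query them, which is one practical argument for a fixed tag format.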
> If AI tools assisted in generating a contribution, that should be
> acknowledged honestly (e.g., via an `Assisted-by:` tag in the commit
> metadata) so that reviewers have appropriate context.
Personally, I find some of the tool mentions in PRs these days already somewhat annoying; it's like "sent from my iPhone" advertisements everywhere, and more marketing in the commit logs. Especially when issues are found and the explanation is just "this (proprietary) AI tool did it/made me believe it", as if the tool is an excuse because it's supposed to be the best. If people let the tools do most of the work, that requires reviewer awareness, and it's much better to include your prompt and roughly explain your process than to just note which (likely proprietary) model and tool you used. Also, the list can get very long if they use multiple tools, and it's just more advertisement/marketing noise.
Including the prompt is impractical in most cases. For example, in the PRs where I use AI assistance, the "prompt" is typically a multi-session chat with many prompts. I don't really consider the prompts to be all that useful.
I don't think it needs to be the exact same prompt. But if we are talking about usefulness, "here's the process of how I used the tools to arrive at this diff" is much more useful and relevant for determining the soundness of the diff than "I used this specific model and this specific tool" with no description of how it arrived at that diff. If the process is sound, you could probably arrive at the same diff with any tool/model, or with no AI at all, with varying levels of efficiency; mentioning one specific tool/model just feels like marketing if we care about "the quality of this diff" rather than "how long it takes to arrive at this diff". Even pre-AI, people didn't write "I used IntelliJ IDEA to refactor these names in the code base" in the commit logs; they just described the refactoring rules and motivations. If the process is not sound, there's little difference in what model/tool/human you use to arrive at that diff.
As discussed in today's TSC meeting.
cc: @nodejs/tsc @BridgeAR