Air could delete build.entrypoint while stopping the previous app process when stop_on_error was enabled. After builds were changed to keep the old app alive until a replacement build succeeds, that cleanup could remove the freshly rebuilt binary before it was started, causing issue #910.
Failed builds should not delete the last executable either. Keeping the binary in place makes stop_on_error stop the process without destroying the artifact a later rebuild or manual restart may need.
Windows needs different sequencing because running executables are locked. Stop the old process before build.cmd on Windows, while keeping the retained-app behavior on Unix platforms.
Validation: go test ./runner -run 'TestAddPlatformOverridesForInit|TestPlatformBuildOverridesSelection|TestShouldStopBinBeforeBuild|TestStopBinBeforeBuildIfNeeded|TestBuildRunKeepsIssue910GoBuildEntrypoint' -count=1 -v; go test ./runner -run 'TestBuildRunKeepsIssue910GoBuildEntrypoint|TestShouldStopBinBeforeBuild|TestBuildRunKeeps.*Binary.*StopOnError|TestBuildRunStopsExistingBinWhenBuildFailsWithStopOnError|TestBuildRunStopsExistingBinAfterSuccessfulBuild|TestRebuild$' -count=1 -v; go test ./...; make check
🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: Marius van Niekerk <mariusvniekerk@mbp-marius-kenn.emperor-gopher.ts.net>
Co-authored-by: OpenAI Codex <noreply@openai.com>
* fix(watcher): delay rewatch after atomic saves
* fix(watcher): retry watcher.Add with exponential backoff on atomic saves
A single Add attempt after a fixed 100ms delay could fail if the file
is momentarily absent (rename-away before recreate on slow or Docker
bind-mount filesystems), leaving the file permanently unwatched.
Retry up to 5 times with doubling delays (100ms → 1600ms); log only on
final failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: keep app running until rebuild succeeds
Keep the previous app process alive while a new build is compiling, then stop it only after the build succeeds and before starting the replacement. Preserve stop_on_error behavior by stopping the existing process on failed pre-build or build steps when configured.
Fixes #746
Also addresses #860.
* test(runner): align rebuild test with retained app
The PR changes rebuild behavior so the previous app keeps serving until a replacement build succeeds. The old rebuild test still expected a connection-refused gap, which made Ubuntu CI time out even though the new behavior was intentional.
Validate the new contract by keeping the old server reachable during a delayed rebuild, then waiting for the rebuilt server response before shutdown. Also make the port helper check listen errors before reading the listener.
Validation: go test ./runner -run '^TestRebuild$' -count=3 -v; go test ./...; make test-ci; ./hack/check.sh all
* fix: adjust build defaults when tmp_dir is customized
When a user sets tmp_dir to a non-default value (e.g. ".tmp"),
Build.Cmd, Build.Bin, and Build.ExcludeDir still referenced the
default "tmp" directory, causing the build to create a "tmp/" folder
alongside the intended custom directory.
Add adjustDefaultsForTmpDir() to update these fields when they still
hold their default values and TmpDir has been changed. Explicitly set
values are preserved.
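
A minimal sketch of the idea, assuming relative tmp_dir values only. The `buildSettings` type, the `adjustForTmpDir` name, and the default values passed in are illustrative stand-ins, not the real config struct or the actual adjustDefaultsForTmpDir() signature.

```go
package main

import "strings"

// buildSettings is a hypothetical stand-in for the build section of the config.
type buildSettings struct {
	Cmd        string
	Bin        string
	ExcludeDir []string
}

// sameStrings reports whether two string slices hold the same elements in order.
func sameStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// adjustForTmpDir rewrites only the fields that still equal their defaults,
// so explicitly configured values are left alone.
func adjustForTmpDir(cur, def buildSettings, defTmp, tmpDir string) buildSettings {
	if tmpDir == defTmp {
		return cur // nothing was customized
	}
	oldPrefix, newPrefix := "./"+defTmp+"/", "./"+tmpDir+"/"
	if cur.Cmd == def.Cmd {
		cur.Cmd = strings.Replace(def.Cmd, oldPrefix, newPrefix, 1)
	}
	if cur.Bin == def.Bin {
		cur.Bin = strings.Replace(def.Bin, oldPrefix, newPrefix, 1)
	}
	if sameStrings(cur.ExcludeDir, def.ExcludeDir) {
		out := make([]string, len(def.ExcludeDir)) // copy; never mutate the defaults
		for i, dir := range def.ExcludeDir {
			if dir == defTmp {
				dir = tmpDir
			}
			out[i] = dir
		}
		cur.ExcludeDir = out
	}
	return cur
}
```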
Fixes #780
Signed-off-by: majiayu000 <1835304752@qq.com>
* test: add coverage for Windows branch in adjustDefaultsForTmpDir
Extract OS parameter to allow testing the Windows code path on any
platform, covering the lines flagged by Codecov.
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: support absolute tmp_dir defaults and preserve custom exclude_dir
* fix: make tmp_dir absolute path handling OS-agnostic
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* fix: route air's own log messages to stderr instead of stdout
Air's internal loggers (main, build, runner, watcher) were writing to
stdout, causing air's messages to intermix with the user's application
output. This made it impossible to separate them with shell redirections
like `air 2>/dev/null`.
Change the log output destination from os.Stdout/color.Output to
os.Stderr/color.Error so air's messages go to stderr while the user's
app output remains on stdout.
Also remove dead `c.Stdout` and `c.Stderr` assignments in startCmd()
on all platforms — these had no effect after StdoutPipe()/StderrPipe()
were already called.
Fixes #744
Signed-off-by: majiayu000 <1835304752@qq.com>
* test: add regression test for logger stderr routing
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: route warning messages in config.go to stderr
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: update smoke test to check stderr for air log messages
Since air's log output now goes to stderr, the smoke test must check
nohup.err instead of nohup.out for the "running" message.
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: print splash banner to stderr
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* Update dockerfile to golang 1.26
fixes https://github.com/air-verse/air/issues/882
Signed-off-by: Goutham Veeramachaneni <goutham@grafana.com>
* Scope PR to Dockerfile-only: keep go.mod at 1.25 for compatibility
Revert go.mod, CI workflows, READMEs, and AGENTS.md changes per
reviewer feedback — only update the Docker base image to Go 1.26.
This preserves broad `go install` compatibility for users on older
Go versions.
Signed-off-by: Goutham Veeramachaneni <goutham@grafana.com>
---------
Signed-off-by: Goutham Veeramachaneni <goutham@grafana.com>
* Bind test listeners to loopback
* Adjust Windows test expectations
* Fix Windows test setup
* Fix lint in dangerous root test
* Stabilize Windows engine tests
* Add Windows unit tests to CI
* Prevent cwd leakage in tests
* Normalize line endings in CI
* Skip unstable Windows engine tests
* Move Windows line-ending normalization to windows job
* Stabilize ctrl-c build failure test
* Skip touch-based engine tests on Windows
* fix: Remove double-kill bug while keeping PowerShell (#777)
Problems fixed:
- Removed double TASKKILL execution when SendInterrupt is enabled
- Simplified kill logic to single TASKKILL command
- Added console window hiding for TASKKILL and PowerShell
- Improved PowerShell with -NoProfile and -NonInteractive flags
Key difference from reverted PR #855:
- Kept PowerShell instead of cmd.exe to avoid sound issues (#707)
- Still fixes the double-kill bug that caused orphaned processes
- Added window hiding for cleaner UX on Windows
- Better PowerShell performance with optimized flags
The previous code would run TASKKILL twice when SendInterrupt was
enabled, causing processes to not be properly terminated and leading
to port conflicts and orphaned processes.
This fix addresses #777 without introducing #707 sound issues.
Testing on Linux shows clean process transitions with no port conflicts.
Fixes #777
Avoids #707
* fix: clarify Windows send_interrupt handling
Document that Windows ignores send_interrupt and use CREATE_NO_WINDOW for clearer process spawning.
* Add env file config and warn on dangerous roots
* Add default env file list
---------
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* implement .env parsing and expanding
* reload env on reload/rebuild as well
* default to empty env file. remove cyclical dependency stuff
* use godotenv to parse, don't override global vars
* implement multiple env file support
* update readme and example config
* update readme some more
* fix typo
* remove dead code in util.go
* Avoid loading env files on cancelled runs
* Clean up env load linting
---------
Co-authored-by: sirkostya009 <kosta.tovstik@gmail.com>
* Use SharedWorker based SSE connection sharing in supported browsers
* minor
* Improve SSE cleanup and parsing
---------
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* fix: Remove double-kill bug in Windows process termination (#777)
Problems fixed:
- Removed double TASKKILL execution when SendInterrupt is enabled
- Simplified kill logic to match Linux behavior (single kill command)
- Added console window hiding for cleaner UX (no flashing windows)
- Switched from PowerShell to cmd.exe for better performance
- Improved logging to match Linux output format
- Better error handling for already-terminated processes
The previous code would run TASKKILL twice when SendInterrupt was
enabled, or handle it inconsistently when disabled. This caused
processes to not be properly terminated, leading to port conflicts
and orphaned processes as reported in #777.
Testing on Linux shows clean process transitions with no port
conflicts. Windows users are requested to test and verify the fix.
Fixes #777
* refactor: Address review feedback - match Linux logging style
- Use mainDebug instead of runnerLog to match Linux implementation
- Remove verbose logging as requested by maintainer
- Clarify why send_interrupt is not supported on Windows
- Simplify error handling to match Linux style
- Keep core fix: single TASKKILL, console hiding, proper Wait()
Addresses feedback from @xiantang in PR review.
Add safety check to refuse running in home directory, system root (/), or /root to prevent excessive file watching that could impact performance or system stability.
- Add isDangerousRoot() utility function to detect dangerous paths
- Integrate check in config.preprocess() with clear error message
- Add comprehensive tests for all dangerous path scenarios
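
As a rough sketch of the check on Unix-style paths: the function below is hypothetical and takes the home directory as a parameter so it can be tested; the real isDangerousRoot() may resolve the home directory and handle other platforms differently.

```go
package main

import "path/filepath"

// isDangerousRootSketch (hypothetical) reports whether root is one of the
// directories the safety check refuses to watch: the filesystem root,
// /root, or the user's home directory. Paths are cleaned first so that
// trailing slashes and "." segments do not defeat the comparison.
func isDangerousRootSketch(root, home string) bool {
	clean := filepath.Clean(root)
	if clean == "/" || clean == "/root" {
		return true
	}
	return home != "" && clean == filepath.Clean(home)
}
```

A project directory below the home directory is still allowed; only the home directory itself is rejected.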
* abort kill delay if process closes after interrupt
if the process to be killed honours the SIGINT and closes down,
we should not sleep before trying to kill a process that is
no longer there
* make sure to wait for both wait and kill results when kill delay is 0
* use cmd.process.signal and cmd.process.kill instead of raw syscalls
* switch back to syscall as os.process.signal or kill do not signal the entire group
* remove unstable and slightly dangerous kill test
the test panicked if it ever got a nil err, since t.Failf does not
imply that the test is stopped
selecting a random pid to kill might have interesting side effects
when run and its result can never be guaranteed to be the same, meaning
that the test had intermittent failures
no code depends on the specific return code of killCmd, so there
does not seem to be a reason to have a unit test for it
* add clarifying comments to the goroutine response collection code
* test: add regression tests for issue #671 (send_interrupt early exit optimization)
Add three comprehensive regression tests to verify the fix in commit 4d26204:
1. Test_killCmd_SendInterrupt_FastGracefulExit
- Verifies processes that exit quickly on SIGINT return immediately
- Saves ~2s when process exits in <1ms vs 2s kill_delay
2. Test_killCmd_SendInterrupt_IgnoresSIGINT
- Verifies processes ignoring SIGINT still get SIGKILL after kill_delay
- Ensures optimization doesn't break fallback behavior
3. Test_killCmd_SendInterrupt_SlowGracefulExit
- Verifies processes that take time to cleanup still benefit
- Saves ~700ms when process exits in 300ms vs 1s kill_delay
These tests ensure the goroutine-based optimization continues to work
correctly and prevent future regressions.
Related: https://github.com/air-verse/air/issues/671
---------
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* feat: add app_start_timeout to proxy configuration and improve error handling
* fix: resolve P0 and P1 issues in app_start_timeout proxy feature
- Use context timeout for first HTTP request to prevent hanging
- Fix WriteHeader called before setting Content-Length header
- Simplify retry loop condition (for err != nil instead of for { if err == nil })
- Add code comments for clarity (HTML vs non-HTML response handling)
- Clean up strconv.Itoa usage (remove unnecessary byte conversion)
- Add documentation to air_example.toml explaining app_start_timeout
- Add comprehensive unit tests for timeout scenarios (5 test cases)
P0 fixes (critical):
- First request now uses context timeout
- WriteHeader now called after all headers are set
P1 fixes (strongly recommended):
- Documented in air_example.toml with usage examples
- Added TestProxy_appStartTimeout with 5 test scenarios
Related to #656 (proxy reload timing issue)
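
The header-ordering fix can be illustrated with the standard library alone. The sketch below uses hypothetical function names and an httptest recorder to show that a Content-Length set after WriteHeader does not reach the response, while one set before it does.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strconv"
)

// writeBody (hypothetical) sets Content-Length either before or after
// WriteHeader, then writes the body. Only headers set before WriteHeader
// are part of the response.
func writeBody(w http.ResponseWriter, body []byte, setBeforeWriteHeader bool) {
	if setBeforeWriteHeader {
		w.Header().Set("Content-Length", strconv.Itoa(len(body)))
	}
	w.WriteHeader(http.StatusOK)
	if !setBeforeWriteHeader {
		w.Header().Set("Content-Length", strconv.Itoa(len(body))) // too late
	}
	_, _ = w.Write(body)
}

// recordedContentLength returns the Content-Length header a client would
// see, as captured by an in-memory recorder.
func recordedContentLength(setBeforeWriteHeader bool) string {
	rec := httptest.NewRecorder()
	writeBody(rec, []byte("hi"), setBeforeWriteHeader)
	return rec.Result().Header.Get("Content-Length")
}
```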
* test: add comprehensive test coverage for app_start_timeout feature
---------
Co-authored-by: Juan Gonzalez <jrg2156@gmail.com>
Co-authored-by: Juan Gonzalez <juanrgon@github.com>
Reduce test execution time through smart waiting and selective parallelization.
Changes:
- Replace fixed sleep delays with condition-based waiting (20ms polling)
- Add CI-aware timeout multiplier (2x in CI environments)
- Enable parallel execution for 30+ pure function tests
- Add test and test-ci Make targets
- Update GitHub Actions workflow with CI flag and timeout
Performance:
- Before: ~60 seconds
- After: ~35 seconds
- Improvement: 42% faster (25 seconds saved)
Technical details:
- New helpers: waitForCondition(), waitForEngineState() in test_util.go
- Optimized tests: TestRebuild, TestRun, TestRebuildWhenRunCmdUsingDLV, etc.
- Parallelized: config_test.go (6 tests), flag_test.go (1 test), util_test.go (13 tests)
- Avoided parallelizing tests with global state (os.Setenv, os.Chdir, signal handlers)
Limitations:
- Some tests cannot be parallelized due to Go 1.25 restrictions on t.Parallel() + t.Setenv()
- Pre-existing race conditions in engine tests remain (not addressed in this change)
* fix: prevent race condition where new builds cancel themselves (#784)
The race condition occurred when a new build was triggered during an active
build. The stop signal meant for the previous build could be consumed by the
new build, causing it to cancel itself.
The fix introduces a build generation counter. When a stop signal is sent,
it includes the target build's generation number. The receiving build only
stops if the signal's generation matches or exceeds its own generation,
preventing newer builds from consuming stop signals meant for older builds.
A mutex ensures atomicity between checking/consuming stop signals and
incrementing the generation counter.
Signed-off-by: majiayu000 <1835304752@qq.com>
* refactor: simplify race condition fix using channel-of-channels pattern
Replace generation counter + mutex approach with simpler solution that
eliminates deadlock risk and reduces code complexity.
Changes:
- Remove buildGeneration, buildRunStopCh, buildRunStopCheckMu fields
- Change buildRunCh from 'chan bool' to 'chan chan struct{}'
- Each build gets unique stop channel, closed to signal cancellation
- Add exitCh check at first stop point for faster shutdown
Benefits:
- No mutex needed (eliminates potential deadlock)
- No atomic counter (simpler state management)
- Impossible for a build to receive a stop signal meant for another build
- Better test coverage with rapid-changes scenario
Related to #784
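
A simplified sketch of the channel-of-channels pattern, assuming builds are started from a single goroutine. The `buildSlot` type and its methods are hypothetical; only the `chan chan struct{}` shape comes from the change itself.

```go
package main

// buildSlot (hypothetical) holds the stop channel of at most one in-flight
// build. Because every build owns its own channel, closing one can never
// cancel a different build.
type buildSlot chan chan struct{}

func newBuildSlot() buildSlot { return make(buildSlot, 1) }

// start cancels the build currently registered in the slot, if any, then
// registers a new build and returns its private stop channel.
// Assumption: start is only ever called from one goroutine.
func (s buildSlot) start() <-chan struct{} {
	select {
	case prev := <-s:
		close(prev) // cancels only the previous build
	default: // no build in flight
	}
	stop := make(chan struct{})
	s <- stop
	return stop
}

// stopped reports, without blocking, whether a build has been cancelled.
func stopped(stop <-chan struct{}) bool {
	select {
	case <-stop:
		return true
	default:
		return false
	}
}
```

A build would check `stopped` (or select on its stop channel) at each cancellation point; a newer build holds a different channel and is unaffected by the close.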
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: xiantang <zhujingdi1998@gmail.com>
* fix: proxy should forward SSE/chunked responses verbatim
For streaming responses (Transfer-Encoding: chunked or text/event-stream),
the proxy now flushes each chunk immediately instead of buffering. This
fixes SSE streaming where events were being accumulated instead of
forwarded in real-time.
Also fixes a header ordering issue where Content-Length was set after
WriteHeader(), which had no effect.
Fixes #791
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: prevent proxy from buffering SSE and chunked responses
Fixes #791 by ensuring the proxy streams Server-Sent Events and chunked
responses immediately instead of buffering them. This eliminates the
18-78 second delays in real-time event delivery.
Key changes:
- Check for http.Flusher support before writing headers (critical bug fix)
- Extract isStreamingResponse() to detect SSE/chunked responses
- Extract streamCopy() with 512-byte buffer and immediate flushing
- Add comprehensive tests for streaming and non-streaming responses
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: xiantang <zhujingdi1998@gmail.com>