Incidents | Steel
Incidents reported on status page for Steel https://status.steel.dev/
Live streaming issues https://status.steel.dev/incident/746472 Mon, 20 Oct 2025 18:40:00 -0000 https://status.steel.dev/incident/746472#8aa6c85575bc2a938016d84739d1784361bb30ba3d2474707599e62445be258d We have patched the majority of the issues here. There are still lingering network-related issues (especially on networks where NAT traversal is limited). We'll push some updates there soon.
Live streaming issues https://status.steel.dev/incident/746472 Sun, 19 Oct 2025 19:52:00 -0000 https://status.steel.dev/incident/746472#2b5701960e2e83f880225168aa9d0205dee112c85244697e9fc07bdb30914f8c We're facing some issues on our streaming service but pushing a patch soon -- in the meantime, you can set `headless: true` in the session creation field, which should avoid the issue.
Some sessions are not successfully being created/released under maintenance https://status.steel.dev/incident/745467 Fri, 17 Oct 2025 19:26:00 -0000 https://status.steel.dev/incident/745467#9a54711e66b1e9e21b85908e771ff6a7b806fb6e09cbfc54d29a4fefb8b104ca Things are stabilizing and seem to be back on track across the board -- we're still keeping an eye on things and will post an update if anything changes.
Some sessions are not successfully being created/released under maintenance https://status.steel.dev/incident/745467 Fri, 17 Oct 2025 14:59:00 -0000 https://status.steel.dev/incident/745467#b8b9594b8a78f5169bfd407f3b911336c1abe137a6074939183b88c4b25f6a80 We are working on a fix here and everything should be resolved soon.
Last Minute Maintenance for Sessions API https://status.steel.dev/incident/745438 Fri, 17 Oct 2025 14:43:44 -0000 https://status.steel.dev/incident/745438#8d76ab33aaa4c092c1943882eb7dfaba321e0dead92dae68e2f369cdef57cc99 Maintenance completed.
Last Minute Maintenance for Sessions API https://status.steel.dev/incident/745438 Fri, 17 Oct 2025 13:43:44 -0000 https://status.steel.dev/incident/745438#7ff1de675007d548ad2c9300cc277c0489f4815b0d3d317718702443c608cd36 Requests will fail for some users for less than 10 minutes as we ship some crucial infrastructure changes. Apologies in advance; we will be back shortly.
Elevated error rates from high usage https://status.steel.dev/incident/727417 Sat, 20 Sep 2025 01:22:00 -0000 https://status.steel.dev/incident/727417#102b0e052923fae52a084c568adc455ebe27fb31e7fd79e02874f83bc681ba25 We've been monitoring the situation for quite a while now and it seems like session acquisition + launching is back to being stable. We'll follow up with an incident report with more details.
Elevated error rates from high usage https://status.steel.dev/incident/727417 Fri, 19 Sep 2025 23:44:00 -0000 https://status.steel.dev/incident/727417#e634cbbe6faed61af33c76e730e857e8b6f7cf019604e2d1a8e0de287641a28e Our scaling services are slowly healing from the backlash of earlier requests. We're continuing to monitor the situation and will post an update once everything is all clear.
Elevated error rates from high usage https://status.steel.dev/incident/727417 Fri, 19 Sep 2025 21:46:00 -0000 https://status.steel.dev/incident/727417#30ad880deef0c1a399161a20a42dba7b0fd7e76d810950448a4c71d5c2c52475 We're looking into this right now -- we'll be back with updates shortly.
Facing some blowback from earlier https://status.steel.dev/incident/725558 Wed, 17 Sep 2025 01:55:00 -0000 https://status.steel.dev/incident/725558#55d18a2011105bb61b37e8853b33bbf502fae5a194db16bb3d2d619eed4a042f Last artifact has been cleaned up! We should be smooth sailing from now on.
Facing some blowback from earlier https://status.steel.dev/incident/725558 Wed, 17 Sep 2025 01:18:00 -0000 https://status.steel.dev/incident/725558#7988c9ae44e6117643b814c906a04912c5e883c736d707064e2554db81350d3e Some instances are still awaiting new changes -- manually expediting this currently.
Elevated error rates due to some backpressure https://status.steel.dev/incident/725405 Tue, 16 Sep 2025 20:48:00 -0000 https://status.steel.dev/incident/725405#f124a32045c59d832ba29d53d7d6ff7fee14a7354cb6c72fc114954b779a55d3

## Overview

The Steel Sessions API entered a degraded state with a higher error rate due to intermittent failures in the session acquisition and reservation logic. The switching logic responsible for managing session lifecycles exhibited unpredictable behavior, resulting in users being unable to establish or maintain browser sessions. The issue became apparent under a large volume of requests and the resulting backpressure, which exposed the flaw.

## Timeline

- 6:03 PM UTC - Initial increase in number of requests to the Sessions API
- 6:53 PM UTC - Engineering team identified intermittent failures in session management + some early user reports
- 7:48 PM UTC - Root cause traced to switching logic in session acquisition/reservation system
- 8:12 PM UTC - Issue isolated and fix implemented
- 8:48 PM UTC - Service fully restored and monitoring confirmed stability

## Root Cause

The session acquisition and reservation system contained flawed switching logic that intermittently failed to properly:

- Allocate available browser sessions to incoming requests
- Release reserved sessions back to the available pool
- Handle concurrent session requests during high load periods

This resulted in a cascading failure where the session pool became exhausted due to sessions being incorrectly marked as reserved but not properly allocated or released.

## Impact

- **User Impact:** Slowly increasing unavailability for browser automation tasks
- **Duration:** 2h46m
- **Affected Users:** Many users attempting to create new browser sessions during the incident window
- **Business Impact:** Service interruption affecting customer workflows and API integrations

## Resolution

The engineering team identified and corrected the faulty switching logic in the session management system. The fix involved:

- Refactoring the session state transitions to ensure atomic operations
- Implementing proper error handling for edge cases in session allocation
- Adding additional validation checks for session pool consistency

## Current Status

✅ **Issue Resolved:** The switching logic has been fixed and deployed

✅ **Service Restored:** All session acquisition functionality is operating normally

✅ **Monitoring Active:** Enhanced monitoring is in place to detect similar issues

## Next Steps and Preventive Measures

### Immediate Actions (Next 7 Days)

1. **Enhanced Monitoring Implementation**
   - Deploy additional alerting for session pool health metrics
2. **Load Testing**
   - Conduct comprehensive load testing of session management under various scenarios
   - Validate fix effectiveness under simulated high-concurrency conditions

### Medium-term Actions (Next 30 Days)

1. **Code Review and Testing Enhancement**
   - Comprehensive audit of session management codebase
   - Implement additional unit and integration tests for session lifecycle edge cases
   - Establish chaos engineering practices for session management resilience
2. **Infrastructure Improvements**
   - Evaluate session pool sizing and auto-scaling mechanisms and test with methods in #1
   - Design graceful degradation strategies for session pool exhaustion
3. **Documentation and Runbooks**
   - Create detailed runbooks for session management incidents

## Lessons Learned

- High-concurrency scenarios expose edge cases not apparent under normal load
- Proactive monitoring of internal system states (session pools) is critical for early detection
- Automated testing should include concurrent access patterns and resource exhaustion scenarios

## Post-Incident Review

A detailed post-incident review meeting will be scheduled within 48 hours to discuss:

- Technical deep-dive into the root cause
- Evaluation of response time and communication
- Assessment of proposed preventive measures
- Assignment of follow-up action items

Elevated error rates due to some backpressure https://status.steel.dev/incident/725405 Tue, 16 Sep 2025 19:00:00 -0000 https://status.steel.dev/incident/725405#0d8cd7cb1d6476c382290b14216d3a8ce7d435d2f38061090c433a6a2165e531 Several services are being affected, but we're working on some upgrades.
Region Selection Failures https://status.steel.dev/incident/721203 Tue, 09 Sep 2025 17:08:00 -0000 https://status.steel.dev/incident/721203#64e85e357385f67190a32293a81ba315e7bf9baf63f5d4ebc4367c14d5a50067 This should be resolved.
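
As an illustration of the fix described in the 16 Sep 2025 incident report above (atomic session state transitions, error handling during allocation, and pool consistency checks), here is a minimal Python sketch. It is not Steel's implementation: the `SessionPool` class, the `State` names, the `launch` callback, and `PoolExhausted` are all hypothetical and exist only to show the idea that a session should never be left in a reserved state after a failed allocation.

```python
import threading
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    RESERVED = "reserved"
    ALLOCATED = "allocated"


class PoolExhausted(Exception):
    """Raised when no session is available (hypothetical error type)."""


class SessionPool:
    """Hypothetical in-memory pool; every state change happens under one lock."""

    def __init__(self, session_ids):
        self._lock = threading.Lock()
        self._states = {sid: State.AVAILABLE for sid in session_ids}

    def acquire(self, launch):
        # Step 1: atomically pick an available session and mark it reserved.
        with self._lock:
            sid = next(
                (s for s, st in self._states.items() if st is State.AVAILABLE),
                None,
            )
            if sid is None:
                raise PoolExhausted("no available sessions")
            self._states[sid] = State.RESERVED

        # Step 2: run the (possibly failing) launch outside the lock.
        try:
            launch(sid)
        except Exception:
            # Roll back so the session is not stuck as reserved.
            with self._lock:
                self._states[sid] = State.AVAILABLE
            raise

        # Step 3: atomically promote the reservation to an allocation.
        with self._lock:
            self._states[sid] = State.ALLOCATED
        return sid

    def release(self, sid):
        with self._lock:
            if self._states.get(sid) is not State.ALLOCATED:
                raise ValueError(f"session {sid!r} is not allocated")
            self._states[sid] = State.AVAILABLE

    def counts(self):
        # Consistency check: a snapshot of how many sessions are in each state.
        with self._lock:
            return {
                st: sum(1 for v in self._states.values() if v is st)
                for st in State
            }
```

A caller would pass its own launch function, for example `pool.acquire(start_browser)`, and could alert when `counts()` reports reserved sessions that never move on. Running the launch step outside the lock is a design choice in this sketch so that one slow launch does not block other requests.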
Region Selection Failures https://status.steel.dev/incident/721203 Tue, 09 Sep 2025 14:38:00 -0000 https://status.steel.dev/incident/721203#eb6a3a88db388c32dd9c23004b96f41283d2d65ff08dd648280c7f4c747df909 Region selection is failing due to an upstream issue with our global routing provider. We are looking into this.
Network issues in BOM https://status.steel.dev/incident/720440 Mon, 08 Sep 2025 13:05:00 -0000 https://status.steel.dev/incident/720440#a56d90429052b689a97c8525ce66dff36bb1a18b28492a647a0578c01b51866b We have pushed changes to reflect the situation in Mumbai and we're no longer routing there. All services should be back online.
Network issues in BOM https://status.steel.dev/incident/720440 Mon, 08 Sep 2025 12:39:00 -0000 https://status.steel.dev/incident/720440#b1616fcaa4803301cf9f41b09dc5fb123eed3767aad54fff8ba2f83c6ccfce2a We are continuing to work with our external connectivity providers to address the service degradation caused by multiple submarine cable outages. Traffic between India and Europe/US East is impacted. Since this is an external incident, some degradation may still occur. We'll provide another update as soon as we have more information.
We are facing turbulence with our captcha solving services https://status.steel.dev/incident/620201 Fri, 18 Jul 2025 18:37:00 -0000 https://status.steel.dev/incident/620201#3be58aa66c9993053988e24b113b2ced26c18e26cf9542d3eddc73bc44ee114a We increased our scale capabilities and we are back to smooth sailing!
We are facing turbulence with our captcha solving services https://status.steel.dev/incident/620201 Tue, 15 Jul 2025 17:02:00 -0000 https://status.steel.dev/incident/620201#e65087edbba646bc517f7878bbdce3df33755cd40f0df42a9ff43b543eddb841 Currently facing a larger number of requests to solve Cloudflare captchas -- we are working on a fix and we'll update this shortly.
Some session creations are facing issues https://status.steel.dev/incident/607702 Mon, 23 Jun 2025 15:47:00 -0000 https://status.steel.dev/incident/607702#247c971081bc6594970402c9291ea400a0340868ad52c1477662d1cfc24322be Things are starting to look okay; we're monitoring the situation as we get back to peak performance.
Some session creations are facing issues https://status.steel.dev/incident/607702 Mon, 23 Jun 2025 13:57:00 -0000 https://status.steel.dev/incident/607702#db1f4c0f28c5d656479d30fc49a159ddfe9c0c86cc9095515bbb75a9912dd06c We're investigating an error that is showing up for a lot of users when creating a new session, specifically:
```
Maximum call stack size exceeded
```
We'll be back with more updates.
Our Proxies are Facing Some Connection Issues https://status.steel.dev/incident/606421 Fri, 20 Jun 2025 15:11:00 -0000 https://status.steel.dev/incident/606421#c396ef8f51d4bb56540a400f3f9da06adcfc861797899df68c824a5e551a3d23 A fix has been deployed -- everything looks good on initial testing but we'll continue to monitor the situation.
Our Proxies are Facing Some Connection Issues https://status.steel.dev/incident/606421 Fri, 20 Jun 2025 14:33:00 -0000 https://status.steel.dev/incident/606421#0765a877aaee5d357e11b378015dfd5c2fdf5f6455bb91dec24049644ed7b629 We're looking into this right now and will have an update shortly.
We're facing a higher number of requests than usual https://status.steel.dev/incident/581975 Thu, 29 May 2025 22:27:00 -0000 https://status.steel.dev/incident/581975#ab7b1faddc6ca0903e3742612887b4fa31fe1f7eeb2d288eb2a45277a22b73f2 Okay, after some long investigations, we've found some temporary fixes to help with some of the scale problems we're dealing with. We'll continue to improve on this with a more thoughtful long-term solution -- but in the meantime we'll be monitoring our service closely. Thank you all for your patience!
We're facing a higher number of requests than usual https://status.steel.dev/incident/581975 Tue, 27 May 2025 19:15:00 -0000 https://status.steel.dev/incident/581975#6eeb886a37d5368547ff9d453a8c0f2dc8481709a4d151cc1420b5b29481ce8b We’re investigating an issue with our sessions API that is impacting some users due to a larger number of requests than usual. We’re working to fix the problem as quickly as we can. We’ll share another update shortly.
Service is degraded https://status.steel.dev/incident/554108 Wed, 30 Apr 2025 02:31:00 -0000 https://status.steel.dev/incident/554108#e9be378e9827f405cf025a7530aa3f02777a8b795a5a3e0b76c8e837a3d32433 API was slow to respond for a little bit but the issue has been resolved!
Service is degraded https://status.steel.dev/incident/554108 Wed, 30 Apr 2025 02:11:00 -0000 https://status.steel.dev/incident/554108#1e722ec09724a10ce56c6694081e463130153ed6d66c9e20c636a5bfa252f669 Running into issues
Burst usage from surf.new https://status.steel.dev/incident/509641 Sat, 08 Feb 2025 21:28:00 -0000 https://status.steel.dev/incident/509641#cedb8844ff042147877095d5d63978768362b3b0ac995f80d171321908938bfc Everything has been resolved here!
Burst usage from surf.new https://status.steel.dev/incident/509641 Sat, 08 Feb 2025 21:18:00 -0000 https://status.steel.dev/incident/509641#1bf310debf60286f6e5ef3c457300dce59997b3d4e8005c1e15ed50337fd564e The API is slow to create sessions + some users are experiencing errors.
Session creation is slower than usual (and browser performance is hindered) https://status.steel.dev/incident/501482 Sat, 25 Jan 2025 08:23:00 -0000 https://status.steel.dev/incident/501482#87809f2df10eff75e9f10c6b5c860f79248be52365925dee029043953ebe0e39 Fixed!
Session creation is slower than usual (and browser performance is hindered) https://status.steel.dev/incident/501482 Sat, 25 Jan 2025 00:21:00 -0000 https://status.steel.dev/incident/501482#abdf110644aa4ba2102c0be2055ee1882c312b85c0359f8f7ac575808258e04b We're taking a look at a fix -- should be available soon!
Session store is down https://status.steel.dev/incident/500986 Fri, 24 Jan 2025 01:55:00 -0000 https://status.steel.dev/incident/500986#24b9f61692c6e2d0bca2fc3ce43a1f2147e84dce9ea8f9a10e96224dc29c0ce1 We are back.
Session store is down https://status.steel.dev/incident/500986 Fri, 24 Jan 2025 01:54:00 -0000 https://status.steel.dev/incident/500986#5b869e3110980951e4c8845280f95a6c29a7707d7bff869631427b39b2c2c890 This issue is related to our internal service. We're working on fixing the issue and service will be available again very soon.
Session creation is slower than usual https://status.steel.dev/incident/500088 Wed, 22 Jan 2025 22:59:00 -0000 https://status.steel.dev/incident/500088#d79f72634e05ff81ea1d606174c02890821b54fb4934921e95c3b587c9b4a24a We have identified the cause and pushed a fix! Things look somewhat stable now.
Session creation is slower than usual https://status.steel.dev/incident/500088 Wed, 22 Jan 2025 14:32:00 -0000 https://status.steel.dev/incident/500088#89cb007c0bc11dca4f45a0f35bb415179fe145eecf712fcce8b3dc25c38ac5fc We’re looking into it, but session creation times have gone up a bit (P99 of 3.7 seconds). We will update this page once a fix is up!
Session Live Viewer https://status.steel.dev/incident/495440 Tue, 14 Jan 2025 00:54:00 -0000 https://status.steel.dev/incident/495440#4a4add2514cc0e1578c3441ebbabd1d82bfd7b4741f9a0d347ec9b5249413706 Live session views were down for about 30 minutes due to an error with our session event persistence layer. Everything should be good now!
Our routing between services is experiencing some latency https://status.steel.dev/incident/479277 Tue, 17 Dec 2024 13:00:00 -0000 https://status.steel.dev/incident/479277#1d5e36cc24af3fd7253fe966bd5947b35126915300e0eeaed21298145c1489f6 We have finally pushed the fixes for the session creation bugs -- sessions and other requests should be back to being quick and snappy!
Our routing between services is experiencing some latency https://status.steel.dev/incident/479277 Mon, 16 Dec 2024 22:28:00 -0000 https://status.steel.dev/incident/479277#20c17a452a70775f427a6d0fd51f2562ac6b9f540dc2d11eb95d83dd2d923659 Some requests are taking longer than normal -- we are on it, and we'll have an update shortly.
Session creation times are taking longer than expected https://status.steel.dev/incident/477215 Sat, 14 Dec 2024 00:15:00 -0000 https://status.steel.dev/incident/477215#27c473fea7433dcbf77d21eef8870a12f3626c79cc79dc8e5b751a4337e28e88 We have pushed a temporary fix while we continue to work on a permanent solution.
Session creation times are taking longer than expected https://status.steel.dev/incident/477215 Fri, 13 Dec 2024 15:06:00 -0000 https://status.steel.dev/incident/477215#9895927408205deb0a453dcdf214ae70a52616d4abad8fba2e7b606c2010361a We’re having some trouble with our machine locks during session creation. We’re working to fix the problem as quickly as we can. We’ll share another update shortly.
API was down briefly https://status.steel.dev/incident/446973 Fri, 18 Oct 2024 19:51:00 -0000 https://status.steel.dev/incident/446973#aac13237d6382cbe94e4c54ac9d2d0ddf98266fd5973177aebfc2ab8021fbfc8 Our API was briefly down for ~3 minutes as we scaled our cache. Everything should be back to normal.