numlocked 5 hours ago

Hi folks -- I'm Chris from OpenRouter. This one hurts. We're back, but our database was down for about 45 minutes, which caused user and credit lookups to fail, and took down the API. We are investigating why, and of course we are going to look into improving durability so this failure mode can't happen again. We will share a post-mortem on the site when we have finished our investigation. I'm sorry to our users who count on us.

vintagedave 7 hours ago

One of OpenRouter's main points is that it allows you to bypass individual AI vendors' downtimes. I was considering using it for an uptime-critical project of mine.

The post-mortem will be worth watching.

  • SamLeBarbare 7 hours ago

    OpenRouter: eliminating Single Points of Failure… by introducing a beautifully centralized one.

    • lordofgibbons 6 hours ago

      Their uptime is still infinitely better than any single provider though.

      • sokoloff 6 hours ago

        infinitely?

        • phh 5 hours ago

          Well in FP4

  • drclegg 7 hours ago

    To be fair, it is still useful on this front; it's much faster than waiting for requests to fail and falling back to a backup yourself.

    You still need another backup provider or two for cases like this though.

  • logicchains 7 hours ago

    >One of OpenRouter's main points is that it allows you to bypass individual AI vendors' downtimes.

    Only if you're using a model hosted by multiple providers (e.g. an open model).

    • gkbrk 7 hours ago

      Nope, for closed models too. Claude for example has multiple providers they work with. Google Vertex, Amazon Bedrock and Anthropic themselves all provide inference for Claude.

      The vast majority of models on OpenRouter (both closed and open) have multiple providers.

      • simianwords 6 hours ago

        Interesting. I would think they would safeguard core IP from competitors.

      • OJFord 6 hours ago

        Also you might be fine with routing to a different model.

gitmagic 7 hours ago

Been down for ~50 minutes now and there's no information other than the automated notice on their status page.

  • euazOn 6 hours ago

    FYI, they (oddly enough) communicate mostly through Discord, and they have said they are investigating the issue at 10:30am UTC - 13 minutes after the first user reports.

  • rozenmd 7 hours ago

    Frankly I prefer that to a green tick and "All Systems Operational"

    • baq 6 hours ago

      yellow: "volcano has erupted under the datacenter and it's being flooded with lava. engineers are investigating"

      red: "datacenter has been subject to multiple nuclear strikes. next update in 30 min"

    • euazOn 6 hours ago

      Could that be due to contractual clauses for uptime in SLAs?

    • gitmagic 6 hours ago

      True, that happens far too often.

blitzar 7 hours ago

Can someone power it off and back on again please?

lvl155 7 hours ago

How can a router be down this long? I would have to reconsider using them moving forward.

  • gitmagic 7 hours ago

    I'm mostly concerned about their lack of communication. Would have been nice to know that they are looking into it, and to get an ETA.

andrewinardeer 5 hours ago

So it seems you can't subscribe via RSS. Shame.

  • rozenmd 5 hours ago

    Hey, I run the underlying status page service - I'll add RSS!

jug 7 hours ago

Should be coming up now.

rvz 7 hours ago

Looking forward to the postmortem.