Here's a statement of the obvious: The opinions expressed here are those of the participants, not those of the Mutual Fund Observer. We cannot vouch for the accuracy or appropriateness of any of it, though we do encourage civility and good humor.

Huge AWS Outage Affecting Businesses & Consumers

edited October 20 in Other Investing
Fidelity was very slow when I logged on this morning. Once in, everything worked as intended. Amazon appears to be down: I haven't been able to view my account or do any product searches all day.

Link

“At 1:26am PDT (4:26am ET, 9:26am BST), the issue was diagnosed as a big one related to the DynamoDB endpoint of AWS — the digital phonebook of the internet.”

“While the Amazon mobile app itself going down is one thing, Amazon Web Services is the crucial one here. It's the backbone of a lot of the internet, and the likes of Snapchat, Venmo, Ring, Pokémon GO and more are also down because of it.”

Other reports mention that trading at Robinhood was impacted.

The U.S. government (Defense, CIA) also relies on Amazon’s AWS cloud services to some extent. I don’t know whether those have been impacted.

Comments

  • edited October 20
    Wow. Talk about national vulnerability.

    Edit: After reading the report below, it seems the vulnerability is a lot greater than just "national".
  • Following are excerpts from a current report in The Guardian:

    Crash that hit apps and websites around world demonstrates ‘urgent need for diversification in cloud computing’
    Amazon Web Services outage shows internet users ‘at mercy’ of too few providers, experts say
    Experts have warned of the perils of relying on a small number of companies for operating the global internet after a glitch at Amazon’s cloud computing service brought down apps and websites around the world.

    The affected platforms included Snapchat, Roblox, Signal and Duolingo as well as a host of Amazon-owned operations including its main retail site and the Ring doorbell company.

    More than 2,000 companies worldwide have been affected, according to Downdetector, a site that monitors internet outages, with 8.1m reports of problems from users including 1.9m reports in the US, 1m in the UK and 418,000 in Australia.

    In the UK, Lloyds bank was affected, as well as its subsidiaries Halifax and Bank of Scotland, while there were also problems accessing the HM Revenue and Customs website on Monday morning. Also in the UK, Ring users complained on social media that their doorbells were not working.

    In the UK alone, reports of problems on individual apps ran into the tens of thousands for each platform. Other affected platforms around the world included Wordle, Coinbase, Duolingo, Slack, Pokémon Go, Epic Games, PlayStation Network and Peloton.

    By 10.30am UK time, Amazon was reporting that the problem, which first emerged at about 8am, was being resolved as AWS was “seeing significant signs of recovery”. However, after reporting further positive progress by late morning in the UK, Amazon still appeared to be struggling to overcome the glitch this afternoon as it acknowledged it was still experiencing elevated errors.

    “We can confirm significant API errors and connectivity issues across multiple services … We are investigating,” AWS said in an update around 7am Pacific time and 3pm UK time. To aid the recovery, AWS said it was putting in place limits on the number of requests that could be made on its platform.

    Experts said the outage underlined the dangers of the internet’s reliance on a small number of tech companies, with Amazon, Microsoft and Google playing a key role in the cloud market. Last year, airports, healthcare services and businesses worldwide were hit by the “largest outage in history”, caused by a botched software upgrade from cybersecurity company CrowdStrike that hit Microsoft’s Windows operating system.

    Amazon reported that the problem on Monday originated in the east coast of the US at Amazon Web Services, a unit that provides vital web infrastructure for a host of companies, which rent out space on Amazon servers. AWS is the world’s largest cloud computing platform.

    Shortly after midnight (PDT) in the US (8am BST) on Monday, Amazon confirmed “increased error rates and latencies” for AWS services in a region on the east coast of the US. The ripple effect hit services around the world, with Downdetector reporting problems with the same sites in multiple continents.

    Experts said the outage appeared to be an IT issue rather than a cyber-attack. AWS’s online health dashboard referred to DynamoDB, its database system where AWS customers store their data. Amazon appeared to rule out foul play, saying the root cause was an internal subsystem responsible for monitoring its load balancers, which prevent traffic from overloading its servers.

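    The excerpt says AWS put limits on the number of requests during recovery but does not say how those limits work. As a hedged illustration only, one common way to cap a request rate is a token bucket; the sketch below is a generic Python version, not AWS's mechanism, and the class name and parameters (`rate`, `capacity`) are hypothetical.

```python
import time


class TokenBucket:
    """Generic token-bucket rate limiter (illustrative; not AWS's actual throttling code)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added back per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # bucket starts full
        self.clock = clock          # injectable so the example is deterministic
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one request may proceed, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off or receive a throttling error


if __name__ == "__main__":
    t = [0.0]  # fake clock value in seconds
    bucket = TokenBucket(rate=2.0, capacity=3.0, clock=lambda: t[0])
    print([bucket.allow() for _ in range(5)])  # burst of 5 requests at t=0
    t[0] = 1.0
    print(bucket.allow())  # one second later, 2 tokens have been refilled
```

    With a capacity of 3, a burst of five requests at once lets three through and rejects two; after one second at 2 tokens per second, requests are admitted again.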
  • edited October 20
    DNS resolution issues affecting the DynamoDB API endpoint were the root cause of this outage.
    DynamoDB is a managed database service that many internet-based services
    use to store key data, track user information, and manage operations.
    Although many AWS services appear to be functioning properly now,
    Amazon's recovery efforts are ongoing.
    Here are the latest updates reported by Amazon.

    Oct 20 12:15 PM PDT We continue to observe recovery across all AWS services, and instance launches are succeeding across multiple Availability Zones in the US-EAST-1 Regions. For Lambda, customers may face intermittent function errors for functions making network requests to other services or systems as we work to address residual network connectivity issues. To recover Lambda’s invocation errors, we slowed down the rate of SQS polling via Lambda Event Source Mappings. We are now increasing the rate of SQS polling as we experience more successful invocations and reduced function errors.
    We will provide another update by 1:00 PM PDT.

    Oct 20 11:22 AM PDT Our mitigations to resolve launch failures for new EC2 instances continue to progress and we are seeing increased launches of new EC2 instances and decreasing networking connectivity issues in the US-EAST-1 Region. We are also experiencing significant improvements to Lambda invocation errors, especially when creating new execution environments (including for Lambda@Edge invocations).
    We will provide an update by 12:00 PM PDT.

    Oct 20 10:38 AM PDT Our mitigations to resolve launch failures for new EC2 instances are progressing and the internal subsystems of EC2 are now showing early signs of recovering in a few Availability Zones (AZs) in the US-EAST-1 Region. We are applying mitigations to the remaining AZs at which point we expect launch errors and network connectivity issues to subside.
    We will provide an update by 11:30 AM PDT.

    Oct 20 10:03 AM PDT We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services. Lambda is experiencing function invocation errors because an internal subsystem was impacted by the network load balancer health checks. We are taking steps to recover this internal Lambda system. For EC2 launch instance failures, we are in the process of validating a fix and will deploy to the first AZ as soon as we have confidence we can do so safely.
    We will provide an update by 10:45 AM PDT.

    https://health.aws.amazon.com/health/status

    The article below supposedly lists all the websites that were impacted by this outage.
    I doubt the list is complete.
    https://www.techradar.com/computing/internet/amazon-outage-every-website-knocked-offline-by-the-huge-aws-outage
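    The comment above puts the root cause at DNS resolution for the DynamoDB API endpoint: if the name does not resolve, clients cannot reach the service even when the servers behind it are healthy. A minimal sketch of that check using Python's standard `socket.getaddrinfo`, assuming the usual regional endpoint name `dynamodb.us-east-1.amazonaws.com`; the function name `resolve_endpoint` is hypothetical.

```python
import socket


def resolve_endpoint(host: str, port: int = 443, resolver=socket.getaddrinfo) -> list[str]:
    """Return the sorted unique IP addresses `host` resolves to, or [] if DNS lookup fails.

    `resolver` defaults to the standard library's getaddrinfo; it is injectable
    so the behaviour can be exercised without network access.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved: the failure mode described above.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

    For example, `resolve_endpoint("dynamodb.us-east-1.amazonaws.com")` returns the endpoint's addresses on a working network and an empty list when the lookup fails. Note that an empty result only shows the name did not resolve from this machine; it says nothing about whether the service itself is up.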
  • edited October 20
    Deleted
  • edited October 20
    Interesting to find out via this outage that most customers (including large ones) do not use the disaster recovery features of DynamoDB. Maybe this is AWS nudging them to do so:)
  • edited October 20
    stayCalm said:

    Interesting to find out via this outage that most customers (including large ones) do not use the disaster recovery features of DynamoDB. Maybe this is AWS nudging them to do so:)

    It appears that many customers did not implement the redundancy needed to fall back to other regions or cloud providers.

    Vaibhav Tupe, a senior member of the nonprofit technical organization IEEE, stated:
    “This outage shows that even the largest cloud providers are vulnerable
    when failure occurs at the control-plane level.
    It raises fundamental questions about over reliance on a single provider or region and may accelerate
    demand for multi-cloud and multi-region architectures as a baseline expectation for resilience.”
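    The "fall back to other regions" idea has two halves: the data must already exist in a second region (DynamoDB's global tables feature replicates tables across regions), and the client must know to try that region when the primary fails. A minimal sketch of the client-side half only, assuming the data is already replicated; the function name and the `call` argument are hypothetical stand-ins for whatever SDK request an application actually makes.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_region_failover(call: Callable[[str], T], regions: Sequence[str]) -> tuple[str, T]:
    """Try `call(region)` for each region in order; return (region, result) of the first success.

    Assumes the data is already replicated to every listed region; this only
    handles the client-side decision of where to send the request.
    """
    errors: dict[str, Exception] = {}
    for region in regions:
        try:
            return region, call(region)
        except Exception as exc:  # in practice, catch the SDK's specific connection/throttling errors
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")
```

    A real deployment would also need to decide when to fail back and how to handle writes that landed in different regions during the outage; the sketch deliberately leaves those out.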

  • The cloud oligopoly is far from the major problem; at least there is some capitalism nudging competition/competence.

    GOP cuts to cyberdefense against foreign actors are a clear sign America is open to looting.
    Individuals, and 99% of corporations, are on their own. China, Russia, and NK have tens of thousands waiting for this era.

    https://www.forbes.com/sites/thomasbrewster/2025/10/02/government-shutdown-cisa-weaker-insiders-say/

    The one time Trump plays equal-opportunity globalist, it's for cybercrime (see crypto).
    One should consider that moving abroad may not protect liquid wealth assets... see today's post on the muni hack.
  • And this just now from NPR:
    Amazon's cloud computing service, AWS, is like an invisible scaffolding that helps much of the internet function. AWS lets companies store and manage data online using its database service DynamoDB, which was the service affected by the outage.

    "In other words, they rent out their cloud computing resources to others so they can serve their own customers," says Chang Lou, an assistant professor at the University of Virginia who specializes in cloud computing.

    An early-morning software update to DynamoDB, however, contained an error, which took down the service in Northern Virginia. The error within that update then caused a chain reaction of service failures and disruptions.

    Comment:   "An update... contained an error".  That's why I never update anything unless forced to, and then kicking and screaming every inch of the way.  If it works don't f___ with it !!

  • edited October 20
    "That's why I never update anything unless forced to, and then kicking and screaming every inch of the way.
    If it works don't f___ with it !!"


    This is a very good philosophy.
    I would make an exception for security vulnerabilities.
  • Yes, grudgingly...
  • edited October 20
    The overreliance of vast swaths of the internet on fewer than 10 tech firms has been known for many moons.
  • Maybe when they finally get all that "AI" to work right they can ask it how to fix that.