Amid competing claims about liability for the 24th February outage, the National Stock Exchange (NSE) has now blamed its storage area network (SAN) system, especially the failover logic provided by its vendor, for the trading halt.
In a statement, NSE says, "...on 24th February, post link failure, the SAN system at the primary data centre stopped functioning, which was completely unexpected. Subsequent incident analysis showed that the problem was caused by failover logic implemented by the vendor, which did not conform to NSE's stated design requirements, coupled with issues in the configuration done by the SAN vendor that triggered the failover logic. We note that the specific failure logic used by the vendor is not documented, was not communicated to NSE, and was not appropriate for NSE's setup. The resultant SAN failure led to the incident on 24th February."
According to NSE, SAN is a fault tolerant system designed to function seamlessly even in the event of telecom link failures between primary and near disaster recovery (NDR) copies. "One of the features of SAN that was deployed in October 2020 was designed to provide not just zero data loss but also zero down time. Before deployment, the system was tested against various scenarios including link failures and functioned properly," it claimed.
NSE has its primary data centre in Bandra-Kurla Complex (BKC) and a near disaster recovery (NDR) site in Kurla, both in Mumbai, and a disaster recovery (DR) site in Chennai. It claims that there is synchronous data replication between the primary site in BKC and the NDR site to ensure no data loss in case of primary site failure, and asynchronous replication to the DR site in Chennai, which is designed to take over with zero data loss in case of disaster at the primary site.
"Between our primary and NDR sites, NSE has multiple telecom links with two service providers to ensure redundancy. On 24 February 2021 we had instability in links from both service providers primarily due to digging and construction activity along the path between the two sites. The replication to NDR is designed such that in the event of the links between primary and NDR getting cut, the primary continues operations without any direct effect. Post earlier link failures in February 2021, operations continued without any interruption," it says.
However, NSE says, "on 24th February, post link failure, we saw unexpected behaviour of the SAN system, with the primary SAN becoming inaccessible to the host servers. This resulted in the risk management system of NSE Clearing Ltd (NCL) and other systems such as clearing and settlement, index and surveillance systems becoming unavailable."
"While there was no impact on the trading system, given that the risk management system was unavailable, allowing trading to continue on NSE posed an unacceptable risk, and hence trading had to be halted," the Exchange added.
According to the bourse, on 24th February, NSE Clearing's risk management system (RMS) at BSE and MSEI was functioning and cleared trades executed on BSE and MSEI within the collateral levels available at the time NCL's RMS at the primary site became unavailable. It says, "Updation of collateral was not available as part of the design which is being addressed as part of the strengthening of certain aspects of interoperability that all MIIs are collectively working on with SEBI."
NSE says it has taken various steps, and a few others are under implementation, to address the SAN and telecom link issues. "We had already placed orders in January for two additional telecom provider links and have removed the SAN software that caused the incident. We are also exploring alternate solutions to de-risk dependency of critical applications to a single storage device," it added.
Last week, the Reserve Bank of India, in its monthly bulletin, had blamed the massive shutdown at NSE on the shutdown of NSE Clearing.
"The major issue in this incident was the ineffectiveness of interoperability because of shutting down of the NCL...Another important failure was the inability to switch operations to the disaster recovery site...Brokers believe that timely communication and clarification could have averted the panic sell-off by online traders on the BSE and prevented huge losses to investors," the central bank had said in an article titled "State of the Economy", published in its monthly bulletin. (Read: NSE Trading Halt: RBI Says Failure at NSE Clearing Led to Outage on 24th February)
If the failover logic did not conform to NSE's stated design requirements, why was it accepted in the first place? What prevented NSE from demanding that the failover logic be documented and communicated to it, so that its appropriateness and compatibility with NSE's setup could be confirmed? Can a car company absolve itself by saying, "We deeply regret the death of passengers but, you see, the airbag vendor supplied airbags that did not conform to our stated design requirements"? Absolutely callous and shameless.