A DevOps Automated Governance Story - Knight Capital
What is Knight Capital?
Knight Capital Group, Inc. was a global financial services firm whose business centered on market making and electronic trade execution. It was one of the leading market makers in the United States, with more than 1,800 registered representatives serving approximately 31,000 active retail brokerage accounts. Knight provided access to an array of equities, equity options, and fixed assets.
Who are Knight Capital's biggest competitors?
Knight competed with firms such as Goldman Sachs, UBS Investment Bank, and Morgan Stanley. It made money by making markets in equities and equity options, by electronically executing trades on behalf of broker-dealers, high-frequency trading firms, and institutional investors, and by facilitating cash transfers between securities exchanges.
Why did they fail?
On August 1, 2012, Knight deployed an upgrade to SMARS, its automated order router, to support a new Retail Liquidity Program (RLP). The deployment was faulty, and SMARS began placing errant orders in NYSE-listed stocks. Knight had no control in place that prevented the malfunction. Over the course of 45 minutes, SMARS routed millions of orders into the market, which resulted in over 4 million executions in 154 stocks, representing over 397 million shares. By the time Knight stopped sending orders, it had a net long position in 80 stocks worth approximately $3.5 billion and a net short position in 74 stocks worth roughly $3.15 billion. Knight ultimately lost over $460 million on these unwanted positions.
The Details
The new RLP code in SMARS was intended to replace older code on eight servers. That older code had supported a functionality called "Power Peg," which Knight stopped using in 2003. Even though Power Peg had not been used in years, its code was still present on the servers and callable at the time of the August 1, 2012, RLP deployment. In 2005, the old Power Peg code had been relocated to a different entry point in the SMARS code sequence. According to the Securities and Exchange Commission (SEC) cease-and-desist proceedings, the code was not retested after the move to see whether it would still work properly if it were ever activated.
The new RLP code reused the flag that had previously activated Power Peg. As part of the upgrade, Knight planned to remove the old Power Peg code so that turning the flag on would activate the new RLP functionality instead. During the installation, however, the new code was deployed to only seven of the eight servers. The SEC's cease-and-desist proceedings emphasized that a second technician did not review the change and that no established procedure required such a review. In DevOps parlance, there was no peer review on the merge request.
In the end, orders routed to the eighth server, which still ran the old code, activated Power Peg through the repurposed flag, and that server began sending erroneous orders. One of the largest high-frequency trading (HFT) firms lost $460 million within 45 minutes and was pushed to the brink of bankruptcy.
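The missed eighth server is the kind of gap a pre-activation consistency check is meant to catch. The following is a minimal sketch, not a reconstruction of anything Knight ran: the server names, the artifact contents, and the functions find_stale_servers and can_enable_flag are all hypothetical. It compares a digest of the code installed on each server with the digest of the intended release and refuses to enable the flag while any server differs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerState:
    name: str
    artifact_sha256: str  # digest of the code actually installed on the server


def digest(artifact: bytes) -> str:
    """SHA-256 hex digest used to identify a build."""
    return hashlib.sha256(artifact).hexdigest()


def find_stale_servers(expected_sha256: str, fleet: list[ServerState]) -> list[str]:
    """Return the names of servers whose installed code differs from the release."""
    return [s.name for s in fleet if s.artifact_sha256 != expected_sha256]


def can_enable_flag(expected_sha256: str, fleet: list[ServerState]) -> bool:
    """Allow the flag to be turned on only when every server runs the release."""
    return not find_stale_servers(expected_sha256, fleet)


# Hypothetical artifacts standing in for the new and old SMARS builds.
new_release = digest(b"smars-with-rlp")
old_release = digest(b"smars-with-power-peg")

# Seven servers received the new code; the eighth still has the old code.
fleet = [ServerState(f"smars-{i}", new_release) for i in range(1, 8)]
fleet.append(ServerState("smars-8", old_release))

print(find_stale_servers(new_release, fleet))  # ['smars-8']
print(can_enable_flag(new_release, fleet))     # False
```

In a real pipeline the digests would be reported by the servers themselves rather than constructed in memory. The point of the sketch is only that the check runs before the flag is turned on, not after.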
Lessons Learned
The following is a summary of the Securities and Exchange Commission's cease-and-desist order dated October 16, 2013.
Knight did not maintain an adequate, consistent written description of its risk management controls as part of its books and records.
Knight did not have technology governance controls and supervisory procedures sufficient to ensure the orderly deployment of new code or to prevent the activation of code no longer intended for use.
Knight did not have controls and supervisory procedures reasonably designed to guide employees' responses to significant technological and compliance incidents.
Knight did not adequately review its business activity in connection with its market access to assure the overall effectiveness of its risk management controls and supervisory procedures.
Knight's 2012 annual CEO certification was defective because it did not certify Knight's risk management controls and supervisory procedures.
A DevOps Automated Governance Perspective
Although the Securities and Exchange Commission's cease-and-desist proceedings dated October 16, 2013, did a great job outlining the incident, there is very little other public information with which to analyze it comprehensively. Therefore most post-incident analyses of the Knight Capital incident, including this one, should be considered counterfactual reasoning. However, for a DevOps Automated Governance discussion, we can highlight some of the likely contributors to the August 1, 2012, incident, if only as a learning exercise.
Probably the most glaring observation is that Knight didn't seem to have any evidence of how they deployed their software. One could further surmise from the cease-and-desist proceedings that they were not automating their deployments. Again, this might be a counterfactual; however, the cease-and-desist proceedings strongly suggest that the deployment was a manual process.
"Knight's technicians did not copy the new code to one of the eight SMARS computer servers. "
If Knight had had a DevOps Automated Governance system, it might have answered the cease-and-desist with immutable, digitally signed evidence of what happened instead of ad hoc post-investigation reviews. For example, had the deployment been automated with a tool like Chef, Puppet, or Ansible, Knight could have created immutable evidence for the deployment. Automation would also have made it less likely that the eighth server was missed. And if, for some unlikely "black swan" reason, the automation service had failed while Knight was following stricter compliance practices, Knight would still have had a much stronger explanation of why the eighth server was missed.
Along the same lines, Knight had no evidence of whether it was following the Separation of Duties (SoD) principle. DevOps Automated Governance would, at a minimum, have shown evidence of its awareness of and intent to follow SoD. A common practice in a DevOps Automated Governance system is to evaluate all of the collected evidence against defined controls and to gate non-compliant activity, recording an immutable attestation of each decision. With control gates, not only would Knight have had immutable evidence, but the non-compliant activity would have been flagged, and the deployment might have been stopped.
Almost all of the well-known compliance frameworks, including but not limited to GDPR, HIPAA, NIST, PCI DSS, and SOX, require some form of evidence of testing and review procedures. The SEC cease-and-desist proceedings observed multiple times that changes to the old, repurposed Power Peg code and to the new SMARS software were not tested and reviewed.
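The evidence and control-gate ideas above can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any particular governance product: the control names, the SIGNING_KEY, and the functions append_evidence, verify_log, and gate are hypothetical. It also uses an HMAC over a chained log, which makes tampering detectable but is simpler than the asymmetric digital signatures a production system would likely use.

```python
import hashlib
import hmac
import json

# Hypothetical key for the demo; a real system would use managed signing keys.
SIGNING_KEY = b"demo-key-not-for-production"

# Hypothetical controls a deployment must satisfy before the gate opens.
REQUIRED_CONTROLS = ("peer_review", "tests_passed", "all_servers_updated")


def sign(payload: dict, prev_signature: str) -> str:
    """Sign a payload together with the previous signature, chaining the entries."""
    body = json.dumps(payload, sort_keys=True).encode() + prev_signature.encode()
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def append_evidence(log: list[dict], payload: dict) -> None:
    """Append a signed evidence entry linked to the entry before it."""
    prev = log[-1]["signature"] if log else ""
    log.append({"payload": payload, "signature": sign(payload, prev)})


def verify_log(log: list[dict]) -> bool:
    """Recompute every signature; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in log:
        if not hmac.compare_digest(entry["signature"], sign(entry["payload"], prev)):
            return False
        prev = entry["signature"]
    return True


def gate(log: list[dict]) -> list[str]:
    """Return the controls lacking passing evidence; an empty list opens the gate."""
    if not verify_log(log):
        return ["evidence_integrity"]
    passed = {e["payload"]["control"] for e in log if e["payload"]["passed"]}
    return [c for c in REQUIRED_CONTROLS if c not in passed]


log: list[dict] = []
append_evidence(log, {"control": "tests_passed", "passed": True})
append_evidence(log, {"control": "peer_review", "passed": False})
append_evidence(log, {"control": "all_servers_updated", "passed": False})

print(gate(log))  # ['peer_review', 'all_servers_updated']
```

In this sketch a deployment with no peer review and an incomplete rollout is blocked, and the log that explains why is itself verifiable afterward. Editing any recorded entry after the fact causes verify_log to fail, so the gate reports an integrity problem instead of silently accepting the change.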
Summary
In the end, one can observe that Knight had a poorly developed risk management strategy backed by poor compliance evidence. Its lack of automation across some of its product offerings severely limited its ability to respond and communicate effectively. Finally, it did not seem to have any evidence of compliance testing or reviews, even though prevailing industry best practices required this type of control. Had Knight had DevOps Automated Governance and risk management in place, it likely would have been able to communicate the facts quickly and effectively to regulators and customers alike, mitigating damage and adverse impact.