The recent CrowdStrike update, which caused widespread disruption around the world, highlights the need for all software vendors to ensure that their updates to customers are robust and fit for purpose.
This applies to all software vendors, not just those delivering cybersecurity products. However, cybersecurity software typically runs some of its components with higher privileges, because it needs low-level access to detect and remediate threats. This means even more care is needed, as a mistake can cause a bluescreen rather than a simple application crash. This is critically important when you release several updates every day.
For over 30 years, WithSecure has been delivering updates to our customers by a variety of methods, ranging from floppy discs sent through the postal system all the way to today’s fully automated systems. In those early days, when getting an update to customers could take a week or more, we quickly learned that there is no room for poor quality in updates.
Is CrowdStrike’s Error a Wake-Up Call?
When a vendor releases an update to its software, it is vital that customers can trust that update. After all, any update can have a negative impact on their business, putting their own reputation with their customer base at risk. If customers lose faith in a vendor, they will find another supplier.
What does WithSecure do to ensure updates are fit for purpose?
- Update types
Updates that change the behaviour of WithSecure Elements agents on an endpoint fall into two main categories:
- Feature updates introduce new features to the endpoint, or address discovered software defects
- Continuous intelligence updates provide the features with up-to-date instructions for recognizing both safe and suspicious activity on the endpoint, as well as actions to improve the endpoint’s overall security
Each of these update types follows its own release process, balancing release (and rollback) lead time against verification of correct behaviour on existing endpoints.
- Endpoint segmentation
To keep the software test environment separate from internal users and customers, and to support release staggering, many updates are released in a staggered fashion: the update is first made available only to a specific subset of endpoints while we observe for potential defects.
All Elements installations exist in one of four staggering groups, or release stages:
- CI (Continuous Integration) installations comprise the endpoints running in WithSecure’s own test environments for the sole purpose of testing software before its release. These include WithSecure’s CI environments for testing functionality and the ability to detect and protect, as well as single-endpoint installations deployed for specialized testing activities during feature development. CI installations are never used outside WithSecure’s own test environments.
Apart from the verification of correct functionality, test activities at WithSecure happen in CI environments. Successfully passing CI testing is a mandatory part of all endpoint software development activities.
- Staging installations are real-life endpoints controlled and used by WithSecure personnel, both to verify and to demonstrate new functionality that has already been tested in CI installations. Although a Staging installation provides a level of real-life protection similar to that of a Production endpoint, software deployed to Staging installations is not yet considered sufficiently validated for customer environments.
- Early Access installations (sometimes still referred to as Pilot) are those endpoints specifically selected by the customer to initially receive new updates. All of WithSecure’s own endpoints are part of Early Access. The level of endpoint protection of Early Access installations is the same as that of Production.
- Production installations consist of all customer endpoints that are not selected for Early Access.
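To make the four groups concrete, here is a minimal Python sketch of how an endpoint could be mapped to its release stage using the rules in the list above. The `Endpoint` fields and the `classify` function are hypothetical, not WithSecure’s data model. We also assume that dedicated Staging devices are checked before the general rule that WithSecure’s own endpoints belong to Early Access.

```python
from dataclasses import dataclass
from enum import IntEnum


class ReleaseStage(IntEnum):
    """Release stages, ordered as an update would reach them (hypothetical encoding)."""
    CI = 0
    STAGING = 1
    EARLY_ACCESS = 2
    PRODUCTION = 3


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical inventory attributes, for illustration only.
    in_test_environment: bool      # exists only inside the vendor's test environments
    vendor_staging_device: bool    # real-life Staging device used by vendor personnel
    vendor_owned: bool             # any other vendor-owned endpoint
    customer_selected_early: bool  # customer opted this endpoint into Early Access


def classify(ep: Endpoint) -> ReleaseStage:
    """Map an endpoint to a release stage following the rules listed above."""
    if ep.in_test_environment:
        return ReleaseStage.CI
    if ep.vendor_staging_device:
        return ReleaseStage.STAGING
    # The vendor's own endpoints and customer-selected endpoints are Early Access.
    if ep.vendor_owned or ep.customer_selected_early:
        return ReleaseStage.EARLY_ACCESS
    # Every remaining customer endpoint is Production.
    return ReleaseStage.PRODUCTION
```

Using an `IntEnum` keeps the stages comparable, so "has this update reached stage X yet?" becomes a simple `<=` check.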
While some update types are released to these groups in strict sequential order (CI, Staging, Early Access, Production), with mandatory human reviews in between to catch unintended defects, others are released to all groups almost simultaneously and automatically, to minimize customers’ exposure to new threats. The review process includes evaluating telemetry produced by the endpoint agents, and at this stage we also make sure that key personnel are available. For example, we do not release significant updates just before the weekend, and we try to avoid releasing them during main vacation periods.
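The gated, sequential path described above can be sketched as a promotion check. This is an illustrative sketch only: the stage list mirrors the text, but the function names, the single telemetry flag, and the rule that Friday through Sunday count as "just before the weekend" are our own assumptions, not WithSecure’s tooling. Update types that are released automatically to all groups would bypass this gate entirely.

```python
from __future__ import annotations

from datetime import date

STAGES = ["CI", "Staging", "Early Access", "Production"]


def is_release_window(day: date, vacation_periods: list[tuple[date, date]]) -> bool:
    """Assumed policy: no significant releases Friday to Sunday or during vacation periods."""
    if day.weekday() >= 4:  # Monday == 0, so 4..6 are Friday..Sunday
        return False
    return not any(start <= day <= end for start, end in vacation_periods)


def next_stage(current: str, review_approved: bool, telemetry_healthy: bool,
               day: date, vacation_periods: list[tuple[date, date]]) -> str | None:
    """Return the stage a gated update may be promoted to, or None to hold it."""
    idx = STAGES.index(current)
    if idx == len(STAGES) - 1:
        return None  # already released to every group
    # Promotion needs a human sign-off, healthy telemetry from the current stage,
    # and a day on which key personnel are expected to be available.
    if review_approved and telemetry_healthy and is_release_window(day, vacation_periods):
        return STAGES[idx + 1]
    return None
```

For example, an update approved on a Wednesday moves from "CI" to "Staging", while the same update approved on a Friday is held until the following week.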
Continuous Intelligence updates are released to CI, Staging and Production environments. Before release to any environment, each update is tested to ensure that it functions correctly. Production protection updates are also checked for known false positives. Early Access endpoints receive the same Continuous Intelligence updates as Production.
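As a rough illustration of a known-false-positive check, the sketch below models an intelligence update as a set of SHA-256 hashes to flag, and rejects the update if any file in a corpus of known-clean samples would be detected. Real detection content is far richer than a hash list; the `IntelligenceUpdate` format and all function names here are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IntelligenceUpdate:
    # Hypothetical format: SHA-256 hex digests of files to flag as malicious.
    version: str
    malicious_sha256: frozenset[str]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def false_positives(update: IntelligenceUpdate, clean_corpus: dict[str, bytes]) -> list[str]:
    """Return the names of known-clean samples the update would wrongly flag."""
    return sorted(
        name for name, content in clean_corpus.items()
        if sha256_hex(content) in update.malicious_sha256
    )


def approve_for_production(update: IntelligenceUpdate, clean_corpus: dict[str, bytes]) -> bool:
    """Block the release if any known-clean file would be detected."""
    return not false_positives(update, clean_corpus)
```

Returning the offending sample names, rather than only a boolean, gives reviewers something concrete to investigate when a release is blocked.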
- Architectural Considerations
We have made a deliberate decision to keep the kernel footprint of our software to a minimum. This makes software defects easier to recover from, and also improves platform compatibility. As a consequence, protection updates are never read directly by kernel components.
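To illustrate why keeping update parsing out of the kernel helps recovery, here is a minimal sketch of a user-mode loader that validates an update and falls back to the last known good rule set when validation fails. The JSON format, the `rules` key, and the function names are hypothetical; WithSecure’s actual update format is not described in this post.

```python
import json


class UpdateRejected(Exception):
    """Raised when an update fails validation in user mode."""


def parse_update(raw: bytes) -> dict:
    """Parse and validate an update entirely in user mode (hypothetical JSON format)."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise UpdateRejected(f"malformed update: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("rules"), list):
        raise UpdateRejected("missing or invalid 'rules' list")
    return data


def activate(raw: bytes, last_known_good: dict) -> dict:
    """Return the rule set to use: the new one if valid, else the last known good one."""
    try:
        return parse_update(raw)
    except UpdateRejected:
        # Under the assumed design only this user-mode code parses updates, so a bad
        # update is rejected here instead of crashing a privileged component.
        return last_known_good
```

The worst outcome of a malformed update in this design is that protection keeps running on the previous rule set, rather than the machine failing to boot.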
- WithSecure Release Process and Stages
* An Early Access version exists for all Managed Detection and Response clients and for Elements solutions on Windows and Mac operating systems.
The RCB (Release Control Board) is a series of meetings in which the development team discusses a specific feature update with stakeholders and decides whether to release it. RCB meetings include the development team, product owners, quality leads, and senior developers from other teams. For releases assessed as higher risk than normal, customer support and sales representatives also attend. Topics to review: test results, business case, dependencies, overall risk, customer feedback, release schedule, and the release decision. The process has been evaluated and improved in the wake of earlier incidents.
One important point to note is that the WithSecure release process ensures that no single person can release code to our customers alone. There are checks and balances throughout the process to prevent this from happening.
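The RCB requirements above can be expressed as a simple checklist: every listed topic must be reviewed, the required roles must attend (with customer support and sales added for higher-risk releases), and the board must record a positive decision. The sketch below is hypothetical; the role labels, topic strings, and `RcbMeeting` structure are ours, not an actual WithSecure tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REVIEW_TOPICS = {"test results", "business case", "dependencies",
                 "overall risk", "customer feedback", "release schedule"}
BASE_ROLES = {"development team", "product owner", "quality lead",
              "senior developer (other team)"}
HIGH_RISK_ROLES = {"customer support", "sales"}


@dataclass
class RcbMeeting:
    higher_risk: bool
    attendee_roles: set[str] = field(default_factory=set)
    topics_reviewed: set[str] = field(default_factory=set)
    approved: bool = False  # the board's recorded release decision


def missing_items(m: RcbMeeting) -> set[str]:
    """Return required roles that did not attend plus topics not yet reviewed."""
    required_roles = BASE_ROLES | (HIGH_RISK_ROLES if m.higher_risk else set())
    return (required_roles - m.attendee_roles) | (REVIEW_TOPICS - m.topics_reviewed)


def may_release(m: RcbMeeting) -> bool:
    """A release goes ahead only with a positive decision and nothing missing."""
    return m.approved and not missing_items(m)
```

Reporting what is missing, rather than just refusing, makes it clear what a blocked release still needs before it can be reconsidered.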
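The "no single person can release alone" rule is a form of the two-person (four-eyes) principle. A minimal sketch, assuming a release request records its author and a set of approvers, is to count only approvals from people other than the author. The `ReleaseRequest` type and `can_release` function are hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRequest:
    author: str
    approvers: frozenset[str]


def can_release(req: ReleaseRequest, min_independent_approvers: int = 1) -> bool:
    """Allow a release only with enough approvals from people other than the author."""
    # The author's own approval is discarded, so nobody can approve their own release.
    independent = req.approvers - {req.author}
    return len(independent) >= min_independent_approvers
```

Raising `min_independent_approvers` is one way such a check could demand extra sign-off for higher-risk releases.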
In Summary
No testing process can ever cover 100% of possible cases. There are tens of millions of applications available for Microsoft Windows alone, and it would be impossible for us to test against them all.
These applications range from “very simple, relatively low risk of compatibility problems” such as little desktop utilities, all the way through to “complex, potentially high risk” applications which may introduce components such as kernel drivers.
In addition, some of these applications may be proprietary, and unavailable to WithSecure in any form.
- We will never claim that our processes and practices are foolproof. We have made mistakes in releases over our long history, but we have learned from them. We continuously look for ways to improve our processes, and as a result we have not released a significantly faulty, wide-ranging update since 2009.
- We have, however, had releases that were incompatible with third-party applications in ways we simply could not catch internally before release. The most recent case was in 2022, when one of our releases had a bug triggered by a third-party driver that we did not have access to but some of our customers did. Only a small number of customers were affected; we fixed the issue in the very next release and updated our processes accordingly.
- While WithSecure endeavours to catch as many of these issues as possible through testing, it is unfortunately inevitable that some will not be caught. We do, however, aim to minimize the severity of any issues, and we can and do revise our processes and practices in response to the issues we find.
A good Change Management process is essential in providing control and quality in development tasks. Quality control is a mandatory step in all aspects of our software development here at WithSecure, and we are always looking for ways to make our own processes better. We’re really good already, but there’s no room for complacency.
Our company vision states “We envision the future where no one should experience a serious loss or be put out of business because of cyber attack or crime. At least no one who puts their trust in us.” Part of this is ensuring that those who put their trust in us have a service they can rely on, and a robust solution with quality tested updates is a very important part of that.