May 23, 2022

Over 35,000 Java packages, more than 8% of the Maven Central repository (the largest Java package repository), have been affected by the recently disclosed log4j vulnerabilities, with widespread ramifications across the software industry. The flaws let an attacker execute remote code by abusing the vulnerable JNDI lookup feature of the log4j logging library. This feature was enabled by default in many versions of the library.

Because of its severity and global impact, this vulnerability has commanded the attention of the information security community since its disclosure on December 9th. Log4j is a popular logging library used by thousands of software packages (called artifacts in the Java ecosystem) and projects across the software industry. Users' limited visibility into their dependencies, and into their transitive dependencies, has made remediation difficult; it has also made it hard to determine the full blast radius of this vulnerability. To gauge the scope of the issue in the open-source ecosystem of JVM-based languages, and to track ongoing efforts to reduce the number of affected packages, Google surveyed all versions of all artifacts in the Maven Central Repository using Open Source Insights, a project that helps developers understand open source dependencies.

How widespread is the log4j vulnerability?

Google found that, as of December 16, 2021, 35,863 of the available Java artifacts in Maven Central depend on the affected log4j code. This means that more than 8% of all packages on Maven Central have at least one version affected by this vulnerability. (These figures do not cover all Java packages, such as directly distributed binaries, but Maven Central is a good proxy for the health of the ecosystem.)

In terms of ecosystem impact, 8% is enormous. The average ecosystem impact of security advisories affecting Maven Central is 2%, and the median is less than 0.1%.

Around 7,000 of the affected artifacts are direct dependents: at least one of their versions depends directly on an affected version of log4j-core or log4j-api, as described in the CVEs. The majority of affected artifacts are reached through indirect dependencies (dependencies of dependencies): log4j is not explicitly listed as a dependency of the artifact but is pulled in transitively.
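The direct/indirect split can be illustrated with a minimal sketch, assuming a single resolved version per artifact (a real survey works per version). It walks a hypothetical dependency graph backwards from the vulnerable log4j packages and records the shortest path length to each consumer: depth 1 marks a direct dependent, anything deeper is reached transitively. All artifact names and the `affected_with_depth` helper are illustrative, not part of Open Source Insights.

```python
from collections import deque

# Hypothetical dependency graph: artifact -> artifacts it depends on.
# Names are illustrative, not real Maven coordinates.
DEPENDS_ON = {
    "app-a": ["lib-x"],
    "app-b": ["lib-y", "log4j-core"],
    "lib-x": ["lib-y"],
    "lib-y": ["log4j-api"],
    "lib-z": ["commons-util"],
    "commons-util": [],
    "log4j-core": ["log4j-api"],
    "log4j-api": [],
}
VULNERABLE = {"log4j-core", "log4j-api"}


def affected_with_depth(depends_on, vulnerable):
    """Return {artifact: shortest dependency-path length to a vulnerable package}."""
    # Invert the edges so we can walk outward from log4j to its consumers.
    dependents = {}
    for artifact, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(artifact)

    # Multi-source breadth-first search: every vulnerable package starts at depth 0.
    depth = {}
    seen = set(vulnerable)
    queue = deque((v, 0) for v in vulnerable)
    while queue:
        node, d = queue.popleft()
        for consumer in dependents.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                depth[consumer] = d + 1
                queue.append((consumer, d + 1))
    return depth


depth = affected_with_depth(DEPENDS_ON, VULNERABLE)
print(sorted(a for a, d in depth.items() if d == 1))  # ['app-b', 'lib-y'] (direct)
print(sorted(a for a, d in depth.items() if d > 1))   # ['app-a', 'lib-x'] (indirect)
```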

What is the current status of the open-source JVM ecosystem?

Google counted an artifact as fixed if it had at least one affected version and had since published a greater stable version (according to semantic versioning) that is unaffected. An artifact affected by log4j is considered fixed if it has upgraded to log4j 2.16.0 or has dropped its dependency on log4j entirely.
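The "fixed" criterion can be expressed as a small predicate. This is a minimal sketch under stated assumptions: versions are strict `MAJOR.MINOR.PATCH[-PRERELEASE]` strings (many real Maven versions are not), "greater" is taken to mean greater than every affected version, a pre-release sorts below the release with the same numbers, and pre-releases never count as stable. The version data and the `is_fixed` helper are hypothetical.

```python
import re

# Strict semantic versions only: MAJOR.MINOR.PATCH with an optional -PRERELEASE.
# Build metadata (+...) and non-semver Maven versions are deliberately rejected.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def version_key(version):
    """Return a sortable key; a stable release sorts above its own pre-releases."""
    m = SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, prerelease = m.groups()
    is_stable = prerelease is None
    # Simplification: pre-releases of the same MAJOR.MINOR.PATCH compare equal.
    return (int(major), int(minor), int(patch), is_stable)


def is_fixed(versions):
    """versions maps version string -> True if that version is affected.

    An artifact counts as fixed when it has at least one affected version and
    a stable, unaffected version greater than all of its affected versions.
    """
    affected = [version_key(v) for v, is_affected in versions.items() if is_affected]
    if not affected:
        return False  # never affected, so there is nothing to fix
    newest_affected = max(affected)
    return any(
        not is_affected
        and version_key(v)[3]  # stable releases only
        and version_key(v) > newest_affected
        for v, is_affected in versions.items()
    )


# Hypothetical release histories for three artifacts.
print(is_fixed({"1.0.0": True, "1.1.0": True, "1.2.0": False}))  # True
print(is_fixed({"1.0.0": True, "1.1.0-rc1": False}))             # False: only a pre-release
print(is_fixed({"1.0.0": False, "1.1.0": True}))                 # False: clean version is older
```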

At the time of writing, nearly 5,000 of the affected artifacts have been fixed. This reflects a rapid response and a massive effort by both the log4j maintainers and the wider community of open-source consumers.

More than 30,000 artifacts remain affected. Many of them depend transitively on another artifact that must ship a fix first, so they are likely to stay blocked until it does.

Why is fixing the JVM ecosystem hard?

Most artifacts that depend on log4j do so indirectly. The deeper the vulnerability sits in a dependency chain, the more steps it takes to fix it. The graphic below shows a histogram of how deep an affected log4j package (core or api) first appears in consumers' dependency graphs. For more than 80% of the packages, the vulnerability is more than one level deep, with a majority affected five levels down (and some as many as nine levels down). These packages will require fixes throughout the tree, starting with the deepest dependencies.
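Why depth translates into elapsed time can be sketched as a count of release rounds. Assuming (1) the log4j patch is available in round 0, (2) each maintainer ships a fix one round after every affected direct dependency has shipped one, (3) nobody overrides transitive versions locally, and (4) the graph is acyclic, an artifact's earliest fix round equals the length of its longest path to log4j, which can exceed the depth at which log4j first appears. The graph and the `release_rounds` helper are hypothetical.

```python
# Hypothetical graph: artifact -> direct dependencies (names are illustrative).
GRAPH = {
    "service": ["framework", "client"],
    "framework": ["client"],
    "client": ["logging-bridge"],
    "logging-bridge": ["log4j-core"],
    "metrics": [],
    "log4j-core": [],
}


def release_rounds(depends_on, patched):
    """Earliest round in which each artifact can ship a fixed release.

    Assumes an acyclic graph, that `patched` is fixed in round 0, and that a
    maintainer releases one round after all affected direct dependencies have.
    Artifacts that never reach `patched` map to None (not affected).
    """
    memo = {}

    def visit(node):
        if node == patched:
            return 0
        if node not in memo:
            dep_rounds = [visit(dep) for dep in depends_on.get(node, [])]
            affected = [r for r in dep_rounds if r is not None]
            memo[node] = 1 + max(affected) if affected else None
        return memo[node]

    return {name: visit(name) for name in depends_on if name != patched}


print(release_rounds(GRAPH, "log4j-core"))
# {'service': 4, 'framework': 3, 'client': 2, 'logging-bridge': 1, 'metrics': None}
```

Here `service` first sees log4j three levels down (through `client`), yet under these assumptions it cannot release a fix until round 4, because it also waits on `framework`.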

Another challenge stems from ecosystem-level choices in the dependency resolution algorithm and in the conventions for specifying version requirements.

In the Java ecosystem it is standard practice to specify “soft” version requirements: exact versions that the resolution algorithm uses only if no other version of the same package appears earlier in the dependency graph. Propagating a fix therefore often requires deliberate action from maintainers to update their dependency requirements to a patched version.
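The "soft" requirement behavior can be illustrated with a simplified model of Maven's dependency mediation, in which the declaration nearest to the root wins and ties at the same depth go to the first declaration. This sketch ignores scopes, exclusions, `dependencyManagement`, and version ranges; the POM data, artifact names, and the `resolve_nearest_wins` helper are hypothetical.

```python
from collections import deque

# Hypothetical POM data: (artifact, version) -> declared dependencies, in order.
POMS = {
    ("my-app", "1.0.0"): [("web-framework", "3.2.0")],
    ("my-app", "1.0.1"): [("web-framework", "3.2.0"), ("log4j-core", "2.16.0")],
    ("web-framework", "3.2.0"): [("log4j-core", "2.14.1")],
    ("log4j-core", "2.14.1"): [],
    ("log4j-core", "2.16.0"): [],
}


def resolve_nearest_wins(root, poms):
    """Choose one version per artifact: breadth-first, so the declaration closest
    to the root wins, and at equal depth the earlier declaration wins."""
    chosen = {root[0]: root[1]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for name, version in poms.get(node, []):
            if name in chosen:
                continue  # already decided by a nearer or earlier declaration
            chosen[name] = version
            queue.append((name, version))
    return chosen


# Without action from my-app's maintainer, the framework's pinned log4j wins.
print(resolve_nearest_wins(("my-app", "1.0.0"), POMS)["log4j-core"])  # 2.14.1
# Declaring the patched version directly makes it the nearest definition.
print(resolve_nearest_wins(("my-app", "1.0.1"), POMS)["log4j-core"])  # 2.16.0
```

In this model the fix reaches `my-app` only when someone edits a POM: either `web-framework` releases a version that requires 2.16.0, or `my-app` declares it directly.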

This contrasts with other ecosystems, such as npm, where developers commonly declare open ranges for their dependency requirements. Open ranges allow the resolution algorithm to select the most recently published version that satisfies the requirement, so new fixes are picked up automatically. Once a patch is available, consumers can get the patched version on their next build, and the fix propagates quickly up through the dependency graph.
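For contrast, here is a simplified open-range resolver in the spirit of npm's caret ranges: `^X.Y.Z` (with X ≥ 1) accepts any version at or above X.Y.Z and below the next major version, and the resolver takes the highest match. Pre-release tags and the special rules for 0.x versions are not modeled; the package versions and helper names are hypothetical.

```python
def parse(version):
    """'1.4.2' -> (1, 4, 2); plain MAJOR.MINOR.PATCH only."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(version, base):
    """Simplified npm-style '^base' for base major >= 1: >= base, same major."""
    v, b = parse(version), parse(base)
    if b[0] == 0:
        raise ValueError("0.x caret ranges follow different rules; not modeled")
    return v >= b and v[0] == b[0]


def resolve(requirement, published):
    """'^X.Y.Z' picks the highest matching published version; 'X.Y.Z' is an exact pin."""
    if requirement.startswith("^"):
        matches = [v for v in published if satisfies_caret(v, requirement[1:])]
        return max(matches, key=parse) if matches else None
    return requirement if requirement in published else None


before_patch = ["1.4.0", "1.4.1", "2.0.0"]
after_patch = before_patch + ["1.4.2"]  # hypothetical patched release

print(resolve("^1.4.0", before_patch))  # 1.4.1
print(resolve("^1.4.0", after_patch))   # 1.4.2 (picked up on the next build)
print(resolve("1.4.1", after_patch))    # 1.4.1 (exact pin stays put)
```

The trade-off is that an open range can change what a build pulls in without any edit to the project, which can also introduce breakage.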

How long will it take to patch this issue throughout the whole ecosystem?

It’s difficult to say. Google looked at all publicly disclosed critical advisories affecting Maven packages to see how quickly other vulnerabilities were fully resolved. Less than half (48%) of the artifacts affected by a vulnerability have been fixed, suggesting a long wait, possibly years.

However, things look more promising on the log4j front. In less than a week, 4,620 affected artifacts (roughly 13%) had been fixed. This statistic, more than any other, shows the enormous effort put in by open source maintainers, information security professionals, and consumers around the world.

What should be the next focus?

Thank you and congratulations to the open-source maintainers and log4j users who have already upgraded. As part of this analysis, Google compiled a list of 500 affected packages with some of the highest transitive usage. Prioritizing these packages, as a maintainer or as a user helping with the patching effort, can maximize your impact and unblock more of the community.
