In 2010, Steffen Rendle, currently a senior research scientist at Google, introduced a seminal paper in the world of machine learning. In this work, Rendle described a concept known as a factorization machine. Many years have passed since such an impactful algorithm has been introduced in the world of machine learning.

Deep learning has become a popular topic, but the idea behind it has been understood for years. However, only now do we have the computational power to execute its more theoretical aspects. In this post, I will examine something brand new in this sphere.

About Factorization and Support Vector Machines

Factorization machines can be compared to support vector machines (SVMs) with a polynomial kernel, according to Rendle and others. This algorithm has been well-studied and evaluated. The interesting thing behind support vector machines is that they are still somewhat of a mystery. They are often verified empirically rather than against a theoretical baseline or limit. Even so, they are widely used as general predictors. SVMs define a multidimensional hyperplane, which learns the shape of the curve of the data.

However, SVMs have known weaknesses, some of which are addressed by Rendle’s factorization machines. For example, they require that the output model contain support vectors, which are training examples that help the algorithm define the shape of the hyperplane as well as other parameters such as the margin for prediction.

SVMs function best on dense data — that is, data with few to no missing values and data whose plotted points fall near each other. Finally, in SVMs, the input variables are still independent variables even though the polynomial kernel attempts to model the interaction among the variables. This is computed in polynomial time based upon the number of dimensions in the input data as well as the order of the polynomial (e.g., quadratic, cubic or quartic).

Working Around Weaknesses

Factorization machines were designed to address these weaknesses. Firstly, no training examples are required in the model parameters, making the models much more compact. Factorization machines perform extremely well on sparse data, including data of very high sparsity. As a trade-off, they do not perform well on dense data, so other algorithms are more suited to this class of data.

Finally, and perhaps most amazing of all, factorization machines can model n-way variable interactions, where n is the number of polynomial order. Note that in all current implementations, the order has been fixed at two. While the equations generalize to higher orders, there are some issues such as the numerical stability of the optimization methods that have prevented the generalization of n.

Rendle ingeniously proposed a method of reducing the polynomial computation time to linear complexity. It is this capability of factorization machines that makes them extremely attractive to researchers.

Additionally, Rendle demonstrates that, through proper feature engineering, the machines can mimic the best specialized factorization models developed for very specific situations. This allows them to be applied in a multitude of situations where a specific form of learning algorithm and predictor are required.

Additional Methods

In his original paper, Rendle discussed a method of optimization for the model parameters known as stochastic gradient descent, which works well with several loss functions. However, the optimization algorithm is extremely dependent on the learning rate, one of the hyperparameters of the method. If the learning rate is too high, the model parameters will not converge, while if it is too low, the algorithm is no longer time-efficient.

Because of this, Rendle reviewed three more methods of optimization in “Factorization Machines With LibFM,” known as alternating least-squares, marcov chain monte carlo inference and adaptive stochastic gradient descent. He recommends marcov chain monte carlo inference because there are fewer hyperparameters, and those that must be specified are not as sensitive to their initial values.

Factorization machines have been widely applied in the field of collaborative recommendation systems, where their sparse predictive power and ability to mimic state-of-the-art, specific factorization methods make them generalizable to several tasks. Not only are they used in recommending movies and music, for example, but researchers have even applied them to use social media to predict the stock market with surprisingly positive results.

Factorization and Data Science

As I discussed above, factorization machines can function as general predictors akin to support vector machines in high-dimensional sparse data. While they are not commonly used outside of recommendation systems at present, they have the potential to become classifiers similar to SVMs.

There are several implementations of factorization machines. The state-of-the-art continues to be libFM, created by Rendle himself. However, there are other interesting implementations such as fastFM. There’s even a version of libFM designed for the Spark framework, known as spark-libFM, that allows factorization machines to be parallelized.

Factorization machines are just now being studied rigorously, but they hold great promise in the world of sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. I suspect that soon, as these machines are studied in the realm of general prediction, they will become a critical tool in the data scientist’s toolbox.

More from X-Force

“Authorized” to break in: Adversaries use valid credentials to compromise cloud environments

4 min read - Overprivileged plaintext credentials left on display in 33% of X-Force adversary simulations Adversaries are constantly seeking to improve their productivity margins, but new data from IBM X-Force suggests they aren’t exclusively leaning on sophistication to do so. Simple yet reliable tactics that offer ease of use and often direct access to privileged environments are still heavily relied upon. Today X-Force released the 2023 Cloud Threat Landscape Report, detailing common trends and top threats observed against cloud environments over the past…

Email campaigns leverage updated DBatLoader to deliver RATs, stealers

11 min read - IBM X-Force has identified new capabilities in DBatLoader malware samples delivered in recent email campaigns, signaling a heightened risk of infection from commodity malware families associated with DBatLoader activity. X-Force has observed nearly two dozen email campaigns since late June leveraging the updated DBatLoader loader to deliver payloads such as Remcos, Warzone, Formbook, and AgentTesla. DBatLoader malware has been used since 2020 by cybercriminals to install commodity malware remote access Trojans (RATs) and infostealers, primarily via malicious spam (malspam). DBatLoader…

New Hive0117 phishing campaign imitates conscription summons to deliver DarkWatchman malware

8 min read - IBM X-Force uncovered a new phishing campaign likely conducted by Hive0117 delivering the fileless malware DarkWatchman, directed at individuals associated with major energy, finance, transport, and software security industries based in Russia, Kazakhstan, Latvia, and Estonia. DarkWatchman malware is capable of keylogging, collecting system information, and deploying secondary payloads. Imitating official correspondence from the Russian government in phishing emails aligns with previous Hive0117 campaigns delivering DarkWatchman malware, and shows a possible significant effort to induce a sense of urgency as…

X-Force releases detection & response framework for managed file transfer software

5 min read - How AI can help defenders scale detection guidance for enterprise software tools If we look back at mass exploitation events that shook the security industry like Log4j, Atlassian, and Microsoft Exchange when these solutions were actively being exploited by attackers, the exploits may have been associated with a different CVE, but the detection and response guidance being released by the various security vendors had many similarities (e.g., Log4shell vs. Log4j2 vs. MOVEit vs. Spring4Shell vs. Microsoft Exchange vs. ProxyShell vs.…