November 5, 2015 By Brad Harris 3 min read

In 2010, Steffen Rendle, currently a senior research scientist at Google, introduced a seminal paper in the world of machine learning. In this work, Rendle described a concept known as a factorization machine. Many years have passed since such an impactful algorithm has been introduced in the world of machine learning.

Deep learning has become a popular topic, but the idea behind it has been understood for years. However, only now do we have the computational power to execute its more theoretical aspects. In this post, I will examine something brand new in this sphere.

About Factorization and Support Vector Machines

Factorization machines can be compared to support vector machines (SVMs) with a polynomial kernel, according to Rendle and others. This algorithm has been well-studied and evaluated. The interesting thing behind support vector machines is that they are still somewhat of a mystery. They are often verified empirically rather than against a theoretical baseline or limit. Even so, they are widely used as general predictors. SVMs define a multidimensional hyperplane, which learns the shape of the curve of the data.

However, SVMs have known weaknesses, some of which are addressed by Rendle’s factorization machines. For example, they require that the output model contain support vectors, which are training examples that help the algorithm define the shape of the hyperplane as well as other parameters such as the margin for prediction.

SVMs function best on dense data — that is, data with few to no missing values and data whose plotted points fall near each other. Finally, in SVMs, the input variables are still independent variables even though the polynomial kernel attempts to model the interaction among the variables. This is computed in polynomial time based upon the number of dimensions in the input data as well as the order of the polynomial (e.g., quadratic, cubic or quartic).

Working Around Weaknesses

Factorization machines were designed to address these weaknesses. Firstly, no training examples are required in the model parameters, making the models much more compact. Factorization machines perform extremely well on sparse data, including data of very high sparsity. As a trade-off, they do not perform well on dense data, so other algorithms are more suited to this class of data.

Finally, and perhaps most amazing of all, factorization machines can model n-way variable interactions, where n is the number of polynomial order. Note that in all current implementations, the order has been fixed at two. While the equations generalize to higher orders, there are some issues such as the numerical stability of the optimization methods that have prevented the generalization of n.

Rendle ingeniously proposed a method of reducing the polynomial computation time to linear complexity. It is this capability of factorization machines that makes them extremely attractive to researchers.

Additionally, Rendle demonstrates that, through proper feature engineering, the machines can mimic the best specialized factorization models developed for very specific situations. This allows them to be applied in a multitude of situations where a specific form of learning algorithm and predictor are required.

Additional Methods

In his original paper, Rendle discussed a method of optimization for the model parameters known as stochastic gradient descent, which works well with several loss functions. However, the optimization algorithm is extremely dependent on the learning rate, one of the hyperparameters of the method. If the learning rate is too high, the model parameters will not converge, while if it is too low, the algorithm is no longer time-efficient.

Because of this, Rendle reviewed three more methods of optimization in “Factorization Machines With LibFM,” known as alternating least-squares, marcov chain monte carlo inference and adaptive stochastic gradient descent. He recommends marcov chain monte carlo inference because there are fewer hyperparameters, and those that must be specified are not as sensitive to their initial values.

Factorization machines have been widely applied in the field of collaborative recommendation systems, where their sparse predictive power and ability to mimic state-of-the-art, specific factorization methods make them generalizable to several tasks. Not only are they used in recommending movies and music, for example, but researchers have even applied them to use social media to predict the stock market with surprisingly positive results.

Factorization and Data Science

As I discussed above, factorization machines can function as general predictors akin to support vector machines in high-dimensional sparse data. While they are not commonly used outside of recommendation systems at present, they have the potential to become classifiers similar to SVMs.

There are several implementations of factorization machines. The state-of-the-art continues to be libFM, created by Rendle himself. However, there are other interesting implementations such as fastFM. There’s even a version of libFM designed for the Spark framework, known as spark-libFM, that allows factorization machines to be parallelized.

Factorization machines are just now being studied rigorously, but they hold great promise in the world of sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. I suspect that soon, as these machines are studied in the realm of general prediction, they will become a critical tool in the data scientist’s toolbox.

More from X-Force

A spotlight on Akira ransomware from X-Force Incident Response and Threat Intelligence

7 min read - This article was made possible thanks to contributions from Aaron Gdanski.IBM X-Force Incident Response and Threat Intelligence teams have investigated several Akira ransomware attacks since this threat actor group emerged in March 2023. This blog will share X-Force’s unique perspective on Akira gained while observing the threat actors behind this ransomware, including commands used to deploy the ransomware, active exploitation of CVE-2023-20269 and analysis of the ransomware binary.The Akira ransomware group has gained notoriety in the current cybersecurity landscape, underscored…

Hive0051 goes all in with a triple threat

13 min read - As of April 2024, IBM X-Force is tracking new waves of Russian state-sponsored Hive0051 (aka UAC-0010, Gamaredon) activity featuring new iterations of Gamma malware first observed in November 2023. These discoveries follow late October 2023 findings, detailing Hive0051's use of a novel multi-channel method of rapidly rotating C2 infrastructure (DNS Fluxing) to deliver new Gamma malware variants, facilitating more than a thousand infections in a single day. An examination of a sample of the lures associated with the ongoing activity reveals…

Ongoing ITG05 operations leverage evolving malware arsenal in global campaigns

13 min read - As of March 2024, X-Force is tracking multiple ongoing ITG05 phishing campaigns featuring lure documents crafted to imitate authentic documents of government and non-governmental organizations (NGOs) in Europe, the South Caucasus, Central Asia, and North and South America. The uncovered lures include a mixture of internal and publicly available documents, as well as possible actor-generated documents associated with finance, critical infrastructure, executive engagements, cyber security, maritime security, healthcare, business, and defense industrial production. Beginning in November 2023, X-Force observed ITG05…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today