November 5, 2015 By Brad Harris 3 min read

In 2010, Steffen Rendle, currently a senior research scientist at Google, introduced a seminal paper in the world of machine learning. In this work, Rendle described a concept known as a factorization machine. Many years have passed since such an impactful algorithm has been introduced in the world of machine learning.

Deep learning has become a popular topic, but the idea behind it has been understood for years. However, only now do we have the computational power to execute its more theoretical aspects. In this post, I will examine something brand new in this sphere.

About Factorization and Support Vector Machines

Factorization machines can be compared to support vector machines (SVMs) with a polynomial kernel, according to Rendle and others. This algorithm has been well-studied and evaluated. The interesting thing behind support vector machines is that they are still somewhat of a mystery. They are often verified empirically rather than against a theoretical baseline or limit. Even so, they are widely used as general predictors. SVMs define a multidimensional hyperplane, which learns the shape of the curve of the data.

However, SVMs have known weaknesses, some of which are addressed by Rendle’s factorization machines. For example, they require that the output model contain support vectors, which are training examples that help the algorithm define the shape of the hyperplane as well as other parameters such as the margin for prediction.

SVMs function best on dense data — that is, data with few to no missing values and data whose plotted points fall near each other. Finally, in SVMs, the input variables are still independent variables even though the polynomial kernel attempts to model the interaction among the variables. This is computed in polynomial time based upon the number of dimensions in the input data as well as the order of the polynomial (e.g., quadratic, cubic or quartic).

Working Around Weaknesses

Factorization machines were designed to address these weaknesses. Firstly, no training examples are required in the model parameters, making the models much more compact. Factorization machines perform extremely well on sparse data, including data of very high sparsity. As a trade-off, they do not perform well on dense data, so other algorithms are more suited to this class of data.

Finally, and perhaps most amazing of all, factorization machines can model n-way variable interactions, where n is the number of polynomial order. Note that in all current implementations, the order has been fixed at two. While the equations generalize to higher orders, there are some issues such as the numerical stability of the optimization methods that have prevented the generalization of n.

Rendle ingeniously proposed a method of reducing the polynomial computation time to linear complexity. It is this capability of factorization machines that makes them extremely attractive to researchers.

Additionally, Rendle demonstrates that, through proper feature engineering, the machines can mimic the best specialized factorization models developed for very specific situations. This allows them to be applied in a multitude of situations where a specific form of learning algorithm and predictor are required.

Additional Methods

In his original paper, Rendle discussed a method of optimization for the model parameters known as stochastic gradient descent, which works well with several loss functions. However, the optimization algorithm is extremely dependent on the learning rate, one of the hyperparameters of the method. If the learning rate is too high, the model parameters will not converge, while if it is too low, the algorithm is no longer time-efficient.

Because of this, Rendle reviewed three more methods of optimization in “Factorization Machines With LibFM,” known as alternating least-squares, marcov chain monte carlo inference and adaptive stochastic gradient descent. He recommends marcov chain monte carlo inference because there are fewer hyperparameters, and those that must be specified are not as sensitive to their initial values.

Factorization machines have been widely applied in the field of collaborative recommendation systems, where their sparse predictive power and ability to mimic state-of-the-art, specific factorization methods make them generalizable to several tasks. Not only are they used in recommending movies and music, for example, but researchers have even applied them to use social media to predict the stock market with surprisingly positive results.

Factorization and Data Science

As I discussed above, factorization machines can function as general predictors akin to support vector machines in high-dimensional sparse data. While they are not commonly used outside of recommendation systems at present, they have the potential to become classifiers similar to SVMs.

There are several implementations of factorization machines. The state-of-the-art continues to be libFM, created by Rendle himself. However, there are other interesting implementations such as fastFM. There’s even a version of libFM designed for the Spark framework, known as spark-libFM, that allows factorization machines to be parallelized.

Factorization machines are just now being studied rigorously, but they hold great promise in the world of sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. I suspect that soon, as these machines are studied in the realm of general prediction, they will become a critical tool in the data scientist’s toolbox.

More from X-Force

Phishing kit trends and the top 10 spoofed brands of 2023

4 min read -  The 2024 IBM X-Force Threat Intelligence Index reported that phishing was one of the top initial access vectors observed last year, accounting for 30% of incidents. To carry out their phishing campaigns, attackers often use phishing kits: a collection of tools, resources and scripts that are designed and assembled to ease deployment. Each phishing kit deployment corresponds to a single phishing attack, and a kit could be redeployed many times during a phishing campaign. IBM X-Force has analyzed thousands of…

Grandoreiro banking trojan unleashed: X-Force observing emerging global campaigns

16 min read - Since March 2024, IBM X-Force has been tracking several large-scale phishing campaigns distributing the Grandoreiro banking trojan, which is likely operated as a Malware-as-a-Service (MaaS). Analysis of the malware revealed major updates within the string decryption and domain generating algorithm (DGA), as well as the ability to use Microsoft Outlook clients on infected hosts to spread further phishing emails. The latest malware variant also specifically targets over 1500 global banks, enabling attackers to perform banking fraud in over 60 countries…

Threat intelligence to protect vulnerable communities

2 min read - Key members of civil society—including journalists, political activists and human rights advocates—have long been in the cyber crosshairs of well-resourced nation-state threat actors but have scarce resources to protect themselves from cyber threats. On May 14, 2024, the Cybersecurity and Infrastructure Security Agency (CISA) released a High-Risk Communities Protection (HRCP) report developed through the Joint Cyber Defense Collaborative that addresses the threat to these vulnerable groups, with findings contributed by the X-Force Threat Intelligence team.Cyber criminals seek stolen credentialsThe HRCP…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today