November 5, 2015 By Brad Harris 3 min read

In 2010, Steffen Rendle, currently a senior research scientist at Google, introduced a seminal paper in the world of machine learning. In this work, Rendle described a concept known as a factorization machine. Many years have passed since such an impactful algorithm has been introduced in the world of machine learning.

Deep learning has become a popular topic, but the idea behind it has been understood for years. However, only now do we have the computational power to execute its more theoretical aspects. In this post, I will examine something brand new in this sphere.

About Factorization and Support Vector Machines

Factorization machines can be compared to support vector machines (SVMs) with a polynomial kernel, according to Rendle and others. This algorithm has been well-studied and evaluated. The interesting thing behind support vector machines is that they are still somewhat of a mystery. They are often verified empirically rather than against a theoretical baseline or limit. Even so, they are widely used as general predictors. SVMs define a multidimensional hyperplane, which learns the shape of the curve of the data.

However, SVMs have known weaknesses, some of which are addressed by Rendle’s factorization machines. For example, they require that the output model contain support vectors, which are training examples that help the algorithm define the shape of the hyperplane as well as other parameters such as the margin for prediction.

SVMs function best on dense data — that is, data with few to no missing values and data whose plotted points fall near each other. Finally, in SVMs, the input variables are still independent variables even though the polynomial kernel attempts to model the interaction among the variables. This is computed in polynomial time based upon the number of dimensions in the input data as well as the order of the polynomial (e.g., quadratic, cubic or quartic).

Working Around Weaknesses

Factorization machines were designed to address these weaknesses. Firstly, no training examples are required in the model parameters, making the models much more compact. Factorization machines perform extremely well on sparse data, including data of very high sparsity. As a trade-off, they do not perform well on dense data, so other algorithms are more suited to this class of data.

Finally, and perhaps most amazing of all, factorization machines can model n-way variable interactions, where n is the number of polynomial order. Note that in all current implementations, the order has been fixed at two. While the equations generalize to higher orders, there are some issues such as the numerical stability of the optimization methods that have prevented the generalization of n.

Rendle ingeniously proposed a method of reducing the polynomial computation time to linear complexity. It is this capability of factorization machines that makes them extremely attractive to researchers.

Additionally, Rendle demonstrates that, through proper feature engineering, the machines can mimic the best specialized factorization models developed for very specific situations. This allows them to be applied in a multitude of situations where a specific form of learning algorithm and predictor are required.

Additional Methods

In his original paper, Rendle discussed a method of optimization for the model parameters known as stochastic gradient descent, which works well with several loss functions. However, the optimization algorithm is extremely dependent on the learning rate, one of the hyperparameters of the method. If the learning rate is too high, the model parameters will not converge, while if it is too low, the algorithm is no longer time-efficient.

Because of this, Rendle reviewed three more methods of optimization in “Factorization Machines With LibFM,” known as alternating least-squares, marcov chain monte carlo inference and adaptive stochastic gradient descent. He recommends marcov chain monte carlo inference because there are fewer hyperparameters, and those that must be specified are not as sensitive to their initial values.

Factorization machines have been widely applied in the field of collaborative recommendation systems, where their sparse predictive power and ability to mimic state-of-the-art, specific factorization methods make them generalizable to several tasks. Not only are they used in recommending movies and music, for example, but researchers have even applied them to use social media to predict the stock market with surprisingly positive results.

Factorization and Data Science

As I discussed above, factorization machines can function as general predictors akin to support vector machines in high-dimensional sparse data. While they are not commonly used outside of recommendation systems at present, they have the potential to become classifiers similar to SVMs.

There are several implementations of factorization machines. The state-of-the-art continues to be libFM, created by Rendle himself. However, there are other interesting implementations such as fastFM. There’s even a version of libFM designed for the Spark framework, known as spark-libFM, that allows factorization machines to be parallelized.

Factorization machines are just now being studied rigorously, but they hold great promise in the world of sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. I suspect that soon, as these machines are studied in the realm of general prediction, they will become a critical tool in the data scientist’s toolbox.

More from X-Force

Strela Stealer: Today’s invoice is tomorrow’s phish

12 min read - As of November 2024, IBM X-Force has tracked ongoing Hive0145 campaigns delivering Strela Stealer malware to victims throughout Europe - primarily Spain, Germany and Ukraine. The phishing emails used in these campaigns are real invoice notifications, which have been stolen through previously exfiltrated email credentials. Strela Stealer is designed to extract user credentials stored in Microsoft Outlook and Mozilla Thunderbird. During the past 18 months, the group tested various techniques to enhance its operation's effectiveness. Hive0145 is likely to be…

Hive0147 serving juicy Picanha with a side of Mekotio

17 min read - IBM X-Force tracks multiple threat actors operating within the flourishing Latin American (LATAM) threat landscape. X-Force has observed Hive0147 to be one of the most active threat groups operating in the region, targeting employee inboxes at scale, with a primary focus on phishing and malware distribution. After a 3-month break, Hive0147 returned in July with even larger campaign volumes, and the debut of a new malicious downloader X-Force named "Picanha,” likely under continued development, deploying the Mekotio banking trojan. Hive0147…

FYSA – Critical RCE Flaw in GNU-Linux Systems

2 min read - Summary The first of a series of blog posts has been published detailing a vulnerability in the Common Unix Printing System (CUPS), which purportedly allows attackers to gain remote access to UNIX-based systems. The vulnerability, which affects various UNIX-based operating systems, can be exploited by sending a specially crafted HTTP request to the CUPS service. Threat Topography Threat Type: Remote code execution vulnerability in CUPS service Industries Impacted: UNIX-based systems across various industries, including but not limited to, finance, healthcare,…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today