If you’ve ever seen a “recommended item” on eBay or Amazon that was just what you were looking for (or maybe didn’t know you were looking for), it’s likely the suggestion was powered by a recommendation engine. In a recent interview, Raefer Gabriel, co-founder of machine learning startup Delvv, Inc., said these applications of recommendation engines and collaborative filtering algorithms are just the beginning for a powerful and broad-reaching technology.
Gabriel noted that content discovery on services like Netflix, Pandora, and Spotify is most familiar to people because of the way those services seem to “speak” to one’s preferences in movies, games, and music. Their relatively narrow focus on entertainment is a common thread: each operates in a constrained domain, which has helped make them successful. The challenge lies in developing recommendation engines for unbounded domains, like the internet, where there is more or less unlimited information.
“Some of the more unbounded domains, like web content, have struggled a little bit more to make good use of the technology that’s out there. Because there is so much unbounded information, it is hard to represent well, and to match well with other kinds of things people are considering,” Gabriel said. “Most of the collaborative filtering algorithms are built around some kind of matrix factorization technique and they definitely tend to work better if you bound the domain.”
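To make the matrix factorization idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the method used by any service named here: the function names, the tiny RATINGS list, and the hyperparameters (k, epochs, lr, reg) are all assumptions chosen for the example. Each user and each item gets a short vector of latent factors, and the vectors are nudged by stochastic gradient descent until their dot products approximate the ratings that were actually observed.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def mse(P, Q, ratings):
    """Mean squared error over the observed (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def init_factors(n_users, n_items, k, seed=0):
    """Small random starting factors; the seed keeps runs repeatable."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.6) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.6) for _ in range(k)] for _ in range(n_items)]
    return P, Q


def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Fit user factors P and item factors Q with regularized SGD.

    All hyperparameter defaults are illustrative assumptions.
    """
    P, Q = init_factors(n_users, n_items, k, seed)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors toward reducing the error, with shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical data: (user index, item index, rating on a 1-5 scale).
RATINGS = [
    (0, 0, 5), (0, 1, 4),
    (1, 0, 4), (1, 2, 1),
    (2, 1, 5), (2, 2, 2),
    (3, 0, 1), (3, 2, 5),
]

if __name__ == "__main__":
    P, Q = factorize(RATINGS, n_users=4, n_items=3)
    print("training error:", mse(P, Q, RATINGS))
    # User 0 never rated item 2; the learned factors still yield an estimate.
    print("estimate for user 0, item 2:", predict(P, Q, 0, 2))
```

The sketch also hints at why a bounded domain helps: the factor matrices have one row per user and one row per item, so a fixed catalog keeps the model a manageable size, while open-ended web content has no natural limit on the number of items.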
Of all the recommendation engines and collaborative filters on the web, Gabriel cites Amazon as the most ambitious. The eCommerce giant uses a number of strategies to make item-to-item recommendations, suggest complementary purchases, model user preferences, and more. The key to developing those recommendations lies largely in the value of the data that Amazon is able to feed into the algorithm from the start; once a critical mass of data on user preferences has been reached, it becomes much easier to create recommendations for new users.
“In order to handle those fresh users coming into the system, you need to have some way of modeling what their interest may be based on that first click that you’re able to extract out of them,” Gabriel said. “I think that intersection point between data warehousing and machine learning problems is actually a pretty critical intersection point, because machine learning doesn’t do much without data. So, you definitely need good systems to collect the data, good systems to manage the flow of data, and then good systems to apply models that you’ve built.”
Beyond consumer-oriented uses, Gabriel has seen recommendation engines and collaborative filtering systems used in a narrow scope for medical applications and in manufacturing. In healthcare, for example, he cited recommendations based on treatment preferences, doctor specialties, and other relevant decision-based suggestions. More broadly, anything you can transform into a “model of relationships between items and item preferences” can map directly onto some form of recommendation engine or collaborative filter.
One of the most important forces driving the development of recommendation engines and collaborative filtering algorithms has been the Netflix Prize, Gabriel said. The competition, which offered a $1 million prize to anyone who could design an algorithm to improve upon Netflix’s proprietary recommendation engine, allowed entrants to use pieces of the company’s own user data to develop a better algorithm. The competition spurred a great deal of interest in the potential applications of collaborative filtering and recommendation engines, he said.
Relatively easy access to abundant, cheap memory is another driving force behind the development of recommendation engines. An eCommerce company like Amazon with millions of items needs plenty of memory to store millions of different pieces of item and correlation data while also storing user data in potentially large blocks.
“You have to think about a lot of matrix data in memory. And it’s a matrix, because you’re looking at relationships between items and other items and, obviously, the problems that get interesting are ones where you have lots and lots of different items,” Gabriel said. “All of the fitting and the data storage does need quite a bit of memory to work with. Cheap and plentiful memory has been very helpful in the development of these things at the commercial scale.”
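A rough sketch of the item-to-item matrix Gabriel describes, and of why it gets expensive, might look like the following. Everything here is an assumption made for illustration: the basket data, the function names, the choice of cosine similarity over purchase sets, and the 4 bytes per matrix entry. It does not describe how Amazon or any other company actually stores its data.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(baskets):
    """Cosine similarity between items, based on which users bought them.

    baskets maps a user id to the set of items that user bought.
    Only pairs with at least one shared buyer are stored (a sparse matrix).
    """
    buyers = defaultdict(set)
    for user, items in baskets.items():
        for item in items:
            buyers[item].add(user)

    sims = {}
    for a, b in combinations(sorted(buyers), 2):
        shared = len(buyers[a] & buyers[b])
        if shared:
            sims[(a, b)] = shared / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def dense_matrix_bytes(n_items, bytes_per_entry=4):
    """Memory for a full n x n item-to-item matrix (4 bytes per entry assumed)."""
    return n_items * n_items * bytes_per_entry


# Hypothetical purchase history for three users.
BASKETS = {
    "a": {"book", "lamp"},
    "b": {"book", "lamp", "pen"},
    "c": {"pen"},
}

if __name__ == "__main__":
    for pair, score in item_similarities(BASKETS).items():
        print(pair, score)
    # One million items stored densely: 4,000,000,000,000 bytes (about 4 TB).
    print(dense_matrix_bytes(1_000_000))
```

Because the dense matrix grows with the square of the catalog size, a practical system would likely keep only the nonzero pairs, as the dictionary above does, and even then the number of stored relationships climbs quickly as items are added.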
Looking forward, Gabriel sees recommendation engines and collaborative filtering systems evolving more toward predictive analytics and getting a handle on the unbounded domain of the internet. While those efforts may ultimately be driven by the Google Now platform, he foresees a time when recommendation-driven data will merge with search data to deliver results before you even type a query.
“I think there will be a lot more going on at that intersection between the search and recommendation space over the next couple years. It’s sort of inevitable,” Gabriel said. “You can look ahead to what someone is going to be searching for next, and you can certainly help refine and tune into the right information with less effort.”
While “mind-reading” search engines may still seem a bit like science fiction at present, the capabilities are evolving at a rapid pace, with predictive analytics leading the way.