作者:kuber 来源:博客园   酷勤网收集 2008-06-28

摘要
  使用基于Slope One算法的推荐需要以下数据: 1. 有一组用户;2. 有一组Items;3. 用户会对其中某些项目打分表达他们的喜好Slope One算法要解决的问题是, 对某个用户, 已知道他对其中一些Item的Rating了, 向他推荐一些他还没有Rating的Items, 以增加销售机会.

上一篇简单介绍了Slope One算法的概念, 这次介绍C#实现
使用基于Slope One算法的推荐需要以下数据:
1. 有一组用户
2. 有一组Items(文章, 商品等)
3. 用户会对其中某些项目打分(Rating)表达他们的喜好
Slope One算法要解决的问题是, 对某个用户, 已知道他对其中一些Item的Rating了, 向他推荐一些他还没有Rating的Items, 以增加销售机会. :-)

一个推荐系统的实现包括以下三步:
1. 计算出任意两个Item之间Rating的差值
2. 输入某个用户的Rating记录, 推算出对其它Items的可能Rating值
3. 根据Rating的值排序, 给出Top Items;

第一步:例如我们有三个用户和4个Items, 用户打分的情况如下表.

Ratings User1 User2 User3
Item1 5 4 4
Item2 4 5 4
Item3 4 3 N/A
Item4 N/A 5 5

在第一步中我们的工作就是计算出Item之间两两的打分之差, 也就是使说计算出以下矩阵:
  Item1 Item2 Item3 Item4
Item1 N/A 0/3 2/2 -2/2
Item2 0/3 N/A 2/2 -1/2
Item3 -2/2 -2/2 N/A -2/1
Item4 2/2 1/2 2/1 N/A


考虑到加权算法, 还要记录有多少人对这两项打了分(Freq), 我们先定义一个结构来保存Rating:
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get {return Value / Freq;}
        }
    }
我决定用一个Dictionary来保存这个结果矩阵:
    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return Item1Id + "/" + Item2Id;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            return this.Keys.Contains<string>(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

接下来我们来实现SlopeOne类. 首先创建一个RatingDifferenceCollection来保存矩阵, 还要创建HashSet来保持系统中总共有哪些Items:
    public class SlopeOne
    {       
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection();  // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>();  // Tracking how many items totally

方法AddUserRatings接收一个用户的打分记录(Item-Rating): public void AddUserRatings(IDictionary<int, float> userRatings)
AddUserRatings中有两重循环, 外层循环遍历输入中的所有Item, 内层循环再遍历一次, 计算出一对Item之间Rating的差存入_DiffMarix, 记得Freq加1, 以记录我们又碰到这一对Items一次:
    Rating ratingDiff = _DiffMarix[item1Id, item2Id];
    ratingDiff.Value += item1Rating - item2Rating;
    ratingDiff.Freq += 1;

对每个用户调用AddUserRatings后, 建立起矩阵. 但我们的矩阵是以表的形式保存:
  Rating Dif Freq
Item1-2 0 3
Item1-3 1 2
Item2-1 0 3
Item2-3 1 2
Item3-1 -1 2
Item3-2 -1 2
Item1-4 -1 2
Item2-4 -0.5 2
Item3-4 -2 1
Item4-1 1 2
Item4-2 0.5 2
Item4-3 2 1


第二步:输入某个用户的Rating记录, 推算出对其它Items的可能Rating值:
public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
也是两重循环, 外层循环遍历_Items中所有的Items; 内层遍历userRatings, 用此用户的ratings结合第一步得到的矩阵, 推算此用户对系统中每个项目的Rating:
    Rating itemRating = new Rating(); // Prediction of this user's rating
    ...
    Rating diff = _DiffMarix[itemId, inputItemId]:
    itemRating.Value += diff.Freq * (diff.AverageValue + userRating.Value);
    itemRating.Freq += diff.Freq;

第三步:得到用户的Rating预测后,就可以按rating排序, 向用户推荐了. 测试一下:
    Dictionary<int, float> userRating userRating = new Dictionary<int, float>();
    userRating.Add(1, 5);
    userRating.Add(3, 4);
    IDictionary<int, float> Predictions = test.Predict(userRating);
    foreach (var rating in Predictions)
    {
        Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
    }   
输出:
Item 2 Rating: 5
Item 4 Rating: 6

改进:
观察之前产生的矩阵可以发现, 其中有很多浪费的空间; 例如: 对角线上永远是不会有值的. 因为我们是用线性表保存矩阵值, 已经避免了这个问题;
对角线下方的值和对角线上方的值非常对称,下方的值等于上方的值乘以-1; 在数据量很大的时候是很大的浪费. 我们可以通过修改RatingDifferenceCollection来完善. 可以修改GetKey方法, 用Item Pair来作为Key:
    private string GetKey(int Item1Id, int Item2Id) {
        return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id ;;
    }
完整代码见下面的附录,在.net 3.5上调试通过;
参考资料
tutorial about how to implement Slope One in Python

文章来自:Slope One 之二: C#实现

附录:(下面代码来自:C# Implement of Slope One

Here's the source code of my C# implementation of Weighted Slope One. Tested on .net 3.5; The instruction in Chinese can be find here and here.
Reference:
Tutorial about how to implement Slope One in Python
Slope One Predictors for Online Rating-Based Collaborative Filtering
Recommender Systems: Slope One


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SlopeOne
{
    public class Rating
    {
        public float Value { get; set; }
        public int Freq { get; set; }

        public float AverageValue
        {
            get { return Value / Freq; }
        }
    }

    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id ;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            return this.Keys.Contains<string>(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get {
                    return this[this.GetKey(Item1Id, Item2Id)];
            }
            set { this[this.GetKey(Item1Id, Item2Id)] = value; }
        }
    }

     public class SlopeOne
    {       
        public RatingDifferenceCollection _DiffMarix = new RatingDifferenceCollection();  // The dictionary to keep the diff matrix
        public HashSet<int> _Items = new HashSet<int>();  // Tracking how many items totally

        public void AddUserRatings(IDictionary<int, float> userRatings)
        {
            foreach (var item1 in userRatings)
            {
                int item1Id = item1.Key;
                float item1Rating = item1.Value;
                _Items.Add(item1.Key);

                foreach (var item2 in userRatings)
                {
                    if (item2.Key <= item1Id) continue; // Eliminate redundancy
                    int item2Id = item2.Key;
                    float item2Rating = item2.Value;

                    Rating ratingDiff;
                    if (_DiffMarix.Contains(item1Id, item2Id))
                    {
                        ratingDiff = _DiffMarix[item1Id, item2Id];
                    }
                    else
                    {
                        ratingDiff = new Rating();
                        _DiffMarix[item1Id, item2Id] = ratingDiff;
                    }

                    ratingDiff.Value += item1Rating - item2Rating;
                    ratingDiff.Freq += 1;
                }
            }
        }

        // Input ratings of all users
        public void AddUerRatings(IList<IDictionary<int, float>> Ratings)
        {
            foreach(var userRatings in Ratings)
            {
                AddUserRatings(userRatings);
            }
        }

        public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
        {
            Dictionary<int, float> Predictions = new Dictionary<int, float>();
            foreach (var itemId in this._Items)
            {
                if (userRatings.Keys.Contains(itemId))    continue; // User has rated this item, just skip it

                Rating itemRating = new Rating();

                foreach (var userRating in userRatings)
                {
                    if (userRating.Key == itemId) continue;
                    int inputItemId = userRating.Key;
                    if (_DiffMarix.Contains(itemId, inputItemId))
                    {
                        Rating diff = _DiffMarix[itemId, inputItemId];
                        itemRating.Value += diff.Freq * (userRating.Value + diff.AverageValue * ((itemId < inputItemId) ? 1 : -1));
                        itemRating.Freq += diff.Freq;
                    }
                }
                Predictions.Add(itemId, itemRating.AverageValue);               
            }
            return Predictions;
        }

        public static void Test()
        {
            SlopeOne test = new SlopeOne();

            Dictionary<int, float> userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(2, 4);
            userRating.Add(3, 4);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 5);
            userRating.Add(3, 3);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 4);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(3, 4);

            IDictionary<int, float> Predictions = test.Predict(userRating);
            foreach (var rating in Predictions)
            {
                Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
            }
        }
    }
}

分类: 算法艺术 设计模式

上一篇:Windows游戏中的NP完全问题   下一篇:应用程序的性能: C# vs C/C++