C++进阶——用Hash封装unordered_map和unordered

前言

源码怎么说

为什么要兼容？

兼容的具体做法？

为什么要把Key转为整数（HashFcn）？

模拟实现

一、建立框架

二、迭代器

++运算符重载

迭代器兼容大法

三、[ ]重载

四、实现不能修改key

实现及测试代码

HashTable.h

unordered_set.h

unordered_map.h

test.cpp

前言

本篇博客我会利用上一篇博客以链地址法实现的哈希表对unordered_map和unordered_set进行封装，大家如果想了解链地址法实现哈希表可以移步我的上一篇博客~

C++进阶——哈希思想及实现-CSDN博客

源码怎么说

我们之前用红黑树封装map和set的时候，就用到了兼容的思想，思路就是红黑树存储的结点由key、value转变成key、pair<key, value>，然后这两个统一以一个模板参数T(data)代替，set就是直接用key即可，map就是使用KeyOfT仿函数把key提取出来。

我们用哈希表封装unordered_map和unordered_set，原理同样如此。

大家和我一起来分析源码，用到的源码是SGI-STL30版本。

由于这个版本在C++11之前，也就是还没有unordered_map和unordered_set，所以里面的命名为hash_map和hash_set,大家明白是什么就好。

一些可能会出现的困惑：

为什么要兼容？

对于unordered_map和unordered_set，底层都是哈希表，如果要为他们都单独设计出一个哈希表出来，那么这两个哈希表的结构除了结点存储的数据不同，其它的逻辑基本上都一样，这样就会出现很多的代码冗余问题，如果我们想办法设计出一个哈希表，可以让unordered_map和unordered_set底层都调用它，就很好的解决了这个问题。

毕竟对于写源码的那些大佬而言，大量的重复代码不够优雅~

兼容的具体做法？

前面我们说到，unordered_map和unordered_set需要用到的哈希表的核心不同就是存储的数据类型不同，那么我们只要让这个哈希表能存储这两种数据不就行了。具体怎么做，那当然是用模板，把桶结点的模板类型写成T（T实例化出data）即可。

当unordered_set需要使用时，T就是key

当unordered_map需要使用时，T就是pair<K, V>

这样产生出来的问题就是我们在插入桶结点时，传入T不能直接拿到key的值了，而我们的unordered_map和unordered_set既然使用的是同一个哈希表模板，当然也不能去T.first去取key，不然unordered_set肯定会出错。

所以我们就需要设计出两个仿函数（各一个），这个仿函数由unordered_map和unordered_set传入哈希表，如果T是key，就返回key，如果T是pair<K, V>，就把其中的key提取并返回。

为什么要把Key转为整数（HashFcn）？

因为对于unordered_map和unordered_set而言，他们的key值最终都要转换为整数取映射到哈希表中然后存储（除法散列法），而他们的key值可能是非整形如一些自定义类型，这些编译器不能帮我们转变成整形的，我们就需要手动写一个仿函数来将其转变为整形。

这样将这个仿函数传入到哈希表里面，哈希表就能处理key值并根据其自身的size进行相应的映射了。

提示：SGI-STL30版本的默认仿函数只能处理常见的char、int、long等整形类别（char就返回ascll值，其它返回原值）以及string，其余的类型如果我们想用作为key都需要自己造一个仿函数

模拟实现

一、建立框架

以下的代码都是基于我上述对于源码的分析及之前的实现而来，只是命名风格可能有略微变化

#pragma once
#include <iostream>
#include <vector>
#include <string>
using namespace std;static const int __stl_num_primes = 28;
static const unsigned long __stl_prime_list[__stl_num_primes] =
{53,         97,         193,       389,       769,1543,       3079,       6151,      12289,     24593,49157,      98317,      196613,    393241,    786433,1572869,    3145739,    6291469,   12582917,  25165843,50331653,   100663319,  201326611, 402653189, 805306457,1610612741, 3221225473, 4294967291
};inline unsigned long __stl_next_prime(unsigned long n)
{const unsigned long* first = __stl_prime_list;const unsigned long* last = __stl_prime_list + __stl_num_primes;const unsigned long* pos = lower_bound(first, last, n);return pos == last ? *(last - 1) : *pos;
}template<class K>
struct KeyToInt
{size_t operator()(const K& key){return (size_t)key;}
};template<>
struct KeyToInt<string>
{size_t operator()(const string& key){size_t ret = 0;for (auto& e : key){ret += e;ret *= 131;}return ret;}
};// 链地址法-哈希表-兼容版
namespace MyHashMS
{// 哈希表的桶结点//          pair或keytemplate<class T>struct HashNode{HashNode(const T& data):_data(data),_next(nullptr){}T _data;HashNode* _next;};// 编译器为了提升效率，查找只会向上，迭代器要用到HashTable，故需要前置声明template<class K, class T, class KeyToInt, class KeyOfT>class HashTable;// 迭代器template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >struct HashTable_Iterator{typedef HashNode<T> Node;typedef HashTable<K, T, KeyToInt, KeyOfT> HashTable;typedef HashTable_Iterator<K, T, Ref, Ptr, KeyToInt, KeyOfT> Self;HashTable_Iterator(Node* cur, const HashTable* ht):_cur(cur),_ht(ht){}// 迭代器存储当前结点指针和其对应的哈希表指针（其实_hashtable也可以）Node* _cur;const HashTable* _ht;// 前置++Self& operator++()Ref operator*()Ptr operator->()bool operator==(const Self& other)bool operator!=(const Self& other)};//        key    pair或key    KeyToInt由上层传，这样修改转整数的逻辑比较方便//                                KeyOfT也是从上层传，set传直接返回key的，map传pair<>.first的template<class K, class T, class KeyToInt, class KeyOfT>class HashTable{// 结构体的友元声明需要带着模板参数// 这里友元声明是因为HashTable_Iterator里需要用HashTable的指针去访问其私有成员template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >friend struct HashTable_Iterator;typedef HashNode<T> Node;public:typedef HashTable_Iterator<K, T, T&, T*, KeyToInt, KeyOfT> iterator;typedef HashTable_Iterator<K, T, const T&, const T*, KeyToInt, KeyOfT> const_iterator;public:HashTable():_hashtable(__stl_next_prime(0)),_n(0){}iterator begin()iterator end()const_iterator begin() const const_iterator end() const// 清除数据，释放结点，便于析构和赋值运算符重载复用void clear()// 有new申请的结点，这些结点是自定义类型，需要手动写析构函数~HashTable(){clear();}// 拷贝构造HashTable(const HashTable& other)// 赋值运算符重载HashTable& operator=(const HashTable& other)pair<iterator, bool> Insert(const T& data)iterator Find(const K& key)bool Erase(const K& key)private:vector<Node*> _hashtable;size_t _n;};
}

// unordered_set.h#pragma once
#include "HashTable.h"namespace MyHashMS
{template<class K, class KeyToInt = KeyToInt<K>>class unordered_set{struct SetKeyOfT{const K& operator()(const K& key){return key;}};typedef HashTable<K, K, KeyToInt, SetKeyOfT> HashTable;typedef typename HashTable::iterator iterator;typedef typename HashTable::const_iterator const_iterator;public:iterator begin() { return _ht.begin(); }iterator end() { return _ht.end(); }const_iterator begin() const { return _ht.begin(); }const_iterator end() const { return _ht.end(); }pair<iterator, bool> insert(const K& key)iterator find(const K& key)bool erase(const K& key)void print()private:// 关于KeyToInt和SetKeyOfT要不要传模板类型的问题// 模板参数是要传类型的，而前两者K，K都是类型，显然没问题// KeyToInt是unordered_set模板参数里定义好的类型，也没问题// SetKeyOfT是在unordered_set类里面定义的一个结构体，也可以直接作为类型HashTable _ht;// 当实例化 unordered_set<int> 时，K 就会被推导为 int，然后 SetKeyOfT 就会根据 K（即 int）来创建一个实例};}

// unordered_map.h#pragma once#include "HashTable.h"namespace MyHashMS{template<class K, class V, class KeyToInt = KeyToInt<K>>class unordered_map{struct MapKeyOfT{const K& operator()(const pair<K, V>& kv){return kv.first;}};typedef HashTable<K, pair<K, V>, KeyToInt, MapKeyOfT> HashTable;typedef typename HashTable::iterator iterator;typedef typename HashTable::const_iterator const_iterator;public:iterator begin() { return _ht.begin(); }iterator end() { return _ht.end(); }const_iterator begin() const { return _ht.begin(); }const_iterator end() const { return _ht.end(); }V& operator[](const K& key)pair<iterator, bool> insert(const pair<K, V>& kv)iterator find(const K& key)bool erase(const K& key)void print()private:HashTable _ht;};}

二、迭代器

我们要实现迭代器，只需要给哈希表实现迭代器，然后外面套一层壳就可以实现unordered_map和unordered_set的迭代器啦~

我们要实现的迭代器为单向迭代器，因为之前的一个哈希桶的结点是由单向链表串起来的。

我们实现哈希表的迭代器，就需要一个指向哈希表桶结点的指针和指向哈希表的指针，这样才能进行遍历操作

++运算符重载

我们要实现单向迭代器，核心工程就是实现++运算符的重载。

如果让我们去实现该单向迭代器的++运算符重载，大部分人的第一反应就是，从第一个桶开始作为begin，遍历该桶的所有结点，然后再去找下一个桶，到最后一个桶遍历结束的下一个结点为end（nullptr）。

实际上库里面也是这么做的：

// 前置++
Self& operator++()
{	if (!_cur) return *this;KeyToInt kti;KeyOfT kot;const Node* old = _cur;_cur = _cur->_next;if (!_cur){// cur为nullptr，说明当前桶的结点已经都走过了，去寻找下一个桶的第一个结点size_t hashi = kti(kot(old->_data)) % _ht->_hashtable.size() + 1;for (; hashi < _ht->_hashtable.size(); ++hashi){if (_ht->_hashtable[hashi]){_cur = _ht->_hashtable[hashi];break;}}}return *this;
}

迭代器兼容大法

const迭代器和非const迭代器之间的区别就在于其对于容器的访问权限，通过迭代器访问容器的元素肯定需要使用的是迭代器的 * 以及 -> 运算符，所以我们只需要控制好这两个运算符的返回类型，就能控制迭代器对容器的访问权限。

// 迭代器
template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >
struct HashTable_Iterator
{typedef HashNode<T> Node;typedef HashTable<K, T, KeyToInt, KeyOfT> HashTable;typedef HashTable_Iterator<K, T, Ref, Ptr, KeyToInt, KeyOfT> Self;HashTable_Iterator(Node* cur, const HashTable* ht):_cur(cur), _ht(ht){}Node* _cur;const HashTable* _ht;Ref operator*(){return _cur->_data;}Ptr operator->(){return &(operator*());}};

我们利用模板参数来控制operator*以及operator->的返回值，来达到一份迭代器代码兼容两种版本迭代器的效果。

template<class K, class T, class KeyToInt, class KeyOfT>
class HashTable
{// 结构体的友元声明需要带着模板参数// 这里友元声明是因为HashTable_Iterator里需要用HashTable的指针去访问其私有成员template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >friend struct HashTable_Iterator;typedef HashNode<T> Node;public:// 通过模板参数来控制operator*以及operator->的返回值，来达到一份迭代器代码兼容两种版本迭代器的效果。typedef HashTable_Iterator<K, T, T&, T*, KeyToInt, KeyOfT> iterator;typedef HashTable_Iterator<K, T, const T&, const T*, KeyToInt, KeyOfT> const_iterator;public:HashTable():_hashtable(__stl_next_prime(0)), _n(0){}private:vector<Node*> _hashtable;size_t _n;
};

三、[ ]重载

在进行[ ]重载之前，我们要先对我们之前版本的HashTable的一些函数的返回值做修改，使其返回值提供的信息更丰富起来。

我就不详细粘贴代码了，所有代码都会在博客结尾给出。

库里面对[ ]重载的效果是达到了既能实现插入又能实现修改(key不存在为插入，key存在为修改)。

[ ]的重载只对unordered_map有意义，所以我们只实现这一个版本即可。

namespace MyHashMS
{template<class K, class V, class KeyToInt = KeyToInt<K>>class unordered_map{// ...typedef HashTable<K, pair<K, V>, KeyToInt, MapKeyOfT> HashTable;typedef typename HashTable::iterator iterator;typedef typename HashTable::const_iterator const_iterator;public:V& operator[](const K& key){// ret拿到插入数据的返回值auto ret = insert({ key,V() });// 如果插入成功，返回value的引用    （ 插入功能 ）// 如果插入失败，也就是原本就存在，返回原本就在的kv的value的引用   （ 修改功能 ）return ret.first->second;}private:HashTable _ht;};}

四、实现不能修改key

数据是通过key来定位到哈希表中的，所以如果我们把key修改掉，会造成数据查找不到、数据完整性及安全性等问题，所以实现这个功能还是很有必要的。

在外层，用户通过只能迭代器拿到key，所以我们需要保证Ptr operator->()以及Ref operator*()返回的data中的key值为const，所以需要修改data存储的数据类型中的key为const，所以我们直接定位到哈希表的结点。

template<class T>
struct HashNode
{HashNode(const T& data):_data(data),_next(nullptr){}T _data; // 对于 unordered_map，T 是 pair<const K, V>；对于 unordered_set，T 是 const KHashNode* _next;
};

而T又是从外层的unordered_map和unordered_set传入：

如此我们便实现了不能修改key的功能。

实现及测试代码

下面是我本博客最终实现好的所有代码，供大家参考。

HashTable.h

#pragma once
#include <iostream>
#include <vector>
#include <string>
using namespace std;static const int __stl_num_primes = 28;
static const unsigned long __stl_prime_list[__stl_num_primes] =
{53,         97,         193,       389,       769,1543,       3079,       6151,      12289,     24593,49157,      98317,      196613,    393241,    786433,1572869,    3145739,    6291469,   12582917,  25165843,50331653,   100663319,  201326611, 402653189, 805306457,1610612741, 3221225473, 4294967291
};inline unsigned long __stl_next_prime(unsigned long n)
{const unsigned long* first = __stl_prime_list;const unsigned long* last = __stl_prime_list + __stl_num_primes;const unsigned long* pos = lower_bound(first, last, n);return pos == last ? *(last - 1) : *pos;
}template<class K>
struct KeyToInt
{size_t operator()(const K& key){return (size_t)key;}
};template<>
struct KeyToInt<string>
{size_t operator()(const string& key){size_t ret = 0;for (auto& e : key){ret += e;ret *= 131;}return ret;}
};// 链地址法-哈希表-兼容版
namespace MyHashMS
{// 哈希表的桶结点//          pair或keytemplate<class T>struct HashNode{HashNode(const T& data):_data(data),_next(nullptr){}T _data; // 对于 unordered_map，T 是 pair<const K, V>；对于 unordered_set，T 是 const KHashNode* _next;};// 编译器为了提升效率，查找只会向上，迭代器要用到HashTable，故需要前置声明template<class K, class T, class KeyToInt, class KeyOfT>class HashTable;// 迭代器template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >struct HashTable_Iterator{typedef HashNode<T> Node;typedef HashTable<K, T, KeyToInt, KeyOfT> HashTable;typedef HashTable_Iterator<K, T, Ref, Ptr, KeyToInt, KeyOfT> Self;HashTable_Iterator(Node* cur, HashTable* ht):_cur(cur),_ht(ht){}// 迭代器存储当前结点指针和其对应的哈希表指针（其实_hashtable也可以）Node* _cur;HashTable* _ht;// 前置++Self& operator++(){	if (!_cur) return *this;KeyToInt kti;KeyOfT kot;const Node* old = _cur;_cur = _cur->_next;if (!_cur){// cur为nullptr，说明当前桶的结点已经都走过了，去寻找下一个桶的第一个结点size_t hashi = kti(kot(old->_data)) % _ht->_hashtable.size() + 1;for (; hashi < _ht->_hashtable.size(); ++hashi){if (_ht->_hashtable[hashi]){_cur = _ht->_hashtable[hashi];break;}}}return *this;}Ref operator*(){return _cur->_data;}Ptr operator->(){return &(operator*());}bool operator==(const Self& other){return _cur == other._cur && _ht == other._ht;}bool operator!=(const Self& other){return !(*this == other);}};//        key    pair或key    KeyToInt由上层传，这样修改转整数的逻辑比较方便//                                KeyOfT也是从上层传，set传直接返回key的，map传pair<>.first的template<class K, class T, class KeyToInt, class KeyOfT>class HashTable{// 结构体的友元声明需要带着模板参数// 这里友元声明是因为HashTable_Iterator里需要用HashTable的指针去访问其私有成员template<class K, class T, class Ref, class Ptr, class KeyToInt, class KeyOfT >friend struct HashTable_Iterator;typedef HashNode<T> Node;public:typedef HashTable_Iterator<K, T, T&, T*, KeyToInt, KeyOfT> iterator;typedef HashTable_Iterator<K, T, const T&, const T*, KeyToInt, KeyOfT> const_iterator;public:HashTable():_hashtable(__stl_next_prime(0)),_n(0){}iterator begin(){for (auto& e : _hashtable){if (e)return { e,this };}return	end();}iterator end(){return { nullptr, this };}const_iterator begin() const {for (auto& e : _hashtable){if (e)return { e,this };}return	end();}const_iterator end() const{return { nullptr, this };}// 清除数据，释放结点，便于析构和赋值运算符重载复用void clear(){// 走个循环就好for (size_t i = 0; i < _hashtable.size(); ++i){// 查找每个桶有无需要释放的结点，有则释放，没有就看下一个桶Node* cur = _hashtable[i];while (cur){Node* next = cur->_next;delete cur;cur = next;}_hashtable[i] = nullptr;}}// 有new申请的结点，这些结点是自定义类型，需要手动写析构函数~HashTable(){clear();}// 拷贝构造HashTable(const HashTable& other):_hashtable(other._hashtable.size()), _n(0){// 初始化列表是先走的，这里用_hashtable的size不用担心 for (size_t i = 0; i < _hashtable.size(); ++i){Node* cur = other._hashtable[i];while (cur){Insert(cur->_data);cur = cur->_next;}}}// 赋值运算符重载HashTable& operator=(const HashTable& other){// 自我赋值检测if(other != *this){// 先清除原来的数据clear();// 调整大小_hashtable.resize(other._hashtable.size());_n = 0;// 复制数据for (size_t i = 0; i < _hashtable.size(); ++i){Node* cur = other._hashtable[i];while (cur){Insert(cur->_data);cur = cur->_next;}}}return *this;}pair<iterator, bool> Insert(const T& data){KeyToInt kti;KeyOfT kot;// 不允许重复auto ret = Find(kot(data));if (ret != end())return { ret, false };// 扩容if (_n * 10 / _hashtable.size() >= 10){HashTable newHashTable;newHashTable._hashtable.resize(__stl_next_prime(_hashtable.size() + 1));for (size_t i = 0; i < _hashtable.size(); ++i){Node* cur = _hashtable[i];while (cur){size_t hashi = kti(kot(cur->_data)) % newHashTable._hashtable.size();newHashTable.Insert(cur->_data);cur = cur->_next;}}_hashtable.swap(newHashTable._hashtable);}Node* newnode = new Node(data);size_t hashi = kti(kot(data)) % _hashtable.size();// 头插newnode->_next = _hashtable[hashi];_hashtable[hashi] = newnode;++_n;return { { newnode,this }, true };}iterator Find(const K& key){KeyToInt kti;KeyOfT kot;size_t hashi = kti(key) % _hashtable.size();Node* cur = _hashtable[hashi];while (cur && kot(cur->_data) != key){cur = cur->_next;}return { cur,this };}bool Erase(const K& key){KeyToInt kti;KeyOfT kot;size_t hashi = kti(key) % _hashtable.size();Node* cur = _hashtable[hashi];Node* prev = nullptr;while (cur){if (kot(cur->_data) == key){// 头删/不是头删prev == nullptr ? _hashtable[hashi] = cur->_next : prev->_next = cur->_next;delete cur;--_n;return true;}else{prev = cur;cur = cur->_next;}}return false;}private:vector<Node*> _hashtable;size_t _n;};
}

unordered_set.h

#pragma once
#include "HashTable.h"namespace MyHashMS
{template<class K, class KeyToInt = KeyToInt<K>>class unordered_set{struct SetKeyOfT{const K& operator()(const K& key){return key;}};typedef HashTable<K, const K, KeyToInt, SetKeyOfT> HashTable;typedef typename HashTable::iterator iterator;typedef typename HashTable::const_iterator const_iterator;public:iterator begin() { return _ht.begin(); }iterator end() { return _ht.end(); }const_iterator begin() const { return _ht.begin(); }const_iterator end() const { return _ht.end(); }pair<iterator, bool> insert(const K& key){return _ht.Insert(key);}iterator find(const K& key){return _ht.Find(key);}bool erase(const K& key){return _ht.Erase(key);}void print(){for (auto& e : _ht){cout << e << endl;}}private:// 关于KeyToInt和SetKeyOfT要不要传模板类型的问题// 模板参数是要传类型的，而前两者K，K都是类型，显然没问题// KeyToInt是unordered_set模板参数里定义好的类型，也没问题// SetKeyOfT是在unordered_set类里面定义的一个结构体，也可以直接作为类型HashTable _ht;// 当实例化 unordered_set<int> 时，K 就会被推导为 int，然后 SetKeyOfT 就会根据 K（即 int）来创建一个实例};}

unordered_map.h

	#pragma once#include "HashTable.h"namespace MyHashMS{template<class K, class V, class KeyToInt = KeyToInt<K>>class unordered_map{struct MapKeyOfT{const K& operator()(const pair<K, V>& kv){return kv.first;}};typedef HashTable<K, pair<const K, V>, KeyToInt, MapKeyOfT> HashTable;typedef typename HashTable::iterator iterator;typedef typename HashTable::const_iterator const_iterator;public:iterator begin() { return _ht.begin(); }iterator end() { return _ht.end(); }const_iterator begin() const { return _ht.begin(); }const_iterator end() const { return _ht.end(); }V& operator[](const K& key){// ret拿到插入数据的返回值auto ret = insert({ key,V() });// 如果插入成功，返回value的引用    （ 插入功能 ）// 如果插入失败，也就是原本就存在，返回原本就在的kv的value的引用   （ 修改功能 ）return ret.first->second;}pair<iterator, bool> insert(const pair<K, V>& kv){return _ht.Insert(kv);}iterator find(const K& key){return _ht.Find(key);}bool erase(const K& key){return _ht.Erase(key);}void print(){for (auto& e : _ht){cout << e.first << " " << e.second << endl;}}private:HashTable _ht;};}

test.cpp

#include "unordered_set.h"
#include "unordered_map.h"void umap_test()
{MyHashMS::unordered_map<string, string> umap;string s[] = { "abcd", "thatgirl", "hahaha", "ohye", "handsome", "apple", "oppo", "vivo","svip" };for (auto& e : s){umap.insert({ e, e });}// 修改和插入umap["thatgirl"] = "那个女孩";umap["sort"] = "算法";umap.print();cout << endl;if (umap.find("sort") != umap.end())cout << "find it!" << endl;cout << endl;umap.erase("sort");if (umap.find("sort") == umap.end())cout << "don't find!" << endl;cout << endl;umap.print();// 测试不能修改keyauto it = umap.find("apple");// it->first = "pear";it->second = "pear";}void uset_test()
{MyHashMS::unordered_set<string> uset;string s[] = { "abcd", "thatgirl", "hahaha", "ohye", "handsome", "apple", "oppo", "vivo","svip" };for (auto& e : s){uset.insert(e);}uset.print();cout << endl;if (uset.find("ohye") != uset.end())cout << "find!" << endl;uset.erase("ohye");cout << endl;if (uset.find("ohye") == uset.end())cout << "don't find!" << endl;cout << endl;uset.print();// 测试不能修改keyauto it = uset.find("apple");// *it = "pear";}// main函数不能放到命名空间之中，不然编译器找不到程序的入口
int main()
{umap_test();// uset_test();//hash<int> int_hasher;//hash<string> s_hasher;//cout << int_hasher(123) << endl;//cout << s_hasher("acsjak") << endl;return 0;
}

完结撒花啦~

(๑•̀ㅂ•́)و✧