<C++> Emulating the STL unordered containers with a hash table

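Controlling the hash table's template parameters

First, note that unordered_set is a key-only (K model) container, while unordered_map is a key-value (KV model) container.

To wrap both the K model and the KV model around a single hash table implementation, we have to control the hash table's template parameters.

To distinguish it from the original hash table's template parameters, the second template parameter is renamed T here.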

template<class K, class T>
class HashTable

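If the upper layer is an unordered_set, the template arguments passed to the hash table are the key and the key again.

template<class K>
class unordered_set {
public:
    //...
private:
    HashTable<K, K> _ht; // the underlying hash table receives K and K
};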

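If the upper layer is an unordered_map, the template arguments passed to the hash table are the key and the key-value pair built from the key and the value.

template<class K, class V>
class unordered_map {
public:
    //...
private:
    HashTable<K, pair<K, V>> _ht; // the underlying hash table receives K and the pair built from K and V
};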

In other words, the actual type of the hash table's template parameter T depends entirely on which container the upper layer uses.

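The hash node's template parameters should likewise change from K and V to T:

When the upper-layer container is unordered_set, T is a key, so each hash node stores a key.
When the upper-layer container is unordered_map, T is a key-value pair, so each hash node stores a key-value pair.

With the new template parameter, the hash node is defined as follows:

template<class T>
struct HashNode {
    HashNode<T> *_next;
    T _data;

    // Initializers are listed in declaration order
    HashNode(const T &data) : _next(nullptr), _data(data) {}
};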

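During hash mapping we need the element's key, which is then fed to the hash function to compute the bucket address.

The hash node now stores data of type T, and T may be a bare key or a key-value pair. The underlying hash table has no way of knowing which one it is, so the upper-layer container must supply a functor that extracts the key from a value of type T.

unordered_map therefore provides the underlying hash table with a functor that returns the key of a key-value pair.

template<class K, class V>
class unordered_map {
public:
    struct MapKeyOft {
        // The parameter type matches the stored pair<const K, V>,
        // so no temporary pair is created on each call
        const K &operator()(const pair<const K, V> &kv) { return kv.first; }
    };

private:
    HashTable<K, pair<const K, V>, MapKeyOft> _ht;
};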

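For unordered_set the T passed to the hash table is already the key. The underlying hash table does not know which container sits above it, though, and always obtains the key through the functor it was given, so unordered_set has to provide a functor as well.

template<class K>
class unordered_set {
public:
    struct SetKeyOfT {
        const K &operator()(const K &key) { return key; }
    };

private:
    HashTable<K, K, SetKeyOfT> _ht;
};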

The underlying hash table therefore needs one more template parameter to receive the functor supplied by the upper-layer container.

template<class K, class T, class KeyOfT>
class HashTable

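The problem that string keys cannot be taken modulo

After the analysis above, the hash table has one extra template parameter, and whether the upper-layer container is unordered_set or unordered_map, we can obtain an element's key through the functor it provides.

In everyday code, however, strings are very often used as keys. For example, when counting how many times each fruit appears with an unordered_map, the fruit names are the keys.

A string is not an integer, so it cannot be used directly to compute a hash address. It first has to be converted to an integer in some way, and only then can it be passed to the hash function.

Unfortunately, no conversion between strings and integers can be one-to-one. Integers in a computer have a finite size (the largest value a 32-bit unsigned integer can hold is 4294967295), while the number of strings that can be built from characters is unbounded.

So whatever method we use to convert a string to an integer, hash collisions will exist; the methods differ only in how likely a collision is.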

We therefore add one more functor to the hash table's template parameters, used to convert a key into an integer.

template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
class HashTable

If the upper layer does not pass this functor, the default one is used, which simply returns the key. Because strings are such common keys, we also write a specialization of the class template for string:

template<class K>
struct HashFunc {
    size_t operator()(const K &key) { return key; }
};

// Specialization: string keys take this path
template<>
struct HashFunc<string> {
    size_t operator()(const string &s) {
        size_t hash = 0;
        for (auto ch: s) {
            hash += ch;
            hash *= 31;
        }
        return hash;
    }
};

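Implementing the hash table's forward iterator

The hash table's forward iterator is essentially a wrapper around a hash node pointer. Because operator++ may have to search the table for the next non-empty bucket, every forward iterator also stores the address of the hash table.

Code: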

// Forward declaration
template<class K, class T, class KeyOft, class Hash = HashFunc<K>>
class HashTable;

template<class K, class T, class Ref, class Ptr, class KeyOft, class Hash>
struct HashIterator {
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOft, Hash> HT;
    // Ref and Ptr are either T&/T* or const T&/const T*
    typedef HashIterator<K, T, Ref, Ptr, KeyOft, Hash> Self;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;

    // The table is only read, so a pointer to const also works for const tables
    HashIterator(Node *node, const HT *ht) : _node(node), _ht(ht) {}

    // Builds a const iterator from a plain iterator. Self cannot be used for this:
    // when Self is already the const iterator, a constructor taking Self is just a copy
    HashIterator(const iterator &it) : _node(it._node), _ht(it._ht) {}

    Ref operator*() const { return _node->_data; }
    Ptr operator->() const { return &_node->_data; }
    bool operator!=(const Self &s) const { return _node != s._node; }
    bool operator==(const Self &s) const { return _node == s._node; }

    Self &operator++() {
        if (_node->_next != nullptr) {
            _node = _node->_next;
        } else {
            // Look for the next non-empty bucket
            KeyOft kot;
            Hash hash;
            // Index of the bucket the current node lives in
            size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size()) {
                if (_ht->_tables[hashi] != nullptr) {
                    _node = _ht->_tables[hashi];
                    break;
                }
                ++hashi;
            }
            // Nothing found: _node becomes null, which compares equal to end()
            if (hashi == _ht->_tables.size()) {
                _node = nullptr;
            }
        }
        return *this;
    }

    Node *_node;   // current node
    const HT *_ht; // hash table, used to locate the next bucket
};

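Note: the hash table's iterator is a forward iterator. There is no reverse iterator, that is, operator-- is not overloaded. If you want the hash table to support bidirectional traversal, consider replacing the singly linked list in each bucket with a doubly linked list.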

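With the forward iterator in place, the hash table implementation needs the following changes:

Typedef the forward iterator type. So that code outside the class can use the resulting iterator type, the typedef must be placed in the public section.
Because operator++ of the forward iterator accesses the hash table's member _tables when looking for the next node, and _tables is a private member, the iterator class must be declared a friend of the hash table class.
Change the lookup function so that, instead of a node pointer, it returns a forward iterator built from the node pointer and the hash table's address.
Change the return type of the insert function to a pair made of a forward iterator and a bool.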

The complete HashTable

#pragma once
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

template<class K>
struct HashFunc {
    size_t operator()(const K &key) { return key; }
};

// Specialization: string keys take this path
template<>
struct HashFunc<string> {
    size_t operator()(const string &s) {
        size_t hash = 0;
        for (auto ch: s) {
            hash += ch;
            hash *= 31;
        }
        return hash;
    }
};

template<class T>
struct HashNode {
    HashNode<T> *_next;
    T _data;

    HashNode(const T &data) : _next(nullptr), _data(data) {}
};

// Forward declaration
template<class K, class T, class KeyOft, class Hash = HashFunc<K>>
class HashTable;

template<class K, class T, class Ref, class Ptr, class KeyOft, class Hash>
struct HashIterator {
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOft, Hash> HT;
    // Ref and Ptr are either T&/T* or const T&/const T*
    typedef HashIterator<K, T, Ref, Ptr, KeyOft, Hash> Self;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;

    // The table is only read, so a pointer to const also works for const tables
    HashIterator(Node *node, const HT *ht) : _node(node), _ht(ht) {}

    // Builds a const iterator from a plain iterator. Self cannot be used for this:
    // when Self is already the const iterator, a constructor taking Self is just a copy
    HashIterator(const iterator &it) : _node(it._node), _ht(it._ht) {}

    Ref operator*() const { return _node->_data; }
    Ptr operator->() const { return &_node->_data; }
    bool operator!=(const Self &s) const { return _node != s._node; }
    bool operator==(const Self &s) const { return _node == s._node; }

    Self &operator++() {
        if (_node->_next != nullptr) {
            _node = _node->_next;
        } else {
            // Look for the next non-empty bucket
            KeyOft kot;
            Hash hash;
            // Index of the bucket the current node lives in
            size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
            ++hashi;
            while (hashi < _ht->_tables.size()) {
                if (_ht->_tables[hashi] != nullptr) {
                    _node = _ht->_tables[hashi];
                    break;
                }
                ++hashi;
            }
            // Nothing found: _node becomes null, which compares equal to end()
            if (hashi == _ht->_tables.size()) {
                _node = nullptr;
            }
        }
        return *this;
    }

    Node *_node;   // current node
    const HT *_ht; // hash table, used to locate the next bucket
};

// Hash converts a key into a type that can be taken modulo
template<class K, class T, class KeyOft, class Hash>
class HashTable {
public:
    typedef HashNode<T> Node;
    typedef HashIterator<K, T, T &, T *, KeyOft, Hash> iterator;
    typedef HashIterator<K, T, const T &, const T *, KeyOft, Hash> const_iterator;

    // Lets the iterator access the private member _tables
    template<class K1, class T1, class Ref1, class Ptr1, class KeyOft1, class Hash1>
    friend struct HashIterator;

    ~HashTable() {
        for (auto &cur: this->_tables) {
            while (cur) {
                Node *next = cur->_next;
                delete cur;
                cur = next;
            }
            cur = nullptr;
        }
    }

    iterator begin() {
        Node *cur = nullptr;
        for (size_t i = 0; i < _tables.size(); i++) {
            cur = _tables[i];
            if (cur != nullptr) {
                break;
            }
        }
        return iterator(cur, this);
    }

    iterator end() { return iterator(nullptr, this); }

    const_iterator begin() const {
        Node *cur = nullptr;
        for (size_t i = 0; i < _tables.size(); i++) {
            cur = _tables[i];
            if (cur != nullptr) {
                break;
            }
        }
        return const_iterator(cur, this);
    }

    const_iterator end() const { return const_iterator(nullptr, this); }

    // The key being looked up is of type K
    iterator Find(const K &key) {
        if (this->_tables.size() == 0) {
            return iterator(nullptr, this);
        }
        KeyOft kot; // supplied by the upper-layer map/set; extracts the key from T
        Hash hash;
        size_t hashi = hash(key) % this->_tables.size();
        Node *cur = this->_tables[hashi];
        while (cur) {
            if (kot(cur->_data) == key) {
                return iterator(cur, this);
            }
            cur = cur->_next;
        }
        return iterator(nullptr, this);
    }

    // The key being erased is of type K
    bool Erase(const K &key) {
        // An empty table has no buckets; taking the hash modulo 0 must be avoided
        if (this->_tables.size() == 0) {
            return false;
        }
        Hash hash;
        KeyOft kot;
        size_t hashi = hash(key) % this->_tables.size();
        Node *prev = nullptr;
        Node *cur = this->_tables[hashi];
        while (cur) {
            if (kot(cur->_data) == key) {
                if (prev == nullptr) {
                    this->_tables[hashi] = cur->_next;
                } else {
                    prev->_next = cur->_next;
                }
                delete cur;
                --this->n; // keep the element count in sync
                return true;
            } else {
                prev = cur;
                cur = cur->_next;
            }
        }
        return false;
    }

    // Growth optimization: bucket counts are taken from a table of primes
    size_t GetNextPrime(size_t prime) {
        // SGI
        static const size_t _stl_num_primes = 28;
        static const uint64_t _stl_prime_list[_stl_num_primes] = {
                53, 97, 193, 389, 769, 1543,
                3079, 6151, 12289, 24593, 49157, 98317,
                196613, 393241, 786433, 1572869, 3145739, 6291469,
                12582917, 25165843, 50331653, 100663319, 201326611, 402653189,
                805306457, 1610612741, 3221225473, 4294967291};
        for (size_t i = 0; i < _stl_num_primes; ++i) {
            if (_stl_prime_list[i] > prime)
                return _stl_prime_list[i];
        }
        return _stl_prime_list[_stl_num_primes - 1];
    }

    // The inserted value is of type T: either K or pair<K, V>, as chosen by the template argument
    pair<iterator, bool> Insert(const T &data) {
        Hash hash; // turns keys that cannot be taken modulo into integers
        KeyOft kot;
        // If the key is already present, do not insert:
        // return the iterator to the existing element and false
        iterator it = Find(kot(data));
        if (it != end()) {
            return make_pair(it, false);
        }
        // Grow when the load factor reaches 1
        if (this->n == this->_tables.size()) {
            // size_t newsize = _tables.size() == 0 ? 10 : _tables.size() * 2;
            size_t newsize = this->GetNextPrime(_tables.size());
            vector<Node *> newtables(newsize, nullptr);
            for (auto &cur: this->_tables) {
                // cur is a Node*
                while (cur) {
                    // Remember the next node
                    Node *next = cur->_next;
                    // Push to the front of the bucket in the new table
                    size_t hashi = hash(kot(cur->_data)) % newtables.size();
                    cur->_next = newtables[hashi];
                    newtables[hashi] = cur;
                    cur = next;
                }
            }
            _tables.swap(newtables);
        }
        size_t hashi = hash(kot(data)) % this->_tables.size();
        // Push to the front of the bucket
        Node *newnode = new Node(data);
        newnode->_next = _tables[hashi];
        _tables[hashi] = newnode;
        this->n++;
        // Success: return an iterator built from newnode and this, plus true
        return make_pair(iterator(newnode, this), true);
    }

    // Prints the length of every bucket and returns the length of the longest one
    size_t MaxBucketSize() const {
        size_t max = 0;
        for (size_t i = 0; i < _tables.size(); ++i) {
            Node *cur = _tables[i];
            size_t size = 0;
            while (cur) {
                ++size;
                cur = cur->_next;
            }
            printf("[%zu]->%zu\n", i, size);
            if (size > max) {
                max = size;
            }
        }
        return max;
    }

private:
    vector<Node *> _tables;
    size_t n = 0; // number of stored elements
};

Code wrapping unordered_set

#pragma once
#include "HashTable.h"

template<class K, class Hash = HashFunc<K>>
class unordered_set {
public:
    struct SetKeyOfT {
        const K &operator()(const K &key) { return key; }
    };

public:
    // Keys must not be modified through an iterator,
    // so both iterator types are the table's const iterator
    typedef typename HashTable<K, K, SetKeyOfT, Hash>::const_iterator iterator;
    typedef typename HashTable<K, K, SetKeyOfT, Hash>::const_iterator const_iterator;

    iterator begin() { return _ht.begin(); }
    iterator end() { return _ht.end(); }
    const_iterator begin() const { return _ht.begin(); }
    const_iterator end() const { return _ht.end(); }

    // The iterator in pair<iterator, bool> is the const iterator, while Insert returns a
    // plain iterator; the iterator's converting constructor bridges the two
    pair<iterator, bool> insert(const K &key) { return _ht.Insert(key); }
    iterator find(const K &key) { return _ht.Find(key); }
    bool erase(const K &key) { return _ht.Erase(key); }

private:
    HashTable<K, K, SetKeyOfT, Hash> _ht;
};

Code wrapping unordered_map

#pragma once
#include "HashTable.h"

template<class K, class V, class Hash = HashFunc<K>>
class unordered_map {
public:
    struct MapKeyOft {
        // The parameter type matches the stored pair<const K, V>,
        // so no temporary pair is created on each call
        const K &operator()(const pair<const K, V> &kv) { return kv.first; }
    };

    // typename tells the compiler that the dependent name is a type, not a member
    typedef typename HashTable<K, pair<const K, V>, MapKeyOft, Hash>::iterator iterator;
    typedef typename HashTable<K, pair<const K, V>, MapKeyOft, Hash>::const_iterator const_iterator;

    iterator begin() { return _ht.begin(); }
    iterator end() { return _ht.end(); }
    const_iterator begin() const { return _ht.begin(); }
    const_iterator end() const { return _ht.end(); }

    pair<iterator, bool> insert(const pair<K, V> &kv) { return _ht.Insert(kv); }

    // Inserts a default-constructed value when the key is absent,
    // then returns a reference to the value stored under the key
    V &operator[](const K &key) {
        pair<iterator, bool> ret = insert(make_pair(key, V()));
        return ret.first->second;
    }

    iterator find(const K &key) { return _ht.Find(key); }
    bool erase(const K &key) { return _ht.Erase(key); }

private:
    HashTable<K, pair<const K, V>, MapKeyOft, Hash> _ht;
};

Testing

#include "unordered_map.h"
#include "unordered_set.h"
#include <iostream>
#include <string>

class Date {
    friend struct HashDate;

public:
    Date(int year = 1900, int month = 1, int day = 1) : _year(year), _month(month), _day(day) {}

    bool operator<(const Date &d) const {
        return (_year < d._year) ||
               (_year == d._year && _month < d._month) ||
               (_year == d._year && _month == d._month && _day < d._day);
    }

    bool operator>(const Date &d) const {
        return (_year > d._year) ||
               (_year == d._year && _month > d._month) ||
               (_year == d._year && _month == d._month && _day > d._day);
    }

    bool operator==(const Date &d) const {
        return _year == d._year && _month == d._month && _day == d._day;
    }

    friend ostream &operator<<(ostream &_cout, const Date &d);

private:
    int _year;
    int _month;
    int _day;
};

ostream &operator<<(ostream &_cout, const Date &d) {
    _cout << d._year << "-" << d._month << "-" << d._day;
    return _cout;
}

// Custom hash, passed as the last template argument.
// For a user-defined key type you have to write it yourself.
struct HashDate {
    size_t operator()(const Date &d) {
        size_t hash = 0;
        hash += d._year;
        hash *= 31;
        hash += d._month;
        hash *= 31;
        hash += d._day;
        hash *= 31;
        return hash;
    }
};

struct unordered_map_Test {
    static void unordered_map_Test1() {
        unordered_map<int, int> mp;
        mp.insert(make_pair(1, 1));
        mp.insert(make_pair(2, 2));
        mp.insert(make_pair(3, 3));
        unordered_map<int, int>::iterator it = mp.begin();
        while (it != mp.end()) {
            cout << it->first << " " << it->second << endl;
            ++it;
        }
        cout << endl;
    }

    static void unordered_map_Test2() {
        string arr[] = {"watermelon", "watermelon", "apple", "watermelon", "apple", "apple",
                        "watermelon", "apple", "banana", "apple", "banana", "pear"};
        unordered_map<string, int> countMap;
        for (auto &e: arr) {
            countMap[e]++;
        }
        for (auto &kv: countMap) {
            cout << kv.first << ":" << kv.second << endl;
        }
    }

    static void unordered_map_Test3() {
        Date d1(2023, 3, 13);
        Date d2(2023, 3, 13);
        Date d3(2023, 3, 12);
        Date d4(2023, 3, 11);
        Date d5(2023, 3, 12);
        Date d6(2023, 3, 13);
        Date a[] = {d1, d2, d3, d4, d5, d6};
        unordered_map<Date, int, HashDate> countMap;
        for (auto e: a) {
            countMap[e]++;
        }
        for (auto &kv: countMap) {
            cout << kv.first << ":" << kv.second << endl;
        }
    }
};

struct unordered_set_Test {
    static void unordered_set_Test1() {
        unordered_set<int> s;
        s.insert(1);
        s.insert(3);
        s.insert(2);
        s.insert(7);
        s.insert(8);
        unordered_set<int>::iterator it = s.begin();
        while (it != s.end()) {
            cout << *it << " ";
            //(*it) = 1; // would not compile: the set's iterator is a const iterator
            ++it;
        }
        cout << endl;
    }
};

int main() {
    unordered_set_Test::unordered_set_Test1();
    unordered_map_Test::unordered_map_Test1();
    unordered_map_Test::unordered_map_Test2();
    unordered_map_Test::unordered_map_Test3();
    return 0;
}