哈希(Hash)

文章目录

- 1.stl提供的哈希容器
- - 1.1unordered_map和unordered_set容器
  - - 1.1.1定义
    - 1.1.2这些容器和map和set的区别在于**：
    - 1.1.3物理结构
- 2 哈希（实现哈希）
- - 2.1本质：哈希表
  - 2.2底层原理：哈希概念
  - - 2.2.1哈希方法
    - 2.2.2 哈希函数（映射方法，解决数据分散，空间过大问题）
    - 2.2.3 哈希冲突（映射出现多个数据站一个坑问题）
    - 2.2.4 哈希冲突解决
    - - 2.2.4.1 闭散列
      - 闭散列哈希的实现(包含增删查找）
      - 2.2.4.2 开散列（哈希桶/拉链法）
    - 2.2.5思考
    - - 1. 负载因子（解决哈希表空间不足问题，本质扩容）
      - 2.解决哈希关键字是负数的映射问题
      - 3.对于关键字是string类型的哈希如何进行哈希函数
- 3.unordered_map和unordered_set 的模拟实现

1.stl提供的哈希容器

在c++98中，STL提供了底层为红黑树结构的一系列关联式容器，在查询时效率可达到log2(N),这个容器在在数据过大时，查询时间过高；于是stl提供了4个unordered系列关联式容器(底层使用哈希），unordered系列关联式容器包含4个包括：unordered_map和unordered_set，unordered_multimap和unordered_multiset
这里我们只讲：unordered_map和unordered_set，

1.1unordered_map和unordered_set容器

1.1.1定义

unordered_map的文档
unordered_set在线文档说明
在这里插入图片描述

1.1.2这些容器和map和set的区别在于**：

1.迭代器只可以向后遍历（++），不可向前遍历（- )；
2.该容器是无序的；不会像map和set升序排序；他只会按插入顺序排序；
3.运行速度高于map和set的速度；
在这里插入图片描述
rand()是重复较多的随机数；
rand()+1是重复较少的随机数；
1是没有重发的，有序数；
这里得出结论：unordered较快；除了“1”时map和set比unordered快；

1.1.3物理结构

在这里插入图片描述
这里的二维结构：第一层是每个元素的key 第二层是每个元素的value

2 哈希（实现哈希）

2.1本质：哈希表

哈希表：哈希表就是在关键字和存储位置之间建立对应关系，使得元素的查找可以以O(1)的效率进行，其中关键字和存储位置之间是通过散列函数建立关系；
所以哈希表的本质：数组
举个例子：
使用哈希表来查找单词中出现一次的字母；
在这里插入图片描述

2.2底层原理：哈希概念

哈希表内：有哈希方法

2.2.1哈希方法

插入元素
根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放
搜索元素
对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功

2.2.2 哈希函数（映射方法，解决数据分散，空间过大问题）

引起哈希冲突的一个原因可能是：哈希函数设计不够合理。
哈希函数设计原则：
哈希函数的定义域必须包括需要存储的全部关键码，而如果散列表允许有m个地址时，其值
域必须在0到m-1之间
哈希函数计算出来的地址能均匀分布在整个空间中
哈希函数应该比较简单
解决方法

直接定址法–(常用)
取关键字的某个线性函数为散列地址：Hash（Key）= Key
除留余数法–(常用)
设散列表中允许的地址数为m，取一个不大于m，但最接近或者等于m的质数p作为除数，（p是数组大小）
按照哈希函数：Hash(key) = key% p(p<=m),将关键码转换成哈希地址
除留余数法例子

例如：数据集合{1，7，6，4，5，9}；
哈希函数设置为：hash(key) = key % 容器名。size(); **容器名。size()**为存储元素底层空间总的大小。
在这里插入图片描述

2.2.3 哈希冲突（映射出现多个数据站一个坑问题）

对于两个数据元素的关键字 $k_i$ 和 $k_j$ (i != j)，有 $k_i$ != $k_j$ ，但有：Hash( $k_i$ ) ==
Hash( $k_j$ )，即：不同关键字通过相同哈希函数计算出相同的哈希地址，该种现象称为哈希冲突
或哈希碰撞。
在这里插入图片描述

发生哈希冲突该如何处理呢？
使用闭散列和开散列，下下个知识点；

2.2.4 哈希冲突解决

解决哈希冲突两种常见的方法是：闭散列和开散列
下面是两种方法实现的哈希

2.2.4.1 闭散列

闭散列：也叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。那如何寻找下一个空位置呢？
在这里插入图片描述
闭散列实现哈希（这里使用线性探测举例）
原理：

使用除留余数法出现哈希冲突时，在原哈希函数计算的位置向后找空位置，知道找到空位置；

体现：
这里线性探测体现在插入查找删除这几个哈希方法上

闭散列哈希的实现(包含增删查找）

#pragma once
#define _CRT_SECURE_NO_WARNINGS 1
#include<utility>
#include<vector>
#include<string>
namespace bit
{enum Status{EMPTY,EXIST,DELETE};template<class K, class V>struct HashData{pair<K, V> _kv;Status _s;          //状态};template<class k>struct HashFunc{size_t operator()(const k& key){return (size_t)key;}};//template<>//struct HashFunc<string>//{//	size_t operator()(const string& key)//	{//		size_t hash = 0;//		for (auto e :key)//		{//			hash *= 31;//			hash += e;//		}//		cout << key << ":" << hash << endl;//		return hash;//	}template<>struct HashFunc<string>{size_t operator()(const string& key){// BKDRsize_t hash = 0;for (auto e : key){hash *= 31;hash += e;}cout << key << ":" << hash << endl;return hash;}};template<class K, class V ,class Hash=HashFunc<K>>class HashTable{public:HashTable(){_tables.resize(10);}bool Insert(const pair<K, V>& kv){if (_n * 10 / _tables.size() == 7){size_t newSize = _tables.size() * 2;HashTable<K, V> newHT;newHT._tables.resize(newSize);for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]._s == EXIST){newHT.Insert(_tables[i]._kv);}}_tables.swap(newHT._tables);}Hash hf;size_t hashi = hf(kv.first) % _tables.size();while (_tables[hashi]._s == EXIST){hashi++;hashi %= _tables.size();}_tables[hashi]._kv = kv;_tables[hashi]._s = EXIST;++_n;return true;}HashData<K, V>* Find(const K& key){Hash hf;size_t hashi = hf(key) % _tables.size();while (_tables[hashi]._s != EMPTY){//下面如果不检查_tables的s是否是EXIST，那么后面的删除操作会将已经删除的元素当成hi存在if (_tables[hashi]._s == EXIST && _tables[hashi]._kv.first == key){return &_tables[hashi];}hashi++;hashi %= _tables.size();}return NULL;}bool Erase(const K& key){HashData<K, V>* ret = Find(key);if (ret){ret->_s = DELETE;--_n;return true;}elsereturn false;}void Print(){for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]._s == EXIST){cout << "[" << i << "]->" << _tables[i]._kv.first << ":" << _tables[i]._kv.second << endl;}else if (_tables[i]._s == EMPTY){printf("[%d]->\n", i);}else{printf("[%d]->D\n", i);}}cout << endl;}private:vector < HashData < K, V >> _tables;size_t _n = 0; // 存储的关键字的个数};void TestHT1(){HashTable<int, int> ht;int a[] = { 4,14,24,34,5,7,1 };for (auto e : a){ht.Insert(make_pair(e, e));}ht.Insert(make_pair(3, 3));ht.Insert(make_pair(3, 3));ht.Insert(make_pair(-3, -3));ht.Print();ht.Erase(3);ht.Print();if (ht.Find(3)){cout << "3存在" << endl;}else{cout << "3不存在" << endl;}ht.Insert(make_pair(3, 3));ht.Insert(make_pair(23, 3));ht.Print();}}

2.2.4.2 开散列（哈希桶/拉链法）

在这里插入图片描述
将使用哈希函数结果相同的元素用链表链接；
当元素过多时，使用二叉树来存储元素；代替链表；
代码实现：

#pragma once
#define _CRT_SECURE_NO_WARNINGS 1
#include<utility>
#include<vector>
#include<string>
using namespace std;template<class k>
struct HashFunc
{size_t operator()(const k& key){return (size_t)key;}
};
//template<>
//struct HashFunc<string>
//{
//	size_t operator()(const string& key)
//	{
//		size_t hash = 0;
//		for (auto e :key)
//		{
//			hash *= 31;
//			hash += e;
//		}
//		cout << key << ":" << hash << endl;
//		return hash;
//	}
template<>
struct HashFunc<string>
{size_t operator()(const string& key){// BKDRsize_t hash = 0;for (auto e : key){hash *= 31;hash += e;}cout << key << ":" << hash << endl;return hash;}
};
namespace hash_bucket
{template<class T>struct HashNode{HashNode* _next;T _data;HashNode(const T&data):_data(data),_next(nullptr){}};template<class K, class T, class KeyOfT, class Hash>class HashTable;template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>struct __HTIterator{typedef HashNode<T> Node;typedef __HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;Node* _node;const HashTable<K, T, KeyOfT, Hash>* _pht;// vector<Node*> * _ptb;size_t _hashi;__HTIterator(Node* node, HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi):_node(node), _pht(pht), _hashi(hashi){}__HTIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi):_node(node), _pht(pht), _hashi(hashi){}Self& operator++(){if (_node->_next){// 当前桶还有节点，走到下一个节点_node = _node->_next;}else{// 当前桶已经走完了，找下一个桶开始//KeyOfT kot;//Hash hf;//size_t hashi = hf(kot(_node->_data)) % _pht._tables.size();++_hashi;while (_hashi < _pht->_tables.size()){if (_pht->_tables[_hashi]){_node = _pht->_tables[_hashi];break;}++_hashi;}if (_hashi == _pht->_tables.size()){_node = nullptr;}}return *this;}Ref operator*(){return _node->_data;}Ptr operator->(){return &_node->_data;}bool operator!=(const Self& s){return _node != s._node;}};template<class K,class T,class KeyOfT ,class Hash>class HashTable{template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>friend struct __HTIterator;typedef HashNode<T> Node;public:/*typedef _HTIterator<K, T, T&, T*, KeyOfT, Hash> iterator;*///类型typdef的位值决定了这个类型是否可以在类外使用用来定义；public类型可以在类外定义，private只可以在类内使用typedef __HTIterator<K, T, T&, T*, KeyOfT, Hash> iterator;typedef __HTIterator<K, T, const T&, const T*, KeyOfT, Hash> const_iterator;iterator begin(){for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]){return iterator(_tables[i], this, i);}}return end();}iterator end(){return iterator(nullptr, this, -1);}const_iterator begin() const {for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]){return const_iterator(_tables[i], this, i);}}return end();}const_iterator end() const{return const_iterator(nullptr, this, -1);}HashTable(){_tables.resize(10);}~HashTable(){for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){Node* next = cur->_next;delete cur;cur = next;}_tables[i] = nullptr;}}pair<iterator,bool> Insert(const T& data){Hash hf;KeyOfT kot;iterator it = Find(kot(data));if (it!=end()){return make_pair(it,false);}if (_n == _tables.size()){   //这种输入对内存消耗大，需要重新定义链表				size_t newSize = _tables.size() * 2;/*HashTable<K, V>newHT;newHT._tables.resize(newSize);for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){newHT.Insert(cur->_kv);cur = cur->_next;}}_tables.swap(newHT._tables);*/vector<Node*>newTables;newTables.resize(_tables.size() * 2, nullptr);for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){Node* next = cur->_next;size_t hashi = hf(kot(cur->_data)) % newTables.size();cur->_next = newTables[hashi];newTables[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newTables);}size_t hashi = hf(kot(data)) % _tables.size();Node* newnode = new Node(data);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_n;return make_pair(iterator(newnode,this,hashi),true);}iterator Find(const K& key){Hash hf;KeyOfT  kot;size_t hashi = hf(key) % _tables.size();Node * cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){return iterator(cur,this,hashi);}cur = cur->_next;}return end();}bool Erase(const K& key){Hash hf;size_t hashi = hf(key) % _tables.size();Node* prev = nullptr;Node* cur = _tables[hashi];KeyOfT kot;while (cur){if (kot(cur->_data) == key){if (prev == nullptr)_tables[hashi] = cur->_next;elseprev->_next = cur->_next;delete cur;return true;}prev = cur;cur = cur->_next;}return false;}private:vector<Node*>_tables;size_t _n = 0;};}

2.2.5思考

1. 负载因子（解决哈希表空间不足问题，本质扩容）

负载因子：存储数据的个数/表的大小
使用在上面的插入中；

2.解决哈希关键字是负数的映射问题

这里不需要做出改变的原因：
除留余数法–(常用)
Hash(key) = key% p(p<=m),
这里的p是一个无符号整型；所以hash会强制转化为正数；

3.对于关键字是string类型的哈希如何进行哈希函数

这种哈希无法使用除留余数法
hash(key) = key % **容器名。size()

3.unordered_map和unordered_set 的模拟实现

模板类：hafuma.h

#pragma once
#define _CRT_SECURE_NO_WARNINGS 1
#include<utility>
#include<vector>
#include<string>
using namespace std;template<class k>
struct HashFunc
{size_t operator()(const k& key){return (size_t)key;}
};
//template<>
//struct HashFunc<string>
//{
//	size_t operator()(const string& key)
//	{
//		size_t hash = 0;
//		for (auto e :key)
//		{
//			hash *= 31;
//			hash += e;
//		}
//		cout << key << ":" << hash << endl;
//		return hash;
//	}
template<>
struct HashFunc<string>
{size_t operator()(const string& key){// BKDRsize_t hash = 0;for (auto e : key){hash *= 31;hash += e;}cout << key << ":" << hash << endl;return hash;}
};
namespace hash_bucket
{template<class T>struct HashNode{HashNode* _next;T _data;HashNode(const T&data):_data(data),_next(nullptr){}};template<class K, class T, class KeyOfT, class Hash>class HashTable;template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>struct __HTIterator{typedef HashNode<T> Node;typedef __HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;Node* _node;const HashTable<K, T, KeyOfT, Hash>* _pht;// vector<Node*> * _ptb;size_t _hashi;__HTIterator(Node* node, HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi):_node(node), _pht(pht), _hashi(hashi){}__HTIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi):_node(node), _pht(pht), _hashi(hashi){}Self& operator++(){if (_node->_next){// 当前桶还有节点，走到下一个节点_node = _node->_next;}else{// 当前桶已经走完了，找下一个桶开始//KeyOfT kot;//Hash hf;//size_t hashi = hf(kot(_node->_data)) % _pht._tables.size();++_hashi;while (_hashi < _pht->_tables.size()){if (_pht->_tables[_hashi]){_node = _pht->_tables[_hashi];break;}++_hashi;}if (_hashi == _pht->_tables.size()){_node = nullptr;}}return *this;}Ref operator*(){return _node->_data;}Ptr operator->(){return &_node->_data;}bool operator!=(const Self& s){return _node != s._node;}};template<class K,class T,class KeyOfT ,class Hash>class HashTable{template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>friend struct __HTIterator;typedef HashNode<T> Node;public:/*typedef _HTIterator<K, T, T&, T*, KeyOfT, Hash> iterator;*///类型typdef的位值决定了这个类型是否可以在类外使用用来定义；public类型可以在类外定义，private只可以在类内使用typedef __HTIterator<K, T, T&, T*, KeyOfT, Hash> iterator;typedef __HTIterator<K, T, const T&, const T*, KeyOfT, Hash> const_iterator;iterator begin(){for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]){return iterator(_tables[i], this, i);}}return end();}iterator end(){return iterator(nullptr, this, -1);}const_iterator begin() const {for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]){return const_iterator(_tables[i], this, i);}}return end();}const_iterator end() const{return const_iterator(nullptr, this, -1);}HashTable(){_tables.resize(10);}~HashTable(){for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){Node* next = cur->_next;delete cur;cur = next;}_tables[i] = nullptr;}}pair<iterator,bool> Insert(const T& data){Hash hf;KeyOfT kot;iterator it = Find(kot(data));if (it!=end()){return make_pair(it,false);}if (_n == _tables.size()){   //这种输入对内存消耗大，需要重新定义链表				size_t newSize = _tables.size() * 2;/*HashTable<K, V>newHT;newHT._tables.resize(newSize);for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){newHT.Insert(cur->_kv);cur = cur->_next;}}_tables.swap(newHT._tables);*/vector<Node*>newTables;newTables.resize(_tables.size() * 2, nullptr);for (size_t i = 0; i < _tables.size(); i++){Node* cur = _tables[i];while (cur){Node* next = cur->_next;size_t hashi = hf(kot(cur->_data)) % newTables.size();cur->_next = newTables[hashi];newTables[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newTables);}size_t hashi = hf(kot(data)) % _tables.size();Node* newnode = new Node(data);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_n;return make_pair(iterator(newnode,this,hashi),true);}iterator Find(const K& key){Hash hf;KeyOfT  kot;size_t hashi = hf(key) % _tables.size();Node * cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){return iterator(cur,this,hashi);}cur = cur->_next;}return end();}bool Erase(const K& key){Hash hf;size_t hashi = hf(key) % _tables.size();Node* prev = nullptr;Node* cur = _tables[hashi];KeyOfT kot;while (cur){if (kot(cur->_data) == key){if (prev == nullptr)_tables[hashi] = cur->_next;elseprev->_next = cur->_next;delete cur;return true;}prev = cur;cur = cur->_next;}return false;}private:vector<Node*>_tables;size_t _n = 0;};}

unordered_map.h

#pragma once
#include"hafuma.h"
namespace bit
{template<class K,class V, class Hash = HashFunc<K>>class unordered_map{struct MapKeyOfT{const K& operator() (const pair<K,V>& key){return key.first;}};public:typedef typename hash_bucket::HashTable<K, pair< const K, V>, MapKeyOfT, Hash>::iterator iterator;iterator begin(){return _ht.begin();}iterator end(){return _ht.end();}pair<iterator,bool> insert(const pair<K,V>& key){return _ht.Insert(key);}iterator find(const K& key){return _ht.Find(key);}bool erase(const K& key){return _ht.Erase(key);}V& operator[](const K& key){pair<iterator, bool>  ret = _ht.Insert(make_pair(key, V()));return ret.first->second;}const V& operator[](const K& key)const{pair<iterator, bool>  ret = _ht.Insert(make_pair(key, V()));return ret.first->second;}private:hash_bucket::HashTable<K, pair<const K,V>, MapKeyOfT,Hash>_ht;};void test_map(){unordered_map<string, string> dict;dict.insert(make_pair("sort", ""));dict.insert(make_pair("string", "ַ"));dict.insert(make_pair("insert", ""));for (auto& kv : dict){//kv.first += 'x';kv.second += 'x';cout << kv.first << ":" << kv.second << endl;}cout << endl;string arr[] = { "㽶", "","ƻ", "", "ƻ", "", "ƻ", "ƻ", "", "ƻ", "㽶", "ƻ", "㽶" };unordered_map<string, int> count_map;for (auto& e : arr){count_map[e]++;}for (auto& kv : count_map){cout << kv.first << ":" << kv.second << endl;}cout << endl;}}

unordered_set.h

#pragma once
#include"hafuma.h"
namespace bit
{template<class K,class Hash=HashFunc<K>>class unordered_set{struct SetKeyOfT{const K& operator() (const K& key){return key;}};public :public : typedef typename hash_bucket::HashTable<K, K, SetKeyOfT, Hash>::const_iterator iterator;typedef typename hash_bucket::HashTable<K, K, SetKeyOfT, Hash>::const_iterator const_iterator;pair<const_iterator,bool> insert(const K& key){auto ret = _ht.Insert(key);return pair<const_iterator, bool>(const_iterator(ret.first._node, ret.first._pht, ret.first._hashi), ret.second);}const_iterator begin() const{return _ht.begin();}const_iterator end() const{return _ht.end();}iterator find(const K& key){return _ht.Find(key);}bool erase(const K& key){return _ht.Erase(key);}private:hash_bucket::HashTable<K, K, SetKeyOfT ,Hash>_ht;};void test_set(){unordered_set<int> us;us.insert(5);us.insert(15);}
}