W5 Lists, Stacks, Queues, Priority Queues
To explore the relationship between interfaces and classes in the Java Collections Framework hierarchy.
To use the common methods defined in the Collection interface for operating collections.
To use the Iterator interface to traverse the elements in a collection.
To use a for-each loop to traverse the elements in a collection.
To explore how and when to use ArrayList or LinkedList to store elements.
To compare elements using the Comparable interface and the Comparator interface.
To use the static utility methods in the Collections class for sorting, searching, shuffling lists, and finding the largest and smallest element in collections.
To distinguish between Vector and ArrayList and to use the Stack class for creating stacks.
To explore the relationships among Collection, Queue, LinkedList, and PriorityQueue and to create priority queues using the PriorityQueue class.
To use stacks to write a program to evaluate expressions.
Data structure
A data structure, or collection, is a collection of data organized in some fashion.
It not only stores data but also supports operations for accessing and manipulating the data.
In object-oriented thinking, a data structure, also known as a container, is an object that stores other objects, referred to as elements
Java Collections Framework hierarchy
The Java Collections Framework supports two types of containers:
1) One for storing a collection of elements, simply called a collection
Lists store an ordered collection of elements
Sets store a group of nonduplicate elements
Stacks store objects that are processed in a last-in, first-out fashion
Queues store objects that are processed in a first-in, first-out fashion
PriorityQueues store objects that are processed in the order of their priorities
2) One for storing key/value pairs, called a map
P.S.: this is essentially the same as a dictionary in Python.
The Collection interface is the root interface for manipulating a collection of objects
The AbstractCollection class provides a partial implementation of the Collection interface (all the methods in Collection except add, size, and iterator).
Note: the Collection interface extends the Iterable interface.
All the concrete classes in the Java Collections Framework implement the java.lang.Cloneable and java.io.Serializable interfaces, except that java.util.PriorityQueue does not implement the Cloneable interface.
Some methods in the Collection interface cannot be supported by a concrete subclass (for example, a read-only collection cannot add or remove elements); such methods throw java.lang.UnsupportedOperationException.
Iterator
Each collection is Iterable
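A minimal sketch of iterator-based traversal (class name and sample values are my own), assuming an ArrayList of strings:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

public class IteratorDemo {
    public static void main(String[] args) {
        Collection<String> collection = new ArrayList<>();
        collection.add("red");
        collection.add("green");
        collection.add("blue");

        // Every Collection is Iterable, so it can supply an Iterator
        Iterator<String> iterator = collection.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next().toUpperCase());
        }
    }
}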
List
Ordered, indexed, duplicates allowed.
A list collection stores elements in a sequential order, and allows the user to specify where the element is stored
The user can also access the elements by index
The List interface stores elements in sequence and permits duplicates.
Two concrete classes in the Java Collections Framework: ArrayList and LinkedList.
Iterator
The listIterator() and listIterator(startIndex) methods return an instance of ListIterator.
The ListIterator interface extends the Iterator interface to support bidirectional traversal of the list and adding elements to the list.
The nextIndex() method returns the index of the next element in the iterator, and previousIndex() returns the index of the previous element in the iterator.
The add(element) method inserts the specified element into the list immediately before the next element that would be returned by the next() method defined in the Iterator interface.
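A small sketch of ListIterator (class name and values are my own), showing nextIndex(), add(), and backward traversal:

import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 3));

        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int index = it.nextIndex();      // index of the element next() would return
            int value = it.next();
            if (value == 2) {
                it.add(99);                  // inserted before the next element: [1, 2, 99, 3]
            }
            System.out.println(index + ": " + value);
        }

        // Traverse backward from the end of the list
        ListIterator<Integer> back = list.listIterator(list.size());
        while (back.hasPrevious()) {
            System.out.print(back.previous() + " ");   // 3 99 2 1
        }
    }
}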
ArrayList
The ArrayList class and the LinkedList class are concrete implementations of the List interface.
A list can grow or shrink dynamically, whereas an array is fixed in size once it is created.
LinkedList
Node
Each node contains an element/value, and each node is linked to its next neighbor:
A Node can also be defined as a separate class.
get(i)
The get(i) method works on a linked list, but locating each element by index is a more time-consuming operation.
Instead, you should use an Iterator or a for-each loop:
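A sketch contrasting index-based access with iterator-based traversal on a LinkedList (the contents are made up; the for-each loop uses the list's iterator internally):

import java.util.LinkedList;
import java.util.List;

public class LinkedListTraversal {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 5; i++) {
            list.add(i);
        }

        // Inefficient on a linked list: each get(i) walks along the chain of nodes
        for (int i = 0; i < list.size(); i++) {
            System.out.print(list.get(i) + " ");
        }
        System.out.println();

        // Preferred: a for-each loop (uses the list's iterator internally)
        for (int value : list) {
            System.out.print(value + " ");
        }
    }
}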
ArrayList vs LinkedList
If you need to support random access through an index without inserting or removing elements anywhere other than at the end, ArrayList is the most efficient collection.
If your application requires inserting or deleting elements at the beginning of the list, you should choose LinkedList.
Comparator Interface
If you want to compare elements that are not instances of Comparable, or to compare them by a criterion different from their natural ordering, define a Comparator: define a class that implements the java.util.Comparator<T> interface.
(Reference code below.)
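A sketch of the idea (LengthComparator and the word list are my own, hypothetical examples): compare strings by length instead of their natural alphabetical ordering.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Compare strings by length instead of their natural (alphabetical) ordering
class LengthComparator implements Comparator<String> {
    @Override
    public int compare(String s1, String s2) {
        return Integer.compare(s1.length(), s2.length());
    }
}

public class ComparatorDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("banana", "fig", "apple"));
        words.sort(new LengthComparator());               // [fig, apple, banana]
        // Equivalent one-liner using a method reference:
        words.sort(Comparator.comparing(String::length));
        System.out.println(words);
    }
}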
Static methods
max, min, disjoint, and frequency methods for collections
sort, binarySearch, reverse, shuffle, copy, and fill methods for lists
for List
1. Sorting:
Collections.sort(list);
To sort it in descending order, you can simply use the Collections.reverseOrder()
2. Binary search:
Collections.binarySearch(list1, 7);
To use this method, the list must be sorted in increasing order.
If the key is not in the list, the method returns -(insertion point + 1).
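A sketch of sorting and then searching with these methods (the list values are arbitrary):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortSearchDemo {
    public static void main(String[] args) {
        List<Integer> list1 = new ArrayList<>(List.of(5, 2, 9, 7, 1));

        Collections.sort(list1);                               // [1, 2, 5, 7, 9]
        Collections.sort(list1, Collections.reverseOrder());   // [9, 7, 5, 2, 1]

        Collections.sort(list1);                               // binarySearch needs ascending order
        System.out.println(Collections.binarySearch(list1, 7));  // 3
        System.out.println(Collections.binarySearch(list1, 4));  // -(insertion point + 1) = -3
    }
}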
for List and Collections
1. Reverse:
Collections.reverse(list);
2. Shuffle:
Collections.shuffle(list);
3.Copy:
Collections.copy(dest, src);
Copies all the elements from a source list to a destination list at the same index positions (note: same indices; the remaining elements of the destination are unchanged).
The copy method performs a shallow copy: only the references of the elements from the source list are copied.
4. Arrays.asList()
The Arrays class provides the static asList method for creating a list from a variable-length list of arguments.
The returned type is java.util.Arrays$ArrayList, which is also named ArrayList but is just a fixed-size wrapper around an array.
5. nCopies(int n, Object o)
Creates an immutable list consisting of n copies of the specified object.
The list created by the nCopies method is immutable, so you cannot add or remove elements; all the elements share the same reference.
List<String> list = Collections.nCopies(5, "A"); // list = ["A", "A", "A", "A", "A"]
list.add("B"); // throws UnsupportedOperationException (attempting to modify the list)
6. fill(List list, Object o)
List<String> list = Arrays.asList("red", "green", "blue");
Collections.fill(list, "black"); // [black, black, black]
7. max & min
find the maximum and minimum elements in a collection
Collections.max(collection)
Collections.min(collection)
8.disjoint(collection1, collection2)
returns true if the two collections have no elements in common
9. frequency(collection, element)
finds the number of occurrences of the element in the collection
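A quick sketch of these utility methods on made-up data:

import java.util.Collections;
import java.util.List;

public class CollectionUtilityDemo {
    public static void main(String[] args) {
        List<Integer> a = List.of(3, 8, 1, 8);
        List<Integer> b = List.of(4, 5);

        System.out.println(Collections.max(a));            // 8
        System.out.println(Collections.min(a));            // 1
        System.out.println(Collections.disjoint(a, b));    // true: no common elements
        System.out.println(Collections.frequency(a, 8));   // 2
    }
}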
Vector
These classes were redesigned to fit into the Java Collections Framework, but their old-style methods are retained for compatibility.
Vector is the same as ArrayList, except that it contains synchronized methods for accessing and modifying the vector.
- Synchronization: Vector is synchronized, which means it is thread-safe; multiple threads can access a Vector instance concurrently without causing concurrency problems. This is a major difference from ArrayList, which is not synchronized.
- Dynamic resizing: unlike an array, a Vector can grow dynamically; it automatically expands to accommodate newly added elements.
- Implemented interfaces: Vector implements the List, RandomAccess, Cloneable, and Serializable interfaces, so it behaves as a list, supports fast random access, and can be cloned and serialized.
For many applications that do not require synchronization, using ArrayList is more efficient than using Vector.
Stack
The Stackclass represents a last-in-first-out stack of objects
Elements can be accessed only from the top of the stack.
You can use push(o: E) to insert, peek() to retrieve, and pop() to remove (all at the top of the stack).
The Stack class is implemented as an extension of Vector.
Legacy method retained from earlier Java versions: the empty() method is the same as isEmpty().
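A small sketch of the basic stack operations (values are my own):

import java.util.Stack;

public class StackDemo {
    public static void main(String[] args) {
        Stack<String> stack = new Stack<>();
        stack.push("first");
        stack.push("second");
        stack.push("third");

        System.out.println(stack.peek());     // "third" (top element, not removed)
        System.out.println(stack.pop());      // "third" (removed from the top)
        System.out.println(stack.isEmpty());  // false (empty() would give the same answer)
    }
}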
Queues and Priority Queues
A queue is a first-in/first-out data structure: Elements are appended to the end of the queue and are removed from the beginning of the queue
The offer(o:E) method is used to add an element to the queue
This method is similar to the add method in the Collection interface, but the offer method is preferred for queues.
The poll and remove methods are similar, except that poll() returns null if the queue is empty, whereas remove() throws an exception.
The peek and element methods are similar, except that peek() returns null if the queue is empty, whereas element() throws an exception.
Queue interface extends java.util.Collection with additional insertion, extraction, and inspection operations
Using LinkedList for Queue
The LinkedList class implements the Deque interface, which extends the Queue interface.
The Deque ("double-ended queue") interface supports element insertion and removal at both ends.
The Deque interface extends Queue with additional methods for inserting and removing elements from both ends of the queue.
The methods addFirst(e), removeFirst(), addLast(e), removeLast(), getFirst(), and getLast() are defined in the Deque interface.
LinkedList is a concrete class for queues; it supports inserting and removing elements from both ends of a list.
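A sketch using LinkedList both as a Queue and as a Deque (values are my own):

import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

public class QueueDequeDemo {
    public static void main(String[] args) {
        Queue<String> queue = new LinkedList<>();
        queue.offer("a");
        queue.offer("b");
        System.out.println(queue.peek());   // "a" (head of the queue)
        System.out.println(queue.poll());   // "a" (removed; returns null if the queue is empty)

        Deque<String> deque = new LinkedList<>();
        deque.addFirst("x");
        deque.addLast("y");
        System.out.println(deque.getFirst());     // "x"
        System.out.println(deque.removeLast());   // "y"
    }
}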
priority queue
In a priority queue, elements are assigned priorities
When accessing elements, the element with the highest priority is removed first
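A sketch of PriorityQueue with natural ordering and with a Comparator (values are my own):

import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityQueueDemo {
    public static void main(String[] args) {
        // Natural ordering: the smallest element has the highest priority
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.offer(30);
        pq.offer(10);
        pq.offer(20);
        while (!pq.isEmpty()) {
            System.out.print(pq.poll() + " ");   // 10 20 30
        }
        System.out.println();

        // A Comparator reverses the priority: the largest element comes out first
        PriorityQueue<Integer> maxPq = new PriorityQueue<>(Comparator.reverseOrder());
        maxPq.offer(30);
        maxPq.offer(10);
        maxPq.offer(20);
        System.out.println(maxPq.poll());        // 30
    }
}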
To use stacks to write a program to evaluate expressions
W7 - Sets and Maps
Set
The concrete classes that implement Set must ensure that no duplicate elements can be added to the set
● HashSet & LinkedHashSet use: hashCode() + equals()
● TreeSet uses: compareTo() (from Comparable) or a Comparator
HashSet
Ordering: It does not guarantee any order of iteration.
Internal Structure: Backed by a hash table.
creation
Set<String> set = new HashSet<>();
● The first pair of angle brackets ("<String>") is the type parameter, or generic type. It specifies the type of elements the set will store; here, the set is declared to store objects of type String.
● For the second pair ("<>", the diamond operator), the compiler infers the generic type from the context, which is typically the same as the type specified on the left-hand side (at least in simple cases).
● The parentheses ("()") call the constructor of the HashSet class; in this case, the no-argument constructor, which creates an empty set.
2) The HashSet class is a concrete class that implements Set.
3) and 4) You can create an empty HashSet with a specified initial capacity only (case 3) or with an initial capacity plus a load factor (case 4).
Initial capacity and load factor:
- Initial capacity: the number of buckets in the hash table when the HashSet is created; the default initial capacity is 16.
- Load factor: a value between 0 and 1 that measures how full the set may become before its capacity is automatically increased; the default load factor is 0.75, meaning the hash table is resized when the number of elements reaches 75% of the number of buckets.
method
Interfaces (such as Collection) and abstract classes (such as AbstractSet) are implemented/extended by concrete classes (such as HashSet, LinkedHashSet, TreeSet). Therefore, all declared methods (e.g., add(), remove(), etc.) can be invoked on a set instance.
1)add()
Unordered: because of hashing (see W13).
No duplicates:
● A hash set uses hashCode() and equals() to check for duplicates.
● These two methods are "built in" to String, as they are part of the standard String class in Java (the same holds for other standard-library classes such as Integer).
Traversal
A set is Iterable; although it has no index, it can be traversed with a for-each loop.
Way 1: Enhanced for loop
Way 2: forEach()
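A sketch of both traversal ways on a HashSet (the sample colors are my own):

import java.util.HashSet;
import java.util.Set;

public class SetTraversalDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Set.of("red", "green", "blue"));

        // Way 1: enhanced for loop (iteration order is not guaranteed for a HashSet)
        for (String color : set) {
            System.out.println(color);
        }

        // Way 2: forEach with a lambda expression
        set.forEach(color -> System.out.println(color.toUpperCase()));
    }
}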
LinkedHashSet
Ordering: maintains a doubly linked list running through all of its entries, which defines the iteration ordering; this is normally the order in which elements were inserted into the set (insertion order).
Internal Structure: Backed by a hash table with a linked list running through it
● LinkedHashSet extends HashSet with a linked list implementation that supports an ordering of the elements in the set
Significant Difference: The elements in a LinkedHashSet can be retrieved in the order in which they were inserted into the set
TreeSet
Ordering: Guarantees that elements will be sorted in ascending element order, according to the natural ordering of the elements, or by a Comparator provided at set creation time.
Internal Structure: Backed by a tree
● TreeSet is a concrete class that implements the SortedSet and NavigableSet interfaces
SortedSet is a sub-interface of Set,which guarantees that the elements in the set are sorted
NavigableSet extends SortedSet to provide navigation methods (e.g., lower(e), floor(e),etc)
creation
● An empty tree set created with no arguments is sorted in ascending order according to the natural ordering of its elements (due to the implementation of the SortedSet interface).
TreeSet<String> treeSet = new TreeSet<>();
● Tree set with other collections, being sorted by the natural ordering.
TreeSet<String> treeSet = new TreeSet<>(list);
● Tree set with a customized Comparator, where we can define the ordering
TreeSet<String> treeSet = new TreeSet<>(Comparator.reverseOrder());
● Tree set with the same elements and the same ordering as the specified sorted set
// If we already have a set sorted according to a specific rule
SortedSet<String> originalSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
// We take this way to create another tree set with the same elements and the same ordering
TreeSet<String> copiedSet = new TreeSet<>(originalSet);
Method
1. add()
No duplicates: not via hashCode() and equals(), but via compareTo(), which is "built in" to String, Integer, and the other wrapper classes in the Java standard library.
The difference comes from the underlying data structure: hash set -> hashing (W13), tree set -> tree (W11 & W12).
first(): return the first element in the set
last(): return the last element in the set
headSet(toElement): finds the elements that are strictly less than the given toElement (e.g., "New York")
tailSet(fromElement): finds the elements that are greater than or equal to the given fromElement (e.g., "New York")
lower(): Returns the greatest element in this set strictly less than the given element (4)
higher(): Returns the least element in this set strictly greater than the given element (?)
floor(): Returns the greatest element in this set less than or equal to the given element (5)
ceiling(): Returns the least element in this set greater than or equal to the given element (?)
pollFirst(): Retrieves and removes the first (lowest) element
pollLast(): Retrieves and removes the last (highest) element
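A sketch exercising these navigation methods on a small TreeSet (the values 2, 4, 5, 9 are my own, not the slides' example):

import java.util.TreeSet;

public class TreeSetNavigationDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>(java.util.List.of(2, 4, 5, 9));

        System.out.println(set.first());       // 2
        System.out.println(set.last());        // 9
        System.out.println(set.headSet(5));    // [2, 4]  (strictly less than 5)
        System.out.println(set.tailSet(5));    // [5, 9]  (greater than or equal to 5)
        System.out.println(set.lower(5));      // 4  (strictly less than 5)
        System.out.println(set.higher(5));     // 9  (strictly greater than 5)
        System.out.println(set.floor(5));      // 5  (less than or equal to 5)
        System.out.println(set.ceiling(6));    // 9  (greater than or equal to 6)
        System.out.println(set.pollFirst());   // 2  (retrieved and removed)
        System.out.println(set.pollLast());    // 9  (retrieved and removed)
    }
}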
Set vs List
Sets are more efficient than lists for storing nonduplicate elements; a set has no index and is traversed with a for-each loop.
Lists are useful for accessing elements through the index
Map
A map is a container object that stores a collection of key/value pairs.
It enables fast retrieval, deletion, and updating of the pair through the key. A map stores the values along with the keys.
● In List, the indexes are integers. In Map, the keys can be any objects.
A map cannot contain duplicate keys.
Each key maps to one value.
The HashMap, LinkedHashMap, and TreeMap classes are three concrete implementations of the Map interface; TreeMap additionally implements SortedMap and NavigableMap.
They ensure that keys are not duplicated using hashCode() and equals() (for HashMap and LinkedHashMap) or compareTo()/a Comparator (for TreeMap).
create
m: Map<? extends K, ? extends V>
This means the constructor accepts a Map<X, Y> (e.g., smallMap) where X is a subclass of K and Y is a subclass of V (see the figure). It also works when X is exactly K and Y is exactly V.
e.g
Map<Integer, String> smallMap = new HashMap<>();
Map<Integer, String> largerMap = new HashMap<>(smallMap);
In general, similar to LinkedHashSet, LinkedHashMap extends HashMap with a linked-list implementation that supports retrieving entries in insertion order.
LinkedHashMap also has a constructor with the arguments (initialCapacity: int, loadFactor: float, accessOrder: boolean).
When it is constructed as LinkedHashMap(initialCapacity, loadFactor, true), the created LinkedHashMap lets us retrieve entries in the order in which they were last accessed, from least recently to most recently accessed (access order).
Method
• A hash map is unordered, similar to a hash set
• A tree map is ordered by the keys of the involved elements (alphabetically in this case)
• A linked hash map can be ordered by insertion order, or by access order (accessOrder: true)
get(): Returns the value to which the specified key is mapped
forEach():Performs the given action for each entry in this map until all entries have been processed or the action throws an exception.
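A sketch comparing a HashMap, a TreeMap, and an access-ordered LinkedHashMap (keys and values are made up):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();   // unordered
        ages.put("Tom", 30);
        ages.put("Anna", 25);
        ages.put("Zoe", 41);

        System.out.println(ages.get("Anna"));          // 25

        // TreeMap: ordered by keys (alphabetically here)
        Map<String, Integer> sorted = new TreeMap<>(ages);
        sorted.forEach((name, age) -> System.out.println(name + " -> " + age));

        // LinkedHashMap with accessOrder = true: iteration follows access order
        Map<String, Integer> accessOrdered = new LinkedHashMap<>(16, 0.75f, true);
        accessOrdered.putAll(ages);
        accessOrdered.get("Tom");                      // moves "Tom" to the most recent position
        System.out.println(accessOrdered.keySet());
    }
}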
W8-Developing Efficient Algorithms
To estimate algorithm efficiency using the Big O notation
To explain growth rates and why constants and non-dominating terms can be ignored
To determine the complexity of various types of algorithms
To analyze the binary search algorithm
To analyze the selection sort algorithm
To analyze the insertion sort algorithm
To analyze the Tower of Hanoi algorithm
To describe common growth functions (constant, logarithmic, linear, log-linear, quadratic, cubic (polynomial), exponential)
To design efficient algorithms for finding Fibonacci numbers using dynamic programming
To design efficient algorithms for finding the closest pair of points using the divide-and-conquer approach
Big O notation
used to measure the execution time (named time complexity)
Space complexity measures the amount of memory space used by an algorithm and can also be expressed using Big O (though most of the examples in our lectures are O(n)).
complexity of various types of algorithms
Simple Loops O(n)
Nested Loops O(n^2)
Sequence O(n)
Selection O(n)
Logarithmic time O(log n)
Binary search O(log n)
Selection Sort O(n^2)
InsertionSort O(n^2)
Tower of Hanoi O(2^n)
Fibonacci Numbers O(2^n) vs O(n)
common growth functions
Algorithm Design
1. Problem definition
2. Development of a model
3. Specification of the algorithm
4. Designing an algorithm
5. Checking the correctness of the algorithm
6. Analysis of algorithm
7. Implementation of algorithm
8. Program testing
9. Documentation preparation
Techniques
Brute-force or exhaustive search 穷举
the naive method of trying every possible solution to see which is best.
Divide and conquer 分而治之
repeatedly reduces an instance of a problem to one or more smaller instances of the same problem (usually recursively) until the instances are small enough to solve easily.
divides the problem into subproblems, solves the subproblems, then combines the solutions of subproblems to obtain the solution for the entire problem.
The subproblems do not overlap.
e.g., the Closest Pair of Points problem
Dynamic programming 动态规划
When the same subproblems are used to solve many different problem instances, dynamic programming avoids recomputing solutions that have already been computed.
The key idea behind dynamic programming is to solve each subproblem only once and store the results of subproblems for later use, to avoid redundant computation of the subproblems.
Greedy algorithms 贪心算法
follow the problem-solving heuristic of making the locally optimal choice at each stage.
Backtracking 回溯法
multiple solutions are built incrementally and abandoned when it is determined that they cannot lead to a valid full solution.
e.g., the Eight Queens problem
To design efficient algorithms for finding Fibonacci numbers using dynamic programming (see the sketch below)
To design efficient algorithms for finding the closest pair of points using the divide-and-conquer approach
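A sketch of the Fibonacci comparison (class and method names are my own): naive recursion recomputes the same subproblems in O(2^n) time, while the bottom-up dynamic-programming version stores the two previous results and runs in O(n).

public class Fibonacci {
    // Naive recursion: the same subproblems are recomputed, giving O(2^n) time
    public static long fibRecursive(int n) {
        if (n <= 1) {
            return n;
        }
        return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Dynamic programming: each subproblem is solved once and reused, giving O(n) time
    public static long fibDp(int n) {
        if (n <= 1) {
            return n;
        }
        long prev = 0, curr = 1;           // fib(0), fib(1)
        for (int i = 2; i <= n; i++) {
            long next = prev + curr;       // reuse the two stored subproblem results
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        System.out.println(fibDp(50));     // 12586269025
    }
}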
W9 - Sorting
To study and analyze time complexity of various sorting algorithms
To design, implement, and analyze bubble sort
To design, implement, and analyze merge sort
To design, implement, and analyze quick sort
To design and implement a binary heap
To design, implement, and analyze heap sort
To design, implement, and analyze insertion sort
bubble sort
Since the number of comparisons is n - 1 in the first pass, the best-case time for bubble sort is O(n); the worst case is O(n^2).
The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. Optimization: if no swaps occur during a pass, no further passes are needed.
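A sketch of bubble sort with the early-exit optimization described above (class name and test data are my own):

public class BubbleSort {
    // Repeatedly swap adjacent out-of-order elements; stop early if a pass makes no swaps
    public static void bubbleSort(int[] a) {
        boolean needNextPass = true;
        for (int k = 1; k < a.length && needNextPass; k++) {
            needNextPass = false;                  // assume the array is already sorted
            for (int i = 0; i < a.length - k; i++) {
                if (a[i] > a[i + 1]) {
                    int temp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = temp;
                    needNextPass = true;           // a swap occurred, so another pass is needed
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5};
        bubbleSort(data);
        System.out.println(java.util.Arrays.toString(data));   // [1, 2, 5, 5, 9]
    }
}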
merge sort
divide and conquer algorithm
quick sort
Basic idea:
Choose a pivot element.
Move the numbers greater than the pivot to the right of the pivot.
Move the numbers less than the pivot to the left of the pivot.
Repeat the first three steps on the left and right subsequences, respectively (see the sketch below).
base case: a subsequence with zero or one element is already sorted
worst case: O(n^2)
Divides/partitions
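A sketch of the idea (not necessarily the slides' exact scheme): a Lomuto-style partition that uses the last element as the pivot, with recursion on the left and right parts.

public class QuickSort {
    public static void quickSort(int[] a, int low, int high) {
        if (low >= high) {              // base case: zero or one element
            return;
        }
        int p = partition(a, low, high);
        quickSort(a, low, p - 1);       // sort the elements smaller than the pivot
        quickSort(a, p + 1, high);      // sort the elements larger than the pivot
    }

    // Lomuto-style partition: the last element is chosen as the pivot
    private static int partition(int[] a, int low, int high) {
        int pivot = a[high];
        int i = low - 1;                // boundary of the "less than pivot" region
        for (int j = low; j < high; j++) {
            if (a[j] < pivot) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, high);           // put the pivot between the two regions
        return i + 1;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5};
        quickSort(data, 0, data.length - 1);
        System.out.println(java.util.Arrays.toString(data));   // [1, 2, 5, 5, 9]
    }
}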
binary heap
The height of a heap (a complete binary tree) with n elements is h = O(log n)
add: O(log n) per element, so adding n elements takes O(n log n)
remove: O(log n) per removal, so removing all n elements takes O(n log n)
Heap Sort time: O(n log n)
heap sort
Insertion Sort
The insertion sort algorithm sorts a list of values by repeatedly inserting an unsorted element into a sorted sublist until the whole list is sorted.
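A sketch of insertion sort (class name and data are my own): each element is shifted left into its place in the sorted sublist.

public class InsertionSort {
    // Insert each element into the already-sorted sublist a[0..i-1]
    public static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int current = a[i];
            int k = i - 1;
            while (k >= 0 && a[k] > current) {
                a[k + 1] = a[k];        // shift larger elements one position to the right
                k--;
            }
            a[k + 1] = current;         // drop the element into its slot in the sorted sublist
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5};
        insertionSort(data);
        System.out.println(java.util.Arrays.toString(data));   // [1, 2, 5, 5, 9]
    }
}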
W10-1 - Graphs and Applications
To model real-world problems using graphs
Explain the Seven Bridges of Königsberg problem
To describe the graph terminologies: vertices, edges, directed/undirected, weighted/unweighted, connected graphs, loops, parallel edges, simple graphs, cycles, subgraphs and spanning tree
To represent vertices and edges using edge arrays, edge objects, adjacency matrices, adjacency vertices list and adjacency edge lists
To model graphs using the Graph interface, the AbstractGraph class, and the UnweightedGraph class
To represent the traversal of a graph using the AbstractGraph.Tree class
To design and implement depth-first search
To solve the connected-component problem using depth-first search
To design and implement breadth-first search
graph terminologies
A graph G = (V, E), where V represents a set of vertices (or nodes) and E represents a set of edges (or links). A graph may be undirected (i.e., if (x, y) is in E, then (y, x) is also in E) or directed.
Adjacent Vertices:
Two vertices in a graph are said to be adjacent (or neighbors) if they are connected by an edge.An edge in a graph that joins two vertices is said to be incident to both vertices
Degree:
The degree of a vertex is the number of edges incident to it
Complete graph:
every pair of vertices is connected by an edge
Incomplete graph:
a graph in which at least one pair of vertices is not connected by an edge
parallel edge
If two vertices are connected by two or more edges, these edges are called parallel edges
loop:
is an edge that links a vertex to itself
simple graph:
is one that does not have any parallel edges or loops
Connected graph:
A graph is connected if there exists a path between any two vertices in the graph.
That is, starting from any vertex, every other vertex can be reached by following edges.
Tree:
A connected graph is a tree if it does not have cycles
closed path:
a path where all vertices have 2 edges incident to them
Cycle:
a closed path that starts from a vertex and ends at the same vertex
cycle vs closed path
- Vertex repetition: in a cycle, vertices may not repeat (apart from the start and end vertex, which coincide), whereas in a closed path vertices may be visited more than once, though via different edges.
- Generality: a cycle is a special case of a closed path; every cycle is a closed path, but not every closed path is a cycle.
subgraph:
A subgraph of a graph G is a graph whose vertex set is a subset of that of G and whose edge set is a subset of that of G
Spanning Tree
A spanning tree of a graph G is a connected subgraph of G and the subgraph is a tree that contains all vertices in G
Representing Graphs
Representing Vertices
Representing Edges: Edge Array
Representing Edges: Edge Objects
(The representations above are rarely needed in practice.)
Representing Edges: Adjacency Matrices
int[][] adjacencyMatrix = { ... }; // write out the matrix row by row
Representing Edges: Adjacency Lists
List<List<Integer>> neighbors = new ArrayList<>();
(see the slides for details)
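A small sketch of building an adjacency list for an undirected graph (the vertices and edges are made up):

import java.util.ArrayList;
import java.util.List;

public class AdjacencyListDemo {
    public static void main(String[] args) {
        int numberOfVertices = 4;

        // neighbors.get(u) holds the vertices adjacent to vertex u
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int u = 0; u < numberOfVertices; u++) {
            neighbors.add(new ArrayList<>());
        }

        // Undirected edges are stored in both directions
        int[][] edges = {{0, 1}, {0, 2}, {1, 3}};
        for (int[] edge : edges) {
            neighbors.get(edge[0]).add(edge[1]);
            neighbors.get(edge[1]).add(edge[0]);
        }

        System.out.println(neighbors);   // [[1, 2], [0, 3], [0], [1]]
    }
}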
Modeling Graphs
The Graph interface defines the common operations for a graph
An abstract class named AbstractGraph partially implements the Graph interface.
traversal of a graph
DFS
BFS
DFS Application
1) Detecting whether a graph is connected
Search the graph starting from any vertex
If the number of vertices searched is the same as the number of vertices in the graph, the graph is connected. Otherwise, the graph is not connected.
2) Detecting whether there is a path between two vertices AND find it (not the shortest)
Search the graph starting from one of the 2 vertexes
Check if the second vertex is reached by DFS
3) Finding all connected components
A connected component is a maximal connected subgraph in which every pair of vertices are connected by a path
Label all vertices as unreached
Repeat until no vertex is unreached:
Start from any unreached vertex and compute a DFS
(marking all reached vertices, including the starting vertex, as reached) -> each such DFS yields one connected component (see the sketch below)
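A compact sketch of this procedure on a plain adjacency list (not the textbook's Graph classes; names and the sample graph are my own):

import java.util.List;

public class ConnectedComponents {
    // Runs DFS from every unreached vertex; each DFS tree is one connected component
    public static int countComponents(List<List<Integer>> neighbors) {
        boolean[] reached = new boolean[neighbors.size()];
        int components = 0;
        for (int v = 0; v < neighbors.size(); v++) {
            if (!reached[v]) {
                components++;
                dfs(neighbors, v, reached);
            }
        }
        return components;
    }

    private static void dfs(List<List<Integer>> neighbors, int v, boolean[] reached) {
        reached[v] = true;
        for (int w : neighbors.get(v)) {
            if (!reached[w]) {
                dfs(neighbors, w, reached);
            }
        }
    }

    public static void main(String[] args) {
        // Vertices 0-1-2 form one component; vertex 3 is isolated
        List<List<Integer>> neighbors = List.of(
            List.of(1, 2), List.of(0), List.of(0), List.of());
        System.out.println(countComponents(neighbors));   // 2
    }
}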
BFS Application
1) Detecting whether a graph is connected (i.e., if there is a path between any two vertices in the graph)
Check whether the size of the BFS spanning tree is the same as the number of vertices.
2)Detecting whether there is a path between two vertices
Compute the BFS from the first vertex and check if the second vertex is reached
3)Finding a shortest path between two vertices
We can prove that the path between the root and any node in the BFS tree is the shortest path between the root and that node
4) Finding all connected components
5) Detect whether there is a cycle in the graph by modifying BFS (if a node was seen before, then there is a cycle - you can also extract the cycle)
6) Testing whether a graph is bipartite (a vertex 2-coloring problem; see the sketch after this list)
A graph is bipartite if and only if its vertex set can be partitioned into two disjoint subsets U and V such that every edge connects a vertex in U with a vertex in V; equivalently, the vertices can be colored with two colors so that the two endpoints of every edge have different colors.
A graph is bipartite if the vertices of the graph can be divided into two disjoint sets such that no edges exist between vertices in the same set
A graph is a bipartite graph if and only if it is 2-colorable.
While doing BFS traversal, each node in the BFS tree is given the opposite color to its parent.
If there exists an edge connecting current vertex to a previously-colored vertex with the same color, then we can safely conclude that the graph is NOT bipartite.
If the graph is bipartite, then one partition is the union of all odd-numbered BFS levels (strata), and the other is the union of the even-numbered levels.
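A sketch of the BFS 2-coloring test on a plain adjacency list (not the textbook's classes; the 4-cycle example is my own):

import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class BipartiteCheck {
    // BFS 2-coloring: each vertex gets the opposite color of its parent in the BFS tree
    public static boolean isBipartite(List<List<Integer>> neighbors) {
        int[] color = new int[neighbors.size()];   // 0 = uncolored, 1 and 2 are the two colors
        for (int start = 0; start < neighbors.size(); start++) {
            if (color[start] != 0) {
                continue;                          // already handled in an earlier component
            }
            color[start] = 1;
            Queue<Integer> queue = new ArrayDeque<>();
            queue.offer(start);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                for (int w : neighbors.get(v)) {
                    if (color[w] == 0) {
                        color[w] = 3 - color[v];   // opposite color of the parent
                        queue.offer(w);
                    } else if (color[w] == color[v]) {
                        return false;              // edge joins two same-colored vertices
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A 4-cycle is bipartite; an odd cycle would not be
        List<List<Integer>> square = List.of(
            List.of(1, 3), List.of(0, 2), List.of(1, 3), List.of(2, 0));
        System.out.println(isBipartite(square));   // true
    }
}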
W10-2 Weighted Graphs and Applications
To represent weighted edges using adjacency matrices and adjacency lists
To model weighted graphs using the WeightedGraph class that extends the AbstractGraph class
To design and implement the algorithm for finding a minimum spanning tree
To define the MST class that extends the Tree class
To design and implement the algorithm for finding single-source shortest paths
To define the ShortestPathTree class that extends the Tree class
A graph is a weighted graph if each edge is assigned a weight
Representing Weighted Graphs
Edge Array
Weighted Adjacency Matrices
Adjacency Lists
Modeling Weighted Graphs
finding a minimum spanning tree
A minimum spanning tree is a spanning tree (i.e., a subset of the edges of a connected undirected graph that connects all the vertices together without any cycles) with the minimum total weight
e.g create wired Internet lines
sol: Prim’s Algorithm
MST class that extends the Tree class
finding single-source shortest paths
Find a shortest path between two vertices in the graph
The shortest path between two vertices is a path with the minimum total weight
e.g., Dijkstra's algorithm
ShortestPathTree class that extends the Tree class
W11 Binary Search Trees
To design and implement a binary search tree
To represent binary trees using linked data structures
To insert an element into a binary search tree
To search an element in binary search tree
To traverse elements in a binary tree
To delete elements from a binary search tree
To create iterators for traversing a binary tree
To implement Huffman coding for compressing data using a binary tree
A binary tree is a hierarchical structure: it is either empty or consists of an element, called the root, and two distinct binary trees, called the left subtree and right subtree
Representing
Binary Trees
linked nodes
Non-Binary Trees
Binary Search Tree
A special type of binary tree, called a binary search tree (BST), is a binary tree with no duplicate elements and the property that, for every node in the tree, the value of any node in its left subtree is less than the node's value and the value of any node in its right subtree is greater than the node's value.
Inserting
(see the slides)
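A standalone sketch of BST insertion and search over int elements (not the textbook's BST class; names and values are my own): walk down from the root, going left for smaller values and right for larger ones, and attach the new node as a leaf.

public class SimpleBst {
    private static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    private Node root;

    // Walk down from the root and attach the new node as a leaf; duplicates are ignored
    public boolean insert(int e) {
        if (root == null) {
            root = new Node(e);
            return true;
        }
        Node parent = null, current = root;
        while (current != null) {
            parent = current;
            if (e < current.element) {
                current = current.left;
            } else if (e > current.element) {
                current = current.right;
            } else {
                return false;                 // duplicate element: not inserted
            }
        }
        if (e < parent.element) {
            parent.left = new Node(e);
        } else {
            parent.right = new Node(e);
        }
        return true;
    }

    // Go left when the key is smaller, right when it is larger
    public boolean search(int e) {
        Node current = root;
        while (current != null) {
            if (e < current.element) {
                current = current.left;
            } else if (e > current.element) {
                current = current.right;
            } else {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleBst tree = new SimpleBst();
        for (int v : new int[] {60, 55, 100, 45, 57}) {
            tree.insert(v);
        }
        System.out.println(tree.search(57));   // true
        System.out.println(tree.search(58));   // false
    }
}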
Traversal
process of visiting each node in the tree exactly once
Methods:
preorder, inorder, postorder, depth-first, breadth-first
Tree Interface
Delete
(see the slides)
Complexity
The time complexity for inorder, preorder, and postorder traversals is O(n).
The time complexity for search, insertion, and deletion is proportional to the height of the tree.
In the worst case, the height of the tree is O(n).
Iterator
The iterator method for Tree returns an instance of InorderIterator.
The InorderIterator constructor invokes the inorder method, which stores all the elements of the tree in a list.
The hasNext() method checks whether current is still within the range of the list.
Invoking the next() method returns the current element and moves current to point to the next element in the list.
The remove() method removes the current element from the tree and a new list is created; current does not need to be changed.
Huffman coding implementation
(Code to be added; see the sketch below.)
Huffman coding is a data-compression algorithm proposed by David A. Huffman in 1952. It uses the frequencies of the different characters to build variable-length codes, so that more frequent characters get shorter codes and the data is thereby compressed.
The core idea of the algorithm is to build a Huffman tree and use that tree to generate the code for each character.
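A compact sketch of the idea (not the textbook's implementation; class names and the frequency table are my own): build the Huffman tree by repeatedly merging the two lowest-frequency nodes with a PriorityQueue, then read off the codes ('0' for a left edge, '1' for a right edge).

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {
    private static class Node implements Comparable<Node> {
        int weight;            // frequency of the character(s) below this node
        char ch;               // only meaningful for leaves
        Node left, right;

        Node(int weight, char ch) { this.weight = weight; this.ch = ch; }
        Node(Node left, Node right) {
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        public int compareTo(Node other) {
            return Integer.compare(this.weight, other.weight);   // smaller weight = higher priority
        }
    }

    public static Map<Character, String> buildCodes(Map<Character, Integer> frequencies) {
        // Repeatedly merge the two lowest-frequency trees into one
        PriorityQueue<Node> heap = new PriorityQueue<>();
        frequencies.forEach((ch, freq) -> heap.offer(new Node(freq, ch)));
        while (heap.size() > 1) {
            heap.offer(new Node(heap.poll(), heap.poll()));
        }

        Map<Character, String> codes = new HashMap<>();
        assignCodes(heap.poll(), "", codes);
        return codes;
    }

    // Left edge contributes '0', right edge contributes '1'
    private static void assignCodes(Node node, String code, Map<Character, String> codes) {
        if (node == null) {
            return;
        }
        if (node.left == null && node.right == null) {
            codes.put(node.ch, code.isEmpty() ? "0" : code);   // single-character edge case
            return;
        }
        assignCodes(node.left, code + "0", codes);
        assignCodes(node.right, code + "1", codes);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = Map.of('a', 5, 'b', 2, 'c', 1);
        System.out.println(buildCodes(freq));   // e.g., {a=1, b=01, c=00} (codes may vary on ties)
    }
}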
W12 AVL Trees
To know what an AVL tree is
To understand how to balance a tree using the LL rotation, LR rotation, RR rotation, and RL rotation
To know how to design the AVLTree class
To insert elements into an AVL tree
To implement node rebalancing
To delete elements from an AVL tree
To implement and test the AVLTree class
To analyze the complexity of search, insert, and delete operations in AVL trees
Complexity
The search, insertion, and deletion time for a binary search tree is dependent on the height of the tree
In the worst case, the height is O(n), so the worst-case time complexity is O(n).
If a tree is perfectly balanced, i.e., a complete binary tree, its height is log n, so search, insertion, and deletion would take O(log n) time.
balance
you may have to rebalance the tree after an insertion or deletion operation
The balance factor of a node is the height of its right subtree minus the height of its left subtree
A node is said to be balanced if its balance factor is - 1, 0, or 1
A node is said to be left-heavy if its balance factor is -1
A node is said to be right-heavy if its balance factor is +1
LL rotation (left-heavy left-heavy rotation)
RR rotation (right-heavy right-heavy rotation)
LR rotation (left-heavy right-heavy rotation)
RL rotation (right-heavy left-heavy rotation)
design the AVL Tree class
An AVL tree is a binary search tree, so we can define the AVLTree class to extend the BST class.
W12 Hashing
To understand what hashing is and what hashing is used for
To obtain the hash code for an object and design the hash function to map a key to an index
To handle collisions using open addressing
To know the differences among linear probing, quadratic probing, and double hashing
To handle collisions using separate chaining
To understand the load factor and the need for rehashing
To implement MyHashMap using hashing
Why hashing
A map (dictionary, hash table, or associative array) is a data structure that stores entries, where each entry contains two parts: key (also called a search key) and value
O(1) time to search, insert, and delete an element in a set or a map vs. O(log n) time in a well-balanced search tree
implementation
To implement a map, can we store the values in an array and use the key as the index to find/set the value?
The answer is yes if you can map the key to an index
the array that stores the values is called a hash table
The function that maps a key to an index in the hash table is called a hash function
Hashing is the technique that stores and retrieves a value in/from the hash table using an index obtained from the key.
Hash Function and hash codes
A typical hash function first converts a search key to an integer value called a hash code, then compresses the hash code into an index to the hash table
(Java's root class Object has a hashCode method, which returns an integer hash code; by default, the method returns the memory address of the object (the decimal value of what toString prints in hex).)
You should override the hashCode method whenever the equals method is overridden to ensure that two equal objects return the same hash code
Two unequal objects may have the same hash code (called a collision)
Hash Codes for Primitive Types
(see the slides for the details)
The types byte, short, char and int are simply casted into int
float → int hashCode = Float.floatToIntBits(value);
long → int hashCode = (int)(value^(value>>32));
double → long bits = Double.doubleToLongBits(value);
int hashCode = (int)(bits ^ (bits >> 32));
String → int hashCode = 0;
for (int i = 0; i < length; i++)
{ hashCode = 31 * hashCode + charAt(i); }
Compressing Hash Codes
The hash code for a key can be a large integer that is out of range for the hash-table index. What do we do when a key's hash code is outside the hash-table index range?
Assume the index for a hash table is between 0 and N-1; the most common way to scale an integer is h(hashCode) = hashCode % N
N is the number of buckets in a hash table, crucial for determining the index range in which entries are placed into. It can be fixed or variable, depending on the design and requirements of the hash table.
Handling Collisions
A hashing collision (or collision) occurs when two different keys are mapped to the same index in a hash table
There are two ways for handling collisions: open addressing and separate chaining
Open addressing
the process of finding an open location in the hash table in the event of a collision
Open addressing has several variations: linear probing, quadratic probing and double hashing
Linear probing
When a collision occurs during the insertion of an entry to a hash table, linear probing finds the next available location sequentially
If a collision occurs at hashTable[key % N], check whether hashTable[(key+1) % N] is available; if not, check hashTable[(key+2) % N], and so on, until an available cell is found
1) To remove an entry from the hash table, search for the entry that matches the key. If the entry is found, place a special marker to denote that the entry is deleted but the cell is available for insertion of other values.
Each cell in the hash table has three possible states: occupied, marked, or empty; a marked cell is also available for insertion.
2) Linear probing tends to cause groups of consecutive cells in the hash table to be occupied -- each group is called a cluster
Each cluster is actually a probe sequence that you must search when retrieving, adding, or removing an entry
Disadvantage: As clusters grow in size, they may merge into even larger clusters, further slowing down the search time
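A tiny insert-only sketch of linear probing (class name and values are my own; it assumes Integer keys and that the table never fills up):

public class LinearProbingSketch {
    // A small insert-only hash table of Integer keys using linear probing
    private final Integer[] table;

    public LinearProbingSketch(int capacity) {
        table = new Integer[capacity];
    }

    public void insert(int key) {
        int index = Math.abs(key) % table.length;          // compress the hash code
        while (table[index] != null) {                     // assumes the table is not full
            index = (index + 1) % table.length;            // probe the next cell sequentially
        }
        table[index] = key;
    }

    public static void main(String[] args) {
        LinearProbingSketch map = new LinearProbingSketch(7);
        map.insert(10);   // 10 % 7 = 3
        map.insert(17);   // 17 % 7 = 3 -> collision, placed at index 4
        map.insert(24);   // 24 % 7 = 3 -> collision, placed at index 5
        System.out.println(java.util.Arrays.toString(map.table));
    }
}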
Quadratic probing
Quadratic probing looks at the cells at indices (key + j^2) % N for j >= 0, that is, key % N, (key+1) % N, (key+4) % N, (key+9) % N, (key+16) % N, and so on
works in the same way as linear probing for retrieval, insertion and deletion, except for the change in the search sequence
Quadratic probing can avoid the clustering problem in linear probing for consecutive keys
It still has its own clustering problem, called secondary clustering for keys that collide with the occupied entry using the same probe sequence
Linear probing guarantees that an available cell can be found for insertion as long as the table is not full; there is no such guarantee for quadratic probing (its probe sequence can cycle through the same positions and skip others)
Double hashing
(see the slides for details)
- First hash: use the first hash function h1(key) to compute the initial index.
- Second hash: use the second hash function h2(key) to compute the step size.
If the index produced by the first hash is already occupied, use the step size h2(key) to compute a new index: (h1(key) + i * h2(key)) % N, where i is the attempt number and N is the size of the hash table.
Double hashing gives each key its own probe sequence, which reduces collisions and improves the performance of the hash table.
Separate chaining
The separate chaining scheme places all entries with the same hash index into the same location, rather than finding new locations
places all entries with the same hash index into the same location in a list
Each location in the separate chaining scheme is called a bucket
A bucket is a container that holds multiple entries:
Collisions in java.util.HashMap are possible because the hash function uses the key object's hashCode(), which does not guarantee different hash codes for different objects.
Before JDK 8: a bucket is a linked list.
Since JDK 8: a bucket that grows large is converted to a balanced tree.
Load Factor
Load factor λ (lambda) is the ratio of the number of elements to the size of the hash table
λ =n/N
where n denotes the number of elements and N the number of locations in the hash table
For the open addressing scheme, λ is between 0 and 1
If the hash table is empty, then λ = 0; if the hash table is full, then λ = 1
For the separate chaining scheme, λ can be any value
Rehashing
As λ increases, the probability of a collision increases
Because the load factor measures how full a hash table is
You should maintain the load factor under 0.5 for the open addressing schemes and under 0.9 for the separate chaining scheme (the implementation of the java.util.HashMap class in the Java API uses the threshold 0.75)
If the load factor is exceeded, we increase the hash-table size and reload the entries into a new larger hash table -> this is called rehashing
We need to change the hash function, since the hash-table size has changed. To reduce the likelihood of rehashing (which is costly), you should at least double the hash-table size.