Probing: examining slots in the table. The number of probes equals the number of slots examined. The probe sequence is given by h(k, i, M), where i = 0, 1, 2, ... counts the probes/attempts until the hash succeeds.

Linear probing: h(k, i, M) = (h1(k) + i) mod M. If the slot where the key hashes is taken, use the next available slot, wrapping around the table.

Exercise: given a hash table with n keys and m slots, under the simple uniform hashing assumption (each key is equally likely to be hashed into each slot), and with collisions resolved by chaining: (a) What is the probability that the first slot ends up empty? (b) What is the expected number of slots that end up not being empty? (See the sketch below.)
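As a quick check, here is a minimal Python sketch of linear probing (assuming a base hash h1(k) = k mod M, chosen only for illustration), together with the standard computation for (a) and (b):

```python
def linear_probe_insert(table, key, M):
    # h(k, i, M) = (h1(k) + i) mod M, probing i = 0, 1, 2, ...
    for i in range(M):
        slot = (key + i) % M          # assumed base hash h1(k) = k mod M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 10
linear_probe_insert(table, 12, 10)    # lands in slot 2
linear_probe_insert(table, 22, 10)    # collides at slot 2, probes to slot 3

# (a) Each of the n keys independently misses a given slot with
#     probability 1 - 1/m, so P(first slot empty) = (1 - 1/m)^n.
# (b) By linearity of expectation over the m slots,
#     E[non-empty slots] = m * (1 - (1 - 1/m)^n).
m, n = 10, 7
print((1 - 1 / m) ** n)               # answer to (a)
print(m * (1 - (1 - 1 / m) ** n))     # answer to (b)
```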
In computer science, consistent hashing[1][2] is a special kind of hashing such that when a hash table is resized, only n/m keys need to be remapped on average, where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped, because the mapping between the keys and the slots is defined by a modular operation. Consistent hashing is a particular case of rendezvous hashing, which has a conceptually simpler algorithm and was first described in 1996. Consistent hashing first appeared in 1997 and uses a different algorithm.[1]

History
The term "consistent hashing" was introduced by David Karger et al. at MIT for use in distributed caching. This academic paper from 1997 introduced the term "consistent hashing" as a way of distributing requests among a changing population of web servers. Each slot is then represented by a server in a distributed system. The addition of a server and the removal of a server (say, due to failure) requires only num_keys/num_slots items to be re-shuffled when the number of slots (i.e., servers) changes. The authors mention linear hashing and its ability to handle sequential server addition and removal, while consistent hashing allows servers to be added and removed in arbitrary order.[1]
Teradata used this technique in their distributed database, released in 1986, although they did not use this term. Teradata still uses the concept of a hash table to fulfill exactly this purpose. Akamai Technologies was founded in 1998 by the scientists Daniel Lewin and F. Thomson Leighton (co-authors of the article coining ’consistent hashing’). In Akamai’s content delivery network,[3] consistent hashing is used to balance the load within a cluster of servers, while a stable marriage algorithm is used to balance load across clusters.[2]
Consistent hashing has also been used to reduce the impact of partial system failures in large web applications, providing robust caching without incurring the system-wide fallout of a failure.[4] Consistent hashing is also the cornerstone of distributed hash tables (DHTs), which employ hash values to partition a keyspace across a distributed set of nodes, then construct an overlay network of connected nodes that provide efficient node retrieval by key. Rendezvous hashing, designed in 1996, is a simpler and more general technique. It achieves the goals of consistent hashing using the very different highest random weight (HRW) algorithm.

Basic Technique
Consider the problem of load balancing where a set of objects (say, web pages or video segments) need to be assigned to a set of n servers. One way of distributing objects evenly across the n servers is to use a standard hash function and place object o in the server with id hash(o) mod n. However, if a server is added or removed (i.e., n changes), the server assignment of nearly every object in the system may change. This is problematic since servers often go up or down, and each such event would require nearly all objects to be reassigned and moved to new servers. Consistent hashing first maps both objects and servers to the unit circle. An object is then mapped to the next server that appears on the circle in clockwise order.[2]
Consistent hashing was designed to avoid the problem of having to change the server assignment of every object when a server is added or removed. The main idea is to use a hash function to randomly map both the objects and the servers to a unit circle. Each object is then assigned to the next server that appears on the circle in clockwise order. This provides an even distribution of objects to servers. But, more importantly, if a server fails and is removed from the circle, only the objects that were mapped to the failed server need to be reassigned to the next server in clockwise order. Likewise, if a new server is added, it is added to the unit circle, and only the objects mapped to that server need to be reassigned. Importantly, when a server is added or removed, the vast majority of the objects maintain their prior server assignments.
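Here is a minimal Python sketch of such a ring. SHA-1 as the point-mapping hash and the server names are illustrative assumptions; the vnodes parameter anticipates the replica extension described under Practical Extensions below:

```python
import bisect
import hashlib

def _point(value: str) -> int:
    # Hash a string to a point on the "circle" (a 160-bit integer, wrapping).
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers=(), vnodes=1):
        self.vnodes = vnodes     # points per server (replica extension)
        self._points = []        # sorted points occupied by servers
        self._owner = {}         # point -> server name
        for s in servers:
            self.add_server(s)

    def add_server(self, server):
        for i in range(self.vnodes):
            p = _point(f"{server}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = server

    def remove_server(self, server):
        for i in range(self.vnodes):
            p = _point(f"{server}#{i}")
            self._points.remove(p)
            del self._owner[p]

    def server_for(self, obj):
        # Next server clockwise from the object's point; wrap past the top.
        p = _point(obj)
        i = bisect.bisect_right(self._points, p) % len(self._points)
        return self._owner[self._points[i]]

ring = ConsistentHashRing(["s1", "s2", "s3"], vnodes=100)
print(ring.server_for("video-42"))
ring.remove_server("s2")    # only objects that mapped to s2 move
print(ring.server_for("video-42"))
```

The binary search in server_for is what gives the O(log N) lookup cost discussed under Complexity below.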
Practical Extensions

A number of extensions to the basic technique are needed to use consistent hashing effectively for load balancing in practice.[2] In the basic scheme above, if a server fails, all of its objects are reassigned to the next server in clockwise order, potentially doubling the load of that server. This may not be desirable. To ensure a more even redistribution of objects on server failure, each server can be hashed to multiple locations on the unit circle.[2] When a server fails, the objects assigned to each of its replicas on the unit circle get reassigned to a different server in clockwise order, thus redistributing the objects more evenly. Another extension concerns a flash crowd situation, where a single object gets "hot", is accessed a large number of times, and has to be hosted on multiple servers. In this situation, the object may be assigned to multiple contiguous servers by traversing the unit circle in clockwise order.[2] A more complex practical consideration arises when two objects that are hashed near each other on the unit circle both get "hot" at the same time. In this case, both objects will use the same set of contiguous servers on the unit circle. This situation can be ameliorated by having each object choose a different hash function for mapping servers to the unit circle.[2]

Comparison with Rendezvous Hashing and other alternatives
Rendezvous hashing, designed in 1996, is a simpler and more general technique, and permits fully distributed agreement on a set of k options out of a possible set of n options. It can in fact be shown that consistent hashing is a special case of rendezvous hashing. Because of its simplicity and generality, rendezvous hashing is now being used in place of consistent hashing in many applications.
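A minimal sketch of the highest random weight idea, again assuming SHA-1 as the scoring hash (illustrative only): each object goes to the server with the highest score for that (object, server) pair, and choosing the top k scores yields agreement on k options.

```python
import hashlib

def hrw_server(obj: str, servers) -> str:
    # Rendezvous (HRW) hashing: score every (object, server) pair and
    # pick the server with the highest score. Removing a server only
    # moves the objects for which it held the top score.
    def score(server):
        return int(hashlib.sha1(f"{server}|{obj}".encode()).hexdigest(), 16)
    return max(servers, key=score)

print(hrw_server("video-42", ["s1", "s2", "s3"]))
```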
If key values will always increase monotonically, an alternative approach using a hash table with monotonic keys may be more suitable than consistent hashing.[citation needed]

Complexity

Asymptotic time complexities for N nodes (or slots) and K keys:

Operation       Classic hash table   Consistent hashing
add a node      O(K)                 O(K/N + log N)
remove a node   O(K)                 O(K/N + log N)
add a key       O(1)                 O(log N)
remove a key    O(1)                 O(log N)
The O(K/N) term is the average cost of redistributing keys, and the O(log N) complexity for consistent hashing comes from the fact that a binary search among the node angles is required to find the next node on the ring.[citation needed]

Examples
Known examples of consistent hashing use include:
*Couchbase automated data partitioning [5]
*OpenStack’s Object Storage Service Swift[6]
*Partitioning component of Amazon’s storage system Dynamo[7]
*Data partitioning in Apache Cassandra[8]
*Data partitioning in Voldemort[9]
*Akka’s consistent hashing router[10]
*Riak, a distributed key-value database[11]
*Gluster, a network-attached storage file system[12]
*Akamai content delivery network[13]
*Discord chat application[14]
*Maglev network load balancer[15]
*Data partitioning in Azure Cosmos DB

References
1. Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M.; Lewin, D. (1997). "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, USA. pp. 654–663. doi:10.1145/258533.258660.
2. Bruce Maggs and Ramesh Sitaraman (2015). "Algorithmic nuggets in content delivery" (PDF). ACM SIGCOMM Computer Communication Review. 45 (3).
3. Nygren, E.; Sitaraman, R. K.; Sun, J. (2010). "The Akamai Network: A Platform for High-Performance Internet Applications" (PDF). ACM SIGOPS Operating Systems Review. 44 (3): 2–19. doi:10.1145/1842733.1842736. S2CID 207181702. Archived (PDF) from the original on September 13, 2012. Retrieved November 19, 2012.
4. Karger, D.; Sherman, A.; Berkheimer, A.; Bogstad, B.; Dhanidina, R.; Iwamoto, K.; Kim, B.; Matkins, L.; Yerushalmi, Y. (1999). "Web Caching with Consistent Hashing". Computer Networks. 31 (11): 1203–1213. doi:10.1016/S1389-1286(99)00055-9. Archived from the original on 2008-07-21. Retrieved 2008-02-05.
5. "What Exactly Is Membase?". Retrieved 2020-10-29.
6. Holt, Greg (February 2011). "Building a Consistent Hashing Ring". openstack.org. Retrieved 2019-11-17.
7. DeCandia, G.; Hastorun, D.; Jampani, M.; Kakulapati, G.; Lakshman, A.; Pilchin, A.; Sivasubramanian, S.; Vosshall, P.; Vogels, Werner (2007). "Dynamo: Amazon's Highly Available Key-Value Store" (PDF). Proceedings of the 21st ACM Symposium on Operating Systems Principles. 41 (6): 205–220. doi:10.1145/1323293.1294281. Retrieved 2018-06-07.
8. Lakshman, Avinash; Malik, Prashant (2010). "Cassandra: a decentralized structured storage system". ACM SIGOPS Operating Systems Review. 44 (2): 35–40. doi:10.1145/1773912.1773922.
9. "Design – Voldemort". www.project-voldemort.com. Archived from the original on 9 February 2015. Retrieved 9 February 2015. "Consistent hashing is a technique that avoids these problems, and we use it to compute the location of each key on the cluster."
10. "Akka Routing". akka.io. Retrieved 2019-11-16.
11. "Riak Concepts". Archived from the original on 2015-09-19. Retrieved 2016-12-06.
12. "GlusterFS Algorithms: Distribution". gluster.org. 2012-03-01. Retrieved 2019-11-16.
13. Roughgarden, Tim; Valiant, Gregory (2016-03-28). "Modern Algorithmic Toolbox" (PDF). stanford.edu. Retrieved 2019-11-17.
14. Vishnevskiy, Stanislav (2017-07-06). "How Discord Scaled Elixir to 5,000,000 Concurrent Users". Retrieved 2019-11-17.
15. Eisenbud, Daniel E.; Yi, Cheng; Contavalli, Carlo; Smith, Cody; Kononov, Roman; Mann-Hielscher, Eric; Cilingiroglu, Ardas; Cheyney, Bin; Shang, Wentao; Hosein, Jinnah Dylan. "Maglev: A Fast and Reliable Software Network Load Balancer" (PDF). Retrieved 2019-11-17.
Before going into hashing techniques, let us understand certain key terms that are used in hashing.

Hash Table
A hash table is an array that stores pointers to records, indexed by the hashed value of each record's key.

*A hash table uses a hash function to compute the index of the array slot where a record will be inserted or searched. Basically, a hash table has two main components: a hash function and an array.
*An entry in the hash table is null if no existing key hashes to that entry's index.
*Under reasonable assumptions, the average time needed to search for a record in a hash table is O(1).
*The maximum size of the array is chosen according to the amount of data expected to be hashed.
On the previous slide, we introduced the notion of hashing: mapping a piece of data such as a string to some kind of representative integer value. We can then create a map by using this hash as an index into an array of key/value pairs. Such a structure is generally called a hash table: a HashMap in Java, a dictionary in Python, or an unordered_map in C++ (sorry, C users, you will have to implement your own hash map). We saw that using the string length to create the hash, and indexing a simple array, could work in some restricted cases, but is no good in general: for example, we have the problem of collisions (several keys with the same length) and wasted space if a few keys are vastly larger than the majority.

Buckets
Now, we can solve the problem of collisions by having an array of (references to) linked lists rather than simply an array of keys/values. Each little list is generally called a bucket.
Then, we can solve the problem of having an array that is too large simply by taking the hash code modulo a certain array size. So for example, if the array were 32 positions in size (going from 0-31), rather than storing a key/value pair in the list at position 33, we store it at position (33 mod 32) = 1. (In simple terms, we "wrap round" when we reach the end of the array.) So we end up with a structure something like this:
Each node in the linked lists stores a pairing of a key with a value. Now, to look for the mapping for, say, Ireland, we first compute this key’s hash code (in this case, the string length, 7). Then we start traversing the linked list at position 7 in the table. We traverse each node in the list, comparing the key stored in that node with Ireland. When we find a match, we return the value from the pair stored in that node (Dublin). In our example here, we find it on the second comparison. So although we have to do some comparisons, if the list at a given position in the table is fairly short, we’ll still reduce significantly the amount of work we need to do to find a given key / value mapping.
The structure we have just illustrated is essentially the one used by Java’s hash maps and hash sets. However, we generally wouldn’t want to use the string length as the hash code.
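Here is a minimal Python sketch of the chaining structure just described, substituting Python's built-in hash for the string-length hash and plain lists for the linked-list buckets:

```python
class ChainedHashMap:
    def __init__(self, size=32):
        # One bucket (list of key/value pairs) per array position.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)   # "wrap round" with modulo

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                       # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        # Traverse the bucket, comparing stored keys against the query key.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

m = ChainedHashMap()
m.put("Ireland", "Dublin")
print(m.get("Ireland"))    # -> Dublin
```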
*In the next section, let us see how we can convert an object into a hash code more effectively by using proper hash functions, rather than just using the string length.

Hash Functions
A hash function is a function or algorithm that maps any given key to a compact, fixed-size value. The result of a hash function is termed a hash value, or simply a hash.
Properties of a good hash function:
*Should be a one-way algorithm. In simple words, the generated hash value should not be convertible back into the original key.
*Should be efficiently computable. In real applications, if computing the hash value itself takes a long time, then we lose the purpose of hashing.
*Should uniformly distribute the keys among the available slots.
Types of Hash Functions:
*Index Mapping Method: The most trivial form of hashing technique is called "Index Mapping (or Trivial Hashing)". Here:
*We consider an array in which each position corresponds to a possible key in the given set of keys.
*This technique is effective when the number of possible keys is reasonably small, so that allocating one array position for every possible key is affordable.
*Here, the hash function simply takes the input and returns the same value as output, as shown in the sketch below.
*Any retrieval therefore takes only O(1) time.
*Otherwise, this approach is too trivial and inefficient for real-life scenarios: it assumes that keys are integers, whereas in real life we might have data of any type, and even for integers it is unsuitable when the key values are large.
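A minimal sketch of trivial hashing; MAX_KEY and the boolean presence table are illustrative assumptions:

```python
MAX_KEY = 1000                    # assumed upper bound on key values
table = [False] * (MAX_KEY + 1)   # one slot per possible key

def insert(key):
    table[key] = True             # identity hash: slot index == key

def contains(key):
    return table[key]             # O(1) lookup

insert(42)
print(contains(42), contains(7))  # True False
```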
*Division method:
*Here, we map each key to a slot of the hash table by taking the remainder of the key divided by the table size: h(key) = key % table_length, i.e., h(key) = key MODULO table_length (a sketch follows this list).
*This method is quite fast, since it requires only a single division operation, and it is the most commonly used method.
*Things to keep in mind while using the division method:
*Certain table sizes should be avoided for good performance. For example, the table size should not be a power of some number r: if table_length = r^p, then h(key) depends only on the p lowest-order digits (base r) of the key. We should ensure that the hash function depends on all the bits of the key, unless we are sure that all low-order p-bit patterns are equally likely.
*Research suggests that the division method gives the best results when the table size is prime. Note that if r is the radix of the character codes on a system and r % table_length = 1, then the hash of a character-string key reduces to the sum of its character codes modulo table_length, so two keys that are permutations of the same characters hash to the same slot.
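A minimal sketch of the division method, with an illustrative prime table size:

```python
TABLE_LENGTH = 101                     # an illustrative prime table size

def h_division(key: int) -> int:
    return key % TABLE_LENGTH          # h(key) = key mod table_length

print(h_division(12345))               # 12345 % 101 = 23
```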
*Mid-square method:
*Suppose we want to place a record with key 3101 in a hash table of size 2000.
*Here, the key is first squared, and then the middle part of the resulting number is taken as the index into the hash table. In our case: key = 3101 => 3101^2 = 3101 * 3101 = 9616201, i.e. h(key) = h(3101) = 162 (taking the middle 3 digits of 9616201). The record is placed at index 162 of the hash table. (See the sketch below.)
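A minimal sketch of the mid-square method; the digit-extraction details here are one reasonable reading of the description above:

```python
def h_mid_square(key: int, digits: int = 3) -> int:
    # Square the key and take the middle `digits` digits of the result.
    squared = str(key * key)
    mid = (len(squared) - digits) // 2
    return int(squared[mid:mid + digits])

print(h_mid_square(3101))   # 3101^2 = 9616201 -> middle three digits: 162
```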
*Digit folding method:
*Here, the key is divided into separate parts, and these parts are combined using simple operations to produce a hash.
*Consider a record with key 12345678. Divide it into parts, say 123, 456, and 78, then combine the parts by adding them: h(key) = h(12345678) = 123 + 456 + 78 = 657. (See the sketch below.)
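A minimal sketch of digit folding with three-digit parts, matching the example above:

```python
def h_fold(key: int, part_size: int = 3) -> int:
    # Split the key's digits into groups of `part_size` and sum the groups.
    s = str(key)
    parts = [int(s[i:i + part_size]) for i in range(0, len(s), part_size)]
    return sum(parts)

print(h_fold(12345678))   # 123 + 456 + 78 = 657
```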
*Multiplication method: In this method, the hash function is implemented as follows:
*We multiply the key by a real constant c in the range 0 < c < 1 and extract the fractional part of the product.
*Then, this fractional part is multiplied by the size of the hash table, table_size, and the floor of the result is taken as the final hash: h(key) = floor(table_size * fractional(key * c)). Here, floor(x) returns the integer part of the real number x, and fractional(x) = x - floor(x) yields its fractional part. (See the sketch after this list.)
*The main advantage of this method is that the value of the table size (table_size) does not matter. Typically, table_size is chosen as a power of 2, since this is easy to implement on most computing systems.
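A minimal sketch of the multiplication method; the table size and the constant c = (sqrt(5) - 1) / 2, suggested by Knuth, are illustrative choices:

```python
import math

def h_multiplication(key: int, table_size: int = 1024,
                     c: float = (math.sqrt(5) - 1) / 2) -> int:
    # h(key) = floor(table_size * fractional(key * c)).
    # c = (sqrt(5) - 1) / 2 ~= 0.618 is Knuth's suggested constant.
    fractional = (key * c) % 1.0
    return math.floor(table_size * fractional)

print(h_multiplication(123456))
```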
Collisions

*The phenomenon where two keys generate the same hash value under a given hash function is called a collision. A good hash function should keep collisions to a minimum.
*We will look in detail at how to minimise collisions in the next section.

Load Factor
*The load factor of a hash table is the ratio of the number of stored keys n to the number of slots m, that is, load factor = n/m.
