Piano Guidance
Photo by Juan Pablo Serrano Arenas (Pexels)

What happens if two keys have the same hash code in a HashMap?

Whenever two different objects have the same hash code, we call this a collision. A collision is nothing critical; it just means that there is more than one object in a single bucket, so a HashMap lookup has to compare the objects in that bucket with equals to find the right one.

In Java, every object has a hashCode method that is simple to understand but is still sometimes forgotten or misused. Here are three things to keep in mind to avoid the common pitfalls. An object’s hash code allows algorithms and data structures to put objects into compartments, just like letter types in a printer’s type case. The printer puts all “A” types into the compartment for “A”, and he looks for an “A” only in this one compartment. This simple system lets him find types much faster than searching in an unsorted drawer. That’s also the idea of hash-based collections, such as HashMap and HashSet. In order to make your class work properly with hash-based collections and other algorithms that rely on hash codes, all hashCode implementations must stick to a simple contract.
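To make the type-case analogy concrete, here is a toy sketch that sorts strings into a fixed number of compartments by hash code and searches only one compartment on lookup. It is a simplified illustration, not how java.util.HashMap is implemented internally; the class name TypeCase and the size of 16 compartments are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "type case": a simplified stand-in for hash-based collections,
// not the real HashMap/HashSet implementation.
class TypeCase {
    private final List<List<String>> compartments = new ArrayList<>();

    TypeCase(int size) {
        for (int i = 0; i < size; i++) {
            compartments.add(new ArrayList<>());
        }
    }

    // The hash code picks the compartment; floorMod keeps the index
    // non-negative even when hashCode() returns a negative int.
    int compartmentOf(String s) {
        return Math.floorMod(s.hashCode(), compartments.size());
    }

    void add(String s) {
        compartments.get(compartmentOf(s)).add(s);
    }

    // Only one compartment is searched; List.contains uses equals() to
    // pick the right entry among the few items sharing that compartment.
    boolean contains(String s) {
        return compartments.get(compartmentOf(s)).contains(s);
    }

    public static void main(String[] args) {
        TypeCase typeCase = new TypeCase(16);
        typeCase.add("A");
        typeCase.add("B");
        System.out.println(typeCase.contains("A")); // true
        System.out.println(typeCase.contains("Z")); // false
    }
}
```

Because lookup relies on the hash code to choose the compartment, an object whose hash code disagrees with its equals method ends up being searched for in the wrong place, which is exactly what the contract below prevents.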

The hashCode contract

The contract is explained in the hashCode method’s JavaDoc. It can be roughly summarized with this statement: Objects that are equal must have the same hash code within a running process.

Please note that this does not imply the following common misconceptions:

Unequal objects must have different hash codes – WRONG!

Objects with the same hash code must be equal – WRONG!

The contract allows for unequal objects to share the same hash code, such as the “A” and “µ” objects in the sketch above. In math terms, the mapping from objects to hash codes doesn’t have to be injective, let alone bijective. This is obvious because the number of possible distinct objects is usually bigger than the number of possible hash codes (2^32). Edit: In an earlier version, I mistakenly stated that the hashCode mapping must be injective, but doesn’t have to be bijective, which is obviously wrong. Thanks Lucian for pointing out this mistake!
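The counting argument can be checked with a JDK class. java.lang.Long documents its hash code as (int)(value ^ (value >>> 32)), which folds 2^64 possible values into 2^32 ints, so unequal Long values must share hash codes. The sketch below (class name illustrative) shows three unequal values that all hash to 0:

```java
// Distinct Long values that share a hash code: unavoidable, since 2^64
// possible values are mapped onto only 2^32 possible int hash codes.
class LongCollisions {
    public static void main(String[] args) {
        Long a = 0L;
        Long b = 4_294_967_297L; // 2^32 + 1
        Long c = -1L;

        System.out.println(a.equals(b)); // false
        System.out.println(a.hashCode() + " " + b.hashCode() + " " + c.hashCode()); // 0 0 0
    }
}
```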

This contract directly leads to the first rule:

1. Whenever you implement equals, you MUST also implement hashCode

If you fail to do so, you will end up with broken objects. Why? An object’s hashCode method must take the same fields into account as its equals method. By overriding the equals method, you’re declaring some objects as equal to other objects, but the hashCode method inherited from Object treats all objects as different. So you will have equal objects with different hash codes. For example, calling contains() on a HashSet (or containsKey() on a HashMap) will usually return false, even though an equal object has been added. How to write a good hashCode function is beyond the scope of this article; it is perfectly explained in Joshua Bloch’s popular book Effective Java, which should not be missing from any Java developer’s bookshelf.
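Here is a minimal sketch of the failure and the fix, using a hypothetical two-field value class (the names BrokenPoint, Point and EqualsHashCodeDemo are illustrative, not from any library). Objects.hash is just one convenient way to combine the fields; Effective Java discusses hand-written alternatives.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // BUG: overrides equals but not hashCode, so equal instances
    // (almost always) get different identity-based hash codes.
    static final class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BrokenPoint)) return false;
            BrokenPoint p = (BrokenPoint) o;
            return x == p.x && y == p.y;
        }
    }

    // FIX: hashCode uses exactly the fields that equals compares.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<BrokenPoint> broken = new HashSet<>();
        broken.add(new BrokenPoint(1, 2));
        // Typically prints false: the equal copy has a different hash code.
        System.out.println(broken.contains(new BrokenPoint(1, 2)));

        Set<Point> fixed = new HashSet<>();
        fixed.add(new Point(1, 2));
        System.out.println(fixed.contains(new Point(1, 2))); // true
    }
}
```

Since Java 16, a record such as `record Point(int x, int y) {}` gets equals and hashCode generated from the same components, which avoids this mistake for simple value classes.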

To protect yourself, you can also configure Eclipse to detect violations of this rule and display errors for classes that implement equals but not hashCode. Unfortunately, this option is set to “Ignore” by default. To change it, go to Preferences > Java > Compiler > Errors/Warnings and use the quick filter to search for “hashcode”.

HashCode collisions

As Laurent points out, EqualsVerifier is a great tool to verify the contract of hashCode and equals. You should consider using it in your unit tests. Whenever two different objects have the same hash code, we call this a collision. A collision is nothing critical; it just means that there is more than one object in a single bucket, so a HashMap lookup has to compare the objects in that bucket with equals to find the right one. A lot of collisions will degrade the performance of a system, but they won’t lead to incorrect results. But if you mistake the hash code for a unique handle to an object, e.g. use it as a key in a Map, then you will sometimes get the wrong object. Even though collisions are rare, they are inevitable. For example, the Strings "Aa" and "BB" produce the same hashCode: 2112. Therefore:
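The following sketch (class name hypothetical) contrasts keying a map by hash code with keying it by the object itself, using the "Aa"/"BB" pair from the text:

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeAsKey {
    public static void main(String[] args) {
        // WRONG: treating the hash code as a unique id.
        Map<Integer, String> byHash = new HashMap<>();
        byHash.put("Aa".hashCode(), "Aa");
        byHash.put("BB".hashCode(), "BB"); // same key 2112: silently replaces "Aa"
        System.out.println(byHash.get("Aa".hashCode())); // BB

        // RIGHT: use the object itself as the key; HashMap resolves
        // collisions internally by comparing keys with equals.
        Map<String, String> byValue = new HashMap<>();
        byValue.put("Aa", "Aa");
        byValue.put("BB", "BB");
        System.out.println(byValue.get("Aa")); // Aa
    }
}
```

Note that the wrong version fails silently: no exception, just a lost entry and a wrong answer.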

2. Never misuse hashCode as a key

You may object that, unlike the printer’s type case, in Java there are 4,294,967,296 compartments (2^32 possible int values). With 4 billion slots, collisions seem to be extremely unlikely, right?

Turns out that it’s not so unlikely. Here’s the surprising math of collisions: imagine 23 random people in a room. How would you estimate the odds of finding two fellows with the same birthday among them? Pretty low, because there are 365 days in a year? In fact, the odds are about 50%! And with 50 people it’s a safe bet (about 97%). This phenomenon is called the Birthday paradox. Transferred to hash codes, this means that with 77,163 different objects, you have a 50/50 chance of a collision, assuming an ideal hashCode function that distributes objects evenly over all available buckets.
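These figures can be reproduced with the standard birthday-problem product: the probability that n items all land in different slots out of N is (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N), which is roughly exp(-n(n-1)/(2N)). A small sketch, assuming uniformly distributed hash codes just as the text does (class and method names are illustrative):

```java
// Probability that at least two of n items share a slot when each item
// lands uniformly at random in one of `slots` slots (birthday problem).
class BirthdayOdds {
    static double collisionProbability(long n, double slots) {
        double noCollision = 1.0;
        for (long i = 0; i < n; i++) {
            // The (i+1)-th item must avoid the i slots already taken.
            noCollision *= (slots - i) / slots;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.printf("23 people: %.3f%n", collisionProbability(23, 365));
        System.out.printf("50 people: %.3f%n", collisionProbability(50, 365));
        System.out.printf("77,163 hash codes: %.3f%n",
                collisionProbability(77_163, Math.pow(2, 32)));
    }
}
```

Run as written, it prints roughly 0.507, 0.970 and 0.500.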

Example:

The Enron email dataset contains 520,924 emails. Computing the String hash codes of the email contents, I found 50 pairs (and even 2 triples) of different emails with the same hash code. For half a million strings, this is a pretty good result. But the message here is: if you have many data items, collisions will occur. If you were using the hashCode as a key here, you would not immediately notice your mistake. But a few people would get the wrong mail.
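The article does not show how the Enron hash codes were compared. One straightforward approach, sketched here with a hypothetical CollisionCounter class and placeholder strings standing in for the email bodies, is to group distinct texts by hashCode() and keep the groups with more than one member:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Groups distinct strings by hashCode() and returns groups with 2+ members.
class CollisionCounter {
    static List<Set<String>> findCollisions(Iterable<String> texts) {
        Map<Integer, Set<String>> byHash = new HashMap<>();
        for (String text : texts) {
            // A Set per hash code, so duplicate texts aren't counted as collisions.
            byHash.computeIfAbsent(text.hashCode(), h -> new HashSet<>()).add(text);
        }
        List<Set<String>> collisions = new ArrayList<>();
        for (Set<String> group : byHash.values()) {
            if (group.size() > 1) { // size 2 = a pair, size 3 = a triple, ...
                collisions.add(group);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Placeholder input; a real run would load the email contents here.
        List<String> texts = List.of("Aa", "BB", "hello", "Aa");
        System.out.println(findCollisions(texts));
    }
}
```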

HashCodes can change

Finally, there’s one important detail in the hashCode contract that can be quite surprising: hashCode does not guarantee the same result in different executions. Let’s have a look at the JavaDoc: “Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.” Hash codes that actually change between executions are uncommon; in fact, some classes in the class library even specify the exact formula they use to calculate hash codes (e.g. String). For these classes, the hash code will always be the same. But while most hashCode implementations provide stable values, you must not rely on it. As this article points out, there are Java libraries that actually return different hashCode values in different processes, and this tends to confuse people. Google’s Protocol Buffers is an example. Therefore, you should not use the hash code in distributed applications. A remote object may have a different hash code than a local one, even if the two are equal.
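String is one of the classes whose JavaDoc pins down the formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below recomputes it by hand (the helper stringHash is hypothetical) to show that such a hash code depends only on the string’s contents, whereas no such promise is made for Object’s default hashCode:

```java
// String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1] (int arithmetic).
class StringHashFormula {
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule; overflow wraps exactly as int arithmetic does.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hashCode contract";
        System.out.println(stringHash(s) == s.hashCode()); // true
        // Object's default hashCode has no documented formula and
        // typically differs from one run to the next.
        System.out.println(new Object().hashCode());
    }
}
```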

3. Do not use hashCode in distributed applications

Moreover, you should be aware that the implementation of a hashCode function may change from one version to another. Therefore your code should not depend on any particular hash code values. For example, you should not use the hash code to persist state. Next time you run the application, the hash codes of the “same” objects may be different. The best advice is probably: don’t use hashCode at all, except when you create hash-based algorithms.

An alternative: SHA1

You may know that cryptographic hash codes such as SHA1 are sometimes used to identify objects (Git does this, for example). Is this also unsafe? No. SHA1 produces 160-bit hash values, which makes accidental collisions virtually impossible. Even with a gigantic number of objects, the odds of a collision in this space are far below the odds of a meteor crashing the computer that runs your program. This holds for accidental collisions only: deliberately crafted SHA1 collisions were demonstrated in 2017, so security-sensitive code should prefer a stronger hash such as SHA-256. This article has a great overview of collision probabilities. There’s probably more to say about hash codes, but these seem to be the most important things. If there’s anything I’ve missed, I’m happy to hear about it!
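A minimal sketch of a content-based SHA1 identifier using the JDK’s MessageDigest; unlike hashCode, the result is the same in every process and on every machine. The class and method names are illustrative. Git itself hashes a small header together with the content, so this is not the same value as a Git object id, and switching the algorithm name to "SHA-256" gives the stronger variant.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Id {
    // Stable 160-bit content id as 40 hex characters, independent of JVM or process.
    static String sha1Hex(String content) {
        try {
            // Every Java platform implementation is required to support SHA-1.
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc")); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```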
