When you are a newbie at programming Objective-C you might find some things confusing when you start using strings. Coming from C, you were used to zero-terminated C-Strings. Coming from other languages you might be challenged by the fact that there is no implicit type conversion like, for example, in BASIC.
In regular C, strings are pointers of type “char *”, meaning the memory address of the first one-byte character. The length of a C-String is determined by a binary zero ('\0') at the end of it. Objective-C rarely uses those; instead NSString means the world to us.
The fundamental thing to realize first is that you are always dealing with pointers, that is addresses in memory, when using objects (instances of classes). So it simply does not make sense to compare strings with the == operator. Two variables pointing to NSString might or might not actually point to the same instance. (Actually the same is true for C-Strings, because the same text might or might not be contained in different memory regions referenced by char * pointers.)
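To make the C-String side of that remark concrete, here is a minimal sketch. The helper name cStringsHaveSameText is made up for this illustration; strcmp is the standard C function that compares the characters rather than the addresses.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper: compares the text of two C-Strings, not their addresses.
static BOOL cStringsHaveSameText(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

int main(void) {
    @autoreleasepool {
        const char *literal = "one";
        char buffer[4];
        strcpy(buffer, "one"); // same text, but stored in separate memory

        // == only compares the two addresses, which differ here
        NSLog(@"same address: %d", literal == buffer);                     // 0
        // strcmp walks both strings up to the terminating '\0'
        NSLog(@"same text: %d", cStringsHaveSameText(literal, buffer));    // 1
    }
    return 0;
}
```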
    NSString *oneString = @"one";
    NSString *twoString = [NSString stringWithFormat:@"%@", oneString];
    NSString *threeString = [NSString stringWithCString:"one" encoding:NSUTF8StringEncoding];
    NSString *fourString = @"one"; // fourString is optimized to be the same instance as oneString

    NSLog(@"strings 1: %@, 2: %@, 3: %@, 4: %@", oneString, twoString, threeString, fourString);
    // %p prints a pointer; %x would truncate the address on 64-bit systems
    NSLog(@"address 1: %p, 2: %p, 3: %p, 4: %p", oneString, twoString, threeString, fourString);

This example shows a few different ways of creating a string. The first uses the shorthand @"" literal syntax, the second the NSString class method stringWithFormat: and the third another class method to convert a C-String into an NSString. The literals are constants that the compiler builds into the program and that are never deallocated. The strings returned by the two class methods are autoreleased, so you do not have to release them yourself.
The fourth one is of special interest to us. It turns out the compiler optimizes oneString and fourString to point to the very same memory as proven by the address. This is done for performance reasons but in no way are you allowed to rely on this happening. So this is the one example where == would actually work for comparing strings, if you are only comparing constants. But it’s smarter to always deal with strings in a manner that works for all of them.
This brings us to the two methods of comparison that I have seen so far, and been confused by:

    // isEqual: accepts any object; it returns YES if you pass a string with the same text
    if ([oneString isEqual:twoString]) {
        NSLog(@"isEqual: says equal, hashes are %lu and %lu",
              (unsigned long)[oneString hash], (unsigned long)[twoString hash]);
    }

    if ([oneString isEqualToString:threeString]) {
        NSLog(@"the normal method of comparing is isEqualToString:");
    }

Using the instance method isEqualToString: is the obvious choice, as you know from RTFM. But sometimes we also see isEqual:. Is this legal? Until I knew better I would go and replace every isEqual: with isEqualToString: to be safe. I didn't really know what the shorter one does, and why not use the intended method anyway?
If you dig in the documentation you find that isEqual: is part of the NSObject protocol, together with hash. The default implementation in NSObject simply compares the two pointers, so two objects are only equal if they are the very same instance. The hash is an integer value returned from [myObject hash], which every object has. It is ordinarily used when adding objects to container classes, and the rule is that two objects which are equal according to isEqual: must also return the same hash. The reverse does not hold: two different objects may happen to share a hash.
NSString overrides both methods, such that the same text will have the same hash and isEqual: compares the actual text. Therefore isEqual: works when comparing NSString. isEqual: does not care which class you pass as parameter, the parameter is of type “id”. If you pass something that is not a string you simply get NO. The price is that it has to look at the class of the parameter first, which is why the documentation for isEqualToString: calls it the faster way when you know that both objects are strings. So there is no performance trick to be gained from preferring isEqual:, and I have yet to see a case where I would give up the readability of the explicit method for it.
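Here is a small sketch of how the two methods behave when you are not sure what kind of object you have been handed. The helper stringMatchesObject is a made-up name for this example. It checks the class itself before calling isEqualToString:, which is roughly the kind of check isEqual: has to make for you.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: NO for anything that is not a string,
// otherwise a plain text comparison.
static BOOL stringMatchesObject(NSString *string, id other) {
    if (![other isKindOfClass:[NSString class]]) {
        return NO;
    }
    return [string isEqualToString:other];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"123";
        NSString *built = [NSString stringWithFormat:@"%d", 123];
        id number = [NSNumber numberWithInt:123]; // not a string at all

        NSLog(@"isEqual: with string: %d", [text isEqual:built]);           // 1
        NSLog(@"isEqualToString: %d", [text isEqualToString:built]);        // 1
        NSLog(@"isEqual: with number: %d", [text isEqual:number]);          // 0
        NSLog(@"helper with number: %d", stringMatchesObject(text, number)); // 0
    }
    return 0;
}
```

If you already know both sides are NSString, calling isEqualToString: directly says what you mean and skips the class check.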
The same is also true for the mutable version of NSString: NSMutableString. If you modify a mutable string, its hash changes along with its text, and therefore isEqual: continues to work. The example below shows this. So we can infer that the hash is calculated from the current contents, either every time the string is changed or at least once you query it.

    NSMutableString *oneString = [NSMutableString stringWithString:@"one"];
    NSString *twoString = @"one";

    // hashes are same
    NSLog(@"Hashes: 1:%lu 2:%lu", (unsigned long)[oneString hash], (unsigned long)[twoString hash]);

    [oneString appendString:@"more"];

    // modifying mutable string also modifies hash
    NSLog(@"Hashes: 1:%lu 2:%lu", (unsigned long)[oneString hash], (unsigned long)[twoString hash]);

    // remove appended string again
    [oneString deleteCharactersInRange:NSMakeRange(3, 4)];

    // hashes are same again
    NSLog(@"Hashes: 1:%lu 2:%lu", (unsigned long)[oneString hash], (unsigned long)[twoString hash]);

But even knowing this stuff does not change the fact that isEqualToString: is the method to use. I continue to use it as the sole trusted comparator for all my string needs. I don’t want to have to think about hash voodoo.
Finally, it wouldn’t be a post by me if I didn’t think of a way to use a class category as well. How about comparing a string with an integer? There are two ways to go about this: convert the string to an integer and compare that, or have a class category for it.
For the first “usual” method we use the NSString intValue instance method.
    NSString *one = @"123";
    NSInteger i = 123;

    if ([one intValue] == i) {
        NSLog(@"Same");
    }
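One caveat with this method: intValue reads as many leading digits as it can and ignores whatever follows, so @"123abc" also compares equal to 123. If that matters to you, a stricter check could look like the following sketch. The function name stringIsExactlyInteger is invented for this example. It uses NSScanner to make sure nothing is left over after the number.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if the whole string is one integer equal to expected.
static BOOL stringIsExactlyInteger(NSString *string, NSInteger expected) {
    NSScanner *scanner = [NSScanner scannerWithString:string];
    NSInteger value = 0;

    if (![scanner scanInteger:&value]) {
        return NO; // no number at the start
    }
    if (![scanner isAtEnd]) {
        return NO; // something other than whitespace follows the number
    }
    return value == expected;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"intValue: %d", [@"123abc" intValue] == 123);             // 1
        NSLog(@"strict: %d", stringIsExactlyInteger(@"123abc", 123));    // 0
        NSLog(@"strict: %d", stringIsExactlyInteger(@"123", 123));       // 1
    }
    return 0;
}
```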
For the second “sophisticated” method we extend NSString to also know how to do comparisons with integers.
NSString+integer.h
    #import <Foundation/Foundation.h>

    @interface NSString (integer)

    - (BOOL)isEqualToInteger:(NSInteger)i;

    @end

NSString+integer.m

    #import "NSString+integer.h"

    @implementation NSString (integer)

    - (BOOL)isEqualToInteger:(NSInteger)i
    {
        return [self intValue] == i;
    }

    @end
This packages the ugliness away into a separate file and henceforth we can use our pretty comparator like this:
    NSString *one = @"123";
    NSInteger i = 123;

    if ([one isEqualToInteger:i]) {
        NSLog(@"Same");
    }

In the same way you can construct comparators for any kind of parameter, be it a scalar or an object pointer. Of course the category above could instead take an NSNumber. You would make sure that it’s really an NSNumber being passed by checking the class at the top of the method, and return NO if it’s not a descendant of NSNumber.
But this is left as an exercise to you. Do you know of any other traps related to NSString that I have yet to discover? Let me know in the comments.
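For anyone who wants to compare notes on the NSNumber exercise, here is one possible sketch. The category name NumberComparison and the method name isEqualToNumberValue: are invented for this example, and it only compares integer values, so a number like 1.5 is treated as 1.

```objc
#import <Foundation/Foundation.h>

@interface NSString (NumberComparison)
// Hypothetical method: takes id so that the class check below is meaningful.
- (BOOL)isEqualToNumberValue:(id)number;
@end

@implementation NSString (NumberComparison)

- (BOOL)isEqualToNumberValue:(id)number
{
    // refuse anything that is not an NSNumber or a descendant of it
    if (![number isKindOfClass:[NSNumber class]]) {
        return NO;
    }
    // assumption: comparing the integer values is good enough here
    return [self integerValue] == [(NSNumber *)number integerValue];
}

@end

int main(void) {
    @autoreleasepool {
        NSString *one = @"123";

        NSLog(@"%d", [one isEqualToNumberValue:[NSNumber numberWithInt:123]]); // 1
        NSLog(@"%d", [one isEqualToNumberValue:[NSNumber numberWithInt:124]]); // 0
        NSLog(@"%d", [one isEqualToNumberValue:@"123"]);                       // 0, not an NSNumber
    }
    return 0;
}
```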