In DTCoreText there is the DTCoreTextParagraphStyle class which represents an Objective-C wrapper around CTParagraphStyle. This has a method createCTParagraphStyle which creates the actual Core Text object to put in attributes of an NSAttributedString. It also knows how to create an NSParagraphStyle, but since this only exists from iOS 6 upwards and lacks a few features we’re still using the Core Text variant everywhere.
Due to the way how DTCoreText works I need to createCTParagraphStyle whenever I am constructing a sub string of the generated attributed string. This led to an unnecessarily large amount of CTParagraphStyle instances being created. So I had implemented a method long time ago to cache thusly created CoreText objects based on the ivars.
Though this was causing some problems in DTRichTextEditor and so I yanked the caching back out. Now the project has developed much further and so I felt I would want to give the caching another go. Here’s something interesting I learned.
As always please don’t hesitate to leave a comment if you have a better method to share …
The previous method of getting a unique key was using a naughty trick. Even Core Foundation objects usually have a description and this I was using as the key:
// this is naughty: CTParagraphStyle has a description NSString *key = [(__bridge id)ctParagraphStyle description]; |
This had several drawbacks. First creating an NSString description for many internal values is rather slow and also for reasons that I did not investigate much further there where scenarios where this would break. Possibly because I have 2 additional items inside the paragraph style that this would ignore: lists and blocks.
NSString as Cache Key
So the first approach I came up with this morning was to create my own NSString to be used as the cache key.
- (NSString *)_cacheKey { NSMutableString *key = [NSMutableString stringWithFormat:@"%d-%f-%f-%f-%f-%f-%f-%d-%f-%f-%f", _alignment, _firstLineHeadIndent, _defaultTabInterval, _paragraphSpacing, _paragraphSpacingBefore, _headIndent, _tailIndent, _baseWritingDirection, _lineHeightMultiple, _minimumLineHeight, _maximumLineHeight]; for (id tab in _tabStops) { CTTextTabRef tabStop = (__bridge CTTextTabRef)tab; CTTextAlignment alignment = CTTextTabGetAlignment(tabStop); double location = CTTextTabGetLocation(tabStop); [key appendFormat:@"-tab:%d-%f", alignment, location]; } for (DTTextBlock *textBlock in _textBlocks) { [key appendFormat:@"-block:%lx", (unsigned long)[textBlock hash]]; } for (DTCSSListStyle *listStyle in _textLists) { [key appendFormat:@"-list:%lx", (unsigned long)[listStyle hash]]; } return key; } |
To prove that this indeed works, I constructed a new unit test case.
@implementation DTCoreTextParagraphStyleTest - (void)testCache { // make a test style DTCoreTextParagraphStyle *paraStyle = [[DTCoreTextParagraphStyle alloc] init]; paraStyle.lineHeightMultiple = 2.0f; paraStyle.headIndent = 30; CTParagraphStyleRef para1 = [paraStyle createCTParagraphStyle]; CTParagraphStyleRef para2 = [paraStyle createCTParagraphStyle]; STAssertEquals(para1, para2, @"Two successife Paragraph Styles should be identical"); // change something paraStyle.tailIndent = -20; CTParagraphStyleRef para3 = [paraStyle createCTParagraphStyle]; STAssertTrue(para2!=para3, @"Paragraph Styles should not be identical after change"); // change back paraStyle.tailIndent = 0; CTParagraphStyleRef para4 = [paraStyle createCTParagraphStyle]; STAssertEquals(para1, para4, @"Paragraph Styles should be identical after change back"); // cleanup CFRelease(para1); CFRelease(para2); CFRelease(para3); CFRelease(para4); } @end |
The point of this test was to first show that creating two CTParagraphStyle instances in succession should return the same result. Then by modifying the underlying DTCoreTextParagraphStyle I would get a different result. Finally by reverting the change I would again get the same result as the first two.
Unfortunately this method is not very fast. Especially if it is called many times over a lengthy HTML document you would see it become a hot spot in Instruments. In this screenshot the marked purple mountain is me parsing the War&Peace HTML in the DTCoreText demo app.
You can see how the createCTParagraphStyle method is calling _cacheKey and this making up the lion share of the execution time in this method (432 ms of 553 ms). If you double-click on it you see a big red sea.
Apparently 97.7% of the samples taken execution was within this stringWithFormat: method. At this point it is quite clear that this is not an optimal approach. I suspect the main reason for lackluster performance on this line is that it takes quite some time to format floating point numbers as NSString.
I googled around a bit further, but couldn’t find a simple, yet easy to implement method for getting a better cache key. Then it dawned on me …
CommonCrypto
Why not simply putting all values in a struct and then creating an md5 checksum over that?
- (id )_cacheKey { NSMutableString *tabsBlocksListsDescription = [NSMutableString string]; for (id tab in _tabStops) { CTTextTabRef tabStop = (__bridge CTTextTabRef)tab; CTTextAlignment alignment = CTTextTabGetAlignment(tabStop); double location = CTTextTabGetLocation(tabStop); [tabsBlocksListsDescription appendFormat:@"-tab:%d-%f", alignment, location]; } for (DTTextBlock *textBlock in _textBlocks) { [tabsBlocksListsDescription appendFormat:@"-block:%x", [textBlock hash]]; } for (DTCSSListStyle *listStyle in _textLists) { [tabsBlocksListsDescription appendFormat:@"-list:%x", [listStyle hash]]; } struct { CGFloat firstLineHeadIndent; CGFloat defaultTabInterval; CGFloat paragraphSpacingBefore; CGFloat paragraphSpacing; CGFloat headIndent; CGFloat tailIndent; CGFloat listIndent; CGFloat lineHeightMultiple; NSInteger alignment; // make it full width, origin is uint8 NSInteger baseWritingDirection; // make it full width, origin is int8 NSUInteger tabsBlocksListsHash; } allvalues = {0,0,0,0,0,0,0,0,0,0,0,0, nil}; // pack all values in the struct allvalues.firstLineHeadIndent = _firstLineHeadIndent; allvalues.defaultTabInterval = _defaultTabInterval; allvalues.paragraphSpacingBefore = _paragraphSpacingBefore; allvalues.paragraphSpacing = _paragraphSpacing; allvalues.headIndent = _headIndent; allvalues.tailIndent = _tailIndent; allvalues.listIndent = _listIndent; allvalues.lineHeightMultiple = _lineHeightMultiple; allvalues.minimumLineHeight = _minimumLineHeight; allvalues.maximumLineHeight = _maximumLineHeight; allvalues.baseWritingDirection = _baseWritingDirection; allvalues.alignment = _alignment; allvalues.tabsBlocksListsHash = [tabsBlocksListsDescription hash]; // create md5 void *cStr = &allvalues; void *digest = malloc(CC_MD5_DIGEST_LENGTH); CC_MD5( cStr, (CC_LONG)sizeof(allvalues), digest); return [NSData dataWithBytesNoCopy:digest length:sizeof(digest) freeWhenDone:YES]; } |
There are some things that – due to their dynamic nature – still have to go in a string. Then I am putting all numbers into the allvalues struct. The dynamic string is represented by its hash. CC_MD5 creates an MD5 digest/checksum of a few bytes length. Finally we transfer malloc’ed digest into an NSData which takes over the responsibility of freeing the bytes when it is being deallocated.
Note how I am changing the width of baseWritingDirection and alignment in the struct. The reason for this – as I found out later – is that the compiler apparently pads narrower values to word boundaries, probably for performance reasons. If you leave these as 1-byte-wide then the assignment zeroes out the member variable, but the padding bytes are still random values.
This is the fasted method of getting bytes into an Objective-C object because it uses the bytes as they are without needing to copying them off somewhere. Speaking of copying, NSData kindly implements the NSCopying protocol which is needed if you want to use it as a cache key. Keys for NSCache and NSDictionary need to be copyable, i.e. implement the NSCopying protocol.
Another thing I learned only by finding that some other unit tests would suddenly break with the new implementation. I didn’t zero the contents of the struct instead relying on LLVM doing that for me. But apparently there are some instances where this is not true. By setting all values to 0 first I was able to get all unit tests back to working.
How does it Measure Up?
This can easily be determined by using CFAbsoluteTime to measure the duration it takes to execute this method.
- (CTParagraphStyleRef)createCTParagraphStyle { CFAbsoluteTime startTime = CFAbsoluteTimeGetCurrent(); id cacheKey = [self _cacheKey]; CFAbsoluteTime endTime = CFAbsoluteTimeGetCurrent(); NSLog(@"time: %f", endTime - startTime); //... |
Putting a few results for the same UI activity next to each other we find that the md5 method is 5 times as fast as the stringWithFormat: approach.
I’ll take the left column please, thank you.
Inspecting this in Instruments paints the same picture. Time spent in _cacheKey dropped to around 100 ms. Much better.
NSData Direct?
There is even another way to use NSData as cache key. Apparently NSData is using up to 80 bytes from its bytes payload as good hash as we can see in Apple’s Open Source code. They use an CFHashBytes macro which uses the ELF hashing algorithm, as explained in this question on Stack Overflow. Thanks Saurabh Garg for pointing this out to me!
This basically means that if the number of your ivar bytes is guaranteed to be less than 80, then you can also directly wrap the struct into an NSData and use this as key instead.
My first try actually seemed to reduce performance again, because having to unnecessarily copy around bytes takes a bit of time. The duration of _cacheKey went up from 100 to around 120 ms.
So I rolled up my sleeves and thought back to a time when I was still creating C-structures dynamically with malloc. Since we know that the struct’s data will be put into an NSData anyway there is no need to let the compiler create an allvalues variable on the stack (as all temporary variables are) but use malloc to create it on the heap right from the start.
Instead of creating a new variable we just define a variable type allvalues_t. Then we use the arrow operator to fill in the member variables.
// ... // a struct that takes on all sub-values typedef struct { CGFloat firstLineHeadIndent; CGFloat defaultTabInterval; CGFloat paragraphSpacingBefore; CGFloat paragraphSpacing; CGFloat headIndent; CGFloat tailIndent; CGFloat listIndent; CGFloat lineHeightMultiple; CGFloat minimumLineHeight; CGFloat maximumLineHeight; CTTextAlignment alignment; CTWritingDirection baseWritingDirection; NSUInteger tabsBlocksListsHash; } allvalues_t; allvalues_t *allvalues = malloc(sizeof(allvalues_t)); *allvalues = (allvalues_t){0,0,0,0,0,0,0,0,0,0,0,0, nil}; //memset(allvalues, 0, sizeof(allvalues_t)); // pack all values in the struct allvalues->firstLineHeadIndent = _firstLineHeadIndent; allvalues->defaultTabInterval = _defaultTabInterval; allvalues->paragraphSpacingBefore = _paragraphSpacingBefore; allvalues->paragraphSpacing = _paragraphSpacing; allvalues->headIndent = _headIndent; allvalues->tailIndent = _tailIndent; allvalues->listIndent = _listIndent; allvalues->lineHeightMultiple = _lineHeightMultiple; allvalues->minimumLineHeight = _minimumLineHeight; allvalues->maximumLineHeight = _maximumLineHeight; allvalues->baseWritingDirection = _baseWritingDirection; allvalues->alignment = _alignment; allvalues->tabsBlocksListsHash = [tabsBlocksListsDescription hash]; return [NSData dataWithBytesNoCopy:allvalues length:sizeof(allvalues_t) freeWhenDone:YES]; } |
Note the commented out memset. My experiments have shown it to be slightly slower than setting all values via the curly-bracketed list.
The result in Instruments is another saved 20 ms which causes _cacheKey to no longer appear in the list of hot spots in the right-hand panel.
It is a bit further down with 84ms. In fact you can see NSCache’s objectForKey take up even more time than our highly optimized _cacheKey method function. I would call this a good optimization success.
Conclusion
If you need a fast hashing function and you have less than 80 bytes you can wrap a temporary struct into an NSData and use the built-in hashing function which is slightly faster than even the highly optimized MD5 function from CommonCrypto.
For more than 80 bytes of key data you cannot do anything wrong with and MD5. No need to convert that into a string since NSData implements NSCopying and is fine for using as a cache key as well. Constructing long NSStrings from your key data should be the least favored option since the performance gap is as wide as a factor of 5.
Another thing though that I learned from my experiments today is that this kind of optimization needs a good set of unit tests to immediately know if your wonderful optimization doesn’t actually break something. I only noticed the need to zero the struct values from suddenly my Right-to-Left writing direction tests failing.
Categories: Administrative
Why not use calloc() if you want the memory block initialised to zero? The pattern of malloc/memset should pretty much always be outperformed by calloc(). After writing that, I wonder whether there isn’t something lurking somewhere there still – since you populate every value in the structure by hand anyway, there shouldn’t be any variable bits to affect your right-to-left test.
Since you are using the raw bytestream as your key, but are only explicitly initialising the individual fields, it is possible there are interstitial bits due to odd-ball packing of the fields to keep them “long-aligned”. All that should result in is different hash value
And finally, since you are trying to shave milliseconds, why not go the whole hog and use CFData – avoid the objective-C messaging overhead? Or better, a CFDictionary with a custom CFDictionaryHashCallBack that knows how to read the values directly from the DTCoreTextParagraphStyle (or at least the allvalues_t)?
Thanks for your comments. The point of this post was that any string operation is slower than a numeric one.
This says calloc is slower: http://stackoverflow.com/questions/1538420/c-difference-between-malloc-and-calloc
I felt the post was about improving your ability to cache arbitrary data structures, and most people do not realise what CFDictionary is actually capable of. Apple work hard to make them toll-free bridged, and people don’t realise that the underlying core dictionary can do far more than a regular NSDictionary.
With respect to stackoverflow and calloc(), I would measure performance rather than rely on heresay. Operating systems are quite capable of serving up zero’d pages to applications that ask for them – the linker relies on this to quickly load programs into memory.