
The Software IP Detective’s Handbook

09.06.2011
Published by: Prentice Hall
ISBN: 0137035330

Reviewer Ratings

Relevance: 4
Readability: 3
Overall: 4


One Minute Bottom Line

If you want to read about IP, this book is genuinely interesting. Don't let a lack of math knowledge scare you off; even then you can learn a lot. And even if you don't want to read about IP, I'd recommend reading the parts on source code differentiation and correlation. They are worth it.

Review

The subject of this book is Software Intellectual Property, specifically software forensics. Intellectual Property (IP) refers to a “creation of the mind”, the result of someone's mental labour.

IP theft can occur intentionally, e.g. to gain an advantage over a competitor, but also unintentionally, e.g. when a programmer reuses code written for one company at another company without permission. Sometimes even an idea for solving a problem is owned by a company that had the idea earlier and holds the rights to it. Implementing it in a different language or with other algorithms doesn't help you then: in software forensics, it's the functionality that counts.

The writer doesn't intend the book to be read front to back. Instead, chapter 1 (or the table presented in that chapter) helps readers pick the parts that are relevant to them at a given moment. The funny thing is that, according to that table, I (as a developer) should read the whole book, except for the introductory part on software.

The first chapters of the book are intended for non-technical readers, such as lawyers and some managers. They offer a nice overview of what software is and of all the forms in which it can be presented (source code, object code, assembly, scripts, programs, etc.).

The chapters on intellectual property (copyright, patents, trade secrets and software forensics) are sometimes a tough, boring read, but I have read other books on this subject, and to be honest, this was the least boring explanation I found. Overall, these chapters deliver what they intend: an understanding of the concepts, which is important if you really want to understand the issues at stake.

After these introductions, four parts are dedicated to recognizing software intellectual property theft: source code differentiation, source code correlation, object code and source/object code correlation, and source code cross-correlation.

The theoretical parts offer lots of mathematical formulas to clarify the subject at hand. The implementation parts are explained using commercial tools written by the writer's company, and the application parts describe sometimes surprising applications of the theory, along with their pros and cons.

Since every programmer knows the “diff” command, source code differentiation isn't too hard to understand, although it is much more than just “diff”. Source code differentiation finds literal similarities and is good for determining statistics about code changes. The next part, source code correlation, describes how to find similarities even after the code has been changed, which is much more useful for finding cases of IP theft and the like. Source code differentiation is compared with a detective's work: it only needs small fragments of similarity, but it doesn't determine guilt or innocence. That is why human detectives are still needed.
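As a rough illustration of the kind of statistics differentiation yields, here is a minimal sketch using Python's standard `difflib` module. The function name `diff_statistics` and the way replaced lines are split into “changed”, “added” and “deleted” are my own assumptions, not the book's definitions or the writer's tools; like `diff`, it only recognizes lines that match literally.

```python
import difflib


def diff_statistics(original: str, modified: str) -> dict[str, int]:
    """Count unchanged, changed, added and deleted lines between two versions."""
    a = original.splitlines()
    b = modified.splitlines()
    stats = {"unchanged": 0, "changed": 0, "added": 0, "deleted": 0}
    # autojunk=False: don't let frequent lines (e.g. "}") be treated as junk.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            stats["unchanged"] += i2 - i1
        elif tag == "replace":
            # Pair up old and new lines as "changed"; any surplus is added/deleted.
            old_n, new_n = i2 - i1, j2 - j1
            stats["changed"] += min(old_n, new_n)
            stats["deleted"] += max(old_n - new_n, 0)
            stats["added"] += max(new_n - old_n, 0)
        elif tag == "delete":
            stats["deleted"] += i2 - i1
        elif tag == "insert":
            stats["added"] += j2 - j1
    return stats


v1 = "int a = 1;\nint b = 2;\nreturn a + b;\n"
v2 = "int a = 1;\nint c = 3;\nreturn a + c;\nlog(a);\n"
print(diff_statistics(v1, v2))
# {'unchanged': 1, 'changed': 2, 'added': 1, 'deleted': 0}
```

Note that simply renaming `b` to `c` already turns two lines into “changed”, which is exactly why a literal comparison is of limited use once copied code has been edited.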

As every developer knows, once copied code has had some identifiers renamed, been refactored, had its methods split up, etc., the original is almost no longer recognizable. It is this “almost” that source code correlation tackles.
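To show why correlation can survive changes that defeat a literal diff, here is a hypothetical sketch (a generic technique, not the algorithm from the book): it replaces every identifier with a placeholder and measures the Jaccard similarity of token n-grams. The keyword list, the n-gram size of 4 and the function names are illustrative assumptions for C-like code.

```python
import re

# Illustrative keyword list for C-like code (an assumption, not exhaustive).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "char", "void"}
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalized_tokens(source: str) -> list[str]:
    """Tokenize, replacing every non-keyword identifier with the placeholder 'ID'."""
    return [
        "ID" if IDENTIFIER.fullmatch(tok) and tok not in KEYWORDS else tok
        for tok in TOKEN.findall(source)
    ]


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous token sequences of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def correlation(a: str, b: str, n: int = 4) -> float:
    """Jaccard similarity of normalized token n-grams: 0.0 (none shared) to 1.0."""
    grams_a = ngrams(normalized_tokens(a), n)
    grams_b = ngrams(normalized_tokens(b), n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


original = "int total(int x, int y) { return x + y; }"
renamed = "int sum(int first, int second) { return first + second; }"
unrelated = "while (i < 10) { i = i + 1; }"
print(correlation(original, renamed))    # 1.0
print(correlation(original, unrelated))  # 0.0
```

Because `total`/`x`/`y` and `sum`/`first`/`second` all collapse to `ID`, the renamed copy scores 1.0 even though a line-by-line diff would report it as completely changed. Real tools need far more than this (e.g. handling reordered or split methods), which is where the book's theory comes in.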
The same theoretical background is used for object code correlation and source/object code correlation (comparing source code with object code).
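Comparing source code with object code is harder because most structure is lost in compilation, but some text often survives: string literals and, in unstripped binaries, symbol names. The sketch below assumes exactly that. It extracts printable runs from the binary (similar to the Unix `strings` tool) and counts how many string literals and longer identifiers from the source appear in them. It is a simplified illustration with hypothetical names, not the method the book describes.

```python
import re

# Runs of at least 4 printable ASCII bytes, similar to what the Unix `strings` tool reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Either a double-quoted string literal (group 1) or an identifier of 4+ characters (group 2).
SOURCE_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"|([A-Za-z_]\w{3,})')


def binary_strings(blob: bytes) -> set[str]:
    """Printable text runs extracted from object code."""
    return {run.decode("ascii") for run in PRINTABLE_RUN.findall(blob)}


def source_items(source: str) -> set[str]:
    """String literals and longer identifiers found in source code."""
    items = set()
    for literal, identifier in SOURCE_ITEM.findall(source):
        item = literal or identifier
        if item:
            items.add(item)
    return items


def source_object_overlap(source: str, blob: bytes) -> float:
    """Fraction of source items that appear inside some string from the binary."""
    items = source_items(source)
    if not items:
        return 0.0
    found = binary_strings(blob)
    hits = sum(1 for item in items if any(item in s for s in found))
    return hits / len(items)


source = 'init_license(); log_message("License check failed");'
blob = b"\x00\x01init_license\x00License check failed\x00\xff"
print(source_object_overlap(source, blob))  # 2 of the 3 source items are found
```

Here `init_license` and the literal `"License check failed"` show up in the binary, while `log_message` does not. A high overlap is only a lead worth investigating, not proof of copying.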
Source code cross-correlation looks for commented-out lines of code. Sometimes someone copies code, comments it out, writes new code based on it, and then forgets to delete the commented-out original; cross-correlation is designed to find exactly that.
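A simplified way to picture this search is to take the comments of one program and check whether any of them literally match a line of code in the other. The sketch below assumes C-style `//` and `/* */` comments, normalizes only whitespace, and ignores complications such as comment markers inside string literals; the function names are hypothetical.

```python
import re

LINE_COMMENT = re.compile(r"//(.*)")
BLOCK_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def normalize(text: str) -> str:
    """Collapse runs of whitespace so indentation differences don't matter."""
    return " ".join(text.split())


def comment_lines(source: str) -> list[str]:
    """Text of every // comment and of each line inside /* */ comments."""
    texts = [m.group(1) for m in LINE_COMMENT.finditer(source)]
    for m in BLOCK_COMMENT.finditer(source):
        texts.extend(m.group(1).splitlines())
    return [normalize(t) for t in texts if normalize(t)]


def code_lines(source: str) -> set[str]:
    """Non-empty lines of source with all comments removed."""
    stripped = LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", source))
    return {normalize(line) for line in stripped.splitlines() if normalize(line)}


def commented_out_matches(suspect: str, original: str) -> list[str]:
    """Comments in `suspect` that literally match a line of code in `original`."""
    original_code = code_lines(original)
    return [c for c in comment_lines(suspect) if c in original_code]


original = (
    "int checksum(int *buf, int n) {\n"
    "    int s = 0;\n"
    "    for (int i = 0; i < n; i++) s += buf[i];\n"
    "    return s;\n"
    "}\n"
)
suspect = (
    "int crc(int *data, int len) {\n"
    "    // for (int i = 0; i < n; i++) s += buf[i];\n"
    "    int acc = 0;\n"
    "    while (len--) acc += *data++;\n"
    "    return acc;\n"
    "}\n"
)
print(commented_out_matches(suspect, original))
# ['for (int i = 0; i < n; i++) s += buf[i];']
```

The rewritten function itself shares no code lines with the original, but the forgotten comment still gives it away.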

After all the theoretical and mathematical parts, the part “Detecting software IP theft and infringement” is where it all comes together: how do you use the mathematical models to prove IP theft? As the quote at the start of this part puts it: “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.” (Sherlock Holmes in 'The Sign of Four'). This part is interesting, although I guess few readers will ever be an “expert witness” in court (and after reading this chapter, I'm really glad I never will be). Even with all the math, IMHO, it remains a combination of science, common sense and experience.

The last part, on miscellaneous topics, is, well, miscellaneous. I can see why the writer added these topics, but they don't add much interesting new material.

Published at DZone with permission of its author, Mylene Reiners.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)