Before jumping into this post, be sure to read part one for the background leading up to this point.
Unlike the other topics, working on the domain often feels murky and uncertain because it is typically the most challenging area of work. Infrastructural and performance tuning efforts are by no means easy, but they are often concrete problems. As with the reporting performance example, the solution is not necessarily clear from the start, but the problem is well defined. When you find yourself working on domain problems, figuring out what is being solved can be a significant challenge. It’s hard because speaking about technical things that you are comfortable with, such as databases, event buses, and actors, is not going to help you solve your domain dilemmas until you understand the problem at hand. This is exactly why maximizing effort to understand the problem domain is critical to having a successful technology team and ultimately a successful business.
What does focusing on the domain look like? It often starts with meetings with your stakeholders to visualize workflows. As a developer tasked with building something you might know little about, this is your chance to learn about the business. You guide discussion by asking questions. These questions will look like:
- How does event X happen? What happens before event X?
- When event Y happens, what else happens?
- What if Z occurs? Is it legal for Z to occur? Should X change because of Z?
To continue the RTB example, here are some of the types of questions I had as I first learned about what I was trying to build:
- So the bidder will be a software component listening to bid requests. Where do bid requests come from? What does a bidder need to do to receive bid requests?
- When the bidder receives a bid request, what steps occur to determine the price to bid on the bid request? Does the bidder bid on every bid request or does it ignore some requests?
- Is it legal for a bidder to skip bidding on a bid request? If the bidder does not bid for some amount of time, would this be normal behavior? Should there be inactivity alerts?
It is hard to map out the problem domain and to identify the problems to be solved. But, it should excite you to participate in this process. This is where you learn what makes the business tick and how you can add value to the company.
Let’s assume a picture of the domain has begun to form in your mind and your team begins working on new features in the newly discovered domain. What types of discussions might you have with team members if you are intently focusing on the domain? Reflecting on my own experiences, here are prototypical examples:
- Does it make sense for the bidder to be aware of the entire hierarchical user tree exposed by the configuration system? In the context of the bidder, what is the minimal amount of configuration information needed for the bidder to bid? Can the hierarchy be represented in an alternative form to avoid dealing with N-levels of nested rules?
- It looks like the bidder needs to know if a client has sufficient remaining budget before bidding on behalf of the client. What is the clearest set of events to represent budgeting in the bidder’s context? Does it make sense for the bidder to receive an event each time a bid occurs to decrement available budget? Instead, could it make sense to only send an event when the budget is either completely depleted or replenished?
- In the state machine that determines available client budget, a budget is only exhausted when zero dollars remain. But, the budget also has an end date. Can time also cause a budget to be exhausted? When time expires the budget, should the budget be exhausted or paused? In what contexts is it important to know if a budget is exhausted, but still playing versus being paused?
These questions boil down to asking what is the most correct and expressive way to model your domain. Hopefully you can see how these types of concerns arise in your domain as well. I’m trying to drive home the point that when you focus on your domain, it is largely possible to separate the question of modeling (i.e. the business problem being solved) from the infrastructural concerns.
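To make the budgeting questions concrete, here is one possible way to model them, offered only as an illustrative sketch. The state names, the `Budget` fields, the `BudgetDepleted`/`BudgetReplenished` events, and the decision to treat expiry as a state separate from exhaustion are all assumptions made for the example.

```scala
import java.time.Instant

object BudgetModel {
  // Hypothetical budget states; expiry by time is kept distinct from running out of money.
  sealed trait BudgetState
  case object Active extends BudgetState
  case object Exhausted extends BudgetState // zero dollars remain
  case object Expired extends BudgetState   // the end date has passed

  final case class Budget(remainingCents: Long, endsAt: Instant)

  // Pure function: the state depends only on the budget and the current time.
  def stateAt(budget: Budget, now: Instant): BudgetState =
    if (!now.isBefore(budget.endsAt)) Expired
    else if (budget.remainingCents <= 0L) Exhausted
    else Active

  // Coarse-grained events for the bidder: it only learns when a client
  // stops or resumes having budget, not about every individual bid.
  sealed trait BudgetEvent
  final case class BudgetDepleted(clientId: Long) extends BudgetEvent
  final case class BudgetReplenished(clientId: Long) extends BudgetEvent

  // The bidder's view: the set of client IDs that currently have budget.
  def applyEvent(funded: Set[Long], event: BudgetEvent): Set[Long] = event match {
    case BudgetDepleted(id)    => funded - id
    case BudgetReplenished(id) => funded + id
  }
}
```

With a model like this, the bidder's context needs only a set of funded client IDs, while the richer state machine can live wherever budgets are managed.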
A direct consequence of intense domain focus is a simpler design because there is an intrinsic understanding of the problem being solved. This potentially simplifies work in infrastructure and performance tuning. Here is an example from my experience building an RTB system where clean domain modeling simplified performance tuning. CPU profiling revealed significant latency in the bidding process was due to a set of boolean predicates (i.e. filters) that are invoked each time a bid request arrives to determine if a bid request is biddable. If any filter is unsatisfied, the request must be dropped.
Since significant effort was put into modeling distinct concepts separately and explicitly, bid filtering was already a distinct module (in the Scala sense of the word) without external dependencies. Reasoning about isolated logic is an easy exercise and particularly so because the module exposed a pure function to perform filtering. Taking a step back from the code and thinking about the inputs to the function, my team had an epiphany when we realized that all filtering stems from the bid request pixel width and height. In display advertising, ultimately a bid is placed on behalf of a slot on a page that has a defined width and height. Since my team was familiar with the domain through prior discovery, it was also known that the set of valid widths and heights is finite and static. This domain insight allowed a reformulation of the problem.
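For intuition, on-demand filtering of this kind might look roughly like the sketch below. The `BidRequest` and `Client` shapes and the two sample filters are hypothetical and stand in for the real predicates, which are not described here.

```scala
object OnDemandFiltering {
  // Hypothetical, simplified shapes for illustration only.
  final case class BidRequest(widthPx: Int, heightPx: Int)
  final case class Client(id: Long, creativeSizes: Set[(Int, Int)], active: Boolean)

  // A filter is a boolean predicate over a request and a candidate client.
  type Filter = (BidRequest, Client) => Boolean

  val hasMatchingCreative: Filter =
    (request, client) => client.creativeSizes.contains((request.widthPx, request.heightPx))

  val isActive: Filter =
    (_, client) => client.active

  // Pure function: every filter is evaluated for every client on every request.
  // An empty result means the request is dropped.
  def biddableClients(request: BidRequest, clients: Seq[Client], filters: Seq[Filter]): Seq[Client] =
    clients.filter(client => filters.forall(filter => filter(request, client)))
}
```

Because the only request inputs the predicates read are the width and height, the result for a given slot size changes only when client state changes, not on every request.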
Instead of evaluating a set of filters on-demand in the critical path of bid request evaluation, it is possible to model filtering as a separate workflow that continuously determines the set of biddable clients for a given width and height. The output of this workflow is one of two events:
- ClientBiddable(clientId: Long, creativeWidthPx: Int, creativeHeightPx: Int)
- ClientUnbiddable(clientId: Long, creativeWidthPx: Int, creativeHeightPx: Int)
Over time, clients become biddable or unbiddable contingent upon a whole slew of other events. This means that the bidder can consume just the information it needs to know: What is the set of valid clients for a width and height? It can store this information in whatever way is most efficient for its purposes. In this case, the bidder maintained a map keyed by width and height pointing to a set of client IDs. This solution provided the performance boost needed by replacing predicate evaluation with a constant time lookup. Beyond the performance improvement, the domain is now more cleanly modeled. Since filtering is now an explicit workflow independent of the bid evaluation critical path, it can live outside the bidder process. This likely provides even more performance boosts because now the bidder process is more focused on just one task: bidding.
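A minimal sketch of the bidder's side of this workflow follows. The two event shapes come from the list above; the `SlotSize` key, the `BiddabilityEvent` parent trait, and the function names are assumptions made for the example, and an immutable `Map` is used here only to keep the sketch small.

```scala
object BiddableIndex {
  // Events emitted by the filtering workflow.
  sealed trait BiddabilityEvent
  final case class ClientBiddable(clientId: Long, creativeWidthPx: Int, creativeHeightPx: Int)
    extends BiddabilityEvent
  final case class ClientUnbiddable(clientId: Long, creativeWidthPx: Int, creativeHeightPx: Int)
    extends BiddabilityEvent

  // Hypothetical key type: a slot size in pixels.
  final case class SlotSize(widthPx: Int, heightPx: Int)

  type Index = Map[SlotSize, Set[Long]]
  val empty: Index = Map.empty

  // Fold one event into the index of biddable client IDs per slot size.
  def update(index: Index, event: BiddabilityEvent): Index = event match {
    case ClientBiddable(id, w, h) =>
      val key = SlotSize(w, h)
      index.updated(key, index.getOrElse(key, Set.empty[Long]) + id)
    case ClientUnbiddable(id, w, h) =>
      val key = SlotSize(w, h)
      val remaining = index.getOrElse(key, Set.empty[Long]) - id
      if (remaining.isEmpty) index - key else index.updated(key, remaining)
  }

  // On the critical path: a single lookup instead of predicate evaluation.
  def biddableClients(index: Index, widthPx: Int, heightPx: Int): Set[Long] =
    index.getOrElse(SlotSize(widthPx, heightPx), Set.empty[Long])
}
```

The bidder applies `update` as events arrive and calls `biddableClients` for each bid request; how the events reach the bidder is outside the scope of the sketch.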
There is definitely context missing from this example, which is the issue with describing any complex domain problem. A lot of context is needed to fully understand the situation. But, I hope that this example provides you the intuition for understanding how, beyond learning the business you are in, focusing on the domain can yield substantive improvements in other areas of development work. And I hope I have convinced you to take a fresh look at how you are prioritizing your development efforts to see if you are truly making the most of your development effort.
Think about the types of features or bugs (i.e. hidden features) you’ve worked on recently. If you bucketed your efforts into several high-level topics, what would those topics be? Which topics contain the largest number of features/bugs? Which topics should contain the largest number of features/bugs? That is, are you focusing your efforts where you think you should be?
When I reflected on my work, I came up with the following categories:
- Infrastructure
- Performance tuning
- The domain
Part one of this two-part series covers infrastructure and performance tuning.
The ‘infrastructure’ label covers tasks like:
- Implementing a new architecture. For example, designing a better way to distribute traffic/requests to scale up with increased business volume.
- Working with new libraries or frameworks to accomplish a business goal. A recent example from my own work is to leverage Akka clustering to help build a real-time bidding (RTB) system.
- Refactoring to minimize technical debt. An example from my work is when my team decided to replace a string representation of HTTP request errors with a strongly typed representation. This involved a transformation of numerous return types from String \/ R (where R is the return type when successful) to ApiError \/ R (where ApiError is an ADT). This refactoring simplified reasoning about and handling failed requests. An increased ability to reason about code is a fantastic technical benefit, but it does not bring any immediate benefit to the business. Ideally, value is added to the business when additional work is done in the refactored area because the team will be able to complete the work sooner.
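The refactoring in the last bullet can be sketched as follows. To stay self-contained, the sketch uses the standard library's `Either` in place of the `\/` type, and the error cases, the `User` type, and `fetchUser` are hypothetical names invented for illustration.

```scala
object ApiErrors {
  // Hypothetical ADT of request failures; the real cases are not listed in the text.
  sealed trait ApiError
  final case class BadRequest(reason: String) extends ApiError
  final case class NotFound(resource: String) extends ApiError
  case object Timeout extends ApiError

  final case class User(id: Long, name: String)

  // Before the refactoring the signature would have been Either[String, User].
  def fetchUser(id: Long): Either[ApiError, User] =
    if (id <= 0L) Left(BadRequest(s"invalid id: $id"))
    else if (id == 42L) Right(User(id, "example"))
    else Left(NotFound(s"user/$id"))

  // Because ApiError is sealed, the compiler can check that every failure is handled.
  def statusCode(error: ApiError): Int = error match {
    case BadRequest(_) => 400
    case NotFound(_)   => 404
    case Timeout       => 504
  }
}
```

With a string on the left, a caller can only log or compare text; with an ADT, callers pattern match on the failure and the compiler flags any case that is missed.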
Is it a good use of time to be working mostly on infrastructure? Like everything else in the world: it depends! During the inception of a project, it is inevitable that infrastructure becomes a focal point. Short of major changes in business objectives, the goal is to minimize infrastructural effort over time. To use a road building analogy, when no roads have been built, you need to design some paths and lay some concrete. Extending this analogy, when in the early stages of a project, I caution you to consider how many roads you build and how well you build them. Does the business require a complex interstate highway structure or just a few two-lane roads? What are the risks to the business if instead of building roads that withstand years of tumultuous weather, the roads begin to break down after heavy usage? When building infrastructure, you should frequently revisit these types of concerns to ensure you are adding value to the business instead of building a ghost town.
When I first began working on the previously mentioned RTB system, my initial conversations led to requests for processing many thousands of bid requests per second plus tight latency requirements. When I first heard the desire to support a heavy request load from the onset, alarm bells went off in my mind because I knew accomplishing this goal requires significant infrastructural and performance tuning efforts. From previous posts, you should expect that I immediately questioned why the first iteration of an RTB system needs to be scaled up to handle significant load. As it turns out, when an RTB system is connected to an ad exchange to receive bid requests, there are few ways to filter traffic sent by the ad exchange to a bidding system. This means that the RTB system must be able to at least handle receiving a certain request load without toppling over. Alright, so in this particular scenario, it turns out that the infrastructure effort is justified. This is a great result because now the development team can build confidently knowing it is building according to the demonstrated needs of the business.
Performance tuning is a topic closely related to infrastructure. For the purpose of this discussion, I am treating it distinctly because I want to separate the construction of a component from the measurement and improvement of its runtime characteristics. Over the lifetime of a project you should strive to focus on performance as little as possible. Realize that this implies that at times the bare minimum focus level can be significant.
Looking at another example from my work experience, consider a system that generates on-the-fly, client-facing media buying performance reports in XLS format. Up-front, the business was most concerned with correctness. This makes sense – clients should receive an accurate reflection of spend. Speed of report generation was a secondary concern. As long as reports are generated within minutes or dozens of minutes, the operations team will be able to service clients. Given the volume of data being processed at the start of this effort, the speed concern was a non-issue. Fantastic! At this stage in the project, effectively zero focus on performance is needed, which means that the team can focus on either infrastructure or domain concerns.
Fast-forward several months as business ramps up and now each report covers an order of magnitude more clients than a few months ago. Due to the size increase, reports fail to generate and my team is endangering valuable client business. In an ideal world, report generation time would be monitored and my team would be proactive instead of reactive. This is water under the bridge and now my team is intently focused on performance. A quick triage reveals that the in-memory processing needed to generate reports runs out of heap space. The resolution to this issue should balance immediacy with an understanding of what is core to your problem domain. Looking past expediency for a moment, we come back to the motivating question for this post: What type of work should you be focusing on? Is the company’s secret sauce baked into its reporting system? If yes, then it’s time to rethink implementation and find a more robust solution. If no, then perhaps a re-write is not the most effective use of time.
In my case, reporting is not the competitive advantage my company provides. My team sees more strategically important features to work on instead of reporting, but obviously this issue needs a pretty good band-aid quickly. My team spoke with the client operations team to better estimate the upper bounds of client report size for the foreseeable future. With estimates in hand, my team opted to fix the problem with hardware for at least the short term. In a test environment, report generation was simulated until we were confident that enough RAM had been added to support repeated report generation with maximum report size. In short order, new EC2 instances were spun up in production and more focus can be paid to the final topic: the domain.
Stay tuned for part two where I delve into the domain!
The Business is Right
Over several years, I have recognized that I often conflated “the business is right” with “the business knows what it needs”. I interpreted “the business is right” to imply that because the business team knows its domain, it must know what is needed. Without a doubt, the business team should be relied upon for domain insights (e.g. to help build a ubiquitous language). But, I strongly believe that as a developer, I should treat most feature requests as beliefs rather than requirements.
What’s the Difference?
In software development, the word “requirement” is used liberally to mean that a feature is needed for the business to operate, grow, execute, etc. A requirement implies certainty. A less concise way to express a requirement would be to say, “I am absolutely certain that the business must have feature X in order to accomplish Y.” There are two troubling issues with describing feature requests this way:
- Throughout my experience, feature requests have often failed to live up to the definition of a requirement. Prior to implementation, “must haves” morphed into requests that could be useful down the road, or soon after implementation, “necessities” lost all relevance and were deleted from the code base.
- There is no questioning a true requirement. By definition, it is something that must be done. The above statement provides evidence that the validity of a feature request should be openly challenged. Software developers should not accept all feature requests at face value.
Alternatively, a feature request can be defined as a belief or a hypothesis. Unlike the term “requirement”, these terms explicitly broadcast uncertainty. Using terms that outwardly express doubt makes it clear that an idea is subject to change.
Why Does it Matter?
I believe a seemingly trivial change in vocabulary helps to alter a software developer’s mindset. A software developer strives to build the software right and to build the right software. To do the latter, a developer needs to maintain an inquisitive mindset that challenges the status quo. By shifting my frame of reference from requirements to beliefs, I find it natural to ask questions like:
- Why are we building this feature?
- Do we really need X to complete feature Y?
- Can we just do parts A and B and do parts C, D, and E later?
- Is the business really going to use feature X instead of feature Y now?
Try asking these types of questions next time you work on a feature request. This vetting process typically results in conversations with the business team that change the feature request. The results will vary. Sometimes the feature remains the same, other times just a few small things will be tweaked, and perhaps occasionally, it is discovered that the feature is irrelevant. All of these outcomes are great results because domain knowledge is enriched through communication with the business team.
It’s About Adding Value
Iterative development frequently produces new features, which enables you and the business team to constantly learn more about what is the right software to build. Vetting beliefs and hypotheses until they are as fine-grained as possible enables features to be implemented sooner. The sooner a feature is implemented and demoed, the sooner more insights can be gleaned. Questioning feature requests adds value because it helps to maximize the effectiveness of software development effort.
It took me some time to understand the relationship between questioning feature requests and the ability to iterate quickly in order to deliver business value. I think it’s also important to recognize that similar to improving one’s technical skills, improving one’s vetting ability is a continuous learning process. I hope this post offered a different lens to view the software development process. If you’re interested in more material on the topic, this blog post by Barry O’Reilly is a good read.
You’re excited because you are about to break ground on a new codebase when you encounter your first stumbling block: What do I name this repository? I’m often annoyed by this question because I don’t have a name in mind and I want to dive into writing code. At the same time, I realize this is a pseudo-permanent decision. Begrudgingly, I typically end up giving the repository an extremely generic name, such as “data engine”.
The Meaning of Effective
Without trying to belittle my fellow readers, take a moment to consider why repositories have names. Broadly speaking, a name creates an association between a group of related concepts, characteristics, features and a representation of those concepts in our minds. An effective name creates a mapping that is nearly one-to-one. Which of the two brings a clearer picture to your mind? “A small, four-door sedan produced by a Japanese car manufacturer” or “Corolla”?
The Ineffective Way
Equipped with an understanding of what makes a name effective, let’s return to the case of naming repositories. In my experience, I’ve worked on several “engines”, “processors”, and “servers”. I argue these words are ineffective names because they are too generic to produce a concrete representation in our minds.
Typically, to make these names more “effective”, an adjective is prepended to represent the business functionality. For example, there might be a “trading processor” or a “pricing server”. At best, this is a marginal improvement. Any financial company in the world can have a “trading processor”. How will this name create a unique representation in your mind of the code and functionality contained in that repository? It won’t.
In addition to being an ineffective style of naming, this style can also be detrimental to software development. An important part of the software development process is to adapt to changing requirements. For example, assume the “trading processor” is originally defined to pass trades from an exchange to a client and vice-versa. Now, a new requirement arises to perform risk checks before sending the trade to the exchange. Is it OK to add the risk check logic in the “trading processor”? Of course it is! In this contrived example, it is probably pretty straight-forward to convince yourself that risk checks are part of trading, so it is acceptable to add it to the “trading processor”. However, imagine you worked with a particular definition of “trading” for months or years, then perhaps it would not be so simple to change your point of view. By encoding implied functionality into a repository’s name, you are artificially constraining yourself, which can be counter-productive.
Effective Naming Suggestions
I see a few ways to approach project naming more effectively. I think it’s important to bear in mind that a repository’s name is similar to a brand. A brand can be defined over time to leave a strong impression. Going back to the car example, when I hear, “Corolla”, I don’t think about flowers.
- Find a word or phrase that has an association with the functionality, but is not typically used in a computer science context. For example, Twitter’s snowflake is a service for generating unique IDs. Snowflake implies uniqueness and over time, if you use this code, snowflake will be more memorable than, say, “unique-ID-generator”.
- Use a completely unrelated word or phrase and build the association over time. For example, I use Netflix’s astyanax, which is a Cassandra client. While astyanax does not imply Cassandra client, over time it became straight-forward to associate this relatively unique word with “Netflix Cassandra client”.
- Thematically name repositories with an unrelated set of terms. The idea builds on the previous one to constrain the range of terms that can be used to name a repository. Some examples are bridges, subway stations, and last names of scientists. This idea can be fun if you can relate the theme to your organization’s purpose.
Take a moment to reflect on whether or not your repository names are slowing you down from adapting to new requirements or if your repository name defines a weak association. It will be worth the effort to improve the effectiveness of repository names.
Committing code is an integral part of life as a software developer. In the last eight months alone, I’ve committed nearly 800 times to my work repository. Yet, throughout my professional career, commit commenting style has been rarely discussed. It’s easy to brush away the task of describing your work with ineffective messages like, “Fixed it”. Even worse is to commit without a message. What is a better way to go about writing commit messages?
Knowing Your Audience
Commit messages, like other forms of writing, require you to know your audience. Your audience is anyone on your team, including yourself, at the present time and at any time in the future. To further complicate matters, a majority of the people reviewing your commit are likely unfamiliar with certain regions of the codebase. Given these considerations, how can you structure your messages effectively? I structure my messages by asking two questions:
- What have I done?
- Why did I do it?
The response to the first question typically falls into one of three categories: (1) adding/updating/removing a feature, (2) fixing a bug, or (3) refactoring existing logic. To be effective, this response must be a single succinct and expressive sentence. The response to the second question is the opportunity to elaborate on the change’s intent and context. Until recently, this was a question that I did not ask myself. This led to situations where I could easily understand what had changed in a commit, but it would be difficult or impossible to understand why I had made a change. You do not want to end up in this position while tracking down bugs. Help your future self and others by explaining the rationale for the change. I have found my intent typically revolves around one of two broad categories: (1) business decisions or (2) side-effects.
Example Commit Messages
Let’s look at a few trading-related examples from my recent work to see how this works in practice. Each example is composed of a single sentence explaining what was done, followed by a newline and additional elaboration about why a change was made.
Prevent account balance updates out of chronological order.
Enforce that the previous account balance timestamp must match the preceding trading day’s timestamp. This defends against updating the account balance when trades have not been loaded for all prior trading days.
This example falls into the bug fix category. Notice how an action verb begins the summary sentence explaining what was done. This pattern continues throughout all the examples. Although it might be straight-forward to understand why out of order updates are a bad idea, consider reading this commit among dozens of other commits. Explaining the change’s context reduces the cognitive load required to understand this change.
Warn about stale market data after two minutes.
Previously, the system waited ten minutes to warn in order to avoid issuing many warnings during intraday trading breaks. Since all trading breaks are now modeled within MarketHours, there is no reason to keep the warning threshold at ten minutes.
Here is an example of updating a feature with an extremely straight-forward change that carries significant context. You can imagine this change diff consists of changing a ’10’ to a ‘2’. But, why was this value changed at this time? Without the second part of this commit message, you need to remember why the value was set to 10min and what changed in the system to allow you to change the value to 2min. Now, at a bare minimum, you know that a change in the modeling of trading breaks is why this change is now possible. Again, consider a commit log with only the first sentence and one with the entire message. Which would you find more helpful?
Default to requiring 1+ trades before committing transactions to database.
This change prevents the account balance from being updated when there are zero trades available due to a technical error. An override flag is provided in the event that it is not an error to have zero trades.
To drive home the point that there are often external reasons for a change, the example above shows another straight-forward change. Without reading why the change was made, what would you guess is the reason for the change? Could it be a performance issue? Perhaps business requirements have changed? It’s unclear to me. The second part of this message suggests that the change is due to business requirements. The context indicates that it is likely a technical error (e.g. a remote service was unavailable) to have zero trades available when updating an account balance.
Let’s presume you discovered a bug in this change. Without having any context provided in the commit message, you might assume there is no need for this one or more trade requirement. You may ultimately end up reverting this entire change without reworking the logic to satisfy the business requirement. Now, you ‘fixed’ the bug, but are failing to fulfill your business requirements.
Practice, Practice, Practice
I hope your takeaway from this post is a realization that adding more context to your commit messages will likely simplify your life and your team members’ lives in the future. As you practice answering the two questions when writing your commit message, you may find situations where added context is unnecessary. That’s fine. Not all changes require in-depth explanations. Identifying the right level of detail becomes easier with practice. As you become better at identifying when more detail is needed, you will discover that reviewing your commit messages can be informative rather than mysterious.
When you Can’t Drop the Database
Like high-level requirements, live testing goals appear deceptively simple. “Exercise the trading API implementation with my trading partner” is a single statement that to the trained eye reveals numerous scenarios. Ideally, you already considered most scenarios and wrote automated tests to verify functionality. But, unlike sandbox testing, when something goes wrong in a live testing scenario, clearing a database probably won’t fix anything.
Consider the situation I found myself in as I was testing my trading API implementation. I am on a conference call with my trading partners preparing to manually submit trades. I submit a trade that should not be filled when I am told, “That order shouldn’t have been filled, but you were executed. Close your position immediately!” I need to act quickly. Real money is on the line and people from another company are awaiting my response.
I admit this is a dramatic example. In live testing scenarios, I tend to be a bit tense and nervous. Even with a suite of automated tests, I still fear something will go wrong. I don’t do my best thinking under-the-gun, so I prepare plans for handling unexpected outcomes in advance. This helps keep me calm and in the event something goes wrong, I will hopefully have a fallback available.
How to Prepare
Reflecting on the different areas of preparation led me to the following categories: communication, flexibility, and instrumentation. Let me detail questions I answer in each category to determine readiness.
Communication
Have I explicitly stated the test scope?
While the purpose of a test may seem obvious to you, the third party may be unaware of special cases you wish to exercise. Express your test cases up-front to avoid discovering during the test that a subsequent test is needed for a special case.
Has the third party confirmed that it can take corrective action in the event of X?
X is a disaster scenario where you lose control to fix the situation. For example, if an order erroneously executes and you lose network connectivity, it is imperative that the third party can close a position on your behalf.
Flexibility
How easily can my test harness test each known scenario?
Using the trading example, it is helpful to be able to interactively specify all order properties on-the-fly. It becomes increasingly more important to have a flexible test harness when live tests are hard to coordinate.
In the event of an emergency, what is the fastest way to deploy a new build?
Ideally this option is rarely or never used. But, in the event that you need to build a kludge for testing purposes, knowing the deployment process will minimize delays.
Instrumentation
When I perform test case X, what information will the third-party want me to confirm?
Asking this question of each test scenario is a good way to ensure all the requisite information will be on-hand.
If X goes wrong, what information would help me debug?
When a test scenario goes awry, it is extremely discomforting to discover a piece of relevant information is not visible.
Are log levels configured appropriately?
This sounds trivial, but I’ve missed this item before and nullified all my efforts to view my debug statements.
And a useful tip for testing on remote machines: use Screen! This way when you lose network connectivity at the worst possible moment, your foreground process continues running and you can reconnect and resume as if nothing happened.
I learned some of these lessons the hard way. Learn from my experiences so you don’t fall prey to the same mistakes!
Source code shown in examples is available on GitHub.
If you answered, “double” or “BigDecimal”, I think you can do better. These data types capture the value of the price, but what do they say about its units? Absolutely nothing! I believe this is a big problem. How can you prevent a price expressed in U.S. Dollars from being added to a price expressed in Euros? Naming conventions, like priceInDollars, provide zero enforcement of a policy and result in unwieldy variable names. Ideally, the solution to this problem will provide compile-time safety. Compile-time safety ensures that there will never be an operation that violates dimensional analysis. But, how can we do it?
One approach is to construct a case class for each unit of a price. A trivial example may look like:
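The listing itself is not reproduced here, so the following is only a minimal sketch of the idea. The class names UsdPrice and EurPrice are hypothetical, and only addition and subtraction are implemented:

```scala
// Hypothetical wrapper classes, one per unit of price (illustrative names).
final case class UsdPrice(value: BigDecimal) {
  // Every arithmetic operation has to be re-implemented by hand.
  def +(other: UsdPrice): UsdPrice = UsdPrice(value + other.value)
  def -(other: UsdPrice): UsdPrice = UsdPrice(value - other.value)
}

final case class EurPrice(value: BigDecimal) {
  def +(other: EurPrice): EurPrice = EurPrice(value + other.value)
  def -(other: EurPrice): EurPrice = EurPrice(value - other.value)
}

object CaseClassPrices {
  val total: UsdPrice = UsdPrice(BigDecimal(5)) + UsdPrice(BigDecimal(7))

  // Mixing units is rejected by the compiler (type mismatch):
  // UsdPrice(BigDecimal(5)) + EurPrice(BigDecimal(7))
}
```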
This approach provides the compile-time safety we want, but it is unwieldy for several reasons. Its usage requires an invocation of the ‘value’ member, which adds noise to the code. Each new unit of price requires creating a new class and then implementing every operation (e.g. addition, subtraction, etc.). Fortunately, in Scala, it is possible to leverage language features to come closer to the goal without so much overhead.
In Scala, there is a concept of ‘tagging’ a type to add context to its meaning that is enforced at compile-time. I originally came across this concept in Miles Sabin’s gist, which I use for the examples below. Abstractly, tagged types make use of Scala mixins and structural typing to enable one type, U, to be attached to a type T, such that the API of type T is still accessible. If we are representing a price with tagged types, then T is BigDecimal and type U is the unit of the price.
This is a powerful concept because defining a new unit of price (e.g. U.S. Dollars or Euros) will only involve defining a new type. Using this new type to ‘tag’ a BigDecimal exposes all of the operations (i.e. the API) of BigDecimal. Let’s take a look at a concrete example:
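What follows is a sketch rather than the original listing. The tagging machinery follows the shape of Miles Sabin’s gist, and it is written against Scala 2.x. The marker traits Usd and Eur, the enclosing object name, and the sample prices are assumptions made for illustration:

```scala
object TaggedPrices {
  // Tagging machinery in the style of Miles Sabin's gist.
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  class Tagger[U] {
    // The cast is safe at runtime because the tag exists only at compile time.
    def apply[T](t: T): T @@ U = t.asInstanceOf[T @@ U]
  }
  def tag[U]: Tagger[U] = new Tagger[U]

  // A unit of price is just a marker type (assumed names).
  trait Usd
  trait Eur

  object ProfitCalculator {
    // Only prices tagged with the same unit are accepted.
    def calculateProfit(buy: BigDecimal @@ Usd, sell: BigDecimal @@ Usd): BigDecimal @@ Usd =
      tag[Usd](sell - buy)
  }

  val buyPrice: BigDecimal @@ Usd = tag[Usd](BigDecimal("10.50"))
  val sellPrice: BigDecimal @@ Usd = tag[Usd](BigDecimal("12.00"))
  val profit: BigDecimal @@ Usd = ProfitCalculator.calculateProfit(buyPrice, sellPrice)

  // Mixing units fails to compile:
  // val euros: BigDecimal @@ Eur = tag[Eur](BigDecimal("12.00"))
  // ProfitCalculator.calculateProfit(buyPrice, euros)
}
```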
The example defines a ProfitCalculator to demonstrate how the compiler enforces the type safety. ProfitCalculator is able to calculate profits only with prices of the same unit. As is shown in the commented out code, mixing units of price will cause compilation to fail. Using tagged types, it is possible to perform arithmetic because the underlying referenced value is a BigDecimal. And since the definition of a unit of price is a type instead of a class that references a value, there are no awkward ‘value’ references.
In addition to removing noise from the code, I find this approach valuable because it increases one’s ability to reason about a program. Explicitly referencing the types in the argument list of ProfitCalculator.calculateProfit(), makes it transparent to the API consumer what values are acceptable. In comparison, the case class approach wraps the BigDecimal value, which makes it more challenging to understand the price’s data type.
Units of Measurement
In my opinion, applying tagged types to the representation of a price is a substantial improvement over simply representing it with a BigDecimal. However, one should not mistake tagged types for units of measurement. From the compiler’s point of view, the following is valid:
tag[Usd](BigDecimal(5)) + tag[Eur](BigDecimal(5))
The plus operator acts on a BigDecimal, so here tagged types cannot defend against arithmetic that violates dimensional analysis. The only way to prevent this type of mistake is to implement a system for units of measurement.
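To make the limitation concrete, here is a small self-contained sketch (same gist-style machinery, hypothetical Usd and Eur markers). The mixed sum compiles, and the static type of the result is a plain BigDecimal, so the unit information is silently dropped:

```scala
object TagLimitation {
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  class Tagger[U] {
    def apply[T](t: T): T @@ U = t.asInstanceOf[T @@ U]
  }
  def tag[U]: Tagger[U] = new Tagger[U]

  trait Usd
  trait Eur

  // BigDecimal's + accepts any BigDecimal, tagged or not, and returns an
  // untagged BigDecimal, so the compiler has nothing to object to.
  val mixed: BigDecimal = tag[Usd](BigDecimal(5)) + tag[Eur](BigDecimal(5))
}
```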
This is a topic that I will be exploring. In principle, I believe representing units in code is a good idea. However, in practice, I think it needs to be done in such a way that there is minimal overhead to write code expressing units. Two references on the topic I have found are Scala macros and ScalaQuantity. If you are interested in seeing additional examples with tagged types, see this excellent blog, Practical Uses for Unboxed Tagged Types.
The Logic of Failure has nothing to do with software, but every once in a while, it’s fun to read something different. Check out the amusing book cover below. I think it captures the intent of the title perfectly.
Dörner decomposes large, complex failures (e.g. Chernobyl) in order to explain how a situation becomes a failure. Let’s take a look at several parts of the book that resonate with software development.
Dörner stresses that working with a complex system necessitates thinking about all of the interconnected components and downstream effects. This seems blindingly obvious, but on multiple occasions, I have introduced a feature or fixed a bug without realizing that I’ve created a new downstream bug. Coping is challenging because of our limited capacity to hold different thoughts in mind concurrently. Applying proven techniques, such as functional programming and refactoring, and understanding the domain are part of my coping strategy. I increasingly find myself asking more questions about the code I write or review. For example, “The new alerting feature will inform us when market data providers are not sending data. How will the system behave in off-market hours when it is normal for data providers to be offline?” I think leaving no stone unturned is one of the best defenses against introducing downstream bugs.
Consider Implicit Goals
Another idea I found interesting relates to implicit goals. Dörner states, “If we do not concern ourselves with the problems we do not have, we soon have them.” An example he gives is that an individual wants to be in good health, but unless he/she makes good health a personal priority, he/she will likely become ill. In the software world, I consider clean code to be an implicit goal. This goal will never be reached unless the team constantly evaluates the code looking for opportunities to refactor and to simplify existing logic. At the same time, I believe that this principle must be applied in moderation. Constantly worrying about performance rather than focusing on iterating on deliverables that add business value might lead to a situation where the system is extremely performant, but never reaches production.
Avoid Clinging to Hypotheses
I particularly like Dörner’s statement about making hypotheses, “We are infatuated with the hypotheses we propose because we assume they give us power over things.” Sometimes we so badly want our claims to be true that we will only consider evidence in our favor. Perhaps you’ve designed a domain model that fits reality perfectly. Suddenly, requirements change and you begin mangling your beloved model to fit the new picture. It may be judicious to consider starting from a clean slate rather than expending great effort to preserve your outdated view of the world. I believe it is important to avoid clinging to old code. Take pride in deleting code!
Dörner’s thoughts on unclear goals resonate well with my experience interpreting requirements, “By labeling a bundle of problems with a single conceptual label, we make dealing with that problem easier – provided we’re not interested in solving it.” Only by deconstructing one feature do we realize that there are multiple, potentially conflicting, features required. Like other complex problems, software cannot be successfully completed with simple labels, such as, “the system must support trading”. This one feature is actually a multifaceted problem. What happens when a trade is rejected, partially filled, or replaced? How does the system handle a trade request that does not receive a response?
Reflect to Improve
The ideas Dörner presents are not revolutionary. I’ve previously thought about these notions in some form (as I am sure you have). But, I think there is significant value in reviewing and reflecting on these critically important ideas to better our ability to solve real-world problems. Arming yourself with defenses against your own limitations will help you be a better software engineer and a better problem solver.
Old Habits Die Hard
It’s a simple idea, but it took one year of Scala programming and the urging of my co-workers to realize that functions are interfaces. Coming from an object-oriented (OO) background, it is difficult to stop creating hierarchies of traits and classes in favor of injecting a function as a dependency.
Harnessing the power of functional programming to minimize (boilerplate) code has been a pervasive theme throughout my functional programming evolution. There was a time when it was foreign to iterate over a collection using foldLeft, instead of creating a class called ‘ComposedFoo’ to handle the iteration. I no longer have the overhead of creating an OO abstraction that requires testing, introduces another layer of indirection, and makes the code less expressive.
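As a rough illustration of the foldLeft style, the sketch below threads a value through a list of functions without any dedicated ‘ComposedFoo’ class. The adjustment functions and values are made up for the example:

```scala
object FoldLeftSketch {
  // Hypothetical adjustments; each one is just a function.
  val adjustments: List[BigDecimal => BigDecimal] = List(
    price => price + BigDecimal(1),
    price => price * BigDecimal(2)
  )

  // foldLeft applies each function in list order, starting from 10.
  val result: BigDecimal =
    adjustments.foldLeft(BigDecimal(10))((price, adjust) => adjust(price))
}
```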
As my friend, Noah Cornwell, pointed out in the comments of my previous post, the architecture of my solution is OO. It involves defining and implementing a one function trait. Just like mutability, one function traits have become a code smell. Instead of defining a class hierarchy, Noah suggests using functions alone to solve the problem. In other words, embrace the notion that functions are interfaces.
Instead of injecting dependencies via a constructor, provide them as part of the function definition. Concretely, this means that the requested trade repository and the set of price mutators should be provided at the call-site instead of the constructor. This approach explicitly defines the resources needed to do something (e.g. compute a profit, store a trade, etc.), as opposed to the OO approach, where dependencies are provided to the entire class, but may not be used by each function.
This approach is impractical for an OO language (e.g. Java) because it is impossible to partially apply a function. Instead of creating a new class, partially applying a function results in a new function that accepts fewer arguments. Partial function application allows a function that requires a request ID, execution price, requested trade repository, and set of price mutators to be transformed into a function that only requires a request ID and execution price.
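A minimal sketch of this idea, using only the standard library: the type names, the Map-backed lookup, and the EURUSD markup rule are assumptions for illustration and are not taken from the original source. Dependencies sit in the first parameter list, and partially applying them yields a function of just the request ID and execution price:

```scala
object PartialApplicationSketch {
  final case class RequestedTrade(id: String, symbol: String, volume: Long)
  final case class ExecutedTrade(
      request: RequestedTrade,
      executedPrice: BigDecimal,
      reportedPrice: BigDecimal)

  type FindTrade = String => Option[RequestedTrade]
  type PriceMutator = (RequestedTrade, BigDecimal) => BigDecimal

  // Dependencies are ordinary arguments rather than constructor parameters.
  def processExecution(findTrade: FindTrade, mutators: Set[PriceMutator])(
      requestId: String,
      executedPrice: BigDecimal): Option[ExecutedTrade] =
    findTrade(requestId).map { trade =>
      val reported =
        mutators.foldLeft(executedPrice)((price, mutate) => mutate(trade, price))
      ExecutedTrade(trade, executedPrice, reported)
    }

  // Hypothetical wiring: an in-memory lookup and a single rule.
  val trades = Map("r1" -> RequestedTrade("r1", "EURUSD", 100L))
  val eurUsdMarkup: PriceMutator =
    (trade, price) => if (trade.symbol == "EURUSD") price + BigDecimal("0.01") else price

  // Supplying the dependencies once leaves a function that only needs
  // a request ID and an execution price.
  val process: (String, BigDecimal) => Option[ExecutedTrade] =
    processExecution(trades.get, Set(eurUsdMarkup))
}
```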
Functions Are Flexible
Extending this notion, I would replace passing around interfaces/classes as constructor and function arguments with functions. Consider how ExtensibleTradeExecutionProcessor requires a RequestedTradeRepository. In a real world application, it is likely that the repository has many functions (e.g. storage, various retrieval methods, removal), but in this context the repository is required for only one purpose: to yield an Option[RequestedTrade] given a String ID. Therefore, I would replace the RequestedTradeRepository dependency with String => Option[RequestedTrade]. Now, the dependencies explicitly state intent and cannot be used for unintended purposes (e.g. storage).
Code becomes less brittle when functions are used as interfaces because now the ExtensibleTradeExecutionProcessor is no longer arbitrarily limited to the RequestedTradeRepository interface to retrieve trades. Any function that obeys the contract, String => Option[RequestedTrade], can be applied. This flexibility implies the code is more loosely coupled, which is always a desirable quality.
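The sketch below illustrates the point with assumed names: a hypothetical repository trait that offers more than lookup, and a consumer (symbolFor) that asks only for String => Option[RequestedTrade]. Both a repository method and a plain Map lookup satisfy that contract:

```scala
object FunctionsAsInterfaces {
  final case class RequestedTrade(id: String, symbol: String)

  // Hypothetical repository offering more than the consumer below needs.
  trait RequestedTradeRepository {
    def find(id: String): Option[RequestedTrade]
    def remove(id: String): RequestedTradeRepository
  }

  final class MapRepository(trades: Map[String, RequestedTrade])
      extends RequestedTradeRepository {
    def find(id: String): Option[RequestedTrade] = trades.get(id)
    def remove(id: String): RequestedTradeRepository = new MapRepository(trades - id)
  }

  // The consumer depends on the one capability it uses, nothing more.
  def symbolFor(findTrade: String => Option[RequestedTrade])(requestId: String): Option[String] =
    findTrade(requestId).map(_.symbol)

  val repository = new MapRepository(Map("r1" -> RequestedTrade("r1", "EURUSD")))
  val cache = Map("r2" -> RequestedTrade("r2", "GBPUSD"))

  // Any function with the right shape can be supplied.
  val viaRepository: Option[String] = symbolFor(repository.find)("r1")
  val viaMap: Option[String] = symbolFor(cache.get)("r2")
}
```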
It’s Not Perfect
During my limited experience writing software using this approach, I’ve discovered it is harder to discern the contextual meaning of a type in a function argument list. Given (Int, Int) => Int, what does each of the three integers represent?
Consider the function definition below:
def area(length: Int, width: Int): Int
The argument context is immediately clear because the arguments have names. In contrast, (Int, Int) => Int does not define the argument context. The question I ask when I see these integers is, “What type of integer is required?” In this example, the length and width are interchangeable, but it should not be a stretch to imagine examples where the type of an object matters. To learn about defining the type required for a function argument (or return type) in a way that reveals its context, check out this post on tagging types.
I hope this article introduced a new way of thinking about how software can be wired together. The only way to gain an appreciation for this approach and to understand if it works for you is to try it out.
The Merits of Object-Oriented Design
Check out GitHub for the full source code used in this post.
Although I have shown that object-oriented design is not the only way to organize code, I would like to share two object-oriented principles that I believe are worth rigorously applying: (1) single responsibility principle (SRP), and (2) open/closed principle (OCP). When applied judiciously, these principles lead you to writing loosely coupled code that narrows the scope of each class/function. In my experience, this is the only kind of code that can be maintained in a large project. Limiting the responsibilities of a code segment allows you to effectively reason about its logic without concerning yourself with the complexities of the entire code base. As is often the case in software development, proper abstraction is a core concept to SRP and OCP.
In Dire Need of Refactoring
Let’s explore an example that exposes the weaknesses of complecting code and then take a look at how to make it better. Consider a financial system processing trades that applies special (legal) rules to modify the execution price of a trade before it is reported to the trader. For simplicity, the execution price is only modified when:
- The symbol associated with a trade matches a configurable symbol.
- The volume associated with a trade is less than or equal to a configurable volume threshold.
The execution price modifications are reflected in the reported price. With these two rules in mind, consider one implementation of a TradeExecutionProcessor:
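The original listing is not reproduced here. The following is a sketch of what such a processor might look like, written with the standard library only. The constructor parameters, the additive markups, and the supporting types are assumptions made for illustration:

```scala
object Complecting {
  final case class RequestedTrade(id: String, symbol: String, volume: Long)
  final case class ExecutedTrade(
      request: RequestedTrade,
      executedPrice: BigDecimal,
      reportedPrice: BigDecimal)

  trait RequestedTradeRepository {
    def find(id: String): Option[RequestedTrade]
  }

  class ComplectingTradeExecutionProcessor(
      repository: RequestedTradeRepository,
      symbol: String,
      symbolMarkup: BigDecimal,
      volumeThreshold: Long,
      volumeMarkup: BigDecimal) {

    def process(requestId: String, executedPrice: BigDecimal): Option[ExecutedTrade] =
      repository.find(requestId).map { trade =>
        // Rule definition and rule application are tangled together here.
        val afterSymbolRule =
          if (trade.symbol == symbol) executedPrice + symbolMarkup else executedPrice
        val afterVolumeRule =
          if (trade.volume <= volumeThreshold) afterSymbolRule + volumeMarkup
          else afterSymbolRule
        ExecutedTrade(trade, executedPrice, afterVolumeRule)
      }
  }
}
```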
Well, it works, but is it great? The answer is a resounding “No!” Here are questions I would raise if I encountered this approach to the problem:
- How will this code be unit tested?
As written, it is impossible to isolate testing individual rules. To exercise the volume price modification, one must always also consider the symbol price modification. This unit test will become increasingly harder to maintain as rules are added. Eventually, when it is too complicated, the unit test will just not be updated.
- How will this code be extended?
Inserting an additional rule requires modifying the internals of the TradeExecutionProcessor. The lack of separation between what rules are available and how they are applied limits one’s ability to reason about rules in isolation. Another way of expressing this is that ComplectingTradeExecutionProcessor exhibits tight coupling between rule definition and rule application.
- How will rule ordering be changed?
The structure of this logic implies ordering. Symbol-based price modification must occur before volume-based modifications. From a business perspective, is this true? In this case, switching the order of rule application yields the same result. This logic does a poor job of explicitly expressing this notion. However, for the moment, let’s assume that rule application order matters. In this scenario, if a change in rule ordering is required, one must modify the internals of ComplectingTradeExecutionProcessor and risk breaking unit tests and other functionality.
How Did it Happen?
In my experience, I’ve (unfortunately) encountered numerous analogs to ComplectingTradeExecutionProcessor. Worse yet, I’m sure I’ve been involved in the fabrication of these maintainability and extensibility nightmares. Bad code often begins as OK code that iteratively morphs through successive (unexpected) feature requests. Here’s how it might have happened:
- You break ground on an exciting new task: processing trade executions. The super simple process is to look up a trade request provided a request ID and then set the executed price to the given executed price.
- Your business realizes it can make money by occasionally modifying the execution price in its favor. Your product manager requests that the price sent to the trader is changed when the trade matches a certain symbol, let’s say, EURUSD. You are also informed that it is imperative for the business to maintain a record of the initially received price and the modified price. Armed with the awesome power of Scalaz, you make quick work of this story by adding a reported price property to ExecutedTrade and by introducing the symbol-based rule shown in ComplectingTradeExecutionProcessor. Here is where this code begins breaking SRP. While it is true that the symbol rule is part of trade execution processing, it should not be the responsibility of this function to define the rules to be applied. Similarly, I expect you would find it strange if there was a SQL statement to find the requested trade embedded in this logic.
- It turns out that traders submitting small volume trades are ruining the profit margins of your business. Your business decides to rein in costs by charging traders making small trades more. And now we come full circle to the current picture. While this picture is not too scary, I think it is easy to imagine how this process continues.
Let’s Make it Better
The ideal solution to this problem will make it easy to answer the questions posed earlier. It will apply SRP and OCP to lead us to a solution that is extensible and maintainable. Given the ComplectingTradeExecutionProcessor, I would refactor it in the following ways.
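The refactored listing is not reproduced here either, so this is a sketch under assumptions: rules become named functions (the names symbolMarkup and smallVolumeMarkup are hypothetical), and the processor only folds a set of price mutators over the executed price. Because a Set has no defined iteration order, the sketch assumes the mutators commute, as the additive markups below do:

```scala
object Extensible {
  final case class RequestedTrade(id: String, symbol: String, volume: Long)
  final case class ExecutedTrade(
      request: RequestedTrade,
      executedPrice: BigDecimal,
      reportedPrice: BigDecimal)

  trait RequestedTradeRepository {
    def find(id: String): Option[RequestedTrade]
  }

  type PriceMutator = (RequestedTrade, BigDecimal) => BigDecimal

  // Each rule is a standalone, individually testable function.
  def symbolMarkup(symbol: String, markup: BigDecimal): PriceMutator =
    (trade, price) => if (trade.symbol == symbol) price + markup else price

  def smallVolumeMarkup(threshold: Long, markup: BigDecimal): PriceMutator =
    (trade, price) => if (trade.volume <= threshold) price + markup else price

  // The processor knows how rules are applied, not which rules exist.
  class ExtensibleTradeExecutionProcessor(
      repository: RequestedTradeRepository,
      priceMutators: Set[PriceMutator]) {

    def process(requestId: String, executedPrice: BigDecimal): Option[ExecutedTrade] =
      repository.find(requestId).map { trade =>
        val reported =
          priceMutators.foldLeft(executedPrice)((price, mutate) => mutate(trade, price))
        ExecutedTrade(trade, executedPrice, reported)
      }
  }
}
```

Adding a rule then means writing one more function and including it in the set passed to the constructor, with no change to the processor itself.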
Let’s analyze how this implementation allays our earlier concerns:
- How will this code be unit tested?
The ExtensibleTradeExecutionProcessor unit test will exclusively focus on ensuring that a set of rules is applied to the executed price and that the resulting reported price calculation is mathematically correct. There will be unit tests for each rule that are free from the side-effects of other rules. Successfully separating which rules are applied from how the rules are applied fulfills the desire to write code that applies the SRP.
- How will this code be extended?
Unlike before, inserting an additional rule no longer requires modifying the internals of ExtensibleTradeExecutionProcessor. This is a successful application of the OCP. As a developer, this should be a welcome relief because rules may be added or removed without fear of breaking the reported calculation. An additional benefit of this approach is that since each rule is now a function, each rule has an easily identifiable name. The function names should use the same vocabulary as the business, which makes it simpler to review rule implementation with project stakeholders.
- How will rule ordering be changed?
Rule application is now a configuration concern when the application is instantiated, instead of being embedded in the TradeExecutionProcessor implementation. The current implementation clearly indicates that rule ordering does not matter because the price mutators are a set. Should requirements change, price mutators can be changed to a list to denote that rule ordering matters. In this case, it is still a configuration concern to order rule application. This approach yields dividends when rule application ordering is changed and as a developer, you can configure rule ordering without fear of breaking unit tests or the reported price calculation.
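To illustrate the ordering point, here is a small sketch with two made-up rules that do not commute (a flat fee and a percentage markup). With a List, the order in which rules are applied is decided purely by how the list is assembled at configuration time:

```scala
object OrderedMutators {
  type PriceMutator = BigDecimal => BigDecimal

  // Hypothetical rules chosen because their order changes the outcome.
  val addFlatFee: PriceMutator = price => price + BigDecimal("0.10")
  val applyTenPercentMarkup: PriceMutator = price => price * BigDecimal("1.10")

  // A List makes application order explicit: rules run front to back.
  def report(executedPrice: BigDecimal, mutators: List[PriceMutator]): BigDecimal =
    mutators.foldLeft(executedPrice)((price, mutate) => mutate(price))

  // Reordering is a configuration change; report itself is untouched.
  val feeThenMarkup: BigDecimal =
    report(BigDecimal("1.00"), List(addFlatFee, applyTenPercentMarkup))
  val markupThenFee: BigDecimal =
    report(BigDecimal("1.00"), List(applyTenPercentMarkup, addFlatFee))
}
```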
I hope that the refactored solution struck you as being simple. Simplicity is often identified with software that obeys SRP and OCP. Study the unit tests to see how this new approach to trade execution processing removes complexity. Then, go find some complecting code in your own project and make it simpler!