It all started when my new boss stepped through my door and asked me whether I had any idea how our system worked. In an ideal world we would have teams of developers who each work on an isolated context with clear boundaries. They develop, test, deploy, monitor and operate everything by themselves. They are on call 24/7 and therefore do everything possible to make sure nothing fails. The system is stable and brings in the money.
In my world, however, things are a little different.
Having been there for over two years as a developer, I had a vague historical sense of what was going on, who did what, and sometimes even why. But it struck me that we had no clear overview. My boss wanted to know what he was dealing with and could only find (outdated) wiki pages and the knowledge of a few people. Wikis are great (Where Information Kills Itself), but developers tend not to update them. People come and go, or forget things when they are not constantly working on them. The only people who really know what is going on are the developers (and, through incidents, operations). Developers do not like to leave their development environment, and documenting stuff is mostly boring. The truth is in the source. I’m guilty of that as well.
To make things worse, we were (and still are) in the middle of a transition from a monolithic application (albeit one distributed over a dozen services) to a much more ‘microservice’ landscape. New services were popping up left and right as teams sliced the old ones up into many much smaller ones. We now have more than 70 different services making up our platform.
How do you know what’s going on?
I ended up in a role taking care of the whole platform, and I wondered: do I really need to know? I can’t possibly understand the whole system. We have a team architect in every team. We have created a set of rules for how we develop services: ‘API backwards compatible, REST, HATEOAS, phoenix’ style. Each team knows best when and how to create new services and how to deal with them. They know the business logic.
The rule set took a while to sink into people’s minds (we are talking more than 18 months), but finally it started to show. In case you are wondering, we do have a couple more rules than the ones I just stated ;-). Still, the teams were working mostly by themselves, and it turned out to be OK-ish.
After a couple of months I had the gut feeling that I knew nothing about the platform, and that this was not so good. I did not even know which programming languages were in use. We have our rules, but one guy managed to sneak in Scala anyway. Not a bad choice, but decisions like that have a broader impact on the organisation.
As we hired more and more people (employees and consultants) and grew to over 70 services, the system became hard to understand, especially since we grew out of a distributed monolith and (of course) still have its roots in the system. So that is one problem. Another one we ran into is that sometimes you need to make breaking API changes. In that case it is useful to know who your clients are, so that their code gets changed as well. Strict consumer-driven contract (CDC) tests might also reveal that. So sometimes you need an overview of all the software, both to get started or find more information and to see the overall picture.
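To make "who are my clients?" concrete: assuming each service declared the services it calls (a hypothetical `depends_on`-style list; nothing like that existed in our setup at this point), finding everyone affected by a breaking API change becomes a reverse lookup over those declarations. A minimal sketch with made-up service names:

```python
from collections import defaultdict


def consumers_by_service(services: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'service -> services it calls' into 'service -> services that call it'.

    `services` maps a (hypothetical) service ID to its declared outgoing
    dependencies, e.g. {"checkout": ["payment", "catalog"]}.
    """
    consumers: dict[str, set[str]] = defaultdict(set)
    for caller, callees in services.items():
        for callee in callees:
            consumers[callee].add(caller)
    return dict(consumers)


# Hypothetical example data: the service names are made up for illustration.
declared = {
    "checkout": ["payment", "catalog"],
    "search": ["catalog"],
    "payment": [],
}

# Who has to change their code if "catalog" breaks its API?
print(sorted(consumers_by_service(declared).get("catalog", set())))
```

This only finds consumers that declare the dependency; undeclared callers are exactly the gap that CDC tests or production monitoring would have to close.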
Our own tool.
For every piece of our software, we always want to know:
- The name
- Which team is responsible
- A description of what it does
- What kind is it? Library/Tool/Service
- Where is the wiki page with more information
- Where is the build chain
- All license/dependency information
- VCS root
So every team put a yaml file in the source root of its software and modified the CI template to run our tool, which gathers/generates all the information and uploads it to the wiki. Now, every time a developer checks something in, we update the information. It took some weeks to get the teams to introduce this file, but finally we are done.
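The fields listed above map naturally onto a small record. The following is a sketch, not our actual tool: it assumes the yaml file has already been parsed into a dict (for example with a YAML library), and the key names (`kind`, `wiki`, `build`, `vcs`, `licenses`) and the pipe-delimited wiki-table output are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareInfo:
    # Hypothetical field names mirroring the list above.
    name: str
    team: str
    description: str
    kind: str  # "library", "tool" or "service"
    wiki_url: str
    build_chain_url: str
    vcs_root: str
    licenses: dict[str, str] = field(default_factory=dict)  # dependency -> license

    @classmethod
    def from_dict(cls, raw: dict) -> "SoftwareInfo":
        """Build a record from an already-parsed descriptor, validating the kind."""
        kind = raw["kind"].lower()
        if kind not in {"library", "tool", "service"}:
            raise ValueError(f"unknown kind: {raw['kind']!r}")
        return cls(
            name=raw["name"],
            team=raw["team"],
            description=raw["description"],
            kind=kind,
            wiki_url=raw["wiki"],
            build_chain_url=raw["build"],
            vcs_root=raw["vcs"],
            licenses=dict(raw.get("licenses", {})),
        )

    def wiki_row(self) -> str:
        """Render one row of a (hypothetical) pipe-delimited wiki table."""
        deps = ", ".join(f"{dep} ({lic})" for dep, lic in sorted(self.licenses.items()))
        cells = [self.name, self.team, self.kind, self.description,
                 self.wiki_url, self.build_chain_url, self.vcs_root, deps]
        return "| " + " | ".join(cells) + " |"


# Hypothetical example descriptor, as it might look after parsing the yaml file.
info = SoftwareInfo.from_dict({
    "name": "catalog", "team": "Team A", "description": "Product catalog API",
    "kind": "Service", "wiki": "https://wiki.example.com/catalog",
    "build": "https://ci.example.com/catalog", "vcs": "git@example.com:catalog.git",
    "licenses": {"libfoo": "MIT"},
})
print(info.wiki_row())
```

Running this on every check-in is what keeps the wiki page in sync with the source, since the descriptor lives next to the code it describes.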
But there is more to it …
After looking at this information for a while, I realized how much more we could do with it. Take firewalls, for example: suppose you run on virtual machines and secure them with iptables. Depending on network zones and hardware firewalls, introducing a new connection from one service to another triggers a (semi-)manual action by operations. If I knew the connections and dependencies between the services, we could generate those rules automatically, and I could draw the whole architecture. Some time ago I drew a picture of our bounded contexts by hand, and I keep it up to date, but what if I could put a mapping between services and contexts into the yaml file described above? So once again I set out to find a piece of software that would do this for me. We had written our own tool for the first use case, but I wondered whether somebody else had come up with something interesting in the meantime.
I found many monitoring and documentation tools that run in production and capture the connections there. I think this is very useful, and we may need something like it as well to see what the current situation is. But I could not find tools that describe what the situation should be (plus all the other information we wanted to know about the software). Phil Calcado (formerly of SoundCloud) pointed me to some slides showing something similar, and at GOTO Berlin someone from Spotify talked about their ‘System-Z’, which does roughly what we wanted. But none of it was (or is) available as open source.
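Generating the host-level part of the firewall configuration could be as simple as turning declared connections into iptables rules on the receiving host. This sketch assumes a hypothetical inventory with exactly one IP address and one TCP port per service; real deployments have several instances, plus network zones and hardware firewalls on top, which it does not cover.

```python
def iptables_rules(addresses: dict[str, tuple[str, int]],
                   connections: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Per target service, the iptables commands that let its declared callers in.

    addresses:   hypothetical inventory, service ID -> (IP address, TCP port it listens on)
    connections: declared (caller, callee) pairs
    """
    rules: dict[str, list[str]] = {}
    for caller, callee in connections:
        caller_ip, _ = addresses[caller]
        _, callee_port = addresses[callee]
        # Allow TCP traffic from the caller's address to the callee's listening port.
        rules.setdefault(callee, []).append(
            f"iptables -A INPUT -p tcp -s {caller_ip} --dport {callee_port} -j ACCEPT"
        )
    return rules


# Hypothetical example inventory and declared connections.
addresses = {
    "checkout": ("10.0.1.10", 8080),
    "search": ("10.0.1.11", 8080),
    "catalog": ("10.0.2.20", 8081),
}
connections = [("checkout", "catalog"), ("search", "catalog")]

rules = iptables_rules(addresses, connections)
for line in rules["catalog"]:
    print(line)
```

The same (caller, callee) pairs, grouped by a bounded-context attribute, are also what a diagram generator would need, so one declaration could feed both.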
Let’s do it.
So, here we go again.
Over the last couple of days, my colleague Felix and I started enhancing our existing code.
First I thought about what we (and maybe others) might need, what we could easily gather automatically, and what the teams would have to contribute. Here is the first try:
[Listing: first draft of a fictional example yaml file, 94 lines, not preserved]
While coming up with my fictional example and trying to figure out what information we would need, I realized this whole thing should be completely customisable by the user. Almost all parameters are optional. The tool should work like this:
- A client runs in the software’s source directory and can (optionally) gather additional information. It might transform a yaml file (as we chose to do) into json, enrich it with generated license and VCS information, and upload it to a server. This client could also be just a curl command that reads in a file in json format.
- The server indexes it and provides an API to access the data.
- Other clients query the server and generate firewall rules and docker-compose files, visualize the architecture, list all services and their descriptions, and so on.
- The format needs to be flexible, and I do not want to force anybody to use things that might not be interesting to them. So for now we settled on these mandatory attributes: ID, Name, Type, Team, Description, VCS. This is the minimal information we need. We will probably have to revisit VCS once we want to model our whole architecture, which includes a commercial database that has no source repository of ours.
- The server accepts any json document that has the mandatory fields; it is up to your client to make sense of the rest. We want to provide a recommended data structure and some clients that do something useful with it.
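The server contract described in this list (accept any json document that has the mandatory fields, leave everything else to the clients) fits in a few lines. This sketch covers only validation and indexing, with no HTTP layer; the lowercase key names and the in-memory dict standing in for a real index are assumptions.

```python
import json
from typing import Optional

# Mandatory attributes from the list above; the lowercase key names are an assumption.
MANDATORY = ("id", "name", "type", "team", "description", "vcs")


class Index:
    """In-memory stand-in for the server's index, keyed by software ID."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def submit(self, payload: str) -> list[str]:
        """Store a json document; return its missing mandatory fields ([] means accepted)."""
        doc = json.loads(payload)
        if not isinstance(doc, dict):
            return list(MANDATORY)  # not a json object, so no field is present
        missing = [key for key in MANDATORY if key not in doc]
        if not missing:
            # Extra, unknown fields are stored untouched; clients interpret them.
            self._docs[str(doc["id"])] = doc
        return missing

    def get(self, software_id: str) -> Optional[dict]:
        return self._docs.get(software_id)

    def list_all(self) -> list[dict]:
        return list(self._docs.values())


index = Index()
accepted = index.submit(json.dumps({
    "id": "catalog", "name": "Catalog", "type": "service", "team": "Team A",
    "description": "Product catalog API", "vcs": "git@example.com:catalog.git",
    "context": "shopping",  # not mandatory, kept as-is
}))
# A commercial database without a VCS root of its own is rejected for now.
rejected = index.submit('{"id": "oracle-db", "name": "Oracle"}')
print(accepted, rejected)
```

Returning the missing fields instead of a bare yes/no gives a client (or a plain curl call) a useful error message without the server needing to understand any optional field.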
This is a work in progress; we spent a few hours coding a prototype to see whether it is doable. We see this tool serving multiple goals:
- Document your microservices
- Visualize your architecture
- Generate data for your infrastructure (firewall, docker-compose, etc.)
- Find which libraries are used (and which have security problems)
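For the last goal, once the index also holds the generated dependency information, a client only needs to scan it. This sketch assumes a hypothetical optional `dependencies` field mapping library names to versions, and a hand-maintained set of known-vulnerable (library, version) pairs; in practice that set would come from a vulnerability database. The library names are made up.

```python
def affected_software(documents: list[dict],
                      vulnerable: set[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each software ID to the vulnerable 'library version' entries it uses."""
    hits: dict[str, list[str]] = {}
    for doc in documents:
        # "dependencies" is a hypothetical optional field: {library: version}.
        deps = doc.get("dependencies", {})
        found = sorted(f"{lib} {ver}" for lib, ver in deps.items()
                       if (lib, ver) in vulnerable)
        if found:
            hits[doc["id"]] = found
    return hits


# Hypothetical index contents and vulnerability list.
docs = [
    {"id": "catalog", "dependencies": {"libfoo": "1.2.0", "libbar": "2.0.0"}},
    {"id": "search", "dependencies": {"libfoo": "1.3.1"}},
    {"id": "legacy-db"},  # no dependency information submitted
]
known_bad = {("libfoo", "1.2.0")}
print(affected_software(docs, known_bad))
```

Software that never submitted dependency information simply does not show up, so an empty result means "nothing known", not "nothing vulnerable".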
We have already made some progress and even got some cool visualizations. D3 is cool if you know how to use it (or just Stack Overflow it).
We have tons of ideas about what might be possible with this. But first things first.
Thanks for your time and for reading this far. I would love to know:
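The visualization needs little more than the declared connections. This sketch turns index documents into the `{nodes, links}` shape commonly fed to a d3-force layout, where links refer to nodes by `id` (resolved on the JavaScript side with `forceLink().id(...)`). The `depends_on` and `context` fields are hypothetical.

```python
import json


def to_d3_graph(documents: list[dict]) -> str:
    """Build {nodes, links} JSON for a force-directed graph from index documents."""
    known = {doc["id"] for doc in documents}
    nodes = [{"id": doc["id"], "group": doc.get("context", "unknown")}
             for doc in documents]
    links = [{"source": doc["id"], "target": target}
             for doc in documents
             for target in doc.get("depends_on", [])
             if target in known]  # skip edges to software that is not in the index
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


# Hypothetical index contents; "payment" was never submitted.
docs = [
    {"id": "checkout", "context": "ordering", "depends_on": ["catalog", "payment"]},
    {"id": "catalog", "context": "shopping"},
]
print(to_d3_graph(docs))
```

Colouring nodes by `group` gives a rough picture of the bounded contexts. Dropping links to unknown IDs keeps d3 from failing on a missing node, at the cost of hiding software that has not been registered yet.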
- What do you think of all this?
- How do you handle the above questions?
- Which tools do you use?
- What other use cases do you have?
- Would something like this be interesting to you? I love coding, but I’m happy to use a better tool if you know of one.
- Would it be interesting if we open-sourced it?
Any thoughts?