
Pike is wrong on bloat


This is my response to Rob Pike’s words On Bloat.

I’m not surprised to see this from Pike. He’s a NIH extremist. And yes, in this aspect he’s my spirit animal when coding for fun. I’ll avoid using a framework or a dependency because it’s not the way that I would have done it, and it doesn’t do it quite right… for me.

And he correctly recognizes the technical debt that an added dependency involves.

But I would say that he has two big blind spots.

  1. He doesn’t recognize that not using the available dependency is also adding huge technical debt. Every line of code you write is code that you have to maintain, forever.

  2. The option for most software isn’t “use the dependency” vs “implement it yourself”. It’s “use the dependency” vs “don’t do it at all”. If the latter means adding 10 human years to the product, then most of the time the trade-off makes it not worth doing at all.

He shows a dependency graph of Kubernetes. Great. So are you going to write your own Kubernetes now?

Pike is a good enough coder that he can write his own editor (wikipedia: “Pike has written many text editors”). So am I. I don’t need dependencies to satisfy my own requirements.

But it’s quite different if you need to make a website that suddenly needs ADA support, and now the EU forces a certain cookie behavior, and designers (in collaboration with lawyers) mandate a certain layout of the cookie consent screen, and the third party ad network requires some integration.

What are you going to do? Demand funding for 100 SWE years to implement it yourself? And in the meantime, just not be able to advertise during BFCM? Not launch the product for 10 years? Just live with the fact that no customer can reach your site if they use Opera on mobile?

I feel like Pike is saying “yours is the slowest website that I ever regularly use”, to which the answer is “yeah, but you do use it regularly”. If the site hadn’t launched, then you wouldn’t be able to even choose to use it.

And comparing to the 70s. Please. Come on. If you ask a “modern coder” to solve a “1970s problem”, it’s not going to be slow, is it? They could write it in Python and it wouldn’t even be a remotely fair fight.

Software is slower today not because the problems are more complex in terms of compute (though they very, very much are), but because today's compute capacity simply affords wasting it, and that waste is what lets us solve those complex problems at all.

People do things because there’s a perceived demand for it. If the demand is “I just like coding”, then as long as you keep coding there’s no failure.

Pike’s technical legacy has very visible scars from these blind spots of his.


The job market is a market


(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article will also become available as a podcast on Thursday)

My first job was at a bank. When it came time to discuss compensation, HR sent me a letter that explained that my future job was categorized as a level 8 job and that per the collective agreement negotiated between the unions of bank employees and the organization of bank employers, my salary would be X. It was my first job, I had no idea what X meant, and I still lived at home, so I barely needed money anyway. The collectively bargained agreement also came with standard increases every year to account for growing experience and of course there was the yearly inflation correction. It was highly organized and nobody worried about compensation.

Things definitely changed. I earn 10X now and I need it. Also, things are not highly organized anymore, except for Wednesday Wisdom which appears every Wednesday like clockwork.

A few years later, I worked for another bank and by that time the shortage of IT talent was starting to hurt. Because of the constraints of the wage tables negotiated by the unions, it was getting hard for banks (and other sectors of the economy that were ruled by similar agreements) to pay enough money to stay competitive. There was a healthy (or perhaps unhealthy) stream of people who left these organizations to work for “software bureaus”. These were body shops that paid a higher salary than organizations in sectors run by collectively bargained agreements could offer and then contracted staff out to their original employers.

Example: One of my colleagues left on Friday and came back on Monday as a contractor, but with a higher salary and a nice company car. He was very good and very nice so we didn’t begrudge him that.

Unions were unwilling to make allowances for certain types of expertise, all in the name of fairness. In their eyes, a software engineering role that required a four-year college degree should be paid about the same as any other role that required a four-year college degree, market conditions be damned. In later years, the unions grudgingly agreed to a "labor market bonus". With this bonus, certain types of jobs could be offered a higher salary than other jobs that required about the same amount of education and expertise.

A cursory web search indicated that this is still a thing in Holland. In one sector’s agreement I found a clause that said that the labor market bonus could be as much as 10%. This being Holland, the consent of the works council is of course required because generally people hate it when other people earn more than they do.

Even with the labor market bonus, employers had a difficult time hiring into certain jobs because there were other benefits that the body shops could offer and they couldn’t, such as a company car, luxurious trips, and relatively generous expense accounts. For people who thought their skills were worth even more, there was the option to start their own one-man body shop. This came with the obvious advantage that you could bag all of the contracting hourly rate. On top of that, you did not have to pay certain types of social security taxes that applied to employees but not to individuals who ran their own small businesses. I started my own one-man operation in 1992 (and didn’t have a regular job again until I joined Google in 2006).

Examples of such social security taxes are the unemployment tax and the disability tax. Most of these “one-cylinder” operations figured they would never be unemployed. Some (like myself) insured themselves privately for disability. Being a one-cylinder operation also came with options to lower your total tax burden by stashing any money you made but did not need in an LLC, thereby hiding it from income tax.

Then the 90s happened and the official motto of society became: “Laissez les bons temps rouler”. The whole economy moved in the direction of laissez-faire capitalism and everyone in the IT sector benefited, as tech companies enjoyed the benefits of the network effect and disrupted one market after another. The world went from “a few computers, somewhere” to “lots of computers, everywhere” and so the demand for people who knew how to build, program, and maintain these computers grew and grew. On top of that, social democracy was out and neoliberalism was in, and that ideology prescribes that everyone must get as much out of the market for themselves as possible, whatever the consequences.

This had two effects: First and foremost, salaries went up because the regulations went down and the job market is, well, a market. Secondly: Lots of new people entered the market, drawn in by the high prices even a junior engineer could demand; these were the times where we had to offer equity to interns! The result of this was that anyone working in information technology did phenomenally well as the price of employment went through the roof. Everyone got accustomed to high and growing compensation as well as to other forms of remuneration: Free food, fun offices with pinball machines, gym memberships, lots of flexibility to work from home, free shuttles to the office, unlimited PTO, generous 401(k) programs, employee stock purchase programs, free phones, paid-for Internet connection at home, and free snacks. No idea was too wild if it could help get scarce talent through the door.

One often-neglected insight from this is that it is really expensive to be poor. If you are struggling in a low wage job, you probably also have to buy all of your own food, pay for your own gym membership (if you can even afford it), pay for your own Internet, and pay for your own commute. Nobody will invite you to cushy trips or give you $1500 for Christmas just because. On the other hand at Google we had people complaining that they had to arrange for their own food during the weekend and people got terribly upset when the catering staff organized a meatless Monday, with the money not spent on meat donated to the World Food Program. One engineer was so upset that he loudly complained that “it is impossible to work under these circumstances!”

It is not hard to understand that these very pleasant conditions made software engineers a happy lot. Unfortunately, part of that happiness was caused by having a really short memory and most people had forgotten what working conditions were actually like before this holiday from history.

One of the more annoying aspects of our hyper-anxious news cycle is that whenever things are going well, there are lots of pundits that will declare that things have fundamentally shifted and that this time, really, the trees will grow into the heavens forever and will never stop growing. We regularly hear economists confidently declare that there will never be a recession again. Other economists have been known to state equally confidently that the economy will continue to grow at this amazing pace and that everybody will be rich. The understanding that many aspects of the human experience are a sine wave seems to go right out of the window every time we are nearing the top of the curve.

Book tip: Y2K: How the 2000s Became Everything (Essays on the Future That Never Was) by Colette Shade is a must-read book for people who grew into adulthood in the 1990s and early 2000s. One of the essays is about the outrageous economic claims that were made.

These people are obviously clowns and have conveniently forgotten that in a perfect market all profits eventually go to epsilon. If there is money being made anywhere, that is either a temporary condition or a sign that there are unnatural barriers to entry, such as borders, limited availability of work visas, or state supported monopolies. That said, companies that were on the hunt for scarce talent until very recently continued to offer ever more outrageous compensation: Free massages in the office, a concierge service to help you make restaurant reservations at the French Laundry, or a 10% pay rise for everyone. The “bon temps” were really “rouler-ing”!

But, pilots know that everything that goes up must eventually come down, and in the case of the demand/supply ratio for software engineers, that landing was rough! Many people had started to believe that the good times would go on forever and, even more weirdly, had started to believe that the cushy job conditions that they had become accustomed to were some sort of natural state of affairs. I can potentially forgive you for these mistaken beliefs if you were born after the year 1994 or so, but everyone older than that really should have known better.

With the supply/demand ratio suddenly inverted, companies were freed from having to offer high compensation and cushy work conditions, so things quickly reverted to a more normal state of affairs, with downward pressure on compensation, downleveling on hire, layoffs, and “voluntary” exits.

Here is a fun side story about voluntary exits. Years ago, a large Dutch bank I was working for wanted to let go of some people but the unions strenuously objected. They struck a deal where 25% of people were categorized as A-class employees and the rest as B-class employees. The A-class employees got a letter saying: “You are in the A-class, we are happy with you, please don’t leave”. The B-class employees got a letter that said: “You are in the B-class, we are not going to fire you, but if you leave of your own accord, you’ll get a package”. The result: 100% of employees were unhappy! The A-class people were unhappy because they could easily get a job somewhere else and they wouldn’t mind a package. The B-class people were unhappy because, well, they were now officially labeled as B-class.

Local VPs had significant leeway in assigning people to the A or B class. Some VPs had a rule that if you were with the bank for less than one year, you were automatically in the B class. A friend of mine who was very, very good at his job had been with the bank for 11 months and so was declared to be in the B-class. During meetings, when asked difficult questions, he would answer: “That seems more of a question for someone in the A-class 🙂”. Of course some A-class employees left because they were upset and some of the best of the B-class also left for other jobs, but with a package. The real stragglers in the B-class stayed of course because they knew they would never get a job anywhere else. So overall: Lots of employees were upset, some very good people left, and the people they most desperately wanted to get rid of, stayed.

I see lots of stories on LinkedIn where people are complaining about companies engaging in practices that decrease the comfort and remuneration of employees. Return to the office is a big hot button topic, but so are complaints about wage decreases, lay-offs, and other practices that are inspired by companies’ desire to lower their cost base in a market where there is ample supply.

I fear that most of these people have forgotten that the job market is, in fact, a market. During the holiday from history, companies did not offer high salaries and cushy benefits because they wanted to be nice, they did so because otherwise they would not be able to hire any staff! Of course you should not be afraid to ask a high price for your skills when they are in high demand, but you should also be ready to flex when market conditions change. You might think it dumb for companies to get all of their employees back into the office, but all things considered, that is not your call to make. If it is dumb, then the market will eventually punish these companies because they will not be able to get the best people at the best price. But if a free market means anything, it definitely also means that companies are free to offer the working conditions that they want to (bar their responsibility vis-a-vis the law of course); freedom also means the freedom to be wrong.

It is of course possible to change the law and to make some of the cushier conditions that we like mandatory. For instance, in Holland you have a legal right to part-time work and in Sweden parents have 480 days of parental leave (with a minimum of 90 days for the father, giving rise to the phenomenon of the Latte Dad). Everything is possible; really, the only laws are the laws of physics. But you are not free from the consequences of any choices that are made. These countries have higher taxes and lower wages for the upper echelons of the job market. They have near-free education, cheaper healthcare and better retirement for most people, but less access to experimental treatments and drugs. Really everything is a market when you look at it from the right angle. If you think we should change the laws to change the deal between employees, employers, and the state, my advice is to become politically active. I know I am…

Anyway, back to employment conditions. If for instance working from home is super important to you, that is your good right and if it was important to me I would definitely find an employer who would offer that. But, the market being what it is, I might want to compromise on that in favor of other things. I look at my entire employment package holistically; there are things in there I like, things I don’t care about, and things that I do not like but which I can stomach because, all things considered, I want a good-sized paycheck and this is the best way to get one. Telling companies they are doing it all wrong is your good right, but I fear it is not very productive in a market where you also want to be hired by these companies. Remember: You are selling, the companies are buying, and as my dear old mother used to say: “Nobody has ever won an argument with a customer.”

We are sadly (for us) in a time where software engineers have less market power and companies are over eager to exploit that situation to the fullest, just like we were ready to exploit the situation to the fullest when the situation was reversed. But, rejoice, as long as the engines are still working, everything that comes down pretty much always goes back up again. Again, pilots know that when you are in an unaccelerated flight and push on the throttle, the plane will go up; that’s just physics.

That said, I am very happy I am not a new-grad these days, because that surely must suck major balls. But then again, when I graduated from college in 1988 we thought that there would be people who would never ever have a job and clearly most everyone did. It might be a bit of a rough ride for a while, but the good times will come back. They always do. In the meantime, make sure you have a competitive offering for any prospective employers!

Another thing that would suck major balls is when you miss Wednesday Wisdom! Subscribe today and stop the suckage.




No more shell scripts!


(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article will also become available as a podcast on Thursday)

It might be difficult to imagine for young people, but in the olden days it used to be quite difficult to write software. And I do not mean this in a “requirements are hard and so is code correctness” kind of way; I mean that if you got yourself a computer, you were just not able to open up an editor, write some code, and then compile and/or run it for lack of any software engineering related tools.

Oh how the times have changed. Not only do you get a free C compiler with almost every platform, you can even subscribe to wonderful newsletters like Wednesday Wisdom! Also for free!

At that time, selling editors, compilers, and other programming tools was a business and computers did not come with any tools that would allow you to start writing serious code. The best you could hope for was a crappy BASIC interpreter that lacked any features for serious software development. If you wanted to compile your code, use a decent programming language, use a “database”, or develop user interfaces, you often had to shell out amounts of money that were beyond the means of most individuals. This fact gave rise to a very active software piracy movement and I partook liberally in that, copying software packages from friends and spending hours at the Xerox machine to copy manuals. The dearth of programming tools existed up and down the platform hierarchy, from measly personal computers to large mainframes, where the companies I worked for had to shell out thousands, if not tens of thousands, of guilders per month for the privilege of using IBM’s COBOL compiler. Open source did not really exist yet and to the extent that it did, the absence of the Internet made it very hard to distribute anything.

Unix was notably different. It came bundled with a C compiler and a suite of tools that enabled software development, like make, lex, and yacc. This was amazing and novel! An operating system that came with a compiler and other development tools? Unheard of! Unfortunately, Unix was not widely available because it required powerful hardware that was beyond the means of individuals and small companies. To top it off, commercial Unix implementations, like HP-UX and SunOS, often only contained a minimal K&R C compiler that was just enough to configure the kernel, while they charged an arm and a leg for their more powerful ANSI C compiler.

When gcc came around it contained an intricate bootstrap where it used the platform’s K&R C compiler to create a small ANSI C compiler that was then used to compile itself again and then the entire compiler collection.

Even when you did have access to a system with basic development tools, writing software was arduous because of the absence of libraries and components that solved higher level problems. If you wanted to parse an email address, use a hash map, or solve a linear programming problem, you were completely on your own. There was no GitHub, NPM, CPAN, or equivalent repository of code that you were free to download and use. Consequently, coding was a slog, and only the most committed individuals could get stuff done. Every piece of code you needed to write was a project and, because of the time needed, a very costly project at that.

This sad fact became double plus sad once you started considering the needs of system administrators. The process of running a complicated system requires issuing myriads of commands to deal with operational chores and to solve problems. Many procedures consist of issuing the same commands every time in more or less the same order or with only minimal variation. These system administration tasks lend themselves well to automation, but most systems did not come with enough programming tools to develop this automation.

Unix changed all that.

In one of their many strokes of brilliance, the original Unix designers created a system that was modular and with components that could easily be combined in elegant ways to create more powerful functionality. It is hard to see in hindsight how incredibly revolutionary and brilliant their design was. A proof of this is that it held up remarkably well over time, whereas other models for building operating systems either disappeared or had to embrace Unix’s features as best they could.

A consequence of Unix’s design was that the majority of operating system commands, even the ones that were provided by third parties, were composable. Whenever I taught Unix Fundamentals, people often asked me: “Does Unix have a command for <this or that>”. “No”, my answer would typically be, “but you can build it yourself by stringing together <this set of Unix commands>.” This way of working proved to be very future proof, so I still regularly type commands like docker container rm $(docker container ps -a | cut -f1 -d\ | grep -v CONTAINER) in order to remove all dangling docker containers on my MacBook. Maybe a better command exists, maybe the Docker designers gave me a combination of commands and flags to do this with one command invocation, but courtesy of Unix, that is not how my mind works anymore. Pipes and command substitution means that I look at every Unix command as a simple filter to be used in a pipeline.

Another stroke of brilliance of the Unix designers was to blur the line between the command line interface and the interpreter of a simple programming language. The Unix shell contains statements like if, while, and case; it allows the definition of functions, understands the concept of variables, and can read commands from a file as well as from the terminal; a feature greatly helped by the fact that in Unix every stream of bytes is a file, whether you type them from the keyboard or whether they come from a disk.

This is yet another thing that might seem obvious now but which was revolutionary when it was first mentioned.

This design allows writing simple programs that are executed by the command line interpreter. Eat that COMMAND.COM! Apart from the aforementioned control statements, the original shell’s programming language did not contain many special built-in statements like PRINT or CLS; the shell was “just” a more powerful way to compose Unix’s basic commands into more powerful units of functionality. And so, the shell script was bourne; the ideal programming tool for automating system administration.

When I first came across shell scripts and understood how everything hung together, I had a veritable religious experience. “If there is a God”, I thought, “this is surely Her operating system design”. I was not the only one who thought that and shell script programming took off like wildfire as an unleashed generation of system administrators automated everything in sight with ever more intricate shell scripts. The shell itself morphed too: The hippies over at Berkeley invented the C Shell, AT&T retaliated with the Korn Shell, the bureaucrats over at Posix standardized the Posix shell, the open source world built and released the Bourne Again Shell, and for some reason that I cannot quite figure out, I seem to have standardized on the Z-shell.

Probably because it allows themes, which solidly brings the shell into the 21st century with its focus on mindless fluff and window dressing.

Since their inception in 1979, shell scripts have held up remarkably well as a vehicle for automating system administration tasks. Pretty much every SRE and Linux software engineer I know writes or edits shell scripts on an almost daily basis in order to keep their systems going. Shell scripts configure the system, make backups, remove old log files, and run CI/CD pipelines. This fact is quite remarkable, given that the shell script language is actually an awful programming language. It has weird syntax, is untyped, cannot do math very well, does not deal kindly with strings containing spaces, and has terrible error recognition and recovery. Nobody in their right mind would invent a programming language like that today.

I am saying that, but Microsoft relatively recently gave us PowerShell which shows their unparalleled capability to take good ideas and implement them badly.

On top of that, shell scripts are not very portable. They are (by design) completely dependent on their environment and therefore you cannot even reliably run a shell script designed for one Linux distribution on another one, unless you take extreme precautions in the code. Because shell scripts string existing commands together, they are very sensitive to the tools installed. If your shell script depends on awk, you might be well advised to wonder if it is resistant against the subtle differences between Berkeley’s original awk, GNU awk, and SunOS’s “new” awk.

A lot of shell scripts parse the output of command invocations, which is great and powerful, until such time as the command outputs something unexpected because of an abnormal condition, or a newer version of the command comes with changes to the output, like, I don’t know, maybe adding an extra space or perhaps changing the output altogether.

I dare you to export LANG=nl_NL and see if your scripts still work 🙂.

Shell scripts are also extremely sensitive to characters that are special to the shell such as the asterisk, white space, question marks, quotes, parentheses, and backslashes. These characters have been breaking shell scripts since 1979, for instance when they appear in file names, variable values, or the output of commands that get substituted back into the command line.

In recent times we have gotten many new programming languages and many great libraries for all sorts of common problems. Some of these newer programming languages are terrible, most are decent, and some are great. For most tasks, there is really no good reason anymore to code shell scripts. If you know what you are doing, coding a Go program is about as fast; it will yield you a binary that might be bigger, but is also faster and, most importantly, way more robust.

During a recent project I ran a secure database in a container and I needed to take regular backups of that database that needed to be sent to an off-site cloud storage location. None of this is particularly problematic, so I started with a simple shell script. However, over time the requirements grew: I needed to integrate with a secure way to obtain database credentials, I wanted to emit metrics upon termination, and I had to interface with different cloud object stores. So the shell script grew and grew. The fact that I implemented all this in a shell script also added other complexities. For instance: To run my shell script on an hourly schedule, I needed to run cron in my container, which meant I had to run the container start command as root and then drop privileges later. This, as well as all other problems that were created by using a shell script, can of course be overcome with some additional shell script software engineering, but it did start to look like a wobbly pile of software. And as my friend Doug R. says: “You cannot make a wobbly pile of software more reliable by making it higher.”

So, one fine day I fired up the Go compiler and rewrote the shell script as a Go program with a built-in cron scheduler, with interfaces to different cloud providers, while emitting metrics to Prometheus. The resulting binary is faster and more robust. It is bigger, but it is better.
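For illustration, here is a minimal sketch of what the skeleton of such a rewrite can look like. Everything specific in it is assumed rather than taken from the real program: runBackup is a stand-in for the actual credential lookup, database dump, and cloud upload, and the hourly interval is just an example. The point is that the scheduler lives inside the binary, so the container needs neither cron nor a root start-up.

    // Hypothetical sketch of a shell-script-turned-Go backup job with a
    // built-in scheduler. The real program would add credential handling,
    // cloud uploads, and Prometheus metrics; this only shows the loop that
    // replaces cron inside the container.
    package main

    import (
        "log"
        "time"
    )

    // runBackup stands in for the actual work: dump the database, encrypt
    // the result, upload it to object storage, emit metrics.
    func runBackup() error {
        log.Println("running backup")
        return nil
    }

    func main() {
        const interval = time.Hour // hourly schedule, no cron and no root needed

        for {
            if err := runBackup(); err != nil {
                log.Printf("backup failed: %v", err)
            }
            time.Sleep(interval)
        }
    }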

The defenders of shell scripts will say that it is possible to write robust cross-platform shell scripts and they are right, you can! But the fact that you can, doesn’t mean that you should. You can write a docker-like container system and a web server in shell script, but that doesn’t mean it is a good idea. With modern programming languages and libraries we can use a better language to create a better program in about the same amount of time.

So, it is time to say goodbye to shell scripts. They were amazing while we needed them, but we moved on and we got better things now.

Among the better things we have now is Wednesday Wisdom. It is more beautifully typeset than most shell scripts and easier to read too. Subscribe today!




Faster computers afford dumber solutions


(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article will also become available as a podcast on Thursday)

I started my professional life as an MVS Systems Programmer (a sort of SRE avant la lettre) at a sizable Dutch bank. The team I was in took care of the core computing platform of the bank, which consisted of no fewer than three computers. Yes, you read that right. Three computers! These three machines ran all of the core ledger processing, all batch bank transactions, all mortgage calculations, and all of the loan administration for the entire bank. In the second year of my career, we upgraded the “big” computer in our primary datacenter to a whopping six CPUs and 64MB of RAM! Now we were truly playing with power!

You might have seen me write “Now you are playing with power” before. It is from a Nintendo ad. Subscribe to Wednesday Wisdom. It comes with zero ads!

In parallel, I dabbled in software development on personal computers. My home PC had a single 4.77MHz CPU, a 20MB hard disk and 512KB of RAM. Even then, 20MB hard disk space was not an awful lot, so eventually I installed a compressing disk driver (Stacker) that increased the virtual capacity of my disk to about 38MB at the cost of some CPU speed and the added bonus of hopelessly corrupting my file system approximately once per year.

In the late 1990s I went through a similar phase when I migrated my Linux laptop to ReiserFS. Because of improvements in the block management layer, ReiserFS was better suited for systems with lots of small files. This gave me more free disk space, but it also provided the feature of hopelessly corrupting my file system about once per year.

ReiserFS went out of fashion when Hans Reiser killed his lovely wife Nina and after a grueling trial, where he behaved like the world class a-hole we all knew him to be, he went to prison. The book about this saga is a must-read for anyone who was involved with Linux in the early 2000s. By the way, Hans seems to be a changed man these days, as witnessed by a letter he wrote to the Linux community earlier this year.

For most of my career, computers were so underpowered that whatever we wanted to do was either not possible or required really smart solutions. Consequently, I spent a lot of time either explaining to customers that whatever they wanted was not possible or concocting all sorts of ways to do whatever it was they wanted. This was not merely an issue of underpowered hardware; the software we worked with was equally underpowered. For instance, for one piece of code I had to write on the mainframe I wanted a basic regular expression parser. Since none was available in any of the libraries, I rolled my own in S/370 assembler. Super fun, but you can imagine what this does for development velocity in terms of story points per second. Not that we knew what story points were, but that is a topic for another day 🙂.

When I started working, a lot of (now) basic computer science had not been invented yet. For instance, the Paxos algorithm was only submitted for review in 1989, a year after I graduated from college. So in those days, if we needed distributed consensus, we were straight out of luck or we needed a shared device such as a channel-to-channel controller (CTC) or a shared SCSI-disk so that we could abuse the SCSI lock command to turn it into a distributed locking service.

Pro tip: Do not forget to change the SCSI bus address on one of the systems’ controllers from 7 to 6 and set the dip switches on the shared disk to address 5.

A lot of very bright people wrote a lot of incredible software for these underpowered machines with little or no help from libraries or the operating system. It was all very impressive.

But, computers became cheaper and more powerful and in the mid 1990s we reached a point where they were powerful enough to do pretty much all of the things that we wanted done. It was a joyful time and if someone needed an 80GB database we just laughed at it! Then the Internet came and the problems we needed to solve became global and distributed. Our hitherto more than sufficient computers and networks were once again underpowered for the problems. Fortunately, we had even more smart people on the problem by now, so solutions kept up. We got global file systems, distributed lock managers, key/value stores, data streaming platforms, and eventually globally distributed ACID-style databases.

A problem with solutions for complex problems is that they are complicated themselves and therefore seldom easy to use. In the 1990s we spent a lot of time helping people figuring out their data models and optimizing their SQL queries. In a similar vein, the 2000s saw us help people design key formats for their key-value store and deal with the intricacies of multi-master replication, ordering guarantees, load balancing, and once-and-only-once delivery semantics.

The super smart systems that were developed in the 2000s and 2010s allowed for globally distributed services to be built from the still relatively underpowered machines that were available at the time. So even though life was already much better, we were still in the mode of doing very smart things to create services that were more demanding than what the hardware could comfortably support.

Fortunately, Gordon Moore (the “inventor” of Moore’s law) continued to come round every Christmas to give us bigger CPUs, faster CPUs, more memory, faster memory, bigger disks, faster disks, and faster network interconnects. And to top it off, the scale of the problems didn’t grow anymore, because once you have solved globally distributed services, where is the problem going to grow into? The size of the planet and the speed of light are good upper bounds for anything that we need to do.

So we now find ourselves in the happy space that computing capacity continues to grow but the scale of the problems really doesn’t anymore.

Bar training AI models of course, but that is an entirely different kettle of fish that most people will never have to deal with.

So what are we going to do with all of that extra computing power? The answer: Build dumber solutions faster! That might sound a tad counterintuitive, so let me explain…

Take Kafka as an example. I love Kafka, it is a great data streaming solution that was developed by many smart people to offer a great service, namely distributing data from producers to consumers in more-or-less real-time, while doing its best to do the right thing when it comes to buffering data, dealing with outages, preventing applications from seeing the same message twice (to the extent possible), and, most importantly, dealing with massive amounts of data. All that good work comes at a price, namely non-trivial complexity in the Kafka API. The amazing book “Kafka – The definitive guide” (which can be downloaded for free from Confluent’s website) spends almost 100 pages explaining how to reliably read and write data from and to Kafka. In the course of those 100 pages it does a deep dive into Kafka’s internals because, in order to do this right, you need to understand brokers, sharding, partitions, offsets, commits, rebalancing, acknowledgments, and other fun topics. All of this complexity gets you something amazing though: Real-time and high volume data streaming using a fleet of cheap and small computers.

But, do we really need all of this advanced technology?

A large part of Kafka’s complexity comes from the fact that it can deal with “slow” network interfaces, even slower hard disks, and systems with limited memory. These are not typically the computers we have these days. If I go to AWS these days, I can easily get a machine with 72 cores, 512 GB of RAM, 15 TB SSD, and 25 Gbit/sec of networking. I can probably implement all of my data streaming needs on this machine alone, using a Postgres database and a few smart queries to find all the data that I haven’t seen yet. True, it is not as advanced as Kafka, but a lot simpler to use and probably adequate for most use cases.
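As a sketch of what that could look like: a single append-only table acts as the stream, producers INSERT rows, and each consumer simply remembers the highest id it has processed and polls for anything newer. The table name, columns, and connection string below are made up for illustration, and the lib/pq driver is just one possible choice.

    // Hypothetical "Postgres as a poor man's Kafka" consumer: poll an
    // append-only events table for rows we have not seen yet.
    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // assumed Postgres driver
    )

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/streams?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        var lastSeen int64 // a real consumer would persist this offset durably

        for {
            rows, err := db.Query(
                `SELECT id, payload FROM events WHERE id > $1 ORDER BY id LIMIT 1000`,
                lastSeen)
            if err != nil {
                log.Printf("poll failed: %v", err)
                time.Sleep(time.Second)
                continue
            }
            for rows.Next() {
                var id int64
                var payload []byte
                if err := rows.Scan(&id, &payload); err != nil {
                    log.Printf("scan failed: %v", err)
                    break
                }
                log.Printf("processing event %d (%d bytes)", id, len(payload))
                lastSeen = id
            }
            rows.Close()
            time.Sleep(time.Second) // on hardware like this, polling is cheap
        }
    }

You give up Kafka's consumer groups, rebalancing, and retention machinery, but on one fat machine with a serial id and an index, that is often all the ordering and offset management you need.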

Or take file drop boxes. Since time immemorial we have had this pattern where someone drops a file into a directory (e.g. using FTP) and then some task needs to pick it up and process it. It is easy to poll the directory regularly, but in order to make processing snappy, the polling frequency needed to be high and for the longest time that was a waste of valuable resources. So instead, we implement all sorts of cleverness using the inotify(7) API or its equivalents. Unfortunately, inotify has some annoying edge cases and it is surprisingly tricky to get it right.

But, with today’s fast computers, we can just poll the directory once per second using a Python script and not really notice any slowdown. The fast CPUs make the Python script fast enough, the fast SSD obviates the need to move the disk heads around like a mofo, and lots of memory makes Python’s terrible memory usage go away.
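To make that concrete, here is roughly what the dumb drop-box poller amounts to. The author describes doing it as a Python script; this sketch does the same thing in Go only to stay consistent with the earlier examples, and the /incoming path and the processing step are placeholders.

    // Hypothetical drop-box poller: scan the directory once per second and
    // handle any file that has appeared since the last scan. No inotify, no
    // edge cases; on modern hardware the repeated scan costs next to nothing.
    package main

    import (
        "log"
        "os"
        "path/filepath"
        "time"
    )

    func main() {
        const dropDir = "/incoming" // illustrative path
        seen := map[string]bool{}

        for {
            entries, err := os.ReadDir(dropDir)
            if err != nil {
                log.Printf("cannot read %s: %v", dropDir, err)
            }
            for _, e := range entries {
                if e.IsDir() || seen[e.Name()] {
                    continue
                }
                seen[e.Name()] = true
                // A real version would hand the file to a worker and then
                // move or delete it once processing succeeds.
                log.Printf("processing %s", filepath.Join(dropDir, e.Name()))
            }
            time.Sleep(time.Second)
        }
    }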

Really fast computers allow for dumb solutions that serve most use cases well. So whenever someone calls for an advanced solution (most likely because they want to put this technology on their resume), look at the problem carefully to see if a dumb and straightforward solution doesn’t get you there as well!

The time I save writing dumb software, I use to write smart columns, like Wednesday Wisdom. Subscribe today to see if it too becomes dumber over time.




Why the CrowdStrike bug hit banks hard


Programming note: I recently launched a weekly podcast, Complex Systems with Patrick McKenzie. About 50% of the conversations cover Bits about Money's beat. The remainder will be on other interesting intersections of technology, incentives, culture, and organizational design. The first three episodes covered teaching trading, Byrne Hobart on the epistemology of financial firms, and the tech industry vs. tech reporting divide. Subscribe to it anywhere you listen to podcasts. If you enjoy it, writing a review (in your podcast app or to me via email) helps quite a bit.

On July 19th, a firm most people have sensibly never heard of knocked out a large portion of the routine operations at many institutions worldwide. This hit the banking sector particularly hard. It has been publicly reported that several of the largest U.S. banks were affected by the outage. I understand one of them to have idled tellers and bankers nationwide for the duration. (You’ll forgive me for not naming them, as it would cost me some points.) The issue affected institutions across the size spectrum, including large regionals and community banks.

You might sensibly ask why that happened and, for that matter, how it was possible it would happen.

You might be curious about how to quickly reconstitute the financial system from less legible sources of credit when it is down. (Which: probably less important as a takeaway, but it is quite colorful.)

Brief necessary technical context

Something like 20% of the readership of this column has an engineering degree. To you folks, I apologize in advance for the following handwaviness. (You may be better served by the Preliminary Post Incident Review.)

Many operating systems have a distinction between the “kernel” supplied by the operating system manufacturer and all other software running on the computer system. For historical reasons, that area where almost everything executes is called “userspace.”

In modern software design, programs running in userspace (i.e. almost all programs) are relatively limited in what they can do. Programs running in kernelspace, on the other hand, get direct access to the hardware under the operating system. Certain bugs in kernel programming are very, very bad news for everything running on the computer.

CrowdStrike Falcon is endpoint monitoring software. In brief, “endpoint monitoring” is a service sold to enterprises which have tens or hundreds of thousands of devices (“endpoints”). Those devices are illegible to the organization that owns them due to sheer scale; no single person nor group of people understand what is happening on them. This means there are highly variable levels of how-totally-effed those devices might be at exactly this moment in time. The pitch for endpoint monitoring is that it gives your teams the ability to make those systems legible again while also benefitting from economies of scale, with you getting a continuously updated feed of threats to scan for from your provider.

One way an endpoint might be effed is if it was physically stolen from your working-from-home employee earlier this week. Another way is if it has recently joined a botnet orchestrated from a geopolitical adversary of the United States after one of your junior programmers decided to install warez because the six figure annual salary was too little to fund their video game habit. (No, I am not reading your incident reports, I clarify for every security team in the industry.)

In theory, you perform ongoing monitoring of all of your computers. Then, your crack security team responds to alerts generated by your endpoint monitoring solution. This will sometimes merit further investigation and sometimes call for immediate remedial work. The conversations range from “Did you really just install cracked Starcraft 2 on your work PC? … Please don’t do that.” to “The novel virus reported this morning compromised 32 computers in the wealth management office. Containment was achieved by 2:05 PM ET, by which point we had null routed every packet coming out of that subnet then physically disconnected power to the router just to be sure. We have engaged incident response to see what if any data was exfiltrated in the 47 minutes between detection and null routing. At this point we have no indications of compromise outside that subnet but we cannot rule out a threat actor using the virus as a beachhead or advanced persistent threats being deployed.”

(Yes, that does sound like a Tom Clancy novel. No, that is not a parody.)

Falcon punched

Falcon shipped a configuration bug. In brief, this means that rather than writing new software (which, in modern development practice, hopefully goes through fairly extensive testing and release procedures), CrowdStrike sent a bit of data to systems with Falcon installed. That data was intended to simply update the set of conditions that Falcon scanned for. However, due to an error at CrowdStrike, it actually caused existing already-reviewed Falcon software to fail catastrophically.

Since that failure happened in kernelspace at a particularly vulnerable time, this resulted in Windows systems experiencing total failure beginning at boot. The user-visible symptom is sometimes called the Blue Screen of Death.

Configuration bugs are a disturbingly large portion of engineering decisions which cause outages. (Citation: let’s go with “general knowledge as an informed industry observer.” As always, while I’ve previously worked at Stripe, neither Stripe nor its security team necessarily endorses things I say in my personal spaces.)

However, because this configuration bug hit very widely distributed software running in kernelspace almost universally across machines used by the workforce of lynchpin institutions throughout society (most relevantly to this column, banks, but also airlines, etc etc), it had a blast radius much, much larger than typical configuration bugs.

Have I mentioned that IT security really likes military metaphors? “Blast radius” means “given a fault or failure in system X, how far afield from X will we see negative user impact.” I struggle to recall a bug with a broader direct blast radius than the Falcon misconfiguration.

Once the misconfiguration was rolled out, fixing it was complicated by the tiny issue that a lot of the people needed to fix it couldn’t access their work systems because their machine Blue Screen of Death’ed.

Why? Well, we put the vulnerable software on essentially all machines in a particular institution. You want to protect all the devices. That is the point of endpoint monitoring. It is literally someone’s job to figure out where the devices that aren’t endpoint monitored exist and then to bring them into compliance.

Why do we care about optimizing for endpoint monitoring coverage? Partly it is for genuinely good security reasons. But a major part of it is that small-c compliance is necessary for large-C Compliance. Your regulator will effectively demand that you do it.

Why did Falcon run in kernelspace rather than userspace?

Falcon runs in kernelspace versus userspace in part because the most straightforward way to poke its nose in other programs’ business is to simply ignore the security guarantees that operating systems give to programs running in userspace. Poking your nose in another program’s memory is generally considered somewhere between rude and forbidden-by-very-substantial-engineering-work. However, endpoint monitoring software considers that other software running on the device may be there at the direction of the adversary. It therefore considers that software’s comfort level with its intrusion to be a distant secondary consideration.

Another reason Falcon ran in kernelspace was, as Microsoft told the WSJ, Microsoft was forbidden by an understanding with the European Commission from firmly demoting other security software developers down to userspace. This was because Microsoft both a) wrote security software and b) necessarily always had the option of writing it in kernelspace, because Microsoft controls Windows. The European Commission has pushed back against this characterization and pointed out that This Sentence Uses Cookies To Enable Essential Essay Functionality.

Regulations which strongly suggest particular software purchases

It would be an overstatement to say that the United States federal government commanded U.S. financial institutions to install CrowdStrike Falcon and thereby embed a landmine into the kernels of all their employees’ computers. Anyone saying that has no idea how banking regulation works.

Life is much more subtle than that.

The United States has many, many different banking regulators. Those regulators have some desires for their banks which rhyme heavily, and so they have banded into a club to share resources. This lets them spend their limited brainsweat budgets on things banking regulators have more individualized opinions on than simple, common banking regulatory infrastructure.

One such club is the Federal Financial Institutions Examination Council. They wrote the greatest crossover event of all time if your interests are a) mandatory supervisory evaluations of financial institutions and b) IT risk management: the FFIEC Information Technology Examination Handbook's Information Security Booklet.

The modal consumer of this document is probably not a Linux kernel programmer with a highly developed mental model of kernelspace versus userspace. That would be an unreasonable expectation for a banking supervisor. They work for a banking regulator, not a software company, doing important supervisory work, not merely implementation. Later this week they might be working on capital adequacy ratios, but for right now, they’re asking your IT team about endpoint monitoring.

The FFIEC ITEH ISB (the acronym just rolls off the tongue) is not super prescriptive about exactly what controls you, a financial institution, have to have. This is common in many regulatory environments. HIPAA, to use a contrasting example, is unusual in that it describes a control environment that you can reduce to a checklist with Required or Optional next to each of them. (HIPAA spells that second category “Addressable”, for reasons outside the scope of this essay, but which I’ll mention because I don’t want to offend other former HIPAA Compliance Officers.)

To facilitate your institution’s conversation with the examiner who drew the short straw, you will conduct a risk analysis. Well, more likely, you’ll pay a consulting firm to conduct a risk analysis. In the production function that is scaled consultancies, this means that a junior employee will open U.S. Financial Institution IT Security Risk Analysis v3-edited-final-final.docx and add important client-specific context like a) their name and b) their logo.

That document will heavily reference the ITEH, because it exists to quickly shut down the line of questioning from the examiner. If you desire a career in this field, you will phrase that as “guiding the conversation towards areas of maximum mutual interest in the cause of 'advanc[ing] the nation’s monetary, financial, and payment systems to build a stronger economy for all Americans.'” (The internal quotation is lifted from a job description at the Federal Reserve.)

Your consultants are going to, when they conduct the mandatory risk analysis, give you a shopping list. Endpoint monitoring is one item on that shopping list. Why? Ask your consultant and they’ll bill you for the answer, but you can get my opinion for free and it is worth twice what you paid for it: II.C.12 Malware Mitigation.

Does the FFIEC have a hugely prescriptive view of what you should be doing for malware monitoring? Well, no:

Management should implement defense-in-depth to protect, detect, and respond to malware. The institution can use many tools to block malware before it enters the environment and to detect it and respond if it is not blocked. Methods or systems that management should consider include the following: [12 bullet points which vary in specificity from whitelisting allowed programs to port monitoring to user education].

But your consultants will tell you that you want a very responsive answer to II.C.12 in this report and that, since you probably do not have Google’s ability to fill floors of people doing industry-leading security research, you should just buy something which says Yeah We Do That.

CrowdStrike’s sales reps will happily tell you Yeah We Do That. This web page exists as a result of a deterministic process co-owned by the Marketing and Sales departments at a B2B software company to create industry-specific “sales enablement” collateral. As a matter of fact, if you want to give CrowdStrike your email address and job title, they will even send you a document which is not titled Exact Wording To Put In Your Risk Assessment Including Which Five Objectives And Seventeen Controls Purchasing This Product Will Solve For.

CrowdStrike is not, strictly speaking, the only vendor that you could have installed on every computer you owned to make your regulators happy with you. But, due to vagaries of how enterprise software sales teams work, they sewed up an awful lot of government-adjacent industries. This was in part because they aggressively pursued writing the sort of documents you need if the people who read your project plans have national security briefs.

I’m not mocking the Federal Financial Institutions Examination Council for cosplaying as having a national security brief. (Goodness knows that that happens a lot in cybersecurity... and government generally. New York City likes to pretend it has an intelligence service, which is absolutely not a patronage program designed to have taxpayers fund indefinite foreign vacations with minimal actual job duties.)

But money is core societal infrastructure, like the power grid and transportation systems are. It would be really bad if hackers working for a foreign government could just turn off money. That would be more damaging than a conventional missile being fired at random into New York City, and we might be more constrained in responding.

And so, we ended up in a situation where we invited an advanced persistent threat into kernelspace.

It is perhaps important to point out that security professionals understand security tools to themselves introduce security vulnerabilities. Partly, the worry is that a monoculture could have a particular weakness that could be exploited in a particular way. Partly, it is that security tools (and security personnel!) frequently have more privileges than is typical, and therefore they can be directly compromised by the adversary. This observation is fractal in systems engineering: at every level of abstraction, if your control plane gets compromised, you lose. (Control plane has a specific meaning in networking but for this purpose just round it to “operating system (metaphorical) that controls your operating systems (literal).”)

CrowdStrike maintains that they do not understand it to be the case that a bad actor intentionally tried to bring down global financial infrastructure and airlines by using them as a weapon. No, CrowdStrike did that themselves, on accident, of their own volition. But this demonstrates the problem pretty clearly: if a junior employee tripping over a power cord at your company brings down computers worldwide, the bad guys have a variety of options for achieving directionally similar aims by attacking directionally similar power cords.

When money stops money-ing

I found out about the CrowdStrike vulnerability in the usual fashion: Twitter. But then my friendly local bank branch cited it (as quote the Microsoft systems issue endquote) when I was attempting to withdraw cash from the teller window.

My family purchased a duplex recently and is doing renovation prior to moving in. For complex social reasons, a thorough recitation of which would make me persona non grata across the political spectrum, engaging a sufficient number of contractors in Chicago will result in one being asked to make frequent, sizable payments in cash.

This created a minor emergency for me, because it was an other-than-minor emergency for some contractors I was working with.

Many contractors are small businesses. Many small businesses are very thinly capitalized. Many employees of small businesses are extremely dependent on receiving compensation exactly on payday and not after it. And so, while many people in Chicago were basically unaffected on that Friday because their money kept working (on mobile apps, via Venmo/Cash App, via credit cards, etc), cash-dependent people got an enormous wrench thrown into their plans.

I personally tried withdrawing cash at three financial institutions in different weight classes, and was told it was absolutely impossible (in size) at all of them, owing to the Falcon issue.

At one, I was told that I couldn’t use the tellers but could use the ATM. Unfortunately, like many customers, I was attempting to take out more cash from the ATM than I ever had before. Fortunately, their system that flags potentially fraudulent behavior will let a customer unflag themselves by responding to an instant communication from the bank. Unfortunately, the subdomain that communication directs them to runs on a server apparently protected by CrowdStrike Falcon.

It was not impossible at all financial institutions. I am aware of a few around Chicago which ran out of physical cash on hand at some branches, because all demand for cash on a Friday was serviced by them versus by “all of the financial institutions.” (As always happens during widespread disturbances in infrastructure, there quickly arises a shadow economy of information trading which redirects relatively sophisticated people to the places that are capable of servicing them. This happens through offline social networks since time immemorial and online social networks since we invented those. The first is probably more impactful but the second is more legible, so banking regulators pretend this class of issues sprang fully formed from the tech industry just in time to bring down banks last year.)

I have some knowledge of the history of comprehensive failures of financial infrastructure, and so I considered doing the traditional thing when convertibility of deposits is suspended by industry-wide issues: head to the bar.

A hopefully unnecessary disclaimer: the following is historical fact despite rhyming with stereotype.

Back in 1970, there was a widespread and sustained (six months!) strike in the Irish banking sector. Workers were unable to cash paychecks because tellers refused to work. So, as an accommodation for customers, operators of pubs would cash the checks from the till, trusting that eventually checks drawn on the accounts of local employers would be good funds again. 

Some publicans even cashed personal checks, backed by the swift and terrible justice of the credit reporting bureau We Control Whether You Can Ever Enjoy A Pint With Your Friends Again. This kept physical notes circulating in the economy.

As I told my contractors, to their confusion, I was unable to simply go down to the local bar to get them cash with the banks down. I don’t have sufficient credit with the operator of the local bar, as I don’t drink.

I told them, to their even greater confusion, that I had considered going down to the parish and buying all their cash on hand with a personal check. Churches, much like bars, have much of their weekly income come through electronic payments but still do a substantial amount of cash management through the workweek heading into the weekend. I’m much more a known quantity at church than I am at the friendly neighborhood watering hole. (Also, when attempting to work around financial infrastructure bugs to get workers their wages, consider relying on counterparties with common knowledge of James 5:4.)

I eventually resolved the issue in a more boring fashion: I texted someone I reasonably assumed to have cash and asked them to bring it over.

Financial infrastructure normally functions to abstract away personal ties and replace favor-swapping with legibly-priced broadly-offered services.

Thankfully, while this outage was surprisingly deep and broad, banks were mostly back to normal on the following Monday.


The year of the enterprise Linux desktop


...will never happen more than once at a company.

I say this knowing that chunks of Germany's civil infrastructure managed to standardize on SuSE desktops, and some may still be using SuSE. Some might view this as proof that it can be done; I say that Linux desktops not spreading beyond this example is proof of why it doesn't happen more widely. The biggest reason we have the German example is that the decision was top down. Government decision making is different from corporate decision making, which is why we're not going to see the same thing happen, a Linux desktop (actually laptop) mandate from on high, more than a few times; especially in the tech industry.

It all comes down to management and why Linux laptop users are using Linux in the first place.

You see, corporate laptops (hereafter referred to as "endpoints" to match management lingo) have certain constraints placed upon them when small companies become big companies:

  • You need some form of anti-virus and anti-malware scanning, by policy
  • You need something like either a VPN or other Zero Trust ability to do "device attestation", proving the device (endpoint) is authentic and not a hacker using stolen credentials from a person
  • You need to comply with the vulnerability management process, which means some ability to scan software versions on an endpoint and report up to a dashboard.
  • The previous three points strongly imply an ability to push software to endpoints

Windows has been able to do all four points since the 1990s. Apple came somewhat later, but this is what JAMF is for.

Then there is Linux. It is technically possible to do all of the above. Some tools, like osquery, were built for Linux first because the intended use was on servers. However, there is a big problem with Linux users. Get 10 Linux users in a room, and you're quite likely to get 10 different combinations of display server (Xorg or Wayland), window manager (gnome, kde, i3, others), and OS package manager. You need to either support that heterogeneity or commit to building the one Enterprise Linux that has one from each category and forbids the others. Building that Enterprise Linux is what the German example did.

Which is when the Linux users revolt, because banning their tiling window manager in favor of Xorg/Gnome ruins their flow -- and similar complaints. The Windows and Apple users forced onto Linux will grumble about their flow changing and why all their favorite apps can't be used, but at least it'll be uniform. If you support all three, you'll get the same 5% Linux users, but now they're the self-selected cranky ones who can't use the Linux they actually want. Most of that 5% will "settle" for another Linux before using Windows or Apple, but it's not the same.

And 5% Linux users puts the platform below the concentration needed to support it well. Companies like Alphabet are big enough that their 5% makes a supportable population. For smaller companies like Atlassian, perhaps not. Which puts Enterprise Linux in that twilight state between outright banned and just barely supported, so long as you can tolerate all the jank.
