Technical Decision Making


There’s absolutely no poverty of technical advice to be found these days, be it on social media or on blog posts or at technical conferences or in publications.

via Pocket: https://copyconstruct.medium.com/technical-decision-making-9b2817c18da4

November 25, 2022 at 10:14PM

Flameeyes
6 days ago
London, Europe

No way to parse integers in C

There are a few ways to attempt to parse a string into a number in the C standard library. They are ALL broken.

Leaving aside the wide character versions, and staying with long (skipping int, long long or intmax_t, these variants all having the same problem) there are three ways I can think of:

  1. atol()
  2. strtol() / strtoul()
  3. sscanf()

They are all broken.

What is the correct behavior, anyway?

I’ll start by claiming a common sense “I know it when I see it”. The number that I see in the string with my eyeballs must be the numerical value stored in the appropriate data type. “123” must be turned into the number 123.

Another criterion is that the WHOLE number must be parsed. It is not OK to stop at the first sign of trouble, and return whatever maybe is right. “123timmy” is not a number, nor is the empty string.

Failing to provide the above must be an error. Or at least as the user of the parser I must have the option to know if it happened.

First up: atol()

Input                               Output
123timmy                            123
99999999999999999999999999999999    LONG_MAX
timmy                               0
empty string                        0
" "                                 0

No. All wrong. And no way for the caller to know anything happened.

For the LONG_MAX overflow case the manpage is unclear if it’s supposed to do that or return as many nines as it can, but empirically on Linux this is what it does.

POSIX says “if the value cannot be represented, the behavior is undefined” (I think they mean unspecified).

Great. How am I supposed to know if the value can be represented if there is no way to check for errors? So if you pass a string to atol() then you’re basically getting a random value, with a bias towards being right most of the time.
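
A minimal sketch of that ambiguity, using only atol() from <stdlib.h>. The helper name atol_is_ambiguous is hypothetical, and the overflow case is deliberately left out because its behavior is undefined.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: returns true if atol() gives the caller the same
// answer for valid input and for input that is not a number at all.
bool atol_is_ambiguous(void)
{
  long from_zero    = atol("0");         // A real zero.
  long from_garbage = atol("timmy");     // Not a number, also 0.
  long from_empty   = atol("");          // Not a number, also 0.
  long from_prefix  = atol("123timmy");  // Stops at 't', yields 123.

  return from_zero == from_garbage
      && from_zero == from_empty
      && from_prefix == atol("123");
}
```

If this returns true, a caller holding only the return value has nothing left to check.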

I can kinda forgive atol(). It’s from a simpler time, a time when gets() seemed like a good idea. gets() famously cannot be used correctly.

Neither can atol().

Next one: strtol()

I’ll now contradict the title of this post. strtol() can actually be used correctly. strtoul() cannot, but if you’re fine with signed types only, then this’ll actually work.

But only carefully. The manpage has example code, but in function form it’s:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

bool parse_long(const char* in, long* out)
{
  // Detect empty string.
  if (!*in) {
    fprintf(stderr, "empty string\n");
    return false;
  }

  // Parse number.
  char* endp = NULL;  // This will point to end of string.
  errno = 0;          // Pre-set errno to 0.
  *out  = strtol(in, &endp, 0);

  // Range errors are delivered as errno.
  // I.e. on amd64 Linux it needs to be between -2^63 and 2^63-1.
  if (errno) {
    fprintf(stderr, "error parsing: %s\n", strerror(errno));
    return false;
  }

  // Check for garbage at the end of the string.
  if (*endp) {
    fprintf(stderr, "incomplete parsing\n");
    return false;
  }
  return true;
}

It’s a matter of the API here if it’s OK to clobber *out in the error case, but that’s a minor detail.
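
If clobbering *out is not acceptable, one option is to parse into a temporary and only store it on success. This is a sketch of that API choice, not the manpage's example; the name parse_long_keep is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical variant of parse_long(): *out is written only on success.
bool parse_long_keep(const char* in, long* out)
{
  if (!*in) {
    return false;  // Empty string is not a number.
  }

  char* endp = NULL;
  errno = 0;
  long tmp = strtol(in, &endp, 0);  // Parse into a temporary first.

  if (errno || *endp) {
    return false;  // Range error, nothing parsed, or trailing garbage.
  }
  *out = tmp;
  return true;
}
```

The error reporting via fprintf() is dropped here to keep the sketch short; the checks are otherwise the same.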

Yay, signed numbers are parsable!

How about strtoul()?

Unlike its sibling, this function cannot be used correctly.

The strtoul() function returns either the result of the conversion or, if there was a leading minus sign, the negation of the result of the conversion represented as an unsigned value

Example outputs on amd64 Linux:

Input                   Raw input     Output                  Raw output
-1                      -1            18446744073709551615    2^64-1
-9223372036854775808    -2^63         9223372036854775808     2^63
-9223372036854775809    -2^63-1       9223372036854775807     2^63-1
" "                     just spaces   Error: endp not null
-18446744073709551614   -2^64+2       2                       2
-18446744073709551615   -2^64+1       1                       1
-18446744073709551616   -2^64         Error ERANGE

Phew, finally an error is reported.

This is in no way useful. Or I should say: Maybe there are use cases where this is useful, but it’s absolutely not a function that returns the number I asked for.

The title in the Linux manpage is “convert a string to an unsigned long integer”. It does that. Technically it converts it into an unsigned long integer. Not the obviously correct one, but it indeed returns an unsigned long.
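
A small sketch of the problem: every check available to the caller passes for “-1”. The helper names strtoul_accepts and minus_one_looks_like_ulong_max are hypothetical; only strtoul(), errno and ULONG_MAX come from the standard library.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: true if strtoul() consumes all of `in` without
// setting errno, i.e. every check the caller can make passes.
bool strtoul_accepts(const char* in, unsigned long* out)
{
  char* endp = NULL;
  errno = 0;
  *out = strtoul(in, &endp, 10);
  return errno == 0 && endp != in && *endp == '\0';
}

// True if "-1" is accepted and is indistinguishable from ULONG_MAX.
bool minus_one_looks_like_ulong_max(void)
{
  unsigned long v = 0;
  return strtoul_accepts("-1", &v) && v == ULONG_MAX;
}
```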

Interesting note that a non-empty input of just spaces is detectable as an error. It’s obviously the right thing to do, but it’s not clear that this is intentional.

So check your implementation: If passed an input of all isspace() characters, is this correctly detected as an error?

If not then strtol() is probably broken too.
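
One way to run that check is a small self-test like the sketch below. It assumes the usual contract that, when nothing is converted, endp is left pointing at the start of the input and 0 is returned; the function name spaces_are_detectable is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical self-test: true if an all-whitespace input is detectable
// as "nothing parsed" for both strtol() and strtoul().
bool spaces_are_detectable(void)
{
  const char* in = " \t\n ";
  char* endp = NULL;

  long s = strtol(in, &endp, 10);
  bool signed_ok = (endp == in) && (s == 0);

  endp = NULL;
  unsigned long u = strtoul(in, &endp, 10);
  bool unsigned_ok = (endp == in) && (u == 0);

  return signed_ok && unsigned_ok;
}
```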

Maybe sscanf()?

A bit less code needed, which is nice:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

bool parse_ulong(const char* in, unsigned long* out)
{
  char ch; // Probe for trailing data.
  int len;
  if (1 != sscanf(in, "%lu%n%c", out, &len, &ch)) {
    fprintf(stderr, "Failed to parse\n");
    return false;
  }

  // This never triggered, so seems sscanf() doesn't stop
  // parsing on overflow. So it's safe to skip the length check.
  if (len != (int)strlen(in)) {
    fprintf(stderr, "Did not parse full string\n");
    return false;
  }
  return true;
}

Input                   Raw input     Output                  Raw output
" "                     just spaces   Failed to parse
-1                      -1            18446744073709551615    2^64-1
-9223372036854775808    -2^63         9223372036854775808     2^63
-9223372036854775809    -2^63-1       9223372036854775807     2^63-1
-18446744073709551614   -2^64+2       2                       2
-18446744073709551615   -2^64+1       1                       1
-18446744073709551616   -2^64         18446744073709551615    2^64-1

As we can see, this is of course nonsense (except the first one). That last one is extra fun. From the two before it you’d expect it to be 0, or at least an even number. But no.

That last number is simply “out of range”, and that’s reported as ULONG_MAX.

But you cannot know this. Getting ULONG_MAX as your value could be any one of:

  1. The input was exactly that value.
  2. The input was -1.
  3. The input is out of range: either greater than ULONG_MAX, or less than -ULONG_MAX.

There is no way to detect the difference between these.

So sscanf() is out, too.

Why does this matter?

Garbage in, garbage out, right? Why does it matter that someone might give you -18446744073709551615 knowing you’ll parse it as 1?

Maybe it’s a funny little trick, like ping 0.

First of all it matters because it’s wrong. That is not, in fact, the number provided.

Maybe you’re parsing a bunch of data from a file. You really should stop on errors, or at least skip bad data. But incorrect parsing here will make you proceed with processing as if the data is correct.

Maybe some ACL only allows you to provide negative numbers, and you use this trick to make it parse as negative in some contexts (e.g. Python), but positive in others (strtoul()).

I even saw a comment saying “when you have requirements as specific as this”. As specific as “parse the number, correctly”?

It should matter that programs do the right thing for any given input. It should matter that APIs can be used correctly.

Knives should have handles. It’s fine if the knives are sharp, but no knife should be void of safe places to hold it.

It should be possible to check for errors.

Can I work around it?

You cannot even assemble the pieces here into a working parser for unsigned long.

Maybe you think you can filter out the incorrect cases, and parse the rest. But no.

You can detect negative numbers with strtol(), range checked and all, and discard all these. But you can’t tell the difference between being off scale low between -2^64…-2^63, and perfectly valid upper half of unsigned long, 2^63-1…2^64-1.

It’s not a solution to go one integer size bigger, either. long is long long is intmax_t on my system.

So what do I do in practice?

Do you need to be able to parse the upper half of unsigned long? If not, then:

  1. use strtol()
  2. Check for less than zero
  3. Cast to unsigned long

If all you need is unsigned int, then maybe on your system sizeof(int)<sizeof(long), and this can work. Just cast to unsigned int in the last step.
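
The three steps above can be sketched as follows. The name parse_ulong_lower_half is hypothetical, and the function by design rejects anything above LONG_MAX.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: parses values in 0..LONG_MAX only.
bool parse_ulong_lower_half(const char* in, unsigned long* out)
{
  if (!*in) {
    return false;  // Empty string is not a number.
  }

  char* endp = NULL;
  errno = 0;
  long tmp = strtol(in, &endp, 0);  // 1. Use strtol().

  if (errno || *endp) {
    return false;  // Range error or trailing garbage.
  }
  if (tmp < 0) {
    return false;  // 2. Check for less than zero.
  }
  *out = (unsigned long)tmp;  // 3. Cast to unsigned long.
  return true;
}
```

Because strtol() reports anything outside the long range as an error, the huge negative inputs that strtoul() wraps around are all rejected here.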

Do you need the upper half? Sorry, you’re screwed. Write your own parser.

These numbers are very high, yes, and maybe you’ll be fine without them. But one day you’ll be asked to parse a 64bit flag field, and you can’t.

0xff02030405060708 cannot be unambiguously parsed by standard parsers, even though there’s ostensibly a perfectly cromulent strtoul() that handles hex numbers and unsigned longs.
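
A minimal sketch of what “write your own parser” could look like for the full unsigned long range. The name parse_ulong_strict and the choice to accept only decimal or 0x-prefixed hex, with no sign and no whitespace, are assumptions made for this example; it also assumes an ASCII-compatible character set for the hex letters.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical hand-rolled parser: decimal, or hex with a 0x prefix.
// No sign, no whitespace, no trailing characters, no silent wraparound.
bool parse_ulong_strict(const char* in, unsigned long* out)
{
  unsigned long base = 10;
  if (in[0] == '0' && (in[1] == 'x' || in[1] == 'X')) {
    base = 16;
    in += 2;
  }
  if (!*in) {
    return false;  // Empty string, or a bare "0x".
  }

  unsigned long value = 0;
  for (; *in; in++) {
    unsigned long digit;
    if (*in >= '0' && *in <= '9') {
      digit = (unsigned long)(*in - '0');
    } else if (*in >= 'a' && *in <= 'f') {
      digit = (unsigned long)(*in - 'a') + 10;
    } else if (*in >= 'A' && *in <= 'F') {
      digit = (unsigned long)(*in - 'A') + 10;
    } else {
      return false;  // Sign, space, or other garbage.
    }
    if (digit >= base) {
      return false;  // E.g. 'a' in a decimal number.
    }
    // Reject if value * base + digit would exceed ULONG_MAX.
    if (value > (ULONG_MAX - digit) / base) {
      return false;
    }
    value = value * base + digit;
  }
  *out = value;
  return true;
}
```

The overflow test is done before the multiplication, so the arithmetic never wraps. On a system with a 64-bit unsigned long this should accept 0xff02030405060708 and reject anything with a leading minus sign.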

Any hope for C++?

Not much, no.

C++ method std::stoul()

#include <string>

bool parse_ulong(const std::string& in, unsigned long* out)
{
  size_t pos;
  *out = std::stoul(in, &pos);
  if (in.size() != pos) {
    return false;
  }
  return true;
}

Input                   Raw input     Output                         Raw output
" "                     just spaces   throws std::invalid_argument
timmy                   text          throws std::invalid_argument
-1                      -1            18446744073709551615           2^64-1
-9223372036854775808    -2^63         9223372036854775808            2^63
-9223372036854775809    -2^63-1       throws std::out_of_range

Code is much shorter, again, which is nice.

And std::istringstream(in) >> *out;?

Same.

In conclusion

Why is everything broken? I don’t think it’s too much to ask to turn a string into a number.

In my day job I deal with complex systems with complex tradeoffs. There’s no tradeoff, and nothing complex, about parsing a number.

In Python it’s just int("123"), and it does the obvious thing. But only signed.

Maybe Google is right in saying just basically never use unsigned. I knew the reasons listed there, but I was not previously aware that the C and C++ standard library string to int parsers were also basically fundamentally broken for unsigned types.

But even if you follow that advice sometimes you need to parse a bit field in integer form. And you’re screwed.

Flameeyes
35 days ago
London, Europe

Temperature monitoring

Xiaomi Mijia Temperature Sensor

I've been having some temperature problems in my house, so I wanted to set up some thermometers which I could read from a computer, and look at trends.

I bought a pack of three cheap Xiaomi IoT thermometers. There's some official Xiaomi tooling to access them from smartphones and suchlike, but I wanted something more open. The thermometers have some rudimentary security on them to try and ensure you use the official tooling. This is pretty weak, and the open-source Home Assistant (HA) has support for querying them. I wasn't already running HA and it looked to do more than I needed right now.

gathering

A friend told me that it was trivial to write custom firmware to the devices. It's so easy you can do it from a web-based flasher: in fact, anyone in range can. There's a family of custom firmwares out there, and most move the sensor readings into the BTLE announce packets, meaning you can scrape the temperature by simply reading and decoding the announcement packets: no need to handshake at all, and certainly no need to navigate Xiaomi's weird security. This is the one I used.

I hacked up a Python script to read the values with the help of this convenience library [1].

Next, I needed to set up somewhere to write the values.

reporting

The study is thankfully cooler today

It's been long enough since I last looked at something like this that, back then, the best-in-class software was things like Multi Router Traffic Grapher (MRTG) and rrdtool, or things that build on top of them like Munin. The world seems to have moved on (rightly or wrongly) with a cornucopia of options like Prometheus, Grafana, Graphite/Carbon, InfluxDB, statsd, etc.

I ruled most of these out as being too complex for what I want to do, and got something working with Graphite (front-end) and Carbon (back-end). I was surprised that this wasn't packaged in Debian, and opted to try the Docker container. This works, although even that is more complex than I need: it's got graphite and carbon, but also nginx and statsd too; I'm submitting directly to carbon, side-stepping statsd entirely. So as I refine what I'm doing I might possibly strip that back.
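
Submitting directly to Carbon usually means writing lines in its plaintext protocol, "<metric path> <value> <unix timestamp>", one per line, to the Carbon listener. The post's own script is Python and is not shown; the sketch below only illustrates the line format. The function name format_carbon_line, the metric path and the two-decimal precision are all assumptions.

```c
#include <stdio.h>

// Hypothetical helper: formats one reading as a Carbon plaintext line,
// "<metric path> <value> <unix timestamp>\n".
// Returns the line length, or -1 if buf is too small.
int format_carbon_line(char* buf, size_t size, const char* path,
                       double value, long timestamp)
{
  int n = snprintf(buf, size, "%s %.2f %ld\n", path, value, timestamp);
  if (n < 0 || (size_t)n >= size) {
    return -1;  // Encoding error or truncated output.
  }
  return n;
}
```

A caller would then write the resulting line to a socket connected to Carbon; the connection handling is left out here.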

next steps

I might add more sensors in my house! My scripts also need a lot of tidying up. But I think it would be useful to add some external temperature data, such as something from a weather service. I am also considering pulling in some of the sensor data from the Newcastle University Urban Observatory, which is something I looked at a while ago for my PhD but didn't ultimately end up using. There are several temperature sensors nearby, but they seem to operate relatively sporadically.

There's a load of other interesting sensors in my vicinity, such as air quality monitors.

I'm currently ignoring the humidity data from the sensors but I should collect that too.

It would be useful to mark relevant "events", too: does switching on or off my desktop PC, or printer, etc. correlate to a jump in temperature?


  1. I might get rid of that in the future as I refine my approach
Flameeyes
108 days ago
London, Europe

Old Posts, Still Relevant

This is going to be a bit of an unusual post, but I effectively ran out of finished posts for a couple more weeks. The reasons are to be found in my previously-announced COVID experience, which as I said sucked (the last post I wrote in the middle of it turned out to be a much worse rambling mess than usual), and the fact that work had me on a tight deadline for most of the summer.

So instead of writing something new, I’m gathering some of the topical commentary that I left on other venues that link to a number of old (sometimes very old) posts of mine. It’s going to be a very link heavy post, rather than a usual “essay”, but hopefully it will also bring out some previously buried posts of mine.

GitLab, Self Hosting, and FLOSS Cooperatives

GitLab was a darling in many FLOSS spaces because they are not affiliated with Microsoft, but in the past few weeks they have been through a huge storm when The Register reported on their plans to delete inactive repositories.

As usually happens when a hosting provider realises they can’t afford to stay around forever (this happened before, and will keep happening), there’s a vocal minority of FLOSS people who will try to convince authors and maintainers that the only option to survive is to run their own infrastructure.

Unfortunately, despite the cries of “the cloud is just someone else’s computer” (“The bakery is just someone else’s oven”), there’s a lot of things that are also someone else’s problem when you use a solution provided by a third party. Maintaining a solid infrastructure, particularly for more complex projects, is very time consuming, particularly when you want to not depend on ready-made solutions.

The last time this topic came up, I wrote that in my opinion what we need is FLOSS Cooperatives, but just as back then I don’t think it’s going to be a feasible option: the moment when money is involved, there are commitments to expect and respect, and given that the comparison would be with staffed and funded solutions such as GitHub, it would take quite a bit of money and userbase to maintain a 24/7 SLO, to the point of competing with paid solutions from companies such as GitLab as well.

To plug more of my previous writing, this is also what I would like to see more of, in terms of non-profits (or maybe B Corporations?) rather than focusing nearly only on privacy, as FSFE appeared to do.

Hyperboles, Personality, and Books

I have a strong dislike for cults of personality in all forms, and have over time applied the maxim «Follow principles, not people» which, funnily enough, I heard from a person I wouldn’t follow to the bathroom. That appears to make me a renegade in Tech, where everyone appears to accept the words of their heroes with little questioning.

A couple of weeks ago, Mikko Hypponen released a book, titled If It’s Smart, It’s Vulnerable — catchy name, catchy premise, and someone who appears to be widely accepted as being smart. Maybe the premise applies to people as well. I honestly felt annoyed by the amount of uncritical noise in social media over the book, although I admit I am not going to read the book, because I do not believe that Hypponen should be lent the credibility for it.

I’m not trying to argue that he doesn’t have the experience, or the insight, to know what he’s talking about. I’m arguing that just at the end of last year he amplified a silly take about smart thermostats, because it fit his narrative. The same narrative that this book appears to be making front and centre.

This is where for me credibility falls: there are enough significant problems with the way current “throwaway” smart devices are deployed and sold that we don’t need to create fake takes around them. Scaring users won’t help if we are actually trying to help the public.

The whole situation reminded me of how I similarly stay away from Doctorow: much as his early tech coverage has been instrumental in pointing out privacy problems that many had up to then ignored, either out of self interest or simple ignorance, his later takes have been hyperbolic, in my opinion just feeding the caricature of privacy advocates as tinfoil hat wearing weirdos. Case in point? The figurative “literally” when misrepresenting Abbott’s takedown.

Midwife To the Container Revolution

I stumbled across an awkwardly phrased, 13-year-old post of mine, which I found quite fascinating to look at: it was written at a time when I was still finding it interesting to play around with PAM, and with complicating my single-user system to build an understanding of how to secure multi-user systems as well.

That post predates the systemd announcement by a number of months, but it talks about concepts that systemd made popular and effectively omnipresent even on non-systemd Linux installations nowadays, such as /run and its user-specific directories. I do not know if I happened to make the same discovery as Lennart or if he was vaguely inspired by my experiments – we used to chat a lot for a long while, since I was packaging PulseAudio among others – but at the very least I can see that I wasn’t too far off the mark on those concepts.

It wasn’t the only time. The year prior I noted the memory wasted by parsing pci.ids files at runtime. Eventually, the hardware IDs database became a binary format that could be directly mapped from the filesystem. And user services, which again systemd implements nicely nowadays, were basically drafted in February 2009. Again, I don’t expect to have been the direct source for the ideas, but at least I can say that I was sensing a need of some kind.

As I was reflecting on these posts, I joked that I sometimes refer to myself as the midwife of the container revolution. Nowadays everything appears to be using Docker (that was first released in 2013) or a variation thereof, but the Gentoo Tinderbox I ran moved to containers (based on LXC) all the way back in 2009!

You can indeed see that I had a lot of content early on about containers, and I was active in lxc-devel when the project was still managed by IBM. Gentoo Linux was an early easy target to support as a container guest among other reasons because I needed it to be for the tinderbox to run successfully. I can’t take the merit of having made containers a mainstream technology, but I have had my hands dirty in the process.

Similarly, while Roy deserves all the credit for OpenRC, I feel like I had a bit of a part to play in that success as well: what became OpenRC started as part of baselayout2, and it was separated explicitly to make it easier to use in Gentoo/FreeBSD, which was the first project I worked on in Gentoo. And indeed, while Roy is now possibly better known for being a NetBSD developer, he was the original member of Gentoo/FreeBSD/SPARC64, and got hooked on NetBSD while trying to make Gentoo/NetBSD a thing. Roy is awesome, if you didn’t know that!

Closing Thoughts

Have you read something you like on the blog? Please, share it with others! In this day and age it seems like the only way to be heard is to have spicy hot takes and stir up controversy, but personally I don’t have the energy to follow that.

Flameeyes
114 days ago
London, Europe

Upscaling and an Important Note About Photo “AI”

One of these is a photo. One is a digital illustration.
John Scalzi

Because I’m a digital photography nerd, I have a lot of programs and Photoshop plugins designed to tweak photos and make them better, or, maybe more accurately, less obviously bad. One of the hot new sectors of digital photography programs is the one where “Artificial Intelligence” is employed to do all manner of things, including colorizing, editing and upscaling. Some of this is baked into Photoshop directly — Adobe has a “Neural Filters” section for this — while other companies are supplying standalone programs and plugins.

Truth be told, all of these companies have been touting “AI” for a while now. But in the last couple of iterations of these tools and programs, there’s been a real leap in… well, something, anyway. The quality of the output of these tools has become strikingly better.

As an example, I present to you the before and after picture above. The original picture on the left was a 200-pixel-wide photo of Athena as a toddler. There had been a larger version of it way back when, but I had cropped it way down for an era when monitors were 640 x 480, and then tossed or misplaced the original photo. So the blocky, blotchy low-resolution picture of my kid is the only one I have now. The picture on the right is a 4x upscaling using a program called Topaz GigaPixel AI, which takes the information from the original picture, and using “AI,” makes guesses at what the picture should look like at a higher resolution, then applies those guesses. In this case, it guessed pretty darn well.

Which is remarkable to me, because even just a couple iterations of the GigaPixel program back, it wasn’t doing that great of a job to my eye — it could smooth out jagged edges on photos just fine, but it was questionable on patterns and tended to make a hash of faces. Its primary utility was that it could do “just okay” attempts at upscaling much faster than I could do that “just okay” work on my own. This iteration of the program, however, does better than “just okay,” more frequently than not, and now does things well beyond my own skill level.

It’s still not perfect; some other pictures of Athena from this era that I upsampled didn’t quite guess her face correctly, so she didn’t look as much like she actually did at the time, and more like a generic toddler. But that generic toddler looked perfectly reasonable, and not like a machine-generated mess. That counts as an improvement.

Now, it’s important to acknowledge a thing about these new “AI”-assisted pictures, which is that they are no longer photographs. They’re something different, closer to a digital illustration than anything else. The upscaled picture of Athena here is the automated equivalent of an artist making an airbrushed painting of my kid based on a tiny old photo. It’s good, and it’s pretty accurate, and I’m glad I have a larger version of that tiny image. But it’s not a photograph anymore. It’s an illustrated guess at what a more detailed version of the photograph would have been.

Is this a problem? Outside of a courtroom, probably not. But it’s still worth remembering that the already-extremely-permeable line between photograph and illustration is now even more so. Also, if you weren’t doing so already, you should treat any “photo” you see as an illustration until and unless you can see the provenance, or it’s from a trusted source. This is why, incidentally, AP and most other news organizations have strict limits on how photos can be altered. I’d guess that a 4x “AI”-assisted enhancement would fall well outside the organization’s definition of acceptable alteration. So, you know, build that into your world view. In a world of social media filters turning people into cats or switching their gender presentation, this internalization may not be as much of a sticking point as it once was.

With that said, it’s still a pretty nifty thing, and I will play with it a lot now, especially for older, smaller digital pictures I have, and to (intentionally) make illustrations that are based on those upscaled originals. I’m glad to have the capability. And that capability is only going to get more advanced from here.

— JS

Flameeyes
127 days ago
London, Europe

Plans to open a disused railway bridge to pedestrians

A section of the Thames with few bridges could become a lot easier for pedestrians and cyclists to cross if plans to convert a disused railway bridge for pedestrian use go ahead.

(c) Moxon Architects

The disused bridge crosses the Thames at Barnes, which may confuse some people as the bridge there is in daily use by trains. That’s because, little noticed by most people, this is actually two bridges. The railway bridge in use today was built in the 1890s, as a replacement for an earlier cast iron bridge that was built in 1849. That older disused bridge sits right next to the railway bridge, even though few people realise they are two separate structures.

There is an existing walkway on the live railway bridge side, but it’s narrow and lacks any step-free access options. It’s also, in theory, not open to cyclists.

A plan, supported by both councils on either side of the river, is to open up the disused bridge as a wider pedestrian and cycle route, with gentle gradient slopes on either side to provide an accessible and pleasant way to cross the river. Another benefit is that on the southern side, the slope up to the footbridge will also offer step-free access to the outward-bound platform at Barnes Bridge station that’s next to the railway bridge.

As the bridge will be open to cyclists, to discourage speeding while maintaining at least 2 metres of width along the route, the footpath meanders around planters and integrated seating.

(c) Moxon Architects

The south side in Barnes is largely residential, while the north side in Hounslow is mostly fields and sports facilities. The river walk on the north side is also being upgraded at the moment with a new pedestrian path under the railway bridge to make that route easier to use. The architects who developed that new pedestrian link are the same as the ones working on this new project to open up the disused railway bridge, Moxon Architects, so they’re familiar with the area.

They also have support from Network Rail to carry out the plans.

The organiser’s official website says that the new bridge will also offer views of the annual Boat Race, as the existing narrow footbridge is closed on Boat Race day to prevent overcrowding.

The current estimate is that the project will cost around £3 million to complete. The bulk of the costs are for the step-free access at either end of the railway bridge, and then there’s landscaping work, moving some power cables from the live railway and restoring a Victorian turnstile at the Hounslow end. Studies have already been carried out on the structure, so they don’t anticipate any huge surprises there.

Subject to securing the funding, they expect to open the disused railway bridge to the public in 2026.

This article was published on ianVisits


Flameeyes
128 days ago
London, Europe