@the_sisko

the_sisko@startrek.website · 8 months ago

the_sisko@startrek.website · 2 years ago

But it’s actually not that bad… It’s not good beer but whatever it is, it’s nice 🙂

the_sisko@startrek.website · 2 years ago

I do believe that’s a freezer.

the_sisko@startrek.website · 2 years ago

Usually it’s a bunch of different string hashes of the text content. They could be different hashing algorithms, but it’s more common to take a single hash algorithm and simply create a bunch of hash functions that operate on different parts of the data.

If it’s not text data, there’s a whole bunch of other hashing strategies but I only ever saw bloom filters used with text.
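
To make that concrete, here is a minimal sketch of a Bloom filter over text that gets several hash functions out of a single algorithm. It is an illustration under stated assumptions, not a reference implementation: the `BloomFilter` class, its sizes, and the seeding trick are all made up for this example. It also swaps in seed-prefixing (hashing the same data with a different prefix each time) rather than hashing different parts of the data, simply because that is the shortest variant to show.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over text; sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _indexes(self, text: str):
        # One algorithm (SHA-256), many functions: prefix a different seed
        # each time. This seeding scheme is an assumption of this sketch.
        data = text.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text: str) -> None:
        for i in self._indexes(text):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, text: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(text))


seen = BloomFilter()
seen.add("hello world")
print("hello world" in seen)    # added items are always reported as present
print("goodbye world" in seen)  # very likely absent with this few items
```

A cryptographic hash is overkill here; a real filter would probably use something faster, and would size `num_bits` and `num_hashes` from the expected item count and the false positive rate it can tolerate.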

the_sisko@startrek.website · 2 years ago

A classic use for them is spam filtering.

Suppose you have a set of spam detection systems/rules which are somewhat expensive to execute, e.g. an ML model or keyword blocklist. Spam tends to come in waves, and frequently it can be as simple as reposting the same message dozens of times.

Once your systems determine a piece of content is spam (or you manually flag content), it’s a good idea to insert the content into a bloom filter. This means that future posts of the identical content will be flagged without needing to execute the expensive checks, especially if there’s a surge of content stressing your systems.

Since it’s probabilistic, you can’t use this unless you have some sort of manual reviewing queue or system, as it’s possible for false positives to be flagged. However, you can also run more intensive checks once you’ve flagged content, to detect false positives.

The false positives can also be a feature, not a bug: with careful choice of hash functions, your bloom filter can actually detect slightly modified content, since most of the hashes may still be the same.

I’ve worked at companies which use this strategy so it’s very real world.
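
A rough sketch of that flow, for illustration only. Everything named here (`SpamGate`, `part_hashes`, the `min_hits` threshold, the toy `expensive_check`) is hypothetical and not taken from any real system. It uses one hash algorithm applied to different slices of the text, and it flags content when most (not all) of the bits are already set. That last part is a deliberate variation on a strict Bloom filter, meant to show how slightly modified reposts can still match.

```python
import hashlib
from typing import Callable


def part_hashes(text: str, parts: int, num_bits: int) -> list[int]:
    """One bit index per slice of the text (same SHA-256, different data)."""
    words = text.split()
    size = max(1, -(-len(words) // parts))  # ceiling division
    indexes = []
    for i in range(parts):
        chunk = " ".join(words[i * size:(i + 1) * size])
        digest = hashlib.sha256(f"{i}:{chunk}".encode("utf-8")).digest()
        indexes.append(int.from_bytes(digest[:8], "big") % num_bits)
    return indexes


class SpamGate:
    """Cheap 'seen this spam before?' check in front of an expensive one."""

    def __init__(self, expensive_check: Callable[[str], bool],
                 num_bits: int = 1 << 16, parts: int = 5, min_hits: int = 4) -> None:
        self.expensive_check = expensive_check
        self.num_bits = num_bits
        self.parts = parts
        self.min_hits = min_hits
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def remember(self, text: str) -> None:
        for i in part_hashes(text, self.parts, self.num_bits):
            self.bits[i] = 1

    def looks_known(self, text: str) -> bool:
        hits = sum(self.bits[i] for i in part_hashes(text, self.parts, self.num_bits))
        return hits >= self.min_hits

    def is_spam(self, text: str) -> bool:
        if self.looks_known(text):
            return True  # probabilistic hit: should land in a review queue
        if self.expensive_check(text):
            self.remember(text)
            return True
        return False


calls = []

def expensive_check(text: str) -> bool:
    # Toy stand-in for an ML model or keyword blocklist.
    calls.append(text)
    return "watches" in text

gate = SpamGate(expensive_check)
spam = "buy cheap watches now at example dot com today only"
print(gate.is_spam(spam), len(calls))   # expensive check runs once
print(gate.is_spam(spam), len(calls))   # repost: answered by the filter
print(gate.is_spam(spam.replace("today", "TODAY")), len(calls))  # near-duplicate
```

Splitting by word position is naive: inserting a word near the start shifts every slice, so this only catches in-place edits. It is enough to show the idea, not to deploy.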

the_sisko@startrek.website · 2 years ago

In other news, emacs still didn’t ship my init.el as part of the default configuration! Lol

the_sisko@startrek.website · 2 years ago

I’d argue that’s not true. That’s what the extern keyword is for. If you do #include <stdio.h>, you don’t get the actual printf function defined by the preprocessor. You just get an extern declaration (though extern is optional for function signatures). The preprocessed source code that is fed to cc is still not complete, and cannot be used until it is linked to an object file that defines printf. So really, the unnamed “C preprocessor output language” can access functions or values from elsewhere.

the_sisko@startrek.website · 2 years ago

I know this is a joke, but assuming you’re the author, then you’re under no obligation to follow the license. Only people to whom you transmitted the code are bound by its terms.

the_sisko@startrek.website · 2 years ago

Sphinx has warnings for these already. They’re just suppressed and ignored :)

the_sisko@startrek.website · 2 years ago

I see what you mean. The python ML ecosystem is… not far off from what you describe.

But please consider Python as a language outside the pytorch/numpy/whatever else ecosystem. The vast majority of Python doesn’t need you to set up a conda environment with a bunch of ML dependencies. It’s just some code and a couple of libraries in a virtualenv. And for system stuff, there’s almost never any dependency except the standard library.

the_sisko@startrek.website · 2 years ago

They probably know what it is, but it’s a bad point if they’re trying to paint DAGs as esoteric CS stuff for the average programmer. I needed to use a topological sort for work coding 2 weeks ago, and any time you’re using a build system, even as simple as Make, you’re using DAGs. Acting like it’s a tough concept makes me wonder why I should accept the rest of the argument.
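
For a sense of how small this is in practice, here is a sketch using the standard library’s `graphlib` (Python 3.9+). The build graph is a made-up, Make-style example: each target maps to the things it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each target maps to the targets it depends on.
deps = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}

# static_order() yields every node after all of its dependencies.
build_order = list(TopologicalSorter(deps).static_order())
print(build_order)
```

The exact order among independent nodes is not something to rely on; the only guarantee is that dependencies come first. A cycle in the graph raises `graphlib.CycleError`.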

Can’t say I have a strong feeling about Gradle though 🤷‍♀️

the_sisko@startrek.website · 2 years ago

You might be even more concerned to find that your Fedora package manager, DNF, is also written in Python: https://github.com/rpm-software-management/dnf

Fact of the matter is that Python is a language that gets used all the time for system level things, and frequently you just don’t know it because there is no “.py” extension.

I’m not sure I understand your concerns about python…

Performance is worse than C, yes. But writing performance sensitive code in Python is quite silly; it’s common to put that in a C library and use that within python to get the best of both worlds. DNF does this with libdnf.
“It feels like an extension of proprietary hardware planned obsolescence and manipulation.” This is very confusing to me. There has been one historic version change (2->3) which broke compatibility in a major way, and this version change had a literal decade of help and resources and parallel development. The source code for every Python interpreter version is freely available to build and tweak if you’re unhappy with a particular version. Most python scripts are written and used for ages without any changes.
“i don’t consider programs written in Python to have permanence or long term value because their toolchains become nearly impossible to track down from scratch.” Again, what? As I said, every Python version is available to download, build, install, and tweak. It’s pretty much impossible for python code to ever become unusable.

Anyway, people like the Fedora folks working on anaconda choose a language that makes sense for their purpose. Python absolutely makes sense for this purpose compared to C. It allows for fast development and flexibility, and there’s not much in an installer program that needs high performance.

That’s not to say C isn’t a very important language too. But it’s important to use the best tool for the job.

the_sisko@startrek.website · edit-2 2 years ago

Anaconda is just an OS installer program. At least, the Anaconda that you’re referring to. After installation, it’s gone.

There is also Anaconda which is a Python platform/package system/whatever. Maybe you’re confusing the two?

the_sisko@startrek.website · 2 years ago

It’s a cathartic, but not particularly productive vent.

Yes, there are stupid lines of time.sleep(1) written in some tests and codebases. But also, there are test setUp() methods which do expensive work per-test, so the runtime grows too fast with the number of tests. There are situations where there was a smarter algorithm and the original author said “fuck it” and did the N^2 one. There are container-oriented workflows that take a long time to spin up in order to run the same tests. There are stupid DNS resolution timeouts because you didn’t realize that the third-party library you used would try to connect to an API which is not reachable in your test environment… And the list goes on…
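
To illustrate just the setUp() case, here is a small sketch with `unittest`. The `load_fixture` function and the call counter are hypothetical stand-ins for whatever expensive work a real suite does (loading a database, starting a container, and so on). The first class pays that cost once per test, the second once per class.

```python
import unittest

LOAD_CALLS = {"per_test": 0, "per_class": 0}


def load_fixture(counter_key: str) -> dict:
    """Hypothetical stand-in for expensive setup work."""
    LOAD_CALLS[counter_key] += 1
    return {"users": ["a", "b", "c"]}


class PerTestSetup(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: cost grows with the number of tests.
        self.data = load_fixture("per_test")

    def test_count(self):
        self.assertEqual(len(self.data["users"]), 3)

    def test_first(self):
        self.assertEqual(self.data["users"][0], "a")


class PerClassSetup(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class; only safe if tests don't mutate it.
        cls.data = load_fixture("per_class")

    def test_count(self):
        self.assertEqual(len(self.data["users"]), 3)

    def test_first(self):
        self.assertEqual(self.data["users"][0], "a")


def run(case: type) -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite)


run(PerTestSetup)
run(PerClassSetup)
print(LOAD_CALLS)
```

Neither version is wrong on its own. Per-test setup is the safer default when the fixture is cheap, and it only becomes a problem once the fixture or the test count grows, which is the kind of changed assumption described here.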

I feel like it’s the “easy way out” to create some boogeyman, the stupid engineer who writes slow, shitty code. I think it’s far more likely that these issues come about because a capable person wrote software under one set of assumptions, and then the assumptions changed, and now the code is slow because the assumptions were violated. There’s no bad guy here, just people doing their best.