I ended up using cmd+shift+i which opens the find/replace panel with the default set to backwards.
So, one of the arguments you’ve made at several points is that we should expect Vim to be slower because it has more choices. This seems incorrect to me: even a simple editor like Sublime Text has about a thousand keyboard shortcuts, which are mostly ad-hoc and need to be memorized separately. In contrast, Vim has a small, (mostly) composable language. I just counted lsusr’s post, and it has fewer than 30 distinct components – most of the text is showing different ways to combine them.
The other thing to consider is that most programmers will use at least a dozen editors/IDEs in their careers. I have 5 open on my laptop right now, and it’s not because I want to! Vim provides a unified set of key bindings across practically every editor, whereas the editors’ native bindings normally differ wildly.
So that’s roughly a 10x-100x reduction in vocabulary size (on the order of a thousand ad-hoc shortcuts per editor versus ~30 composable components, multiplied across the several editors you’d otherwise have to learn separately), which should at least make you consider the idea that Vim has lower latency.
I did :Tutor in neovim and only did the commands that actually involved editing text; it took 5:46.
Now trying in Sublime Text. Edit: 8:38 in Sublime, without vim mode – a big difference! It felt like it was mostly uniform, but one area where I was significantly slower was search and replace, because I couldn’t figure out how to go backwards easily.
This is a great experiment, I’ll try it out too. I also have pretty decent habits for non-vim editing so it’ll be interesting to see.
Some IDEs are just very accommodating about this, e.g. PyCharm. So that’s great.
Some of them aren’t, like VS Code. For those, I just manually reconfigure the clashing key bindings. It’s annoying, but it only takes ~15 minutes total.
I would expect using Vim to increase latency. While you are going to press fewer keys, you are likely going to take slightly longer to press each key, since using any key is more complex.
This really isn’t my experience. Once you’ve practiced something enough that it becomes a habit, the latency is significantly lower. Anecdotally, I’ve pretty consistently seen people who’re used to vim accomplish text editing tasks much faster than people who aren’t, unless the latter is an expert in keyboard shortcuts of another editor such as emacs.
There’s the paradox of choice, and having more choices for accomplishing a task costs mental resources. Vim forces me to spend cognitive resources choosing between different ways of accomplishing a task.
All the professional UX people seem to advocate making interfaces as simple as possible.
You want simple interfaces for beginners. Interfaces popular among professionals tend to be pretty complex, see e.g. Bloomberg Terminal or Photoshop or even Microsoft Excel.
As far as I know there’s almost no measurement of productivity of developer tools. Without data, I think there are two main categories in which editor features, including keyboard shortcuts, can make you more productive:
By making difficult tasks medium to easy
By making ~10s tasks take ~1s
An example of the first would be automatically syncing your code to a remote development instance. An example of the second would be adding a comma to the end of several lines at once using a macro. IDEs tend to focus on the first, text editors on the second.
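For concreteness, here’s a sketch of the macro version of that comma example in vim; the register name q and the replay count of 4 are arbitrary choices:
qq        " start recording into register q
A,<Esc>   " append a comma at the end of the current line, return to normal mode
j         " move down to the next line
q         " stop recording
4@q       " replay the recorded macro on the next 4 lines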
In general, I think it’s very likely that the first case makes you more productive. What about the second?
My recollection is that in studies of how humans respond to feedback, there are large differences between even relatively short periods of latency. Something like vim gives you hundreds of these small latency reductions (learning another editor’s keyboard shortcuts very well probably does too). I can point to dozens of little things that are easier with vim; conversely, nothing is harder, because you can always just drop into insert mode.
I agree that this isn’t nearly as convincing as actual studies would be, but constructing a reasonable study on this seems pretty difficult.
Very cool, thanks for writing this up. Hard-to-predict access in loops is an interesting case, and it makes sense that AoS would beat SoA there.
Yeah, SIMD is a significant point I forgot to mention.
It’s a fair amount of work to switch between SoA and AoS in most cases, which makes benchmarking hard!
StructArrays.jl makes this pretty doable in Julia, and Jonathan Blow talks about making it simple to switch between SoA and AoS in his programming language Jai. I would definitely like to see more languages making it easy to just try one and benchmark the results.
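For anyone curious, here’s a minimal sketch of what that looks like with StructArrays.jl; the field names and array size are just placeholders for illustration:
using StructArrays

# Array-of-structs (AoS): a plain Vector of NamedTuples, stored element by element.
aos = [(height = rand(), weight = rand()) for _ in 1:1_000_000]

# Struct-of-arrays (SoA): one contiguous array per field, same element-level interface.
soa = StructArray(aos)    # soa.height and soa.weight are the underlying column vectors

# The same code runs against either layout, so benchmarking both
# (e.g. with @btime from BenchmarkTools) doesn't require rewriting the loop.
sum_heights(data) = sum(row.height for row in data)

sum_heights(aos) ≈ sum_heights(soa)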
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” – Donald Knuth
Yup, these are all reasons to prefer column orientation over row orientation for analytics workloads. In my opinion data locality trumps everything, but compression and fast transmission are definitely very nice.
Until recently, numpy and pandas were row oriented, and this was a major bottleneck. A lot of pandas’s strange API is apparently due to working around row orientation. See e.g. this article by Wes McKinney, creator of pandas: https://wesmckinney.com/blog/apache-arrow-pandas-internals/
I see where that intuition comes from, and at first I thought that would be the case. But the machine is very good at iterating through pairs of arrays. Continuing the previous example:
function my_double_sum(data)
    sum_heights_and_weights = 0
    for row in data
        sum_heights_and_weights += row.weight + row.height
    end
    return sum_heights_and_weights
end

@btime(my_double_sum(heights_and_weights))
> 50.342 ms (1 allocation: 16 bytes)
function my_double_sum2(heights, weights)
    sum_heights_and_weights = 0
    for (height, weight) in zip(heights, weights)
        sum_heights_and_weights += height + weight
    end
    return sum_heights_and_weights
end

@btime(my_double_sum2(just_heights, just_weights))
> 51.060 ms (1 allocation: 16 bytes)
Moving Data Around is Slow
There’s also a transcript: https://www.cs.virginia.edu/~robins/YouAndYourResearch.html
I’ve re-read it at least 5 times, highly recommend it.
Of course it’s easy! You just compare how much you’ve made, and how long you’ve stayed solvent, against the top 1% of traders. If you’ve already done just as well as the others, you’re in the top 1%. Otherwise, you aren’t.
This object-level example is actually harder than it appears: performance of a fund or trader in one time period generally has very low correlation to the next, e.g. see this paper: https://www.researchgate.net/profile/David-Smith-256/publication/317605916_Evaluating_Hedge_Fund_Performance/links/5942df6faca2722db499cbce/Evaluating-Hedge-Fund-Performance.pdf
There’s a fair amount of debate over how much data you need to evaluate whether a person is a consistently good trader; in my moderately-informed opinion, a trader who does well over 2 years is significantly more likely to be lucky than skilled.
An incomplete list of caveats to Sharpe off the top of my head:
We can never measure the true Sharpe of a strategy (how it would theoretically perform on average over all time), only the observed Sharpe ratio, which can be radically different, especially for strategies with significant tail risk. There are a wide variety of strategies that might have a very high observed Sharpe over a few years but a much lower true Sharpe (a toy illustration follows this list).
Sharpe typically doesn’t measure costs like infrastructure or salaries, just losses to the fund itself. So e.g. you could view working at a company and earning a salary as a financial strategy with a nearly infinite Sharpe, but that’s not necessarily appealing. There are actually a fair number of hedge funds whose function is closer to providing services in exchange for relatively guaranteed pay.
High-Sharpe strategies are often constrained by capacity. For example, my friend once offered to pay me $51 on Venmo if I gave her $50 in cash, which is a very high return on investment given that the transaction took just a few minutes, but I doubt she would have been willing to do the same thing at a million times the scale. Similarly, there are occasionally investment strategies with very high Sharpes that can only handle a relatively small amount of money.
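To make the first caveat concrete, here’s a toy, deterministic sketch in Julia; all of the numbers are invented purely for illustration. A strategy that collects a small premium most days but takes a large loss once every few years shows a very high observed Sharpe over any window that happens to contain no blow-up, while its long-run Sharpe is far lower:
using Statistics

# Annualized Sharpe ratio from a series of daily excess returns.
sharpe(r) = sqrt(252) * mean(r) / std(r)

# Toy strategy: roughly +5bp per day, with a -20% blow-up once every ~4 years
# (every 1000th trading day). The sin term just adds a little day-to-day noise.
daily_return(t) = t % 1000 == 0 ? -0.20 : 0.0005 + 0.0002 * sin(t)

lucky_window = [daily_return(t) for t in 1:2 * 252]    # first 2 years: no blow-up yet
long_run     = [daily_return(t) for t in 1:100 * 252]  # ~100 years: includes the blow-ups

sharpe(lucky_window)  # very high observed Sharpe over the lucky window
sharpe(long_run)      # much lower once the tail losses are included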
This is very, very cool. Having come from the functional programming world, I frequently miss these features when doing machine learning in Python, and haven’t been able to easily replicate them. I think there’s a lot of easy optimization that could happen in day-to-day exploratory machine learning code that bog standard pandas/scikit-learn doesn’t do.
If N95 masks work, R95-100 and P95-100 masks should also work, and potentially be more effective: what they filter is a superset of what N95s filter. They’re normally more expensive, but at the moment I’ve actually found P100s cheaper than N95s.
I don’t really understand what you mean by “from first principles” here. Do you mean in a way that’s intuitive to you? Or in a way that includes all the proofs?
Any field of Math is typically more general than any one intuition allows, so it’s a little dangerous to think in terms of what it’s “really” doing. I find the way most people learn best is by starting with a small number of concrete intuitions – e.g., groups of symmetries for group theory, or posets for category theory – and gradually expanding.
In the case of Complex Analysis, I find the intuition of the Riemann Sphere to be particularly useful, though I don’t have a good book recommendation.
One major confounder is that caffeine is also a painkiller, many people have mild chronic pain, and I think there’s a very plausible mechanism by which painkillers improve productivity, i.e. just allowing someone to focus better.
Anecdotally, I’ve noticed that “resetting” caffeine tolerance is very quick compared to most drugs, taking something like 2-3 days without caffeine for several people I know, including myself.
The studies I could find on caffeine are highly contradictory, e.g. from Wikipedia, “Caffeine has been shown to have positive, negative, and no effects on long-term memory.”
I’m under the impression that there’s no general evidence for stimulants increasing productivity, although there are several specific cases, such as treating ADHD.
If you’re using non-modal editing, in that example you could press Alt+rightarrow three times, use cmd+f, the End key (and go back one word), or cmd+rightarrow (and go back one word). That’s not even counting shortcuts specific to another IDE or editor. Why, in your mental model, does the non-modal version feel like fewer choices? I suspect it’s just familiarity – you’ve settled on some options you use the most, rather than trying to calculate the optimal fewest keystrokes each time.
Have you ever seen an experienced vim user? 3-5 seconds of latency is completely unrealistic. It sounds to me like you’re describing the experience of someone who’s a beginner at vim and has spent half their life on non-modal editing, and in that case, of course you’re going to be much faster with the latter. And to be fair, vim is extremely beginner-unfriendly in ways that are bad and could be fixed without harming experts – Kakoune (https://kakoune.org/) is similar but vastly better designed for learning.
As a side note, this is my last post in this conversation. I feel like we have mostly been repeating the same points and going nowhere.