Some tech stacks / tools / resources for research. I have used most of these and found them good for my work.
TODO: check out https://www.lesswrong.com/posts/6P8GYb4AjtPXx6LLB/tips-and-code-for-empirical-research-workflows#Part_2__Useful_Tools
Finetuning open-source language models.
Docker images: Nvidia CUDA latest image as default, or framework-specific image (e.g Axolotl)
Orchestrating cloud instances: Runpod
Connecting to cloud instances: Paramiko
Transferring data: SCP
Launching finetuning jobs: Axolotl
Efficient tensor ops: FlashAttention, xFormers
Multi-GPU training: DeepSpeed
[Supports writing custom cuda kernels in Triton]
Monitoring ongoing jobs: Weights and Biases
Storing saved model checkpoints: Huggingface
Serving the trained checkpoints: vLLM.
[TODO: look into llama-cpp-python and similar things for running on worse hardware]
Finetuning OpenAI language models.
End-to-end experiment management: openai-finetuner
Evaluating language models.
Running standard benchmarks: Inspect
Running custom evals: [janky framework which I might try to clean up and publish at some point]
AI productivity tools.
Programming: Cursor IDE
Thinking / writing: Claude
Plausibly DeepSeek is now better
More extensive SWE: Devin
[TODO: look into agent workflows, OpenAI operator, etc]
Basic SWE
Managing virtual environments: PDM
Dependency management: UV
Versioning: Semantic release
Linting: Ruff
Testing: Pytest
CI: Github Actions
Repository structure: PDM
Repository templating: PDM
Building wheels for distribution: PDM
[TODO: set up a cloud development workflow]
Research communication.
Quick updates: Google Slides
Extensive writing: Google Docs, Overleaf
Some friends have recommended Typst
Making figures: Google Draw, Excalidraw
Some tech stacks / tools / resources for research. I have used most of these and found them good for my work.
TODO: check out https://www.lesswrong.com/posts/6P8GYb4AjtPXx6LLB/tips-and-code-for-empirical-research-workflows#Part_2__Useful_Tools
Finetuning open-source language models.
Docker images: Nvidia CUDA latest image as default, or framework-specific image (e.g Axolotl)
Orchestrating cloud instances: Runpod
Connecting to cloud instances: Paramiko
Transferring data: SCP
Launching finetuning jobs: Axolotl
Efficient tensor ops: FlashAttention, xFormers
Multi-GPU training: DeepSpeed
[Supports writing custom cuda kernels in Triton]
Monitoring ongoing jobs: Weights and Biases
Storing saved model checkpoints: Huggingface
Serving the trained checkpoints: vLLM.
[TODO: look into llama-cpp-python and similar things for running on worse hardware]
Finetuning OpenAI language models.
End-to-end experiment management: openai-finetuner
Evaluating language models.
Running standard benchmarks: Inspect
Running custom evals: [janky framework which I might try to clean up and publish at some point]
AI productivity tools.
Programming: Cursor IDE
Thinking / writing: Claude
Plausibly DeepSeek is now better
More extensive SWE: Devin
[TODO: look into agent workflows, OpenAI operator, etc]
Basic SWE
Managing virtual environments: PDM
Dependency management: UV
Versioning: Semantic release
Linting: Ruff
Testing: Pytest
CI: Github Actions
Repository structure: PDM
Repository templating: PDM
Building wheels for distribution: PDM
[TODO: set up a cloud development workflow]
Research communication.
Quick updates: Google Slides
Extensive writing: Google Docs, Overleaf
Some friends have recommended Typst
Making figures: Google Draw, Excalidraw