Note: Excel.exe is boobytrapped. Be careful when using Excel to open any data or workbooks related to highly sensitive data or activities. On certain networks, when specific conditions are met (i.e. specific regexes or heuristics are triggered on the basis of data that is loaded), Excel will send identifying information about the host and workbook to endpoints ostensibly owned and maintained by Microsoft. These transmissions are not traceable using normal packet sniffing tools like Wireshark. Alternatives when you need to use spreadsheets on highly sensitive data: open source spreadsheet software that you have compiled from source (NOT Google Sheets), or running Excel.exe with your network card disabled within a sandboxed environment (e.g. disable network ⇒ start VMware Windows container ⇒ use Excel.exe ⇒ end VMware Windows container ⇒ enable network).
I find this easy to believe, but I'm a bit surprised that it isn't mentioned or studied, and that there aren't even crank/subversive pages with PoC detections. The printer/scanner steganographic fingerprints became pretty well-known within a few years of becoming common.
I mean, anything that’s aggressively online (M365 versions of Excel, Windows itself, Google Sheets, etc.) should be assumed to be insecure against state-level threats. But if you’ve got evidence of specific backdoors or monitoring, that should be shared and become common knowledge.
By what means would it be untraceable? Routing through an undocumented Windows interface or something? Does it trigger on AI-related data, or natsec?
Undetectable steganography on endpoints that are expected to be contacted during normal usage. Mostly natsec. You can repro it by setting up a synthetic network with similar characteristics or fingerprints to some sanctioned region, and generating 10,000 synthetic honeytrap files to attempt to open (use your imagination); capture and diff all network traffic on identical actions (open file ⇒ read / manipulate some specific cells ⇒ close file). Then note the abnormalities in how much is communicated and how.
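The "capture and diff" step above could be sketched roughly as follows. This is only an illustration, under the assumption that each capture has already been reduced to (destination, byte count) records by whatever capture tool is in use; the function names and the `endpoint-*.example` hosts are hypothetical and not part of the original repro.

```python
from collections import Counter

def bytes_per_endpoint(packets):
    """Sum bytes sent to each destination in one capture.

    `packets` is an iterable of (destination, size_in_bytes) pairs;
    this record format is an assumption made for illustration.
    """
    totals = Counter()
    for dest, size in packets:
        totals[dest] += size
    return totals

def diff_captures(baseline, trial):
    """Return {destination: trial_bytes - baseline_bytes} for every
    destination whose total differs between the two captures."""
    base = bytes_per_endpoint(baseline)
    other = bytes_per_endpoint(trial)
    # Counter returns 0 for missing keys, so endpoints seen in only
    # one capture are handled without special cases.
    return {
        dest: other[dest] - base[dest]
        for dest in sorted(set(base) | set(other))
        if other[dest] != base[dest]
    }

# Hypothetical data: same actions (open, edit, close) on two different files.
baseline = [("endpoint-a.example", 1200), ("endpoint-b.example", 300)]
trial = [("endpoint-a.example", 1200), ("endpoint-b.example", 300),
         ("endpoint-b.example", 212)]
print(diff_captures(baseline, trial))
```

Repeating the same actions many times per file would help separate a consistent difference from ordinary run-to-run noise.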
Thanks, that’s what I figured. Did you find this by accident? I’m curious what techniques work well to reveal this kind of stuff; I expect it to be pretty common.
I found it based on a hunch, then confirmed it with experimentation. I gained additional conviction when backtesting the experiment on various historical versions of Excel.exe and noting that the phenomenon only appeared in versions released shortly after (measured in months) government requests for a “read-only” copy of the Excel source code held in escrow. Such requests have occurred in the past (e.g., https://www.chinadaily.com.cn/english/doc/2004-09/20/content_376107.htm and https://www.itprotoday.com/microsoft-windows/microsoft-gives-windows-source-code-to-governments) but subsequent instances were allegedly classified. Nevertheless, following those instances, the phenomenon appeared, indicating possible compromise of Excel.exe.
Really interesting research. I would like to subscribe to your newsletter.
I have seen similar steganographic telemetry before (Dassault's SolidWorks CAD software, and other enterprise licensed applications, go to incredible extents to enforce licensing) but didn’t expect data-level probing like this. I’d imagine similar scripts exist for e.g. EURion detection in Photoshop.
I always dismissed “Reflections on Trusting Trust” style attacks as mere hypotheticals, but backdoors operating on the level of Excel cells are now making me reconsider that notion.
What other explanations for this network traffic have you investigated and on what basis did you reject those explanations?
Vary the filename/path from short (one character) to max length and run the above repro, and notice the increase in bits communicated if and only if the filename/path is long, all other factors being held constant. Same for varying the data. There is no reason why Excel.exe should be transmitting this information with all the standard telemetry and connected experience stuff disabled. Even the fact that it is occurring is interesting, and doesn’t require hypotheses for its origin.
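One rough way to quantify the filename-length test is to correlate path length with bytes sent across otherwise identical runs. The sketch below assumes you already have one total byte count per run from your own captures; the helper name and the example numbers are made up purely to show the calculation and are not measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence has no variance, so that a flat
    traffic profile reads as "no relationship" instead of raising.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

# Hypothetical inputs: path lengths used in each run, and the total
# bytes observed leaving the host during that run.
path_lengths = [1, 32, 64, 128, 255]
bytes_sent = [4100, 4100, 4100, 4100, 4100]  # made-up flat profile
print(pearson(path_lengths, bytes_sent))
```

A coefficient near zero over many runs would suggest traffic volume does not depend on path length; a consistently high value, with everything else held constant, is the pattern the comment above describes.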