Following the discussion here, if you want to have a secure class for storing sensitive information (e.g. passwords) in memory, you have to:
Here is another program that reproduces the problem more directly:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

inline void SecureWipeBuffer(char* buf, size_t n){
    volatile char* p = buf;
    asm volatile("rep stosb" : "+c"(n), "+D"(p) : "a"(0) : "memory");
}

void mymemcpy(char* b, const char* a, size_t n){
    char* s1 = b;
    const char* s2 = a;
    for(; 0<n; --n) *s1++ = *s2++;
}

int main(){
    const size_t size1 = 200;
    const size_t size2 = 400;

    char* b = new char[size1];
    for(int j=0;j<size1-10;j+=10){
        memcpy(b+j, "LOL", 3);
        memcpy(b+j+3, "WUT", 3);
        sprintf((char*) (b+j+6), "%d", j);
    }
    char* nb = new char[size2];
    memcpy(nb, b, size1);
    //mymemcpy(nb, b, size1);
    SecureWipeBuffer(b,size1);
    SecureWipeBuffer(nb,size2);

    *((int*)NULL) = 1;  // crash deliberately to produce a core dump

    return 0;
}
If you replace memcpy with mymemcpy or use smaller sizes, the problem goes away, so my best guess is that the builtin memcpy does something that leaves part of the copied data in memory.
I guess this just shows that clearing sensitive data from memory is practically impossible unless it is designed into the entire system from scratch.
The string literals will be stored in memory and not managed by the SecByteBlock class.
This other SO question does a decent job of explaining it: Is a string literal in c++ created in static memory?
You can try and confirm whether the grep matches can be accounted for by the string literals by seeing how many matches you get. You could also print out the memory locations of the SecByteBlock buffers and try to see if they correspond with the locations in the core dump that match your marker.
Despite showing up in the coredump, the password isn’t actually in memory anymore after clearing the buffers. The problem is that memcpying a sufficiently long string leaks the password into SSE registers, and those are what show up in the coredump.
When the size argument to memcpy is greater than a certain threshold (80 bytes on the Mac), SSE instructions are used to do the memory copying. These instructions are faster because they can copy 16 bytes at a time instead of going byte-by-byte or word-by-word. Here’s the key part of the source code from Libc on the Mac:
LAlignedLoop: // loop over 64-byte chunks
movdqa (%rsi,%rcx),%xmm0
movdqa 16(%rsi,%rcx),%xmm1
movdqa 32(%rsi,%rcx),%xmm2
movdqa 48(%rsi,%rcx),%xmm3
movdqa %xmm0,(%rdi,%rcx)
movdqa %xmm1,16(%rdi,%rcx)
movdqa %xmm2,32(%rdi,%rcx)
movdqa %xmm3,48(%rdi,%rcx)
addq $64,%rcx
jnz LAlignedLoop
jmp LShort // copy remaining 0..63 bytes and done
%rcx is the loop index register, %rsi is the source address register, and %rdi is the destination address register. Each run around the loop, 64 bytes are copied from the source buffer to the four 16-byte SSE registers xmm{0,1,2,3}; then the values in those registers are copied to the destination buffer.
There’s a lot more in that source file: code to make sure that copies occur only on aligned addresses, to fill in the part of the copy that’s left over after doing 64-byte chunks, and to handle the case where source and destination overlap.
However, the SSE registers are not cleared after use! That means 64 bytes of the buffer that was copied are still present in the xmm{0,1,2,3} registers.
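To make the data flow concrete, here is a rough C++ rendering of that aligned loop using SSE2 intrinsics. The function name copy64_aligned is made up for illustration, and the compiler is free to pick different xmm registers than the hand-written Libc assembly does, so treat this as a sketch of what the loop does rather than as Libc's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Hypothetical helper, not part of Libc: copies n bytes in 64-byte chunks.
// Assumes src and dst are 16-byte aligned and n is a multiple of 64.
void copy64_aligned(char* dst, const char* src, std::size_t n) {
    assert(n % 64 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);

    for (std::size_t off = 0; off < n; off += 64) {
        // Four 16-byte loads: the data now sits in SSE registers.
        __m128i r0 = _mm_load_si128(reinterpret_cast<const __m128i*>(src + off));
        __m128i r1 = _mm_load_si128(reinterpret_cast<const __m128i*>(src + off + 16));
        __m128i r2 = _mm_load_si128(reinterpret_cast<const __m128i*>(src + off + 32));
        __m128i r3 = _mm_load_si128(reinterpret_cast<const __m128i*>(src + off + 48));

        // Four 16-byte stores: the registers are written to the destination.
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + off), r0);
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + off + 16), r1);
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + off + 32), r2);
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + off + 48), r3);
    }
    // Nothing here zeroes the registers, so the last chunk may still be
    // sitting in them after the function returns.
}
```

Wiping dst and src afterwards does nothing to the registers, which is exactly the residue the core dump picks up.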
Here’s a modification of Rasmus’s program that shows this:
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <emmintrin.h>

inline void SecureWipeBuffer(char* buf, size_t n){
    volatile char* p = buf;
    asm volatile("rep stosb" : "+c"(n), "+D"(p) : "a"(0) : "memory");
}

int main(){
    const size_t size1 = 200;
    const size_t size2 = 400;

    char* b = new char[size1];
    for(int j=0;j<size1-10;j+=10){
        memcpy(b+j, "LOL", 3);
        memcpy(b+j+3, "WUT", 3);
        sprintf((char*) (b+j+6), "%d", j);
    }
    char* nb = new char[size2];
    memcpy(nb, b, size1);
    SecureWipeBuffer(b,size1);
    SecureWipeBuffer(nb,size2);

    /* Password is now in SSE registers used by memcpy() */
    union {
        __m128i a[4];
        char c;
    };
    asm ("MOVDQA %%xmm0, %0": "=x"(a[0]));
    asm ("MOVDQA %%xmm1, %0": "=x"(a[1]));
    asm ("MOVDQA %%xmm2, %0": "=x"(a[2]));
    asm ("MOVDQA %%xmm3, %0": "=x"(a[3]));
    for (int i = 0; i < 64; i++) {
        char p = *(&c + i);
        if (isprint(p)) {
            putchar(p);
        } else {
            printf("\\%x", p);
        }
    }
    putchar('\n');

    return 0;
}
On my mac, this prints:
0\0LOLWUT130\0LOLWUT140\0LOLWUT150\0LOLWUT160\0LOLWUT170\0LOLWUT180\0\0\0
Now, examining the core dump, the password occurs only once, as that exact 0\0LOLWUT130\0...180\0\0\0 string. The core dump has to contain a copy of all registers, which is why that string is there: it’s the values of the xmm{0,1,2,3} registers.
So the password isn’t actually in RAM anymore after calling SecureWipeBuffer; it only appears to be, because it is still in registers whose contents are saved in the coredump. If you’re worried about memcpy having a vulnerability that could be exploited by RAM-freezing, worry no more. If having a copy of the password in registers bothers you, use a modified memcpy that doesn’t use the SSE2 registers, or clears them when it’s done. And if you’re really paranoid about this, keep testing your coredumps to make sure the compiler isn’t optimizing away your password-clearing code.
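As a minimal sketch of both options, assuming GCC or Clang on x86: the first helper copies byte-by-byte through volatile pointers so the compiler cannot turn the loop back into a library memcpy call or vectorize it, and the second zeroes xmm0 through xmm3 after a regular memcpy. Both function names are hypothetical. The choice of registers matches the Libc loop shown above; other memcpy implementations may use different or wider registers, so check the one you actually link against.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte-wise copy. The volatile accesses force one
// load and one store per byte, so no SSE registers hold the data.
void copy_no_sse(char* dst, const char* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    volatile char* d = dst;
    const volatile char* s = src;
    while (n--) {
        *d++ = *s++;
    }
}

// Hypothetical helper: zero the four registers used by the Libc loop.
// Call it right after a memcpy of sensitive data. Compiles to a no-op
// on targets without SSE2.
void wipe_xmm0_to_3() {
#if defined(__SSE2__)
    asm volatile(
        "pxor %%xmm0, %%xmm0\n\t"
        "pxor %%xmm1, %%xmm1\n\t"
        "pxor %%xmm2, %%xmm2\n\t"
        "pxor %%xmm3, %%xmm3"
        :
        :
        : "xmm0", "xmm1", "xmm2", "xmm3");
#endif
}
```

In Rasmus’s program, either replacing memcpy(nb, b, size1) with copy_no_sse(nb, b, size1) or calling wipe_xmm0_to_3() just before the crash should be enough to check whether the string still turns up in the core dump.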
Without inspecting the details of memcpy_s, I suspect that what you are seeing is a temporary stack buffer used by memcpy_s to copy out small memory buffers. You can verify this by running in a debugger and seeing if LOLWUT shows up when viewing stack memory.
[The implementation of reallocate in Crypto++ uses memcpy_s when resizing memory allocations, which is why you would be able to find some number of LOLWUT strings in memory. Also, the fact that many different LOLWUT strings overlap in that dump suggests that it's a temporary buffer that's being reused.]
The custom version of memcpy that is just a simple loop does not require temporary storage beyond counters, so that would certainly be more secure than how memcpy_s is implemented.
I would suggest that the way to do it is to encrypt the data in memory. In that way, the data is always secure whether it is still in memory or not. The drawback, of course, is an overhead in terms of encrypting/decrypting the data each time it is accessed.