Question
I am attempting to build C++ wrappers (using SWIG) for System.Text.Json.Utf8JsonWriter and System.Text.Json.Utf8JsonReader that allow C++ code to read and write JSON through the same reader/writer as client C# code. To do this I'm creating abstract classes in C++ that match the interfaces of Utf8JsonWriter and Utf8JsonReader, and then using the SWIG director feature so that I can create a derived class in C# that implements this interface and acts as a proxy, calling the corresponding methods on the actual reader/writer. For instance, in C++ the JsonWriter class is defined as:
class JsonWriter
{
public:
    virtual ~JsonWriter() {}
    virtual void WriteStartObject() = 0;
    virtual void WriteStartObject(const char* propertyName) = 0;
    virtual void WriteEndObject() = 0;
    // .. etc
};
SWIG (with the director feature enabled) generates a matching class in C# and hooks up the methods by passing delegates (callbacks) to a derived class it automatically generates in C++. This allows me to implement a derived class in C# that overrides each of the methods and acts as a proxy to the actual Utf8JsonWriter, as shown below:
public class JsonWriterProxy : JsonWriter
{
    private readonly Utf8JsonWriter _writer;
    public JsonWriterProxy(Utf8JsonWriter writer)
        : base()
    {
        _writer = writer;
    }
    public override void WriteStartObject() => _writer.WriteStartObject();
    public override void WriteStartObject(string propertyName) => _writer.WriteStartObject(propertyName);
    public override void WriteEndObject() => _writer.WriteEndObject();
    // ... etc
}
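SWIG's generated director code is more involved than this, but the forwarding idea can be modelled in plain C++. In the minimal sketch below, the hypothetical `Callbacks` struct of function pointers stands in for the delegates registered from C#, and the hypothetical `JsonWriterDirector` plays the role of the C++ class SWIG derives from `JsonWriter`. It is an illustration under those assumptions, not SWIG's actual output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Abstract interface, as in the question (trimmed to three methods).
class JsonWriter
{
public:
    virtual ~JsonWriter() {}
    virtual void WriteStartObject() = 0;
    virtual void WriteStartObject(const char* propertyName) = 0;
    virtual void WriteEndObject() = 0;
};

// Hypothetical callback table: in the real binding these would be
// function pointers produced from C# delegates.
struct Callbacks
{
    void (*startObject)(void* ctx);
    void (*startNamedObject)(void* ctx, const char* name);
    void (*endObject)(void* ctx);
    void* ctx; // opaque state handed back on every call
};

// Hypothetical stand-in for the director class: each override forwards
// to the matching callback.
class JsonWriterDirector : public JsonWriter
{
public:
    explicit JsonWriterDirector(Callbacks cb) : cb_(cb)
    {
        assert(cb.startObject && cb.startNamedObject && cb.endObject);
    }
    void WriteStartObject() override { cb_.startObject(cb_.ctx); }
    void WriteStartObject(const char* propertyName) override { cb_.startNamedObject(cb_.ctx, propertyName); }
    void WriteEndObject() override { cb_.endObject(cb_.ctx); }
private:
    Callbacks cb_;
};

// Native code only ever sees the abstract interface.
void WriteSample(JsonWriter& writer)
{
    writer.WriteStartObject();
    writer.WriteStartObject("child");
    writer.WriteEndObject();
    writer.WriteEndObject();
}

// Demo callbacks that append to a std::vector<std::string> passed as ctx.
static void LogStart(void* ctx) { static_cast<std::vector<std::string>*>(ctx)->push_back("{"); }
static void LogNamed(void* ctx, const char* name) { static_cast<std::vector<std::string>*>(ctx)->push_back(std::string(name) + ":{"); }
static void LogEnd(void* ctx) { static_cast<std::vector<std::string>*>(ctx)->push_back("}"); }

std::vector<std::string> RecordSample()
{
    std::vector<std::string> log;
    JsonWriterDirector writer(Callbacks{LogStart, LogNamed, LogEnd, &log});
    WriteSample(writer);
    return log; // {"{", "child:{", "}", "}"}
}
```

Because the writer is a class, the proxy can hold it in a field, so no per-call context is needed on the C# side; the `ctx` pointer here only carries the demo's log.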
This works beautifully for the writer, and the performance is surprisingly good given all the P/Invoke calls going on under the covers. The problem is with System.Text.Json.Utf8JsonReader. In this case Microsoft have chosen to implement Utf8JsonReader as a "ref struct" rather than a class. This means that I can't simply store a reference to the original Utf8JsonReader in my proxy class (a ref struct cannot be a field of a class). I can get around this by passing the Utf8JsonReader to the C++ deserialization method along with the proxy object. The C++ deserialization method looks like:
typedef void* RefJsonReader;
void Deserialize(cylite::io::JsonReader* reader, RefJsonReader refReader)
{
    reader->SetRefReader(refReader);
    // throw by value (std::runtime_error, from <stdexcept>) rather than a heap-allocated pointer
    if (reader->TokenType() != JsonTokenType::StartObject) throw std::runtime_error("expected StartObject");
    while (reader->Read())
    {
        // .. etc
    }
}
And the associated C# P/Invoke definition:
[DllImport("IO", EntryPoint="CSharp_fIO_Exam_Deserialize___")]
public static extern void Deserialize(global::System.Runtime.InteropServices.HandleRef jarg1, global::System.Runtime.InteropServices.HandleRef jarg2, ref System.Text.Json.Utf8JsonReader jarg3);
The C++ deserialization code calls the SetRefReader method on the C++ JsonReader to save this reference, which is then passed back in each call to the C# delegates, as shown below:
typedef void* RefJsonReader;
class JsonReader
{
private:
    RefJsonReader _refReader = nullptr;
public:
    virtual ~JsonReader() {}
    inline void SetRefReader(RefJsonReader reader)
    {
        _refReader = reader;
    }
    inline bool Read()
    {
        return Read(_refReader);
    }
    inline void Skip()
    {
        Skip(_refReader);
    }
    inline unsigned char TokenType()
    {
        return TokenType(_refReader);
    }
    // .. etc
protected:
    virtual bool Read(RefJsonReader reader) = 0;
    virtual void Skip(RefJsonReader reader) = 0;
    virtual unsigned char TokenType(RefJsonReader reader) = 0;
    // .. etc
};
And the corresponding C# reader proxy class (which takes the ref Utf8JsonReader parameter in each of its methods):
public class JsonReaderProxy : JsonReader
{
    protected override bool Read(ref Utf8JsonReader reader) => reader.Read();
    protected override void Skip(ref Utf8JsonReader reader) => reader.Skip();
    protected override byte TokenType(ref Utf8JsonReader reader) => (byte)reader.TokenType;
    // .. etc
}
This code actually works, but it is somewhat messy to have to pass both the proxy object and the ref Utf8JsonReader to every deserialization method (and it is not symmetric with the writing side). What I would like to do is effectively call SetRefReader from the constructor of the proxy class, so that I would not have to pass the reader as an additional parameter to each deserialization method. This, however, does not work (it crashes with memory errors). At first I thought this was simply because what is passed by ref is actually the address of a local stack variable that in turn holds the address of the actual struct. So I changed SetRefReader to take the address of the parameter and save the dereferenced value in the _refReader variable, then pass the address of this variable back in the calls to the proxy, as shown below:
inline void SetRefReader(RefJsonReader* reader)
{
    _refReader = *reader;
}
inline bool Read()
{
    return Read(&_refReader);
}
This looked promising: while I could see that the value of the reader parameter differed depending on where in the call stack it was called from, the dereferenced *reader value was always the same address. However, this also crashes when calling back to the proxy via the delegate method that takes the ref Utf8JsonReader parameter.
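One reading of this observation, consistent with the later edits: if each call hands native code the address of a fresh copy of the struct, then `*reader` is not following a pointer-to-pointer at all; it reads the first pointer-sized field of that copy, which is identical in every copy. The sketch below models this with a hypothetical `ReaderModel` whose first member is a pointer. That layout is an assumption for illustration only; the real Utf8JsonReader layout is an internal detail.

```cpp
#include <cassert>

typedef void* RefJsonReader;

// Hypothetical reader struct whose first member is a pointer (e.g. into
// the JSON input). Assumed layout, for illustration only.
struct ReaderModel
{
    void* buffer; // first member
    int position;
};

// Mirrors SetRefReader(RefJsonReader* reader): if the argument is really
// the address of a struct, *reader yields the struct's first member.
RefJsonReader FirstField(RefJsonReader* reader)
{
    assert(reader != nullptr);
    return *reader;
}

struct Observation
{
    bool addressesDiffer;  // like the differing 'reader' parameter values
    bool firstFieldsMatch; // like the constant '*reader' value
};

Observation ObserveTwoCopies()
{
    static unsigned char json[] = "{}";
    ReaderModel original{json, 0};

    // Two copies, as a marshaller might create for two separate calls.
    ReaderModel copyA = original;
    ReaderModel copyB = original;

    // ReaderModel is standard-layout, so its address is also the address
    // of its first member and may be reinterpreted as such.
    RefJsonReader* addrA = reinterpret_cast<RefJsonReader*>(&copyA);
    RefJsonReader* addrB = reinterpret_cast<RefJsonReader*>(&copyB);

    return Observation{addrA != addrB, FirstField(addrA) == FirstField(addrB)};
}
```

If that reading is right, `Read(&_refReader)` then hands the callback the address of a lone pointer, which the C# side treats as a whole Utf8JsonReader; that would be consistent with the crash.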
There appears to be something I don't fully understand about how ref struct parameters are passed by C#. If anyone has some insight it would be much appreciated. I apologize for the length of the question, but I thought it was worth providing some context for what I'm trying to achieve.
EDIT: Some further information. To try to understand this better I created my own ref struct that looks like:
public ref struct TestStruct
{
    public int value;
    public Span<byte> data;
}
I changed just the C# import definitions to pass this instead of the Utf8JsonReader ref struct. This works without any issues at all, even when setting the reference from the constructor. By populating the data values in the struct it is clear that what is being passed is actually just a pointer to the data (not the address of a pointer).
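For a blittable struct passed by `ref`, the native side effectively receives the address of the caller's own struct, so an address saved in one call still refers to live data in a later call, provided the caller's variable is still in scope. A minimal C++ model, where the hypothetical `TestStructModel` and `g_savedRef` stand in for the managed struct and the stored `_refReader`:

```cpp
#include <cassert>

// Hypothetical native view of a blittable struct (same fields, same layout).
struct TestStructModel
{
    int value;
};

// Plays the role of _refReader: an address saved by one call, used later.
static TestStructModel* g_savedRef = nullptr;

// Like calling SetRefReader from the proxy constructor.
void SaveRef(TestStructModel* ref)
{
    assert(ref != nullptr);
    g_savedRef = ref;
}

// A later call reads through the saved address.
int ReadSavedValue()
{
    return g_savedRef->value;
}

int DirectPointerDemo()
{
    TestStructModel local{1};
    SaveRef(&local);             // direct (pinned) case: the caller's own address
    local.value = 42;            // the caller updates its struct after SaveRef returned
    int seen = ReadSavedValue(); // still valid because 'local' is alive
    g_savedRef = nullptr;        // do not keep the address past the caller's scope
    return seen;                 // 42
}
```

The saved address is only safe while the caller's struct is alive and not moved; in this model that holds because `local` outlives both calls.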
Strangely, if I then change the struct definition to embed a Utf8JsonReader, like:
public ref struct TestStruct
{
    public int value;
    Utf8JsonReader reader;
    public Span<byte> data;
}
It fails again if I set the value from the constructor. If I set the reference from inside the Serialize function it works (just like before), although, interestingly, it is MUCH slower (about 20x) when calling the callback repeatedly than when the Utf8JsonReader is not part of the structure. This is all very strange. Why would including an additional struct within a struct that is passed by reference change the behavior at all, and why would it make it slower?
EDIT 2: OK, so I think I know what might be happening. I'm guessing that, for some reason, when the Utf8JsonReader (or the struct containing it) is passed via P/Invoke, the runtime decides to marshal the argument by copying the struct into a buffer on the way into the call, passing the method the buffer's address, and copying back from that buffer on the way out. Since this buffer is out of scope by the time the second function is called, the saved address is no longer valid. The same thing presumably happens on the delegate calls, which is why they are so slow. If this is the case, the question remains why it doesn't do this for my own ref struct, and whether there is any way to change it.
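The copy-in/copy-out behaviour guessed at here can be modelled in C++. In this sketch the hypothetical `MarshalByCopy` plays the role of the marshalling stub: it copies the argument into a temporary, passes the temporary's address to the native function, and copies the result back. The native function therefore never sees the caller's address, and the address it does see stops being valid when the stub returns. The model records that address only as an integer, so nothing dereferences a dangling pointer.

```cpp
#include <cassert>
#include <cstdint>

struct ReaderState
{
    int position;
};

// Address the "native" function saw, kept as an integer purely for
// comparison; it is never turned back into a pointer.
static std::uintptr_t g_seenAddress = 0;

// Native code: records the address it was given and advances the reader.
void NativeAdvance(ReaderState* state)
{
    assert(state != nullptr);
    g_seenAddress = reinterpret_cast<std::uintptr_t>(state);
    state->position += 1;
}

// Hypothetical model of a marshalling stub for a non-blittable argument.
void MarshalByCopy(ReaderState& managed, void (*native)(ReaderState*))
{
    ReaderState temp = managed; // copy in
    native(&temp);              // native sees the temporary's address
    managed = temp;             // copy out
}                               // temp is destroyed here

struct CopyResult
{
    bool sawCallersAddress;
    int positionAfterCall;
};

CopyResult CopyMarshalDemo()
{
    ReaderState managed{0};
    MarshalByCopy(managed, NativeAdvance);
    return CopyResult{
        g_seenAddress == reinterpret_cast<std::uintptr_t>(&managed),
        managed.position};
}
```

The change made through the temporary still reaches the caller via the copy-out, which is why the pass-per-call version works, while an address saved for later (as from the constructor) would point at storage that no longer exists. Each call also pays for two struct copies.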
EDIT 3: OK, I have now found out why the Utf8JsonReader behaves differently. It is a non-blittable type, and according to https://docs.microsoft.com/en-us/dotnet/framework/interop/copying-and-pinning this means it will be copied and marshalled, whereas my simple structure was blittable and so was just pinned. I guess what I really need is some way to pass a managed ref through C++ in an opaque way, since C++ does not need to know what is in the type. Ideally I would just convert the ref to a pointer (using some unsafe code). However, it does not appear possible to create a pointer to a ref struct. Maybe that's another question.
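On the native side, passing something "in an opaque way" corresponds to a common C callback pattern: native code stores a `void*` it never dereferences and hands it back unchanged to each callback, and only the callback casts it to the real type. The sketch below assumes a hypothetical `TokenSource` standing in for the caller-side reader. It only helps if the address passed in is the caller's own object, used only while that object is alive (as in the pinned, blittable case); copy-marshalling of a non-blittable type is exactly what breaks that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Callback signature: the context is opaque to native code.
typedef bool (*ReadCallback)(void* ctx);

// Native reader: stores the context but never looks inside it.
class OpaqueReader
{
public:
    OpaqueReader(ReadCallback read, void* ctx) : read_(read), ctx_(ctx)
    {
        assert(read_ != nullptr);
    }
    bool Read() { return read_(ctx_); }
private:
    ReadCallback read_;
    void* ctx_;
};

// Hypothetical caller-side object (stands in for the managed reader).
struct TokenSource
{
    std::vector<int> tokens;
    std::size_t next = 0;
};

// Only the callback knows the real type behind the context.
bool ReadToken(void* ctx)
{
    TokenSource* source = static_cast<TokenSource*>(ctx);
    if (source->next >= source->tokens.size()) return false;
    ++source->next;
    return true;
}

// Native "deserializer" that just drains the reader.
int CountTokens(OpaqueReader& reader)
{
    int count = 0;
    while (reader.Read()) ++count;
    return count;
}

int OpaqueContextDemo()
{
    TokenSource source;
    source.tokens = {1, 2, 3};
    OpaqueReader reader(ReadToken, &source); // context set once, at construction
    return CountTokens(reader);               // 3
}
```

Setting the context once at construction gives the reader the same shape as the writer, but it is only sound when the stored address is stable for the reader's whole lifetime.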
Source: https://stackoverflow.com/questions/63760330/how-to-to-pass-a-c-sharp-delegate-callback-with-a-ref-struct-parameter-to-c