I have a huge file that I must parse line by line. Speed is of the essence.
Example of a line:
Token-1 Here-is-the-Next-Token La
The fastest way to write the code would probably be to create a TStringList and assign each line in your text file to the CommaText property. By default, white space is a delimiter, so you will get one StringList item per token.
MyStringList.CommaText := s;
for i := 0 to MyStringList.Count - 1 do
begin
// process each token here
end;
You'll probably get better performance by parsing each line yourself, though.
If speed is of the essence, custom code is the answer. Check out the Windows API that will map your file into memory. You can then just use a pointer to the next character to do your tokens, marching through as required.
This is my code for doing a mapping:
procedure TMyReader.InitialiseMapping(szFilename : string);
var
// nError : DWORD;
bGood : boolean;
begin
bGood := False;
m_hFile := CreateFile(PChar(szFilename), GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
if m_hFile <> INVALID_HANDLE_VALUE then
begin
m_hMap := CreateFileMapping(m_hFile, nil, PAGE_READONLY, 0, 0, nil);
if m_hMap <> 0 then
begin
m_pMemory := MapViewOfFile(m_hMap, FILE_MAP_READ, 0, 0, 0);
if m_pMemory <> nil then
begin
htlArray := Pointer(Integer(m_pMemory) + m_dwDataPosition);
bGood := True;
end
else
begin
// nError := GetLastError;
end;
end;
end;
if not bGood then
raise Exception.Create('Unable to map token file into memory');
end;
Rolling your own is the fastest way for sure. For more on this topic, you could see Synedit's source code which contains lexers (called highlighters in the project's context) for about any language on the market. I suggest you take one of those lexers as a base and modify for your own usage.