How to read a text file reversely with iterator in C#

甜味超标 2020-11-22 04:05

I need to process a large file, around 400K lines and 200 MB, but sometimes I have to process it from the bottom up. How can I use an iterator (yield return) here? Basically I don't l

11 answers
  • 2020-11-22 04:25

    A very fast solution for huge files: use the PowerShell Get-Content cmdlet with the Tail option. Calling PowerShell adds a little overhead, but for huge files it is worth it.

    using System.Linq;                    // FirstOrDefault
    using System.Management.Automation;
    
    const string FILE_PATH = @"d:\temp\b_media_27_34_0000_25393.txt";
    string lastLine;
    
    // The using block disposes the PowerShell instance even if Invoke throws
    using (var ps = PowerShell.Create())
    {
        ps.AddCommand("Get-Content")
            .AddParameter("Path", FILE_PATH)
            .AddParameter("Tail", 1);     // only the last line
        var psResults = ps.Invoke();
        lastLine = psResults.FirstOrDefault()?.BaseObject.ToString();
    }
    

    Required PowerShell reference:

    C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0\System.Management.Automation.dll

  • 2020-11-22 04:28

    You can read the file one character at a time backwards and cache all characters until you reach a carriage return and/or line feed.

    You then reverse the collected string and yield it as a line.
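
    A minimal sketch of that idea, assuming a single-byte encoding such as ASCII (so one byte is one character) and \n or \r\n line endings. The names ReverseLineSketch and ReadLinesReverse are made up for illustration, and seeking once per byte is simple rather than fast:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReverseLineSketch
{
    // Hypothetical helper: yields the lines of a file from last to first.
    // Assumption: single-byte encoding (ASCII), so each byte is one character.
    public static IEnumerable<string> ReadLinesReverse(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var collected = new List<byte>();   // bytes of the current line, last byte first
            long position = stream.Length;
            bool atEnd = true;                  // true until the first byte has been examined

            while (position > 0)
            {
                // Step one byte back and read it
                stream.Position = --position;
                int b = stream.ReadByte();

                if (b == '\n')
                {
                    // A line feed at the very end of the file only terminates the
                    // last line; any other line feed ends the line before it.
                    if (!atEnd) yield return Decode(collected);
                    collected.Clear();
                }
                else
                {
                    collected.Add((byte)b);
                }
                atEnd = false;
            }

            // Whatever is left is the first line of the file (nothing for an empty file)
            if (stream.Length > 0) yield return Decode(collected);
        }
    }

    // Reverses the collected bytes back into reading order and drops the
    // carriage return of a \r\n terminator, if there is one.
    private static string Decode(List<byte> reversed)
    {
        var bytes = reversed.ToArray();
        Array.Reverse(bytes);
        int length = bytes.Length;
        if (length > 0 && bytes[length - 1] == '\r') length--;
        return Encoding.ASCII.GetString(bytes, 0, length);
    }
}
```

    It can be consumed lazily with foreach (var line in ReverseLineSketch.ReadLinesReverse(path)); breaking out of the loop early stops the reading.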

  • 2020-11-22 04:28

    There are good answers here already, and here's another LINQ-compatible class you can use which focuses on performance and support for large files. It assumes a "\r\n" line terminator.

    Usage:

    var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
    // The class implements IEnumerable<string> (it has no EndOfStream member), so enumerate it directly
    foreach (var line in reader)
        Console.WriteLine(line);
    

    ReverseTextReader Class:

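    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    
    /// <summary>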
    /// Reads a text file backwards, line-by-line.
    /// </summary>
    /// <remarks>This class uses file seeking to read a text file of any size in reverse order.  This
    /// is useful for needs such as reading a log file newest-entries first.</remarks>
    public sealed class ReverseTextReader : IEnumerable<string>
    {
        private const int BufferSize = 16384;   // The number of bytes read from the underlying stream.
        private readonly Stream _stream;        // Stores the stream feeding data into this reader
        private readonly Encoding _encoding;    // Stores the encoding used to process the file
        private byte[] _leftoverBuffer;         // Stores the leftover partial line after processing a buffer
        private readonly Queue<string> _lines;  // Stores the lines parsed from the buffer
    
        #region Constructors
    
        /// <summary>
        /// Creates a reader for the specified file.
        /// </summary>
        /// <param name="filePath"></param>
        public ReverseTextReader(string filePath)
            : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
        { }
    
        /// <summary>
        /// Creates a reader using the specified stream.
        /// </summary>
        /// <param name="stream"></param>
        public ReverseTextReader(Stream stream)
            : this(stream, Encoding.Default)
        { }
    
        /// <summary>
        /// Creates a reader using the specified path and encoding.
        /// </summary>
        /// <param name="filePath"></param>
        /// <param name="encoding"></param>
        public ReverseTextReader(string filePath, Encoding encoding)
            : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
        { }
    
        /// <summary>
        /// Creates a reader using the specified stream and encoding.
        /// </summary>
        /// <param name="stream"></param>
        /// <param name="encoding"></param>
        public ReverseTextReader(Stream stream, Encoding encoding)
        {          
            _stream = stream;
            _encoding = encoding;
            _lines = new Queue<string>(128);            
            // The stream needs to support seeking for this to work
            if(!_stream.CanSeek)
                throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
            if (!_stream.CanRead)
                throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");
            // Set the current position to the end of the file
            _stream.Position = _stream.Length;
            _leftoverBuffer = new byte[0];
        }
    
        #endregion
    
        #region Overrides
    
        /// <summary>
        /// Reads the previous line from the underlying stream (the next one when moving backwards), or returns null at the beginning.
        /// </summary>
        /// <returns></returns>
        public string ReadLine()
        {
            // Are there lines left to read? If so, return the next one
            if (_lines.Count != 0) return _lines.Dequeue();
            // Are we at the beginning of the stream? If so, we're done
            if (_stream.Position == 0) return null;
    
            #region Read and Process the Next Chunk
    
            // Remember the current position
            var currentPosition = _stream.Position;
            var newPosition = currentPosition - BufferSize;
            // Are we before the beginning of the stream?
            if (newPosition < 0) newPosition = 0;
            // Calculate the buffer size to read
            var count = (int)(currentPosition - newPosition);
            // Set the new position
            _stream.Position = newPosition;
            // Make a new buffer but append the previous leftovers
            var buffer = new byte[count + _leftoverBuffer.Length];
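            // Read the chunk; Stream.Read may return fewer bytes than requested, so loop until it is full
            var read = 0;
            while (read < count)
            {
                var n = _stream.Read(buffer, read, count - read);
                if (n == 0) throw new EndOfStreamException("Unexpected end of stream.");
                read += n;
            }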
            // Move the position of the stream back
            _stream.Position = newPosition;
            // And copy in the leftovers from the last buffer
            if (_leftoverBuffer.Length != 0)
                Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);
            // Look for CrLf delimiters
            var end = buffer.Length - 1;
            var start = buffer.Length - 2;
            // Search backwards for a line feed
            while (start >= 0)
            {
                // Is it a line feed?
                if (buffer[start] == 10)
                {
                    // Yes.  Extract a line and queue it (but exclude the \r\n)
                    _lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));
                    // And reset the end
                    end = start;
                }
                // Move to the previous character
                start--;
            }
            // What's left over is a portion of a line. Save it for later.
            _leftoverBuffer = new byte[end + 1];
            Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);
            // Are we at the beginning of the stream?
            if (_stream.Position == 0)
                // Yes.  Add the last line.
                _lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));
    
            #endregion
    
            // If this chunk held no complete line (a line longer than the buffer), keep reading backwards
            return _lines.Count != 0 ? _lines.Dequeue() : ReadLine();
        }
    
        #endregion
    
        #region IEnumerable<string> Interface
    
        public IEnumerator<string> GetEnumerator()
        {
            string line;
            // So long as the next line isn't null...
            while ((line = ReadLine()) != null)
                // Read and return it.
                yield return line;
        }
    
        IEnumerator IEnumerable.GetEnumerator()
        {
            // Delegate to the generic enumerator so non-generic callers work too
            return GetEnumerator();
        }
    
        #endregion
    }
    
  • 2020-11-22 04:29

    Attention: this approach doesn't work (explained in EDIT)

    You could use File.ReadLines to get a lazy line iterator:

    foreach (var line in File.ReadLines(@"C:\temp\ReverseRead.txt").Reverse())
    {
        if (noNeedToReadFurther)
            break;
    
        // process line here
        Console.WriteLine(line);
    }
    

    EDIT:

    After reading applejacks01's comment, I ran some tests, and it does look like .Reverse() actually loads the whole file.

    I used File.ReadLines() to print the first line of a 40 MB file, and the memory usage of the console app was 5 MB. I then used File.ReadLines().Reverse() to print the last line of the same file, and the memory usage was 95 MB.

    Conclusion

    Enumerable.Reverse() has to buffer the whole sequence before it can return the first element, so it is not a good choice for reading the bottom of a big file.
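
    The effect can be reproduced without a file. The sketch below uses a hypothetical counting sequence (CountingLines) as a stand-in for File.ReadLines and reports how many items the source had to produce before the loop body saw its first item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseBufferingDemo
{
    private static int _pulled;   // how many items the source has produced so far

    // Hypothetical stand-in for File.ReadLines: a lazy sequence that counts what it hands out.
    private static IEnumerable<string> CountingLines(int total)
    {
        for (int i = 0; i < total; i++)
        {
            _pulled++;
            yield return "line " + i;
        }
    }

    // Returns how many source items were produced by the time the loop body first ran.
    public static int PulledBeforeFirstItem(bool reversed, int total)
    {
        _pulled = 0;
        IEnumerable<string> lines = CountingLines(total);
        if (reversed) lines = lines.Reverse();

        foreach (var line in lines)
            break;   // stop as soon as the first item arrives

        return _pulled;
    }
}
```

    PulledBeforeFirstItem(false, 1000) returns 1, while PulledBeforeFirstItem(true, 1000) returns 1000: the reversed sequence consumes its whole source before handing out anything.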

  • 2020-11-22 04:30

    To create a file iterator you can do this:

    EDIT:

    This is my fixed version of a reverse file reader for fixed-width (one byte per character) encodings:

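    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    
    // Yields the lines of the file from last to first.
    // Assumes one byte per character (e.g. ASCII), since each byte is cast straight to a char.
    public static IEnumerable<string> readFile()
    {
        using (FileStream reader = new FileStream(@"c:\test.txt", FileMode.Open, FileAccess.Read))
        {
            int i = 0;
            StringBuilder lineBuffer = new StringBuilder();   // current line, last character first
            int byteRead;
            while (-i < reader.Length)
            {
                // Step one byte further back from the end of the file
                reader.Seek(--i, SeekOrigin.End);
                byteRead = reader.ReadByte();
                if (byteRead == 10 && lineBuffer.Length > 0)
                {
                    // Reached the line feed that ends the previous line: emit what was collected,
                    // without its own line terminator
                    yield return Reverse(lineBuffer.ToString()).TrimEnd('\r', '\n');
                    lineBuffer.Clear();
                }
                lineBuffer.Append((char)byteRead);
            }
            // Whatever remains is the first line of the file
            yield return Reverse(lineBuffer.ToString()).TrimEnd('\r', '\n');
        }
    }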
    
    public static string Reverse(string str)
    {
        char[] arr = new char[str.Length];
        for (int i = 0; i < str.Length; i++)
            arr[i] = str[str.Length - 1 - i];
        return new string(arr);
    }
    
  • 2020-11-22 04:33

    I know this post is very old, but I couldn't work out how to use the most-voted solution, and I finally found this: the best answer I have come across with a low memory cost, in both VB and C#.

    http://www.blakepell.com/2010-11-29-backward-file-reader-vb-csharp-source

    I hope this helps others, because it took me hours to finally find this post!

    [Edit]

    Here is the C# code:

    //*********************************************************************************************************************************
    //
    //             Class:  BackwardReader
    //      Initial Date:  11/29/2010
    //     Last Modified:  11/29/2010
    //     Programmer(s):  Original C# Source - the_real_herminator
    //                     http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/9acdde1a-03cd-4018-9f87-6e201d8f5d09
    //                     VB Conversion - Blake Pell
    //
    //*********************************************************************************************************************************
    
    using System.Text;
    using System.IO;
    public class BackwardReader
    {
        private string path;
        private FileStream fs = null;
        public BackwardReader(string path)
        {
            this.path = path;
            fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
            fs.Seek(0, SeekOrigin.End);
        }
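        // Returns the line that ends at the current position and moves the position
        // to the start of that line. Only "\r\n" is treated as a line terminator.
        public string Readline()
        {
            byte[] line;
            byte[] text = new byte[1];
            long position = fs.Position;
            int count;
            // Do we have a trailing \r\n? (Checking Position avoids seeking before the start.)
            if (fs.Position > 1)
            {
                byte[] vagnretur = new byte[2];
                fs.Seek(-2, SeekOrigin.Current);
                fs.Read(vagnretur, 0, 2);
                if (ASCIIEncoding.ASCII.GetString(vagnretur).Equals("\r\n"))
                {
                    // Step back over the terminator so it is not part of the line
                    fs.Seek(-2, SeekOrigin.Current);
                    position = fs.Position;
                }
            }
            while (fs.Position > 0)
            {
                // Read one character (nothing is read when we are at the end of the file)
                int n = fs.Read(text, 0, 1);
                string asciiText = ASCIIEncoding.ASCII.GetString(text);
                // Move back to the character before the one just examined
                fs.Seek(-1 - n, SeekOrigin.Current);
                if (asciiText.Equals("\n"))
                {
                    fs.Read(text, 0, 1);
                    asciiText = ASCIIEncoding.ASCII.GetString(text);
                    if (asciiText.Equals("\r"))
                    {
                        // Found \r\n: the line starts right after it
                        fs.Seek(1, SeekOrigin.Current);
                        break;
                    }
                    // A lone \n is not a terminator here; undo the extra read and keep scanning
                    fs.Seek(-1, SeekOrigin.Current);
                }
            }
            count = (int)(position - fs.Position);
            line = new byte[count];
            fs.Read(line, 0, count);
            // Leave the position at the start of the line, ready for the next call
            fs.Seek(-count, SeekOrigin.Current);
            return ASCIIEncoding.ASCII.GetString(line);
        }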
        public bool SOF
        {
            get
            {
                return fs.Position == 0;
            }
        }
        public void Close()
        {
            fs.Close();
        }
    }
    