Easiest scripting method to merge two text files - Ruby, Python, JavaScript, Java?

一个人的身影 2021-01-14 21:14

I have two text files, one containing HTML and the other containing URL slugs:

FILE 1 (HTML):

    6 Answers
    • 2021-01-14 21:54

      You need a zip function, which is available in most languages. Its purpose is parallel processing of two or more arrays.
      In Ruby it will be something like this:

      f1 = File.readlines('file1.txt')
      f2 = File.readlines('file2.txt')
      
      File.open('file3.txt','w') do |output_file|
      
          f1.zip(f2) do |a,b|
              # chomp drops the slug's trailing newline so it doesn't split the output line
              output_file.puts a.sub('/article/','/article/'+b.chomp)
          end
      
      end
      

      To zip more than two arrays, you can do f1.zip(f2,f3,...) do |a,b,c,...|
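
For comparison, here is a minimal Python sketch of the same zip idea with three inputs, mirroring Ruby's f1.zip(f2,f3). The helper name `merge_rows`, the third `dates` column, and the sample lines are hypothetical and only stand in for the contents of real files. Note that Python's `zip` stops at the shortest input, whereas Ruby's pads missing items with nil.

```python
def merge_rows(*columns):
    """Pair up the i-th item of every column, like Ruby's f1.zip(f2, f3)."""
    return [tuple(item.rstrip("\n") for item in row) for row in zip(*columns)]


# Hypothetical sample data standing in for the lines of three files.
html = ['<a href="/article/">One</a>\n', '<a href="/article/">Two</a>\n']
slugs = ["one-first\n", "two-second\n"]
dates = ["2021-01-14\n", "2021-01-15\n"]

rows = merge_rows(html, slugs, dates)
for line, slug, date in rows:
    # Insert the slug after the first '/article/' on the matching line.
    print(line.replace("/article/", "/article/" + slug, 1), date)
```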

    • 2021-01-14 22:01

      Python is a great language. Just have a look at these six lines of Python; they can merge text files of any size. I have just merged two text files of 10 GB each.

       # Copy both inputs into the output in binary mode, one line at a time
       with open("E:/temp/3.txt", "wb") as o:
           for path in ("E:/temp/1.txt", "E:/temp/2.txt"):
               with open(path, "rb") as f:
                   for line in f:
                       o.write(line)
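
A possible variation on the concatenation above, sketched with the standard library's `shutil.copyfileobj` instead of a line loop: it copies fixed-size chunks, so memory use stays bounded even when a file contains very long lines. The function name `concat_files` and the 1 MiB chunk size are assumptions made for illustration, not part of the answer.

```python
import shutil


def concat_files(sources, dest, chunk_size=1024 * 1024):
    """Append each source file to dest, copying chunk_size bytes at a time."""
    with open(dest, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)
```

It would be called as `concat_files(["E:/temp/1.txt", "E:/temp/2.txt"], "E:/temp/3.txt")`. Like the loop above, this appends one file after the other; it does not pair lines up.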
      
    • 2021-01-14 22:08

      Ruby one liner:

      File.open("joined.txt","w") { |f| f.puts ['file1.txt', 'file2.txt'].map{ |s| IO.read(s) }}
      
    • 2021-01-14 22:11

      This will be easy in any language. Here it is in pseudo-Python; I've omitted the lxml bits because I don't have access to them and I can't quite remember the syntax. They're not difficult, though.

      import lxml.etree

      with open(...) as htmls, open(...) as slugs, open(...) as output:
          for html, slug in zip(htmls, slugs):
              root = lxml.etree.fromstring(html)
              # do some fiddling with lxml to get the name

              slug = slug.split("-")[len(name.split()):]
              # add in the extra child in lxml

              # tostring() is a module-level function in lxml, not a method on the element
              output.write(lxml.etree.tostring(root, encoding="unicode") + "\n")
      

      Interesting features:

      • This doesn't read in the entire file at once; it does it chunk by chunk (well, line-by-line but Python will buffer it). Useful if the files are huge, but probably irrelevant.

      • lxml may be overkill, depending on how rigid the format of the HTML strings is. If they're guaranteed to share the same format and to be well-formed, it might be easier for you to use simple string operations. On the other hand, lxml is pretty fast and offers a lot more flexibility.
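
As a rough illustration of the "simple string operations" route, the sketch below streams both inputs line by line and inserts each slug after a fixed key, with no lxml involved. It assumes every HTML line contains the key '/article/' (borrowed from the other answers); the function name `merge_streams` and the in-memory sample data are hypothetical stand-ins for real open files.

```python
import io


def merge_streams(htmls, slugs, output, key="/article/"):
    """Write each HTML line to output with its slug inserted after key."""
    count = 0
    for html, slug in zip(htmls, slugs):
        # strip() removes the slug's trailing newline before it is spliced in.
        output.write(html.replace(key, key + slug.strip(), 1))
        count += 1
    return count


# In-memory stand-ins for the three open files (hypothetical content).
htmls = io.StringIO('<a href="/article/">One</a>\n<a href="/article/">Two</a>\n')
slugs = io.StringIO("one-first\ntwo-second\n")
output = io.StringIO()
merged = merge_streams(htmls, slugs, output)
```

Because `zip` pulls one line from each stream at a time, the same function should work unchanged on file objects returned by `open()`.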

    • 2021-01-14 22:12

      PHP is the easiest!

      $firstFile = file('file1.txt');
      $secondFile = file('file2.txt');
      
      $findKey='/article/';
      $output='';
      
      // stop early if the two files have different numbers of lines
      if (count($firstFile) != count($secondFile)) {
          die("record counts don't match");
      }
      
      for($i=0;$i<count($firstFile);$i++)
      {
          // insert the trimmed slug right after the key on the matching line
          $output.=str_replace($findKey,$findKey.trim($secondFile[$i]),$firstFile[$i]);
      }
      
      file_put_contents('output.txt',$output);
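
The record-count check in the PHP answer needs both files fully loaded. One way to get a similar safeguard while streaming is sketched below in Python using `itertools.zip_longest`; the helper name `zip_checked` and the sentinel are assumptions for illustration. Unlike the PHP check, a mismatch here is only detected when the shorter input runs out, so earlier pairs will already have been processed.

```python
from itertools import zip_longest

# Unique marker that cannot collide with any real line.
_MISSING = object()


def zip_checked(first, second):
    """Yield pairs like zip(), but raise ValueError if one input ends early."""
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING:
            raise ValueError("record counts don't match")
        yield a, b
```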
      
    • 2021-01-14 22:14

      The easiest way to do this is to use the language of the listed ones that you are most familiar with. Even if it doesn't produce the neatest solution, you'll get the job done with the least (mental) effort.

      If you know none of them, then Perl is a good option because this is the kind of thing it was designed to do. (I'm assuming that you understand regular expressions ...) And by the look of some of the other answers, Python is a good option too.
