Python library to modify MP3 audio without transcoding

我寻月下人不归 2021-02-02 14:30

I am looking for some general advice about the mp3 format before I start a small project to make sure I am not on a wild-goose chase.

My understanding of the internals o

5 answers
  • 2021-02-02 14:42

    As for removing or extracting mp3 segments from an mp3 file while staying in the MP3 domain (that is, without conversion to PCM format and back), there is also the open source package PyMp3Cut.

    As for splicing MP3 files together (e.g. adding 'Credits' to the end or beginning of an MP3 file), I've found you can simply concatenate the MP3 files, provided that the files have the same sampling rate (e.g. 44.1 kHz) and the same number of channels (e.g. both are stereo or both are mono).
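    The concatenation described above can be sketched as follows. This is a minimal illustration, not the method used by PyMp3Cut: the helper names (`skip_id3v2`, `stream_params`, `concat_mp3`) are hypothetical, and the sketch assumes that an ID3v1 tag at the end of the first file and an ID3v2 tag at the start of the second should be dropped so they do not end up in the middle of the audio.

```python
# Sample rates indexed by the 2-bit MPEG version field of a frame header.
SAMPLE_RATES = {3: (44100, 48000, 32000),   # MPEG-1
                2: (22050, 24000, 16000),   # MPEG-2
                0: (11025, 12000, 8000)}    # MPEG-2.5


def skip_id3v2(data):
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)      # synchsafe integer, 7 bits per byte
        footer = 10 if data[5] & 0x10 else 0     # optional ID3v2.4 footer
        return 10 + size + footer
    return 0


def stream_params(data):
    """Return (sample_rate, channels) from the first plausible frame header."""
    i = skip_id3v2(data)
    while i + 4 <= len(data):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:   # 11-bit sync word
            version = (data[i + 1] >> 3) & 0x3
            layer = (data[i + 1] >> 1) & 0x3
            bitrate_idx = data[i + 2] >> 4
            rate_idx = (data[i + 2] >> 2) & 0x3
            # Reject reserved field values, which indicate a false sync.
            if version != 1 and layer != 0 and rate_idx != 3 and bitrate_idx != 15:
                channels = 1 if (data[i + 3] >> 6) == 3 else 2
                return SAMPLE_RATES[version][rate_idx], channels
        i += 1
    raise ValueError("no MP3 frame header found")


def concat_mp3(a, b):
    """Concatenate two MP3 byte strings that share sample rate and channel count."""
    if stream_params(a) != stream_params(b):
        raise ValueError("sample rate or channel count differs")
    if len(a) >= 128 and a[-128:-125] == b"TAG":   # trailing ID3v1 tag
        a = a[:-128]
    return a + b[skip_id3v2(b):]
```

    Usage would be along the lines of `open("out.mp3", "wb").write(concat_mp3(open("a.mp3", "rb").read(), open("credits.mp3", "rb").read()))`. A VBR (Xing/Info) header in the first file would still describe only the first file's length, so some players may report the wrong duration.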

  • 2021-02-02 14:51

    MP3 is lossy, but it is lossy in a very specific way. The algorithms used are designed to discard the parts of the audio that your ears cannot hear (or can hear only with difficulty). Redoing the compression at the same level of compression over and over is likely to yield nearly identical results for a given piece of audio. However, some additional losses may slowly accumulate. If you're going to be modifying files a lot, this might be a bad idea. It would also be a bad idea if you were concerned about quality, but then using MP3 if you are concerned about quality is a bad idea overall.

    You could construct a test using an encoder and a decoder to re-encode a few different MP3 files a few times and watch how they change. This could help you determine the rate of deterioration and decide whether it is acceptable to you. It sounds like you already have libraries you could use to run this simple test.
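    One way to set up such a test is sketched below. It assumes the `lame` command-line tool is installed and on the PATH; the function names and the `genN` file naming are made up for illustration. Each generation decodes the previous MP3 to WAV and encodes it again at the same bitrate, so you can listen to `gen1.mp3`, `gen5.mp3`, and so on and compare them.

```python
import subprocess
from pathlib import Path


def generation_commands(source, generations, bitrate_kbps=128, workdir="."):
    """Build the lame command lines for repeated decode/encode passes."""
    work = Path(workdir)
    current = Path(source)
    commands = []
    for n in range(1, generations + 1):
        wav = work / f"gen{n}.wav"
        mp3 = work / f"gen{n}.mp3"
        # Decode the previous generation to PCM (WAV)...
        commands.append(["lame", "--quiet", "--decode", str(current), str(wav)])
        # ...then encode it again at the same constant bitrate.
        commands.append(["lame", "--quiet", "-b", str(bitrate_kbps), str(wav), str(mp3)])
        current = mp3
    return commands


def run_generations(source, generations, bitrate_kbps=128, workdir="."):
    """Run every pass; raises CalledProcessError if lame fails."""
    for cmd in generation_commands(source, generations, bitrate_kbps, workdir):
        subprocess.run(cmd, check=True)
```

    Calling `run_generations("song.mp3", 10)` would leave ten generations in the working directory. Judging the result is still a listening test; the sketch does not measure quality itself.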

    MP3 files are composed of "frames" of audio, so it should be possible, with some effort, to remove entire frames with minimal processing (remove the frame, update some minor details in the file header). I believe frames are pretty short (a few tens of milliseconds each), which would give the precision you're looking for. So doing some reading on the MP3 file format should give you enough information to code your own Python library to do this. This is a fair bit different from traditional "audio processing" (since you don't care about sample-level precision), so you're unlikely to find an existing library that does this. Most, as you've found, will decompress the audio first so you can have complete fine-grained control.
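    A rough sketch of that idea follows. It assumes MPEG-1 Layer III frames only (the common case for 32, 44.1 and 48 kHz files), uses hypothetical function names, and does not update any Xing/Info header. It also ignores the bit reservoir, so the frames right after a cut may decode with a brief glitch.

```python
BITRATES_KBPS = (0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
SAMPLE_RATES_HZ = (44100, 48000, 32000)
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def iter_frames(data):
    """Yield (offset, length, sample_rate) for each MPEG-1 Layer III frame."""
    i = 0
    while i + 4 <= len(data):
        b1, b2 = data[i + 1], data[i + 2]
        bitrate_idx = b2 >> 4
        rate_idx = (b2 >> 2) & 0x3
        # 0xFA after masking the CRC bit means: sync bits, MPEG-1, Layer III.
        if (data[i] == 0xFF and (b1 & 0xFE) == 0xFA
                and 1 <= bitrate_idx <= 14 and rate_idx < 3):
            rate = SAMPLE_RATES_HZ[rate_idx]
            padding = (b2 >> 1) & 0x1
            length = 144 * BITRATES_KBPS[bitrate_idx] * 1000 // rate + padding
            if i + length > len(data):
                break                      # truncated final frame
            yield i, length, rate
            i += length
        else:
            i += 1                         # not a header; keep scanning


def cut_frames(data, start_s, end_s):
    """Drop every frame whose start time falls in [start_s, end_s)."""
    out = bytearray()
    pos = 0          # end of the last region already handled
    samples = 0      # running sample position of the current frame
    for offset, length, rate in iter_frames(data):
        out += data[pos:offset]            # copy non-frame bytes (e.g. tags) as-is
        t = samples / rate
        if not (start_s <= t < end_s):
            out += data[offset:offset + length]
        pos = offset + length
        samples += SAMPLES_PER_FRAME
    out += data[pos:]
    return bytes(out)
```

    With this approach the cut points snap to frame boundaries, so the achievable precision is one frame.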

  • 2021-02-02 14:57

    If you want to do things low-level, use pymad. It turns MP3s into a buffer of sample data.

    If you want something a little higher-level, use the Echo Nest Remix API (disclosure: I wrote part of it for my day job). It includes a few examples. If you look at the cowbell example (i.e., MoreCowbell.dj), you'll see a fork of pymad that gives you a NumPy array instead of a buffer. That datatype makes it easier to slice out sections and do math on them.

  • 2021-02-02 14:58

    Not a direct answer to your needs, but check out mp3DirectCut, which does what you want (as a GUI app). I think the source code is available, so even if you don't find a library, you could build one of your own, or build a Python extension using code from mp3DirectCut.

  • 2021-02-02 15:03

    I got three quality answers, and I thank you all (and upvoted you all) for them. I haven't chosen any as the accepted answer, because each addressed one aspect, so I wanted to write a summary.

    Do you need to work in MP3?

    • Transcoding to PCM and back to MP3 is unlikely to result in a noticeable drop in quality for a single pass, although small losses may accumulate over repeated re-encodes.

    • Don't optimise audio-quality prematurely; test it with a simple prototype and listen to it.

    Working in MP3

    • Wikipedia has a summary of the MP3 File Format.

    • MP3 frames are short (1152 samples, or about 26 ms at 44.1 kHz), allowing for moderate precision at that level.

    • However, Wikipedia warns that "Frames are not independent items ("byte reservoir") and therefore cannot be extracted on arbitrary frame boundaries."

    • Existing libraries are unlikely to be of assistance, if I really want to avoid decoding.
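
    The frame length and the "byte reservoir" caveat above can both be checked directly on a file. The sketch below assumes MPEG-1 Layer III frames, and the function names are hypothetical. In that format the first 9 bits of a frame's side information hold `main_data_begin`, a back-pointer saying how many bytes before the frame its audio data starts. A value of 0 means the frame does not borrow from earlier frames, so it is a safer place to cut.

```python
def frame_duration_ms(sample_rate, samples_per_frame=1152):
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate


def main_data_begin(frame):
    """Return the bit-reservoir back-pointer (in bytes) of one frame.

    `frame` must start at the 4-byte header of an MPEG-1 Layer III frame.
    """
    if len(frame) < 8 or frame[0] != 0xFF or (frame[1] & 0xFE) != 0xFA:
        raise ValueError("not an MPEG-1 Layer III frame header")
    has_crc = (frame[1] & 0x01) == 0       # protection bit 0 means a 16-bit CRC follows
    side = 4 + (2 if has_crc else 0)       # offset of the side information
    # main_data_begin is the first 9 bits of the side information.
    return (frame[side] << 1) | (frame[side + 1] >> 7)
```

    For example, `frame_duration_ms(44100)` is roughly 26.1 ms and `frame_duration_ms(48000)` is 24 ms, which is the granularity available when cutting on frame boundaries.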

    Working in PCM

    There are several libraries at this level:

    • LAME (latest release: October 2017)
    • PyMedia (latest release: February 2006)
    • PyMad (Linux only? Decoder only? Latest release: January 2007)

    Working at a higher level

    • Echo Nest Remix API (Mac or Linux only, at the moment) is an API to a web-service that supports quite sophisticated operations (e.g. finding the locations of music beats and tempo, etc.)

    • mp3DirectCut (Windows only) is a GUI that apparently performs the operations I want, but as an app. It is not open-source. (I tried to run it, got an Access Denied installer error, and didn't follow up. A GUI isn't suitable for me, as I want to repeatedly run these operations on a changing library of files.)

    My plan is now to start out in PyMedia, using PCM. Thank you all for your assistance.
