Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve mtimes of files extracted from zip archives #4392

Open
zuzzurro opened this issue Jun 29, 2022 · 12 comments
Open

Preserve mtimes of files extracted from zip archives #4392

zuzzurro opened this issue Jun 29, 2022 · 12 comments
Labels
feature features we would like to implement

Comments

@zuzzurro
Copy link

zuzzurro commented Jun 29, 2022

Importing a disc from a zip file while having the importadded set results in file times not corresponding to the content of the zip archive itself, the files times are current. Repeating the same import on a folder extracted from the zip file using the unzip tool works fine.

I'm using 1.6.0 on Fedora 36 (self compiled as the distro still has 1.4.9).

I can provide more information, but the issue is quite easy to reproduce as I explained above.

this is the relevant section of my config file:

importadded:
  preserve_mtimes: yes
  preserve_write_mtimes: yes
@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label Jun 29, 2022
@sampsyo
Copy link
Member

sampsyo commented Jun 29, 2022

Interesting! I think the first step to understanding this would be to know whether the library we use to extract these files, namely the zipfile.extractall function, can preserve mtimes. If it does, then we would need to find where we're discarding those. If it doesn't, then we may be out of luck. Any chance you'd be able to investigate?

@zuzzurro
Copy link
Author

Mmmh. It doesn't seem to work properly. I tested it by running:

python -m zipfile -e zz.zip zz/

and the time of the extracted files is now.

@arogl
Copy link
Contributor

arogl commented Jun 30, 2022

I found the following, so it seems a limitation of pythons implementation of zip extraction:

https://stackoverflow.com/questions/9813243/extract-files-from-zip-file-and-retain-mod-date

@zuzzurro
Copy link
Author

Would it be possible to adopt one of the proposed solutions? Since it's for internal use I don't see much possibly going wrong by changing the time by hand after the extraction.

@sampsyo
Copy link
Member

sampsyo commented Jun 30, 2022

Nice find, @arogl! It seems technically possible, but somewhat annoying to implement because we can no longer just use that extractall function… but I'll mark this as a feature request in case anyone is interested in giving it a shot.

@sampsyo sampsyo added feature features we would like to implement and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels Jun 30, 2022
@sampsyo sampsyo changed the title importadded plugin don't seem to work when importing from a zip file Preserve mtimes of files extracted from zip archives Jun 30, 2022
@zuzzurro
Copy link
Author

zuzzurro commented Jun 30, 2022

I haven't tested if the functionality works for RAR files, but since RAR unpacking it is implemented by directly calling an external utility (unrar) maybe that's the case.

@arogl
Copy link
Contributor

arogl commented Jun 30, 2022

I will try to look at all file extraction from archives over the weekend.

I was thinking of wrapping the time setting while only the preserve options enabled

@arogl
Copy link
Contributor

arogl commented Jul 1, 2022

@sampsyo

Could this work?

In importer.py#L1080 add

# From here:
# https://stackoverflow.com/questions/9813243/extract-files-from-zip-file-and-retain-mod-date
# fixing #4392

def RestoreTimestampsOfArchiveContents(archivename, extract_dir):
    for f in archivename.infolist():
        # path to this extracted f-item
        fullpath = os.path.join(extract_dir, f.filename)
        # still need to adjust the dt o/w item will have the current dt
        date_time = time.mktime(f.date_time + (0, 0, -1))
        # update dt
        os.utime(fullpath, (date_time, date_time))

Then at importer.py#L1093:

if (config['preserve_mtimes'].get(bool)):
    RestoreTimestampsOfArchiveContents(archive, extract_to)

I have not thought too much about PY2 example v. PY3, nor the if config

@sampsyo
Copy link
Member

sampsyo commented Jul 1, 2022

Yes, something like this could work! With the caveat that the preserve_mtimes option is located within the importadded configuration—not at the top level of config.

@zuzzurro
Copy link
Author

zuzzurro commented Jul 1, 2022

By doing it this way are we setting the times also in the cases where they are already set by the unarchiver? Just curious.

@arogl
Copy link
Contributor

arogl commented Jul 1, 2022

By doing it this way are we setting the times also in the cases where they are already set by the unarchiver? Just curious.

At the moment every extraction, regardless of type.

Further testing to be done

arogl added a commit to arogl/beets that referenced this issue Jul 2, 2022
@arogl
Copy link
Contributor

arogl commented Jul 2, 2022

Initial change pushed #4396

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features we would like to implement
Projects
None yet
Development

No branches or pull requests

3 participants