[Bug]+io: settings files written under system default encoding can fail on non-UTF-8 codepages #1345

@reddraconi

Description

Checklist

  • I am using an up-to-date version.
  • I have read the documentation.
  • I have searched existing issues.

TagStudio Version

9.5.6

Operating System & Version

Fedora 43 x86_64

Description

There are a few places where we don't specify UTF-8 as the encoding for files that TagStudio owns. On Linux and macOS this is probably not a problem, since they should default to UTF-8, but on Windows systems, which can use non-UTF-8 codepages, it will cause problems.

Examples:

  • A user on Japanese Windows (cp932) saves a library path containing 写真 to settings.toml. The file is written in cp932. If that user opens TagStudio on a system with a different Windows codepage, or on Linux, read_settings() blows up with UnicodeDecodeError.
  • TOML and JSON specs both mandate UTF-8, so anything we write that isn't UTF-8 is nonconforming and won't round-trip through other tools. (Probably pretty minor in the grand scheme of things.)
  • The compiled ignore file (.compiled_ignore) passed to ripgrep is opened with default encoding, so a user with non-ASCII glob patterns can poison the ripgrep input.
  • ResourceManager.read_text() reads bundled resource files (English translations, etc.) under the system encoding.
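The first example can be reproduced with nothing but the standard library. This is a minimal sketch, not TagStudio code: the file names and the sample setting are made up, and `encoding="cp932"` stands in for what a bare `open()` would pick on a Japanese Windows install.

```python
import pathlib
import tempfile

# Hypothetical settings content containing non-ASCII text.
sample = 'library_path = "写真"\n'

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)

    # Simulate a file written under the system default on Japanese Windows.
    legacy_file = tmp_dir / "legacy_settings.toml"
    legacy_file.write_text(sample, encoding="cp932")

    # Reading it back as UTF-8 (the default on most Linux systems) fails.
    try:
        legacy_file.read_text(encoding="utf-8")
        legacy_readable = True
    except UnicodeDecodeError:
        legacy_readable = False

    # Passing the encoding explicitly on both sides makes the file portable.
    fixed_file = tmp_dir / "settings.toml"
    fixed_file.write_text(sample, encoding="utf-8")
    round_trip = fixed_file.read_text(encoding="utf-8")

print(legacy_readable, round_trip == sample)
```

The same idea applies to every `open()`, `read_text()`, and `write_text()` call on a TagStudio-owned file: pass `encoding="utf-8"` rather than relying on the platform default.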

PR inbound.

Expected Behavior

TagStudio-owned files (configuration files such as settings.toml, .ts_ignore, etc.) should always be read and written as UTF-8, so that files moved between systems stay readable even when one or more of those systems use a non-UTF-8 codepage.

If a TagStudio-owned file was written in a non-UTF-8 encoding, it should be rewritten as UTF-8 on the next save.
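One way the "rewrite on next save" behavior could look is sketched below. The helper names `read_owned_text` and `migrate_to_utf8` and the `legacy_encoding` parameter are hypothetical and do not exist in TagStudio. The sketch assumes that falling back to the locale's preferred encoding is acceptable for legacy files; that fallback can still raise, or silently produce mojibake, if the file came from a machine with a different codepage.

```python
import locale
import pathlib
from typing import Optional, Tuple


def read_owned_text(
    path: pathlib.Path, legacy_encoding: Optional[str] = None
) -> Tuple[str, bool]:
    """Return (text, was_legacy). Tries UTF-8 first, then a legacy fallback."""
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Assumption: the file was written under this machine's codepage.
        fallback = legacy_encoding or locale.getpreferredencoding(False)
        return raw.decode(fallback), True  # may still raise UnicodeDecodeError


def migrate_to_utf8(
    path: pathlib.Path, legacy_encoding: Optional[str] = None
) -> bool:
    """Rewrite the file as UTF-8 if it was not already; return True if rewritten."""
    text, was_legacy = read_owned_text(path, legacy_encoding)
    if was_legacy:
        path.write_bytes(text.encode("utf-8"))
    return was_legacy
```

Reading and writing bytes keeps the sketch independent of the platform's newline translation; a real implementation would more likely hook this into the existing save path than run it as a separate migration step.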

Steps to Reproduce

  # Build a settings.toml whose ASCII structure is intact but whose string *value*
  # contains raw cp932 bytes (invalid as UTF-8).
  import pathlib
  legacy = "写真".encode("cp932")  # b"\x8e\xca\x90^"
  payload = b'language = "ja"\ndate_format = "' + legacy + b'"\n'
  pathlib.Path("/tmp/settings.toml").write_bytes(payload)

Use that settings file in TagStudio on a system whose codepage is neither UTF-8 nor cp932 and watch it fall over.

Logs

No response

Labels

Type: Bug (Something isn't working as intended)