Logs: liberachat/#haskell
| 2021-05-25 12:12:37 | <int-e> | dminuoso: ah. well, close enough. |
| 2021-05-25 12:12:39 | <dminuoso> | The term is ill-suited since there's no useful definition of the term "character" |
| 2021-05-25 12:12:42 | <opqdonut> | dminuoso: yeah |
| 2021-05-25 12:12:51 | <boxscape> | opqdonut quite possibly, but they are read from a file that contains english text, not just filenames |
| 2021-05-25 12:12:52 | <tomsmeding> | I don't think there exist systems where filenames are equal modulo unicode normalisation |
| 2021-05-25 12:12:56 | <opqdonut> | merijn: posix? |
| 2021-05-25 12:12:57 | <merijn> | opqdonut: Windows is unicode, macOS' old filesystem is unicode too |
| 2021-05-25 12:13:02 | <opqdonut> | ok, good to know |
| 2021-05-25 12:13:08 | <merijn> | opqdonut: AppleFS is...weird |
| 2021-05-25 12:13:36 | <merijn> | opqdonut: It's unicode, but handled at the library level, with the FS accepting any byte sequence, leading to confusing behaviour with different normalisations |
| 2021-05-25 12:13:38 | <tomsmeding> | boxscape: what function are you using to check suffix equality |
| 2021-05-25 12:13:48 | <merijn> | imo, Windows is the only one who gets this right |
| 2021-05-25 12:14:02 | × | argento quits (~argent0@168.227.96.51) (Client Quit) |
| 2021-05-25 12:14:23 | <opqdonut> | yeah the linux byte sequence filenames are a bit of a copout |
| 2021-05-25 12:14:24 | <boxscape> | tomsmeding at the moment I'm checking "a `T.isSuffixOf` b || b `T.isSuffixOf` a", which probably does some checking twice |
| 2021-05-25 12:14:26 | <opqdonut> | leaving userland to deal with problems |
| 2021-05-25 12:14:27 | hellcp | is now known as lcp |
| 2021-05-25 12:14:32 | → | argento joins (~argent0@168.227.96.51) |
| 2021-05-25 12:14:42 | × | lcp quits (~hellcp@83.24.148.243.ipv4.supernova.orange.pl) (Client Quit) |
| 2021-05-25 12:14:46 | <merijn> | Requiring a specific known unicode encoding is the only sensible way to handle files when the average user wants to name things in their own language |
| 2021-05-25 12:14:49 | shapr`` | is now known as shapr |
| 2021-05-25 12:14:56 | → | lcp joins (~hellcp@83.24.148.243.ipv4.supernova.orange.pl) |
| 2021-05-25 12:15:33 | <tomsmeding> | boxscape: what about "let len = min (T.length a) (T.length b) in T.takeEnd len a == T.takeEnd len b" |
| 2021-05-25 12:15:47 | → | brandonh_ joins (~brandonh@mi-18-24-205.service.infuturo.it) |
| 2021-05-25 12:15:48 | <boxscape> | tomsmeding that sounds like a good idea |
| 2021-05-25 12:15:58 | absence_ | is now known as absence |
| 2021-05-25 12:16:04 | × | lcp quits (~hellcp@83.24.148.243.ipv4.supernova.orange.pl) (Client Quit) |
| 2021-05-25 12:16:13 | <tomsmeding> | merijn: both of these solutions don't actually need O(1) indexing in any "sensible" way, just in the straight bytes way :p |
| 2021-05-25 12:16:18 | <dminuoso> | merijn: I guess on a technical level that means filenames are just identified by a byte sequence. |
| 2021-05-25 12:16:19 | → | space-shell joins (~space-she@88.98.247.38) |
| 2021-05-25 12:16:41 | <merijn> | dminuoso: Right, but that means it's impossible to reliably display filenames to users |
| 2021-05-25 12:17:01 | × | argento quits (~argent0@168.227.96.51) (Client Quit) |
| 2021-05-25 12:17:04 | <dminuoso> | Right. And it still leaves the problem of unicode equivalence |
| 2021-05-25 12:17:24 | <dminuoso> | So normalization is a real issue here |
| 2021-05-25 12:17:27 | <merijn> | dminuoso: What if one filename is in UTF-8 and the other is UTF-16 (perhaps because they're made by different users with different locales) |
| 2021-05-25 12:17:55 | → | argento joins (~argent0@168.227.96.51) |
| 2021-05-25 12:18:09 | × | brandonh quits (~brandonh@51.190.236.231) (Ping timeout: 265 seconds) |
| 2021-05-25 12:18:46 | <Maxdamantus> | Bytes should always be the way to represent filenames. |
| 2021-05-25 12:18:46 | × | ksqsf quits (~textual@67.209.186.120.16clouds.com) (Remote host closed the connection) |
| 2021-05-25 12:18:47 | <dminuoso> | merijn: Easy. Just present them with mojibake. |
| 2021-05-25 12:19:01 | <dminuoso> | You know, like the rest of the text world where encoding is not stored as metadata. |
| 2021-05-25 12:19:19 | × | argento quits (~argent0@168.227.96.51) (Client Quit) |
| 2021-05-25 12:19:20 | <Maxdamantus> | On Windows, you encode the filename into WTF-8. On other systems you just copy the bytes. |
| 2021-05-25 12:19:22 | × | oxide quits (~lambda@user/oxide) (Ping timeout: 264 seconds) |
| 2021-05-25 12:19:57 | <merijn> | Maxdamantus: Windows has a specific, required UTF-16 encoding and normalisation for filenames, enforced by the filesystem; it is *not* "WTF-8" |
| 2021-05-25 12:19:58 | × | nan`_ quits (~nan`@rrcs-70-60-83-42.central.biz.rr.com) (Read error: Connection reset by peer) |
| 2021-05-25 12:20:09 | → | nan` joins (~nan`@rrcs-70-60-83-42.central.biz.rr.com) |
| 2021-05-25 12:20:10 | → | argento joins (~argent0@168.227.96.51) |
| 2021-05-25 12:20:16 | <Maxdamantus> | merijn: hmm.. Pretty sure it doesn't. |
| 2021-05-25 12:20:23 | <merijn> | Yes it does |
| 2021-05-25 12:20:28 | <Maxdamantus> | merijn: pretty sure you can put lone surrogates in Windows filenames. |
| 2021-05-25 12:20:41 | Maxdamantus | will try it at work tomorrow. |
| 2021-05-25 12:21:28 | <Maxdamantus> | If you can't put lone surrogates in, then that would mean some sort of incompatibility with older UCS-2 filenames. |
| 2021-05-25 12:21:28 | × | nan` quits (~nan`@rrcs-70-60-83-42.central.biz.rr.com) (Read error: Connection reset by peer) |
| 2021-05-25 12:21:37 | → | nan`_ joins (~nan`@rrcs-70-60-83-42.central.biz.rr.com) |
| 2021-05-25 12:21:49 | <merijn> | Allowing nonsensical unicode is fine, *if* it's a consistent well-specified format |
| 2021-05-25 12:22:01 | × | nan`_ quits (~nan`@rrcs-70-60-83-42.central.biz.rr.com) (Read error: Connection reset by peer) |
| 2021-05-25 12:22:07 | <merijn> | The problem with "just bytes" is that any folder can have names using any random mix of encodings |
| 2021-05-25 12:22:11 | → | eggplantade joins (~Eggplanta@2600:1700:bef1:5e10:c032:b754:d42c:78b5) |
| 2021-05-25 12:22:17 | → | nan` joins (~nan`@rrcs-70-60-83-42.central.biz.rr.com) |
| 2021-05-25 12:22:19 | <merijn> | "oh, but you shouldn't use non-ascii names..." |
| 2021-05-25 12:22:40 | <merijn> | Well, that's just a giant "fuck you" to any computer user outside of the anglophone world |
| 2021-05-25 12:22:47 | <Maxdamantus> | I'm talking about the representation within the program. If the OS doesn't like certain filenames, it can reject those when you try to interact with the OS. |
| 2021-05-25 12:22:53 | <dminuoso> | Even ASCII is not enough, because ASCII being a terminal control protocol, you probably want to limit ASCII to printable codepoints.. |
| 2021-05-25 12:22:58 | <Maxdamantus> | You get that with Linux too. |
| 2021-05-25 12:23:03 | <dminuoso> | i.e. how do you print `\BEL` ? |
| 2021-05-25 12:23:07 | <boxscape> | tomsmeding I'm guessing it's faster to do "let {lenA = T.length A; lenB = T.length B; len = min lenA lenB} in lenA == lenB && T.takeEnd len a == T.takeEnd len b" |
| 2021-05-25 12:23:13 | × | nan` quits (~nan`@rrcs-70-60-83-42.central.biz.rr.com) (Read error: Connection reset by peer) |
| 2021-05-25 12:23:13 | <merijn> | Maxdamantus: Right, and it's one of the things that makes linux awful :p |
| 2021-05-25 12:23:22 | <Maxdamantus> | eg, '\0', '/' and ".." and "." are treated specially. |
| 2021-05-25 12:23:34 | <tomsmeding> | boxscape: what makes you think (==) on Text wouldn't do the length check first itself? |
| 2021-05-25 12:23:39 | <Maxdamantus> | Well, it's better than Windows at least. |
| 2021-05-25 12:23:39 | → | nan` joins (~nan`@rrcs-70-60-83-42.central.biz.rr.com) |
| 2021-05-25 12:23:46 | <merijn> | Maxdamantus: Hard disagree |
| 2021-05-25 12:23:46 | <boxscape> | tomsmeding not thinking it through :) |
| 2021-05-25 12:23:59 | <merijn> | Linux engineering is, in many ways, inferior to Windows |
| 2021-05-25 12:24:09 | <boxscape> | tomsmeding I also compared the wrong lengths |
| 2021-05-25 12:24:31 | <merijn> | Most programmers with "windows is bad" takes just equate "not the same interface as linux, so I can't run my code unchanged" with bad engineering |
| 2021-05-25 12:24:31 | <dminuoso> | I too preferred working with the (then) Win32 API over Linux. |
| 2021-05-25 12:24:36 | <boxscape> | frankly what I wrote just doesn't make much sense :) |
| 2021-05-25 12:24:37 | <Maxdamantus> | merijn: there are lots of extra special cases in Windows, like "nul" and "con". |
| 2021-05-25 12:24:39 | <dminuoso> | It was a mostly consistent and well documented API |
| 2021-05-25 12:24:47 | <Maxdamantus> | mkdir con |
| 2021-05-25 12:25:20 | <tomsmeding> | boxscape: you awakened something in this channel |
| 2021-05-25 12:25:26 | <boxscape> | I sure did |
| 2021-05-25 12:25:27 | <merijn> | dminuoso: Most of the complaints I've read in "windows is bad" discussions are just "windows is doing something different, for totally reasonable engineering reasons, but I hate it, because it's different" |
| 2021-05-25 12:25:42 | <dminuoso> | merijn: Right. |
| 2021-05-25 12:25:48 | <boxscape> | tomsmeding that always happens when someone mentions something involving text and merijn is here :P |
| 2021-05-25 12:25:55 | <merijn> | And it just annoys me. There's plenty of valid criticism of MS/Windows, but blindly asserting their code is badly engineered isn't one of them |
| 2021-05-25 12:26:19 | <dminuoso> | I mean there's a lot of things I hate about Windows and its user interface, but the programmatic interface I found enjoyable to work against ignoring the language itself. |
| 2021-05-25 12:26:23 | <merijn> | boxscape: And time when I'm here and... |
| 2021-05-25 12:26:25 | <Maxdamantus> | I think it's difficult to argue that Windows filenames are not based on accidental technologies that are now obsolete. |
| 2021-05-25 12:26:26 | × | eggplantade quits (~Eggplanta@2600:1700:bef1:5e10:c032:b754:d42c:78b5) (Ping timeout: 244 seconds) |
| 2021-05-25 12:26:37 | × | brandonh_ quits (~brandonh@mi-18-24-205.service.infuturo.it) (Quit: brandonh_) |
| 2021-05-25 12:26:40 | <merijn> | boxscape: My role is to just rant people out of bad decision making ;) |
| 2021-05-25 12:26:44 | <boxscape> | haha |
| 2021-05-25 12:26:53 | → | oxide joins (~lambda@user/oxide) |
| 2021-05-25 12:27:24 | × | argento quits (~argent0@168.227.96.51) (Quit: leaving) |
| 2021-05-25 12:27:25 | <Maxdamantus> | (particularly, UCS-2 .. and I suspect it was just some architectural mistake that results in "con" and "nul" and other random things being reserved filenames) |
| 2021-05-25 12:27:26 | <dminuoso> | merijn: Perhaps there's something to be said about when your product comes out of a single shop with paid engineers and clear design goals. A lot of linux is just decades of frankenstein. :) |
All times are in UTC.
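
Editor's sketch of the suffix check discussed between 12:14:24 and 12:15:33, written as a self-contained program. It puts boxscape's original `T.isSuffixOf` test next to tomsmeding's `T.takeEnd` variant; the function names `suffixEither` and `suffixTakeEnd` and the sample strings are made up for illustration and do not come from the log. The last example is an assumption-laden illustration of the normalisation point raised at 12:17: `Data.Text` equality compares code points, so a precomposed and a decomposed spelling of the same visible text do not match unless they are normalised first (which `Data.Text` itself does not do).

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Text (Text)
import qualified Data.Text as T

-- | boxscape's original check: one text is a suffix of the other.
-- (Hypothetical name; the log only shows the expression.)
suffixEither :: Text -> Text -> Bool
suffixEither a b = a `T.isSuffixOf` b || b `T.isSuffixOf` a

-- | tomsmeding's suggestion: trim both texts to the shorter length,
-- counted from the end, and compare once.
-- (Hypothetical name; the log only shows the expression.)
suffixTakeEnd :: Text -> Text -> Bool
suffixTakeEnd a b =
  let len = min (T.length a) (T.length b)
  in  T.takeEnd len a == T.takeEnd len b

main :: IO ()
main = do
  -- Illustrative inputs, not taken from the log.
  print (suffixEither  "src/Main.hs" "Main.hs")  -- True
  print (suffixTakeEnd "src/Main.hs" "Main.hs")  -- True
  print (suffixTakeEnd "Main.hs"     "Mein.hs")  -- False

  -- Same visible text, two different code point sequences:
  let precomposed = "caf\xE9"   :: Text  -- U+00E9
      decomposed  = "cafe\x301" :: Text  -- 'e' followed by U+0301
  -- Comparison is by code point, so no normalisation is applied here.
  print (suffixTakeEnd precomposed decomposed)   -- False
```

Both functions should agree on every input, including empty texts, where both return `True`. If normalisation-insensitive matching is wanted, the inputs would need to be normalised by a separate library before either check is applied.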