Home liberachat/#haskell: Logs Calendar

Logs: liberachat/#haskell

←Prev  Next→ 1,790,295 events total
2026-03-19 16:48:26 <Freakie> I know, I was just off by a byte factor
2026-03-19 16:48:35 <EvanR> and ghc usually uses a whole word for any field
2026-03-19 16:49:13 <Freakie> but basically I have two algorithms that work on each other. both consume two lists of one type and accumulate two other lists of another type as output
2026-03-19 16:49:33 <EvanR> accumulate?
2026-03-19 16:49:45 <Freakie> I mean, just consing
2026-03-19 16:49:50 <EvanR> by iteratively pushing to the beginning
2026-03-19 16:49:58 <Freakie> yes?
2026-03-19 16:50:02 <EvanR> ok
2026-03-19 16:50:05 <EvanR> then what
2026-03-19 16:50:31 <Freakie> then that continues till the input lists have been consumed
2026-03-19 16:51:03 <EvanR> it matters how these accumulated lists are consumed
2026-03-19 16:51:28 <Freakie> the output lists are only ever added to during the algorithms, and the input lists are only ever traversed linearly
2026-03-19 16:51:46 <EvanR> if the output list is never consumed
2026-03-19 16:52:03 <Freakie> I mean they have to by construction of my benchmark
2026-03-19 16:52:15 <EvanR> it's either unobservable or a memory leak
2026-03-19 16:52:27 <Freakie> they're fed straight into another iteration of the algorithms
2026-03-19 16:53:26 <Freakie> heap profiling shows that 80-90% of the data recorded comes from building the output in the one algorithm, but I suspect it shows that because the lists are being retained in the other algorithm and possibly not deallocated on the run?
2026-03-19 16:53:43 <EvanR> yes that would be my suspicion
2026-03-19 16:54:08 <Freakie> by design the algorithms only work on lists and trees so it's bound to be inefficient but but scaling shouldn't be this bad
2026-03-19 16:55:55 <int-e> sm: Well, I implemented a stupid thing: https://silicon.int-e.eu/lambdabot/State/where.html (static page, decodes where file in browser, no search, also not tested much)
2026-03-19 16:56:13 <Freakie> I aborted the program after an hour and at that point only 5% of the runtime not spent on garbage collection which is not surprising given how much data is live and I don't have any compaction
2026-03-19 16:58:15 × Jackneill quits (~Jackneill@188-143-82-16.pool.digikabel.hu) (Remote host closed the connection)
2026-03-19 16:58:45 Jackneill joins (~Jackneill@188-143-82-16.pool.digikabel.hu)
2026-03-19 16:59:02 × Jackneill quits (~Jackneill@188-143-82-16.pool.digikabel.hu) (Remote host closed the connection)
2026-03-19 16:59:10 <EvanR> retaining long lists live when they don't have to be is usually caused by the algorithm consuming lists in a suboptimal way. But it's also possible that's unavoidable because of the algorithm
2026-03-19 16:59:43 <Freakie> I can most likely refactor it to include compaction but that doesn't really help me reduce the size of the live memory to begin with
2026-03-19 17:00:01 <monochrom> [Int] of length n, even in the worst case (which doesn't happen often), takes only n*(24+8) bytes. (Often the +8 doesn't happen because GHC-compiled code actually does a flyweight pattern for small Ints.)
2026-03-19 17:00:37 <Freakie> I've been trying for a while to find some way of actually tracking the size of data during runtime
2026-03-19 17:00:46 <Freakie> it seems like I have precious few alternatives at least
2026-03-19 17:01:53 <monochrom> Profiling is only as good as the person who interprets the profiling report. >:)
2026-03-19 17:02:15 <Freakie> well in that case I am an imposter and a cad
2026-03-19 17:05:01 <EvanR> monochrom, that sounds like an underestimate for a list node + Int box? or is list node / Int specially implemented to reduce space
2026-03-19 17:06:01 <monochrom> 24 is already the node, 8 is already the Int
2026-03-19 17:06:02 <EvanR> 8+8+8+8, would be the bare minimum without any runtime info
2026-03-19 17:06:28 <monochrom> Oh oops s/8/16/ haha
2026-03-19 17:07:04 <gentauro> btw, do you have re-use of pattern-matchin cases in Haskell (like we have en OCaml/F#)?. Example: `type FooBarBazQux = Foo | Bar | Baz | Qux;; let fooBarBazQux = function | Foo | Bar -> "foobar" | Baz | Qux -> "bazqux";;`
2026-03-19 17:07:09 <monochrom> But that still doesn't cause n=10000 to become 1GB. So the hypothesis that n=10000 is rejected.
2026-03-19 17:07:42 <Freakie> I mean I have empirical data on the total number of nodes generated
2026-03-19 17:07:45 <gentauro> it seems that `case … of` requires to define duplicate code for `Foo and Bar` and then `Baz and Qux`
2026-03-19 17:08:01 <Freakie> which, ok is closer to 100k when I reach 1 gb
2026-03-19 17:08:21 <EvanR> gentauro, probably for the best. I really like reading haskell's "redundant" code
2026-03-19 17:08:35 <Freakie> gentauro haskell doesn't have disjoint patterns but I tend to use guard patterns instead
2026-03-19 17:08:45 <EvanR> instead of mentally decompressing it after someone was "too lazy to type it all out"
2026-03-19 17:09:01 <Freakie> that feels a little like a strawman but ok
2026-03-19 17:09:21 <gentauro> EvanR: I'm kind of used to the `SML` way :)
2026-03-19 17:09:43 <Freakie> I tend to prefer haskell's syntax for the most part
2026-03-19 17:09:55 <gentauro> you know `dry` (Don't repeat yourself)
2026-03-19 17:09:56 <EvanR> the difference between a straight list of cases, and the ruby way of having algorithms generate the code or values
2026-03-19 17:10:11 <EvanR> yeah I have often see "dry" be bad in haskell
2026-03-19 17:11:06 <haskellbridge> <sm> int-e https://silicon.int-e.eu/lambdabot/State/where.html - very nice! Much simpler than https://haskell-links.org . How to make it discoverable ?
2026-03-19 17:11:26 <EvanR> you do have pattern synonyms and view patterns
2026-03-19 17:11:40 <haskellbridge> <sm> @where+ links https://silicon.int-e.eu/lambdabot/State/where.html I guess
2026-03-19 17:12:21 <Leary> gentauro: https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/or_patterns.html
2026-03-19 17:12:49 <Freakie> oh neat
2026-03-19 17:14:31 <int-e> @where+ where https://silicon.int-e.eu/lambdabot/State/where.html
2026-03-19 17:14:31 <lambdabot> I will never forget.
2026-03-19 17:15:04 <gentauro> Leary: nice
2026-03-19 17:15:39 <gentauro> this will work with minimalistic sum types due to -> "If we want the compiler to aid us in Haskell2010, we must write out all cases explicitly".
2026-03-19 17:15:58 <gentauro> unless "underscore wildcard" can be used.
2026-03-19 17:15:59 <int-e> (not opposed to changing the `links` entry, but for now it still works)
2026-03-19 17:17:54 <Leary> Freakie: BTW, if you want good scaling, you probably need to write your algorithm to stream output as input comes in. You might be able to achieve that by building the output lists outside-in rather than inside-out, but the safer path would be to use a streaming library.
2026-03-19 17:18:13 <sm> 👍🏻 and now added at https://joyful.com/Haskell+map#other+docs
2026-03-19 17:18:27 <Freakie> technically I'm doing a thesis on algorithms designed to work efficiently on external memory which implies streaming by design
2026-03-19 17:18:51 <Freakie> how do you mean outside-in though?
2026-03-19 17:21:42 <Leary> When a recursive function `go` wants to yield an element `x`, do so by producing `x : go (...)`.
2026-03-19 17:23:10 <Freakie> so the whole tail recursion modulo cons business?
2026-03-19 17:24:04 × merijn quits (~merijn@77.242.116.146) (Ping timeout: 245 seconds)
2026-03-19 17:24:25 <Leary> Maybe; I don't think from that angle. The important part is that the recursive call is guarded behind the `:` constructor, so it doesn't run until after `x` is consumed.
2026-03-19 17:24:26 <haskellbridge> <sm> A new entry for the Haskell Book list, tcard__ : https://www.reddit.com/r/haskell/comments/1ry38dn/haskell_brain_teasers_is_now_available/
2026-03-19 17:24:43 <Freakie> i'll try to keep it in mind
2026-03-19 17:25:03 <Freakie> at any rate I'm probably going to have to pivot by necessity to another part of the project and let the performance be what it is
2026-03-19 17:25:18 <Freakie> it's just been bothering me for a while that I can't explain how hungry the algorithms are
2026-03-19 17:26:20 <pipsquak-bird> why is computer job market so jacked now? india outsourcing and vendors trying to stamp out linux?
2026-03-19 17:26:39 <Freakie> more like executives bought into the AI hype
2026-03-19 17:26:44 <pipsquak-bird> heh
2026-03-19 17:26:53 <pipsquak-bird> AI creates houses and megadorms!
2026-03-19 17:27:02 <pipsquak-bird> free room food and net!
2026-03-19 17:27:39 ChanServ sets mode +o monochrom
2026-03-19 17:27:44 monochrom sets mode +b *!*@c-71-235-170-34.hsd1.ma.comcast.net
2026-03-19 17:27:44 pipsquak-bird is kicked by monochrom (pipsquak-bird)
2026-03-19 17:27:56 <Freakie> lmao tho
2026-03-19 17:28:05 <Franciman> megadorms
2026-03-19 17:28:24 <Freakie> Leary so I finally found a way to track the memory of values in runtime which also triggers a garbage collection before measuring and it seems to have completely squashed memory usage (at least within reason) at the cost of, uh, time
2026-03-19 17:28:38 <Freakie> so I guess it's just a space leak somewhere after all
2026-03-19 17:30:16 <Freakie> one complication of what you suggested is that I always return a pair where only one of the elements is updated
2026-03-19 17:30:27 <Freakie> will that matter in terms of what you said?
2026-03-19 17:30:53 monochrom sets mode -o monochrom
2026-03-19 17:33:43 <Leary> Freakie: Probably, but you can change out `([a], [b])` for `[Either a b]`.
2026-03-19 17:34:58 <Freakie> how will that relate to guarding behind a constructor?
2026-03-19 17:36:31 ft joins (~ft@p508db341.dip0.t-ipconnect.de)
2026-03-19 17:37:35 <Leary> Well it's a single list, so the rest of the computation is always behind the next cons cell, and it knows what it's already done. With two lists, you would have two independent computations instead.
2026-03-19 17:38:23 <Freakie> oh you meant a list of eithers
2026-03-19 17:38:45 <Freakie> I don't think that's sound
2026-03-19 17:39:37 <Freakie> only one of the pair elements is a list, the other is a priority queue
2026-03-19 17:41:36 <Freakie> but yeah it turns out that stuff isn't being garbage collected
2026-03-19 17:41:58 <Freakie> which is progress at least
2026-03-19 17:42:28 <Leary> Does it actually need to be returned? I suspect you can just thread the queue state as an argument to the recursive call (or other continuation).
2026-03-19 17:43:43 <Freakie> do you mean with a stateful priority queue?
2026-03-19 17:45:05 <Freakie> both need to be returned because the priority queues make up with invariants of the algorithm
2026-03-19 17:45:06 tri joins (~tri@69.74.159.34)

All times are in UTC.