Macil's recent activity
-
Comment on Microsoft reported to be sharply reducing planned data center investment worldwide in ~tech
-
Comment on DeepSeek’s safety guardrails failed every test researchers threw at its AI chatbot in ~tech
Macil (edited)
In my opinion the main consequences of this are:
1. If a company uses DeepSeek as a customer service bot or another kind of integrated assistant (e.g. a code-writing assistant), then a user may be able to make it say things that are embarrassing for the company.
2. The AI is not fit to be used in a situation where it has control over resources and talks to people who don't have the same level of access to those resources, because the AI can be tricked into doing whatever those people tell it, regardless of the instructions it was deployed with. (For example, if you give an AI read access to an employee database and tell it that employees are allowed to ask what their own pay is, then it will be easy for an employee to convince the AI to reveal other employees' details. Currently, the way to avoid this is to give the AI only the same level of access as the employee it's talking to. Modern AI is utterly vulnerable to confused deputy problems.)
3. In the future, AI might be capable enough to tell novice users how to easily do very destructive things (run a ransomware campaign using new vulnerabilities, submit online orders to mRNA manufacturing services to synthesize novel diseases, etc.).
Point 1 is probably the most currently relevant issue, which is kind of silly except that progress toward solving it may help progress toward points 2 and 3. Having point 2 solved would enable AI to be used much more easily and in many more situations. Only point 3 represents a direct public safety issue, but as long as models aren't that capable, progress on it is mostly just about preparation for a possible future.
Though I definitely agree that this article's framing is a bit disingenuous, because all current models are vulnerable to some degree to these issues and the other models' partial solutions don't really move the needle much on any of these points.
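As a rough illustration of the workaround in point 2, here is a minimal TypeScript sketch in which the tool handed to the AI is bound to the signed-in employee before the conversation starts. Everything in it (the `employees` array, `makePayTool`, the tool name) is hypothetical and not taken from any real product; the only idea it tries to show is that the model never gets to choose whose record is read.

```typescript
// Hypothetical in-memory "employee database", for illustration only.
interface Employee {
  id: string;
  name: string;
  pay: number;
}

const employees: Employee[] = [
  { id: "e1", name: "Alice", pay: 90000 },
  { id: "e2", name: "Bob", pay: 70000 },
];

// The tool is created per chat session and closes over the authenticated
// employee's ID. It takes no arguments, so nothing the model is talked into
// saying can point it at another employee's record.
function makePayTool(authenticatedEmployeeId: string) {
  return {
    name: "get_my_pay",
    run(): number {
      const me = employees.find((e) => e.id === authenticatedEmployeeId);
      if (!me) throw new Error("unknown employee");
      return me.pay;
    },
  };
}

// A session for Bob can only ever read Bob's pay.
const toolForBob = makePayTool("e2");
console.log(toolForBob.run());
```

The design choice is that access control lives in ordinary code outside the model, so the deputy has no extra authority to be confused about.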
-
Comment on What programming/technical projects have you been working on? in ~comp
Macil (edited)
I used to have a problem of accumulating game servers for things I rarely used. I'd decide to shut one down after 6 months, only for the next week to be when friends and I got the sudden itch to play again. On multiple occasions I also discovered a server had failing hardware by the time I used it again, which meant support tickets, waiting days for a replacement, and trying to work out how to reconstruct everything on a new server. So to me it's particularly valuable and freeing to have setting up a new server or spinning one down fully automated, especially since my costs won't go up as I add any number of games to the mix. I am a little interested in whether there are any good cheaper providers that still offer pay-by-the-hour servers that can be created or deleted through an API, though given the small fraction of the average day I'm actually using the game server, it wouldn't move my costs much.
-
Comment on What programming/technical projects have you been working on? in ~comp
Macil (edited)
I'm paying $5/month for Cloudflare Workers & Pages (though that's shared with some other small projects of mine). The VPS is a dual-core, 2 GB RAM, $21/month instance. (Now that I'm not paying for that for the whole month, I might try upgrading to a more expensive instance which might be able to handle game saving and players connecting faster.) So it's not huge savings, but if we decide to add anything else like a Minecraft server for us then this system will really shine.
The spin up time seems like 2-3 minutes usually.
-
Comment on What programming/technical projects have you been working on? in ~comp
Macil (edited)
I made a webpage that my friend and I can use to start up or deactivate our Factorio server. The server is a DigitalOcean VPS that's paid for on an hourly basis, so I don't have to pay for it whenever we have it deactivated. Not only do we save money during idle hours, but it's particularly valuable since we've been playing less frequently lately. Before, we felt pressured to play more often to justify the server costs, or have me deal with manual server creation/deletion procedures often. Now we can axe the server casually whenever and even my friend can start up the server whenever he wants without depending on my availability.
I'm a web developer, so part of my goal with this project was to practice using the same kinds of tools I'd use for real user-facing web apps. Otherwise I probably could've gotten away with sharing a pair of shell scripts with my friend (though that would mean sharing my DigitalOcean and Cloudflare DNS API keys) or putting something together in a workflow builder tool like Retool. I ended up making a web app using TypeScript running on Cloudflare Workers & Pages for this.
Details
You still have to pay for DigitalOcean VPS servers while they're shut down, so it's not enough just to shut down the server. The server must be actually deleted in order to not be billed for it. The save data needs to be stored somewhere outside of the server so it doesn't get deleted too. Also when I create a new server, it has a new IP address, so I need to update a DNS record to point at the new server each time.
The DigitalOcean VPS is started with the Docker Linux image, an attached persistent 8 GB ($0.80/month) DigitalOcean block storage device where the Factorio save data (and SSH host key) is stored, and a startup script which sets up a swap file and starts a community-maintained Factorio Docker image.
There's a worker script hosted on Cloudflare Workers which exposes RPC methods for reading the current server state or setting the desired state. The worker forwards its method calls to a single Cloudflare Durable Object where the current power state and desired power state are tracked, because durable objects have good transaction support and websocket support (which I'm not using for the webpage yet but want to). When the desired power state is changed, it runs a Cloudflare Workflow to set up the server or delete it (and terminates any previously started workflow). I use Workflows because each process has several steps that may fail and need to be retried, and I want everything I make to be durable against server failure at any point (along the principles of crash-only architecture), which Workflows handles well too.
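The current-state/desired-state bookkeeping described above can be sketched in plain TypeScript. This is not the actual Durable Object code: `PowerStateTracker`, `WorkflowHandle`, and the `startWorkflow` callback are hypothetical stand-ins, and the real version would persist state and start Cloudflare Workflow instances instead of calling a callback.

```typescript
type PowerState = "on" | "off";

// Hypothetical stand-in for a handle to a running workflow instance.
interface WorkflowHandle {
  terminate(): void;
}

class PowerStateTracker {
  current: PowerState = "off";
  desired: PowerState = "off";
  private running: WorkflowHandle | null = null;
  private startWorkflow: (target: PowerState) => WorkflowHandle;

  constructor(startWorkflow: (target: PowerState) => WorkflowHandle) {
    this.startWorkflow = startWorkflow;
  }

  // Record the new desired state, terminate any workflow it supersedes,
  // and start a workflow that moves the server toward the new target.
  setDesired(target: PowerState): void {
    if (target === this.desired) return;
    this.desired = target;
    if (this.running) this.running.terminate();
    this.running = this.startWorkflow(target);
  }

  // Called when a workflow reports that it has finished.
  workflowDone(reached: PowerState): void {
    this.current = reached;
    if (reached === this.desired) this.running = null;
  }
}

// Usage: flipping the switch twice terminates the first workflow.
const tracker = new PowerStateTracker((target) => {
  console.log(`starting workflow toward ${target}`);
  return { terminate: () => console.log(`terminating workflow toward ${target}`) };
});
tracker.setDesired("on");
tracker.setDesired("off");
```

Keeping the desired state separate from the current state is what lets a user change their mind mid-transition without the two workflows racing each other.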
Details about the server creation and deletion workflows
The delete server workflow:
1. uses Cloudflare's API to update the DNS record for the Factorio server's subdomain to point to the invalid address 0.0.0.0. (I'm not sure if this is cleaner than just deleting the DNS record; I think the DNS record needs to keep existing for my low 60s TTL config on it to stay active.)
2. uses the DigitalOcean API to send a shutdown command to the server.
3. waits 30 seconds for a clean shutdown to happen.
4. deletes the server.
5. communicates that deletion is done to the worker.
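The delete steps above could look roughly like this in TypeScript. The `DeleteOps` interface and all of its method names are hypothetical placeholders for the Cloudflare and DigitalOcean API calls, and `withRetry` is only a simple stand-in for the per-step retrying that the post relies on Workflows for.

```typescript
// Hypothetical operations; a real version would call the Cloudflare and
// DigitalOcean HTTP APIs. None of these names come from either API.
interface DeleteOps {
  setDnsRecord(ip: string): Promise<void>;
  sendShutdown(): Promise<void>;
  sleep(ms: number): Promise<void>;
  deleteServer(): Promise<void>;
  notifyWorker(state: "off"): Promise<void>;
}

// Run a step up to `attempts` times, rethrowing the last error if all fail.
async function withRetry<T>(step: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

async function deleteServerWorkflow(ops: DeleteOps): Promise<void> {
  await withRetry(() => ops.setDnsRecord("0.0.0.0")); // park the DNS record
  await withRetry(() => ops.sendShutdown()); // request a clean shutdown
  await ops.sleep(30_000); // give the shutdown 30 seconds
  await withRetry(() => ops.deleteServer()); // deleting is what stops billing
  await withRetry(() => ops.notifyWorker("off")); // report completion
}
```

Passing the operations in as an interface keeps the ordering logic testable without touching either provider.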
The create server workflow:
1. creates the server. If this fails because a server with the same name already exists (which can happen if a user deactivated and then reactivated the server, and the delete-server workflow was terminated before step 4), then if the server is powered on, continue to step 2; otherwise delete the server and recreate it, because it may have been shut down while running its first-time startup script and never finished being configured to run Factorio. (That was a fun bug to figure out! Thankfully the only bugs I've run into were related to incomplete server deletions and could always be worked around at the time by letting the server fully deactivate first.)
2. gets the new server's IP address. (The server doesn't get an IP address immediately after being made, so this step usually has to retry for up to a few minutes.)
3. updates the DNS record to point to the server's IP address.
4. waits a minute for the DNS change to propagate.
5. communicates that creation is done to the worker.
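Here is a sketch of the create steps, including the leftover-server check and the IP polling. As with any illustration of this kind, `CreateOps` and its methods are hypothetical names rather than real DigitalOcean or Cloudflare calls, and the polling interval and attempt count are assumptions chosen only to cover "a few minutes".

```typescript
// Hypothetical operations standing in for the DigitalOcean and Cloudflare
// HTTP APIs; none of these names come from either API.
type CreateResult = "created" | "already-exists";

interface CreateOps {
  createServer(): Promise<CreateResult>;
  isPoweredOn(): Promise<boolean>;
  deleteServer(): Promise<void>;
  getIpAddress(): Promise<string | null>; // null until an address is assigned
  setDnsRecord(ip: string): Promise<void>;
  sleep(ms: number): Promise<void>;
  notifyWorker(state: "on"): Promise<void>;
}

async function createServerWorkflow(ops: CreateOps): Promise<string> {
  // Create the server, or decide what to do with a leftover one.
  if ((await ops.createServer()) === "already-exists") {
    if (!(await ops.isPoweredOn())) {
      // A powered-off leftover may have been interrupted mid-setup: rebuild it.
      await ops.deleteServer();
      await ops.createServer();
    }
  }

  // Poll until an IP address is assigned (assumed: 5 s apart, up to 60 tries).
  let ip: string | null = null;
  for (let attempt = 0; attempt < 60 && ip === null; attempt++) {
    ip = await ops.getIpAddress();
    if (ip === null) await ops.sleep(5_000);
  }
  if (ip === null) throw new Error("server never received an IP address");

  await ops.setDnsRecord(ip); // point the subdomain at the new server
  await ops.sleep(60_000); // wait a minute for the DNS change to propagate
  await ops.notifyWorker("on"); // report completion
  return ip;
}
```

A powered-on leftover is reused as-is, which matches the "continue to step 2" branch in the list above.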
Then the actual website is maybe only a hundred real lines of code using Next.js, a fullstack React framework, and hosted on Cloudflare Pages. If you're not yet signed in to a whitelisted account, the page shows a Google sign-in button handled by the next-auth library, otherwise it shows the server's current state ("On", "Off", "Turning on...", or "Turning off...") and a button to toggle it. This project has a service binding to the Cloudflare worker above, so I can call its methods directly from React server components and server actions. (I'm very happy with how easy Next.js makes it to communicate across the client/server boundary and how easy Cloudflare bindings make it to communicate across the server/worker boundary.) The turn-on/off button works through a normal html form, so the page doesn't even require the browser to use javascript except for auto-updating the displayed status.
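The four status labels can be derived from just the current and desired power states. This is a guess at the mapping rather than the site's actual code; `statusLabel` and both type names are made up for the example.

```typescript
type PowerState = "on" | "off";
type StatusLabel = "On" | "Off" | "Turning on..." | "Turning off...";

// When the two states agree the server is settled; otherwise it is in
// transition toward whatever the desired state is.
function statusLabel(current: PowerState, desired: PowerState): StatusLabel {
  if (current === desired) return current === "on" ? "On" : "Off";
  return desired === "on" ? "Turning on..." : "Turning off...";
}

console.log(statusLabel("off", "on"));
```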
-
Comment on Microsoft says having a TPM is "non-negotiable" for Windows 11 in ~tech
Macil (edited)
TPMs are also used to encrypt your saved passwords in your browser (and other programs that use the right OS APIs) so that they can only be decrypted if you've logged in to the OS with your password or PIN, and to set limits on password/PIN attempts so that you can use a short password/PIN locally and still benefit from encryption. This helps protect your data if your computer is physically stolen. Without a TPM, there's no way you could have a short 4-digit PIN that safely encrypts your saved passwords because they'd be too easy to brute-force.
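To put rough numbers on the brute-force point, a 4-digit PIN has only 10^4 = 10,000 possibilities. The sketch below compares how long it takes to try them all at two guess rates; both rates are assumptions picked for illustration, not measurements of any real attacker or TPM.

```typescript
// Worst-case time to try every PIN of a given length at a given guess rate.
function worstCaseSeconds(digits: number, guessesPerSecond: number): number {
  const keyspace = 10 ** digits; // each position is one of 10 digits
  return keyspace / guessesPerSecond;
}

// Assumed rates, for illustration only.
const offlineRate = 1_000_000; // attacker guessing on their own hardware
const throttledRate = 1 / 60; // hardware that permits about one try per minute

console.log(worstCaseSeconds(4, offlineRate)); // 0.01 seconds
console.log(worstCaseSeconds(4, throttledRate)); // about 600000 seconds, roughly a week
```

Real hardware typically enforces lockouts rather than a steady rate, so the throttled figure is only meant to show the order-of-magnitude gap.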
-
Comment on Introducing ChatGPT Pro in ~tech
Macil
LLMs can actually be decent at making jokes depending on the prompt. I think the trick is to get it out of writing in the "assistant" voice and to get it to lean into absurd mash-ups. Here are my results of telling Claude 3 (and GPT 4, which wasn't as good) to write tweets in the style of dril, a popular Twitter user: https://bsky.app/profile/macil.tech/post/3kpcvicmirs2v
-
Comment on Introducing ChatGPT Pro in ~tech
Macil
4o is much cheaper and quicker than o1 while being just as good for a lot of tasks. o1 is only better for certain tasks.
On the developer side, there have been a lot of trade-offs in choosing between different LLMs for a while, with a whole spectrum from cheap+quick+dumb models to expensive+slow models. I've been surprised that the consumer applications (ChatGPT, Claude, Gemini) have stuck to a single subscription option for so long instead of offering multiple price points as ChatGPT is doing now.
-
Comment on Introducing ChatGPT Pro in ~tech
Macil (edited)
The $20/month Plus subscription has strict limits on o1 and Advanced Voice Mode usage. Some users have tasks that only these features are able to accomplish. The new $200/month Pro subscription gives you more than 10x those limits (unlimited).
-
Comment on Introducing ChatGPT Pro in ~tech
Macil
The o1 model has special training that 4o doesn't to help it be more productive at making progress in solving problems while it writes text to itself, but your experience matches mine. From the benchmarks it appears there are certain kinds of multi-step problems that o1 uniquely excels at, but for a lot of other stuff it ends up being only as good as 4o while being slower.
-
Comment on Introducing ChatGPT Pro in ~tech
Macil (edited)
Did ChatGPT itself tell you that it can't do something without you paying? That absolutely sounds like a mistake by it that you shouldn't trust. In general it and its free version have only gotten better over time, and to my knowledge it's never been instructed to try to upsell people in conversation. Just start a new conversation and ask it again. (Also consider checking the ChatGPT memories in the settings page to make sure it hasn't recorded a misleading memory about the previous conversation.)
You may have planted the idea that you need to pay more into the conversation and it was too agreeable with that. It is a common issue for ChatGPT to believe something to a fault once it has been said in the conversation.
-
Comment on James Webb Space Telescope finds stunning evidence for alternate theory of gravity in ~space
Macil (edited)
This page doesn't say anything about how MOND could explain the unique gravitational lensing of the Bullet Cluster, which I have believed to be the main reason people think the Bullet Cluster is evidence for dark matter. Instead the post seems to focus solely on galaxy collision speeds and just says that part works with both theories. I don't feel like the page does a good job at justifying the author's confidence in MOND over dark matter.
-
Comment on Why is Google Gemini saying we should die? in ~tech
Macil
It's possible that there's some kind of context missing in the log we're seeing. There was an incident a few months ago where people posted a ChatGPT log (linked from the official website) that seemed to show that ChatGPT initiated the conversation with the user, which is not expected behavior. OpenAI later confirmed that because of a glitch, the logs would sometimes be missing certain messages from the user, and in this case the glitch had dropped the first message in the conversation from the user, making it look like ChatGPT initiated the conversation on its own somehow.
There might be ways for the user to feed text to Gemini which don't show up in this chat log interface. ChatGPT has a "memory" feature where it has a scratchpad of text for itself that's kept between conversations, and it's possible for a user to tell ChatGPT to write a memory to itself telling it to react a certain way to a certain situation. I'm not familiar with the Gemini chat app but it might have some way for the user to smuggle messages to the chatbot outside of the view of the logs. It would be good if the page for the Gemini chat logs was updated to indicate if this might be the case as ChatGPT's does.
It's also possible that no context is missing and that this chat is fully legitimate as it appears. LLMs can be sassy as fuck. Usually I've seen it just when the chatbot is goaded into it by the user but it's possible that the conversation was so unusual that the bot didn't think a normal response fit the context.
It's also possible that there is a secret message encoded by the user within the conversation we see that tells the bot to act like this. I've seen LLMs discover messages hidden in the first word of each sentence or paragraph.
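For anyone who wants to check a transcript for that kind of hidden message, pulling out the first word of each sentence is easy to do mechanically. This is a minimal sketch with a made-up function name, and it splits sentences naively on `.`, `!`, and `?`, so abbreviations like "e.g." would confuse it.

```typescript
// Return the first word of each sentence, using a naive sentence split.
function firstWordsOfSentences(text: string): string[] {
  const sentences: string[] = text.match(/[^.!?]+/g) || [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    // Take the first whitespace-separated token and strip stray punctuation.
    .map((s) => (s.split(/\s+/)[0] || "").replace(/[^A-Za-z0-9'-]/g, ""));
}

// Made-up sample text whose sentence openers spell out a short phrase.
const sample = "Be careful today. Very few people noticed. Rude replies followed! Now what?";
console.log(firstWordsOfSentences(sample).join(" "));
```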
-
Comment on AirPods or not? in ~music
Macil
That's unlucky. I've recently upgraded to the Pixel Buds Pro 2 from the previous Pixel Buds Pro, and they finally fit me really well. The previous ones were a little too heavy and wouldn't keep a good seal in my ears over longer periods of time. The Pixel Buds Pro 2 are the first ones of the line I've considered excellent enough to recommend to others.
-
Comment on Bitwarden switches password manager and SDK to GPL3 after FOSS-iness drama in ~tech
Macil (edited)
It's weirdly common for projects to decide they want to be open source and then make their own license that isn't compatible with anything, causing everyone to steer clear of the source code. I think there are a lot of business leaders who have heard from their employees/news/etc that open source is good, but then don't know that the correct way to do it is to pick an existing popular license that fits what they want instead of telling some lawyers to write something up on their own.
Pretty much everyone wanting to make something open source should just pick between MIT (maybe dual-licensed with Apache 2.0), GPL, AGPL, or maybe BUSL if your main concern is just guaranteeing customers an exit path if you go out of business. Doing anything other than that is just putting in more effort only for a worse result for yourself and everyone.
-
Comment on NRO chief: “You can’t hide” from our new swarm of SpaceX-built spy satellites in ~space
Macil
I wonder if SpaceX and Elon being involved comes with any limitations on how they're used, like how Elon briefly turned off some of Ukraine's Starlink access to protect Russia. It's concerning that Elon had that kind of control with his connections.
-
Comment on <deleted topic> in ~tech
Macil (edited)
Assuming it doesn't work by pure coincidence, we have to call its capability to often work step-by-step to get right answers or generate working code for novel situations something, and it's not like "reasoning" is a term defined so rigorously that it obviously excludes what we see here. The term being used here doesn't inherently mean that there's a rich inner life or inner monologue going on in the model.
-
Comment on A new AI model can hallucinate a game of 1993’s DOOM in real time in ~games
Macil (edited)
It would be really interesting to train it on a weird mix of things and have it hybridize them, like two different first-person games.
I notice the geometry and contents of the space the player is navigating aren't fully consistent. I wonder if just scaling the model up could fix that, or if techniques will be developed to help the model plan and stick to consistent 3D spaces. I could imagine a system like this might benefit from having a part of the model be specialized for Gaussian splatting or another neural-net-based 3D rendering technique.
It would be cool if a system like this could be made so that you could ask it for things in the game in real-time, like you could make a request about the design of the next room you encounter, or have it create a new enemy on the fly. I feel pretty confident this will exist at some point, at least in a very janky form at first, and the only real question is when.
-
Comment on A new AI model can hallucinate a game of 1993’s DOOM in real time in ~games
Macil
The article says it's running on a single TPU, and those seem to have similar specs to a high-end GPU.
-
Comment on Disney seeking dismissal of Raglan Road death lawsuit because victim was Disney+ subscriber in ~news
Macil
I was half thinking about getting Disney+ but honestly I'm not very sold on this "Disney is legally allowed to kill its customers" part of the deal.
He's been saying for years that OpenAI is just about to shut down, that they're going to miserably fail to hit their funding targets, and that AI research is fundamentally petering out and will never surpass its current abilities, which has all been completely wrong repeatedly. [1][2] I think he panders to anti-tech-industry readers who want to hear confirmation that it's all empty hype that's imminently crashing down.