  • All activity
    1. Programming Challenge: Freestyle textual analysis.

      I just realized that I completely glossed over this week's programming challenge. For this week, let's do something more flexible: write a program that accepts a file as input--either through a file name/path or through standard input--and performs any form of analysis you want on its contents. That's it!

      For instance, you could count the occurrences of each word and find the most common ones, or you could determine the average sentence length, or the average unique words per sentence. You could even perform an analysis on changes in words and sentence structure over time (could be useful for e.g. poetry where metre may play an important role). You can stick with simple numbers or dive right into the grittiest forms of textual analysis currently available. You could output raw text or even a graphical representation. You could even do a bit of everything!
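
      To make the first idea concrete, here's one minimal way it might look in Python, counting word occurrences from a file name/path argument or from standard input; the naive whitespace tokenization is just a placeholder choice, not a requirement.

      import sys
      from collections import Counter

      def read_text():
          # File name/path as the first argument, otherwise read standard input.
          if len(sys.argv) > 1:
              with open(sys.argv[1], encoding="utf-8") as f:
                  return f.read()
          return sys.stdin.read()

      # Deliberately naive tokenization: lowercase and split on whitespace.
      words = read_text().lower().split()
      for word, count in Counter(words).most_common(10):
          print(f"{count:6d}  {word}")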

      How simple or complex your solution ends up being is completely up to you, but I encourage you to challenge yourself by e.g. learning a new language or about different textual analysis techniques, or by focusing on code quality rather than complexity, or even by taking a completely different approach to the problem than you ordinarily would. There are a lot of learning opportunities available here.

      11 votes
    2. Thoughts on a ~comp survey of some sort?

      After seeing the "what OS do you use?" thread earlier, I was wondering what everyone here on ~comp would think of a sort of group demographics survey. I think that it would be super interesting to see the data on things like preferred OS, main programming language, preferred text editor/IDE, device OEM, etc.

      14 votes
    3. Programming Challenge - Let's build some AI!

      Hi everyone! In this challenge, we will build a simple genetic algorithm.

      The goal is to create a genetic algorithm that will learn and output a predefined text ("Hello World!").

      The goal can be achieved in any language, and you'll need just simple loops, collections, and some knowledge of how to create and use objects, so even beginners can try to complete this challenge.

      How?

      I'll try to explain it as best as I can. Genetic algorithms are approximation algorithms - they often do not find the best solution, but they can find very good solutions, fast. They're used when traditional algorithms are either way too slow or don't exist at all - for example, to design antennas or wind turbines. We will use one to write "Hello World".

      First of all, we define our Entity. It is a solution to the given problem; it can be a list of integers describing an antenna shape, a decision tree, or a string ("Hello World"). Each entity contains the solution (string solution) and a fitness function. The fitness function says how good our entity is. Our fitness function will return how similar the entity's solution text is to the "Hello World" string.

      But how will the program work? First of all, we will create a list of entities, List<Entity>. We will make, for example, 1000 entities (randomly generated). Their Entity.solution will be a randomized string of length 11 (because "Hello World" is 11 characters long).

      Once we have these entities, we will repeat the following steps until the best entity has fitness == 1.0, or 100% similarity to the target string.

      First of all, we compute the fitness of all entities. Then, we create a new, empty list that will hold the next 1000 entities. Now, 1000 times, we pick two entities (probably weighted based on their fitness) and combine their strings. We use the combined string to create a new entity and add it to the new list of entities.

      Now, we delete the old entities and replace them with the entities we just made.

      The last step is mutation - what if no entity has the "W" character? We would never get our "Hello World". So we go through every entity and change 5% (or whatever number you want) of the characters in its solution to random characters.

      We let it run for a while - and it is done!

      So to sum up what we did:

      entities <- 1000 random entities
      while entities.best.fitness < 1:
        for every entity: compute fitness
        newEntities <- empty list
        1000-times:
          choose two entities from "entities", weighted by their fitness
          combine the solutions of these entities to make newEntity
          newEntities.add(newEntity)
        for every newEntity: mutate // Randomly change parts of their strings
        entities <- newEntities     // Replace the old generation with the new one

      print(entities.best.solution) // Hello World!
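
      Here's a minimal sketch of that loop in Python. The target, population size of 1000, and 5% mutation rate follow the description above; the uniform crossover and fitness-weighted selection are just one reasonable choice among many, not the only way to do it.

      import random
      import string

      TARGET = "Hello World!"
      POP_SIZE = 1000
      MUTATION_RATE = 0.05
      CHARS = string.ascii_letters + string.digits + string.punctuation + " "

      def fitness(solution):
          # Fraction of characters that match the target; 1.0 means identical.
          return sum(a == b for a, b in zip(solution, TARGET)) / len(TARGET)

      def random_solution():
          return "".join(random.choice(CHARS) for _ in range(len(TARGET)))

      def crossover(a, b):
          # For each position, take the character from one parent or the other.
          return "".join(random.choice(pair) for pair in zip(a, b))

      def mutate(solution):
          # Replace roughly MUTATION_RATE of the characters with random ones.
          return "".join(random.choice(CHARS) if random.random() < MUTATION_RATE else c
                         for c in solution)

      entities = [random_solution() for _ in range(POP_SIZE)]
      while max(fitness(e) for e in entities) < 1.0:
          # Weight parents by fitness (small epsilon so weights are never all zero).
          weights = [fitness(e) + 1e-6 for e in entities]
          new_entities = []
          for _ in range(POP_SIZE):
              a, b = random.choices(entities, weights=weights, k=2)
              new_entities.append(mutate(crossover(a, b)))
          entities = new_entities  # replace the old generation with the new one

      print(max(entities, key=fitness))  # Hello World!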
      

      Now go and create the best, fastest, and most pointless genetic algorithm we've ever seen!

      23 votes
    4. File sharing over a network

      My friend and I arrive at an arbitrary place, and we have access to a network from there. Now, we want to share a file, and the network connection is all we have. The challenge: make the file go from my device to my friend's device in a pure p2p setting. If you know for sure that incoming connections are allowed, this is very simple, but here I want to explore which solutions exist that do not assume this.

      Assumptions:

      • Same network, although possibly different access points (one might be wired and the other wireless)
      • We have no prior knowledge about the network; incoming traffic might be blocked (outgoing definitely isn't)
      • No extra machines can aid in the transaction (no hole punching etc)
      • Should work reliably for any kind of device that you have free -- as in freedom -- control over; that is, PCs, Android phones/tablets, and Macs. Most of Apple's other hardware can be excluded because it doesn't allow for anything anyway.
      • Hard mode: we are both digitally illiterate

      Goal:

      • Send a file, p2p, from one party to another.

      My friend (PhD CS) and I (MSc CS) tried to do this last week, and it appears to be among the hardest problems in CS. I would like to discuss this and hear which solutions you might have for this problem.
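
      For reference, the "incoming connections are allowed" baseline mentioned at the top really is simple; here's a minimal sketch in Python (host, port, chunk size, and file paths are arbitrary placeholders). It is, of course, exactly the approach that fails when inbound traffic is blocked.

      import socket

      CHUNK = 64 * 1024

      def receive(port=9000, out_path="received.bin"):
          # Listen on all interfaces and write whatever the peer sends to a file.
          with socket.create_server(("", port)) as server:
              conn, _addr = server.accept()
              with conn, open(out_path, "wb") as f:
                  while chunk := conn.recv(CHUNK):
                      f.write(chunk)

      def send(host, path, port=9000):
          # Connect to the listening peer and stream the file in chunks.
          with socket.create_connection((host, port)) as conn, open(path, "rb") as f:
              while chunk := f.read(CHUNK):
                  conn.sendall(chunk)

      # Usage: run receive() on one device, then send("192.0.2.10", "file.bin")
      # from the other, pointing at the receiver's LAN address.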

      Edits:

      1. This is not an assignment.
      2. Added some specifics to the assumption set.
      3. We're looking for practical solutions here.
      4. More specs.
      10 votes
    5. Have any of you set up GPU passthrough for a virtual machine?

      Right now I dual boot Windows 10 and Fedora: Windows for gaming, Fedora for everything else. I'm considering running Linux as my only native operating system, and running Windows in a virtual machine for gaming. This will be more convenient than restarting my PC every time I want to play a game, and I'll feel better about having Windows sandboxed in a VM than running natively on my computer.

      To get gaming performance out of a virtual machine, I'm planning to have two GPUs: one for Linux to use, and one reserved exclusively for the virtual machine.

      Have any of you set up a computer like this before? What was your experience like? How was the performance?

      16 votes
    6. Programming Challenge: Construct and traverse a binary tree.

      It's that time of week again! For this week's programming challenge, I thought I would focus on data structures.

      Goal: Construct a binary tree data structure. It may handle any data type you choose, but you must handle sorting correctly. You must also provide a print function that prints each value in the tree on a new line, and the values must be printed in strictly increasing order.

      If you're unfamiliar with this data structure, it's a structure that starts with a single "root" node, and this root can have a left child, right child, both, or neither. Each of these child nodes is exactly the same as the root node, except the root has no parent. This branching structure is why it's called a tree. Furthermore, left descendants always have values smaller than the parent, and right descendants always have larger values.
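
      For anyone who wants a starting point, here's a minimal sketch in Python, assuming integer values and ignoring duplicates (which the challenge doesn't specify) so the printed output stays strictly increasing.

      class Node:
          def __init__(self, value):
              self.value = value
              self.left = None
              self.right = None

      class BinaryTree:
          def __init__(self):
              self.root = None

          def insert(self, value):
              # Walk down from the root: smaller values go left, larger go right.
              if self.root is None:
                  self.root = Node(value)
                  return
              node = self.root
              while True:
                  if value < node.value:
                      if node.left is None:
                          node.left = Node(value)
                          return
                      node = node.left
                  elif value > node.value:
                      if node.right is None:
                          node.right = Node(value)
                          return
                      node = node.right
                  else:
                      return  # ignore duplicates so printing stays strictly increasing

          def print_sorted(self):
              # In-order traversal prints the values in increasing order.
              def walk(node):
                  if node is not None:
                      walk(node.left)
                      print(node.value)
                      walk(node.right)
              walk(self.root)

      tree = BinaryTree()
      for v in [5, 3, 8, 1, 4]:
          tree.insert(v)
      tree.print_sorted()  # 1, 3, 4, 5, 8, each on its own line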

      12 votes
    7. Full blown SSH servers within Docker containers?

      Trying to get a sense of how the networking would go down.

      If I had one public IP address and, say, 4 Docker containers on the host, how would the SSH connections work? Would I have to reserve ports for each container?

      7 votes
    8. Favorite data visualization toolset?

      I'm primarily a non-programmer these days, but have a fairly extensive background in statistical analysis - seeking recommendations for the best/cheapest/easiest-to-learn data visualization tools. I have access to PowerBI and Tableau through work, but any other recommendations are welcome. You can take SQL-family relational database query skills for granted, but not necessarily NoSQL, Hadoop, or other popular big data sources.

      9 votes
    9. Public access Unix systems, another alternative social environment

      I have been writing a paper on the history of a type of online social space called public access Unix systems, and I'm posting a Tildes-tailored summary here in case anyone is interested. If you enjoy this and want to read more (like 10+ pages more) look at the bottom of this post for a link to the main paper-- it has citations, quotes, and everything, just like a real pseudo-academic paper!

      I wrote this because a summary didn't exist and writing it was a way for me to learn about the history. It was not written with the intent of commercial publication, but I'd still love to share it around and get more feedback, especially if that would help me further develop the description of this history and these ideas. If you have any thoughts about this, please let me know.

      What are Public Access Unix Systems?

      When the general public thinks of the Unix operating system (if it does at all), it probably isn't thinking about a social club. But at its core, Unix has a social architecture, and there is a surprisingly large subculture of people who have been using Unix and Unix-like operating systems this way for a long time.

      Public access Unix systems are multi-user systems that provide shell accounts to the general public for free or low cost. The shell account typically provides users with an email account, text-based web browsers, file storage space, a directory for hosting website files, software compilers and interpreters, and a number of tools for socializing with others on the system. The social tools include the well-known IRC (Internet Relay Chat), various flavors of bulletin-board systems, often a number of homegrown communication tools, and a set of classic Unix commands for finding information about or communicating with other system users.

      But more than mere shell providers, public access Unix systems have always had a focus on the social community of users that develops within them. Some current systems have been online for several decades, and many users have developed long-standing friendships and even business partnerships through them. In other words, they're a lot of fun, and useful too.

      Of interest to Tildes members is that public access Unix systems have for the most part been non-commercial. Some take donations or charge membership fees for certain tiers of access (some in the U.S. are registered 501(c)(3) or 501(c)(7) nonprofits). They almost invariably do not take advertising revenue, do not sell user profile data, and the user bases within them maintain a fairly strong culture of concern about the state of the modern commercial Internet.

      This concept of a non-commercial, socially aware, creative space is what really got me interested in the history of these systems. Further, the fact that you have this socially aware, technically competent group of people using and maintaining a medium of electronic communication seems particularly important in the midst of the current corporate takeover of Internet media.

      History

      Public access Unix systems have been around since the early 1980's, back when most of the general public did not have home computers, before there was a commercial Internet, and long before the World Wide Web. Users of the early systems dialed in directly to a Unix server using a modem, and simultaneous user connections were limited by the number of modems a system had. If a system had just one modem, you might have to dial in repeatedly until the previous user logged off and the line opened up.

      These early systems were mostly used for bulletin-board functionality, in which users interacted with each other by leaving and reading text messages on the system. During this same time in the early 80's, other dial-in systems existed that were more definitively labeled "BBSes". Their history has been thoroughly documented in film (The BBS Documentary by Jason Scott) and in a great Wikipedia article. These other systems (pure BBSes) did not run the Unix OS and many advanced computer hobbyists turned up their noses at what they saw as toyish alternatives to the Unix OS.

      Access to early dial-in public access Unix systems was mostly constrained by prohibitively expensive long-distance phone charges, so the user bases drew from local calling areas. The consequence was that people might meet each other online, but there was a chance they could end up meeting in person too because they might literally be living just down the street from each other.

      The first two public access Unix systems were M-Net (in Ann Arbor, MI) and Chinet (in Chicago, IL), both started in 1982. By the late 1980's, there were more than 70 such systems online. And at their peak in the early 1990's, a list of public access Unix systems shared on Usenet contained well over 100 entries.

      Throughout the 1980's, modem speeds and computer power increased rapidly, and so did the functionality and number of users on these systems. But the 1990's were a time of major change for public access Unix systems. In 1991, the Linux operating system was first released, ushering in a new era of hobbyist system admins and programmers. And new commercial services like AOL, Prodigy and CompuServe brought hordes of new people online.

      The massive influx of new people online had two big impacts on public access Unix systems. For one, as access became easier, online time became less precious and people were less careful and thoughtful about their behavior online. Many still describe their disappointment with this period and their memory of the time when thoughtful and interesting interactions on public access Unix systems degraded to LOLCAT memes. In Usenet (newsgroups) history, the analogous impact is what is referred to as "The Eternal September".

      The second impact of this period was from the massive increase of computer hobbyists online. Within this group were a small but high-impact number of "script kiddies" and blackhat hackers that abused the openness of public access Unix systems for their own purposes (e.g. sending spam, hacking other systems, sharing illegal files). Because of this type of behavior, many public access Unix systems had to lock down previously open services, including outbound network connections and even email in some cases.

      For the next decade or so, public access Unix systems continued to evolve with the times, but usership leveled off or even decreased. The few systems that remained seemed to gain a particular sense of self-awareness in response to the growing cacophony and questionable ethics of the commercial World Wide Web. This awareness and sense of identity continues to this day, and I'll describe it more below because I think it is really important, and I expect Tildes members agree.

      2014 and Beyond

      In 2014, Paul Ford casually initiated a new phase in the history of public access Unix systems. He registered a URL for tilde.club (http://tilde.club) and pointed it at a relatively unmodified Linux server. (Note: if there is any relation between tilde.club and Tildes.net, I don't know about it.) After announcing via Twitter that anyone could sign up for a free shell account, Ford rapidly saw hundreds of new users sign up. Somehow this idea had caught the interest of a new generation. The system became really active and the model of offering a relatively unmodified *NIX server for public use (a public access Unix system under a different name) became a "thing".

      Tilde.club inspired many others to open similar systems, including tilde.town, tilde.team* and others which are still active and growing today. The ecosystem of these systems is sometimes called the tilde.verse. These systems maintain the same wariness of the commercial WWW that other public access Unix systems do, but they also have a much more active focus on building a "radically inclusive" and highly interactive community revolving around learning and teaching Unix and programming. These communities are much, much smaller than even small commercial social networks, but that is probably part of their charm. (* full disclosure, I wield sudo on tilde.team.)

      These tilde.boxes aren't the only public access Unix systems online today though. Many others have started up in the past several years, and others have carried on from older roots. One of the most well-known systems alive today is the Super Dimension Fortress (SDF.org), which has been going strong for over three decades. Grex.org and Nyx.net have been online for nearly as long too. And Devio.us is another great system, with a community focused around the Unix OS, particularly OpenBSD. Not all these systems label themselves as "public access Unix systems", but they all share the same fundamental spirit.

      One system that I find particularly interesting is Hashbang (aka #!, https://hashbang.sh). Hashbang is a Debian server run and used by a number of IT professionals who are dedicated to the concept of an online hackerspace and training ground for sysadmins. The system itself is undergoing continual development, managed in a git repository, and users can interact to learn everything from basic shell scripting to devops automation tooling.

      Why is Hashbang so cool? Because it is a community-oriented system in which users can gain proficiency in the infrastructural skills that can keep electronic communications in the hands of the people. When you use Facebook, you don't learn how to run a Facebook. But when you use Hashbang (and by "use", I mean pour blood, sweat and tears into learning through doing), you can learn the skills to run your own system.

      Societal role

      If you've read other things I've written, or if you've interacted with me online, then you know that I feel corporate control of media is a huge, huge concern (like Herman and Chomsky type concern). It's one of the reasons I think Tildes.net is so special. Public access Unix systems are valuable here too because they are focused on person-to-person connections that are not mediated by a corporate-owned infrastructure, and they are typically non-profit organizations that do not track and sell user data.

      You're no doubt aware of the recent repeal of Net Neutrality laws in the U.S., and you're probably aware of what The Economist magazine calls "BAADD" tech companies (big, anti-competitive, addictive and destructive to democracy). One of the most important concerns underlying all of this is that corporations are increasingly in control of our news media and other means of communication. They have little incentive to provide us with important and unbiased information. Instead, they have incentive to dazzle us with vapid clickbait so that we can be corralled past advertisements.

      Public access Unix systems are not the solution to this problem, but they can be part of a broader solution. These systems are populated by independently minded users who are skeptical of the corporate mainstream media, and importantly, they teach about and control the medium of communication and social interaction itself.

      Unix as a social medium

      So what is it that makes public access Unix systems different? This seems like a particularly interesting question relative to Tildes (so interesting that I even wrote another Tildes post about it). My argument is partly that Unix itself is a social and communication medium and that the structure of this medium filters out low-effort participation. In addition to this, public access Unix systems tend to have user bases with a common sense of purpose (Unix and programming), so users can expect to find others with shared interests.

      In contrast to modern social media sites like Facebook or Twitter, you have to put in some effort to use Unix. You have to learn to connect, typically over ssh; you have to learn to navigate a command line shell; and you have to learn the commands and options to run various utilities. And to really use Unix, you have to learn a bit of programming. It's not incredibly hard in the end, but it takes significantly more effort than registering for a Facebook or Twitter account and permitting them to scan your email address book. Once you get over the learning curve, it is powerful and fun.

      This effortful medium does two things. For one, it weeds out people who aren't willing to put in effort. And for two, it provides learned users with a diverse palette of tools and utilities for building and sharing creative output.

      Public access Unix systems are all about active creation of content to be enjoyed and shared with others, and not about passive media consumption. They are about the community that develops around this purpose and not around the profit that can be squeezed out of users' attention.

      Future of public access Unix systems

      Public access Unix systems have been around for nearly four decades now. They have seen ups and downs in popularity, and they have been humming along in the background as computing has gone from the ARPANET to the spectacle of the commercial World Wide Web. Early public access Unix systems were largely about the novelty of socializing with other hobbyists through a computer, and that interest has evolved into the learning, doing and teaching model of an online hackerspace today.

      These systems are not huge, they are not coasting on advertising revenue, and they get by purely on the contributions, volunteer effort, and enthusiastic participation of their users. But as a contrast to commercial social network sites, they are an example of what online socializing can be when individuals put effort, thought, and compassion into their interactions with others. And just as importantly, they pass on the very skills that can independently maintain this social and communication medium for future generations of users.

      --

      As promised in the intro, if you're interested in reading a much more in-depth version of this article, here's the longer copy:
      https://cmccabe.sdf.org/files/pubax_unix_v01.pdf

      73 votes
    10. Programming Challenge: Markov Chain Text Generator

      Markov Chains are a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. By analyzing a document in some way and producing such a model, it's possible to use the model to generate sentences.

      For example, let’s consider this quote:

      Be who you are and say what you feel, because those who mind don't matter, and those who matter don't mind.

      Let’s start with a seed of be, which there is only one of in this text and it’s following word is who. Thus, a 100% chance of the next state being who. From who, there are several next states: you, mind, and matter. Since there are 3 options to choose from, the next state has a 1/3 probability of each. It’s important that if there were for example two instances of who you then you would have a 2/4 probability of next state. Generate a random number and choose the next state, perhaps mind and continue until reaching a full stop. The string of states we reached is then printed and we have a complete sentence (albeit almost certainly gibberish).

      Note: if we were in the state mind, our next two options would be . or don't, and if we hit . we would end the generation (or not; it's up to you how you handle this!).

      To take it a step further, you could also choose how many words make up a state. For example, with two words instead of one, the state those who has two possible next states: who matter or who mind. By using longer strings of words for our states we can get more natural text, but we will need much more source material to get unique sentences.
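
      Here's a minimal sketch in Python of the single-word-state version described above. The whitespace tokenization (punctuation stays attached to words), the uniform choice among stored followers (duplicates are kept, which reproduces the 1/3 vs. 2/4 weighting), and the full-stop check are simplistic placeholder choices, not requirements.

      import random
      from collections import defaultdict

      def build_chain(text):
          # Map each word to the list of words that follow it in the source text.
          chain = defaultdict(list)
          words = text.split()
          for current, nxt in zip(words, words[1:]):
              chain[current].append(nxt)
          return chain

      def generate(chain, seed, max_words=50):
          word = seed
          sentence = [word]
          # Walk the chain until we hit a full stop, a dead end, or the length cap.
          while not word.endswith(".") and len(sentence) < max_words:
              followers = chain.get(word)
              if not followers:
                  break
              word = random.choice(followers)
              sentence.append(word)
          return " ".join(sentence)

      quote = ("Be who you are and say what you feel, because those who mind "
               "don't matter, and those who matter don't mind.")
      chain = build_chain(quote)
      print(generate(chain, "Be"))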

      This programming challenge is for you to create a Markov chain text generator in your language of choice. The input is a source document of anything you like (fun things include your favourite book, a famous person's tweets, or datasets of reddit / tildes comments), and possibly a seed. The output is a sentence generated using the Markov chain.

      Bonus points for:

      • Try it a bunch of times on different sources and tell us the best generated sentences
      • Using longer strings of words for the state, or even having it be variable based on input
      • Not requiring a seed as an input, instead implementing that into your Markov Chain (careful as infinite loops can occur without considering the seed)
      • Implement saving the Markov Chain itself, as it can take very long to generate with huge documents
      • Particularly fast, efficient, short, or unique methods

      Good luck!

      P.S. A great place to find many large plain text documents for you to play with is Project Gutenberg.

      17 votes