
Let's add some metadata on arXiv!

7 min read

This article contains ideas and explanations around this code, which I will refer to throughout the article.

Disclaimer: the above code is a proof of concept, here to back this article with something concrete. It is clearly neither designed nor scalable enough to run in production. However, the reference_fetcher part gave good results on the arXiv papers I tested it on.

Nowadays, most published scientific papers are available online, either directly on the publisher's website or as preprints on Open access repositories. Depending on the research topic, a large part of the physics and computer science literature is available on arXiv.org, a major, worldwide Open access repository managed by Cornell. All published papers get a unique, global identifier called a DOI, which can be used to identify them and link to them. For instance, if you go to https://dx.doi.org/10.1103%2FPhysRevB.47.7312, you are automatically redirected to the Physical Review B website, on the page of the paper with DOI 10.1103/PhysRevB.47.7312. This is really useful to point at a paper and identify it uniquely, in a machine-readable and durable way. However, this system seems to be used very little beyond basic linking. This is why I had the idea of putting some extra metadata on published papers, building on such identifiers.

From now on, I will mainly focus on arXiv, for two main reasons. First, it is Open access, so it is accessible everywhere (not depending on the subscriptions of a particular institution) and reusable. Second, arXiv provides the sources of most papers, which is of great interest as we will see below. arXiv gives a unique identifier to each preprint. The correspondence between DOIs and arXiv identifiers can be established quite easily, as some publishers push DOIs back to arXiv upon publication, and authors manually update the field on arXiv for the remaining publishers.
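
For instance, the arXiv API exposes the DOI (when it has been filled in) in its Atom response. Here is a minimal sketch of such a lookup, using the public export.arxiv.org endpoint (error handling kept to a minimum):

import xml.etree.ElementTree as ET

import requests


def doi_for_arxiv_id(arxiv_id):
    """Return the DOI associated with an arXiv id, if any, using the arXiv API."""
    r = requests.get("http://export.arxiv.org/api/query",
                     params={"id_list": arxiv_id})
    r.raise_for_status()
    ns = {"arxiv": "http://arxiv.org/schemas/atom"}
    doi = ET.fromstring(r.text).find(".//arxiv:doi", ns)
    return doi.text if doi is not None else None


print(doi_for_arxiv_id("1506.06690"))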

Using services such as Crossref or the publisher's website, it is really easy to get a formatted bibliography entry (plain text, BibTeX, …) from a given identifier (e.g. see some code for DOI or arXiv id to BibTeX output). Then, writing a bibliography should be as easy as keeping track of a list of identifiers!
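
As an illustration, DOI content negotiation already gives you a BibTeX entry in a few lines of Python (a minimal sketch; doi.org honours the Accept: application/x-bibtex header):

import requests


def bibtex_from_doi(doi):
    """Fetch a BibTeX entry for a DOI through doi.org content negotiation."""
    r = requests.get("https://doi.org/" + doi,
                     headers={"Accept": "application/x-bibtex"})
    r.raise_for_status()
    return r.text


print(bibtex_from_doi("10.1103/PhysRevB.47.7312"))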

Let's make a graph of citations!

In scientific papers, references are usually given as a plain-text list of cited papers at the end of the article. This list follows some rules and formats, but there is a wide variety of formats, and it is often really difficult to parse them automatically (see http://arxiv.org/abs/1506.06690 for an example of reference formatting).

If you want to fetch the references of a given paper automatically (to download them in batch, for instance), you basically have to parse a PDF file, find the references section, and parse each textual item, which is really difficult and error-prone. Some repositories, such as arXiv, offer the sources of the published preprints. In this case, one can deal with a LaTeX-formatted bibliography (a thebibliography environment, not full BibTeX though), which is a bit better, but still a pain to deal with. When referencing an article, nobody uses DOIs!

The first idea is then to try to automatically fetch references for arXiv preprints and mark them as relationships between articles.

Fortunately, arXiv provides bbl source files (LaTeX-formatted bibliographies) for most articles. We can then avoid parsing a PDF file and directly get somewhat structured text, but the bibliography entries are still plain text, without any machine-readable identifier. Here comes Crossref, which offers a wonderful API to try to resolve a plain-text citation into a DOI (see http://labs.crossref.org/resolving-citations-we-dont-need-no-stinkin-parser/). And it gives surprisingly good results!
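
Here is a minimal sketch of such a lookup with Crossref's current REST API (the query.bibliographic parameter accepts a free-form citation; the article linked above describes an older endpoint, but the idea is the same):

import requests


def doi_from_plaintext(citation):
    """Ask Crossref for the best matching DOI for a plain-text reference."""
    r = requests.get("https://api.crossref.org/works",
                     params={"query.bibliographic": citation, "rows": 1})
    r.raise_for_status()
    items = r.json()["message"]["items"]
    return items[0]["DOI"] if items else None


print(doi_from_plaintext("A. Einstein, B. Podolsky and N. Rosen, Phys. Rev. 47, 777 (1935)"))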

This automatic fetching of DOIs for the references of a given arXiv paper is available in this code.

Then, one can write a small API accepting POST requests to add papers to a database, fetch the referenced papers, and record the relationships between them. This is how https://github.com/Phyks/arxiv_metadata began.

If you post a paper to it, identified either by its DOI (provided a valid associated arXiv id is found) or directly by its arXiv id, it will add it to the database, resolve its references and record relationships in the database between this paper and the referenced papers. One can then simply query the graph of "citations", in direct or reverse order, to get all the papers cited by a given one, or citing a given one.
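
Interacting with such an API could then look like the sketch below. The route names (/papers, /papers/<id>/citations, /papers/<id>/reverse_citations) are hypothetical and only here for illustration; check the repository's README for the actual endpoints.

import requests

API = "http://localhost:8080"  # wherever your arxiv_metadata instance runs

# Add a paper, identified by its arXiv id (hypothetical route).
paper = requests.post(API + "/papers", json={"arxiv_id": "1506.06690"}).json()

# Query the citation graph in both directions (hypothetical routes).
cited = requests.get(API + "/papers/%s/citations" % paper["id"]).json()
citing = requests.get(API + "/papers/%s/reverse_citations" % paper["id"]).json()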

The only similar service I know of on the web is the one provided by SAO/NASA ADS. See for instance how it deals with the introductory paper. It is quite fantastic at giving both the papers citing this one and those cited by it, in a browsable form, but its core is not open source (or I did not find it), and I have no idea how it works in the background. There is no easily accessible API, and it only works in some very specific fields (typically physics).

Let's add even more relations!

Now that we have a base API to add papers and relationships between them to a database, we can imagine going one step further and marking any kind of relation between papers.

For instance, one may find that a given paper would be a relevant reference for another paper that does not cite it. We could then work collaboratively to put extra metadata on scientific papers, such as extra references, which would be useful to everyone.

Such relationships could also be similar_to, introductory_course, etc. This is quite limitless, and the above code can already handle it. :)

Let's go one step further and add tags!

So, by now, we can have uniquely identified papers, with any kind of relationship between them, which we can crowdsource. Let's take some time to look at how arXiv stores papers.

They classify them by "general categories" (e.g. cond-mat, a (very) large category called "Condensed Matter") and subcategories (e.g. cond-mat.quant-gas for "Quantum gases" under "Condensed Matter"). An RSS feed is offered for each of these categories, and researchers usually follow the subcategories of their research area to keep up to date with published articles.
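
As a side note, following such a feed programmatically is straightforward; here is a sketch with the feedparser library, assuming the export.arxiv.org/rss/<category> URL pattern:

import feedparser  # third-party package: pip install feedparser

# Feed URL pattern assumed from arXiv's export mirror.
feed = feedparser.parse("http://export.arxiv.org/rss/cond-mat.quant-gas")
for entry in feed.entries:
    print(entry.title, entry.link)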

Although some articles are released under multiple categories, most of them only have one, very often because they do not fit anywhere else, but sometimes because the author did not think the article could be relevant to another field. Plus, some researchers work at the edge of two fields, and following everything published in both of them is a very time-consuming task.

The next step is then to tag articles collaboratively. Tags could be as targeted or as general as we want, and everyone could follow the tags they want. Plus, doing it collaboratively allows someone who finds an article interesting for their field, even if it was not the author's field, to make it appear in the feeds of their colleagues.

Conclusion

We finally have the tools to mark relations between papers, to annotate them, complete them and tag them, all of this collaboratively. With DOIs and similar unique identifiers, we can get rid of painful plain-text citations and references and use easily machine-manageable identifiers, while still getting nicely rendered BibTeX citations automagically.

People are already doing this kind of thing for webpages (identified by their URL) with Reddit, Hacker News and so on. Let's do the same for scientific papers! :)

A demo instance should be available at http://arxiv.phyks.me/. It may not be very stable or highly available, though. Note that the Content-Type is that of a JSON API, so your browser may force you to download the response rather than display it. The easiest way to browse it is to use cURL, as described in the README.


 

Velib dataset

1 min read

Just a quick note to say that I am running a script to periodically dump the data available from the Velib API (every 2 minutes).

The dump can be found here (sqlite3 database). It is generated by this script.

Please host your own dump if you plan on making many queries against the previous URL.

 

Doing low cost telepresence (for under $200)

8 min read

With a friend, we recently started building a low-cost telepresence robot (sorry, link in French only) at our local hackerspace.

The goal is to build a robot that could be used to move around a room remotely, and stream audio and video in both directions. Our target budget is $200. We got a first working version (although it does not yet stream audio), and it is time for some explanations on the setup and how to build your own =) All the instructions, code and necessary stuff can be found at our git repo.

Screen capture


3D model

Basic idea

When taking part in a group meeting remotely, using some videoconference solution, it is often frustrating not to be able to move around the room on the other side. This prevents parallel discussions, and if the remote microphone is of poor quality, we often do not hear everybody clearly. Plus, someone speaking may be hidden by another speaker, and many other such problems arise.

The goal was then to find a solution to do videoconferences (streaming both audio and video in both directions) while being able to move on the other side, so as to see everyone and to come closer to the current speaker. Commercial solutions exist, but they are really expensive (a few thousand dollars). We wanted the same basic features for $200, and it seems we almost achieved it!

Bill of Materials

The whole system is built around a Raspberry Pi and a PiCamera, which offer decent performance at a very fair price. The rest is really basic DIY stuff.

Here is the complete bill of materials:

Total: $140

Notes:

  • We had to use a Raspberry Pi 2 for the nice performance boost on this model. Even more important is the increased number of GPIOs, with 2 usable hardware PWMs (provided that you don't use the integrated sound card output). This is useful to control the two wheels with hardware PWM and get precise control of the movement. The camera holder can be safely controlled with a software PWM, and we did not experience any trouble doing so.
  • You can easily replace these parts with equivalent ones, as long as you keep in mind that the battery pack should be able to provide enough current for the Raspberry Pi and the servos. We used standard USB battery packs for simplicity and user-friendliness. However, they are more expensive than standard modelling lithium batteries and generally provide less current.
  • We had to use two battery packs. Indeed, the peak current drawn when the servos start was too much for the battery pack and was crashing the Raspberry Pi. Using two separate power lines for the Raspberry Pi and the servos, we no longer have this problem, and this solution is easier than tweaking the power line until the Raspberry Pi stops freezing (which it may never do).

For the next version, we plan to add:

Total with these parts: $228

Notes:

  • We used an HDMI screen, as the official Raspberry Pi screen uses most of the GPIO pins, which we need. We decided to use a Bluetooth speaker because the integrated sound card was not usable, since we were using the two hardware PWM lines for motion. This way, we have a speaker with a built-in microphone, which is smaller than having the two of them separately.
  • The USB Bluetooth adapter is impressively expensive, but it is the only one we found at the moment that we were sure would be compatible with Linux without any problem. Plus, the other adapters we found were not much cheaper.
  • The total budget is $223 without shipping. It is a bit over the initial budget goal, but we can easily lower it to $200. Indeed, we did not especially look for the cheapest parts. In particular, we bought the servos from Adafruit and I think we can find some for less (especially the camera holder servo, for which a $5 micro servo should be enough). The Bluetooth adapter is quite expensive as well and we could probably find a cheaper one. Shrinking the budget will be our next goal, once we have everything working.

Building the robot

All the necessary stuff is in our git repo (or its github mirror, both should be kept in sync). The repo contains three main directories:

  • blueprints, which are the models of the robot.
  • disty, which is the main server code on the Raspberry Pi.
  • webview, which is the web controller served by the Raspberry Pi.

First of all, you should laser cut the flat parts and 3D print the 3D parts in the blueprints dir. The eps files in this directory are ready-to-cut files, whereas the svg files are the same ones in an easily editable format. You should laser cut the top and bottom files.

You should 3D print:

  • the picam_case_* files for the camera case (licensed under CC BY SA).
  • camera_servo_holder.stl, the plastic part to hold the camera servo. You need to print it once.
  • wheel_servo_holder.stl, the plastic part to hold the servos for the wheels. You need four of them.

teleprez.blend is the complete CAD model of the robot in Blender.

Assembling your Disty robot should be straightforward if you look at the following pictures :) Use two ball transfer units to stabilize the robot and lock them with some rubber bands (or anything better than that). Tightly adjust the height of the wheels so that the two wheels and the ball transfer units all touch the ground.

Disty

Disty

Disty

GPIO pinout for the connection can be found at https://raw.githubusercontent.com/hackEns/Disty/master/blueprints/gpio.png.

GPIO pinout

For the electrical wiring, we used a standard USB to Micro-USB cable to power the Raspberry Pi from one battery (located below the robot, to add weight on the ball transfer units and ensure contact is made with the surface). On the other battery, we just cut a USB to Micro-USB cable to plug into it and connected the servos directly to the battery through a piece of breadboard. We had to use two batteries to prevent the current draw from the servos from rebooting the Raspberry Pi.

Here you are, you have a working Disty!

Running it

This may not be super user-friendly at the moment; we hope to improve it in the future.

Download any Linux image you want for your Raspberry Pi. Install uv4l and the uv4l-webrtc component. Enable the camera and make sure you can take pictures from the command line (there is a lot of documentation about this on the web).

Then, clone the Git repo somewhere on your Raspberry Pi. You should build the main disty code (the server-side code). This code handles the control of the servos (emitting PWM signals, etc.) and listens on UDP port 4242 for instructions sent from the webview. Instructions to build it are located in the associated README. You will need cmake and a system-wide install of wiringPi to build the code.

You can then start the robot. Start by launching the disty program (as root, since you need access to the GPIOs), ./disty, and then start the webview, ./run.py, also as root since it serves the webview on port 80, a privileged port (below 1024). If you have ZeroConf on your Raspberry Pi (or a decent router), you can go to http://disty (or whatever hostname is set on your Raspberry Pi) to get the webview. Otherwise, use the IP address instead. Webview usage should be almost straightforward.

It should work out of the box on your local LAN. If you are behind a NAT, some black magic (which is implemented but may not be sufficient) is needed to connect the remote user and the Disty camera. In any case, the remote side needs to be able to access the webview (Disty's port 80).

Contributing!

All contributions and feedback are more than welcome!

All the source code we wrote is under a beer-ware license, unless otherwise specified.

* --------------------------------------------------------------------------------
* "THE BEER-WARE LICENSE" (Revision 42):
* Phyks and Élie wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff (and you can also do whatever you want
* with this stuff without retaining it, but that's not cool...). If we meet some
* day, and you think this stuff is worth it, you can buy us a beer
* in return.
*                                                                       hackEns
* ---------------------------------------------------------------------------------

If you need a more legally valid license, you can consider Disty to be under an MIT license.

Some sources of inspiration and documentation

 

Working on the go in Paris

2 min read

This summer I was in NYC and found a couple of nice coffee shops with good coffee and internet access so that they were suitable for work. Coming back to Paris, I started looking for similar places. This post will be updated in the future with the new places I find.

  • Coutume Instituutti is a really cool coffee shop in the 5th, near Saint-Michel. Good coffee, the place is very cool and I always found a decent place to sit and work. Note that there are some power plugs, but they are far from the seats and tables.
  • Anticafé is a chain of really cool coffee shops in Paris (near le Louvre, near Beaubourg and near Bibliothèque François Mitterrand). The fun part is that you pay for the time you stay (5€ per hour, decreasing) and everything (food and drinks) is then free. Really nice to work in!
    • The one near le Louvre is really nice, although it can be crowded.
    • The one near Bibliothèque François Mitterrand has plenty of seats and offers some food as well, which is cool. However, the internet is really bad there: you only get very basic and limited access (Web + emails only). Most of the ports are blocked, including TCP 22 for ssh, and it can be really difficult to work with web and emails only (in my case, at least) =(
    • I still have to test the one near Beaubourg.
  • Sugarplum is a cool coffee shop in the 5th, near Mouffetard. Nice coffee and cakes, and there are many tables. The wifi is ok but not really fast, and I did not notice any power plugs.

To be tested in the future: http://blog.we-paris.com/insolite-paris/hubsy-cafe-coworking-du-3e-arrondissement/, http://www.yelp.fr/biz/strada-caf%C3%A9-paris.

 

 

Nice places to work around

1 min read

In NYC

Just a note to myself, to remember some good places to work I found in NYC. Might be useful for others :)

In Berlin

  • Homemade (www.yelp.com/biz/homemade-berlin)
 

Controlling servomotors on a Raspberry Pi

5 min read

EDIT: This article might not be very beginner-friendly. If you think it could be worth more explanations, feel free to let me know.

For a project (documentation to be added soon) of a low-cost telepresence robot, I had to handle three servomotors with a Raspberry Pi. I chose a Raspberry Pi board as it is very cheap and has a decent camera which can easily be used to stream video at a good resolution and framerate, without using too much CPU. Then, I wanted to control everything with the Raspberry Pi, including the servomotors. Here come the troubles…

First of all, I had to control three servomotors: two continuous rotation servos for the wheels, and an extra standard servo for the camera orientation. The first revisions of the Raspberry Pi only have one hardware PWM, shared with the audio circuit, which makes it really difficult to use for this purpose. But the Raspberry Pi 2 has two hardware PWMs (one is shared with the audio circuit, preventing you from using the onboard audio output). That is enough for my use case, as I have two continuous rotation servos which have to be precise (and will be controlled via hardware PWM) and an extra servo for the camera orientation which can easily be controlled with a software PWM.

My code can be found here. The important files are the header of the Servo class, the actual Servo class and a file containing some extra constants.

Hard PWM

The next problem is to find a way to control them easily. On an Arduino, you just have to include the Servo library and write angles to your servos. On the Raspberry Pi, on the contrary, there is no such thing. WiringPi is a nice C library to handle GPIOs (including PWM), but there are no default failsafe settings to use, and one has to play a lot with the different configuration commands.

Most servos expect pulses of variable width, sent on a 20ms basis: every 20ms, the servo samples the signal to get the width of the pulse. The base PWM clock on the Raspberry Pi is 19.2MHz, and you can set a clock prescaler (clock) and a number of samples (range). The frequency of your PWM signal is then given by the following formula (as 50Hz = one pulse every 20ms):

19.2MHz / (clock * range) = 50Hz

The stop position corresponds to 1.5ms pulses in general, for a continuous rotation servo. Then, we also want a valid integer value that stops the servo: pwmWrite(pin, STOP_VALUE) should stop it, with STOP_VALUE being an int. This is important as servos are symmetric with respect to this position, and a slight offset would make the calibration of the servo speeds really difficult.

Good results were obtained on my robot with a clock prescaler of 200 and a range of 1920. Then, the stop value was 141 and the full forward value was 155 (value to write to make the servo turn at full speed). This gives me 14 different speeds for the servo, which is more than enough in my case. If you want finer control, you should play with the clock prescaler and range values.
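
To make the arithmetic explicit, here is a small sanity check of these numbers (plain Python, nothing hardware-specific):

BASE_CLOCK_HZ = 19200000  # Raspberry Pi PWM base clock
CLOCK_PRESCALER = 200
PWM_RANGE = 1920

# 19.2MHz / (200 * 1920) = 50Hz, i.e. one pulse every 20ms.
print(BASE_CLOCK_HZ / (CLOCK_PRESCALER * PWM_RANGE))  # 50.0

# Each unit passed to pwmWrite() is then worth 20ms / 1920:
unit_us = 20000 / PWM_RANGE
print(unit_us)         # ~10.4 microseconds per unit
print(141 * unit_us)   # ~1469 microseconds: close to the 1.5ms stop pulse
print(155 * unit_us)   # ~1615 microseconds: full forward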

Soft PWM

Then comes the control of the camera orientation using software PWM. I explored two possibilities: ServoBlaster, which is not compatible out of the box with the Raspberry Pi 2 (but a solution exists), and the software PWM library from wiringPi. Both give equivalent results, including jitter (a bit less jitter with ServoBlaster, but still too much for a stable camera orientation). In the end, I used wiringPi, which was easier to integrate in my code, and found a dirty hack to fix the servo jitter, as explained below.

First, the base frequency of the software PWM is 100Hz, but we need 50Hz for servo control. We therefore have to call softPwmCreate with a range of 200. We can then write values to the soft PWM, and it will most likely work, at least if there are not many processes running.

My problem was that once I started the camera capture, my software PWM got preempted a lot, which produced a lot of jitter. This was really funny because, when I was using autocompletion in the shell for instance, my servo had some jitter that I could hear. I tried to play with nice without success and could not get rid of the jitter.

However, I finally found a really hacky way of getting rid of the jitter (it may not be applicable in your case): if I stop sending a PWM signal to the servo, it holds still. Plus, if I send a 0 value on the PWM, this is an out-of-range value for the servo and it just ignores it, holding still as well. Then, the solution was easy: send the soft PWM signal to the servo, wait a bit until it has moved (based on the average speed in the datasheet, I know how long I have to wait), and then send 0 on the software PWM. This way, the servo holds still, and there is no more jitter.
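
In pseudo-code, the trick boils down to the sketch below. The soft_pwm_write helper is a placeholder for whatever soft PWM call your setup provides (softPwmWrite in wiringPi, for instance), and the pin number and timing constant are assumptions to be taken from your wiring and your servo's datasheet.

import time


def soft_pwm_write(pin, value):
    # Placeholder: bind this to your soft PWM library (e.g. wiringPi's softPwmWrite).
    print("write %d on pin %d" % (value, pin))


CAMERA_PIN = 18                # GPIO pin driving the camera servo (assumption)
SECONDS_PER_DEGREE = 0.2 / 60  # travel speed from the servo datasheet (assumption)


def move_camera(target_value, degrees_to_travel):
    """Move the camera servo, then stop driving it to avoid jitter."""
    soft_pwm_write(CAMERA_PIN, target_value)
    # Wait long enough for the servo to reach its position...
    time.sleep(degrees_to_travel * SECONDS_PER_DEGREE + 0.05)
    # ...then send an out-of-range value: the servo ignores it and holds still,
    # and the preempted software PWM can no longer produce audible jitter.
    soft_pwm_write(CAMERA_PIN, 0)


move_camera(160, 45)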

Hope this helps :)

 

Blocking JavaScript on a per API basis

3 min read

JavaScript is widely used on the Internet, and living without it can be a real pain. However, JavaScript raises some security concerns, in particular because some JavaScript APIs can be really sensitive. Indeed, WebRTC calls can be used to detect your real IP address (even the private one, on your local network), JavaScript can interact with the clipboard, … The Clipboard API can be blocked in about:config in Firefox, for instance, but that is not the case for every such API. Hence, I wanted to have a look at what can be done to prevent the use of such sensitive APIs.

Plus, in my opinion, NoScript is not really usable, as it blocks way too many things and you end up clicking on "Allow this website" on basically every website to be able to use them. Then, the security benefit from using NoScript gets very low.

I found a way (more a PoC than a really usable thing) to selectively block some APIs, through NoScript surrogate scripts and Object.defineProperty. For instance, to block the WebRTC PoC previously linked, one can add

noscript.surrogate.test.replacement = Object.defineProperty(window, 'mozRTCPeerConnection', {get: function() { console.log('rtc'); return function() {}; }, set: function() {}});
noscript.surrogate.test.sources = @^(?!chrome:)[0-9A-Za-z-]+

in their about:config.

The Object.defineProperty syntax is a bit weird in this case, as I have to return a function in both the getter and the setter. I did not find any specific reason for having to do it this way, and it is the only approach I found that blocks this property. If you have any idea, please let me know, as I do not like not understanding this problem, and I do not much like having something that could potentially break at any time. For the full discussion about this NoScript surrogate script, please refer to the NoScript forum.

The next steps would be to implement it as a standalone plugin, in order not to depend on NoScript and to have a more user-friendly interface (list all the sensitive APIs, one click to enable / disable them and reload the page…), or, better, to implement it as a Privoxy external filter, so that this protection lives not in the browser itself but in a different layer, to increase security. I will work on it as soon as I find some free time, but feel free to do something and share it in the meantime if you are interested in the topic!

 

Devops tools for workstations

5 min read

There is a growing interest in devops tools such as Docker and Puppet / Ansible / Salt / Chef to set up continuous integration, have the same working environment during development, testing, staging and production, and manage thousands of servers in the cloud. However, I recently thought that some of these technologies could also be of real interest for a few machines, namely personal workstations. These are just some scenarios and use cases that occurred to me; I did not test all of them and they may be irrelevant, feel free to let me know :)

Scenario 1: I want a particular dev environment, without breaking everything

The first scenario is the most widely discussed around the web: I want my dev environment to be identical to production, but I do not want to break my system (python 2 vs 3, ruby stuff, …). For this purpose, one can use containers (e.g. Docker). I will not elaborate much on this one.

Note: I recently discovered that systemd had similar features through systemd-nspawn. In particular, see the articles on Systemd from Lennart Poettering to know more about this and systemd features.

Scenario 2: I use many different operating systems

Note: I did not try this scenario.

If you use many different operating systems on the same machine (say Windows and Linux, for instance), then why not consider using virtualization? And not some stopgap like VirtualBox, but real hypervisors such as KVM or Xen. These are widely used on servers to run multiple VMs, so why not use them at home, on your personal computer?

CPUs today have a lot of virtualization-specific features, even on workstations, and they are powerful enough to handle it. You will benefit from many advantages, starting with easier maintenance of your system (as it is just a VM, it is easier to back up / restore, isolate, etc.) and things like hot-switching between operating systems.

One problem remains: handling the graphics card, which may not be easily shared between multiple VMs.

Scenario 3: You want an easy backup mechanism

One important point I previously discussed is the ability to recover from any problem, a hardware fault for example. It is important on a server, as downtime is a real problem, especially if you have few servers, but it is also a major concern on a workstation, especially as your laptop may fail at any time: it can experience a basic hardware failure, it may fall and break, it may be stolen…

Then, it is important to be able to recover a fully working system fast. One often talks about data backup, and this is indeed the most important part, as you cannot recover lost data, unlike a lost configuration (set of packages, state of configuration files, …). But this is not all: reinstalling a system is a time-consuming task, and it is not really interesting.

Devops teams came up with tools to deploy a given configuration on many servers, across the cloud: Ansible, Puppet, Chef, Salt and so on. So, why not use them to deploy your computer configuration? If correctly managed, installing your laptop could be as easy as: partitioning the drive, bootstrapping the system (installing the base packages and setting up SSH access) and running Ansible (which is the most basic tool and the best fitted to this particular need) to reinstall everything else. Almost everything would be done automatically, perfect!

However, this requires maintaining a list of installed packages and associated files for Ansible to use, which can be a bit heavy over time. Then, it could be interesting to have some way to "blueprint" your system, i.e. to generate configuration descriptions from your existing system (as it is easier to install stuff on your system, tweak it and blueprint it afterwards, than to do the tweaking in the Ansible configuration description and run it each time).

To achieve easy blueprinting, another solution is to use etckeeper to store all your files under /etc (as these are the only ones supposed to be modified by you; /usr is the domain of your distribution and should not be modified) in a Git repository and keep track of every change in them. Restoring from etckeeper and a list of installed packages (obtained with pacman) is quite easy and can even be done without Ansible.

On this particular subject of blueprinting, I wrote a small Python script for Arch Linux (basically just a wrapper around the right pacman commands), available here. It may not be perfect, but it will give you a basis to blueprint your system.
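
Not the script linked above, but the core of the package-list part is tiny; here is a minimal sketch for Arch Linux (pacman -Qqe lists the explicitly installed packages):

import subprocess


def explicitly_installed_packages():
    """Return the packages explicitly installed on the system, as reported by pacman."""
    out = subprocess.check_output(["pacman", "-Qqe"], universal_newlines=True)
    return out.splitlines()


# Dump the list so it can be versioned (next to /etc with etckeeper) and
# replayed on a fresh install with: pacman -S --needed - < packages.txt
with open("packages.txt", "w") as fh:
    fh.write("\n".join(explicitly_installed_packages()) + "\n")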

Another interesting lead for this scenario is Btrfs, which has nice snapshot abilities that can even be used over the network. This is something I did not test directly, but I am really interested in seeing what it can do…

 

Scenario 4: Sync multiple computers

One last scenario is the following: I have three computers I work on, on an almost daily basis (my laptop, my desktop and another computer). Syncing files between them is quite easy (or at least achievable), but syncing configurations between them is much more difficult, in particular as the whole configuration should not be synced: there is some device-specific configuration (fstab, LVM configuration, SSH host keys and stuff like that). But this problem is exactly the same as syncing multiple servers in the cloud, which is handled very well by Ansible. Plus, Ansible lets you define tasks and replicate only some of them on some machines and so on. Then, it is quite easy to completely synchronize multiple computers and have the same work environment on all of them.


 

Quick comparisons of solutions for 3D cross-platform (mobile) development

9 min read

I spent some time lately comparing the available development toolkits for 3D games / apps on mobile platforms (mostly for hobby / indie apps). I do not want to have to adapt my base code too much depending on the target platform, so I was looking for a toolkit that lets me write most of the code once and build the app for the various platforms on the market. My use case is: a sufficient 3D engine (not necessarily a high-end thing, just the basics to be able to write 3D apps decently; I consider Three.js sufficient for my needs, for instance). I am working on Linux, so the ability to develop on Linux would be a real plus. Of course, it should be as cheap as possible :) and finally, I am paying much attention to the EULAs and licenses, as I do not want to force my users to send “anonymous” statistics and I do not want my app to require extra (and useless) permissions.

Note: I did not test these toolkits deeply; I am just reporting here what I found while playing a bit with them and comparing the features, licenses and requirements. I only included toolkits that suit my needs, and some toolkits may be missing just because they do not match them.

Note: I need to be cross-platform, which means I want to be able to target Android, iOS and Firefox OS. Then, I need a WebGL export ability (for Firefox OS). Anyway, being able to have a WebGL export is a plus, as this means you can build a web app, which is interesting for my use cases.

 

Cordova

The first toolkit I had a look at was Cordova. It allows you to write pure web apps (using standard HTML / CSS / JS, plus some extra APIs extending what is available, for Bluetooth for instance) and to package them into native apps distributed through the market. What it does is basically add a wrapper around your web app to render it outside a browser, using the web capabilities provided by webviews on iOS and Android. Writing a web app is really easy, so having a first working prototype with Cordova is super fast. Cordova runs on Linux without any problem and is completely free of charge.

It works pretty well for 2D graphics and basic applications (but it needs some extra permissions as it uses a webview, even if it does not communicate over the internet). But when it comes to 3D graphics, using WebGL, you will be in trouble. Indeed, the webview in Android 4.x uses an old version of Chrome, even if you install the latest Chrome on your mobile. Then, you will not be able to use WebGL as it is simply not supported, unless you use some hack to actually use a more recent Chrome version, using Crosswalk for instance. On iOS, this is even worse, as webviews prior to iOS 8 do not support WebGL (and as far as I know, there is no alternative that would be stable, reliable and pass the App Store review process). This means that you will not be able to target iPhones prior to (and including) the iPhone 4, and the iPad 1, which in my opinion is a real problem.

Unreal Engine

The second option is to use the Unreal Engine 4. It is a complete 3D game engine, including many tools to build 3D apps, which you can deploy on many platforms (desktop, web and mobile). You can code in C++ with it and script using many visual tools. It includes dedicated APIs for advanced features such as Virtual Reality (VR), and may sound overkill.

You can develop using Windows and Mac (officially supported), but not Linux (at least, not officially). However, it seems that the dev editor can be installed on Linux at the cost of a bit of hacking, and this should become even easier in the near future, as there seems to be a growing Linux community around it.

Unreal Engine charges you 5% royalties past the first $3k of revenue.

Here are some relevant EULA fragments:

12. Hardware and Usage Data

You acknowledge that, as a default setting, the Engine Code will collect and send to Epic anonymous hardware and usage data from end users of Products. This functionality is used by Epic to improve the Engine Code. You may modify the Engine Code under the License to turn off that functionality in your Product, or you may include in your Product the capability for your end users to turn off that functionality in the Product.

and

You agree to keep accurate books and records related to your development, manufacture, Distribution, and sale of Products and related revenue. Epic may conduct reasonable audits of those books and records. Audits will be conducted during business hours on reasonable prior notice to you. Epic will bear the costs of audits unless the results show a shortfall in payments in excess of 5% during the period audited, in which case you will be responsible for the cost of the audit.

The second one is a standard clause, as you have to pay royalties depending on your revenue. But the first one is really concerning, as by default Unreal Engine will track your users and send anonymous statistics (thus requiring extra, unneeded permissions and raising privacy concerns). However, once you are aware of it, you can freely modify your app to prevent this, according to the EULA, so this is not a big deal in the end.

Unity

The last solution I found (used by Monument Valley, for instance) is the Unity game engine. The personal license is sufficient in most cases, and is free to use up to $100k gross revenue. The dev tools are available on Windows and Mac, but there is no Linux version (and no hacky way to get it working on Linux).

Here are some relevant fragments from the EULA as well:

(c) users will be required to complete a user survey to activate the Software. Unity Pro users who are not eligible to use Unity Personal may not develop and publish Licensee Content for the iOS and Android platforms without purchasing the applicable Unity Pro Add-On Product license. Unity may monitor your compliance with and enforce these restrictions and requirements including but not limited to monitoring the number of downloads of your Licensee Content and any available revenue estimate data.

This one is not really clear, and I am not sure of what it actually implies. However, according to http://unity3d.com/legal/eula and https://unity3d.com/legal/privacy-policy, it seems to imply that statistics are sent and that you cannot avoid it, even in the pro version.

We also include certain device data collection in the runtime of the Software which is incorporated into the applications you create with the software. You should be sure that your privacy policy explains to your players the variety of technical information that is collected and shared with third parties like Unity.

and

Q: I play a game built with Unity software, what should I know?

A: Unity has probably collected some or all of the following information about your device: Unique device identifier generated from the device MAC/IMEI/MEID (which we immediately convert into a different number using a one way hash); IP address; Device manufacturer and model; the operating system and version running on your system or device; browser type; language; the make of the CPU, and number of CPUs present; the graphics card type and vendor name; graphics card driver name and version (example: "nv4disp.dll 6.10.93.71"); which graphics API is in use (example: "OpenGL 2.1" or "Direct3D 9.0c"); amount of system and video RAM present; current screen resolution; version of the Unity Player; version of the Unity Editor used to create the content; a number describing whether the player is running on Mac, Windows or other platforms; and a checksum of all the data that gets sent to verify that it did transmit correctly; application or bundle identification ("app id") of the game installed. Some Unity developers use Unity’s analytics and ad services which collect additional information. See FAQs on Unity Analytics and Unity Ads below.

Q: That seems like a lot of data, why so much?

A: We try to limit the collection of this information from any one player or device; however, certain operating systems do not permit us to note that the info has already been collected. This means that the data may be sent to Unity each time you start the game. We use the information to make decisions about which platforms, operating systems and versions of them should be supported by our game development software. We aggregate this data and make it available to developers at stats.unity3d.com. This data helps us improve our Services and helps developers improve their apps.

8. Your choices about Unity’s collection and use of your information

You always have the option to refrain from using the Service or to discontinue using the Service if you do not want information collected about you.

They also explicitly say in the FAQ that there is no opt-out, and the anonymous stats are indeed browsable at http://stats.unity3d.com/mobile/. In conclusion, contrary to Unreal Engine, I do not think you can easily prevent the engine from sending anonymous statistics, which is a pity in my opinion. Moreover, there are a number of threads about extra permissions required by Unity (such as network access to send the statistics), and there seems to be no way to avoid requiring those permissions while still conforming to the EULA: http://answers.unity3d.com/questions/663197/how-to-prevent-unity-from-adding-permissions-to-an.html and http://forum.unity3d.com/threads/extra-permission-needed-in-android.295556.

 

EDIT: For 2D graphics, this StackOverflow post might be interesting.

EDIT2: I also found EdgeLib and Emo, but they do not support web export.

EDIT3: CryEngine also supports Linux, Windows, iOS and Android, but not the web, and it is very expensive (license at $10 per month).

EDIT4: Gameplay3D is also an open-source toolkit that can be used for cross-platform dev, written in C++, but it does not officially support emscripten for JavaScript output (and they do not plan on supporting it).


 

“You are not selfhosting”…

8 min read

“You are not selfhosting”… that is more or less what Laurent Guerby (founder of Tetaneutral.net) told me last week after a presentation at the ENS. At first, it surprised me, because I am selfhosting: I have my own server to serve this blog, to handle my emails, to connect to IRC, to handle my XMPP account, to provide me with a fully functional network access and so on. Basically, every part of my life on the Internet passes somehow through my server, so I am tempted to consider that I do selfhost.

Plus I wrote some notes (in French) about the checklist before selfhosting and the first step in selfhosting. What is the status of this document if indeed I am not selfhosting?

“You need two machines to selfhost, not one”

Actually, most people forget it when they talk about selfhosting, but to fully selfhost, you need two machines, not only one. Indeed, if you use the cloud services from Google, Facebook, etc., your data is replicated across their datacenters. Your data is stored multiple times, in multiple places around the world, and this limits the risk of data loss (both temporary, due to network issues, and permanent).

On the contrary, if you selfhost on a single server, your data is stored in only one place, on one machine. This means that any problem on this machine will kill your entire digital life. Some people use two servers, but at the same place (a Synology and a Raspberry Pi at home, for instance). In this case, you do have data replication and you can indeed recover from a hardware problem, but you will run into trouble if there is a physical problem, like a fire, or a simple network failure.

Imagine you have a backup MX on two machines behind the same network access. If one of the servers experiences problems, the other one will be able to handle email delivery, but if the problem is on the network access itself, the backup MX is useless.

The same thing happens if you have a fire at home and all your devices (the ones backed up plus the backup server) are at home: you will simply lose everything.

Then, if you care about the quality of service you get, you should have two servers in distinct places. Otherwise, you will not have the same quality of service as with the standard cloud, and selfhosting becomes much less interesting and not worth the price. Note that you do not need to actually own two machines: you can also use a part of a friend's server, as discussed below.

To achieve this, I am backing up my server daily on a friend's server (encrypted backups, so no privacy concerns), and we set up some redundancy, such as backup MX, and so on. Then, I effectively have two machines, so that my downtime is reduced and I can mitigate problems with my main server.

“You need to be able to move quickly”

Another problem, often neglected (starting with my own notes), is the ability to move quickly to another place, and to fully disappear (so that no one can recover anything from your server). This, of course, depends on your hosting provider, and if you have a one-year contract with them, it might not be that easy (or cheap, at least) to move to another server.

The point of being able to move quickly is:

  • To be able to move if you have troubles with your hosting provider, so that you are fully free and selfhosted and not tied to your hosting provider.
  • To be able to setup a new server very quickly in case of problems (electrical failure, network problem, hard drive failure, …), to reduce downtime.

Reconfiguring a server is a time-consuming task and a real pain, so you will want to do it as rarely as possible, which increases your dependency on your hosting provider.

People handling thousands of servers in the cloud use tools like Puppet, Ansible, Chef or simple shell scripts to spawn new machines when needed and replicate their configuration across all their machines. Even if you only have one or two machines, you should consider using them, as they are rather easy to use and not reserved to such large setups. They make configuration easier, allow you to version it, and let you spawn a new server very quickly. Combined with data backups, you can virtually spawn a working new server in a few minutes, anywhere you want, anytime you want.

Another point to consider is the ability to disappear quickly and prevent anyone from recovering your data, especially if your server is provided by your hosting provider. Such servers are not necessarily wiped between one user and the next, and the next one could potentially recover sensitive data from the hard drive (like emails). A simple solution (at the cost of a bit of computational power) is to encrypt your disk and overwrite the sectors storing the keys before returning the server. This way, your hard drive will just look like garbage to the next owner, who will not be able to recover anything from it.

Moreover, here are some other thoughts which came to me while I was thinking about the previous points.

Mutualization is not centralization

Most of the time, when we speak about selfhosting, we speak about self, which means the whole debate is focused on the user. We often oppose selfhosting (decentralization) to centralization, and focus only on the setup of one server for one user. This is clear in projects like Yunohost, which allows any user to easily set up their own personal server on an old machine or a Raspberry Pi, but does not offer anything else.

However, mutualization is not centralization, and selfhosted users could benefit a lot from mutualization. Distributions like Yunohost should build a “network” of users and offer them mutualization capabilities. For instance, one should be able to easily dedicate 10% of their hard drive to other users' backups, and in return benefit from the ability to replicate their own backups on other servers. This way, by sharing a bit of storage, one gets distributed backup storage, resilient to network failures and server issues. This is especially easy as we have technologies for distributed and encrypted file storage across multiple servers, such as Tahoe-LAFS. But such solutions are not promoted at all by distributions like Yunohost…

Indeed, maintaining two servers is quite a pain and might cost a lot, especially if one of them is just there for backups. On the contrary, you might know someone who has a server and is ready to exchange a bit of storage, bandwidth and CPU power to mirror your services and handle backups. This network of trusted users (or not necessarily trusted, if you use encryption and distributed file storage) is a key element of selfhosting, in my opinion, and should be promoted and made easy.

Trust in the server

Last but not least, as soon as you rent a server from someone else, you need to trust them to give you the server they advertised and not to abuse their power. You might expect to get the right hardware configuration, but what about things like judicial requests (or not-so-official requests)? Will they simply hand over information and access to your server in the datacenter, or will they warn you beforehand? These are things that should be considered, with more or less care depending on why you want your server and how you use it. But such concerns should not be ignored and should be taken into account (although it will most likely be better than with a cloud provider, in any case).

Possible mitigations for this point are to fully selfhost your server (which means buying a server or recycling an old PC, hosting it at home if your network connection is powerful enough, or renting a rack in a datacenter), but this can cost much more. Another solution is to look at associative hosting providers, such as Tetaneutral.net, which offers to host your server, no matter its form factor. As it is your machine, you know the hardware, and you can add the security level you want (such as killswitches and so on).