Top linked arXiv articles

1.
Dissolving the Fermi Paradox (arxiv.org)
The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. We examine these parameters, incorporating models of chemical and genetic transitions on paths to the origin of life, and show that extant scientific knowledge corresponds to uncertainties that span multiple orders of magnitude. This makes a stark difference. When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe, and thus that there should be little surprise when we fail to detect any signs of it. This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.
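The gap between a single headline number and a full uncertainty distribution can be illustrated with a small Monte Carlo sketch. The parameter names and log10 ranges below are hypothetical placeholders chosen for illustration, not the values used in the paper; each factor is drawn log-uniformly, and the script reports the mean number of civilizations alongside the fraction of draws with fewer than one.

```python
import random

# Hypothetical log10 ranges for Drake-style factors (illustrative assumptions,
# not taken from the paper). Each factor is drawn log-uniformly in its range.
LOG10_RANGES = {
    "star_formation_rate": (0.0, 2.0),
    "fraction_with_planets": (-1.0, 0.0),
    "habitable_planets_per_system": (-1.0, 0.0),
    "fraction_developing_life": (-30.0, 0.0),
    "fraction_developing_intelligence": (-3.0, 0.0),
    "fraction_detectable": (-2.0, 0.0),
    "detectable_lifetime_years": (2.0, 9.0),
}


def simulate(n_samples=20000, seed=0):
    """Return (mean N, fraction of draws with N < 1) over sampled parameters."""
    rng = random.Random(seed)
    total = 0.0
    empty = 0
    for _ in range(n_samples):
        # The product of the factors is the sum of their log10 values.
        log_n = sum(rng.uniform(lo, hi) for lo, hi in LOG10_RANGES.values())
        total += 10.0 ** log_n
        if log_n < 0.0:
            empty += 1
    return total / n_samples, empty / n_samples


mean_n, p_empty = simulate()
print(f"mean N = {mean_n:.3g}, P(N < 1) = {p_empty:.2f}")
```

Under these assumed ranges the mean can sit well above one while most individual draws contain no civilization at all, which is the kind of discrepancy the abstract attributes to treating uncertain parameters as certain.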
2.
Evaluating Large Language Models Trained on Code (arxiv.org)
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
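The effect of repeated sampling can be made concrete with a short sketch. It assumes the combinatorial "pass@k" estimator commonly used with HumanEval: given n samples per problem of which c pass the unit tests, it estimates the probability that at least one of k randomly chosen samples is correct. The function name and argument names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate P(at least one of k samples passes), given c of n samples pass."""
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    # 1 - (number of all-failing k-subsets) / (number of k-subsets)
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 100 samples for one problem, 5 of which pass the tests.
print(pass_at_k(100, 5, 1))
print(pass_at_k(100, 5, 10))
```

Averaging this quantity over all problems gives a benchmark-level solve rate; a larger k raises it because only one of the k samples needs to work.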
3.
Introducing Physical Warp Drives (arxiv.org)
The Alcubierre warp drive is an exotic solution in general relativity. It allows for superluminal travel at the cost of enormous amounts of matter with negative mass density. For this reason, the Alcubierre warp drive has been widely considered unphysical. In this study, we develop a model of a general warp drive spacetime in classical relativity that encloses all existing warp drive definitions and allows for new metrics without the most serious issues present in the Alcubierre solution. We present the first general model for subluminal positive-energy, spherically symmetric warp drives; construct superluminal warp-drive solutions which satisfy quantum inequalities; provide optimizations for the Alcubierre metric that decrease the negative energy requirements by two orders of magnitude; and introduce a warp drive spacetime in which space capacity and the rate of time can be chosen in a controlled manner. Conceptually, we demonstrate that any warp drive, including the Alcubierre drive, is a shell of regular or exotic material moving inertially with a certain velocity. Therefore, any warp drive requires propulsion. We show that a class of subluminal, spherically symmetric warp drive spacetimes, at least in principle, can be constructed based on the physical principles known to humanity today.
4.
Language Models are Few-Shot Learners (arxiv.org)
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
5.
Reading Race: AI Recognises Patient's Racial Identity In Medical Images (arxiv.org)
Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlation for race on medical imaging that would be obvious to the human expert interpreting the images. Methods: Using private and public datasets we evaluate: A) performance quantification of deep learning models to detect race from medical images, including the ability of these models to generalize to external environments and across multiple imaging modalities, B) assessment of possible confounding anatomic and phenotype population features, such as disease distribution and body habitus as predictors of race, and C) investigation into the underlying mechanism by which AI models can recognize race. Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists over all anatomical regions and frequency spectrum of the images suggesting that mitigation efforts will be challenging and demand further study. Interpretation: We emphasize that model ability to predict self-reported race is itself not the issue of importance. However, our findings that AI can trivially predict self-reported race -- even from corrupted, cropped, and noised medical images -- in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.
6.
Discovery of ASKAP J173608.2-321635 as a Highly-Polarized Transient Point Source with the Australian SKA Pathfinder (arxiv.org)
We report the discovery of a highly-polarized, highly-variable, steep-spectrum radio source, ASKAP J173608.2-321635, located ~4 deg from the Galactic center in the Galactic plane. The source was detected six times between 2020 January and 2020 September as part of the Australian Square Kilometre Array Pathfinder Variables and Slow Transients (ASKAP VAST) survey at 888 MHz. It exhibited a high degree (~25%) of circular polarization when it was visible. We monitored the source with the MeerKAT telescope from 2020 November to 2021 February on a 2-4 week cadence. The source was not detected with MeerKAT before 2021 February 07 when it appeared and reached a peak flux density of 5.6 mJy. The source was still highly circularly polarized, but also showed up to 80% linear polarization, and then faded rapidly with a timescale of one day. The rotation measure of the source varied significantly, from -11.8 ± 0.8 rad m^-2 to -64.0 ± 1.5 rad m^-2, over three days. No X-ray counterpart was found in follow-up Swift or Chandra observations about a week after the first MeerKAT detection, with upper limits of ~5.0x10^31 erg s^-1 (0.3-8 keV, assuming a distance ~10 kpc). No counterpart is seen in new or archival near-infrared observations down to J = 20.8 mag. We discuss possible identifications for ASKAP J173608.2-321635 including a low-mass star/substellar object with extremely low infrared luminosity, a pulsar with scatter-broadened pulses, a transient magnetar, or a Galactic Center Radio Transient: none of these fully explains the observations, which suggests that ASKAP J173608.2-321635 may represent part of a new class of objects being discovered through radio imaging surveys.
7.
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems (arxiv.org)
Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computations during both training and inference outperforming an optimized implementation of Tensorflow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide codes and scripts for reproducibility.
8.
Seven Sketches in Compositionality: An Invitation to Applied Category Theory (arxiv.org)
This book is an invitation to discover advanced topics in category theory through concrete, real-world examples. It aims to give a tour: a gentle, quick introduction to guide later exploration. The tour takes place over seven sketches, each pairing an evocative application, such as databases, electric circuits, or dynamical systems, with the exploration of a categorical structure, such as adjoint functors, enriched categories, or toposes. No prior knowledge of category theory is assumed. A feedback form for typos, comments, questions, and suggestions is available here: https://docs.google.com/document/d/160G9OFcP5DWT8Stn7TxdVx83DJnnf7d5GML0_FOD5Wg/edit
9.
Decision Transformer: Reinforcement Learning via Sequence Modeling (arxiv.org)
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
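Conditioning on the desired return can be sketched by turning a recorded trajectory into a sequence of return, state, and action tokens. The snippet below is a minimal illustration, not the authors' code: the function names and the tagged-tuple token layout are assumptions, and the model itself is omitted.

```python
def returns_to_go(rewards):
    """Suffix sums: entry t is the total reward collected from step t onward."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(rewards, states, actions):
    """Interleave (return-to-go, state, action) triples in time order."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("return", g), ("state", s), ("action", a)])
    return sequence


# Hypothetical three-step trajectory.
print(build_sequence([1.0, 2.0, 3.0], ["s0", "s1", "s2"], ["a0", "a1", "a2"]))
```

At generation time, the first return token would be set to the return one wants to achieve, and an autoregressive model trained on such sequences would be asked to predict the next action token.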
10.
Image Super-Resolution via Iterative Refinement (arxiv.org)
We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on faces and natural images. We conduct human evaluation on a standard 8X face super-resolution task on CelebA-HQ, comparing with SOTA GAN methods. SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GANs do not exceed a fool rate of 34%. We further show the effectiveness of SR3 in cascaded image generation, where generative models are chained with super-resolution models, yielding a competitive FID score of 11.3 on ImageNet.
11.
Terraforming the dwarf planet: Interconnected and growable Ceres megasatellite world (arxiv.org)
We analyse a megasatellite settlement built from Ceres materials in high Ceres orbit. Ceres is selected because it has nitrogen, which is necessary for an earthlike atmosphere. To have 1 g artificial gravity, spinning habitats are attached to a disk-shaped megasatellite frame by passively safe magnetic bearings. The habitats are illuminated by concentrated sunlight produced by planar and parabolic mirrors. The motivation is to have a settlement with artificial gravity that allows growth beyond Earth's living area, while also providing easy intra-settlement travel for the inhabitants and reasonably low population density of 500/km^2. To enable gardens and trees, a 1.5 m thick soil is used. The soil is upgradable to 4 m if more energy is expended in the manufacturing phase. The mass per person is 10^7 kg, most of which is lightly processed radiation shield and soil. The goal is a long-term sustainable world where all atoms circulate. Because intra-settlement travel can be propellantless, achieving this goal is possible at least in principle. Lifting the materials from Ceres is energetically cheap compared to processing them into habitats, if a space elevator is used. Because Ceres has low gravity and rotates relatively fast, the space elevator is feasible.
12.
Markets are efficient if and only if P = NP (arxiv.org)
I prove that if markets are weak-form efficient, meaning current prices fully reflect all information available in past prices, then P = NP, meaning every computational problem whose solution can be verified in polynomial time can also be solved in polynomial time. I also prove the converse by showing how we can "program" the market to solve NP-complete problems. Since P probably does not equal NP, markets are probably not efficient. Specifically, markets become increasingly inefficient as the time series lengthens or becomes more frequent. An illustration by way of partitioning the excess returns to momentum strategies based on data availability confirms this prediction.
13.
A problem with the analysis of type Ia supernovae (arxiv.org)
Type Ia supernovae have light curves that have widths and magnitudes that can be used for testing cosmologies and they provide one of the few direct measurements of time dilation. It is shown that the standard analysis that calibrates the light curve against a rest-frame average (such as SALT2) removes all the cosmological information from the calibrated light curves. Consequently type Ia supernovae calibrated with these methods cannot be used to investigate cosmology. The major evidence that supports the hypothesis of a static universe is that the measurements of the widths of the raw light curves of type Ia supernovae do not show any time dilation. The intrinsic wavelength dependence shown by the SALT2 calibration templates is also consistent with no time dilation. Using a static cosmological model the peak absolute magnitudes of raw type Ia supernovae observations are also independent of redshift. These results support the hypothesis of a static universe.
14.
A Modern Compiler for the French Tax Code (arxiv.org)
In France, income tax is computed from taxpayers' individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time. Owing to the shortcomings of the input language and the technical limitations of the compiler, the algorithm is proving harder and harder to maintain, relying on ad-hoc behaviors and workarounds to implement the most recent changes in tax law. Competence loss and aging code also mean that the system does not benefit from any modern compiler techniques that would increase confidence in the implementation. We overhaul this infrastructure and present Mlang, an open-source compiler toolchain whose goal is to replace the existing infrastructure. Mlang is based on a reverse-engineered formalization of the DGFiP's system, and has been thoroughly validated against the private DGFiP test suite. As such, Mlang has a formal semantics; eliminates previous handwritten workarounds in C; compiles to modern languages (Python); and enables a variety of instrumentations, providing deep insights about the essence of French income tax computation. The DGFiP is now officially transitioning to Mlang for their production system.
15.
ARC: A compact, high-field, fusion nuclear science facility and demonstration power plant with demountable magnets (arxiv.org)
The affordable, robust, compact (ARC) reactor conceptual design study aims to reduce the size, cost, and complexity of a combined fusion nuclear science facility (FNSF) and demonstration fusion Pilot power plant. ARC is a 200-250 MWe tokamak reactor with a major radius of 3.3 m, a minor radius of 1.1 m, and an on-axis magnetic field of 9.2 T. ARC has rare earth barium copper oxide (REBCO) superconducting toroidal field coils, which have joints to enable disassembly. This allows the vacuum vessel to be replaced quickly, mitigating first wall survivability concerns, and permits a single device to test many vacuum vessel designs and divertor materials. The design point has a plasma fusion gain of Q_p~13.6, yet is fully non-inductive, with a modest bootstrap fraction of only ~63%. Thus ARC offers a high power gain with relatively large external control of the current profile. This highly attractive combination is enabled by the ~23 T peak field on coil with newly available REBCO superconductor technology. External current drive is provided by two innovative inboard RF launchers using 25 MW of lower hybrid and 13.6 MW of ion cyclotron fast wave power. The resulting efficient current drive provides a robust, steady state core plasma far from disruptive limits. ARC uses an all-liquid blanket, consisting of low pressure, slowly flowing fluorine lithium beryllium (FLiBe) molten salt. The liquid blanket is low-risk technology and provides effective neutron moderation and shielding, excellent heat removal, and a tritium breeding ratio >= 1.1. The large temperature range over which FLiBe is liquid permits blanket operation at 900 K with single phase fluid cooling and a high-efficiency Brayton cycle, allowing for net electricity generation when operating ARC as a Pilot power plant.
16.
Retrofitting Parallelism onto OCaml (arxiv.org)
OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory parallel programming. This paper describes the design, a full-fledged implementation and evaluation of a mostly-concurrent garbage collector (GC) for the multicore extension of the OCaml programming language. Given that we propose to add parallelism to a widely used programming language with millions of lines of existing code, we face the challenge of maintaining backwards compatibility--not just in terms of the language features but also the performance of single-threaded code running with the new GC. To this end, the paper presents a series of novel techniques and demonstrates that the new GC strikes a balance between performance and feature backwards compatibility for sequential programs and scales admirably on modern multicore processors.
17.
Quantum Theory From Five Reasonable Axioms (arxiv.org)
The usual formulation of quantum theory is based on rather obscure axioms (employing complex Hilbert spaces, Hermitean operators, and the trace rule for calculating probabilities). In this paper it is shown that quantum theory can be derived from five very reasonable axioms. The first four of these are obviously consistent with both quantum theory and classical probability theory. Axiom 5 (which requires that there exist continuous reversible transformations between pure states) rules out classical probability theory. If Axiom 5 (or even just the word "continuous" from Axiom 5) is dropped then we obtain classical probability theory instead. This work provides some insight into the reasons quantum theory is the way it is. For example, it explains the need for complex numbers and where the trace formula comes from. We also gain insight into the relationship between quantum theory and classical probability theory.
18.
Paxos vs Raft: Have we reached consensus on distributed consensus? (arxiv.org)
Distributed consensus is a fundamental primitive for constructing fault-tolerant, strongly-consistent distributed systems. Though many distributed consensus algorithms have been proposed, just two dominate production systems: Paxos, the traditional, famously subtle, algorithm; and Raft, a more recent algorithm positioned as a more understandable alternative to Paxos. In this paper, we consider the question of which algorithm, Paxos or Raft, is the better solution to distributed consensus? We analyse both to determine exactly how they differ by describing a simplified Paxos algorithm using Raft's terminology and pragmatic abstractions. We find that both Paxos and Raft take a very similar approach to distributed consensus, differing only in their approach to leader election. Most notably, Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to be leader provided it then updates its log to ensure it is up-to-date. Raft's approach is surprisingly efficient given its simplicity as, unlike Paxos, it does not require log entries to be exchanged during leader election. We surmise that much of the understandability of Raft comes from the paper's clear presentation rather than being fundamental to the underlying algorithm being presented.
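The restriction that only servers with up-to-date logs may become leader can be sketched as the check a Raft voter applies before granting its vote. This is a minimal illustration with hypothetical function and parameter names: logs are compared first by the term of their last entry, then by their length.

```python
def log_is_up_to_date(candidate_last_term, candidate_last_index,
                      voter_last_term, voter_last_index):
    """True if the candidate's log is at least as up-to-date as the voter's."""
    if candidate_last_term != voter_last_term:
        # The log whose last entry has the later term is more up-to-date.
        return candidate_last_term > voter_last_term
    # Same last term: the longer (or equally long) log wins.
    return candidate_last_index >= voter_last_index


# A voter whose last entry is (term 3, index 7) evaluates two candidates.
print(log_is_up_to_date(3, 9, 3, 7))  # same term, longer log
print(log_is_up_to_date(2, 12, 3, 7))  # longer log but older last term
```

Because a vote is refused when this check fails, a candidate that wins a majority already holds an up-to-date log, which is why no log entries need to be exchanged during the election itself.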
19.
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs (arxiv.org)
Backpropagation of error (backprop) is a powerful algorithm for training machine learning architectures through end-to-end differentiation. However, backprop is often criticised for lacking biological plausibility. Recently, it has been shown that backprop in multilayer-perceptrons (MLPs) can be approximated using predictive coding, a biologically-plausible process theory of cortical computation which relies only on local and Hebbian updates. The power of backprop, however, lies not in its instantiation in MLPs, but rather in the concept of automatic differentiation which allows for the optimisation of any differentiable program expressed as a computation graph. Here, we demonstrate that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules. We apply this result to develop a straightforward strategy to translate core machine learning architectures into their predictive coding equivalents. We construct predictive coding CNNs, RNNs, and the more complex LSTMs, which include a non-layer-like branching internal graph structure and multiplicative interactions. Our models perform equivalently to backprop on challenging machine learning benchmarks, while utilising only local and (mostly) Hebbian plasticity. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry, and may also contribute to the development of completely distributed neuromorphic architectures.
20.
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (arxiv.org)
We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.
21.
Dark Patterns after the GDPR: Scraping Consent Pop-ups and Demonstrating their Influence (arxiv.org)
New consent management platforms (CMPs) have been introduced to the web to conform with the EU's General Data Protection Regulation, particularly its requirements for consent when companies collect and process users' personal data. This work analyses how the most prevalent CMP designs affect people's consent choices. We scraped the designs of the five most popular CMPs on the top 10,000 websites in the UK (n=680). We found that dark patterns and implied consent are ubiquitous; only 11.8% meet the minimal requirements that we set based on European law. Second, we conducted a field experiment with 40 participants to investigate how the eight most common designs affect consent choices. We found that notification style (banner or barrier) has no effect; removing the opt-out button from the first page increases consent by 22-23 percentage points; and providing more granular controls on the first page decreases consent by 8-20 percentage points. This study provides an empirical basis for the necessary regulatory action to enforce the GDPR, in particular the possibility of focusing on the centralised, third-party CMP services as an effective way to increase compliance.
22.
If Loud Aliens Explain Human Earliness, Quiet Aliens Are Also Rare (arxiv.org)
If life on Earth had to achieve n 'hard steps' to reach humanity's level, then the chance of this event rose as time to the n-th power. Integrating this over habitable star formation and planet lifetime distributions predicts >99% of advanced life appears after today, unless n<3 and max planet duration <50Gyr. That is, we seem early. We offer this explanation: a deadline is set by 'loud' aliens who are born according to a hard steps power law, expand at a common rate, change their volumes' appearances, and prevent advanced life like us from appearing in their volumes. 'Quiet' aliens, in contrast, are much harder to see. We fit this three-parameter model of loud aliens to data: 1) birth power from the number of hard steps seen in Earth history, 2) birth constant by assuming a uniform distribution over our rank among loud alien birth dates, and 3) expansion speed from our not seeing alien volumes in our sky. We estimate that loud alien civilizations now control 40-50% of universe volume, each will later control ~10^5 - 3x10^7 galaxies, and we could meet them in ~200Myr - 2Gyr. If loud aliens arise from quiet ones, a depressingly low transition chance (~10^-4) is required to expect that even one other quiet alien civilization has ever been active in our galaxy. Which seems bad news for SETI. But perhaps alien volume appearances are subtle, and their expansion speed lower, in which case we predict many long circular arcs to find in our sky.
23.
Automated Unit Test Generation for Python (arxiv.org)
Automated unit test generation is an established research field, and mature test generation tools exist for statically typed programming languages such as Java. It is, however, substantially more difficult to automatically generate supportive tests for dynamically typed programming languages such as Python, due to the lack of type information and the dynamic nature of the language. In this paper, we describe a foray into the problem of unit test generation for dynamically typed languages. We introduce Pynguin, an automated unit test generation framework for Python. Using Pynguin, we aim to empirically shed light on two central questions: (1) Do well-established search-based test generation methods, previously evaluated only on statically typed languages, generalise to dynamically typed languages? (2) What is the influence of incomplete type information and dynamic typing on the problem of automated test generation? Our experiments confirm that evolutionary algorithms can outperform random test generation also in the context of Python, and can even alleviate the problem of absent type information to some degree. However, our results demonstrate that dynamic typing nevertheless poses a fundamental issue for test generation, suggesting future work on integrating type inference.
24.
The Loopix Anonymity System (arxiv.org)
We present Loopix, a low-latency anonymous communication system that provides bi-directional 'third-party' sender and receiver anonymity and unobservability. Loopix leverages cover traffic and brief message delays to provide anonymity and achieve traffic analysis resistance, including against a global network adversary. Mixes and clients self-monitor the network via loops of traffic to provide protection against active attacks, and inject cover traffic to provide stronger anonymity and a measure of sender and receiver unobservability. Service providers mediate access in and out of a stratified network of Poisson mix nodes to facilitate accounting and off-line message reception, as well as to keep the number of links in the system low, and to concentrate cover traffic. We provide a theoretical analysis of the Poisson mixing strategy as well as an empirical evaluation of the anonymity provided by the protocol and a functional implementation that we analyze in terms of scalability by running it on AWS EC2. We show that a Loopix relay can handle upwards of 300 messages per second, at a small delay overhead of less than 1.5 ms on top of the delays introduced into messages to provide security. Overall message latency is in the order of seconds - which is low for a mix-system. Furthermore, many mix nodes can be securely added to a stratified topology to scale throughput without sacrificing anonymity.
25.
Featherweight Go (arxiv.org) Abstract ↓
We describe a design for generics in Go inspired by previous work on Featherweight Java by Igarashi, Pierce, and Wadler. Whereas subtyping in Java is nominal, in Go it is structural, and whereas generics in Java are defined via erasure, in Go we use monomorphisation. Although monomorphisation is widely used, we are one of the first to formalise it. Our design also supports a solution to The Expression Problem.
26.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (arxiv.org) Abstract ↓
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
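As a rough sketch of the idea, assuming the "standard pruning technique" is magnitude pruning and that surviving connections are reset to their original initial values: the weights below are made-up numbers, and a flat list stands in for a real network layer.

```python
def magnitude_mask(trained, keep_fraction):
    """Return a 0/1 mask keeping the largest-magnitude trained weights."""
    k = max(1, round(len(trained) * keep_fraction))
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]), reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(trained))]

def winning_ticket(initial, trained, keep_fraction):
    """Apply the mask learned from the trained weights to the INITIAL weights."""
    mask = magnitude_mask(trained, keep_fraction)
    return [w0 * m for w0, m in zip(initial, mask)], mask

initial = [0.5, -0.1, 0.3, 0.05, -0.4]   # hypothetical initialization
trained = [0.9, -0.02, 0.01, 0.6, -0.7]  # hypothetical weights after training
ticket, mask = winning_ticket(initial, trained, keep_fraction=0.6)
print(mask)
```

The candidate "winning ticket" is the sparse subnetwork defined by `mask`, started from the same initial values as the dense network and then trained in isolation.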
27.
Attention Is All You Need (arxiv.org) Abstract ↓
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
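The abstract does not spell out the attention mechanism; the sketch below shows scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python under the assumption of a single head and no masking. The tiny matrices are made-up values for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
print(result)
```

Every query is processed independently of the others, which is what makes the computation easy to parallelize compared with a recurrent model.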
28.
Breaking the Warp Barrier: Hyper-Fast Solitons in Einstein-Maxwell-Plasma Theory (arxiv.org) Abstract ↓
Solitons in space-time capable of transporting time-like observers at superluminal speeds have long been tied to violations of the weak, strong, and dominant energy conditions of general relativity. The negative-energy sources required for these solitons must be created through energy-intensive uncertainty principle processes as no such classical source is known in particle physics. This paper overcomes this barrier by constructing a class of soliton solutions that are capable of superluminal motion and sourced by purely positive energy densities. The solitons are also shown to be capable of being sourced from the stress-energy of a conducting plasma and classical electromagnetic fields. This is the first example of hyper-fast solitons resulting from known and familiar sources, reopening the discussion of superluminal mechanisms rooted in conventional physics.
29.
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities (arxiv.org) Abstract ↓
Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature. Such stories routinely reveal creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This paper is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and importance of evolutionary surprises extends beyond the natural world, and may indeed be a universal property of all complex evolving systems.
30.
A counterexample to the unit conjecture for group rings (arxiv.org) Abstract ↓
The unit conjecture, commonly attributed to Kaplansky, predicts that if $K$ is a field and $G$ is a torsion-free group then the only units of the group ring $K[G]$ are the trivial units, that is, the non-zero scalar multiples of group elements. We give a concrete counterexample to this conjecture; the group is virtually abelian and the field is order two.
31.
What if Planet 9 is a Primordial Black Hole? (arxiv.org) Abstract ↓
We highlight that the anomalous orbits of Trans-Neptunian Objects (TNOs) and an excess in microlensing events in the 5-year OGLE dataset can be simultaneously explained by a new population of astrophysical bodies with mass several times that of Earth ($M_\oplus$). We take these objects to be primordial black holes (PBHs) and point out the orbits of TNOs would be altered if one of these PBHs was captured by the Solar System, in line with the Planet 9 hypothesis. Capture of a free floating planet is a leading explanation for the origin of Planet 9 and we show that the probability of capturing a PBH instead is comparable. The observational constraints on a PBH in the outer Solar System significantly differ from the case of a new ninth planet. This scenario could be confirmed through annihilation signals from the dark matter microhalo around the PBH.
32.
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions (arxiv.org) Abstract ↓
There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes -- examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.
33.
Are Black Hole Starships Possible (arxiv.org) Abstract ↓
We investigate whether it is physically possible to build starships or power sources using the Hawking radiation of an artificial black hole as a power source. The proposal seems to be at the edge of possibility, but quantum gravity effects could change the picture.
34.
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning (arxiv.org) Abstract ↓
Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A "wake-sleep" learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience.
35.
Demonstration of Communication using Neutrinos (arxiv.org) Abstract ↓
Beams of neutrinos have been proposed as a vehicle for communications under unusual circumstances, such as direct point-to-point global communication, communication with submarines, secure communications and interstellar communication. We report on the performance of a low-rate communications link established using the NuMI beam line and the MINERvA detector at Fermilab. The link achieved a decoded data rate of 0.1 bits/sec with a bit error rate of 1% over a distance of 1.035 km, including 240 m of earth.
36.
Why AI is Harder Than We Think (arxiv.org) Abstract ↓
Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between periods of optimistic predictions and massive investment ("AI spring") and periods of disappointment, loss of confidence, and reduced funding ("AI winter"). Even with today's seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. One reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. In this paper I describe four fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field. I conclude by discussing the open questions spurred by these fallacies, including the age-old challenge of imbuing machines with humanlike common sense.
37.
On the Measure of Intelligence (arxiv.org) Abstract ↓
To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
38.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (arxiv.org) Abstract ↓
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\theta, \phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
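The "classic volume rendering" step can be sketched as front-to-back alpha compositing along one camera ray. This is a minimal illustration, assuming the densities and colors at each sample have already been produced (in NeRF they would come from the network; here they are hand-picked values), and the function name is hypothetical.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along a ray, front to back.

    densities: volume density at each sample
    colors:    RGB triple at each sample
    deltas:    distance between adjacent samples
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel

# Empty space, then an effectively opaque green surface, then blue behind it.
densities = [0.0, 1e9, 5.0]
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
deltas = [0.1, 0.1, 0.1]
pixel = render_ray(densities, colors, deltas)
print(pixel)
```

Each step is built from smooth operations (exponentials, products, sums), so gradients of the rendered pixel with respect to the densities and colors are well defined, which is what allows optimization from images alone.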
39.
Locally Testable Codes with constant rate, distance, and locality (arxiv.org) Abstract ↓
A locally testable code (LTC) is an error-correcting code that has a property-tester. The tester reads $q$ bits that are randomly chosen, and rejects words with probability proportional to their distance from the code. The parameter $q$ is called the locality of the tester. LTCs were initially studied as important components of PCPs, and since then the topic has evolved on its own. High rate LTCs could be useful in practice: before attempting to decode a received word, one can save time by first quickly testing if it is close to the code. An outstanding open question has been whether there exist "$c^3$-LTCs", namely LTCs with *c*onstant rate, *c*onstant distance, and *c*onstant locality. In this work we construct such codes based on a new two-dimensional complex which we call a left-right Cayley complex. This is essentially a graph which, in addition to vertices and edges, also has squares. Our codes can be viewed as a two-dimensional version of (the one-dimensional) expander codes, where the codewords are functions on the squares rather than on the edges.
40.
Who Can Find My Devices? Security and Privacy of Apple's Crowd-Sourced Bluetooth Location Tracking System (arxiv.org) Abstract ↓
Overnight, Apple has turned its hundreds-of-million-device ecosystem into the world's largest crowd-sourced location tracking network called offline finding (OF). OF leverages online finder devices to detect the presence of missing offline devices using Bluetooth and report an approximate location back to the owner via the Internet. While OF is not the first system of its kind, it is the first to commit to strong privacy goals. In particular, OF aims to ensure finder anonymity, untrackability of owner devices, and confidentiality of location reports. This paper presents the first comprehensive security and privacy analysis of OF. To this end, we recover the specifications of the closed-source OF protocols by means of reverse engineering. We experimentally show that unauthorized access to the location reports allows for accurate device tracking and retrieving a user's top locations with an error in the order of 10 meters in urban areas. While we find that OF's design achieves its privacy goals, we discover two distinct design and implementation flaws that can lead to a location correlation attack and unauthorized access to the location history of the past seven days, which could deanonymize users. Apple has partially addressed the issues following our responsible disclosure. Finally, we make our research artifacts publicly available.
41.
Waveguide-coupled Rydberg spectrum analyzer from 0 to 20 GHz (arxiv.org) Abstract ↓
We demonstrate an atomic radio-frequency (RF) receiver and spectrum analyzer based on thermal Rydberg atoms coupled to a planar microwave waveguide. We use an off-resonant RF heterodyne technique to achieve continuous operation for carrier frequencies ranging from DC to 20 GHz. The system achieves an intrinsic sensitivity of up to -120(2) dBm/Hz, DC coupling, 4 MHz instantaneous bandwidth, and over 80 dB of linear dynamic range. By connecting through a low-noise preamplifier, we demonstrate high-performance spectrum analysis with peak sensitivity of better than -145 dBm/Hz. Attaching a standard rabbit-ears antenna, the spectrum analyzer detects weak ambient signals including FM radio, AM radio, Wi-Fi, and Bluetooth. We also demonstrate waveguide-readout of the thermal Rydberg ensemble by non-destructively probing waveguide-atom interactions. The system opens the door for small, room-temperature, ensemble-based Rydberg sensors that surpass the sensitivity, bandwidth, and precision limitations of standard RF sensors, receivers, and analyzers.
42.
Forensic Issues and Techniques to Improve Security in SSD with Flex Capacity Feature (arxiv.org) Abstract ↓
Over-provisioning technology is typically introduced as a means to improve the performance of storage systems, such as databases. The over-provisioning area is both hidden and difficult for normal users to access. This paper focuses on attack models for such hidden areas. Malicious hackers use advanced over-provisioning techniques that vary capacity according to workload, and as such, our focus is on attack models that use variable over-provisioning technology. According to these attack models, it is possible to scan for invalid blocks containing original data or malware code that is hidden in the over-provisioning area. In this paper, we outline the different forensic processes performed for each memory cell type of the over-provisioning area and disclose security enhancement techniques that increase immunity to these attack models. This leads to a discussion of forensic possibilities and countermeasures for SSDs that can change the over-provisioning area. We also present information-hiding attacks and information-exposing attacks on the invalidation area of the SSD. Our research provides a good foundation upon which the performance and security of SSD-based databases can be further improved.
43.
A Polynomial time Algorithm for Hamilton Cycle with maximum Degree 3, 3SAT (arxiv.org) Abstract ↓
Based on the famous Rotation-Extension technique, by creating the new concepts and methods: broad cycle, main segment, useful cut and insert, destroying edges for a main segment, main goal Hamilton cycle, depth-first search tree, we develop a polynomial time algorithm for a famous NPC: the Hamilton cycle problem. Thus we proved that NP=P. The key points of this paper are: 1) there are two ways to get a Hamilton cycle in exponential time: a full permutation of n vertices; or, choose n edges from all k edges, and check all possible combinations. The main problem is: how to avoid checking all combinations of n edges from all edges. My algorithm can avoid this. Lemma 1 and lemma 2 are very important. They are the foundation that we always can get a good branch in the depth-first search tree and can get a series of destroying edges (all are bad edges) for this good branch in polynomial time. The extraordinary insights are: destroying edges, a tree contains each main segment at most one time at the same time, and dynamic combinations. The difficult part is to understand how to construct a main segment's series of destroying edges by dynamic combinations. The proof logic is: if there is at least one Hamilton cycle in the graph, we always can do useful cut and inserts until a Hamilton cycle is obtained. The number of useful cut and inserts is polynomial. So if at any step we cannot have a useful cut and insert, this means that there are no Hamilton cycles in the graph. In this version, I add a detailed polynomial time algorithm and proof for 3SAT.
44.
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead (arxiv.org) Abstract ↓
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward -- it is to design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.
45.
The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion (arxiv.org) Abstract ↓
Online tracking is a whack-a-mole game between trackers who build and monetize behavioral user profiles through intrusive data collection, and anti-tracking mechanisms, deployed as a browser extension, built-in to the browser, or as a DNS resolver. As a response to pervasive and opaque online tracking, more and more users adopt anti-tracking tools to preserve their privacy. Consequently, as the information that trackers can gather on users is being curbed, some trackers are looking for ways to evade these tracking countermeasures. In this paper we report on a large-scale longitudinal evaluation of an anti-tracking evasion scheme that leverages CNAME records to include tracker resources in a same-site context, effectively bypassing anti-tracking measures that use fixed hostname-based block lists. Using historical HTTP Archive data we find that this tracking scheme is rapidly gaining traction, especially among high-traffic websites. Furthermore, we report on several privacy and security issues inherent to the technical setup of CNAME-based tracking that we detected through a combination of automated and manual analyses. We find that some trackers are using the technique against the Safari browser, which is known to include strict anti-tracking configurations. Our findings show that websites using CNAME trackers must take extra precautions to avoid leaking sensitive information to third parties.
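To make the evasion concrete, here is a small sketch of why a fixed hostname-based block list misses a CNAME-cloaked tracker, alongside a check that also inspects the CNAME chain. All hostnames, the block list, and the in-memory record table are hypothetical; a real tool would query DNS, and the chain-aware check is only one possible countermeasure, not one the abstract prescribes.

```python
BLOCKLIST = {"tracker.example"}  # hypothetical blocked tracker domain

# Hypothetical DNS data: a first-party subdomain aliased to a tracker.
CNAME_RECORDS = {"metrics.shop.example": "collect.tracker.example"}

def is_blocked(host, blocklist):
    """True if host, or any parent domain of host, is on the block list."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def resolve_chain(host, cname_records):
    """Follow CNAME aliases from host, guarding against loops."""
    chain = [host]
    seen = {host}
    while chain[-1] in cname_records:
        target = cname_records[chain[-1]]
        if target in seen:
            break
        chain.append(target)
        seen.add(target)
    return chain

def is_blocked_cname_aware(host, blocklist, cname_records):
    """Block if any hostname along the CNAME chain is on the block list."""
    return any(is_blocked(h, blocklist) for h in resolve_chain(host, cname_records))

host = "metrics.shop.example"
print(is_blocked(host, BLOCKLIST))                             # hostname-only check
print(is_blocked_cname_aware(host, BLOCKLIST, CNAME_RECORDS))  # chain-aware check
```

The request looks same-site to the browser because the visible hostname belongs to the visited site, even though it resolves to the tracker's infrastructure.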
46.
Spectre is here to stay: An analysis of side-channels and speculative execution (arxiv.org) Abstract ↓
The recent discovery of the Spectre and Meltdown attacks represents a watershed moment not just for the field of Computer Security, but also of Programming Languages. This paper explores speculative side-channel attacks and their implications for programming languages. These attacks leak information through micro-architectural side-channels which we show are not mere bugs, but in fact lie at the foundation of optimization. We identify three open problems, (1) finding side-channels, (2) understanding speculative vulnerabilities, and (3) mitigating them. For (1) we introduce a mathematical meta-model that clarifies the source of side-channels in simulations and CPUs. For (2) we introduce an architectural model with speculative semantics to study recently-discovered vulnerabilities. For (3) we explore and evaluate software mitigations and prove one correct for this model. Our analysis is informed by extensive offensive research and defensive implementation work for V8, the production JavaScript virtual machine in Chrome. Straightforward extensions to model real hardware suggest these vulnerabilities present formidable challenges for effective, efficient mitigation. As a result of our work, we now believe that speculative vulnerabilities on today's hardware defeat all language-enforced confidentiality with no known comprehensive software mitigations, as we have discovered that untrusted code can construct a universal read gadget to read all memory in the same address space through side-channels. In the face of this reality, we have shifted the security model of the Chrome web browser and V8 to process isolation.
47.
Deep Image Prior (arxiv.org) Abstract ↓
Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity. Code and supplementary material are available at https://dmitryulyanov.github.io/deep_image_prior .
48.
FAQBism (arxiv.org) Abstract ↓
We answer several questions that have been Frequently Asked about QBism. These remarks (many of them lighthearted) should be considered supplements to more systematic treatments by the authors and others.
49.
Experimental loophole-free violation of a Bell inequality using entangled electron spins separated by 1.3 km (arxiv.org) Abstract ↓
For more than 80 years, the counterintuitive predictions of quantum theory have stimulated debate about the nature of reality. In his seminal work, John Bell proved that no theory of nature that obeys locality and realism can reproduce all the predictions of quantum theory. Bell showed that in any local realist theory the correlations between distant measurements satisfy an inequality and, moreover, that this inequality can be violated according to quantum theory. This provided a recipe for experimental tests of the fundamental principles underlying the laws of nature. In the past decades, numerous ingenious Bell inequality tests have been reported. However, because of experimental limitations, all experiments to date required additional assumptions to obtain a contradiction with local realism, resulting in loopholes. Here we report on a Bell experiment that is free of any such additional assumption and thus directly tests the principles underlying Bell's inequality. We employ an event-ready scheme that enables the generation of high-fidelity entanglement between distant electron spins. Efficient spin readout avoids the fair sampling assumption (detection loophole), while the use of fast random basis selection and readout combined with a spatial separation of 1.3 km ensure the required locality conditions. We perform 245 trials testing the CHSH-Bell inequality $S \leq 2$ and find $S = 2.42 \pm 0.20$. A null hypothesis test yields a probability of $p = 0.039$ that a local-realist model for space-like separated sites produces data with a violation at least as large as observed, even when allowing for memory in the devices. This result rules out large classes of local realist theories, and paves the way for implementing device-independent quantum-secure communication and randomness certification.
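For context on the bound $S \leq 2$, the sketch below evaluates the CHSH quantity for the textbook singlet-state correlation E(a, b) = -cos(a - b) at the standard optimal measurement angles. This is an idealized illustration, not the analysis used in the experiment; the angle choice and function names are assumptions.

```python
import math

def correlation(a, b):
    """Quantum prediction for spin correlations of an ideal singlet state."""
    return -math.cos(a - b)

def chsh(a, a_prime, b, b_prime):
    """CHSH combination; any local realist theory gives a value of at most 2."""
    return abs(correlation(a, b) - correlation(a, b_prime)
               + correlation(a_prime, b) + correlation(a_prime, b_prime))

# Standard angle settings that maximize the quantum value.
S = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(S)  # the quantum maximum, 2 * sqrt(2)
```

The ideal quantum value of about 2.83 exceeds the local realist bound of 2; the reported $S = 2.42 \pm 0.20$ lies between the two.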
50.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arxiv.org) Abstract ↓
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability -- we address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. Our proposed training techniques help wrangle the instabilities and we show large sparse models may be trained, for the first time, with lower precision (bfloat16) formats. We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources. These improvements extend into multilingual settings where we measure gains over the mT5-Base version across all 101 languages. Finally, we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.
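A minimal sketch of the simplified routing, assuming each token is sent to the single expert with the highest router probability and that the expert output is scaled by that probability. The router weights and the two toy "experts" are hypothetical stand-ins for learned feed-forward layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(token, router_weights):
    """Return (expert_index, gate_value) for top-1 routing."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]

def switch_layer(token, router_weights, experts):
    """Run only the selected expert and scale its output by the gate value."""
    idx, gate = switch_route(token, router_weights)
    return [gate * y for y in experts[idx](token)]

# Hypothetical experts and router parameters.
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
router_weights = [[0.0, 1.0], [3.0, 0.0]]
token = [1.0, 0.0]

idx, gate = switch_route(token, router_weights)
output = switch_layer(token, router_weights, experts)
print(idx, gate, output)
```

Only one expert runs per token, so adding experts grows the parameter count while the per-token compute stays roughly constant.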
51.
Gone in Six Characters: Short URLs Considered Harmful for Cloud Services (arxiv.org) Abstract ↓
Modern cloud services are designed to encourage and support collaboration. To help users share links to online documents, maps, etc., several services, including cloud storage providers such as Microsoft OneDrive and mapping services such as Google Maps, directly integrate URL shorteners that convert long, unwieldy URLs into short URLs, consisting of a domain such as 1drv.ms or goo.gl and a short token. In this paper, we demonstrate that the space of 5- and 6-character tokens included in short URLs is so small that it can be scanned using brute-force search. Therefore, all online resources that were intended to be shared with a few trusted friends or collaborators are effectively public and can be accessed by anyone. This leads to serious security and privacy vulnerabilities. In the case of cloud storage, we focus on Microsoft OneDrive. We show how to use short-URL enumeration to discover and read shared content stored in the OneDrive cloud, including even files for which the user did not generate a short URL. 7% of the OneDrive accounts exposed in this fashion allow anyone to write into them. Since cloud-stored files are automatically copied into users' personal computers and devices, this is a vector for large-scale, automated malware injection. In the case of online maps, we show how short-URL enumeration reveals the directions that users shared with each other. For many individual users, this enables inference of their residential addresses, true identities, and extremely sensitive locations they visited that, if publicly revealed, would violate medical and financial privacy.
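A back-of-the-envelope calculation shows why such token spaces are scannable. The sketch assumes a 62-character alphabet (a-z, A-Z, 0-9) and a hypothetical request rate; neither figure is taken from the abstract.

```python
ALPHABET_SIZE = 62  # assumption: lowercase, uppercase, and digits

def token_space(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct tokens of the given length."""
    return alphabet_size ** length

def scan_hours(length, requests_per_second):
    """Hours needed to try every token at a fixed request rate."""
    return token_space(length) / requests_per_second / 3600

five = token_space(5)
six = token_space(6)
print(five, six)
print(scan_hours(5, requests_per_second=1000))  # hypothetical rate
```

Under these assumptions the 5-character space is under a billion tokens, which a modest number of parallel clients can enumerate in days rather than years.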
52.
Possible superconductivity in brain (arxiv.org) Abstract ↓
The unprecedented power of the brain suggests that it may process information quantum-mechanically. Since quantum processing is already achieved in superconducting quantum computers, it may imply that superconductivity is the basis of quantum computation in the brain too. Superconductivity could also be responsible for long-term memory. Following these ideas, the paper reviews the progress in the search for superconductors with high critical temperature and tries to answer the question of superconductivity in the brain. It focuses on recent electrical measurements of brain slices, in which graphene was used as a room-temperature quantum mediator, and argues that these measurements could be interpreted as providing evidence of superconductivity in the neural network of mammalian brains. The estimated critical temperature of the superconducting network in the brain is rather high: 2063 ± 114 K. A similar critical temperature was predicted in Little's model for one-dimensional organic chains linked to certain molecular complexes. A reasonable suggestion is that superconductivity develops in microtubules inside the neurons of the brain.
53.
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (arxiv.org) Abstract ↓
Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to $1.5$cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.
54.
Bitcoin, Currencies, and Fragility (arxiv.org) Abstract ↓
This discussion applies quantitative finance methods and economic arguments to cryptocurrencies in general and bitcoin in particular -- as there are about $10,000$ cryptocurrencies, we focus (unless otherwise specified) on the most discussed crypto of those that claim to hew to the original protocol (Nakamoto 2009) and the one with, by far, the largest market capitalization. In its current version, in spite of the hype, bitcoin failed to satisfy the notion of "currency without government" (it proved to not even be a currency at all), can be neither a short nor long term store of value (its expected value is no higher than $0$), cannot operate as a reliable inflation hedge, and, worst of all, does not constitute, not even remotely, a safe haven for one's investments, a shield against government tyranny, or a tail protection vehicle for catastrophic episodes. Furthermore, bitcoin promoters appear to conflate the success of a payment mechanism (as a decentralized mode of exchange), which so far has failed, with the speculative variations in the price of a zero-sum maximally fragile asset with massive negative externalities. Going through monetary history, we show how a true numeraire must be one of minimum variance with respect to an arbitrary basket of goods and services, how gold and silver lost their inflation hedge status during the Hunt brothers squeeze in the late 1970s and what would be required from a true inflation hedged store of value.
55.
Quantum principle of relativity (arxiv.org) Abstract ↓
We show that the local and deterministic mode of description is not only in conflict with the quantum theory, but also with relativity. We argue that elementary relativistic properties of spacetime lead to the emergence of a non-deterministic quantum-mechanical picture involving quantum superpositions and complex probability amplitudes.
56.
Ribbon filter: practically smaller than Bloom and Xor (arxiv.org) Abstract ↓
Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound. We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10% with some additional CPU time. An experimental Ribbon design with load balancing can even achieve space overheads below 1%. A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
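The space figures in this abstract are relative to the bound of $\log_2(1/f)$ bits per key. As a small worked illustration, the sketch below computes that bound and the bits per key implied by a given fractional space overhead; the function names are made up for this example and are not taken from any Ribbon implementation.

```python
import math


def lower_bound_bits_per_key(f):
    """Information-theoretic minimum bits per key for false positive rate f."""
    if not 0 < f <= 1:
        raise ValueError("f must lie in (0, 1]")
    return math.log2(1 / f)


def bits_per_key(f, overhead):
    """Bits per key for a filter whose space overhead is `overhead` (0.10 = 10%)."""
    return (1 + overhead) * lower_bound_bits_per_key(f)


if __name__ == "__main__":
    f = 2 ** -7
    for overhead in (0.20, 0.10, 0.01):
        print(f"overhead {overhead:.0%}: {bits_per_key(f, overhead):.2f} bits/key")
```

For $f = 2^{-7}$ the bound is 7 bits per key, so overheads of 20%, 10% and 1% are differences of well under two bits per key, which adds up over very large static sets.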
57.
One SQL to Rule Them All (arxiv.org) Abstract ↓
Real-time data analysis and management are increasingly critical for today's businesses. SQL is the de facto lingua franca for these endeavors, yet support for robust streaming analysis and management with SQL remains limited. Many approaches restrict semantics to a reduced subset of features and/or require a suite of non-standard constructs. Additionally, use of event timestamps to provide native support for analyzing events according to when they actually occurred is not pervasive, and often comes with important limitations. We present a three-part proposal for integrating robust streaming into the SQL standard, namely: (1) time-varying relations as a foundation for classical tables as well as streaming data, (2) event time semantics, (3) a limited set of optional keyword extensions to control the materialization of time-varying query results. Motivated and illustrated using examples and lessons learned from implementations in Apache Calcite, Apache Flink, and Apache Beam, we show how with these minimal additions it is possible to utilize the complete suite of standard SQL semantics to perform robust stream processing.
58.
Zero-Shot Text-to-Image Generation (arxiv.org) Abstract ↓
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
59.
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (arxiv.org) Abstract ↓
Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems including lexical, sparse, dense, late-interaction and re-ranking architectures on the BEIR benchmark. Our results show that BM25 is a robust baseline and that re-ranking and late-interaction-based models on average achieve the best zero-shot performance, albeit at high computational cost. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards more robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
60.
Probing the Mystery of Cryptocurrency Theft: An Investigation into Methods for Taint Analysis (arxiv.org) Abstract ↓
Since the creation of Bitcoin, transaction tracking is one of the prominent means for following the movement of Bitcoins involved in illegal activities. Although every Bitcoin transaction is recorded in the blockchain database, which is transparent for anyone to observe and analyse, Bitcoin's pseudonymity system and transaction obscuring techniques still allow criminals to disguise their transaction trail. While there have been a few attempts to develop tracking methods, there is no accepted evaluation method to measure their accuracy. Therefore, this paper investigates strategies for transaction tracking by introducing two new tainting methods, and proposes an address profiling approach with a metrics-based evaluation framework. We use our approach and framework to compare the accuracy of our new tainting methods with the previous tainting techniques, using data from two real Bitcoin theft transactions and several related control transactions.
61.
Concrete Problems in AI Safety (arxiv.org) Abstract ↓
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
62.
Direct Multipixel Imaging and Spectroscopy of an Exoplanet with a Solar Gravity Lens Mission (arxiv.org) Abstract ↓
The remarkable optical properties of the solar gravitational lens (SGL) include major brightness amplification (~1e11 at wavelength of 1 um) and extreme angular resolution (~1e-10 arcsec) in a narrow field of view. A mission to the SGL carrying a modest telescope and coronagraph opens up a possibility for direct megapixel imaging and high-resolution spectroscopy of a habitable Earth-like exoplanet at a distance of up to 100 light years. The entire image of such a planet is compressed by the SGL into a region with a diameter of ~1.3 km in the vicinity of the focal line. The telescope, acting as a single pixel detector while traversing this region, can build an image of the exoplanet with kilometer-scale resolution of its surface, enough to see its surface features and signs of habitability. We report here on the results of our initial study of a mission to the deep outer regions of our solar system, with the primary mission objective of conducting direct megapixel high-resolution imaging and spectroscopy of a potentially habitable exoplanet by exploiting the remarkable optical properties of the SGL. Our main goal was to investigate what it takes to operate spacecraft at such enormous distances with the needed precision. Specifically, we studied i) how a space mission to the focal region of the SGL may be used to obtain high-resolution direct imaging and spectroscopy of an exoplanet by detecting, tracking, and studying the Einstein ring around the Sun, and ii) how such information could be used to detect signs of life on another planet. Our results indicate that a mission to the SGL with an objective of direct imaging and spectroscopy of a distant exoplanet is challenging, but possible. We composed a list of recommendations on the mission architectures with risk and return tradeoffs and discuss an enabling technology development program.
63.
Do ImageNet Classifiers Generalize to ImageNet? (arxiv.org) Abstract ↓
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
64.
Deep Residual Learning for Image Recognition (arxiv.org) Abstract ↓
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
65.
KML: Using Machine Learning to Improve Storage Systems (arxiv.org) Abstract ↓
Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users -- essentially burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O heavy applications, so even a small overall latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this paper, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two problems: optimal readahead and NFS read-size values. Our experiments show that KML consumes little OS resources, adds negligible latency, and yet can learn patterns that can improve I/O throughput by as much as 2.3x or 15x for the two use cases respectively -- even for complex, never-before-seen, concurrently running mixed workloads on different storage devices.
66.
On the power of Chatterjee rank correlation (arxiv.org) Abstract ↓
Chatterjee (2021) introduced a simple new rank correlation coefficient that has attracted much recent attention. The coefficient has the unusual appeal that it not only estimates a population quantity first proposed by Dette et al. (2013) that is zero if and only if the underlying pair of random variables is independent, but also is asymptotically normal under independence. This paper compares Chatterjee's new correlation coefficient to three established rank correlations that also facilitate consistent tests of independence, namely, Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$. We contrast their computational efficiency in light of recent advances, and investigate their power against local rotation and mixture alternatives. Our main results show that Chatterjee's coefficient is unfortunately rate sub-optimal compared to $D$, $R$, and $\tau^*$. The situation is more subtle for a related earlier estimator of Dette et al. (2013). These results favor $D$, $R$, and $\tau^*$ over Chatterjee's new correlation coefficient for the purpose of testing independence.
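For readers unfamiliar with the coefficient under discussion, here is a minimal sketch of Chatterjee's statistic in the no-ties case: sort the pairs by $x$, take the ranks $r_i$ of the corresponding $y$ values, and compute $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i| / (n^2-1)$. The function name is illustrative, and the code assumes all $x$ values and all $y$ values are distinct; it does not implement the tie-handling variant or any of the independence tests compared in the paper.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation, assuming no ties in x or in y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch assumes no ties")

    # Reorder the y values so that the paired x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]

    # Rank of each y value among all y values (1 = smallest).
    rank_of = {value: rank for rank, value in enumerate(sorted(y), start=1)}
    ranks = [rank_of[value] for value in y_sorted]

    # Sum of absolute rank jumps between consecutive points.
    total_jump = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1 - 3 * total_jump / (n * n - 1)


if __name__ == "__main__":
    xs = list(range(1, 101))
    print(chatterjee_xi(xs, [v * v for v in xs]))        # monotone dependence
    print(chatterjee_xi(xs, [(v * 37) % 101 for v in xs]))  # scrambled values
```

For a perfectly monotone relationship the statistic equals $1 - 3/(n+1)$, so it approaches 1 only as $n$ grows. Each evaluation needs only sorting, which is part of the coefficient's appeal.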
67.
The Physics of Financial Networks (arxiv.org) Abstract ↓
The field of Financial Networks is a paramount example of the novel applications of Statistical Physics that have been made possible by the present data revolution. As the total value of the global financial market has vastly outgrown the value of the real economy, financial institutions on this planet have created a web of interactions whose size and topology call for a quantitative analysis by means of Complex Networks. Financial Networks are not only a playground for the use of basic tools of statistical physics such as ensemble representation and entropy maximization; rather, their particular dynamics and evolution have triggered theoretical advancements such as the definition of DebtRank to measure the impact and diffusion of shocks in the whole system. In this review we present the state of the art in this field, starting from the different definitions of financial networks (based either on loans, on asset ownership, on contracts involving several parties -- such as credit default swaps, to multiplex representation when firms are introduced in the game and a link with the real economy is drawn) and then discussing the various dynamics of financial contagion as well as applications in financial network inference and validation. We believe that this analysis is particularly timely since financial stability as well as recent innovations in climate finance, once properly analysed and understood in terms of complex network theory, can play a pivotal role in the transformation of our society towards a more sustainable world.
68.
Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges (arxiv.org) Abstract ↓
Blockchains, and specifically smart contracts, have promised to create fair and transparent trading ecosystems. Unfortunately, we show that this promise has not been met. We document and quantify the widespread and rising deployment of arbitrage bots in blockchain systems, specifically in decentralized exchanges (or "DEXes"). Like high-frequency traders on Wall Street, these bots exploit inefficiencies in DEXes, paying high transaction fees and optimizing network latency to frontrun, i.e., anticipate and exploit, ordinary users' DEX trades. We study the breadth of DEX arbitrage bots in a subset of transactions that yield quantifiable revenue to these bots. We also study bots' profit-making strategies, with a focus on blockchain-specific elements. We observe bots engage in what we call priority gas auctions (PGAs), competitively bidding up transaction fees in order to obtain priority ordering, i.e., early block position and execution, for their transactions. PGAs present an interesting and complex new continuous-time, partial-information, game-theoretic model that we formalize and study. We release an interactive web portal, http://frontrun.me/, to provide the community with real-time data on PGAs. We additionally show that high fees paid for priority transaction ordering poses a systemic risk to consensus-layer security. We explain that such fees are just one form of a general phenomenon in DEXes and beyond---what we call miner extractable value (MEV)---that poses concrete, measurable, consensus-layer security risks. We show empirically that MEV poses a realistic threat to Ethereum today. Our work highlights the large, complex risks created by transaction-ordering dependencies in smart contracts and the ways in which traditional forms of financial-market exploitation are adapting to and penetrating blockchain economies.
69.
Categorizing Variants of Goodhart's Law (arxiv.org) Abstract ↓
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
70.
Test of lepton universality in beauty-quark decays (arxiv.org) Abstract ↓
The Standard Model of particle physics currently provides our best description of fundamental particles and their interactions. The theory predicts that the different charged leptons, the electron, muon and tau, have identical electroweak interaction strengths. Previous measurements have shown a wide range of particle decays are consistent with this principle of lepton universality. This article presents evidence for the breaking of lepton universality in beauty-quark decays, with a significance of 3.1 standard deviations, based on proton-proton collision data collected with the LHCb detector at CERN's Large Hadron Collider. The measurements are of processes in which a beauty meson transforms into a strange meson with the emission of either an electron and a positron, or a muon and an antimuon. If confirmed by future measurements, this violation of lepton universality would imply physics beyond the Standard Model, such as a new fundamental interaction between quarks and leptons.
71.
Organic single-photon switch (arxiv.org) Abstract ↓
The recent progress in nanotechnology [1,2] and single-molecule spectroscopy [3-5] paves the way for cost-effective organic quantum optical technologies, with the promise of real-life devices operating at ambient conditions. In this letter, we harness $\pi$-conjugated segments of an organic ladder-type polymer strongly coupled to a microcavity forming correlated collective dressed states of light, so-called exciton-polariton condensates. We explore an efficient way for all-optical ultra-fast control over the macroscopic condensate wavefunction via a single photon. Obeying Bose statistics, exciton-polaritons exhibit an extreme nonlinearity undergoing bosonic stimulation [6] which we have managed to trigger at the single-photon level. Relying on the nature of organic matter to sustain stable excitons dressed with high energy molecular vibrations, we have developed a principle that allows for single-photon nonlinearity operation at ambient conditions, opening the door for practical implementations like sub-picosecond switching, amplification and all-optical logic at the fundamental limit of single light quanta.
72.
High-Performance Large-Scale Image Recognition Without Normalization (arxiv.org) Abstract ↓
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
73.
Polar Stroking: New Theory and Methods for Stroking Paths (arxiv.org) Abstract ↓
Stroking and filling are the two basic rendering operations on paths in vector graphics. The theory of filling a path is well-understood in terms of contour integrals and winding numbers, but when path rendering standards specify stroking, they resort to the analogy of painting pixels with a brush that traces the outline of the path. This means important standards such as PDF, SVG, and PostScript lack a rigorous way to say what samples are inside or outside a stroked path. Our work fills this gap with a principled theory of stroking. Guided by our theory, we develop a novel polar stroking method to render stroked paths robustly with an intuitive way to bound the tessellation error without needing recursion. Because polar stroking guarantees small uniform steps in tangent angle, it provides an efficient way to accumulate arc length along a path for texturing or dashing. While this paper focuses on developing the theory of our polar stroking method, we have successfully implemented our methods on modern programmable GPUs.
74.
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone (arxiv.org) Abstract ↓
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.
75.
Deep Learning and Mathematical Intuition: A Review of (Davies et al. 2021) (arxiv.org) Abstract ↓
A recent paper by Davies et al. (2021) describes how deep learning (DL) technology was used to find plausible hypotheses that have led to two original mathematical results: one in knot theory, one in representation theory. I argue here that the significance and novelty of this application of DL technology to mathematics is significantly overstated in the paper under review and has been wildly overstated in some of the accounts in the popular science press. In the knot theory result, the role of DL was small, and a conventional statistical analysis would probably have sufficed. In the representation theory result, the role of DL is much larger; however, it is not very different in kind from what has been done in experimental mathematics for decades. Moreover, it is not clear whether the distinctive features of DL that make it useful here will apply across a wide range of mathematical problems. Finally, I argue that the claim that the DL here "guides human intuition" is unhelpful and misleading; what the DL primarily does is to mark many possible conjectures as false and a few others as possibly worthy of study. Certainly the representation theory result represents an original and interesting application of DL to mathematical research, but its larger significance is uncertain.
76.
Neural Language Modeling for Contextualized Temporal Graph Generation (arxiv.org) Abstract ↓
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document. Despite the huge success of neural pre-training methods in NLP tasks, its potential for temporal reasoning over event graphs has not been sufficiently explored. Part of the reason is the difficulty in obtaining large training corpora with human-annotated events and temporal links. We address this challenge by using existing IE/NLP tools to automatically generate a large quantity (89,000) of system-produced document-graph pairs, and propose a novel formulation of the contextualized graph generation problem as a sequence-to-sequence mapping task. These strategies enable us to leverage and fine-tune pre-trained language models on the system-induced training data for the graph generation task. Our experiments show that our approach is highly effective in generating structurally and semantically valid graphs. Further, evaluation on a challenging hand-labeled, out-domain corpus shows that our method outperforms the closest existing method by a large margin on several metrics. Code and pre-trained models are available at https://github.com/madaan/temporal-graph-gen.
77.
Containment Control for a Social Network with State-Dependent Connectivity (arxiv.org) Abstract ↓
Social interactions influence our thoughts, opinions and actions. In this paper, social interactions are studied within a group of individuals composed of influential social leaders and followers. Each person is assumed to maintain a social state, which can be an emotional state or an opinion. Followers update their social states based on the states of local neighbors, while social leaders maintain a constant desired state. Social interactions are modeled as a general directed graph where each directed edge represents an influence from one person to another. Motivated by the non-local property of fractional-order systems, the social response of individuals in the network are modeled by fractional-order dynamics whose states depend on influences from local neighbors and past experiences. A decentralized influence method is then developed to maintain existing social influence between individuals (i.e., without isolating peers in the group) and to influence the social group to a common desired state (i.e., within a convex hull spanned by social leaders). Mittag-Leffler stability methods are used to prove asymptotic stability of the networked fractional-order system.
78.
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (arxiv.org) Abstract ↓
A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
79.
A New Logic For Uncertainty (arxiv.org) Abstract ↓
Fuzziness and randomness are widespread in natural science, engineering, technology and social science. The purpose of this paper is to present a new logic, uncertain propositional logic, which can deal with both fuzziness, by taking truth-value semantics, and randomness, by taking probabilistic or possibility semantics. As a first step toward establishing a logic system that completely reflects the uncertainty of the objective world, this logic will lead to a set of logical foundations for uncertainty theory, as classical logic has done for certain or definite situations.
80.
Closing the "Quantum Supremacy" Gap: Achieving Real-Time Simulation of a Random Quantum Circuit Using a New Sunway Supercomputer (arxiv.org) Abstract ↓
We develop a high-performance tensor-based simulator for random quantum circuits (RQCs) on the new Sunway supercomputer. Our major innovations include: (1) a near-optimal slicing scheme, and a path-optimization strategy that considers both complexity and compute density; (2) a three-level parallelization scheme that scales to about 42 million cores; (3) a fused permutation and multiplication design that improves the compute efficiency for a wide range of tensor contraction scenarios; and (4) a mixed-precision scheme to further improve the performance. Our simulator effectively expands the scope of simulatable RQCs to include the 10×10 (qubits) × (1+40+1) (depth) circuit, with a sustained performance of 1.2 Eflops (single-precision), or 4.4 Eflops (mixed-precision), as a new milestone for classical simulation of quantum circuits; and reduces the simulation sampling time of Google Sycamore to 304 seconds, from the previously claimed 10,000 years.
81.
Real numbers, data science and chaos: How to fit any dataset with a single parameter (arxiv.org) Abstract ↓
We show how any dataset of any modality (time-series, images, sound...) can be approximated by a well-behaved (continuous, differentiable...) scalar function with a single real-valued parameter. Building upon elementary concepts from chaos theory, we adopt a pedagogical approach demonstrating how to adjust this parameter in order to achieve arbitrary precision fit to all samples of the data. Targeting an audience of data scientists with a taste for the curious and unusual, the results presented here expand on previous similar observations regarding expressiveness power and generalization of machine learning models.
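The idea of packing a whole dataset into one real number can be illustrated with a simplified sketch. This is not the paper's construction: the abstract does not give the function, so the sketch below assumes the plain dyadic shift map x -> 2x mod 1 (a textbook chaotic map, which is not continuous) and exact rational arithmetic. The names `BITS`, `encode` and `decode` are hypothetical.

```python
from fractions import Fraction

BITS = 8  # assumed precision per sample; more bits give a tighter fit


def encode(samples):
    """Pack samples in [0, 1) into a single rational parameter, BITS bits each."""
    alpha = Fraction(0)
    for i, s in enumerate(samples):
        q = int(s * 2**BITS)  # quantize the sample to BITS bits (floor)
        alpha += Fraction(q, 2 ** (BITS * (i + 1)))
    return alpha


def decode(alpha, index):
    """Recover sample `index` by applying x -> 2x mod 1 exactly BITS * index times."""
    shifted = (alpha * 2 ** (BITS * index)) % 1  # the shifts expose the next block of bits
    return int(shifted * 2**BITS) / 2**BITS


if __name__ == "__main__":
    data = [0.5, 0.25, 0.1, 0.9]
    alpha = encode(data)
    print([decode(alpha, i) for i in range(len(data))])
```

Each decoded value differs from the original sample by less than 2^-BITS, so raising `BITS` lengthens the parameter and tightens the fit, which is the sense in which a single parameter can reach arbitrary precision.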
82.
Is 40 the new 60? How popular media portrays the employability of older software developers (arxiv.org) Abstract ↓
Alerted by our previous research as well as media reports and discussions in online forums about ageism in the software industry, we set out to study the public discourse around age and software development. With a focus on the USA, we analyzed popular online articles and related discussions on Hacker News through the lens of (perceived) employability issues and potential mitigation strategies. Besides rather controversial strategies such as disguising age-related aspects in résumés or undergoing plastic surgeries to appear young, we highlight the importance of keeping up-to-date, specializing in certain tasks or technologies, and present role transitions as a way forward for veteran developers. With this article, we want to build awareness among decision makers in software projects to help them anticipate and mitigate challenges that their older employees may face.
83.
Neural Tangent Kernel: Convergence and Generalization in Neural Networks (arxiv.org) Abstract ↓
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function $f_\theta$ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function $f_\theta$ follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.
84.
Unique on Facebook: Formulation and Evidence of (Nano)targeting Individual Users with non-PII Data (arxiv.org) Abstract ↓
The privacy of an individual is bounded by the ability of a third party to reveal their identity. Certain data items such as a passport ID or a mobile phone number may be used to uniquely identify a person. These are referred to as Personal Identifiable Information (PII) items. Previous literature has also reported that, in datasets including millions of users, a combination of several non-PII items (which alone are not enough to identify an individual) can uniquely identify an individual within the dataset. In this paper, we define a data-driven model to quantify the number of interests from a user that make them unique on Facebook. To the best of our knowledge, this represents the first study of individuals' uniqueness at the world population scale. Besides, users' interests are actionable non-PII items that can be used to define ad campaigns and deliver tailored ads to Facebook users. We run an experiment through 21 Facebook ad campaigns that target three of the authors of this paper to prove that, if an advertiser knows enough interests from a user, the Facebook Advertising Platform can be systematically exploited to deliver ads exclusively to a specific user. We refer to this practice as nanotargeting. Finally, we discuss the harmful risks associated with nanotargeting such as psychological persuasion, user manipulation, or blackmailing, and provide easily implementable countermeasures to preclude attacks based on nanotargeting campaigns on Facebook.
85.
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition (arxiv.org) Abstract ↓
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.
86.
Score-Based Generative Modeling through Stochastic Differential Equations (arxiv.org) Abstract ↓
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
87.
Unexpected novel Merbecovirus discoveries in agricultural sequencing datasets from Wuhan, China (arxiv.org) Abstract ↓
In this study we document the unexpected discovery of multiple coronaviruses and a BSL-3 pathogen in agricultural cotton and rice sequencing datasets. In particular, we have identified a novel HKU5-related Merbecovirus in a cotton dataset sequenced by the Huazhong Agricultural University in 2017. We have also found an infectious clone sequence containing a novel HKU4-related Merbecovirus related to MERS coronavirus in a rice dataset sequenced by the Huazhong Agricultural University in early 2020. Another HKU5-related Merbecovirus, as well as Japanese encephalitis virus, were identified in a cotton dataset sequenced by the Huazhong Agricultural University in 2018. An HKU3-related Betacoronavirus was found in a Mus musculus sequencing dataset from the Wuhan Institute of Virology in 2017. Finally, a SARS-WIV1-like Betacoronavirus was found in a rice dataset sequenced by the Fujian Agriculture and Forestry University in 2017. Using the contaminating reads we have extracted from the above datasets, we were able to assemble complete genomes of two novel coronaviruses which we disclose herein. In light of our findings, we raise concerns about biosafety protocol breaches, as indicated by our discovery of multiple dangerous human pathogens in agricultural sequencing laboratories in Wuhan and Fuzhou City, China.
88.
Thirty-six entangled officers of Euler: Quantum solution to a classically impossible problem (arxiv.org) Abstract ↓
The negative solution to the famous problem of $36$ officers of Euler implies that there are no two orthogonal Latin squares of order six. We show that the problem has a solution, provided the officers are entangled, and construct orthogonal quantum Latin squares of this size. As a consequence, we find an example of the long-elusive Absolutely Maximally Entangled state AME$(4,6)$ of four subsystems with six levels each, equivalently a $2$-unitary matrix of size $36$, which maximizes the entangling power among all bipartite unitary gates of this dimension, or a perfect tensor with four indices, each running from one to six. This special state deserves the appellation golden AME state as the golden ratio appears prominently in its elements. This result allows us to construct a pure nonadditive quhex quantum error detection code $(\!(3,6,2)\!)_6$, which saturates the Singleton bound and allows one to encode a $6$-level state into a triplet of such states.
89.
Remarkable Daytime Sub-ambient Radiative Cooling in BaSO4 Nanoparticle Films and Paints (arxiv.org) Abstract ↓
Radiative cooling is a passive cooling technology that offers great promise to reduce space cooling cost, combat the urban heat island effect and alleviate global warming. To achieve passive daytime radiative cooling, current state-of-the-art solutions often utilize complicated multilayer structures or a reflective metal layer, limiting their applications in many fields. Attempts have been made to achieve passive daytime radiative cooling with single-layer paints, but they often require a thick coating or show partial daytime cooling. In this work, we experimentally demonstrate remarkable full daytime sub-ambient cooling performance with both BaSO4 nanoparticle films and BaSO4 nanocomposite paints. BaSO4 has a high electron bandgap for low solar absorptance and phonon resonance at 9 μm for high sky window emissivity. With an appropriate particle size and a broad particle size distribution, BaSO4 nanoparticle film reaches an ultra-high solar reflectance of 97.6% and high sky window emissivity of 0.96. During field tests, BaSO4 film stays more than 4.5 °C below ambient temperature or achieves average cooling power of 117 W/m². BaSO4-acrylic paint is developed with 60% volume concentration to enhance the reliability in outdoor applications, achieving solar reflectance of 98.1% and sky window emissivity of 0.95. Field tests indicate similar cooling performance to the BaSO4 films. Overall, our BaSO4-acrylic paint shows a standard figure of merit of 0.77, which is among the highest of radiative cooling solutions, while providing great reliability, the convenient paint form, ease of use and compatibility with the commercial paint fabrication process.
90.
Energy and Policy Considerations for Deep Learning in NLP (arxiv.org) Abstract ↓
Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.
91.
The signature and cusp geometry of hyperbolic knots (arxiv.org) Abstract ↓
We introduce a new real-valued invariant called the natural slope of a hyperbolic knot in the 3-sphere, which is defined in terms of its cusp geometry. We show that twice the knot signature and the natural slope differ by at most a constant times the hyperbolic volume divided by the cube of the injectivity radius. This inequality was discovered using machine learning to detect relationships between various knot invariants. It has applications to Dehn surgery and to 4-ball genus. We also show a refined version of the inequality where the upper bound is a linear function of the volume, and the slope is corrected by terms corresponding to short geodesics that link the knot an odd number of times.
92.
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (arxiv.org) Abstract ↓
Attention-based architectures have become ubiquitous in machine learning, yet our understanding of the reasons for their effectiveness remains limited. This work proposes a new way to understand self-attention networks: we show that their output can be decomposed into a sum of smaller terms, each involving the operation of a sequence of attention heads across layers. Using this decomposition, we prove that self-attention possesses a strong inductive bias towards "token uniformity". Specifically, without skip connections or multi-layer perceptrons (MLPs), the output converges doubly exponentially to a rank-1 matrix. On the other hand, skip connections and MLPs stop the output from degeneration. Our experiments verify the identified convergence phenomena on different variants of standard transformer architectures.
93.
Deep Shading: Convolutional Neural Networks for Screen-Space Shading (arxiv.org) Abstract ↓
In computer vision, convolutional neural networks (CNNs) have recently achieved new levels of performance for several inverse problems where RGB pixel appearance is mapped to attributes such as positions, normals or reflectance. In computer graphics, screen-space shading has recently increased the visual quality in interactive image synthesis, where per-pixel attributes such as positions, normals or reflectance of a virtual 3D scene are converted into RGB pixel appearance, enabling effects like ambient occlusion, indirect light, scattering, depth-of-field, motion blur, or anti-aliasing. In this paper we consider the diagonal problem: synthesizing appearance from given per-pixel attributes using a CNN. The resulting Deep Shading simulates various screen-space effects at competitive quality and speed while not being programmed by human experts but learned from example images.
94.
Perceiver IO: A General Architecture for Structured Inputs & Outputs (arxiv.org) Abstract ↓
The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves strong results on tasks with highly structured output spaces, such as natural language and visual understanding, StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation.
95.
Symbolic Behaviour in Artificial Intelligence (arxiv.org) Abstract ↓
The ability to use symbols is the pinnacle of human intelligence, but has yet to be fully replicated in machines. Here we argue that the path towards symbolically fluent artificial intelligence (AI) begins with a reinterpretation of what symbols are, how they come to exist, and how a system behaves when it uses them. We begin by offering an interpretation of symbols as entities whose meaning is established by convention. But crucially, something is a symbol only for those who demonstrably and actively participate in this convention. We then outline how this interpretation thematically unifies the behavioural traits humans exhibit when they use symbols. This motivates our proposal that the field place a greater emphasis on symbolic behaviour rather than particular computational mechanisms inspired by more restrictive interpretations of symbols. Finally, we suggest that AI research explore social and cultural engagement as a tool to develop the cognitive machinery necessary for symbolic behaviour to emerge. This approach will allow for AI to interpret something as symbolic on its own rather than simply manipulate things that are only symbols to human onlookers, and thus will ultimately lead to AI with more human-like symbolic fluency.
96.
LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition (arxiv.org) Abstract ↓
Facial recognition systems are increasingly deployed by private corporations, government agencies, and contractors for consumer services and mass surveillance programs alike. These systems are typically built by scraping social media profiles for user images. Adversarial perturbations have been proposed for bypassing facial recognition systems. However, existing methods fail on full-scale systems and commercial APIs. We develop our own adversarial filter that accounts for the entire image processing pipeline and is demonstrably effective against industrial-grade pipelines that include face detection and large scale databases. Additionally, we release an easy-to-use webtool that significantly degrades the accuracy of Amazon Rekognition and the Microsoft Azure Face Recognition API, reducing the accuracy of each to below 1%.
97.
Ternary circuits: why R=3 is not the Optimal Radix for Computation (arxiv.org) Abstract ↓
A demonstration that e=2.718 rounded to 3 is the best radix for computation is disproved. The MOSFET-like CNTFET technology is used to compare inverters, Nand, adders, multipliers, D Flip-Flops and SRAM cells. The transistor count ratio between ternary and binary circuits is generally greater than the log(3)/log(2) information ratio. The only exceptions concern a circuit approach that combines two circuit drawbacks (an additional power supply and a circuit conflict between transistors) and only when it implements circuits based on the ternary inverter. For arithmetic circuits such as adders and multipliers, the ternary circuits are always outperformed by the binary ones using the same technology.
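The two quantities mentioned in this abstract can be computed directly. The sketch below assumes that the "demonstration" being disputed is the classical radix-economy argument, in which the cost of representing a number scales as r / ln(r) and is minimized at r = e; the abstract does not spell this out, and the function names are hypothetical.

```python
import math


def radix_economy(radix: int) -> float:
    """Relative cost per unit of information under the classical model: r / ln(r)."""
    return radix / math.log(radix)


def information_ratio(radix_a: int, radix_b: int) -> float:
    """Information carried by one radix_a digit, measured in radix_b digits."""
    return math.log(radix_a) / math.log(radix_b)


if __name__ == "__main__":
    for r in (2, 3, 4):
        print(r, round(radix_economy(r), 3))
    # A trit carries about 1.585 bits, the log(3)/log(2) ratio cited in the abstract.
    print(round(information_ratio(3, 2), 3))
```

Under this idealized model radix 3 scores slightly better than radix 2. The abstract's claim is that real ternary circuits generally need more than log(3)/log(2) times the transistors of their binary counterparts, which removes that theoretical advantage.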
98.
Intriguing properties of neural networks (arxiv.org) Abstract ↓
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
99.
Astronomical engineering: a strategy for modifying planetary orbits (arxiv.org) Abstract ↓
The Sun's gradual brightening will seriously compromise the Earth's biosphere within ~ 1E9 years. If Earth's orbit migrates outward, however, the biosphere could remain intact over the entire main-sequence lifetime of the Sun. In this paper, we explore the feasibility of engineering such a migration over a long time period. The basic mechanism uses gravitational assists to (in effect) transfer orbital energy from Jupiter to the Earth, and thereby enlarges the orbital radius of Earth. This transfer is accomplished by a suitable intermediate body, either a Kuiper Belt object or a main belt asteroid. The object first encounters Earth during an inward pass on its initial highly elliptical orbit of large (~ 300 AU) semimajor axis. The encounter transfers energy from the object to the Earth in standard gravity-assist fashion by passing close to the leading limb of the planet. The resulting outbound trajectory of the object must cross the orbit of Jupiter; with proper timing, the outbound object encounters Jupiter and picks up the energy it lost to Earth. With small corrections to the trajectory, or additional planetary encounters (e.g., with Saturn), the object can repeat this process over many encounters. To maintain its present flux of solar energy, the Earth must experience roughly one encounter every 6000 years (for an object mass of 1E22 g). We develop the details of this scheme and discuss its ramifications.
100.
Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks (arxiv.org) Abstract ↓
Modern design, control, and optimization often require simulation of highly nonlinear models, leading to prohibitive computational costs. These costs can be amortized by evaluating a cheap surrogate of the full model. Here we present a general data-driven method, the continuous-time echo state network (CTESN), for generating surrogates of nonlinear ordinary differential equations with dynamics at widely separated timescales. We empirically demonstrate near-constant time performance using our CTESNs on a physically motivated scalable model of a heating system whose full execution time increases exponentially, while maintaining relative error within 0.2%. We also show that our model captures fast transients as well as slow dynamics effectively, while other techniques such as physics informed neural networks have difficulties trying to train and predict the highly nonlinear behavior of these models.


Last Update 2022-01-23 00:17


Not affiliated with Hacker News or YCombinator