ShortScience.org - Making Science Accessible!

More data speeds up training time in learning halfspaces over sparse vectors
Daniely, Amit and Linial, Nati and Shalev-Shwartz, Shai
Neural Information Processing Systems Conference - 2013 via Local Bibsonomy
Keywords: dblp

Summary by NIPS Conference Reviews 10 years ago

This paper provides one of the most natural examples of a learning problem that becomes computationally tractable when given a sufficient amount of data, but is computationally intractable (though still information-theoretically tractable) when given a smaller quantity of data. This computational intractability is based on a complexity-theoretic assumption about the hardness of distinguishing satisfiable 3SAT formulas from random ones at a given clause density (more specifically, the 3MAJ variant of the conjecture).

The specific problem considered by the authors is learning halfspaces over 3-sparse vectors. The authors complement their negative results with nearly matching positive results (if one believes a significantly stronger complexity-theoretic conjecture: that hardness persists even for random formulae whose density is $n^\mu$ above the satisfiability threshold). Sadly, the algorithmic results are described in the Appendix and are not discussed in the main text. They appear to be essentially modifications of the algorithm of Hazan et al. (2012), but it would be greatly appreciated if the authors included a high-level discussion of the algorithm. Even if formal proofs of correctness do not fit in the body, a description of the algorithm would be helpful.
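To make the hypothesis class concrete, the sketch below enumerates the instance space and evaluates one halfspace on it. It assumes that a 3-sparse vector is a vector in {-1, 0, 1}^n with exactly three non-zero coordinates and that a halfspace is a classifier x ↦ sign(⟨w, x⟩ + b); the helper names are ours, and this is neither the paper's learning algorithm nor that of Hazan et al.

```python
from itertools import combinations, product
from math import comb


def sparse_vectors(n, k=3):
    """Yield every vector in {-1, 0, 1}^n with exactly k non-zero entries."""
    for support in combinations(range(n), k):
        for signs in product((-1, 1), repeat=k):
            x = [0] * n
            for i, s in zip(support, signs):
                x[i] = s
            yield tuple(x)


def halfspace(w, b):
    """Return the classifier x -> sign(<w, x> + b), mapping a zero margin to +1."""
    def h(x):
        margin = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if margin >= 0 else -1
    return h


n = 6
domain = list(sparse_vectors(n))
# 8 sign patterns for each of the C(n, 3) supports: 8 * 20 = 160 points for n = 6.
assert len(domain) == 8 * comb(n, 3)

h = halfspace([1, -1, 2, 0, 0, 1], b=-1)
labels = {x: h(x) for x in domain}
```

Because each of the C(n, 3) supports admits 8 sign patterns, the instance space is finite, with 8·C(n, 3) = Θ(n^3) points.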

Teaching Machines to Read and Comprehend
Hermann, Karl Moritz and Kociský, Tomás and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil
Neural Information Processing Systems Conference - 2015 via Local Bibsonomy
Keywords: dblp

Summary by mashayekhi 8 years ago

https://i.imgur.com/mYFkCxk.png
Main contributions:
The paper proposes a new method for building large-scale supervised reading comprehension datasets, and develops attention-based deep neural networks that can answer complex questions about real documents.

Importance:
Obtaining large-scale supervised natural language comprehension data is difficult. Meanwhile, reading comprehension methods built on synthetic data have failed when faced with real data. This work addresses the lack of real supervised reading comprehension data. In addition, the authors build novel deep learning models for reading comprehension by incorporating an attention mechanism into recurrent neural networks. The attention mechanism allows a model to focus on the parts of a document that it believes will help it answer a question.

Method:
First, two machine reading corpora are created from CNN and Daily Mail articles along with their summaries in the form of bullet points. These bullet points are abstractive: they paraphrase important parts of the article rather than copying sentences from the text. Each bullet point is turned into Cloze-style questions by replacing one entity at a time with a placeholder, for example “producer X will not press charges against ent212, his lawyer says.” In addition, all entities are replaced by abstract entity markers using a coreference system, and the markers are randomly permuted for each data point. This prevents models from answering with world knowledge or co-occurrence statistics instead of reading the document.
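The anonymisation and Cloze construction can be sketched as follows. This is a simplified illustration, not the authors' pipeline: `make_cloze` is a hypothetical helper, and it assumes every entity is a single token whose coreferent mentions have already been normalised to the same string.

```python
import random


def make_cloze(doc_tokens, query_tokens, entities, answer, seed=None):
    """Anonymise entities and turn a summary sentence into a Cloze query.

    Assumes each entity is a single token and that coreferent mentions
    have already been normalised to the same string.
    """
    rng = random.Random(seed)
    ids = list(range(len(entities)))
    rng.shuffle(ids)  # a fresh permutation of marker ids for every data point
    marker = {e: f"ent{i}" for e, i in zip(entities, ids)}
    doc = [marker.get(t, t) for t in doc_tokens]
    # The answer entity becomes the placeholder X; other entities become markers.
    query = ["X" if t == answer else marker.get(t, t) for t in query_tokens]
    return doc, query, marker[answer]


doc, query, answer = make_cloze(
    doc_tokens=["Smith", "met", "producer", "Jones", "."],
    query_tokens=["producer", "Jones", "will", "not", "press", "charges",
                  "against", "Smith"],
    entities=["Smith", "Jones"],
    answer="Jones",
    seed=0,
)
```

Because the marker ids are reshuffled per data point, a marker such as `ent0` carries no information across examples, so the answer has to be found in the document itself.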

Second, for the reading comprehension task, they evaluate two simple baseline models (A), two symbolic matching models (B), and four recurrent neural network models (C):
A1) Majority Baseline: It picks the most frequently observed entity in the context document.

A2) Exclusive Majority: It picks the most frequently observed entity in the context document which is not observed in the query.

B1) Frame-Semantic Parsing: This method parses each sentence to find "who did what to whom" using a state-of-the-art frame-semantic parser on the anonymized data points.

B2) Word Distance Benchmark: It aligns the placeholder of the Cloze-form question with each possible entity in the context document and computes a distance measure between the question and the context around the aligned entity: the sum of the distances from every word in the query to its nearest aligned word in the document.

C1) Deep LSTM Reader (2-layer LSTM)
This model feeds the [document | query] pair, separated by a delimiter, into the network as a single long sequence, one word at a time. The network has skip connections from the input to each hidden layer and from each hidden layer to the output.

C2) Attentive Reader (bi-directional LSTM with attention)
This model uses an attention mechanism to overcome the bottleneck of a fixed-width hidden vector. First, it encodes the document and the query using separate single-layer bi-directional LSTMs. The query encoding is obtained by concatenating the final forward and backward outputs. The document encoding is a weighted sum of the per-token output vectors (each obtained by concatenating the forward and backward outputs). The weights can be interpreted as the degree to which the network attends to a particular token in the document. Finally, the model combines the document and query embeddings non-linearly.

C3) Uniform Reader (bi-directional LSTM)
It is the Attentive Reader without the attention mechanism (all document tokens are weighted equally), and it is used here to measure the effect of attention on the results.

C4) Impatient Reader (bi-directional LSTM with attention per each query token)
This model is similar to the Attentive Reader, except that the attention weights are recomputed for each query token. The intuition is that, for each query token, the model finds which part of the context document is most relevant. The model accumulates information from the document as each query token is read, and finally outputs a joint document-query representation using a non-linear combination of the document embedding and the query embedding.
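The Attentive Reader's read-out step (C2) can be sketched in plain Python. This follows the general description above under our own assumptions: attention logits w_msᵀ tanh(W_ym y_d(t) + W_um u) turned into weights s(t) by a softmax, a document encoding r = Σ s(t) y_d(t), and a joint embedding g = tanh(W_rg r + W_ug u). The weight names are our labels, the weights are random and untrained, and the LSTM encoders are replaced by random stand-in vectors.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attentive_read(Y_d, u, W_ym, W_um, w_ms):
    """Return (r, s): the encoding r = sum_t s(t) * y_d(t) and the attention weights s."""
    Wu = matvec(W_um, u)
    logits = []
    for y in Y_d:
        m = [math.tanh(a + b) for a, b in zip(matvec(W_ym, y), Wu)]
        logits.append(sum(w * mi for w, mi in zip(w_ms, m)))
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]  # numerically stable softmax
    total = sum(weights)
    s = [wt / total for wt in weights]
    r = [sum(s_t * y[i] for s_t, y in zip(s, Y_d)) for i in range(len(Y_d[0]))]
    return r, s


def joint_embedding(r, u, W_rg, W_ug):
    """Non-linear combination g = tanh(W_rg r + W_ug u) of document and query."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_rg, r), matvec(W_ug, u))]


rng = random.Random(0)
d, T = 4, 6  # encoding size and document length (arbitrary)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


Y_d = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]  # stand-in LSTM outputs
u = [rng.gauss(0.0, 1.0) for _ in range(d)]                        # stand-in query encoding
r, s = attentive_read(Y_d, u, rand_matrix(d, d), rand_matrix(d, d),
                      [rng.gauss(0.0, 1.0) for _ in range(d)])
g = joint_embedding(r, u, rand_matrix(d, d), rand_matrix(d, d))
```

Replacing the softmax weights with a constant 1/T gives the uniform weighting that, as its name suggests, the Uniform Reader (C3) uses; the Impatient Reader (C4) would repeat the attention step once per query token.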

Results:
As expected, the Attentive and Impatient Readers outperform all other models, which shows the benefit of attention. The Uniform Reader, which lacks attention, performs far worse, which further supports this hypothesis. Accuracies (%) on the two datasets:

| Model | CNN | Daily Mail |
|---|---|---|
| Maximum Frequency (A1) | 33.2 | 25.5 |
| Exclusive Frequency (A2) | 39.3 | 32.8 |
| Frame-semantic model (B1) | 40.2 | 35.5 |
| Word distance model (B2) | 50.9 | 55.5 |
| Deep LSTM Reader (C1) | 57.0 | 62.2 |
| Uniform Reader (C3) | 39.4 | 34.4 |
| Attentive Reader (C2) | 63.0 | 69.0 |
| Impatient Reader (C4) | 63.8 | 68.0 |

Training Very Deep Networks
Srivastava, Rupesh Kumar and Greff, Klaus and Schmidhuber, Jürgen
Neural Information Processing Systems Conference - 2015 via Local Bibsonomy
Keywords: dblp

Summary by NIPS Conference Reviews 10 years ago

Machine learning researchers frequently find that they get better results by adding more layers to their neural networks, but the difficulties of initialization and vanishing/exploding gradients have been severely limiting. Indeed, the difficulty of getting information to flow through deep neural networks arguably kept them out of widespread use for 30 years. This paper addresses this problem head on and demonstrates one method for training 100-layer networks.

The paper describes an effective method to train very deep neural networks by means of 'information highways', that is, direct connections to upper network layers. Although the method generalizes prior techniques such as cross-layer connections, the authors demonstrate its effectiveness experimentally. The contributions are quite novel and well supported by experimental evidence.
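The gating behind these 'information highways' can be sketched in plain Python. This assumes the highway-layer form y = H(x)·T(x) + x·(1 - T(x)) with a sigmoid transform gate T, which the summary does not spell out; the ReLU transform H, the gate bias of -2, and the random weights are our choices for illustration.

```python
import math
import random


def dense(W, b, x):
    """Affine map W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)), element-wise."""
    h = [max(0.0, v) for v in dense(W_h, b_h, x)]                  # transform H (ReLU here)
    t = [1.0 / (1.0 + math.exp(-v)) for v in dense(W_t, b_t, x)]    # transform gate T in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]  # 1 - T acts as a carry gate


rng = random.Random(0)
d, depth = 8, 100


def rand_matrix(n):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]


# A negative gate bias initially favours carrying the input through each layer.
layers = [(rand_matrix(d), [0.0] * d, rand_matrix(d), [-2.0] * d) for _ in range(depth)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = x
for W_h, b_h, W_t, b_t in layers:
    y = highway_layer(y, W_h, b_h, W_t, b_t)
```

When the gate is closed (T ≈ 0) a layer passes its input through unchanged, which is what lets information and gradients cross many layers; when it is fully open (T ≈ 1) the layer behaves like an ordinary feed-forward layer.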

Deep ADMM-Net for Compressive Sensing MRI
Yang, Yan and Sun, Jian and Li, Huibin and Xu, Zongben
Neural Information Processing Systems Conference - 2016 via Local Bibsonomy
Keywords: dblp

Summary by NIPS Conference Reviews 9 years ago

The paper addresses the problem of compressive sensing MRI (CS-MRI) by proposing a "deep unfolding" approach (cf. http://arxiv.org/abs/1409.2574) with a sparsity-based data prior and inference via ADMM. All layers of the proposed ADMM-Net are based on a generalization of ADMM inference steps and are discriminatively trained to minimize a reconstruction error. In contrast to other methods for CS-MRI, the proposed approach offers both high reconstruction quality and fast run-time.

The basic idea is to convert the conventional optimization-based CS reconstruction algorithm into a neural network with a fixed number of stages, trained with back-propagation. Specifically, the ADMM-based CS reconstruction is approximated by a deep neural network. Experimental results show that the resulting network outperforms several existing CS-MRI algorithms while requiring less computational time.

The ADMM algorithm has proven useful for solving problems with both differentiable and non-differentiable terms, and therefore has a clear link with compressed sensing. Experiments show some gain in performance with respect to the state of the art, especially in terms of computational cost at test time.
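The 'deep unfolding' idea can be illustrated with a generic scaled-form ADMM solver for the LASSO problem, written as a fixed number of iterations ("layers"), each with its own parameters (rho, lam). This is a swapped-in, much simpler model chosen for illustration: ADMM-Net itself uses a different, MRI-specific formulation with more learned components, which this sketch does not reproduce, and here the per-layer parameters are fixed rather than trained.

```python
import math


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def admm_unrolled(A, y, layers):
    """Run one scaled-form ADMM iteration per (rho, lam) pair in `layers`.

    Targets min_x 0.5 * ||A x - y||^2 + lam * ||z||_1  s.t.  x = z.
    In an unfolded network, each (rho, lam) would be a learnable per-layer parameter.
    """
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Aty = [sum(A[k][i] * y[k] for k in range(m)) for i in range(n)]
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for rho, lam in layers:
        M = [[AtA[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
        x = solve(M, [Aty[i] + rho * (z[i] - u[i]) for i in range(n)])   # x-update
        z = soft_threshold([x[i] + u[i] for i in range(n)], lam / rho)  # z-update
        u = [u[i] + x[i] - z[i] for i in range(n)]                        # dual update
    return x


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_hat = admm_unrolled(I3, [3.0, 0.0, -0.5], layers=[(1.0, 1.0)] * 50)
```

With A = I and lam = 1 the exact solution is the soft-thresholded measurement [2, 0, 0], which 50 unrolled iterations reach to high precision; a trained network would instead pick per-layer parameters that reach a good reconstruction in far fewer layers.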

Realtime Data Processing at Facebook
Chen, Guoqiang Jerry and Wiener, Janet L. and Iyer, Shridhar and Jaiswal, Anshul and Lei, Ran and Simha, Nikhil and Wang, Wei and Wilfong, Kevin and Williamson, Tim and Yilmaz, Serhat
International Conference on Management of Data (ACM SIGMOD) - 2016 via Local Bibsonomy
Keywords: dblp

Summary by Michael Whittaker 8 years ago

There is an enormous number of stream (a.k.a. real-time or interactive) processing systems in the wild: Twitter Storm, Twitter Heron, Google Millwheel, LinkedIn Samza, Spark Streaming, Apache Flink, etc. Although similar, these systems differ in their ease of use, performance, fault tolerance, scalability, correctness, etc. In this paper, Facebook discusses the design decisions that go into developing a stream processing system, and the choices it made in three of its real-time processing systems: Puma, Swift, and Stylus.

Systems Overview.

- Scribe. Scribe is a persistent messaging system for log data. Data is organized into categories, and Scribe buckets are the basic unit of stream processing. The data pushed into Scribe is persisted in HDFS.
- Puma. Puma is a real-time data processing system in which applications are written in a SQL-like language with user-defined functions written in Java. The system is designed for compiled, rather than ad-hoc, queries. It is used to compute aggregates and to filter Scribe streams.
- Swift. Swift is used to checkpoint Scribe streams. Checkpoints are made every N strings or every B bytes. Swift is used for low-throughput applications.
- Stylus. Stylus is a low-level, general-purpose stream processing system written in C++ that resembles Storm, Millwheel, etc. Stream processors are organized into DAGs, and the system provides estimated low watermarks.
- Laser. Laser is a high-throughput, low-latency key-value store built on RocksDB.
- Scuba. Scuba supports ad-hoc queries for debugging.
- Hive. Hive is a huge data warehouse that supports SQL queries.

Example Application. Imagine a stream of events, where each event belongs to a single topic. Consider a streaming application that computes the top k events for each topic over 5-minute windows. It is composed of four stages:

1. Filterer. The filterer filters events and shards them based on their dimension id.
2. Joiner. The joiner looks up dimension data by dimension id, infers the topic of the event, and shards output by (event, topic).
3. Scorer. The scorer maintains a recent history of event counts per topic as well as some long-term counts. It assigns a score for each event and shards output by topic.
4. Ranker. The ranker computes the top k events per topic.

The filterer and joiner are stateless; the scorer and ranker are stateful. The filterer and ranker can be implemented in Puma. All can be implemented in Stylus.
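The four stages can be sketched as plain Python functions over an in-memory list of events. Everything here is a hypothetical simplification: the filter rule, the dimension table, and the event fields are made up; sharding between stages is omitted; and the scorer simply counts events per 5-minute window, so unlike the paper's scorer it keeps no long-term counts and is stateless.

```python
import heapq
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling windows


def filterer(events):
    # Hypothetical filter rule: drop events that carry no dimension id.
    return [e for e in events if e.get("dim_id") is not None]


def joiner(events, dimension_table):
    # Look up dimension data by dimension id to infer each event's topic.
    return [(e["ts"], e["event"], dimension_table[e["dim_id"]]) for e in events]


def scorer(joined):
    # Stand-in score: occurrences of the event within its topic and window.
    counts = Counter((ts // WINDOW_SECONDS, topic, event) for ts, event, topic in joined)
    scores = defaultdict(dict)
    for (window, topic, event), n in counts.items():
        scores[(window, topic)][event] = n
    return scores


def ranker(scores, k):
    # Top-k events per (window, topic).
    return {key: heapq.nlargest(k, evs, key=evs.get) for key, evs in scores.items()}


events = [
    {"ts": 10, "event": "a", "dim_id": 1},
    {"ts": 20, "event": "a", "dim_id": 1},
    {"ts": 30, "event": "b", "dim_id": 1},
    {"ts": 40, "event": "c", "dim_id": 2},
    {"ts": 50, "event": "x", "dim_id": None},
    {"ts": 400, "event": "b", "dim_id": 1},
]
top = ranker(scorer(joiner(filterer(events), {1: "sports", 2: "music"})), k=1)
```

In a real deployment each stage would run as its own node, with the shard keys described above deciding which node instance receives each record.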

Language Paradigm. The choice of the language in which users write applications can greatly impact a system's ease of use:

- Declarative. SQL is declarative, expressive, and everyone knows it. However, not all computations can be expressed in SQL.
- Functional. Frameworks like Dryad and Spark provide users with a set of built-in operators which they chain together. This is more flexible than SQL.
- Procedural. Systems like Storm, Heron, and Samza allow users to form DAGs of arbitrary processing units.

Puma uses SQL, Swift uses Python, and Stylus uses C++.
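To illustrate the three paradigms, the sketch below expresses the same per-event count for one topic in each style, all in Python for comparison: SQL through the standard-library `sqlite3` module, a chain of built-in operators, and an explicit processing unit with its own state. The data and the `CountUnit` class are hypothetical and are not taken from Puma, Swift, or Stylus.

```python
import sqlite3
from collections import Counter

events = [("sports", "a"), ("sports", "a"), ("music", "c"), ("sports", "b")]

# Declarative: state *what* to compute; the engine decides how.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (topic TEXT, event TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)", events)
declarative = dict(db.execute(
    "SELECT event, COUNT(*) FROM events WHERE topic = 'sports' GROUP BY event"))
db.close()

# Functional: chain built-in operators (filter, map, then a counting reduction).
functional = Counter(map(lambda e: e[1], filter(lambda e: e[0] == "sports", events)))


# Procedural: an arbitrary processing unit with explicit control flow and state.
class CountUnit:
    def __init__(self):
        self.counts = {}

    def process(self, topic, event):
        if topic == "sports":
            self.counts[event] = self.counts.get(event, 0) + 1


unit = CountUnit()
for topic, event in events:
    unit.process(topic, event)
procedural = unit.counts
```

All three produce the same counts; the difference is how much of the computation the framework controls versus how much the user writes by hand.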

Data Transfer. Data must be transferred between nodes in a DAG:

- Direct message transfer. Data can be transferred directly with something like RPCs or ZeroMQ. Millwheel, Flink, and Spark Streaming do this.
- Broker-based message transfer. Instead of communicating directly, nodes exchange messages through a message broker. This allows one output to be multiplexed to multiple consumers. Moreover, brokers can implement back pressure. Heron does this.
- Persistent message-based transfer. Storing messages in a persistent messaging layer allows data to be multiplexed, allows readers and writers to run at different speeds, allows data to be read again, and makes failures independent. Samza, Puma, Swift, and Stylus do this.
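The reader-side properties of persistent message-based transfer can be sketched with an append-only log in which each reader tracks its own offset. This is a hypothetical, in-memory stand-in, not Scribe's API: persistence, buckets, and retention are omitted.

```python
class PersistentLog:
    """Append-only message log; each reader keeps its own offset (hypothetical)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the appended message

    def read(self, offset, max_messages=100):
        return self._messages[offset:offset + max_messages]


class Reader:
    """A consumer that advances independently of the writer and of other readers."""

    def __init__(self, log, offset=0):
        self.log, self.offset = log, offset

    def poll(self, max_messages=100):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = PersistentLog()
for i in range(5):
    log.append(f"event-{i}")

fast, slow = Reader(log), Reader(log)
fast_batch = fast.poll()                # all five messages
slow_batch = slow.poll(max_messages=2)  # a slow reader lags behind without blocking anyone
replay = Reader(log, offset=0).poll()   # old data can be read again, e.g. for debugging
```

Because the writer never waits for readers, one slow or failed consumer does not affect the producer or the other consumers.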

Facebook connects its systems with Scribe for the following benefits:

- Fault Tolerance: If the producer of a stream fails, the consumer is not affected. This failure independence becomes increasingly useful at scale.
- Fault Tolerance: Recovery can be faster because only the failed nodes need to be replaced.
- Fault Tolerance: Multiple identical downstream nodes can be run to improve fault-tolerance.
- Performance: Different nodes can have different read and write latencies. The system does not propagate back pressure, so one slow node does not slow down the whole system.
- Ease of Use: The ability to replay messages makes debugging easier.
- Ease of Use: Storing messages in Scribe makes monitoring and alerting easier.
- Ease of Use: Having Scribe as a common substrate lets different frameworks communicate with one another.
- Scalability: Changing the number of Scribe buckets makes it easy to change the number of partitions.

Processing Semantics. Stream processors:

1. Process inputs,
2. Generate output, and
3. Checkpoint state, stream offsets, and outputs for recovery.

Each node has:

- state semantics: whether each input affects the state at least once, at most once, or exactly once.
- output semantics: whether each output is produced at least once, at most once, or exactly once.

For state semantics, we can achieve

- at least once by saving state before saving stream offsets,
- at most once by saving stream offsets before saving state, and
- exactly once by saving both state and stream offsets atomically.

For output semantics, we can achieve

- at least once by saving output before offset/state,
- at most once by saving offset/state before output,
- exactly once by saving output and offset/state atomically.
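The effect of checkpoint ordering on state semantics can be shown with a small simulation. It is a hypothetical sketch: the node sums integer inputs, checkpoints state and offset after every input, and crashes once between the two checkpoint writes, then restarts from the last durable checkpoint.

```python
def simulate(inputs, save_state_first, crash_at=None):
    """Sum `inputs` into a counter, checkpointing state and offset after each input.

    The node crashes once, between the two checkpoint writes for input index
    `crash_at`, and restarts from the last durable checkpoint.
    """
    saved_state, saved_offset = 0, 0  # the durable checkpoint
    state, offset = saved_state, saved_offset
    crashed = False
    while offset < len(inputs):
        state += inputs[offset]  # process one input
        offset += 1
        # First checkpoint write.
        if save_state_first:
            saved_state = state
        else:
            saved_offset = offset
        if offset - 1 == crash_at and not crashed:
            crashed = True
            state, offset = saved_state, saved_offset  # restart from checkpoint
            continue
        # Second checkpoint write.
        if save_state_first:
            saved_offset = offset
        else:
            saved_state = state
    return state


inputs = [1, 1, 1]
at_least_once = simulate(inputs, save_state_first=True, crash_at=1)  # input 1 counted twice: 4
at_most_once = simulate(inputs, save_state_first=False, crash_at=1)  # input 1 lost: 2
```

Saving state first means a crash replays an input whose effect is already in the saved state (a duplicate); saving the offset first means the input is skipped (a loss). Writing both in one atomic step removes the gap between the two writes, which is what exactly-once requires.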

At-least-once semantics is useful when low latency matters more than avoiding duplicate records. At-most-once semantics is useful when losing data is preferable to duplicating it. Puma guarantees at-least-once state and output semantics, and Stylus supports many combinations of state and output semantics.

State-saving Mechanisms. Node state can be saved in one of many ways:

- Replication. Running multiple copies of a node provides fault tolerance.
- Local DB. Nodes can save their state to a local database like LevelDB or RocksDB.
- Remote DB. Nodes can save their state to remote databases. Millwheel does this.
- Upstream backup. Output messages can be buffered upstream in case downstream nodes fail.
- Global consistent snapshot. Flink maintains globally consistent snapshots.

Stylus can save state to a local RocksDB instance, with the data asynchronously backed up to HDFS. Alternatively, it can store state in a remote database. If a processing unit's state forms a monoid (an associative operator with an identity element), then input data can be processed and the partial results later merged into the remote DB.
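The monoid condition can be sketched with a running-mean aggregate. This is a hypothetical example: `CountSum` stands in for a processing unit's state and `remote` for the value held in the remote database. Because `combine` is associative with `CountSum()` as its identity, batches can be aggregated independently and merged later without changing the result.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class CountSum:
    """Partial aggregate for a running mean: (count, total)."""
    count: int = 0
    total: float = 0.0

    def combine(self, other):
        # Associative, and CountSum() is its identity element, so this is a monoid.
        return CountSum(self.count + other.count, self.total + other.total)


def aggregate(values):
    """Process a batch of input values into one partial aggregate."""
    return reduce(CountSum.combine, (CountSum(1, v) for v in values), CountSum())


remote = CountSum()  # stand-in for the value held in a remote DB
for batch in ([3.0, 5.0], [4.0], []):
    remote = remote.combine(aggregate(batch))  # merge each partial result later
mean = remote.total / remote.count
```

Merging per-batch results gives the same aggregate as processing all inputs at once, and an empty batch contributes the identity element.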

Backfill Processing. Being able to re-run old jobs or run new jobs on old data is useful for a number of reasons:

- Running new jobs on old data is great for debugging.
- Sometimes, we need to run a new metric on old data to generate historical metrics.
- If a node has a bug, we would like to fix it and re-run the node on the affected data.

To re-run processing on old data, we have three choices:

- Stream only. Old data is replayed through the streaming system itself.
- Separate batch and streaming systems. This can be very annoying and hard to manage.
- Combined batch and streaming system. This is the approach taken by Spark Streaming, Flink, and Facebook.

Puma and Stylus code can be run as either streaming or batch applications.
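The idea of running one piece of processing code in both modes can be sketched with a single Python generator. This is a hypothetical illustration, not Puma's or Stylus's API: the same function consumes a live source incrementally or a stored dataset for a backfill.

```python
from typing import Dict, Iterable, Iterator, Tuple


def topic_counts(events: Iterable[Tuple[str, str]]) -> Iterator[Dict[str, int]]:
    """Hypothetical processing unit: emit running per-topic counts after each event."""
    counts: Dict[str, int] = {}
    for topic, _event in events:
        counts[topic] = counts.get(topic, 0) + 1
        yield dict(counts)


def live_stream():
    """Stand-in for a streaming source, e.g. a tailed log."""
    yield ("sports", "a")
    yield ("music", "b")
    yield ("sports", "c")


historical = [("sports", "a"), ("music", "b"), ("sports", "c")]  # e.g. loaded from a warehouse

# Streaming: consume results incrementally as events arrive.
streaming_results = list(topic_counts(live_stream()))
# Batch (backfill): run the very same code over stored data and keep only the final result.
*_, batch_final = topic_counts(historical)
```

Keeping the logic in one function means a bug fix or a new metric only has to be written once to be applied both to live data and to historical data.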